Frequently Asked Questions
Common questions and expert answers about fine-tuning and EdukaAI
Popular Questions
How many examples do I actually need?
Start with 100 high-quality examples for your first fine-tuning. This is the perfect amount for testing and learning without overwhelming yourself.
- 50 examples: Minimum to try fine-tuning
- 100 examples: Sweet spot for testing and learning (recommended starting point)
- 500-1000 examples: Noticeable improvement in specific tasks
- 5000+ examples: Professional-grade fine-tuning
Remember: 100 excellent examples beat 1000 mediocre ones. Quality always wins over quantity. Start with 100, then iterate and add more.
What's the difference between "Draft" and "Approved" status?
Draft means you're still working on it: the example still needs review or improvement. Approved means it's high quality and ready for training. Rejected means it has significant issues and shouldn't be used. Aim to have 80% of your examples approved before training.
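The 80% target above is easy to check from your example statuses. A minimal sketch (the status labels mirror the Draft/Approved/Rejected states described here; the helper name is illustrative, not part of EdukaAI):

```python
def approval_rate(statuses):
    """Fraction of examples marked 'approved' (illustrative helper)."""
    if not statuses:
        return 0.0
    return statuses.count("approved") / len(statuses)

# e.g. 8 approved, 1 draft, 1 rejected -> 0.8, just meeting the 80% target
statuses = ["approved"] * 8 + ["draft", "rejected"]
ready = approval_rate(statuses) >= 0.8  # True
```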
Can I import from ChatGPT / Claude / my own conversations?
Currently, you can import training data via JSON/JSONL files. Direct integration with chat platforms is coming soon!
- OpenWebUI - Coming soon! Import directly from your OpenWebUI conversations
For now, you can manually copy your best conversations from ChatGPT, Claude, or other AI assistants and paste them into the Create Sample form in the EdukaAI application. Use the Import feature in the app to upload JSON files.
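JSONL simply means one JSON object per line. The exact schema EdukaAI expects isn't specified here, so this sketch assumes a simple Alpaca-style instruction/input/output record (the field names are an assumption):

```python
import json
import os
import tempfile

# One record per line; field names assume an Alpaca-style schema (not confirmed).
examples = [
    {"instruction": "Explain what a list comprehension is in Python.",
     "input": "",
     "output": "A list comprehension builds a list in one expression, "
               "e.g. [x * x for x in range(5)]."},
]

path = os.path.join(tempfile.mkdtemp(), "dataset.jsonl")

# Write: one JSON object per line, no trailing commas, no enclosing array.
with open(path, "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Read it back line by line.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```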
How long does it take to create 1000 examples?
It depends on your approach:
- Manual creation: 5-10 minutes per example = 85-170 hours total
- Importing conversations: Much faster! Can import 50-100 examples in minutes
- Hybrid approach: Import 70% + manually create 30%
Recommended: Import from your existing AI conversations, then curate and add manual examples where needed. This can reduce time to just 10-20 hours of curation.
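The time estimates above follow from simple arithmetic. This sketch uses the midpoint of the 5-10 minute range for manual creation, and assumes about one minute of curation per imported example (the per-example curation time is not stated in the text):

```python
total = 1000
minutes_per_manual = 7.5    # midpoint of the 5-10 minute range above
minutes_per_curated = 1.0   # assumed review time per imported example

manual_hours = total * minutes_per_manual / 60    # all-manual approach
curated_hours = total * minutes_per_curated / 60  # import-then-curate approach

print(f"all manual: ~{manual_hours:.0f} h, import + curate: ~{curated_hours:.0f} h")
```

With these assumptions, all-manual creation lands around 125 hours (inside the 85-170 hour range), while importing and curating comes out near 17 hours, consistent with the 10-20 hour figure above.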
Do I need programming experience to fine-tune an LLM?
No! EdukaAI is designed for beginners. You don't need to write training code or understand machine learning theory. Just create good examples using our forms, and we'll handle the technical parts. However, basic understanding of your domain (e.g., programming concepts if you're building a coding assistant) is helpful.
What models can I fine-tune with my dataset?
Once exported from EdukaAI, you can use your dataset to fine-tune:
- Open source models: Llama 2, Mistral, Falcon (free, run locally)
- Via HuggingFace: Easy upload and training
- Via OpenAI: GPT-3.5 fine-tuning API
- Via other platforms: Any platform accepting Alpaca or ShareGPT format
We export in multiple formats (Alpaca, ShareGPT, CodeAlpaca) to ensure compatibility with popular training platforms.
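The two most common export formats mentioned above shape the same example differently. A hedged illustration (these records follow the widely used Alpaca and ShareGPT conventions, not necessarily EdukaAI's exact export; the converter function is illustrative):

```python
# Alpaca format: one flat record per example.
alpaca = {
    "instruction": "Reverse a string in Python.",
    "input": "",
    "output": "Use slicing: s[::-1].",
}

# ShareGPT format: a conversation as a list of role-tagged turns.
sharegpt = {
    "conversations": [
        {"from": "human", "value": "Reverse a string in Python."},
        {"from": "gpt", "value": "Use slicing: s[::-1]."},
    ]
}

def alpaca_to_sharegpt(rec):
    """Convert a single-turn Alpaca record to a ShareGPT turn list."""
    prompt = rec["instruction"] + ("\n" + rec["input"] if rec["input"] else "")
    return {"conversations": [
        {"from": "human", "value": prompt},
        {"from": "gpt", "value": rec["output"]},
    ]}
```

Knowing which shape your training platform expects is what the "Wrong format" pitfall below is about.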
How much does fine-tuning cost?
Costs vary by platform:
- Open source (local): Free! Just need a decent GPU (RTX 3060 or better recommended)
- HuggingFace / cloud: $5-20 per training run depending on model size
- OpenAI API: $0.008 per 1K tokens trained (typically $2-10 for 1000 examples)
Cost-saving tip: Train on smaller models first (7B parameters) to test your dataset before training larger, more expensive models.
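The OpenAI figure above can be sanity-checked with simple arithmetic. Assuming roughly 250-1250 tokens per example (an assumption; actual example lengths vary widely):

```python
rate_per_1k = 0.008  # $ per 1K trained tokens, as quoted above
examples = 1000

def cost(tokens_per_example):
    """Estimated training cost in dollars for the dataset."""
    return examples * tokens_per_example * rate_per_1k / 1000

print(f"${cost(250):.2f} - ${cost(1250):.2f}")
```

At 250-1250 tokens per example this works out to $2.00-$10.00, matching the typical range quoted above.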
What if my model doesn't improve after training?
Common reasons and solutions:
- Quality issues: Review your dataset. Reject low-quality examples. Aim for 4-5 star ratings.
- Not enough data: Try with 1000+ examples. Small datasets often don't show improvement.
- Wrong format: Make sure you're exporting in the right format for your training platform.
- Base model too large: Try fine-tuning a smaller model (7B instead of 70B).
- Training parameters: You may need more training epochs or different learning rates.
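If you suspect training parameters, a few values are worth varying first. A hedged starting-point sketch (these numbers are common defaults for small instruction-tuning runs, not EdukaAI recommendations; pass them to whatever trainer your platform uses):

```python
# Illustrative hyperparameters for a first fine-tuning run.
# All values are assumptions; tune per platform and model size.
training_config = {
    "num_train_epochs": 3,             # try 3-5 if one epoch shows no change
    "learning_rate": 2e-5,             # lower it if the loss diverges
    "per_device_train_batch_size": 4,  # raise if you have GPU memory to spare
    "warmup_ratio": 0.03,              # brief warmup stabilizes early steps
}
```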
Can I use copyrighted material in my dataset?
Be careful! Don't include copyrighted code, text, or content without permission. Write your own explanations and examples. If importing from AI assistants (Claude, ChatGPT), those conversations are generally fine to use since you created them. When in doubt, create original content.
How do I know if an example is good quality?
Ask yourself these questions:
- Does the output fully answer the instruction?
- Would this help a real user?
- Is it accurate and correct?
- Is the tone consistent with other examples?
- Does it teach something or just give an answer?
- Would I be proud if this was the only example someone saw?
If you answered "yes" to all, it's probably a 4-5 star example!
Glossary
Confused by a term? Our comprehensive glossary covers all AI, LLM, and fine-tuning terminology with detailed explanations.
Resources & Next Steps
Learning Resources
- HuggingFace Training Guide
Official guide for training transformers
- OpenAI Fine-Tuning Guide
Guide for fine-tuning GPT models
- Stanford Alpaca Dataset
Example of high-quality instruction dataset
- Transformers Library
Popular library for working with LLMs
Tools & Platforms
- HuggingFace Hub
Share and train models (we export here!)
- Weights & Biases
Track training runs and experiments
- Llama.cpp
Run models locally on consumer hardware
- LM Studio
Easy GUI for running local LLMs