Frequently Asked Questions
Common questions and expert answers about fine-tuning and EdukaAI
Popular Questions
How many examples do I actually need?
Start with 100 high-quality examples for your first fine-tuning. This is the perfect amount for testing and learning without overwhelming yourself.
- 50 examples: Minimum to try fine-tuning
- 100 examples: Sweet spot for testing and learning (recommended starting point)
- 500-1000 examples: Noticeable improvement in specific tasks
- 5000+ examples: Professional-grade fine-tuning
Remember: 100 excellent examples beat 1000 mediocre ones. Quality always wins over quantity. Start with 100, then iterate and add more.
What's the difference between "Draft" and "Approved" status?
Draft means you're still working on it: the example still needs review or improvement. Approved means it's high quality and ready for training. Rejected means it has significant issues and shouldn't be used. Aim to have 80% of your examples approved before training.
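The 80% target above is easy to check from your example statuses. A minimal sketch (the status labels mirror the Draft/Approved/Rejected states described here; the helper name is illustrative, not part of EdukaAI):

```python
def approval_rate(statuses):
    """Fraction of examples marked 'approved' (illustrative helper)."""
    if not statuses:
        return 0.0
    return statuses.count("approved") / len(statuses)

# e.g. 8 approved, 1 draft, 1 rejected -> 0.8, just meeting the 80% target
statuses = ["approved"] * 8 + ["draft", "rejected"]
ready = approval_rate(statuses) >= 0.8  # True
```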
Can I import from ChatGPT / Claude / my own conversations?
Currently, you can import training data via JSON/JSONL files. Direct integration with chat platforms is coming soon!
- OpenWebUI - Coming soon! Import directly from your OpenWebUI conversations
For now, you can manually copy your best conversations from ChatGPT, Claude, or other AI assistants and paste them into the Create Sample form in the EdukaAI application. Use the Import feature in the app to upload JSON files.
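JSONL simply means one JSON object per line. The exact schema EdukaAI expects isn't specified here, so this sketch assumes a simple Alpaca-style instruction/input/output record (the field names are an assumption):

```python
import json
import os
import tempfile

# One record per line; field names assume an Alpaca-style schema (not confirmed).
examples = [
    {"instruction": "Explain what a list comprehension is in Python.",
     "input": "",
     "output": "A list comprehension builds a list in one expression, "
               "e.g. [x * x for x in range(5)]."},
]

path = os.path.join(tempfile.mkdtemp(), "dataset.jsonl")

# Write: one JSON object per line, no trailing commas, no enclosing array.
with open(path, "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Read it back line by line.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```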
How long does it take to create 1000 examples?
It depends on your approach:
- Manual creation: 5-10 minutes per example = 85-170 hours total
- Importing conversations: Much faster! Can import 50-100 examples in minutes
- Hybrid approach: Import 70% + manually create 30%
Recommended: Import from your existing AI conversations, then curate and add manual examples where needed. This can reduce time to just 10-20 hours of curation.
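The time estimates above follow from simple arithmetic. This sketch uses the midpoint of the 5-10 minute range for manual creation, and assumes about one minute of curation per imported example (the per-example curation time is not stated in the text):

```python
total = 1000
minutes_per_manual = 7.5    # midpoint of the 5-10 minute range above
minutes_per_curated = 1.0   # assumed review time per imported example

manual_hours = total * minutes_per_manual / 60    # all-manual approach
curated_hours = total * minutes_per_curated / 60  # import-then-curate approach

print(f"all manual: ~{manual_hours:.0f} h, import + curate: ~{curated_hours:.0f} h")
```

With these assumptions, all-manual creation lands around 125 hours (inside the 85-170 hour range), while importing and curating comes out near 17 hours, consistent with the 10-20 hour figure above.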
Do I need programming experience to fine-tune an LLM?
No! EdukaAI is designed for beginners. You don't need to write training code or understand machine learning theory. Just create good examples using our forms, and we'll handle the technical parts. However, basic understanding of your domain (e.g., programming concepts if you're building a coding assistant) is helpful.
What models can I fine-tune with my dataset?
Once exported from EdukaAI, you can use your dataset to fine-tune:
- Open source models: Llama 2, Mistral, Falcon (free, run locally)
- Via HuggingFace: Easy upload and training
- Via OpenAI: GPT-3.5 fine-tuning API
- Via other platforms: Any platform accepting Alpaca or ShareGPT format
We export in multiple formats (Alpaca, ShareGPT, CodeAlpaca) to ensure compatibility with popular training platforms.
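The two most common export formats mentioned above shape the same example differently. A hedged illustration (these records follow the widely used Alpaca and ShareGPT conventions, not necessarily EdukaAI's exact export; the converter function is illustrative):

```python
# Alpaca format: one flat record per example.
alpaca = {
    "instruction": "Reverse a string in Python.",
    "input": "",
    "output": "Use slicing: s[::-1].",
}

# ShareGPT format: a conversation as a list of role-tagged turns.
sharegpt = {
    "conversations": [
        {"from": "human", "value": "Reverse a string in Python."},
        {"from": "gpt", "value": "Use slicing: s[::-1]."},
    ]
}

def alpaca_to_sharegpt(rec):
    """Convert a single-turn Alpaca record to a ShareGPT turn list."""
    prompt = rec["instruction"] + ("\n" + rec["input"] if rec["input"] else "")
    return {"conversations": [
        {"from": "human", "value": prompt},
        {"from": "gpt", "value": rec["output"]},
    ]}
```

Knowing which shape your training platform expects is what the "Wrong format" pitfall below is about.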
How much does fine-tuning cost?
Costs vary by platform:
- Open source (local): Free! Just need a decent GPU (RTX 3060 or better recommended)
- HuggingFace / cloud: $5-20 per training run depending on model size
- OpenAI API: $0.008 per 1K tokens trained (typically $2-10 for 1000 examples)
Cost-saving tip: Train on smaller models first (7B parameters) to test your dataset before training larger, more expensive models.
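The OpenAI figure above can be sanity-checked with simple arithmetic. Assuming roughly 250-1250 tokens per example (an assumption; actual example lengths vary widely):

```python
rate_per_1k = 0.008  # $ per 1K trained tokens, as quoted above
examples = 1000

def cost(tokens_per_example):
    """Estimated training cost in dollars for the dataset."""
    return examples * tokens_per_example * rate_per_1k / 1000

print(f"${cost(250):.2f} - ${cost(1250):.2f}")
```

At 250-1250 tokens per example this works out to $2.00-$10.00, matching the typical range quoted above.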
What if my model doesn't improve after training?
Common reasons and solutions:
- Quality issues: Review your dataset. Reject low-quality examples. Aim for 4-5 star ratings.
- Not enough data: Try with 1000+ examples. Small datasets often don't show improvement.
- Wrong format: Make sure you're exporting in the right format for your training platform.
- Base model too large: Try fine-tuning a smaller model (7B instead of 70B).
- Training parameters: You may need more training epochs or different learning rates.
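If you suspect training parameters, a few values are worth varying first. A hedged starting-point sketch (these numbers are common defaults for small instruction-tuning runs, not EdukaAI recommendations; pass them to whatever trainer your platform uses):

```python
# Illustrative hyperparameters for a first fine-tuning run.
# All values are assumptions; tune per platform and model size.
training_config = {
    "num_train_epochs": 3,             # try 3-5 if one epoch shows no change
    "learning_rate": 2e-5,             # lower it if the loss diverges
    "per_device_train_batch_size": 4,  # raise if you have GPU memory to spare
    "warmup_ratio": 0.03,              # brief warmup stabilizes early steps
}
```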
Can I use copyrighted material in my dataset?
Be careful! Don't include copyrighted code, text, or content without permission. Write your own explanations and examples. If importing from AI assistants (Claude, ChatGPT), those conversations are generally fine to use since you created them. When in doubt, create original content.
How do I know if an example is good quality?
Ask yourself these questions:
- Does the output fully answer the instruction?
- Would this help a real user?
- Is it accurate and correct?
- Is the tone consistent with other examples?
- Does it teach something or just give an answer?
- Would I be proud if this was the only example someone saw?
If you answered "yes" to all, it's probably a 4-5 star example!
Glossary
Confused by a term? Our comprehensive glossary covers all AI, LLM, and fine-tuning terminology with detailed explanations.
Resources & Next Steps
Learning Resources
- HuggingFace Training Guide
Official guide for training transformers
- OpenAI Fine-Tuning Guide
Guide for fine-tuning GPT models
- Stanford Alpaca Dataset
Example of high-quality instruction dataset
- Transformers Library
Popular library for working with LLMs
Tools & Platforms
- HuggingFace Hub
Share and train models (we export here!)
- Weights & Biases
Track training runs and experiments
- Llama.cpp
Run models locally on consumer hardware
- LM Studio
Easy GUI for running local LLMs