Fine-Tune with llama.cpp
Run GGUF models locally with great performance on CPU. Perfect for machines without a GPU or when you want maximum efficiency.
📋 Prerequisites
1. Dataset Ready
Export your dataset from the EdukaAI application's Export page. llama.cpp works best when your training data is supplied as context.
Open the EdukaAI app and go to Export.
2. No GPU Required!
llama.cpp is optimized for CPU. Works great on any modern computer, even laptops.
💡 How This Works
llama.cpp runs quantized GGUF models efficiently on CPU. As with Ollama, you provide your training data as context in the system prompt. No actual training occurs: the model conditions on your examples at inference time (few-shot prompting).
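Concretely, "providing your data as context" just means concatenating your examples into one system prompt. A minimal sketch (the pairs and formatting below are illustrative, not the actual EdukaAI export format):

```python
def build_system_prompt(examples):
    """Join (question, answer) pairs into a single few-shot system prompt."""
    parts = ["Use these examples to guide your responses:"]
    for i, (q, a) in enumerate(examples, 1):
        parts.append(f"Example {i}:\nQuestion: {q}\nAnswer: {a}")
    return "\n\n".join(parts)

examples = [
    ("How do I reverse a string in Python?", "Use slicing: my_string[::-1]"),
    ("Explain photosynthesis", "Plants convert sunlight into energy..."),
]
print(build_system_prompt(examples))
```

The resulting string is what you pass to llama.cpp as the system prompt in the steps below.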
✅ Pros
- No GPU needed
- Very fast on CPU
- Low memory usage
- GGUF format widely supported
⚠️ Cons
- Not true fine-tuning
- Limited context size
- Command-line focused
🎯 Best For
- CPU-only machines
- Low-RAM systems
- Quick testing
- Edge deployment
1 Install llama.cpp
macOS (with Homebrew):
brew install llama.cpp

Build from source (all platforms):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Note: recent versions of llama.cpp build with CMake instead of make; if make fails, run cmake -B build && cmake --build build --config Release.

🔧 Pre-built Binaries
Download pre-built binaries from the project's GitHub Releases page.
2 Download a GGUF Model
Download a quantized GGUF model. These are optimized for CPU inference.
TinyLlama 1.1B ⭐
~600MB, very fast
curl -L -o tinyllama.gguf https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

Llama 3.2 1B
~800MB, good quality

curl -L -o llama3.2-1b.gguf https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf

Phi-3 Mini
~2GB, excellent quality

curl -L -o phi3-mini.gguf https://huggingface.co/bartowski/Phi-3-mini-4k-instruct-GGUF/resolve/main/Phi-3-mini-4k-instruct-Q4_K_M.gguf

💡 Understanding GGUF suffixes: Q4_K_M = 4-bit quantization. Fewer bits mean a smaller file at slightly lower quality; Q4_K_M is a good balance.
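The download sizes above follow directly from the quantization: file size ≈ parameter count × bits per weight ÷ 8. A rough estimate (the ~4.5 effective bits per weight for Q4_K_M is an assumption that folds in metadata overhead; exact figures vary by model):

```python
def gguf_size_estimate(n_params, bits_per_weight=4.5):
    """Rough GGUF file size in megabytes: params * bits / 8 bits-per-byte."""
    return n_params * bits_per_weight / 8 / 1e6

# 1.1B parameters at ~4.5 bits/weight lands near the ~600MB quoted above
print(f"TinyLlama 1.1B @ Q4_K_M: ~{gguf_size_estimate(1.1e9):.0f} MB")
```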
3 Run with Your Training Data
Load your training data as a system prompt so the model learns from your examples:
Basic usage:
llama-cli -m tinyllama.gguf \
  --system-prompt "Use these examples to guide your responses: [YOUR DATA]" \
  -p "How do I reverse a string in Python?"

✅ Easier Method: Load from File
Put your training examples in a text file and load it:
llama-cli -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  -p "How do I reverse a string in Python?"

📝 Create training-data.txt
Format your edukaAI data as readable examples:
Example 1:
Question: How do I reverse a string in Python?
Answer: You can use slicing: my_string[::-1]
Example 2:
Question: Explain photosynthesis
Answer: Plants convert sunlight into energy...
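If your EdukaAI export is JSON, a short script can generate this file for you. A sketch, assuming the export is a JSON array of objects with question and answer fields (adjust the field names to match your actual export):

```python
import json

def export_to_training_text(json_path, txt_path):
    """Convert a JSON array of {question, answer} records (an assumed
    export shape) into the readable example format shown above."""
    with open(json_path) as f:
        records = json.load(f)
    blocks = []
    for i, rec in enumerate(records, 1):
        blocks.append(
            f"Example {i}:\n"
            f"Question: {rec['question']}\n"
            f"Answer: {rec['answer']}"
        )
    with open(txt_path, "w") as f:
        f.write("\n\n".join(blocks) + "\n")

# export_to_training_text("dataset.json", "training-data.txt")
```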
[Add more examples from your dataset]

4 Interactive Chat Mode
Start an interactive session where you can chat with your model:
llama-cli -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  -cnv \
  --color

⚙️ Useful Options
- -cnv or --conversation: interactive mode
- --color: syntax highlighting
- -c 4096: set context size (default: 512)
- -n 256: max tokens to generate
- --temp 0.7: temperature (creativity)
5 Advanced: Server Mode
Run llama.cpp as an API server for use in applications:
Start the server:
llama-server -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  --port 8080

Query the API:

curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How do I reverse a string?"}'

✅ OpenAI-compatible: The server also provides an OpenAI API-compatible endpoint at /v1/chat/completions.
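From application code, that OpenAI-compatible endpoint can be called with nothing but the standard library. A sketch (assumes llama-server is running on localhost:8080 as started above; the response is parsed using the standard OpenAI chat-completions shape):

```python
import json
import urllib.request

def build_payload(prompt):
    """OpenAI chat-completions request body for a single user message."""
    return {"messages": [{"role": "user", "content": prompt}]}

def chat(prompt, base_url="http://localhost:8080"):
    """POST to a running llama-server's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("How do I reverse a string?")  # requires llama-server on port 8080
```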
💡 Optimization Tips
🚀 Speed Up Inference
Add -t 4 to use 4 CPU threads
💾 Keep the Model in RAM
Use --mlock to lock the model in RAM so it is never swapped to disk (faster, though it does not reduce memory use)
🎯 Better Context
Increase with -c 4096 for larger training data
📏 Smaller Models
Use 1B-3B parameter models for CPU efficiency
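To pick a -c value for the context tip above, you can estimate how many tokens your training file needs. A rough sketch using the common ~4 characters-per-token rule of thumb (actual tokenization varies by model, so treat the result as a starting point):

```python
def estimate_ctx(text, chars_per_token=4, headroom=512):
    """Very rough token estimate for a system-prompt file, plus headroom
    for the user's question and the generated answer."""
    return len(text) // chars_per_token + headroom

# stand-in for the contents of training-data.txt
data = "Example 1:\nQuestion: ...\nAnswer: ...\n" * 50
print(f"try: -c {estimate_ctx(data)}")
```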
⚖️ llama.cpp vs Other Methods
| Feature | llama.cpp | Ollama | Axolotl |
|---|---|---|---|
| GPU Required | No ❌ | No ❌ | Recommended ⚠️ |
| Training Time | Instant ⚡ | Instant ⚡ | Minutes/Hours ⏱️ |
| True Fine-Tuning | No ❌ | No ❌ | Yes ✅ |
| CPU Performance | Excellent ⭐ | Good ✅ | Slow ❌ |
| Ease of Use | CLI Only ⚠️ | Easy ✅ | Setup Required ⚠️ |