Fine-Tune with llama.cpp
Run GGUF models locally with great performance on CPU. Perfect for machines without a GPU or when you want maximum efficiency.
📋 Prerequisites
1. Dataset Ready
Export your dataset from the EdukaAI application's Export page. llama.cpp works best when your training data is supplied as context.
Open the EdukaAI app and go to Export.
2. No GPU Required!
llama.cpp is optimized for CPU. Works great on any modern computer, even laptops.
💡 How This Works
llama.cpp runs quantized GGUF models efficiently on CPU. As with Ollama, you provide your training data as context in the system prompt. No actual training occurs: the model conditions on your examples at inference time (few-shot prompting).
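Concretely, "providing your data as context" just means concatenating your examples into one system prompt. A minimal sketch (the pairs and formatting below are illustrative, not the actual EdukaAI export format):

```python
def build_system_prompt(examples):
    """Join (question, answer) pairs into a single few-shot system prompt."""
    parts = ["Use these examples to guide your responses:"]
    for i, (q, a) in enumerate(examples, 1):
        parts.append(f"Example {i}:\nQuestion: {q}\nAnswer: {a}")
    return "\n\n".join(parts)

examples = [
    ("How do I reverse a string in Python?", "Use slicing: my_string[::-1]"),
    ("Explain photosynthesis", "Plants convert sunlight into energy..."),
]
print(build_system_prompt(examples))
```

The resulting string is what you pass to llama.cpp as the system prompt in the steps below.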
✅ Pros
- No GPU needed
- Very fast on CPU
- Low memory usage
- GGUF format widely supported
⚠️ Cons
- Not true fine-tuning
- Limited context size
- Command-line focused
🎯 Best For
- CPU-only machines
- Low-RAM systems
- Quick testing
- Edge deployment
1 Install llama.cpp
macOS (with Homebrew):
brew install llama.cpp

Build from source (all platforms):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Note: recent versions of llama.cpp build with CMake instead of make; if make fails, run cmake -B build && cmake --build build --config Release.

🔧 Pre-built Binaries
Download pre-built binaries from the project's GitHub Releases page.
2 Download a GGUF Model
Download a quantized GGUF model. These are optimized for CPU inference.
TinyLlama 1.1B ⭐
~600MB, very fast
curl -L -o tinyllama.gguf https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

Llama 3.2 1B
~800MB, good quality

curl -L -o llama3.2-1b.gguf https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf

Phi-3 Mini
~2GB, excellent quality

curl -L -o phi3-mini.gguf https://huggingface.co/bartowski/Phi-3-mini-4k-instruct-GGUF/resolve/main/Phi-3-mini-4k-instruct-Q4_K_M.gguf

💡 Understanding GGUF suffixes: Q4_K_M = 4-bit quantization. Fewer bits mean a smaller file at slightly lower quality; Q4_K_M is a good balance.
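The download sizes above follow directly from the quantization: file size ≈ parameter count × bits per weight ÷ 8. A rough estimate (the ~4.5 effective bits per weight for Q4_K_M is an assumption that folds in metadata overhead; exact figures vary by model):

```python
def gguf_size_estimate(n_params, bits_per_weight=4.5):
    """Rough GGUF file size in megabytes: params * bits / 8 bits-per-byte."""
    return n_params * bits_per_weight / 8 / 1e6

# 1.1B parameters at ~4.5 bits/weight lands near the ~600MB quoted above
print(f"TinyLlama 1.1B @ Q4_K_M: ~{gguf_size_estimate(1.1e9):.0f} MB")
```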
3 Run with Your Training Data
Load your training data as a system prompt so the model learns from your examples:
Basic usage:
llama-cli -m tinyllama.gguf \
  --system-prompt "Use these examples to guide your responses: [YOUR DATA]" \
  -p "How do I reverse a string in Python?"

✅ Easier Method: Load from File
Put your training examples in a text file and load it:
llama-cli -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  -p "How do I reverse a string in Python?"

📝 Create training-data.txt
Format your edukaAI data as readable examples:
Example 1:
Question: How do I reverse a string in Python?
Answer: You can use slicing: my_string[::-1]
Example 2:
Question: Explain photosynthesis
Answer: Plants convert sunlight into energy...
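If your EdukaAI export is JSON, a short script can generate this file for you. A sketch, assuming the export is a JSON array of objects with question and answer fields (adjust the field names to match your actual export):

```python
import json

def export_to_training_text(json_path, txt_path):
    """Convert a JSON array of {question, answer} records (an assumed
    export shape) into the readable example format shown above."""
    with open(json_path) as f:
        records = json.load(f)
    blocks = []
    for i, rec in enumerate(records, 1):
        blocks.append(
            f"Example {i}:\n"
            f"Question: {rec['question']}\n"
            f"Answer: {rec['answer']}"
        )
    with open(txt_path, "w") as f:
        f.write("\n\n".join(blocks) + "\n")

# export_to_training_text("dataset.json", "training-data.txt")
```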
[Add more examples from your dataset]

4 Interactive Chat Mode
Start an interactive session where you can chat with your model:
llama-cli -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  -cnv \
  --color

⚙️ Useful Options
- -cnv or --conversation: interactive mode
- --color: syntax highlighting
- -c 4096: set context size (default: 512)
- -n 256: max tokens to generate
- --temp 0.7: temperature (creativity)
5 Advanced: Server Mode
Run llama.cpp as an API server for use in applications:
Start the server:
llama-server -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  --port 8080

Query the API:

curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How do I reverse a string?"}'

✅ OpenAI-compatible: The server also provides an OpenAI API-compatible endpoint at /v1/chat/completions.
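From application code, that OpenAI-compatible endpoint can be called with nothing but the standard library. A sketch (assumes llama-server is running on localhost:8080 as started above; the response is parsed using the standard OpenAI chat-completions shape):

```python
import json
import urllib.request

def build_payload(prompt):
    """OpenAI chat-completions request body for a single user message."""
    return {"messages": [{"role": "user", "content": prompt}]}

def chat(prompt, base_url="http://localhost:8080"):
    """POST to a running llama-server's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("How do I reverse a string?")  # requires llama-server on port 8080
```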
💡 Optimization Tips
🚀 Speed Up Inference
Add -t 4 to use 4 CPU threads
💾 Keep the Model in RAM
Use --mlock to lock the model in RAM so it is never swapped to disk (faster, though it does not reduce memory use)
🎯 Better Context
Increase with -c 4096 for larger training data
📏 Smaller Models
Use 1B-3B parameter models for CPU efficiency
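To pick a -c value for the context tip above, you can estimate how many tokens your training file needs. A rough sketch using the common ~4 characters-per-token rule of thumb (actual tokenization varies by model, so treat the result as a starting point):

```python
def estimate_ctx(text, chars_per_token=4, headroom=512):
    """Very rough token estimate for a system-prompt file, plus headroom
    for the user's question and the generated answer."""
    return len(text) // chars_per_token + headroom

# stand-in for the contents of training-data.txt
data = "Example 1:\nQuestion: ...\nAnswer: ...\n" * 50
print(f"try: -c {estimate_ctx(data)}")
```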
⚖️ llama.cpp vs Other Methods
| Feature | llama.cpp | Ollama | Axolotl |
|---|---|---|---|
| GPU Required | No ❌ | No ❌ | Recommended ⚠️ |
| Training Time | Instant ⚡ | Instant ⚡ | Minutes/Hours ⏱️ |
| True Fine-Tuning | No ❌ | No ❌ | Yes ✅ |
| CPU Performance | Excellent ⭐ | Good ✅ | Slow ❌ |
| Ease of Use | CLI Only ⚠️ | Easy ✅ | Setup Required ⚠️ |