EdukaAI

Fine-Tune with llama.cpp

Run GGUF models locally with great performance on CPU. Perfect for machines without a GPU or when you want maximum efficiency.

📋 Prerequisites

📦

1. Dataset Ready

Export your dataset from the EdukaAI application Export page. llama.cpp works best with your training data as context.

Open the EdukaAI app and go to Export.

💻

2. No GPU Required!

llama.cpp is optimized for CPU. Works great on any modern computer, even laptops.

💡 How This Works

llama.cpp runs quantized GGUF models efficiently on CPU. Similar to Ollama, you provide your training data as context in the system prompt. No actual training occurs - the model learns patterns from your examples on the fly!
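The "training data as context" pattern above can be sketched in a few lines of Python. This is a hedged illustration: the example data and field names are made up for demonstration, not EdukaAI's actual export schema.

```python
# Build a system prompt from Q&A examples (in-context learning:
# the model only sees these at inference time, no weights change).

examples = [  # hypothetical examples; substitute your exported data
    {"question": "How do I reverse a string in Python?",
     "answer": "Use slicing: my_string[::-1]"},
    {"question": "Explain photosynthesis",
     "answer": "Plants convert sunlight into chemical energy..."},
]

def build_system_prompt(examples):
    """Join examples into one block of text for --system-prompt."""
    parts = ["Use these examples to guide your responses:"]
    for i, ex in enumerate(examples, 1):
        parts.append(
            f"Example {i}:\nQuestion: {ex['question']}\nAnswer: {ex['answer']}"
        )
    return "\n\n".join(parts)

print(build_system_prompt(examples))
```

The resulting string is exactly what you pass to `--system-prompt` (or save to a file for `--system-prompt-file`) in the steps below.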

✅ Pros

  • No GPU needed
  • Very fast on CPU
  • Low memory usage
  • GGUF format widely supported

⚠️ Cons

  • Not true fine-tuning
  • Limited context size
  • Command-line focused

🎯 Best For

  • CPU-only machines
  • Low RAM systems
  • Quick testing
  • Edge deployment

1 Install llama.cpp

macOS (with Homebrew):

brew install llama.cpp

Build from source (all platforms):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

🔧 Pre-built Binaries

Download pre-built releases from GitHub Releases

2 Download a GGUF Model

Download a quantized GGUF model. These are optimized for CPU inference.

TinyLlama 1.1B ⭐

~600MB, very fast

curl -L -o tinyllama.gguf https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

Llama 3.2 1B

~800MB, good quality

curl -L -o llama3.2-1b.gguf https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf

Phi-3 Mini

~2GB, excellent quality

curl -L -o phi3-mini.gguf https://huggingface.co/bartowski/Phi-3-mini-4k-instruct-GGUF/resolve/main/Phi-3-mini-4k-instruct-Q4_K_M.gguf

💡 Understanding GGUF suffixes: Q4_K_M means 4-bit quantization (K_M is the mixing scheme). Fewer bits mean a smaller file with slightly lower quality; Q4_K_M is a good balance of size and quality.
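As a rough sanity check on download sizes, you can estimate a GGUF file from the parameter count and bits per weight. Q4_K_M averages roughly 4.5 bits per weight (the exact figure varies by model, so treat this as a ballpark only):

```python
def approx_gguf_size_gb(params_billion, bits_per_weight=4.5):
    """Rough GGUF file size estimate: parameters * bits per weight.

    bits_per_weight ~= 4.5 is an approximation for Q4_K_M; some
    tensors are kept at higher precision, so real files vary.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# TinyLlama 1.1B -> ~0.62 GB, matching the ~600MB download above.
print(round(approx_gguf_size_gb(1.1), 2))
# Phi-3 Mini 3.8B -> ~2.14 GB, matching the ~2GB figure above.
print(round(approx_gguf_size_gb(3.8), 2))
```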

3 Run with Your Training Data

Load your training data as a system prompt so the model learns from your examples:

Basic usage:

llama-cli -m tinyllama.gguf \
  --system-prompt "Use these examples to guide your responses: [YOUR DATA]" \
  -p "How do I reverse a string in Python?"

✅ Easier Method: Load from File

Put your training examples in a text file and load it:

llama-cli -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  -p "How do I reverse a string in Python?"

📝 Create training-data.txt

Format your EdukaAI data as readable examples:

Example 1:
Question: How do I reverse a string in Python?
Answer: You can use slicing: my_string[::-1]

Example 2:
Question: Explain photosynthesis
Answer: Plants convert sunlight into energy...

[Add more examples from your dataset]
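If your export is machine-readable, you can generate training-data.txt instead of writing it by hand. The sketch below assumes a JSONL export with "question" and "answer" keys per line; adjust the key names to match your actual EdukaAI export format.

```python
import json

def format_examples(jsonl_text):
    """Convert one-JSON-object-per-line export into the readable
    Example/Question/Answer layout shown above.

    Assumes each line has "question" and "answer" keys; change
    these to match your real export schema.
    """
    blocks = []
    for i, line in enumerate(jsonl_text.strip().splitlines(), 1):
        item = json.loads(line)
        blocks.append(
            f"Example {i}:\nQuestion: {item['question']}\nAnswer: {item['answer']}"
        )
    return "\n\n".join(blocks)

# Hypothetical single-line export for demonstration:
sample = '{"question": "How do I reverse a string in Python?", "answer": "Use slicing: my_string[::-1]"}'

with open("training-data.txt", "w") as f:
    f.write(format_examples(sample))
```

The resulting file is ready to pass to `--system-prompt-file training-data.txt`.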

4 Interactive Chat Mode

Start an interactive session where you can chat with your model:

llama-cli -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  -cnv \
  --color

⚙️ Useful Options

  • -cnv or --conversation - Interactive mode
  • --color - Syntax highlighting
  • -c 4096 - Set context size in tokens (the default is small; increase it so your training data fits)
  • -n 256 - Max tokens to generate
  • --temp 0.7 - Temperature (creativity)

5 Advanced: Server Mode

Run llama.cpp as an API server for use in applications:

Start the server:

llama-server -m tinyllama.gguf \
  --system-prompt-file training-data.txt \
  --port 8080

Query the API:

curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How do I reverse a string?"}'

✅ OpenAI-compatible: The server provides an OpenAI API-compatible endpoint at /v1/chat/completions
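Because the endpoint is OpenAI-compatible, you can call it from any language without extra dependencies. A minimal Python sketch using only the standard library, assuming the server is running on port 8080 as started above:

```python
import json
import urllib.request

def build_chat_request(prompt, url="http://localhost:8080/v1/chat/completions"):
    """Build an OpenAI-style chat completion request for llama-server."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("How do I reverse a string?")

try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.loads(resp.read())
        # Standard OpenAI response shape: choices[0].message.content
        print(body["choices"][0]["message"]["content"])
except OSError:
    print("Could not reach llama-server on port 8080 - is it running?")
```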

💡 Optimization Tips

🚀 Speed Up Inference

Add -t N (e.g. -t 4) to use N CPU threads; matching your physical core count usually works best

💾 Manage Memory

Use --mlock to keep the model locked in RAM (prevents swapping, so it is faster but holds the full model resident); to actually reduce memory use, pick a smaller quantization

🎯 Better Context

Increase with -c 4096 for larger training data

📏 Smaller Models

Use 1B-3B parameter models for CPU efficiency

⚖️ llama.cpp vs Other Methods

Feature            llama.cpp      Ollama        Axolotl
GPU Required       No ❌          No ❌         Recommended ⚠️
Training Time      Instant ⚡     Instant ⚡    Minutes/Hours ⏱️
True Fine-Tuning   No ❌          No ❌         Yes ✅
CPU Performance    Excellent ⭐   Good ✅       Slow ❌
Ease of Use        CLI Only ⚠️    Easy ✅       Setup Required ⚠️