Fine-Tune with MLX
Apple's machine learning framework with real LoRA training. Best performance on Apple Silicon Macs.
🍎 Mac Only
MLX is specifically designed for Apple Silicon (M1, M2, M3, M4 chips). It won't work on Intel Macs or other platforms.
Requirements:
- macOS 13.5 or later
- Apple Silicon Mac (M1/M2/M3/M4)
- Python 3.9 or later
- ~2GB free disk space for models
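A quick way to confirm your machine meets these requirements is a short check using only the Python standard library. This is a sketch: it verifies the Python version and Apple Silicon (arm64), but not the exact macOS version or free disk space.

```python
import platform
import sys

def check_mlx_prereqs():
    """Return a list of human-readable problems; an empty list means ready for MLX."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} is too old (need 3.9+)"
        )
    if platform.system() != "Darwin":
        problems.append(f"OS is {platform.system()}, but MLX needs macOS")
    elif platform.machine() != "arm64":
        problems.append("This Mac reports an Intel CPU; MLX needs Apple Silicon (arm64)")
    return problems

if __name__ == "__main__":
    issues = check_mlx_prereqs()
    print("Ready for MLX!" if not issues else "\n".join(issues))
```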
📋 Prerequisites
1. Export in MLX Format (Ready to Use!)
Select "MLX (Apple)" format in the Export page. The file is exported in the correct chat format for MLX-LM.
✅ EdukaAI now exports in MLX-LM chat format:
{"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}
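You can sanity-check the exported file before training with a few lines of Python. This is a generic validation sketch, not part of mlx-lm; the sample line mimics the chat format shown above.

```python
import json

def is_mlx_chat_record(line: str) -> bool:
    """Check that one JSONL line matches the {"messages": [...]} chat format MLX-LM expects."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    return all(
        isinstance(m, dict)
        and m.get("role") in {"user", "assistant", "system"}
        and isinstance(m.get("content"), str)
        for m in messages
    )

# Example line in the exported format (content is illustrative)
sample = ('{"messages": [{"role": "user", "content": "Who is Zorblax?"}, '
          '{"role": "assistant", "content": "Zorblax is..."}]}')
print(is_mlx_chat_record(sample))  # True
```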
Open the EdukaAI app, go to Export, and select MLX (Apple) format.
2. Apple Silicon Mac
MLX requires an M1, M2, M3, or M4 Mac. Not compatible with Intel Macs or other operating systems.
1 Setup Virtual Environment
Create an isolated Python environment for MLX. This keeps dependencies organized and prevents conflicts. You can use either venv (built into Python) or Conda (popular with ML professionals).
Option 1: venv (Built-in)
# Create virtual environment
python -m venv mlx-env
# Activate it
source mlx-env/bin/activate
# Install MLX
pip install mlx mlx-lm

✅ No additional installation needed. Works on any Mac with Python.
Option 2: Conda (Recommended for ML)
# Create conda environment
conda create -n mlx python=3.11
# Activate it
conda activate mlx
# Install MLX
pip install mlx mlx-lm

✅ Preferred by ML professionals. Better package management for data science workflows.
💡 Tip: Run source mlx-env/bin/activate every time you open a new terminal window.
2 Choose Your First Model
For your first fine-tuning, pick a small 4-bit model. These download quickly and train fast - perfect for learning!
⭐ Llama 3.2 1B
~800MB, 5-10 min training
mlx-community/Llama-3.2-1B-Instruct-4bit ✅ Best for beginners
Qwen 2.5 0.5B
~400MB, 3-5 min training
mlx-community/Qwen2.5-0.5B-Instruct-4bit ⚡ Fastest option
Phi-3 Mini
~1.5GB, 10-15 min training
mlx-community/Phi-3-mini-4k-instruct-4bit 🎯 Better quality
📥 How to download: MLX downloads models automatically on first use. Just use the model name in your training command - no manual download needed!
Install MLX and required dependencies:
pip install mlx mlx-lm datasets

3 Fine-Tune with LoRA (Python API)
💡 Recommendation: Use the Python API instead of CLI commands. It's more reliable and handles data formatting correctly.
The CLI has known issues with local file paths. The Python API provides better error messages and more control.
Train your model on your EdukaAI dataset. This creates a fine-tuned version that learns from your examples.
# Complete training script (save as train.py)
from mlx_lm import load
from mlx_lm.tuner import linear_to_lora_layers, TrainingArgs, train
from mlx_lm.tuner.datasets import load_dataset, CacheDataset
import mlx.optimizers as optim
from pathlib import Path

# Load model
model, tokenizer = load('mlx-community/Llama-3.2-1B-Instruct-4bit')

# Freeze base model and apply LoRA
model.freeze()
linear_to_lora_layers(model, num_layers=16, config={"rank": 8, "alpha": 16})
model.train()

# Load dataset (expects data/train.jsonl)
class Args:
    data = './data'
    train = True
    test = False

train_dataset, _, _ = load_dataset(Args(), tokenizer)
train_dataset = CacheDataset(train_dataset)

# Train
optimizer = optim.Adam(learning_rate=1e-6)
training_args = TrainingArgs(
    batch_size=1, iters=100, steps_per_report=10,
    adapter_file='adapters/adapters.safetensors'
)
Path('adapters').mkdir(exist_ok=True)
train(model=model, optimizer=optimizer,
      train_dataset=train_dataset, args=training_args)

📁 Critical: Directory Structure
MLX-LM expects a directory with specific files:
your_project/
├── data/
│   ├── train.jsonl   # Required
│   ├── valid.jsonl   # Optional
│   └── test.jsonl    # Optional
└── train.py
Chat Format Required:
{"messages": [
{"role": "user", "content": "Who is Zorblax?"},
{"role": "assistant", "content": "Zorblax is a quantum gastronomer..."}
]}

✅ What You Get
- adapters/adapters.safetensors - LoRA weights
- Training completes in ~16 seconds (100 iterations)
- Loss decreases from ~3.8 to ~3.3
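If you ever need to assemble data/train.jsonl by hand (for example, to merge several exports), a short script can produce the directory layout and chat format required above. The question/answer pairs here are placeholders for your own data.

```python
import json
from pathlib import Path

# Placeholder question/answer pairs standing in for your EdukaAI export
pairs = [
    ("Who is Zorblax?", "Zorblax is a quantum gastronomer..."),
    ("What does Zorblax cook?", "Dishes made from entangled particles..."),
]

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

# One JSON object per line, in the {"messages": [...]} chat format
with open(data_dir / "train.jsonl", "w") as f:
    for question, answer in pairs:
        record = {"messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        f.write(json.dumps(record) + "\n")

print(f"Wrote {len(pairs)} examples to {data_dir / 'train.jsonl'}")
```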
🔍 Troubleshooting
- "Training set not found" - Ensure data/train.jsonl exists with correct chat format
- Loss is NaN - Make sure you call model.freeze() before linear_to_lora_layers()
- Missing adapter_config.json - Create it manually (see documentation)
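For the missing adapter_config.json case, the file can be written with a few lines of Python. Treat the field names below as an assumption: the exact schema varies between mlx-lm versions, so check your installed version's documentation. The values mirror the rank/alpha/num_layers settings used in the training script above.

```python
import json
from pathlib import Path

# Assumed schema -- verify field names against your mlx-lm version
adapter_config = {
    "fine_tune_type": "lora",
    "num_layers": 16,
    "lora_parameters": {"rank": 8, "scale": 16.0, "dropout": 0.0},
}

# Write the config next to adapters.safetensors
Path("adapters").mkdir(exist_ok=True)
with open("adapters/adapter_config.json", "w") as f:
    json.dump(adapter_config, f, indent=2)
```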
📦 Complete Working Example
Download our complete example with the fictional characters dataset:
# Download and run complete example
git clone https://github.com/yourusername/mlx-fictional-characters.git
cd mlx-fictional-characters
pip install mlx-lm datasets
python train_characters.py
🎯 What's Next?
After training completes, you'll have adapters (not a complete model file). Learn what this means and how to use your model:
- Use adapters directly in Python
- Fuse into a standalone model
- Convert to GGUF for Ollama & other tools
4 Test Your Fine-Tuned Model
This is the exciting part! Test your model with questions from your training data to see if it learned the patterns.
# Load and test with adapters
from mlx_lm import load, generate
# Load base model with adapters
model, tokenizer = load(
'mlx-community/Llama-3.2-1B-Instruct-4bit',
adapter_path='./adapters'
)
# Test with a question
response = generate(
model,
tokenizer,
'Who is Zorblax?',
max_tokens=100
)
print(response)

# Interactive chat mode
from mlx_lm import load, generate
model, tokenizer = load(
'mlx-community/Llama-3.2-1B-Instruct-4bit',
adapter_path='./adapters'
)
print("Chat with your fine-tuned model! (type 'quit' to exit)")
while True:
    prompt = input("\nYou: ")
    if prompt.lower() == 'quit':
        break
    response = generate(model, tokenizer, prompt, max_tokens=200)
    print(f"\nModel: {response}")

# Compare base vs fine-tuned
from mlx_lm import load, generate
# Load base model
base_model, tokenizer = load('mlx-community/Llama-3.2-1B-Instruct-4bit')
# Load with adapters
ft_model, _ = load(
'mlx-community/Llama-3.2-1B-Instruct-4bit',
adapter_path='./adapters'
)
prompt = "Who is Zorblax?"
print("=== BASE MODEL ===")
print(generate(base_model, tokenizer, prompt, max_tokens=100))
print("\n=== WITH ADAPTERS ===")
print(generate(ft_model, tokenizer, prompt, max_tokens=100))

🧪 Good Test Questions
Try questions similar to your training examples:
- Questions about code patterns you trained on
- Similar phrasing to your training examples
- Topics related to your dataset's focus
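One lightweight way to judge whether the model picked up your dataset is to score its answers against keywords you expect. This is a generic sketch, not part of mlx-lm; the response string and keywords below are illustrative.

```python
def keyword_score(response: str, expected_keywords: list) -> float:
    """Fraction of expected keywords that appear in the response (case-insensitive)."""
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)

# Hypothetical model response and keywords from a fictional-characters dataset
response = "Zorblax is a quantum gastronomer who cooks with entangled particles."
score = keyword_score(response, ["quantum", "gastronomer", "particles"])
print(f"{score:.0%}")  # 100%
```

Run the same prompts against the base model and the adapted model; the fine-tuned one should score noticeably higher on dataset-specific keywords.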
5 What To Do With Your Model
🔄 Keep Iterating
Not satisfied? Add more samples to EdukaAI, export again, and retrain!
# Just run training again with new data
python train.py

🐍 Use in Python Apps
Import your model with adapters into any Python project:
from mlx_lm import load, generate

model, tokenizer = load(
    'mlx-community/Llama-3.2-1B-Instruct-4bit',
    adapter_path='./adapters'
)
# Use in your app!

🤗 Share on HuggingFace
Upload your adapters so others can use them:
# Fuse first to create complete model
mlx_lm.fuse \
  --model mlx-community/Llama-3.2-1B-Instruct-4bit \
  --adapter-path adapters/
# Upload the lora_fused_model folder

💾 Backup Your Adapters
The adapters/ folder contains your LoRA weights. Copy it anywhere:
cp -r adapters ~/Documents/my_adapters_backup

🎯 Next Steps
- ✓ Test thoroughly with various prompts
- ✓ Share results in communities for feedback
- ✓ Add more training data if needed
- ✓ Try training with different models
- ✓ Experiment with training parameters
📋 Complete Workflow Cheat Sheet
# 1. Setup environment
python -m venv mlx-env
source mlx-env/bin/activate
pip install mlx-lm datasets
# 2. Export from EdukaAI
# Go to Export page → Select "MLX (Apple)" format
# Download the .jsonl file to data/train.jsonl
# 3. Create training script (train.py)
# See the Python API example above
# 4. Run training
python train.py
# 5. Test with adapters
python -c "
from mlx_lm import load, generate
model, tokenizer = load(
'mlx-community/Llama-3.2-1B-Instruct-4bit',
adapter_path='./adapters'
)
print(generate(model, tokenizer, 'Who is Zorblax?', max_tokens=100))
"
# Done! 🎉

💡 Pro Tips for MLX
🧠 Unified Memory
Macs share RAM between CPU/GPU. You can often use larger models than expected!
⚡ Fast Training
100 iterations on Llama 3.2 1B takes only ~16 seconds on M1/M2!
🌱 Start Small
Begin with 50-100 training examples. Add more if needed!
💾 Use Python API
More reliable than CLI. Better error messages and control.
🔧 Common Issues
"ModuleNotFoundError: No module named 'datasets'"
Install the missing dependency: pip install datasets
"ModuleNotFoundError: No module named 'mlx'"
Make sure your virtual environment is activated: source mlx-env/bin/activate
Out of Memory Error
Try a smaller model (0.5B or 1B) or keep the batch size at 1: batch_size=1 in TrainingArgs (or --batch-size 1 on the CLI)
Training is slow
Reduce iterations for testing: iters=50 in TrainingArgs (or --iters 50 on the CLI) instead of 100