Fine-Tune with Axolotl
YAML-based configuration for easy LoRA training. Great for cloud GPUs and cross-platform workflows.
⚠️ Mac Users: Important Information
Axolotl runs on Mac but has significant limitations:
- Slower training: 10-30 minutes vs 16 seconds with MLX for the same task
- No quantization: cannot use 4-bit/8-bit models (uses 2-3x more RAM)
- Missing optimizations: no Flash Attention, bitsandbytes, or QLoRA
- Python version: requires Python 3.9-3.11 (not 3.12+)
💡 Recommendation for Mac Users
For Mac (Apple Silicon), we recommend using MLX instead:
- ✅ Native Apple Silicon optimization
- ✅ 16-second training vs 10-30 minutes
- ✅ All features work (quantization, Flash Attention)
- ✅ Same results, much faster
🎯 When to Use Axolotl on Mac
Only if you specifically need:
- A YAML-based configuration workflow
- Cross-platform compatibility (train on Mac, use on Linux)
- Integration with HuggingFace ecosystem tools

and you don't mind slower training in exchange.
📋 Prerequisites
1. Dataset Ready
Export your dataset in Alpaca format from the EdukaAI application Export page.
Open the EdukaAI app, go to Export, and select Alpaca format.
2. Python Environment
Python 3.9, 3.10, or 3.11 required (3.12+ not supported)
```shell
# Create virtual environment
python3.11 -m venv axolotl-env

# Activate it
source axolotl-env/bin/activate

# Verify Python version
python --version  # Should show 3.9, 3.10, or 3.11
```
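If you prefer to gate this from Python itself, a small check (a sketch; the function name is just for illustration) mirrors the same 3.9-3.11 rule:

```python
import sys

def axolotl_compatible(version_info=sys.version_info):
    """Return True if the Python version is in Axolotl's supported 3.9-3.11 range."""
    return (3, 9) <= tuple(version_info[:2]) <= (3, 11)

print(axolotl_compatible())
```

Running this inside your activated `axolotl-env` should print `True` before you continue.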
3. System Requirements
Mac: 8GB+ RAM (16GB+ recommended), 10-30 min training time
Cloud GPU: See cloud section below for instant fast training
1 Install Axolotl
Install Axolotl (Mac-compatible version):
```shell
pip install axolotl

# Verify installation
axolotl --version
```
⚠️ Installation Issues on Mac
If you get build errors, it's likely due to:
- Python 3.12 or 3.13 (not supported)
- Missing Xcode Command Line Tools
- Incompatible dependencies

Solution: use Python 3.11 and install the Xcode Command Line Tools: xcode-select --install
💡 Installation Troubleshooting
```shell
# If pip install fails, try:
pip install --upgrade pip setuptools wheel
pip install axolotl --no-build-isolation

# If still failing, you may need a cloud GPU
# See cloud section below for instant setup
```
2 Prepare Your Dataset
Export your fictional characters dataset in Alpaca format and set up the file structure.
File Structure
```
axolotl-project/
├── config.yaml        # Training configuration
├── data/
│   └── train.json     # Your exported dataset
└── model-output/      # Created after training
```
Step 2a: Export from EdukaAI
- Go to the Export page in EdukaAI
- Select Alpaca format (NOT MLX)
- Choose the fictional characters dataset
- Download the JSON file
- Save it as data/train.json
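Before moving on, you can sanity-check the exported file with a short script (a sketch; it assumes the export is a JSON array of Alpaca records with instruction/input/output keys):

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}

def validate_alpaca(records):
    """Return a list of (index, problem) tuples for malformed Alpaca records."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not an object"))
        elif missing := REQUIRED_KEYS - rec.keys():
            problems.append((i, f"missing keys: {sorted(missing)}"))
        elif not rec["output"].strip():
            problems.append((i, "empty output"))
    return problems

# Usage (assumes data/train.json exists):
# with open("data/train.json") as f:
#     print(validate_alpaca(json.load(f)) or "all records look OK")
```

An empty list means every record has the three Alpaca fields and a non-empty output, so Axolotl's `type: alpaca` loader should accept it.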
Step 2b: Verify Data Format
```shell
# Check your data looks like this:
cat data/train.json | head -20

# Should show Alpaca format:
# {
#   "instruction": "Who is Zorblax?",
#   "input": "",
#   "output": "Zorblax is a quantum gastronomer..."
# }
```

3 Create Configuration File
Create config.yaml in your project folder. This is Mac-optimized (no 4-bit quantization).
```yaml
# config.yaml - Mac-optimized for Llama 3.2 1B
base_model: meta-llama/Llama-3.2-1B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer  # Llama 3.x does not use the SentencePiece LlamaTokenizer

# Mac limitation: cannot use 4-bit quantization.
# The full-precision model will use ~2-3x more RAM.
# On a Mac with 8GB RAM, this may not fit!

# LoRA configuration
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

# Dataset
datasets:
  - path: ./data/train.json
    type: alpaca

# Training settings
num_epochs: 1        # 1 epoch for proof-of-concept
micro_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 0.00002
warmup_steps: 10
max_steps: 100       # Limit steps for a quick test

# Mac limitation: no Flash Attention.
# Training will be slower than MLX.

# Output
logging_steps: 10
output_dir: ./model-output
save_steps: 100
save_total_limit: 1
```

⚠️ Mac-Specific Limitations in Config
- No load_in_4bit: cannot use 4-bit quantization on Mac
- No flash_attention: slower training without it
- Higher memory: full precision uses more RAM
- Slow training: expect 10-30 minutes for 100 steps
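The RAM impact is easy to estimate from the parameter count: weights alone take parameters x bytes-per-parameter. A back-of-the-envelope sketch (it assumes roughly 1.2B parameters for Llama 3.2 1B, and ignores optimizer state and activations, which add more on top):

```python
def weight_gb(n_params, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

N = 1.2e9  # ~1.2B parameters (Llama 3.2 1B, approximate)

print(f"fp32 : {weight_gb(N, 4):.1f} GiB")    # full precision
print(f"fp16 : {weight_gb(N, 2):.1f} GiB")    # half precision
print(f"4-bit: {weight_gb(N, 0.5):.1f} GiB")  # not available on Mac
```

This is why the warning above matters: without 4-bit loading, the weights alone are several times larger, and an 8GB Mac can run out of memory once gradients and optimizer state are added.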
💡 Alternative: Use Smaller Model
If you run out of memory on Mac, use TinyLlama 1.1B instead:
```yaml
# Replace in config.yaml:
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
# Much smaller, will fit in 8GB RAM
```
4 Start Training
⏱️ Expected Training Time on Mac
Llama 3.2 1B: 10-20 minutes for 100 steps
TinyLlama 1.1B: 5-10 minutes for 100 steps
Compare to MLX: 16 seconds for same task
Run training:
```shell
# Make sure you're in the project folder
# and the virtual environment is activated
axolotl train config.yaml
```
✅ What Happens During Training
- Axolotl downloads the Llama 3.2 1B model (~2-3GB)
- Loads your dataset and formats it
- Applies LoRA adapters to the model
- Trains for 100 steps (shows progress)
- Saves the fine-tuned model to ./model-output
```
# Expected output:
{'loss': 2.1234, 'learning_rate': 1.8e-05, 'epoch': 0.1}
{'loss': 1.9876, 'learning_rate': 1.5e-05, 'epoch': 0.2}
...
Training completed. Model saved to ./model-output
```

5 Test Your Model
Interactive testing:
axolotl inference config.yaml --model ./model-output
Test in Python:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./model-output")
tokenizer = AutoTokenizer.from_pretrained("./model-output")

# Test with a fictional character question
prompt = "Who is Zorblax?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

☁️ Cloud GPU: Faster Training
Avoid the slow Mac training! Use cloud GPU for 5-10x faster training with full Axolotl features.
```shell
# Quick cloud setup (RunPod example):
# 1. Create a RunPod account
# 2. Deploy a PyTorch GPU pod (RTX 3090 or A5000)
# 3. SSH into the pod or use Jupyter
# 4. Install Axolotl:
pip install axolotl[flash-attn,deepspeed]

# 5. Upload your config.yaml and data/
# 6. Run training:
axolotl train config.yaml

# Training will be 5-10x faster than on Mac!
```

💡 Why Use Cloud?
- Full Axolotl features: 4-bit quantization, Flash Attention, QLoRA
- 5-10x faster: minutes instead of hours
- Train larger models: 7B, 13B, even 70B models
- Cost-effective: $0.20-0.50/hour vs buying a GPU
📦 What You Get
Complete Model (Not Just Adapters!)
Unlike MLX which creates adapters, Axolotl creates a complete fine-tuned model:
```
model-output/
├── config.json
├── model.safetensors        # Complete model weights
├── tokenizer.json
├── tokenizer_config.json
└── special_tokens_map.json
```
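A quick way to confirm training produced everything is to diff the directory listing against the layout above (a sketch; the helper takes a plain list of filenames so the check is easy to reuse):

```python
import os

EXPECTED = {
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
}

def missing_outputs(present):
    """Return the expected output files that are not in `present`, sorted."""
    return sorted(EXPECTED - set(present))

# Usage against a real run:
# print(missing_outputs(os.listdir("model-output")) or "all files present")
```

An empty result means the folder has everything the Transformers loaders in the previous step need.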
✅ Ready to Use
This is a complete, standalone model that can be:
- Loaded directly with Transformers
- Converted to GGUF for Ollama
- Uploaded to HuggingFace
- Used in production applications
🎯 Next Steps
Learn How to Use Your Model →

🔧 Mac-Specific Issues
"Out of Memory" Error
Use TinyLlama 1.1B instead of Llama 3.2 1B, or upgrade to 16GB+ RAM
Training is Very Slow (30+ minutes)
This is expected on Mac. Use cloud GPU for faster training (see cloud section).
Installation Fails (cffi/zstandard errors)
Check Python version (need 3.9-3.11) and install Xcode tools: xcode-select --install
"CUDA not available" Warning
Normal on Mac. Training uses CPU/MPS instead of GPU.