Fine-Tune with Axolotl
YAML-based configuration for easy LoRA training. Great for cloud GPUs and cross-platform workflows.
⚠️ Mac Users: Important Information
Axolotl runs on Mac but has significant limitations:
- Slower training: 10-30 minutes vs 16 seconds with MLX for the same task
- No quantization: cannot use 4-bit/8-bit models (uses 2-3x more RAM)
- Missing optimizations: no Flash Attention, bitsandbytes, or QLoRA
- Python version: requires Python 3.9-3.11 (not 3.12+)
💡 Recommendation for Mac Users
For Mac (Apple Silicon), we recommend using MLX instead:
- ✅ Native Apple Silicon optimization
- ✅ 16-second training vs 10-30 minutes
- ✅ All features work (quantization, Flash Attention)
- ✅ Same results, much faster
🎯 When to Use Axolotl on Mac
Only if you specifically need:
- A YAML-based configuration workflow
- Cross-platform compatibility (train on Mac, use on Linux)
- Integration with HuggingFace ecosystem tools

and you don't mind slower training in exchange.
📋 Prerequisites
1. Dataset Ready
Export your dataset in Alpaca format from the EdukaAI application Export page.
Open the EdukaAI app, go to Export, and select Alpaca format.
2. Python Environment
Python 3.9, 3.10, or 3.11 required (3.12+ not supported)
```shell
# Create virtual environment
python3.11 -m venv axolotl-env

# Activate it
source axolotl-env/bin/activate

# Verify Python version
python --version  # Should show 3.9, 3.10, or 3.11
```
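If you prefer to gate this from Python itself, a small check (a sketch; the function name is just for illustration) mirrors the same 3.9-3.11 rule:

```python
import sys

def axolotl_compatible(version_info=sys.version_info):
    """Return True if the Python version is in Axolotl's supported 3.9-3.11 range."""
    return (3, 9) <= tuple(version_info[:2]) <= (3, 11)

print(axolotl_compatible())
```

Running this inside your activated `axolotl-env` should print `True` before you continue.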
3. System Requirements
Mac: 8GB+ RAM (16GB+ recommended), 10-30 min training time
Cloud GPU: See cloud section below for instant fast training
1 Install Axolotl
Install Axolotl (Mac-compatible version):
```shell
pip install axolotl

# Verify installation
axolotl --version
```
⚠️ Installation Issues on Mac
If you get build errors, it's likely due to:
- Python 3.12 or 3.13 (not supported)
- Missing Xcode Command Line Tools
- Incompatible dependencies

Solution: use Python 3.11 and install the Xcode Command Line Tools: xcode-select --install
💡 Installation Troubleshooting
```shell
# If pip install fails, try:
pip install --upgrade pip setuptools wheel
pip install axolotl --no-build-isolation

# If still failing, you may need a cloud GPU
# See cloud section below for instant setup
```
2 Prepare Your Dataset
Export your fictional characters dataset in Alpaca format and set up the file structure.
File Structure
```
axolotl-project/
├── config.yaml        # Training configuration
├── data/
│   └── train.json     # Your exported dataset
└── model-output/      # Created after training
```
Step 2a: Export from EdukaAI
- Go to the Export page in EdukaAI
- Select Alpaca format (NOT MLX)
- Choose the fictional characters dataset
- Download the JSON file
- Save it as data/train.json
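Before moving on, you can sanity-check the exported file with a short script (a sketch; it assumes the export is a JSON array of Alpaca records with instruction/input/output keys):

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}

def validate_alpaca(records):
    """Return a list of (index, problem) tuples for malformed Alpaca records."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not an object"))
        elif missing := REQUIRED_KEYS - rec.keys():
            problems.append((i, f"missing keys: {sorted(missing)}"))
        elif not rec["output"].strip():
            problems.append((i, "empty output"))
    return problems

# Usage (assumes data/train.json exists):
# with open("data/train.json") as f:
#     print(validate_alpaca(json.load(f)) or "all records look OK")
```

An empty list means every record has the three Alpaca fields and a non-empty output, so Axolotl's `type: alpaca` loader should accept it.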
Step 2b: Verify Data Format
```shell
# Check your data looks like this:
cat data/train.json | head -20

# Should show Alpaca format:
# {
#   "instruction": "Who is Zorblax?",
#   "input": "",
#   "output": "Zorblax is a quantum gastronomer..."
# }
```

3 Create Configuration File
Create config.yaml in your project folder. This is Mac-optimized (no 4-bit quantization).
```yaml
# config.yaml - Mac-optimized for Llama 3.2 1B
base_model: meta-llama/Llama-3.2-1B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer  # Llama 3.x does not use the SentencePiece LlamaTokenizer

# Mac limitation: cannot use 4-bit quantization.
# The full-precision model will use ~2-3x more RAM.
# On a Mac with 8GB RAM, this may not fit!

# LoRA configuration
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

# Dataset
datasets:
  - path: ./data/train.json
    type: alpaca

# Training settings
num_epochs: 1        # 1 epoch for proof-of-concept
micro_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 0.00002
warmup_steps: 10
max_steps: 100       # Limit steps for a quick test

# Mac limitation: no Flash Attention.
# Training will be slower than MLX.

# Output
logging_steps: 10
output_dir: ./model-output
save_steps: 100
save_total_limit: 1
```

⚠️ Mac-Specific Limitations in Config
- No load_in_4bit: cannot use 4-bit quantization on Mac
- No flash_attention: slower training without it
- Higher memory: full precision uses more RAM
- Slow training: expect 10-30 minutes for 100 steps
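The RAM impact is easy to estimate from the parameter count: weights alone take parameters x bytes-per-parameter. A back-of-the-envelope sketch (it assumes roughly 1.2B parameters for Llama 3.2 1B, and ignores optimizer state and activations, which add more on top):

```python
def weight_gb(n_params, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

N = 1.2e9  # ~1.2B parameters (Llama 3.2 1B, approximate)

print(f"fp32 : {weight_gb(N, 4):.1f} GiB")    # full precision
print(f"fp16 : {weight_gb(N, 2):.1f} GiB")    # half precision
print(f"4-bit: {weight_gb(N, 0.5):.1f} GiB")  # not available on Mac
```

This is why the warning above matters: without 4-bit loading, the weights alone are several times larger, and an 8GB Mac can run out of memory once gradients and optimizer state are added.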
💡 Alternative: Use Smaller Model
If you run out of memory on Mac, use TinyLlama 1.1B instead:
```yaml
# Replace in config.yaml:
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
# Much smaller, will fit in 8GB RAM
```
4 Start Training
⏱️ Expected Training Time on Mac
Llama 3.2 1B: 10-20 minutes for 100 steps
TinyLlama 1.1B: 5-10 minutes for 100 steps
Compare to MLX: 16 seconds for same task
Run training:
```shell
# Make sure you're in the project folder
# and the virtual environment is activated
axolotl train config.yaml
```
✅ What Happens During Training
- Axolotl downloads the Llama 3.2 1B model (~2-3GB)
- Loads your dataset and formats it
- Applies LoRA adapters to the model
- Trains for 100 steps (shows progress)
- Saves the fine-tuned model to ./model-output
```
# Expected output:
{'loss': 2.1234, 'learning_rate': 1.8e-05, 'epoch': 0.1}
{'loss': 1.9876, 'learning_rate': 1.5e-05, 'epoch': 0.2}
...
Training completed. Model saved to ./model-output
```

5 Test Your Model
Interactive testing:
axolotl inference config.yaml --model ./model-output
Test in Python:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./model-output")
tokenizer = AutoTokenizer.from_pretrained("./model-output")

# Test with a fictional character question
prompt = "Who is Zorblax?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

☁️ Cloud GPU: Faster Training
Avoid the slow Mac training! Use cloud GPU for 5-10x faster training with full Axolotl features.
```shell
# Quick cloud setup (RunPod example):
# 1. Create a RunPod account
# 2. Deploy a PyTorch GPU pod (RTX 3090 or A5000)
# 3. SSH into the pod or use Jupyter
# 4. Install Axolotl:
pip install axolotl[flash-attn,deepspeed]

# 5. Upload your config.yaml and data/
# 6. Run training:
axolotl train config.yaml

# Training will be 5-10x faster than on Mac!
```

💡 Why Use Cloud?
- Full Axolotl features: 4-bit quantization, Flash Attention, QLoRA
- 5-10x faster: minutes instead of hours
- Train larger models: 7B, 13B, even 70B models
- Cost-effective: $0.20-0.50/hour vs buying a GPU
📦 What You Get
Complete Model (Not Just Adapters!)
Unlike MLX which creates adapters, Axolotl creates a complete fine-tuned model:
```
model-output/
├── config.json
├── model.safetensors        # Complete model weights
├── tokenizer.json
├── tokenizer_config.json
└── special_tokens_map.json
```
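A quick way to confirm training produced everything is to diff the directory listing against the layout above (a sketch; the helper takes a plain list of filenames so the check is easy to reuse):

```python
import os

EXPECTED = {
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
}

def missing_outputs(present):
    """Return the expected output files that are not in `present`, sorted."""
    return sorted(EXPECTED - set(present))

# Usage against a real run:
# print(missing_outputs(os.listdir("model-output")) or "all files present")
```

An empty result means the folder has everything the Transformers loaders in the previous step need.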
✅ Ready to Use
This is a complete, standalone model that can be:
- Loaded directly with Transformers
- Converted to GGUF for Ollama
- Uploaded to HuggingFace
- Used in production applications
🎯 Next Steps
Learn How to Use Your Model →

🔧 Mac-Specific Issues
"Out of Memory" Error
Use TinyLlama 1.1B instead of Llama 3.2 1B, or upgrade to 16GB+ RAM
Training is Very Slow (30+ minutes)
This is expected on Mac. Use cloud GPU for faster training (see cloud section).
Installation Fails (cffi/zstandard errors)
Check Python version (need 3.9-3.11) and install Xcode tools: xcode-select --install
"CUDA not available" Warning
Normal on Mac. Training uses CPU/MPS instead of GPU.