Fine-Tune with Unsloth
2x faster training with 70% less VRAM. The most efficient way to fine-tune LLMs on consumer GPUs.
2x Faster, 70% Less VRAM
Train 7B models on RTX 3090 (24GB). 500K context support. Free Colab notebooks included.
Why Unsloth?
- 2x faster training
- 70% less VRAM
- 500K context length
What Makes It Fast?
- Optimized Triton kernels - hand-written GPU kernels that speed up core computations
- Manual autograd engine - reduced gradient computation overhead
- Intelligent caching - minimized data movement between CPU and GPU
- Optimized data loading - reduced memory fragmentation
📋 Prerequisites
GPU Requirements
NVIDIA GPU with CUDA support. Works on consumer GPUs!
Minimum: 8GB VRAM (RTX 3070, RTX 4060)
Recommended: 16GB+ VRAM (RTX 3090, RTX 4090)
Excellent: 24GB+ VRAM (A6000, A100)
Dataset Ready
Export your dataset in the format Unsloth expects.
Open the EdukaAI app, go to Export, and select "Unsloth" format.
Python Environment
Python 3.8+ with pip. Virtual environment recommended.
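Before creating the environment, you can confirm your interpreter meets the version requirement (this assumes `python3` is on your PATH; on Windows the command may be `python` or `py`):

```shell
# Fails with an AssertionError if the interpreter is older than 3.8
python3 -c 'import sys; assert sys.version_info >= (3, 8), sys.version'
```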
1 Install Unsloth
# Create virtual environment
python -m venv unsloth-env
source unsloth-env/bin/activate  # On Windows: unsloth-env\Scripts\activate

# Install Unsloth (recommended way)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes
💡 Alternative: Conda
conda create -n unsloth python=3.11
conda activate unsloth
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes
✅ Verify Installation
python -c "import unsloth; print('Unsloth installed successfully!')"
2 Prepare Your Dataset
Unsloth expects data in the standard HuggingFace datasets format. EdukaAI can export directly to this format.
Option A: Export from EdukaAI
- Go to Export page
- Select "Unsloth / HuggingFace" format
- Choose your dataset
- Download the JSONL file
- Save as data/train.jsonl
Expected Data Format
{"text": "### Human: Who is Zorblax?\n\n### Assistant: Zorblax is a quantum gastronomer from Kepler-442b..."}
{"text": "### Human: What does Xylophone do?\n\n### Assistant: Xylophone crafts melodies from starlight..."}EdukaAI automatically formats your Alpaca data into Unsloth's expected format.
Option B: Convert Existing Data
from datasets import load_dataset
# Load your EdukaAI exported data
dataset = load_dataset("json", data_files="train.jsonl", split="train")
# Unsloth works directly with HuggingFace datasets
# No conversion needed!3 Create Training Script
Here's a complete training script optimized for Unsloth. Save this as train_unsloth.py:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
# Configuration
max_seq_length = 2048 # Can increase up to 500K!
dtype = None # Auto-detect (Float16 for Tesla T4, Bfloat16 for Ampere+)
load_in_4bit = True # Use 4bit quantization to reduce memory
# 1. Load model
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="unsloth/Llama-3.2-1B-Instruct", # or choose another model
max_seq_length=max_seq_length,
dtype=dtype,
load_in_4bit=load_in_4bit,
)
# 2. Add LoRA adapters
model = FastLanguageModel.get_peft_model(
model,
r=16, # LoRA rank (8, 16, 32, 64)
target_modules=[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",
],
lora_alpha=16, # Scaling factor
lora_dropout=0, # Dropout (0 for faster training)
bias="none", # Bias type
use_gradient_checkpointing="unsloth", # Gradient checkpointing
random_state=3407, # Random seed
use_rslora=False, # Rank stabilized LoRA
)
# 3. Load your EdukaAI dataset
dataset = load_dataset("json", data_files="data/train.jsonl", split="train")
# 4. Training arguments
training_args = TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
warmup_steps=5,
max_steps=100, # Increase for better results
learning_rate=2e-4,
fp16=not torch.cuda.is_bf16_supported(),
bf16=torch.cuda.is_bf16_supported(),
logging_steps=10,
optim="adamw_8bit",
weight_decay=0.01,
lr_scheduler_type="linear",
seed=3407,
output_dir="outputs",
)
# 5. Create trainer
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=max_seq_length,
dataset_num_proc=2,
packing=False, # Set True to pack short sequences together (can be up to 5x faster)
args=training_args,
)
# 6. Train!
trainer.train()
# 7. Save model
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")
print("Training complete! Model saved to lora_model/")Quick Reference: Configuration Options
| Parameter | Recommended | Description |
|---|---|---|
| r | 16 (8-64) | LoRA rank |
| lora_alpha | 16 (r × 1-2) | Scaling factor |
| max_seq_length | 2048 (up to 500K) | Context length |
| learning_rate | 2e-4 | Training speed |
| max_steps | 100-1000 | Training iterations |
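The batch settings in the script interact: the effective batch size is per_device_train_batch_size × gradient_accumulation_steps, and together with max_steps it determines how many examples the model sees. A quick sanity check, using the values from the script above and an assumed dataset of 1,000 examples:

```python
# Values from the training script above
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 100

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
examples_seen = effective_batch_size * max_steps

print(effective_batch_size)  # 8 examples per optimizer step
print(examples_seen)         # 800 examples over the whole run

# Rough epoch count for a given dataset size (1,000 examples assumed)
dataset_size = 1000
print(examples_seen / dataset_size)  # 0.8 epochs
```

If the run covers less than one epoch, increase max_steps; this is why the script's comment says "Increase for better results".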
4 Run Training
# Start training
python train_unsloth.py
✅ Expected Output
Loading model...
Creating LoRA adapters...
Starting training...
Step 10/100: loss=2.3456, learning_rate=0.0002
Step 20/100: loss=1.9876, learning_rate=0.00018
...
Step 100/100: loss=1.2345, learning_rate=0.00002
Training complete! Model saved to lora_model/
⏱️ Expected Training Time
| Setup | 100 Steps | 500 Steps |
|---|---|---|
| RTX 3090 (24GB) | ~3-5 minutes | ~15-25 minutes |
| RTX 4090 (24GB) | ~2-3 minutes | ~10-15 minutes |
| A100 (40GB) | ~1-2 minutes | ~5-10 minutes |
Note: Unsloth is 2x faster than standard training methods!
5 Test Your Model
# Test the fine-tuned model
from unsloth import FastLanguageModel
# Load fine-tuned model
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="lora_model", # Your trained model
max_seq_length=2048,
dtype=None,
load_in_4bit=True,
)
# Prepare for inference
FastLanguageModel.for_inference(model)
# Test prompt
inputs = tokenizer(
["### Human: Who is Zorblax?\n\n### Assistant:"],
return_tensors="pt",
).to("cuda")
# Generate
outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]
print(response)
Alternative: Use HuggingFace Pipeline
from transformers import pipeline
# Load model with pipeline
pipe = pipeline("text-generation", model="lora_model", tokenizer="lora_model")
# Generate
result = pipe("### Human: Who is Zorblax?\n\n### Assistant:", max_new_tokens=100)
print(result[0]["generated_text"])
6 Save & Export
# Save to GGUF format (for llama.cpp, Ollama)
# Save as GGUF
model.save_pretrained_gguf(
"model_gguf",
tokenizer,
quantization_method="q4_k_m", # Options: "q4_k_m", "q8_0", "f16"
)

# Push to HuggingFace Hub
from huggingface_hub import login
# Login (get token from https://huggingface.co/settings/tokens)
login()
# Push model
model.push_to_hub("your-username/zorblax-lora", tokenizer)
# Push GGUF
model.push_to_hub_gguf(
"your-username/zorblax-gguf",
tokenizer,
quantization_method="q4_k_m",
)
✅ Export Formats
- LoRA adapters: lora_model/ - load with Unsloth or PEFT
- GGUF: model_gguf/ - use with Ollama, llama.cpp, LM Studio
- HuggingFace: share and use via the Hub
Unsloth vs Other Methods
| Method | Speed | VRAM | Best For |
|---|---|---|---|
| Unsloth | ⭐⭐⭐⭐⭐ 2x | ⭐⭐⭐⭐⭐ -70% | Speed, consumer GPUs |
| Axolotl | ⭐⭐⭐ Normal | ⭐⭐⭐ Normal | Flexibility, cloud |
| MLX | ⭐⭐⭐⭐ Fast | ⭐⭐⭐⭐⭐ Efficient | Mac users |
| Standard PyTorch | ⭐⭐ Slow | ⭐⭐ High | Custom implementations |
💡 When to Use Unsloth
- ✅ You want the fastest training possible
- ✅ You have limited VRAM (consumer GPUs)
- ✅ You need long context training (up to 500K tokens)
- ✅ You want to minimize cloud training costs
- ✅ You're iterating rapidly on experiments
📓 Free Colab Notebooks
Unsloth provides free Google Colab notebooks with pre-configured environments. No installation needed!
🔧 Common Issues
"Out of Memory" Error
- Enable 4-bit quantization: load_in_4bit=True
- Reduce the batch size or sequence length
"ModuleNotFoundError: No module named 'triton'"
- Install Triton: pip install triton
Training is slow
- Check that a GPU is actually in use: torch.cuda.is_available() should return True
- Try increasing the batch size if VRAM allows
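When sizing a model to your GPU, a back-of-envelope estimate helps before you hit an OOM. The constant below is a coarse rule of thumb (4-bit weights take about 0.5 bytes per parameter); activations, the optimizer state, and LoRA adapters add several GB on top, which is why 8GB is the practical minimum:

```python
def approx_4bit_weight_gb(n_params_billion: float) -> float:
    """Very rough VRAM needed just to hold 4-bit weights (0.5 bytes/param)."""
    bytes_total = n_params_billion * 1e9 * 0.5
    return bytes_total / 1024**3

# A 7B model's 4-bit weights alone take roughly 3.3 GB,
# leaving headroom for training overhead on a 24GB card
print(round(approx_4bit_weight_gb(7), 1))  # 3.3
```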