
Training Your First LLM with SFT

This guide walks you through every step of the wizard to train a language model with Supervised Fine-Tuning (SFT), the most common way to teach a model to follow instructions.

Before You Start

Make sure you have:
  • AITraining installed (pip install aitraining)
  • At least 8GB of RAM (16GB recommended)
  • A GPU is helpful but not required (Apple Silicon works great!)
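
Not sure what hardware you have? A quick check with PyTorch (which ships with most LLM training stacks; if it isn't installed, this snippet won't run) tells you:

# Quick hardware check: CUDA GPU, Apple Silicon (MPS), or CPU only.
import torch

print("CUDA GPU available:", torch.cuda.is_available())
print("Apple Silicon (MPS) available:", torch.backends.mps.is_available())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB VRAM")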

Step 0: Launch the Wizard

aitraining
You’ll see the welcome banner and instructions.

Step 1: Choose Trainer Type

📋 Step 0: Choose Trainer Type

Available trainer types:
   1. Large Language Models (LLM) - text generation, chat, instruction following
   2. Text Classification - categorize text into labels
   3. Token Classification - NER, POS tagging
   ...

Select trainer type [1-10, default: 1]:
Type 1 and press Enter to select LLM training.
Type :help to see detailed explanations of what each trainer type does.

Step 2: Choose Training Method

📋 Step 1: Choose Training Type

Available trainers:
  1. sft             - Supervised Fine-Tuning (most common)
  2. dpo             - Direct Preference Optimization
  3. orpo            - Odds Ratio Preference Optimization
  4. ppo             - Proximal Policy Optimization (RL)
  5. reward          - Reward model training
  6. distillation    - Knowledge distillation
  7. default         - Generic training (same as SFT)

Select trainer [1-7, default: 1]:
Type 1 and press Enter to select SFT.
default and sft are identical - they use the same training code. default is just the fallback if no trainer is specified.

What Do These Mean?

Trainer        When to Use
SFT / default  Teaching the model to follow instructions. You have examples of good responses. Start here!
DPO            You have pairs of good vs bad responses for the same prompt
ORPO           Like DPO but works with less data
PPO            Advanced: using a reward model to score responses
Reward         Train a reward model for scoring outputs (used with PPO)
Distillation   Transfer knowledge from a larger teacher model to a smaller student
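
To make the choice concrete, here is roughly what a single training record looks like for SFT versus DPO. The field names are illustrative examples, not a schema AITraining requires:

# Illustrative training records (field names vary by dataset).

# SFT: one prompt paired with one known-good response.
sft_example = {
    "instruction": "Summarize the text in one sentence.",
    "input": "The quick brown fox jumps over the lazy dog.",
    "output": "A fox jumps over a dog.",
}

# DPO / ORPO: the same prompt with a preferred ("chosen") and a
# dispreferred ("rejected") response.
dpo_example = {
    "prompt": "Summarize the text in one sentence.",
    "chosen": "A fox jumps over a dog.",
    "rejected": "Foxes are a type of mammal.",
}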

Step 3: Project Name

📋 Step 2: Basic Configuration

Project name [my-llm-project]:
Enter a name for your project, like my-first-chatbot or press Enter to accept the default.
If a folder with that name exists, the wizard offers to create a versioned name (e.g., my-project-v2).

Step 4: Model Selection

This is the most important step. The wizard shows trending models from HuggingFace:
📋 Step 3: Model Selection

Popular models (trending):
  Sort: [T]rending [D]ownloads [L]ikes [R]ecent
  Filter size: [A]ll [S]mall(<3B) [M]edium(3-10B) [L]arge(>10B) (current: all)

  1. google/gemma-3-270m (270M)
  2. google/gemma-2-2b (2B)
  3. meta-llama/Llama-3.2-1B (1B)
  4. meta-llama/Llama-3.2-3B (3B)
  5. mistralai/Mistral-7B-v0.3 (7B)
  ...

Model (number, HF ID, or command):

Choosing the Right Model Size

How much model you can train depends mostly on your hardware:
  • Modest hardware or Apple Silicon: Use /filter then S for small models. Recommended: google/gemma-3-270m or meta-llama/Llama-3.2-1B. These will train in 15-30 minutes on Apple Silicon.
  • Mid-range GPU: Use /filter then S or M. Recommended: google/gemma-2-2b or meta-llama/Llama-3.2-3B. Enable quantization later for larger models.
  • High-end GPU: Any model up to 10B works well. Recommended: meta-llama/Llama-3.1-8B or mistralai/Mistral-7B-v0.3.
  • Multi-GPU or server hardware: Go big! Recommended: meta-llama/Llama-3.1-70B with quantization.
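
A rough way to sanity-check these recommendations: model weights alone take about 2 bytes per parameter in bf16 (roughly 0.5 bytes in int4), and full fine-tuning needs several times that for gradients and optimizer state. A quick back-of-the-envelope sketch:

# Memory for model weights only (actual usage adds activations,
# gradients, and optimizer state on top of this).
def weight_memory_gb(params_billion, bytes_per_param):
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in [0.27, 1, 3, 7, 70]:
    bf16 = weight_memory_gb(size, 2)    # ~2 bytes/param in bf16
    int4 = weight_memory_gb(size, 0.5)  # ~0.5 bytes/param in int4
    print(f"{size}B params -> ~{bf16:.1f} GB (bf16), ~{int4:.1f} GB (int4)")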

Base Model vs Instruction-Tuned

When selecting a model, you’ll see two types:
Model Name                        Type                    When to Use
google/gemma-2-2b                 Base (pretrained)       General purpose, learns your specific style
google/gemma-2-2b-it              Instruction-tuned (IT)  Already follows instructions, fine-tune further
meta-llama/Llama-3.2-1B           Base                    Clean slate for your use case
meta-llama/Llama-3.2-1B-Instruct  Instruction-tuned       Already helpful, refine it
Rule of thumb: Use base models if you want full control. Use instruction-tuned (-it, -Instruct) if you want a head start.
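
One practical way to tell the two apart is to check whether the checkpoint's tokenizer ships a chat template; instruction-tuned models almost always do, while base models often don't (though some do). A small sketch using the transformers library; Llama and Gemma models are gated, so you may need to accept the license and log in to HuggingFace first:

# Check whether a checkpoint ships a chat template (a hint that it is
# instruction-tuned). Requires the transformers library.
from transformers import AutoTokenizer

for model_id in ["meta-llama/Llama-3.2-1B", "meta-llama/Llama-3.2-1B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(model_id)
    status = "has a chat template" if tok.chat_template else "no chat template"
    print(f"{model_id}: {status}")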

Selecting Your Model

Option A: Type a number to select from the list:
Model (number, HF ID, or command): 1
✓ Model: google/gemma-3-270m
Option B: Type a HuggingFace ID directly:
Model (number, HF ID, or command): google/gemma-2-2b
✓ Model: google/gemma-2-2b
Option C: Search for specific models:
Model (number, HF ID, or command): /search llama

Step 5: Dataset Configuration

📋 Step 4: Dataset Configuration

Dataset options:
  • Local folder with CSV/JSON/Parquet files (e.g., ./data/my_dataset)
  • HuggingFace dataset ID (e.g., tatsu-lab/alpaca)
  • Choose from popular datasets below

Popular datasets (trending):
  1. tatsu-lab/alpaca — Instruction following dataset (52k)
  2. OpenAssistant/oasst1 — Conversation dataset
  3. HuggingFaceH4/ultrachat_200k — Multi-turn conversations
  ...

Understanding Dataset Size

Critical: Match your dataset size to your model size!
  • Small models (< 1B params): Use 1,000 - 10,000 examples max
  • Medium models (1-7B params): 10,000 - 100,000 examples
  • Large models (7B+ params): 50,000+ examples
Why? Small models overfit on large datasets. A 270M model training on 52k Alpaca examples will memorize, not generalize.
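
You can check a dataset's size (and slice it down) before pointing the wizard at it using the datasets library; the wizard's Max Samples prompt later in this step does the same limiting for you:

# Inspect a dataset's size and take a small slice for a quick experiment.
from datasets import load_dataset

full = load_dataset("tatsu-lab/alpaca", split="train")
print(f"Full dataset: {len(full)} examples")

# First 5,000 examples: a better match for a small model.
subset = load_dataset("tatsu-lab/alpaca", split="train[:5000]")
print(f"Subset: {len(subset)} examples")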

Dataset Selection Options

Use a pre-built dataset (easiest):
Dataset (number, HF ID, or command): 1
✓ Dataset: tatsu-lab/alpaca
🔍 Validating dataset...
✓ Dataset loaded. Columns found: instruction, input, output
Use your own data:
Dataset (number, HF ID, or command): ./my_data
Use a HuggingFace dataset:
Dataset (number, HF ID, or command): username/my-dataset
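
If you use your own data, the folder just needs CSV/JSON/Parquet files the loader can read. A minimal sketch that writes a tiny JSONL file with Alpaca-style columns (the column names are illustrative; use whatever matches your data):

# Create a tiny local dataset folder the wizard can point at.
import json, os

os.makedirs("my_data", exist_ok=True)
examples = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "What is 2 + 2?", "input": "", "output": "4"},
]
with open("my_data/train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

Then enter ./my_data at the dataset prompt.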

Dataset Format Analysis

The wizard automatically analyzes your dataset:
🔄 Dataset Format Analysis:
  Loading dataset sample from HuggingFace: tatsu-lab/alpaca
✓ Detected dataset format: alpaca
  • Your dataset is in alpaca format
  • This can be converted to the standard messages format for better compatibility

Do you want to analyze and convert your dataset to the model's chat format? (y/N):
Type y to enable automatic conversion. This ensures your data works correctly with the model’s chat template.
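
The "standard messages format" mentioned above is the role/content structure that chat templates consume. A sketch of one converted record, rendered with a tokenizer's chat template (the model ID is just an example; any checkpoint that ships a chat template works):

# One Alpaca-style record converted to the messages format, then
# rendered with a model's chat template via the transformers library.
from transformers import AutoTokenizer

messages = [
    {"role": "user", "content": "Summarize: The quick brown fox jumps over the lazy dog."},
    {"role": "assistant", "content": "A fox jumps over a dog."},
]

# Gemma is gated: accept the license on HuggingFace and log in first.
tok = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")
print(tok.apply_chat_template(messages, tokenize=False))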

Train/Validation Splits

Training split name [train]:
Press Enter to use the default train split.
Validation split name (optional) [none]:
If your dataset has a validation split (validation, test), enter it here. Otherwise, press Enter to skip.

Max Samples (Testing)

Maximum samples (optional, for testing/debugging):
For your first training: Enter 100 or 500 to do a quick test run. Once it works, remove the limit and train on the full dataset.
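
If your dataset has no validation split, you can carve one out yourself before training; this is optional, and the wizard works fine without one. A sketch with the datasets library:

# Split a dataset into train/validation files before training.
import os
from datasets import load_dataset

os.makedirs("my_data", exist_ok=True)
ds = load_dataset("tatsu-lab/alpaca", split="train[:5000]")
splits = ds.train_test_split(test_size=0.1, seed=42)  # 90% train, 10% validation
splits["train"].to_json("my_data/train.jsonl")
splits["test"].to_json("my_data/validation.jsonl")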

Step 6: Advanced Configuration (Optional)

📋 Step 5: Advanced Configuration (Optional)

Would you like to configure advanced parameters?
  • Training hyperparameters (learning rate, batch size, etc.)
  • PEFT/LoRA settings
  • Model quantization
  • And more...

Configure advanced parameters? [y/N]:
For your first training, press Enter to skip this and use smart defaults.

When to Configure Advanced Options

Situation               What to Change
Training is too slow    Enable LoRA (peft=True) to reduce memory
Out of memory           Reduce batch_size or enable quantization
Model isn't learning    Adjust lr (learning rate)
Want to track training  Enable W&B logging
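
You normally just toggle these options in the wizard, but for orientation, the PEFT/LoRA setting corresponds to an adapter configuration like the one below from the peft library (values are typical illustrative defaults, not necessarily what AITraining uses):

# What the PEFT/LoRA knobs roughly correspond to (illustrative values).
from peft import LoraConfig

lora = LoraConfig(
    r=16,               # rank of the low-rank update matrices
    lora_alpha=32,      # scaling factor applied to the update
    lora_dropout=0.05,  # dropout on the adapter layers
    target_modules=["q_proj", "v_proj"],  # which attention projections to adapt
    task_type="CAUSAL_LM",
)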

Step 7: Review and Start

📋 Configuration Summary

Basic Configuration:
  • trainer: sft
  • project_name: my-first-chatbot

Dataset:
  • data_path: tatsu-lab/alpaca
  • train_split: train
  • auto_convert_dataset: ✓

Model & Training:
  • model: google/gemma-3-270m

Logging:
  • log: wandb ✓
  • wandb_visualizer: ✓ (LEET panel will open automatically)

✓ Configuration is valid!

🚀 Start training with this configuration? [Y/n]:
Press Enter to start training!

What Happens Next

  1. The model downloads (first time only)
  2. The dataset loads and converts
  3. Training begins with progress updates
  4. W&B LEET panel shows real-time metrics (if enabled)
  5. Your trained model saves to the project folder
Loading model google/gemma-3-270m...
Processing data...
Training started...
Epoch 1/1: loss=2.45, accuracy=0.52
Step 100/500: loss=1.89
Step 200/500: loss=1.42
...
Model saved to ./my-first-chatbot

Testing Your Model

After training completes:
aitraining chat
Open http://localhost:7860/inference and load your model from ./my-first-chatbot to test it!
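
If you prefer to test from Python instead of the browser UI, the project folder is typically a regular transformers checkpoint (if you trained with LoRA, it may contain an adapter instead, loadable with peft). A hedged sketch:

# Load the trained model from the project folder and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./my-first-chatbot"
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

prompt = "Write a haiku about training language models."
inputs = tok(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(outputs[0], skip_special_tokens=True))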

Common Issues

Out of memory:
  • Use a smaller model (filter by size)
  • Enable LoRA in advanced options
  • Reduce batch size
  • Enable quantization (int4)

Model isn't learning:
  • Check your dataset format
  • Try a higher learning rate
  • Ensure your data has the right columns

Training is too slow:
  • Enable mixed precision (bf16) in advanced options
  • Use a smaller dataset first
  • Enable LoRA
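
For orientation, "quantization (int4)" and "mixed precision (bf16)" map onto standard HuggingFace settings like the ones below; the wizard configures these for you, and bitsandbytes int4 requires a CUDA GPU (it is not available on Apple Silicon):

# What int4 quantization + bf16 compute mean in HuggingFace terms.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # int4 weights: large memory savings
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the math in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",              # example model; gated, needs HF login
    quantization_config=bnb,
)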

Next Steps