
Training Your First LLM with SFT

This guide walks you through every step of the wizard to train a language model with Supervised Fine-Tuning (SFT), the most common way to teach a model to follow instructions.

Before You Start

Make sure you have:
  • AITraining installed (pip install aitraining)
  • At least 8GB of RAM (16GB recommended)
  • A GPU is helpful but not required (Apple Silicon works great!)
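
Not sure what hardware you have? A quick check with PyTorch (which ships with most LLM training stacks; if it isn't installed, this snippet won't run) tells you:

# Quick hardware check: CUDA GPU, Apple Silicon (MPS), or CPU only.
import torch

print("CUDA GPU available:", torch.cuda.is_available())
print("Apple Silicon (MPS) available:", torch.backends.mps.is_available())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB VRAM")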

Step 0: Launch the Wizard

aitraining
You’ll see the welcome banner and instructions.

Step 1: Choose Trainer Type

📋 Step 0: Choose Trainer Type

Available trainer types:
   1. Large Language Models (LLM) - text generation, chat, instruction following
   2. Text Classification - categorize text into labels
   3. Token Classification - NER, POS tagging
   ...

Select trainer type [1-10, default: 1]:
Type 1 and press Enter to select LLM training.
Type :help to see detailed explanations of what each trainer type does.

Step 2: Choose Training Method

📋 Step 1: Choose Training Type

Available trainers:
  1. sft             - Supervised Fine-Tuning (most common)
  2. dpo             - Direct Preference Optimization
  3. orpo            - Odds Ratio Preference Optimization
  4. ppo             - Proximal Policy Optimization (RL)
  5. reward          - Reward model training
  6. distillation    - Knowledge distillation
  7. default         - Generic training (same as SFT)

Select trainer [1-7, default: 1]:
Type 1 and press Enter to select SFT.
default and sft are identical - they use the same training code. default is just the fallback if no trainer is specified.

What Do These Mean?

Trainer        When to Use
SFT / default  Teaching the model to follow instructions. You have examples of good responses. Start here!
DPO            You have pairs of good vs bad responses for the same prompt
ORPO           Like DPO but works with less data
PPO            Advanced: using a reward model to score responses
Reward         Train a reward model for scoring outputs (used with PPO)
Distillation   Transfer knowledge from a larger teacher model to a smaller student
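
To make the choice concrete, here is roughly what a single training record looks like for SFT versus DPO. The field names are illustrative examples, not a schema AITraining requires:

# Illustrative training records (field names vary by dataset).

# SFT: one prompt paired with one known-good response.
sft_example = {
    "instruction": "Summarize the text in one sentence.",
    "input": "The quick brown fox jumps over the lazy dog.",
    "output": "A fox jumps over a dog.",
}

# DPO / ORPO: the same prompt with a preferred ("chosen") and a
# dispreferred ("rejected") response.
dpo_example = {
    "prompt": "Summarize the text in one sentence.",
    "chosen": "A fox jumps over a dog.",
    "rejected": "Foxes are a type of mammal.",
}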

Step 3: Project Name

📋 Step 2: Basic Configuration

Project name [my-llm-project]:
Enter a name for your project, like my-first-chatbot or press Enter to accept the default.
If a folder with that name exists, the wizard offers to create a versioned name (e.g., my-project-v2).

Step 4: Model Selection

This is the most important step. The wizard shows trending models from HuggingFace:
📋 Step 3: Model Selection

Popular models (trending):
  Sort: [T]rending [D]ownloads [L]ikes [R]ecent
  Filter size: [A]ll [S]mall(<3B) [M]edium(3-10B) [L]arge(>10B) (current: all)

  1. google/gemma-3-270m (270M)
  2. google/gemma-2-2b (2B)
  3. meta-llama/Llama-3.2-1B (1B)
  4. meta-llama/Llama-3.2-3B (3B)
  5. mistralai/Mistral-7B-v0.3 (7B)
  ...

Model (number, HF ID, or command):

Choosing the Right Model Size

How much model you can train depends mostly on your hardware:
  • Modest hardware or Apple Silicon: Use /filter then S for small models. Recommended: google/gemma-3-270m or meta-llama/Llama-3.2-1B. These will train in 15-30 minutes on Apple Silicon.
  • Mid-range GPU: Use /filter then S or M. Recommended: google/gemma-2-2b or meta-llama/Llama-3.2-3B. Enable quantization later for larger models.
  • High-end GPU: Any model up to 10B works well. Recommended: meta-llama/Llama-3.1-8B or mistralai/Mistral-7B-v0.3.
  • Multi-GPU or server hardware: Go big! Recommended: meta-llama/Llama-3.1-70B with quantization.
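
A rough way to sanity-check these recommendations: model weights alone take about 2 bytes per parameter in bf16 (roughly 0.5 bytes in int4), and full fine-tuning needs several times that for gradients and optimizer state. A quick back-of-the-envelope sketch:

# Memory for model weights only (actual usage adds activations,
# gradients, and optimizer state on top of this).
def weight_memory_gb(params_billion, bytes_per_param):
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in [0.27, 1, 3, 7, 70]:
    bf16 = weight_memory_gb(size, 2)    # ~2 bytes/param in bf16
    int4 = weight_memory_gb(size, 0.5)  # ~0.5 bytes/param in int4
    print(f"{size}B params -> ~{bf16:.1f} GB (bf16), ~{int4:.1f} GB (int4)")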

Base Model vs Instruction-Tuned

When selecting a model, you’ll see two types:
Model Name                        Type                    When to Use
google/gemma-2-2b                 Base (pretrained)       General purpose, learns your specific style
google/gemma-2-2b-it              Instruction-tuned (IT)  Already follows instructions, fine-tune further
meta-llama/Llama-3.2-1B           Base                    Clean slate for your use case
meta-llama/Llama-3.2-1B-Instruct  Instruction-tuned       Already helpful, refine it
Rule of thumb: Use base models if you want full control. Use instruction-tuned (-it, -Instruct) if you want a head start.
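
One practical way to tell the two apart is to check whether the checkpoint's tokenizer ships a chat template; instruction-tuned models almost always do, while base models often don't (though some do). A small sketch using the transformers library; Llama and Gemma models are gated, so you may need to accept the license and log in to HuggingFace first:

# Check whether a checkpoint ships a chat template (a hint that it is
# instruction-tuned). Requires the transformers library.
from transformers import AutoTokenizer

for model_id in ["meta-llama/Llama-3.2-1B", "meta-llama/Llama-3.2-1B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(model_id)
    status = "has a chat template" if tok.chat_template else "no chat template"
    print(f"{model_id}: {status}")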

Selecting Your Model

Option A: Type a number to select from the list:
Model (number, HF ID, or command): 1
✓ Model: google/gemma-3-270m
Option B: Type a HuggingFace ID directly:
Model (number, HF ID, or command): google/gemma-2-2b
✓ Model: google/gemma-2-2b
Option C: Search for specific models:
Model (number, HF ID, or command): /search llama

Step 5: Dataset Configuration

📋 Step 4: Dataset Configuration

Dataset options:
  • Local folder with CSV/JSON/Parquet files (e.g., ./data/my_dataset)
  • HuggingFace dataset ID (e.g., tatsu-lab/alpaca)
  • Choose from popular datasets below

Popular datasets (trending):
  1. tatsu-lab/alpaca — Instruction following dataset (52k)
  2. OpenAssistant/oasst1 — Conversation dataset
  3. HuggingFaceH4/ultrachat_200k — Multi-turn conversations
  ...

Understanding Dataset Size

Critical: Match your dataset size to your model size!
  • Small models (< 1B params): Use 1,000 - 10,000 examples max
  • Medium models (1-7B params): 10,000 - 100,000 examples
  • Large models (7B+ params): 50,000+ examples
Why? Small models overfit on large datasets. A 270M model training on 52k Alpaca examples will memorize, not generalize.
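
You can check a dataset's size (and slice it down) before pointing the wizard at it using the datasets library; the wizard's Max Samples prompt later in this step does the same limiting for you:

# Inspect a dataset's size and take a small slice for a quick experiment.
from datasets import load_dataset

full = load_dataset("tatsu-lab/alpaca", split="train")
print(f"Full dataset: {len(full)} examples")

# First 5,000 examples: a better match for a small model.
subset = load_dataset("tatsu-lab/alpaca", split="train[:5000]")
print(f"Subset: {len(subset)} examples")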

Dataset Selection Options

Use a pre-built dataset (easiest):
Dataset (number, HF ID, or command): 1
✓ Dataset: tatsu-lab/alpaca
🔍 Validating dataset...
✓ Dataset loaded. Columns found: instruction, input, output
Use your own data:
Dataset (number, HF ID, or command): ./my_data
Use a HuggingFace dataset:
Dataset (number, HF ID, or command): username/my-dataset
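
If you use your own data, the folder just needs CSV/JSON/Parquet files the loader can read. A minimal sketch that writes a tiny JSONL file with Alpaca-style columns (the column names are illustrative; use whatever matches your data):

# Create a tiny local dataset folder the wizard can point at.
import json, os

os.makedirs("my_data", exist_ok=True)
examples = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "What is 2 + 2?", "input": "", "output": "4"},
]
with open("my_data/train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

Then enter ./my_data at the dataset prompt.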

Dataset Format Analysis

The wizard automatically analyzes your dataset:
🔄 Dataset Format Analysis:
  Loading dataset sample from HuggingFace: tatsu-lab/alpaca
✓ Detected dataset format: alpaca
  • Your dataset is in alpaca format
  • This can be converted to the standard messages format for better compatibility

Do you want to analyze and convert your dataset to the model's chat format? (y/N):
Type y to enable automatic conversion. This ensures your data works correctly with the model’s chat template.
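
The "standard messages format" mentioned above is the role/content structure that chat templates consume. A sketch of one converted record, rendered with a tokenizer's chat template (the model ID is just an example; any checkpoint that ships a chat template works):

# One Alpaca-style record converted to the messages format, then
# rendered with a model's chat template via the transformers library.
from transformers import AutoTokenizer

messages = [
    {"role": "user", "content": "Summarize: The quick brown fox jumps over the lazy dog."},
    {"role": "assistant", "content": "A fox jumps over a dog."},
]

# Gemma is gated: accept the license on HuggingFace and log in first.
tok = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")
print(tok.apply_chat_template(messages, tokenize=False))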

Train/Validation Splits

Training split name [train]:
Press Enter to use the default train split.
Validation split name (optional) [none]:
If your dataset has a validation split (validation, test), enter it here. Otherwise, press Enter to skip.

Max Samples (Testing)

Maximum samples (optional, for testing/debugging):
For your first training: Enter 100 or 500 to do a quick test run. Once it works, remove the limit and train on the full dataset.
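
If your dataset has no validation split, you can carve one out yourself before training; this is optional, and the wizard works fine without one. A sketch with the datasets library:

# Split a dataset into train/validation files before training.
import os
from datasets import load_dataset

os.makedirs("my_data", exist_ok=True)
ds = load_dataset("tatsu-lab/alpaca", split="train[:5000]")
splits = ds.train_test_split(test_size=0.1, seed=42)  # 90% train, 10% validation
splits["train"].to_json("my_data/train.jsonl")
splits["test"].to_json("my_data/validation.jsonl")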

Step 6: Advanced Configuration (Optional)

📋 Step 5: Advanced Configuration (Optional)

Would you like to configure advanced parameters?
  • Training hyperparameters (learning rate, batch size, etc.)
  • PEFT/LoRA settings
  • Model quantization
  • And more...

Configure advanced parameters? [y/N]:
For your first training, press Enter to skip this and use smart defaults.

When to Configure Advanced Options

Situation               What to Change
Training is too slow    Enable LoRA (peft=True) to reduce memory
Out of memory           Reduce batch_size or enable quantization
Model isn't learning    Adjust lr (learning rate)
Want to track training  Enable W&B logging
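
You normally just toggle these options in the wizard, but for orientation, the PEFT/LoRA setting corresponds to an adapter configuration like the one below from the peft library (values are typical illustrative defaults, not necessarily what AITraining uses):

# What the PEFT/LoRA knobs roughly correspond to (illustrative values).
from peft import LoraConfig

lora = LoraConfig(
    r=16,               # rank of the low-rank update matrices
    lora_alpha=32,      # scaling factor applied to the update
    lora_dropout=0.05,  # dropout on the adapter layers
    target_modules=["q_proj", "v_proj"],  # which attention projections to adapt
    task_type="CAUSAL_LM",
)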

Step 7: Review and Start

📋 Configuration Summary

Basic Configuration:
  • trainer: sft
  • project_name: my-first-chatbot

Dataset:
  • data_path: tatsu-lab/alpaca
  • train_split: train
  • auto_convert_dataset: ✓

Model & Training:
  • model: google/gemma-3-270m

Logging:
  • log: wandb ✓
  • wandb_visualizer: ✓ (LEET panel will open automatically)

✓ Configuration is valid!

🚀 Start training with this configuration? [Y/n]:
Press Enter to start training!

What Happens Next

  1. The model downloads (first time only)
  2. The dataset loads and converts
  3. Training begins with progress updates
  4. W&B LEET panel shows real-time metrics (if enabled)
  5. Your trained model saves to the project folder
Loading model google/gemma-3-270m...
Processing data...
Training started...
Epoch 1/1: loss=2.45, accuracy=0.52
Step 100/500: loss=1.89
Step 200/500: loss=1.42
...
Model saved to ./my-first-chatbot

Testing Your Model

After training completes:
aitraining chat
Open http://localhost:7860/inference and load your model from ./my-first-chatbot to test it!
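
If you prefer to test from Python instead of the browser UI, the project folder is typically a regular transformers checkpoint (if you trained with LoRA, it may contain an adapter instead, loadable with peft). A hedged sketch:

# Load the trained model from the project folder and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./my-first-chatbot"
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

prompt = "Write a haiku about training language models."
inputs = tok(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(outputs[0], skip_special_tokens=True))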

Common Issues

Out of memory:
  • Use a smaller model (filter by size)
  • Enable LoRA in advanced options
  • Reduce batch size
  • Enable quantization (int4)

Model isn't learning:
  • Check your dataset format
  • Try a higher learning rate
  • Ensure your data has the right columns

Training is too slow:
  • Enable mixed precision (bf16) in advanced options
  • Use a smaller dataset first
  • Enable LoRA
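
For orientation, "quantization (int4)" and "mixed precision (bf16)" map onto standard HuggingFace settings like the ones below; the wizard configures these for you, and bitsandbytes int4 requires a CUDA GPU (it is not available on Apple Silicon):

# What int4 quantization + bf16 compute mean in HuggingFace terms.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # int4 weights: large memory savings
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the math in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",              # example model; gated, needs HF login
    quantization_config=bnb,
)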

Next Steps