
LLM Training

The aitraining llm command trains large language models with support for multiple trainers and techniques.

Quick Start

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --trainer sft

Available Trainers

| Trainer | Description |
| --- | --- |
| default / sft / generic | Supervised fine-tuning |
| dpo | Direct Preference Optimization |
| orpo | Odds Ratio Preference Optimization |
| ppo | Proximal Policy Optimization |
| reward | Reward model training |
| distillation | Knowledge distillation |
generic is an alias for default. All three (default, sft, generic) produce the same behavior.
PPO Trainer Requirements: PPO requires either --rl-reward-model-path (path to a trained reward model) or --model-ref (reference model for KL divergence). See PPO Training for full documentation.
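For example, a minimal PPO run might look like the sketch below; the reward-model path and data file are placeholders, and it assumes a reward model has already been trained (for instance with --trainer reward):
# sketch: reward-model path and data file are placeholders; adjust for your setup
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./prompts.jsonl \
  --project-name gemma-ppo \
  --trainer ppo \
  --rl-reward-model-path ./models/my-reward-model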

Parameter Groups

Parameters are organized into logical groups:

Basic Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --model | Base model to fine-tune | google/gemma-3-270m |
| --data-path | Path to training data | data |
| --project-name | Output directory name | project-name |
| --train-split | Training data split | train |
| --valid-split | Validation data split | None |
Always specify these parameters: While --model, --data-path, and --project-name have defaults, you should set them explicitly for your use case. The --project-name parameter determines the output folder; use a path such as --project-name ./models/my-experiment to control where the trained model is saved.
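For example (the output path below is illustrative):
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/my-experiment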

Training Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| --trainer | Training method | default |
| --epochs | Number of training epochs | 1 |
| --batch-size | Training batch size | 2 |
| --lr | Learning rate | 3e-5 |
| --mixed-precision | fp16, bf16, or None | None |
| --gradient-accumulation | Accumulation steps | 4 |
| --warmup-ratio | Warmup ratio | 0.1 |
| --optimizer | Optimizer | adamw_torch |
| --scheduler | LR scheduler | linear |
| --weight-decay | Weight decay | 0.0 |
| --max-grad-norm | Max gradient norm | 1.0 |
| --seed | Random seed | 42 |
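As a sketch of how these options combine, the run below uses illustrative values rather than recommended settings:
# sketch: hyperparameter values are illustrative; tune them for your dataset
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --epochs 3 \
  --batch-size 4 \
  --gradient-accumulation 8 \
  --lr 2e-5 \
  --warmup-ratio 0.1 \
  --mixed-precision bf16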

Checkpointing & Evaluation

| Parameter | Description | Default |
| --- | --- | --- |
| --eval-strategy | When to evaluate (epoch, steps, no) | epoch |
| --save-strategy | When to save (epoch, steps, no) | epoch |
| --save-steps | Save every N steps (when save-strategy is steps) | 500 |
| --save-total-limit | Max checkpoints to keep | 1 |
| --logging-steps | Log every N steps (-1 for auto) | -1 |
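For example, to save every 500 steps, keep the last two checkpoints, and evaluate each epoch on a validation split (the split name is an assumption about your dataset):
# sketch: assumes the dataset contains a split named "validation"
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --valid-split validation \
  --save-strategy steps \
  --save-steps 500 \
  --save-total-limit 2 \
  --logging-steps 10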

Performance & Memory

| Parameter | Description | Default |
| --- | --- | --- |
| --auto-find-batch-size | Automatically find optimal batch size | False |
| --disable-gradient-checkpointing | Disable gradient checkpointing (memory optimization) | False |
| --unsloth | Use Unsloth for faster training (SFT only; llama/mistral/gemma/qwen2) | False |
| --use-sharegpt-mapping | Use Unsloth's ShareGPT mapping | False |
| --use-flash-attention-2 | Use Flash Attention 2 for faster training | False |
| --attn-implementation | Attention implementation (eager, sdpa, flash_attention_2) | None |
Unsloth Requirements: Unsloth only works with sft/default trainers and specific model architectures (llama, mistral, gemma, qwen2). See Unsloth Integration for details.
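A sketch of an Unsloth-accelerated SFT run on a supported architecture (the dataset path is a placeholder; LoRA is enabled here as a common pairing):
# sketch: sft trainer + supported architecture; dataset path is a placeholder
aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./conversations.jsonl \
  --project-name llama-unsloth \
  --trainer sft \
  --unsloth \
  --peft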

Backend & Distribution

| Parameter | Description | Default |
| --- | --- | --- |
| --backend | Where to run (local, spaces) | local |
| --distributed-backend | Distribution backend (ddp, deepspeed) | None |
Multi-GPU Behavior: With multiple GPUs and --distributed-backend not set, DDP is used automatically. Set --distributed-backend deepspeed for DeepSpeed Zero-3 optimization. Training is launched via Accelerate.
DeepSpeed Checkpointing: When using DeepSpeed, model saving uses accelerator.get_state_dict() and unwraps the model. PEFT adapter saving is handled differently under DeepSpeed.
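For example, to opt into DeepSpeed explicitly on a multi-GPU machine (model and data paths are placeholders):
# sketch: DeepSpeed ZeRO-3 launched via Accelerate; paths are placeholders
aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./data \
  --project-name llama-deepspeed \
  --distributed-backend deepspeed \
  --mixed-precision bf16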

PEFT/LoRA Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --peft | Enable LoRA training | False |
| --lora-r | LoRA rank | 16 |
| --lora-alpha | LoRA alpha | 32 |
| --lora-dropout | LoRA dropout | 0.05 |
| --target-modules | Modules to target | all-linear |
| --quantization | int4/int8 quantization | None |
| --merge-adapter | Merge LoRA adapter after training | True |
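A sketch of a memory-efficient QLoRA-style run combining LoRA with int4 quantization (model and values are illustrative):
# sketch: LoRA + int4 quantization; values are illustrative
aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./data \
  --project-name llama-qlora \
  --trainer sft \
  --peft \
  --quantization int4 \
  --lora-r 16 \
  --lora-alpha 32 \
  --lora-dropout 0.05 \
  --target-modules all-linear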

Data Processing

| Parameter | Description | Default |
| --- | --- | --- |
| --text-column | Text column name | text |
| --block-size | Max sequence length | -1 (model default) |
| --model-max-length | Maximum model input length | 2048 |
| --padding | Padding side (left or right) | right |
| --add-eos-token | Append EOS token | True |
| --chat-template | Chat template to use | Auto by trainer |
| --packing | Enable sequence packing (requires flash attention) | None |
| --auto-convert-dataset | Auto-detect and convert dataset format | False |
| --max-samples | Limit dataset size for testing | None |
Chat Template Auto-Selection: SFT/DPO/ORPO/Reward trainers default to tokenizer (model’s built-in template). Use --chat-template none for plain text training.
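For example, a quick smoke test that trains on a plain-text column with a shorter sequence length and a capped dataset (column name and limits are illustrative):
# sketch: plain-text training on a capped dataset; adjust column name and limits
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name smoke-test \
  --text-column text \
  --block-size 1024 \
  --chat-template none \
  --max-samples 1000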

Training Examples

SFT with LoRA

aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./conversations.jsonl \
  --project-name llama-sft \
  --trainer sft \
  --peft \
  --lora-r 16 \
  --lora-alpha 32 \
  --epochs 3 \
  --batch-size 4

DPO Training

For DPO, you must specify the column names for prompt, chosen, and rejected responses:
aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./preferences.jsonl \
  --project-name llama-dpo \
  --trainer dpo \
  --prompt-text-column prompt \
  --text-column chosen \
  --rejected-text-column rejected \
  --dpo-beta 0.1 \
  --peft \
  --lora-r 16
DPO and ORPO require --prompt-text-column and --rejected-text-column to be specified.

ORPO Training

ORPO combines SFT and preference optimization:
aitraining llm --train \
  --model google/gemma-2-2b \
  --data-path ./preferences.jsonl \
  --project-name gemma-orpo \
  --trainer orpo \
  --prompt-text-column prompt \
  --text-column chosen \
  --rejected-text-column rejected \
  --peft

Knowledge Distillation

Train a smaller model to mimic a larger one:
aitraining llm --train \
  --model google/gemma-3-270m \
  --teacher-model google/gemma-2-2b \
  --data-path ./prompts.jsonl \
  --project-name distilled-model \
  --use-distillation \
  --distill-temperature 3.0
Distillation defaults: --distill-temperature 3.0, --distill-alpha 0.7, --distill-max-teacher-length 512
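To override these defaults explicitly (the values below are illustrative, not tuned recommendations):
# sketch: explicit distillation overrides; values are illustrative
aitraining llm --train \
  --model google/gemma-3-270m \
  --teacher-model google/gemma-2-2b \
  --data-path ./prompts.jsonl \
  --project-name distilled-model \
  --use-distillation \
  --distill-temperature 2.0 \
  --distill-alpha 0.5 \
  --distill-max-teacher-length 1024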

Logging & Monitoring

Weights & Biases (Default)

W&B logging with LEET visualizer is enabled by default. The LEET visualizer shows real-time training metrics directly in your terminal.
# W&B is on by default - just run training
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model
To disable W&B or the visualizer:
# Disable W&B logging entirely
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --log none

# Keep W&B but disable terminal visualizer
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --no-wandb-visualizer

TensorBoard

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --log tensorboard

Push to Hugging Face Hub

Upload your trained model:
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --push-to-hub \
  --username your-username \
  --token $HF_TOKEN
The repository is created as private by default. The repo will be named {username}/{project-name}.

Advanced Options

Hyperparameter Sweeps

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-backend optuna \
  --sweep-n-trials 10

Enhanced Evaluation

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --use-enhanced-eval \
  --eval-metrics "perplexity,bleu"

View All Parameters

See all parameters for a specific trainer:
aitraining llm --trainer sft --help
aitraining llm --trainer dpo --help

Next Steps