
LLM Training

The aitraining llm command trains large language models with support for multiple trainers and techniques.

Quick Start

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --trainer sft

Available Trainers

| Trainer | Description |
|---------|-------------|
| default / sft / generic | Supervised fine-tuning |
| dpo | Direct Preference Optimization |
| orpo | Odds Ratio Preference Optimization |
| ppo | Proximal Policy Optimization |
| reward | Reward model training |
| distillation | Knowledge distillation |
generic is an alias for default. All three (default, sft, generic) produce the same behavior.
PPO Trainer Requirements: PPO requires either --rl-reward-model-path (path to a trained reward model) or --model-ref (reference model for KL divergence). See PPO Training for full documentation.
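For example, a minimal PPO run using a previously trained reward model might look like the following (a sketch: the prompt file and reward model path are placeholders, and --rl-reward-model-path could be swapped for --model-ref as described above):
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./prompts.jsonl \
  --project-name gemma-ppo \
  --trainer ppo \
  --rl-reward-model-path ./models/my-reward-model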

Parameter Groups

Parameters are organized into logical groups:

Basic Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| --model | Base model to fine-tune | google/gemma-3-270m |
| --data-path | Path to training data | data |
| --project-name | Output directory name | project-name |
| --train-split | Training data split | train |
| --valid-split | Validation data split | None |
Always specify these parameters: While --model, --data-path, and --project-name have defaults, you should always set them explicitly for your use case. The --project-name parameter sets the output folder; use a path like --project-name ./models/my-experiment to control where the trained model is saved.
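For example, to keep outputs under ./models/ and evaluate on a held-out split (a sketch; the split names must match your dataset):
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/my-experiment \
  --train-split train \
  --valid-split validation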

Training Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| --trainer | Training method | default |
| --epochs | Number of training epochs | 1 |
| --batch-size | Training batch size | 2 |
| --lr | Learning rate | 3e-5 |
| --mixed-precision | Mixed precision mode (fp16, bf16, or None) | None |
| --gradient-accumulation | Accumulation steps | 4 |
| --warmup-ratio | Warmup ratio | 0.1 |
| --optimizer | Optimizer | adamw_torch |
| --scheduler | LR scheduler | linear |
| --weight-decay | Weight decay | 0.0 |
| --max-grad-norm | Max gradient norm | 1.0 |
| --seed | Random seed | 42 |

Checkpointing & Evaluation

| Parameter | Description | Default |
|-----------|-------------|---------|
| --eval-strategy | When to evaluate (epoch, steps, no) | epoch |
| --save-strategy | When to save (epoch, steps, no) | epoch |
| --save-steps | Save every N steps (if save-strategy=steps) | 500 |
| --save-total-limit | Max checkpoints to keep | 1 |
| --logging-steps | Log every N steps (-1 for auto) | -1 |
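For example, to save step-based checkpoints while keeping only the most recent few (a sketch using only the parameters above; tune the step counts to your dataset size):
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --save-strategy steps \
  --save-steps 250 \
  --save-total-limit 3 \
  --logging-steps 50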

Performance & Memory

| Parameter | Description | Default |
|-----------|-------------|---------|
| --auto-find-batch-size | Automatically find optimal batch size | False |
| --disable-gradient-checkpointing | Disable gradient checkpointing (a memory optimization) | False |
| --unsloth | Use Unsloth for faster training (SFT only; llama/mistral/gemma/qwen2) | False |
| --use-sharegpt-mapping | Use Unsloth's ShareGPT mapping | False |
| --use-flash-attention-2 | Use Flash Attention 2 for faster training | False |
| --attn-implementation | Attention implementation (eager, sdpa, flash_attention_2) | None |
Unsloth Requirements: Unsloth only works with sft/default trainers and specific model architectures (llama, mistral, gemma, qwen2). See Unsloth Integration for details.
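For example, an Unsloth-accelerated SFT run on a supported architecture might look like this (a sketch; it pairs --unsloth with --peft, which is the typical Unsloth setup):
aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./data \
  --project-name llama-unsloth \
  --trainer sft \
  --unsloth \
  --peft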

Backend & Distribution

| Parameter | Description | Default |
|-----------|-------------|---------|
| --backend | Where to run (local, spaces) | local |
| --distributed-backend | Distribution backend (ddp, deepspeed) | None |
Multi-GPU Behavior: When multiple GPUs are available and --distributed-backend is not set, DDP is used automatically. Set --distributed-backend deepspeed for DeepSpeed ZeRO-3 optimization. Training is launched via Accelerate.
DeepSpeed Checkpointing: When using DeepSpeed, model saving uses accelerator.get_state_dict() and unwraps the model. PEFT adapter saving is handled differently under DeepSpeed.
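For example, a multi-GPU DeepSpeed run might look like this (a sketch; CUDA_VISIBLE_DEVICES is the standard PyTorch/Accelerate way to select GPUs and is not a flag of this tool):
# Run on four GPUs with DeepSpeed ZeRO-3
CUDA_VISIBLE_DEVICES=0,1,2,3 aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./data \
  --project-name llama-deepspeed \
  --distributed-backend deepspeed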

PEFT/LoRA Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| --peft | Enable LoRA training | False |
| --lora-r | LoRA rank | 16 |
| --lora-alpha | LoRA alpha | 32 |
| --lora-dropout | LoRA dropout | 0.05 |
| --target-modules | Modules to target | all-linear |
| --quantization | int4/int8 quantization | None |
| --merge-adapter | Merge LoRA after training | True |
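For example, LoRA on top of an int4-quantized base model (a QLoRA-style setup; a sketch using only the parameters above):
aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./data \
  --project-name llama-qlora \
  --peft \
  --quantization int4 \
  --lora-r 16 \
  --lora-alpha 32 \
  --target-modules all-linear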

Data Processing

| Parameter | Description | Default |
|-----------|-------------|---------|
| --text-column | Text column name | text |
| --block-size | Max sequence length | -1 (model default) |
| --model-max-length | Maximum model input length | Auto-detect from model |
| --padding | Padding side (left or right) | right |
| --add-eos-token | Append EOS token | True |
| --chat-template | Chat template to use | Auto by trainer |
| --packing | Enable sequence packing (requires flash attention) | None |
| --auto-convert-dataset | Auto-detect and convert dataset format | False |
| --max-samples | Limit dataset size for testing | None |
| --save-processed-data | Save processed data: auto, local, hub, both, none | auto |
Chat Template Auto-Selection: SFT/DPO/ORPO/Reward trainers default to tokenizer (model’s built-in template). Use --chat-template none for plain text training.
Processed Data Saving: By default (auto), processed data is saved locally to {project}/data_processed/. If the source dataset was from the Hub, it’s also pushed as a private dataset. Original columns are renamed to _original_* to prevent conflicts.
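For example, a quick smoke test that packs sequences to a fixed block size (a sketch; as noted above, --packing requires flash attention):
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name packing-test \
  --block-size 2048 \
  --packing \
  --use-flash-attention-2 \
  --max-samples 1000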

Training Examples

SFT with LoRA

aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./conversations.jsonl \
  --project-name llama-sft \
  --trainer sft \
  --peft \
  --lora-r 16 \
  --lora-alpha 32 \
  --epochs 3 \
  --batch-size 4
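The expected layout of ./conversations.jsonl depends on your chat template; as a minimal illustration (an assumption, not the only supported format), each line can carry a single field matching the default --text-column:
cat > conversations.jsonl << 'EOF'
{"text": "User: How do I enable LoRA?\nAssistant: Pass --peft, optionally with --lora-r and --lora-alpha."}
{"text": "User: Where is the trained model saved?\nAssistant: Under the directory given by --project-name."}
EOF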

DPO Training

For DPO, you must specify the column names for prompt, chosen, and rejected responses:
aitraining llm --train \
  --model meta-llama/Llama-3.2-1B \
  --data-path ./preferences.jsonl \
  --project-name llama-dpo \
  --trainer dpo \
  --prompt-text-column prompt \
  --text-column chosen \
  --rejected-text-column rejected \
  --dpo-beta 0.1 \
  --peft \
  --lora-r 16
DPO and ORPO require --prompt-text-column and --rejected-text-column to be specified.
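With the column names used above, each line of ./preferences.jsonl carries a prompt, a chosen response, and a rejected response (an illustrative sketch; the field names just need to match the --*-text-column flags):
cat > preferences.jsonl << 'EOF'
{"prompt": "What is the capital of France?", "chosen": "The capital of France is Paris.", "rejected": "I believe it is Lyon."}
{"prompt": "Explain gravity in one sentence.", "chosen": "Gravity is the attraction between objects with mass.", "rejected": "Gravity is caused by the Earth spinning."}
EOF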

ORPO Training

ORPO combines SFT and preference optimization:
aitraining llm --train \
  --model google/gemma-2-2b \
  --data-path ./preferences.jsonl \
  --project-name gemma-orpo \
  --trainer orpo \
  --prompt-text-column prompt \
  --text-column chosen \
  --rejected-text-column rejected \
  --peft

Knowledge Distillation

Train a smaller model to mimic a larger one:
aitraining llm --train \
  --model google/gemma-3-270m \
  --teacher-model google/gemma-2-2b \
  --data-path ./prompts.jsonl \
  --project-name distilled-model \
  --use-distillation \
  --distill-temperature 3.0
Distillation defaults: --distill-temperature 3.0, --distill-alpha 0.7, --distill-max-teacher-length 512

Logging & Monitoring

Weights & Biases (Default)

W&B logging with the LEET visualizer is enabled by default. The LEET visualizer shows real-time training metrics directly in your terminal.
# W&B is on by default - just run training
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model
To disable W&B or the visualizer:
# Disable W&B logging entirely
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --log none

# Keep W&B but disable terminal visualizer
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --no-wandb-visualizer

TensorBoard

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --log tensorboard
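You can then point TensorBoard at the output directory (a sketch; the exact log location is an assumption, so adjust --logdir to wherever the event files are written):
tensorboard --logdir ./my-model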

Push to Hugging Face Hub

Upload your trained model:
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --push-to-hub \
  --username your-username \
  --token $HF_TOKEN
The repository is created as private by default. Unless you override it with --repo-id, the repo is named {username}/{project-name}.

Custom Repository Name or Organization

Use --repo-id to push to a specific repository, useful for:
  • Pushing to an organization instead of your personal account
  • Using a different repo name than your local project-name
# Push to an organization
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./local-training-output \
  --push-to-hub \
  --repo-id my-organization/my-custom-model-name \
  --token $HF_TOKEN

# Push to personal account with different name
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./experiment-v3 \
  --push-to-hub \
  --repo-id your-username/production-model \
  --token $HF_TOKEN
| Parameter | Description | Default |
|-----------|-------------|---------|
| --push-to-hub | Enable pushing to Hub | False |
| --username | HF username (for default repo naming) | None |
| --token | HF API token | None |
| --repo-id | Full repo ID (e.g., org/model-name) | {username}/{project-name} |
When using --repo-id, you don’t need --username since the repo ID already specifies the destination. However, you still need --token for authentication.

Advanced Options

Hyperparameter Sweeps

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-backend optuna \
  --sweep-n-trials 10

Enhanced Evaluation

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name my-model \
  --use-enhanced-eval \
  --eval-metrics "perplexity,bleu"

View All Parameters

See all parameters for a specific trainer:
aitraining llm --trainer sft --help
aitraining llm --trainer dpo --help

Next Steps