
Hyperparameters

Hyperparameters control how your model learns. Think of them as the settings you choose for a training run.

The Essential Three

Learning Rate

How big the steps are when updating the model.
  • Too high (0.01): Model jumps around, never converges
  • Too low (0.00001): Takes forever to train
  • Just right (0.00002): Steady improvement
Common values:
  • Fine-tuning: 2e-5 to 5e-5
  • Training from scratch: 1e-4 to 1e-3
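To make that concrete, the learning rate just scales each weight update. A toy, plain-SGD sketch in Python (the numbers are made up for illustration):
# One SGD-style update: the learning rate scales the step size
lr = 2e-5
weight, grad = 0.5, 1.3          # made-up values for illustration
weight = weight - lr * grad      # larger lr → bigger jump per update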

Batch Size

How many examples to process before updating weights.
  • Small (8): More updates, less stable, needs less memory
  • Large (128): Fewer updates, more stable, needs more memory
Common values:
  • Limited GPU: 8-16
  • Good GPU: 32-64
  • Multiple GPUs: 128+
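In most frameworks the batch size is set on the data loader. A minimal sketch assuming PyTorch (the toy dataset is made up):
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 256 examples, 10 features each (for illustration only)
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # 32 examples per weight update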

Epochs

How many times to go through your entire dataset.
  • Too few (1): Underfitting, model hasn’t learned enough
  • Too many (100): Overfitting, memorized training data
  • Just right (3-10): Good balance
Watch validation loss - when it stops improving or gets worse, stop.
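A common way to pick the epoch count in practice is early stopping on validation loss. A rough sketch - train_one_epoch and evaluate are hypothetical stand-ins for your own training and evaluation code:
best_val_loss = float("inf")
patience, bad_epochs = 2, 0
for epoch in range(10):                 # upper bound on epochs
    train_one_epoch(model)              # hypothetical: one pass over the training data
    val_loss = evaluate(model)          # hypothetical: loss on the validation set
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # stopped improving for 2 epochs in a row
            break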

Secondary Settings

Warmup Steps

Gradually increase learning rate at the start.
Steps 0-500: Learning rate goes from 0 → 2e-5
Steps 500+: Learning rate stays at 2e-5
Prevents early instability.
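The same schedule in a few lines of Python (a linear-warmup sketch using the example values above):
target_lr, warmup_steps = 2e-5, 500

def lr_at(step):
    if step < warmup_steps:
        return target_lr * step / warmup_steps   # steps 0-500: ramp from 0 to 2e-5
    return target_lr                             # steps 500+: hold at 2e-5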

Weight Decay

Regularization that prevents weights from getting too large.
  • Default: 0.0 (no regularization - the usual choice for LLM fine-tuning)
  • Strong regularization: 0.1
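Weight decay is normally passed to the optimizer. A minimal sketch assuming PyTorch's AdamW (the tiny model is just a stand-in):
import torch

model = torch.nn.Linear(16, 2)    # stand-in model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.0)  # use 0.1 for strong regularization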

Gradient Accumulation

Simulate larger batches on limited hardware.
Effective batch size = batch_size × gradient_accumulation_steps
Example: batch_size=4, accumulation=8 → acts like batch_size=32
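The pattern in code - a self-contained PyTorch sketch with a toy model, where 8 micro-batches of 4 examples act like one batch of 32:
import torch

model = torch.nn.Linear(4, 1)                        # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
accumulation_steps = 8

optimizer.zero_grad()
for i in range(accumulation_steps):
    x, y = torch.randn(4, 4), torch.randn(4, 1)      # micro-batch of size 4
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()           # scale so the accumulated gradient averages correctly
optimizer.step()                                     # one weight update for the effective batch of 32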

Task-Specific Defaults

Text Classification

learning_rate = 5e-5
batch_size = 8
epochs = 3
warmup_ratio = 0.1

Language Model Fine-tuning

learning_rate = 3e-5  # AITraining default
batch_size = 2
epochs = 1
warmup_ratio = 0.1
weight_decay = 0.0
gradient_accumulation = 4

Image Classification

learning_rate = 1e-4
batch_size = 32
epochs = 10
warmup_ratio = 0.05

When to Adjust

Learning rate too high?
  • Loss explodes or becomes NaN
  • Accuracy jumps around wildly
  • Never converges
Learning rate too low?
  • Loss barely decreases
  • Training takes forever
  • Stuck at poor performance
Batch size issues?
  • Out of memory → reduce batch size
  • Training unstable → increase batch size
  • Use gradient accumulation if memory limited

Quick Start Values

Not sure where to start? Try these:
# Safe defaults for most tasks
learning_rate = 2e-5
batch_size = 16
epochs = 3
warmup_ratio = 0.1
weight_decay = 0.0
Then adjust based on what you see.

Evaluation Settings

Control when and how your model is evaluated during training:
Parameter             | Description                                                 | Default
--------------------- | ----------------------------------------------------------- | ----------
eval_strategy         | When to evaluate (epoch, steps, no)                         | epoch
eval_batch_size       | Batch size for evaluation                                   | 8
use_enhanced_eval     | Enable advanced metrics (BLEU, ROUGE, etc.)                 | False
eval_metrics          | Metrics to compute (comma-separated)                        | perplexity
eval_save_predictions | Save model predictions                                      | False
eval_benchmark        | Run a standard benchmark (mmlu, hellaswag, arc, truthfulqa) | None
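For reference, the same parameters gathered into a plain config dict. This is only an illustration of the names and defaults - check how your AITraining setup actually expects them to be passed:
eval_config = {
    "eval_strategy": "epoch",         # evaluate at the end of each epoch
    "eval_batch_size": 8,
    "use_enhanced_eval": False,       # set True for BLEU, ROUGE, etc.
    "eval_metrics": "perplexity",     # comma-separated metric names
    "eval_save_predictions": False,
    "eval_benchmark": None,           # or "mmlu", "hellaswag", "arc", "truthfulqa"
}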

Pro Tips

  1. Start with defaults - Don’t overthink initially
  2. Change one at a time - Easier to see what helps
  3. Log everything - Track what works for your data
  4. Use validation set - Monitor overfitting

Next Steps