
Hyperparameters

Hyperparameters control how your model learns. Think of them as the settings you choose for a training run.

The Essential Three

Learning Rate

How big the steps are when updating the model.
  • Too high (0.01): Model jumps around, never converges
  • Too low (0.00001): Takes forever to train
  • Just right (0.00002): Steady improvement
Common values:
  • Fine-tuning: 2e-5 to 5e-5
  • Training from scratch: 1e-4 to 1e-3
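To make that concrete, the learning rate just scales each weight update. A toy, plain-SGD sketch in Python (the numbers are made up for illustration):
# One SGD-style update: the learning rate scales the step size
lr = 2e-5
weight, grad = 0.5, 1.3          # made-up values for illustration
weight = weight - lr * grad      # larger lr → bigger jump per update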

Batch Size

How many examples to process before updating weights.
  • Small (8): More updates, less stable, needs less memory
  • Large (128): Fewer updates, more stable, needs more memory
Common values:
  • Limited GPU: 8-16
  • Good GPU: 32-64
  • Multiple GPUs: 128+
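In most frameworks the batch size is set on the data loader. A minimal sketch assuming PyTorch (the toy dataset is made up):
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 256 examples, 10 features each (for illustration only)
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # 32 examples per weight update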

Epochs

How many times to go through your entire dataset.
  • Too few (1): Underfitting, model hasn’t learned enough
  • Too many (100): Overfitting, memorized training data
  • Just right (3-10): Good balance
Watch validation loss - when it stops improving or gets worse, stop.
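A common way to pick the epoch count in practice is early stopping on validation loss. A rough sketch - train_one_epoch and evaluate are hypothetical stand-ins for your own training and evaluation code:
best_val_loss = float("inf")
patience, bad_epochs = 2, 0
for epoch in range(10):                 # upper bound on epochs
    train_one_epoch(model)              # hypothetical: one pass over the training data
    val_loss = evaluate(model)          # hypothetical: loss on the validation set
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # stopped improving for 2 epochs in a row
            break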

Secondary Settings

Warmup Steps

Gradually increase learning rate at the start.
Steps 0-500: Learning rate goes from 0 → 2e-5
Steps 500+: Learning rate stays at 2e-5
Prevents early instability.
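The same schedule in a few lines of Python (a linear-warmup sketch using the example values above):
target_lr, warmup_steps = 2e-5, 500

def lr_at(step):
    if step < warmup_steps:
        return target_lr * step / warmup_steps   # steps 0-500: ramp from 0 to 2e-5
    return target_lr                             # steps 500+: hold at 2e-5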

Weight Decay

Regularization that prevents weights from getting too large.
  • Default: 0.0 (no regularization - the usual choice for LLM fine-tuning)
  • Strong regularization: 0.1
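Weight decay is normally passed to the optimizer. A minimal sketch assuming PyTorch's AdamW (the tiny model is just a stand-in):
import torch

model = torch.nn.Linear(16, 2)    # stand-in model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.0)  # use 0.1 for strong regularization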

Gradient Accumulation

Simulate larger batches on limited hardware.
Effective batch size = batch_size × gradient_accumulation_steps
Example: batch_size=4, accumulation=8 → acts like batch_size=32
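The pattern in code - a self-contained PyTorch sketch with a toy model, where 8 micro-batches of 4 examples act like one batch of 32:
import torch

model = torch.nn.Linear(4, 1)                        # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
accumulation_steps = 8

optimizer.zero_grad()
for i in range(accumulation_steps):
    x, y = torch.randn(4, 4), torch.randn(4, 1)      # micro-batch of size 4
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()           # scale so the accumulated gradient averages correctly
optimizer.step()                                     # one weight update for the effective batch of 32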

Task-Specific Defaults

Text Classification

learning_rate = 5e-5
batch_size = 8
epochs = 3
warmup_ratio = 0.1

Language Model Fine-tuning

learning_rate = 3e-5  # AITraining default
batch_size = 2
epochs = 1
warmup_ratio = 0.1
weight_decay = 0.0
gradient_accumulation = 4

Image Classification

learning_rate = 1e-4
batch_size = 32
epochs = 10
warmup_ratio = 0.05

When to Adjust

Learning rate too high?
  • Loss explodes or becomes NaN
  • Accuracy jumps around wildly
  • Never converges
Learning rate too low?
  • Loss barely decreases
  • Training takes forever
  • Stuck at poor performance
Batch size issues?
  • Out of memory → reduce batch size
  • Training unstable → increase batch size
  • Use gradient accumulation if memory limited

Quick Start Values

Not sure where to start? Try these:
# Safe defaults for most tasks
learning_rate = 2e-5
batch_size = 16
epochs = 3
warmup_ratio = 0.1
weight_decay = 0.0
Then adjust based on what you see.

Evaluation Settings

Control when and how your model is evaluated during training:
Parameter             | Description                                                 | Default
--------------------- | ----------------------------------------------------------- | ----------
eval_strategy         | When to evaluate (epoch, steps, no)                         | epoch
eval_batch_size       | Batch size for evaluation                                   | 8
use_enhanced_eval     | Enable advanced metrics (BLEU, ROUGE, etc.)                 | False
eval_metrics          | Metrics to compute (comma-separated)                        | perplexity
eval_save_predictions | Save model predictions                                      | False
eval_benchmark        | Run a standard benchmark (mmlu, hellaswag, arc, truthfulqa) | None
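For reference, the same parameters gathered into a plain config dict. This is only an illustration of the names and defaults - check how your AITraining setup actually expects them to be passed:
eval_config = {
    "eval_strategy": "epoch",         # evaluate at the end of each epoch
    "eval_batch_size": 8,
    "use_enhanced_eval": False,       # set True for BLEU, ROUGE, etc.
    "eval_metrics": "perplexity",     # comma-separated metric names
    "eval_save_predictions": False,
    "eval_benchmark": None,           # or "mmlu", "hellaswag", "arc", "truthfulqa"
}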

Pro Tips

  1. Start with defaults - Don’t overthink initially
  2. Change one at a time - Easier to see what helps
  3. Log everything - Track what works for your data
  4. Use validation set - Monitor overfitting

Next Steps