Hyperparameter Sweeps
Automatically search for the best hyperparameters.
Quick Start
aitraining llm --train \
--model google/gemma-3-270m \
--data-path ./data.jsonl \
--project-name sweep-experiment \
--use-sweep \
--sweep-backend optuna \
--sweep-n-trials 20
Python API
from autotrain.trainers.clm.params import LLMTrainingParams
params = LLMTrainingParams(
model="google/gemma-3-270m",
data_path="./data.jsonl",
project_name="sweep-experiment",
# Enable sweep
use_sweep=True,
sweep_backend="optuna",
sweep_n_trials=20,
sweep_metric="eval_loss",
sweep_direction="minimize",
# Base parameters (values for parameters in the search space vary per trial)
trainer="sft",
epochs=3,
batch_size=4,
lr=2e-5,
)
Parameters
| Parameter | Description | Default |
|---|---|---|
| use_sweep | Enable sweeping | False |
| sweep_backend | Backend (optuna, grid, random) | optuna |
| sweep_n_trials | Number of trials | 10 |
| sweep_metric | Metric to optimize | eval_loss |
| sweep_direction | minimize or maximize | minimize |
| sweep_params | Custom search space (JSON string) | None (auto) |
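To make the roles of sweep_metric and sweep_direction concrete, here is a minimal sketch of how a best trial could be picked from finished trials. The best_trial helper and the trial dictionaries (including their loss values) are hypothetical and made up for illustration; the real backends track trials internally.

```python
def best_trial(trials, metric="eval_loss", direction="minimize"):
    """Return the trial with the best value of `metric` under `direction`."""
    if direction not in ("minimize", "maximize"):
        raise ValueError(f"direction must be 'minimize' or 'maximize', got {direction!r}")
    # Ignore trials that never reported the metric (e.g. crashed runs).
    scored = [t for t in trials if metric in t["metrics"]]
    if not scored:
        raise ValueError(f"no trial reported {metric!r}")
    pick = min if direction == "minimize" else max
    return pick(scored, key=lambda t: t["metrics"][metric])


# Hypothetical finished trials; the numbers are illustrative only.
trials = [
    {"params": {"lr": 1e-4}, "metrics": {"eval_loss": 1.92}},
    {"params": {"lr": 3e-5}, "metrics": {"eval_loss": 1.85}},
    {"params": {"lr": 5e-4}, "metrics": {"eval_loss": 2.40}},
    {"params": {"lr": 1e-3}, "metrics": {}},  # crashed before evaluation
]
print(best_trial(trials)["params"])  # {'lr': 3e-05}
```

With the default direction ("minimize"), the trial with the lowest eval_loss wins; passing direction="maximize" would instead select the highest value, which is what a score such as accuracy needs.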
Search Spaces
Default Search Space
By default, sweeps search over:
- Learning rate: 1e-5 to 1e-3 (log uniform)
- Batch size: 2, 4, 8, 16 (categorical)
- Warmup ratio: 0.0 to 0.2 (uniform)
LoRA rank (lora_r) is NOT part of the default search space. Add it manually via sweep_params if you want to tune it.
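The default space can be pictured as a small dictionary plus a sampler. This is a sketch, assuming a tuple-based layout (DEFAULT_SPACE and sample are hypothetical names, not part of the library). It shows why the learning rate is drawn log-uniformly: sampling uniformly in log space makes [1e-5, 1e-4] and [1e-4, 1e-3] equally likely, whereas a plain uniform draw over [1e-5, 1e-3] would land above 1e-4 about 91% of the time.

```python
import math
import random

# Assumed layout: name -> (distribution, *arguments). Ranges mirror the defaults above.
DEFAULT_SPACE = {
    "lr": ("loguniform", 1e-5, 1e-3),
    "batch_size": ("categorical", [2, 4, 8, 16]),
    "warmup_ratio": ("uniform", 0.0, 0.2),
}


def sample(space, rng):
    """Draw one configuration from `space` using the random.Random instance `rng`."""
    config = {}
    for name, (kind, *args) in space.items():
        if kind == "loguniform":
            low, high = args
            # Uniform in log space: every decade between low and high is equally likely.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "uniform":
            low, high = args
            config[name] = rng.uniform(low, high)
        elif kind == "categorical":
            (values,) = args
            config[name] = rng.choice(values)
        else:
            raise ValueError(f"{name}: unknown distribution {kind!r}")
    return config


rng = random.Random(0)
for _ in range(3):
    print(sample(DEFAULT_SPACE, rng))
```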
Custom Search Space
The sweep_params parameter expects a JSON string that maps each parameter name to its distribution. Build it with json.dumps:
import json
sweep_params = json.dumps({
"lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
"batch_size": {"type": "categorical", "values": [2, 4, 8]},
"lora_r": {"type": "categorical", "values": [8, 16, 32, 64]},
"warmup_ratio": {"type": "uniform", "low": 0.0, "high": 0.2},
})
params = LLMTrainingParams(
# ... other parameters (model, data_path, project_name, etc.)
use_sweep=True,
sweep_params=sweep_params, # JSON string
)
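Because sweep_params is an opaque string, a mistake such as low greater than high may only surface once the sweep starts. The hypothetical check_sweep_params helper below parses the string and checks each entry against the three distribution types used above; the rules it enforces are assumptions drawn from this example, not the library's own validation.

```python
import json

# Keys each distribution type needs, following the format shown above (an assumption).
REQUIRED_KEYS = {
    "loguniform": {"low", "high"},
    "uniform": {"low", "high"},
    "categorical": {"values"},
}


def check_sweep_params(sweep_params: str) -> dict:
    """Parse a sweep_params JSON string and sanity-check every entry."""
    space = json.loads(sweep_params)
    if not isinstance(space, dict):
        raise ValueError("sweep_params must be a JSON object")
    for name, spec in space.items():
        if not isinstance(spec, dict):
            raise ValueError(f"{name}: expected an object, got {spec!r}")
        kind = spec.get("type")
        if kind not in REQUIRED_KEYS:
            raise ValueError(f"{name}: unknown type {kind!r}")
        missing = REQUIRED_KEYS[kind] - set(spec)
        if missing:
            raise ValueError(f"{name}: missing keys {sorted(missing)}")
        if kind == "categorical":
            if not spec["values"]:
                raise ValueError(f"{name}: values must not be empty")
        elif spec["low"] >= spec["high"]:
            raise ValueError(f"{name}: low must be smaller than high")
        elif kind == "loguniform" and spec["low"] <= 0:
            raise ValueError(f"{name}: loguniform needs low > 0")
    return space


sweep_params = json.dumps({
    "lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
    "batch_size": {"type": "categorical", "values": [2, 4, 8]},
})
print(sorted(check_sweep_params(sweep_params)))  # ['batch_size', 'lr']
```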
Sweep Backends
Optuna
Bayesian optimization, which uses the results of finished trials to steer later trials toward promising values:
params = LLMTrainingParams(
# ... other parameters (model, data_path, project_name, etc.)
use_sweep=True,
sweep_backend="optuna",
)
Grid Search
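Exhaustive search over every combination of values. The number of combinations is the product of the number of values per parameter, so keep the space small: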
params = LLMTrainingParams(
# ... other parameters (model, data_path, project_name, etc.)
use_sweep=True,
sweep_backend="grid",
)
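The cost of grid search can be computed up front with itertools.product. A minimal sketch, assuming a categorical-only grid (how the grid backend discretizes continuous ranges such as lr is not covered here):

```python
import itertools

# Hypothetical categorical-only grid; continuous ranges would first need discretizing.
grid = {
    "batch_size": [2, 4, 8],
    "lora_r": [8, 16, 32, 64],
}

names = list(grid)
# One dict per combination, in the order itertools.product yields them.
combos = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
print(len(combos))  # 12 = 3 * 4
print(combos[0])    # {'batch_size': 2, 'lora_r': 8}
print(combos[-1])   # {'batch_size': 8, 'lora_r': 64}
```

Adding a third parameter with four values would raise this to 48 trials, which is why grid search suits small, discrete spaces.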
Random Search
Random sampling from the search space:
params = LLMTrainingParams(
# ... other parameters (model, data_path, project_name, etc.)
use_sweep=True,
sweep_backend="random",
)
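Random search itself is a short loop: draw a configuration, evaluate it, keep the best. The sketch below uses a made-up toy_eval_loss function in place of a real training run so it runs instantly; it illustrates the technique, not how the random backend is implemented.

```python
import math
import random


def toy_eval_loss(lr: float) -> float:
    # Hypothetical stand-in for a training run: loss is lowest at lr = 1e-4.
    return (math.log10(lr) + 4) ** 2 + 1.5


def random_search(n_trials: int, low: float, high: float, seed: int = 0):
    """Return (best_lr, best_loss) over n_trials log-uniform draws of lr."""
    rng = random.Random(seed)
    best_lr, best_loss = None, math.inf
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(math.log10(low), math.log10(high))  # log-uniform draw
        loss = toy_eval_loss(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss


best_lr, best_loss = random_search(20, 1e-6, 1e-3)
print(f"best lr={best_lr:.2e}, loss={best_loss:.3f}")
```

Because each trial makes exactly one draw, rerunning with the same seed and more trials replays the same first draws, so the best result can only stay the same or improve.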
Metrics
Standard Metrics
| Metric | Description |
|---|---|
| eval_loss | Validation loss |
| train_loss | Training loss |
| accuracy | Classification accuracy |
| perplexity | Language model perplexity |
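eval_loss and perplexity carry the same ranking information. Assuming eval_loss is the mean per-token cross-entropy in nats (the usual convention for causal language models), perplexity is its exponential, so minimizing either one selects the same trial:

```python
import math


def perplexity(mean_token_nll: float) -> float:
    """Perplexity from a mean per-token negative log-likelihood in nats."""
    return math.exp(mean_token_nll)


for loss in (2.0, 1.5):
    print(loss, round(perplexity(loss), 3))
# 2.0 7.389
# 1.5 4.482
```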
Enhanced Evaluation Metrics
Set use_enhanced_eval=True to compute additional metrics, selected with eval_metrics:
| Metric | Description |
|---|---|
| perplexity | Language model perplexity (default) |
| bleu | BLEU score for translation/generation |
| rouge | ROUGE score for summarization |
| bertscore | BERTScore for semantic similarity |
| accuracy | Classification accuracy |
| f1 | F1 score |
| exact_match | Exact match accuracy |
| meteor | METEOR score |
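Only perplexity in this table is lower-is-better; the others are higher-is-better, so sweep_direction has to match the chosen sweep_metric (for example, sweep_metric="bleu" needs sweep_direction="maximize"). A hypothetical lookup table makes the pairing explicit; METRIC_DIRECTION and direction_for are illustrative names, not library functions:

```python
# Hypothetical mapping from metric name to the sweep_direction it needs.
METRIC_DIRECTION = {
    "eval_loss": "minimize",
    "train_loss": "minimize",
    "perplexity": "minimize",
    "accuracy": "maximize",
    "bleu": "maximize",
    "rouge": "maximize",
    "bertscore": "maximize",
    "f1": "maximize",
    "exact_match": "maximize",
    "meteor": "maximize",
}


def direction_for(metric: str) -> str:
    """Return "minimize" or "maximize" for a known metric name."""
    try:
        return METRIC_DIRECTION[metric]
    except KeyError:
        raise ValueError(f"unknown metric {metric!r}; set sweep_direction explicitly") from None


print(direction_for("bleu"))        # maximize
print(direction_for("perplexity"))  # minimize
```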
Enhanced Evaluation Parameters
| Parameter | Description | Default |
|---|---|---|
| use_enhanced_eval | Enable enhanced metrics | False |
| eval_metrics | Comma-separated metrics | "perplexity" |
| eval_strategy | When to evaluate (epoch, steps, no) | "epoch" |
| eval_batch_size | Batch size for evaluation | 8 |
| eval_dataset_path | Path to eval dataset (if different) | None |
| eval_save_predictions | Save predictions during eval | False |
| eval_benchmark | Run standard benchmark | None |
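eval_metrics is a single comma-separated string. A hypothetical parser that splits it, tolerates spaces, and rejects names outside the enhanced-metrics table could look like this (the strictness is an assumption; the library may treat unknown names differently):

```python
# Metric names from the enhanced-evaluation table above.
KNOWN_METRICS = {"perplexity", "bleu", "rouge", "bertscore",
                 "accuracy", "f1", "exact_match", "meteor"}


def parse_eval_metrics(eval_metrics: str) -> list:
    """Split a comma-separated eval_metrics string into validated metric names."""
    names = [m.strip() for m in eval_metrics.split(",") if m.strip()]
    unknown = [m for m in names if m not in KNOWN_METRICS]
    if unknown:
        raise ValueError(f"unknown eval metrics: {unknown}")
    return names


print(parse_eval_metrics("bleu, rouge,bertscore"))  # ['bleu', 'rouge', 'bertscore']
```

A similar check could confirm, before the sweep starts, that an enhanced sweep_metric such as bleu actually appears in the parsed list.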
Standard Benchmarks
Use eval_benchmark to run standard LLM benchmarks:
| Benchmark | Description |
|---|---|
| mmlu | Massive Multitask Language Understanding |
| hellaswag | HellaSwag commonsense reasoning |
| arc | AI2 Reasoning Challenge |
| truthfulqa | TruthfulQA factuality |
Custom Metrics Example
params = LLMTrainingParams(
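# ... other parameters (model, data_path, project_name, etc.)
use_sweep=True,
sweep_metric="bleu",
sweep_direction="maximize",  # BLEU is higher-is-better; the default is "minimize"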
use_enhanced_eval=True,
eval_metrics="bleu,rouge,bertscore",
eval_batch_size=8,
)
Example: Find Best LR
import json
from autotrain.trainers.clm.params import LLMTrainingParams
params = LLMTrainingParams(
model="google/gemma-3-270m",
data_path="./data.jsonl",
project_name="lr-sweep",
use_sweep=True,
sweep_n_trials=10,
sweep_params=json.dumps({
"lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
}),
# Fixed parameters
trainer="sft",
epochs=1,
batch_size=4,
)
Viewing Results
Optuna Dashboard
pip install optuna-dashboard
optuna-dashboard sqlite:///optuna.db
W&B Dashboard
If your runs log to Weights & Biases, view the sweep's runs in the W&B web interface.
Best Practices
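- Start small: 10-20 trials for initial exploration
- Use early stopping: stop unpromising trials early to save compute
- Fix what you know: only sweep parameters you are uncertain about
- Use validation data: always have an eval split, since the default sweep_metric (eval_loss) is computed on it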
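Early stopping across trials is often done with a median stopping rule: at a given step, a trial is stopped if its metric is worse than the median of earlier trials at that same step. Optuna provides a rule of this kind as MedianPruner; the sketch below is a standalone version for a minimized metric such as eval_loss, with hypothetical loss curves, and is not necessarily what this tool uses.

```python
from statistics import median


def should_prune(step, value, history, min_trials=3):
    """Median stopping rule for a minimized metric.

    Prune if `value` at `step` is worse (higher) than the median of the values
    that earlier trials reported at the same step. `history` is a list of
    {step: value} dicts, one per earlier trial.
    """
    peers = [h[step] for h in history if step in h]
    if len(peers) < min_trials:
        return False  # not enough evidence yet to judge this trial
    return value > median(peers)


# Hypothetical eval_loss curves from three finished trials, keyed by epoch.
history = [
    {1: 2.10, 2: 1.90},
    {1: 2.30, 2: 2.05},
    {1: 2.00, 2: 1.80},
]
print(should_prune(1, 2.50, history))  # True  (median at epoch 1 is 2.10)
print(should_prune(1, 2.05, history))  # False
```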
Next Steps