
Hyperparameter Sweeps

Automatically search for the best hyperparameters.

Quick Start

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data.jsonl \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-backend optuna \
  --sweep-n-trials 20

Python API

from autotrain.trainers.clm.params import LLMTrainingParams

params = LLMTrainingParams(
    model="google/gemma-3-270m",
    data_path="./data.jsonl",
    project_name="sweep-experiment",

    # Enable sweep
    use_sweep=True,
    sweep_backend="optuna",
    sweep_n_trials=20,
    sweep_metric="eval_loss",
    sweep_direction="minimize",

    # Base parameters (sweep will vary some)
    trainer="sft",
    epochs=3,
    batch_size=4,
    lr=2e-5,
)

Parameters

Parameter         Description                          Default
use_sweep         Enable sweeping                      False
sweep_backend     Backend (optuna, grid, random)       optuna
sweep_n_trials    Number of trials                     10
sweep_metric      Metric to optimize                   eval_loss
sweep_direction   minimize or maximize                 minimize
sweep_params      Custom search space (JSON string)    None (auto)

Search Spaces

Default Search Space

By default, sweeps search over:
  • Learning rate: 1e-5 to 1e-3 (log uniform)
  • Batch size: 2, 4, 8, 16 (categorical)
  • Warmup ratio: 0.0 to 0.2 (uniform)
LoRA rank is NOT included in the default sweep. Add it manually via sweep_params if needed.
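
For reference, these defaults expressed in the sweep_params format (described in the next section) would look roughly like this; the exact internal definition is an assumption:

import json

# Sketch of the default space in sweep_params form; the real internal
# definition may differ.
default_space = json.dumps({
    "lr": {"type": "loguniform", "low": 1e-5, "high": 1e-3},
    "batch_size": {"type": "categorical", "values": [2, 4, 8, 16]},
    "warmup_ratio": {"type": "uniform", "low": 0.0, "high": 0.2},
})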

Custom Search Space

The sweep_params parameter expects a JSON string:
import json

sweep_params = json.dumps({
    "lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
    "batch_size": {"type": "categorical", "values": [2, 4, 8]},
    "lora_r": {"type": "categorical", "values": [8, 16, 32, 64]},
    "warmup_ratio": {"type": "uniform", "low": 0.0, "high": 0.2},
})

params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_params=sweep_params,  # JSON string
)

Sweep Backends

Optuna

Efficient Bayesian optimization:
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_backend="optuna",
)

Grid

Exhaustive search over all combinations:
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_backend="grid",
)
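
A note on cost: grid search's trial count is the product of the value counts per parameter, so it grows multiplicatively. A minimal illustration with hypothetical values (not the tool's defaults):

import itertools

# Hypothetical 2-parameter grid: 3 x 4 = 12 combinations to train.
space = {
    "lr": [1e-5, 5e-5, 1e-4],
    "batch_size": [2, 4, 8, 16],
}
grid = [dict(zip(space, combo)) for combo in itertools.product(*space.values())]
print(len(grid))  # 12
print(grid[0])    # {'lr': 1e-05, 'batch_size': 2}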

Random

Random sampling from the search space:
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_backend="random",
)

Metrics

Standard Metrics

Metric        Description
eval_loss     Validation loss
train_loss    Training loss
accuracy      Classification accuracy
perplexity    Language model perplexity
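
Perplexity is the exponential of the average cross-entropy loss, so minimizing eval_loss and minimizing perplexity select the same trials:

import math

eval_loss = 2.1                 # example cross-entropy loss (in nats)
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")      # 8.17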

Enhanced Evaluation Metrics

Enable use_enhanced_eval to access additional metrics:
Metric        Description
perplexity    Language model perplexity (default)
bleu          BLEU score for translation/generation
rouge         ROUGE score for summarization
bertscore     BERTScore for semantic similarity
accuracy      Classification accuracy
f1            F1 score
exact_match   Exact match accuracy
meteor        METEOR score
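
These are standard NLP metrics. As a standalone illustration of what they measure, the sketch below computes BLEU and ROUGE with the Hugging Face evaluate library; that this mirrors the internal implementation is an assumption:

# pip install evaluate rouge_score
import evaluate

predictions = ["the cat sat on the mat"]
references = ["the cat sat on the mat"]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# BLEU accepts one or more references per prediction
print(bleu.compute(predictions=predictions, references=[references])["bleu"])
print(rouge.compute(predictions=predictions, references=references)["rougeL"])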

Enhanced Evaluation Parameters

Parameter               Description                            Default
use_enhanced_eval       Enable enhanced metrics                False
eval_metrics            Comma-separated metrics                "perplexity"
eval_strategy           When to evaluate (epoch, steps, no)    "epoch"
eval_batch_size         Batch size for evaluation              8
eval_dataset_path       Path to eval dataset (if different)    None
eval_save_predictions   Save predictions during eval           False
eval_benchmark          Run standard benchmark                 None
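
A configuration exercising the remaining options might look like this (a sketch; ./eval.jsonl is a hypothetical path):

params = LLMTrainingParams(
    ...
    use_enhanced_eval=True,
    eval_strategy="steps",             # evaluate on a step schedule instead of per epoch
    eval_dataset_path="./eval.jsonl",  # hypothetical separate eval split
    eval_save_predictions=True,        # persist predictions for inspection
)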

Standard Benchmarks

Use eval_benchmark to run standard LLM benchmarks:
Benchmark    Description
mmlu         Massive Multitask Language Understanding
hellaswag    HellaSwag commonsense reasoning
arc          AI2 Reasoning Challenge
truthfulqa   TruthfulQA factuality
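
For example, to run MMLU alongside training evaluation (a sketch; when exactly the benchmark runs is not specified here):

params = LLMTrainingParams(
    ...
    use_enhanced_eval=True,
    eval_benchmark="mmlu",  # one of the benchmarks from the table above
)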

Custom Metrics Example

params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_metric="bleu",
    sweep_direction="maximize",  # BLEU is higher-is-better; the default is minimize
    use_enhanced_eval=True,
    eval_metrics="bleu,rouge,bertscore",
    eval_batch_size=8,
)

Example: Find Best LR

import json

params = LLMTrainingParams(
    model="google/gemma-3-270m",
    data_path="./data.jsonl",
    project_name="lr-sweep",

    use_sweep=True,
    sweep_n_trials=10,
    sweep_params=json.dumps({
        "lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
    }),

    # Fixed parameters
    trainer="sft",
    epochs=1,
    batch_size=4,
)
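
Conceptually, the optuna backend runs a loop like the one below. This is a hand-written sketch, not the tool's internals; train_and_eval is a toy stand-in for one full training run:

import optuna

def train_and_eval(lr: float) -> float:
    # Toy stand-in for a training run that returns the sweep metric
    # (here: eval_loss). A real run would train the model with this lr.
    return (lr * 1e4 - 0.3) ** 2

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-6, 1e-3, log=True)
    return train_and_eval(lr)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10)
print(study.best_params)  # e.g. {'lr': 3.1e-05}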

Viewing Results

Optuna Dashboard

pip install optuna-dashboard
optuna-dashboard sqlite:///optuna.db

W&B Dashboard

View sweeps in the W&B web interface.

Best Practices

  1. Start small - 10-20 trials for initial exploration
  2. Use early stopping - Stop bad trials early (see the pruner sketch after this list)
  3. Fix what you know - Only sweep uncertain params
  4. Use validation data - Always have eval split
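
With the optuna backend, early stopping of trials is typically done by pruning. The sketch below illustrates the concept using Optuna's MedianPruner directly, with a toy loss in place of real training; it is not a documented feature of the sweep integration:

import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-6, 1e-3, log=True)
    loss = (lr * 1e4 - 0.3) ** 2 + 1.0  # toy loss surface, best near lr=3e-5
    for epoch in range(3):
        loss *= 0.9                      # toy stand-in for one epoch of training
        trial.report(loss, step=epoch)   # report intermediate eval_loss
        if trial.should_prune():         # abandon clearly-losing trials early
            raise optuna.TrialPruned()
    return loss

study = optuna.create_study(
    direction="minimize",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=3),
)
study.optimize(objective, n_trials=10)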

Next Steps