Hyperparameter Sweeps

Automatically search for the best hyperparameters.

Quick Start

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data.jsonl \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-backend optuna \
  --sweep-n-trials 20

Python API

from autotrain.trainers.clm.params import LLMTrainingParams

params = LLMTrainingParams(
    model="google/gemma-3-270m",
    data_path="./data.jsonl",
    project_name="sweep-experiment",

    # Enable sweep
    use_sweep=True,
    sweep_backend="optuna",
    sweep_n_trials=20,
    sweep_metric="eval_loss",
    sweep_direction="minimize",

    # Base parameters (sweep will vary some)
    trainer="sft",
    epochs=3,
    batch_size=4,
    lr=2e-5,
)

Parameters

| Parameter | Description | Default |
|---|---|---|
| use_sweep | Enable sweeping | False |
| sweep_backend | Backend (optuna, grid, random) | optuna |
| sweep_n_trials | Number of trials | 10 |
| sweep_metric | Metric to optimize | eval_loss |
| sweep_direction | minimize or maximize | minimize |
| sweep_params | Custom search space (JSON string) | None (auto) |
| post_trial_script | Shell script to run after each trial | None |
| wandb_sweep | Enable W&B native sweep dashboard | False |
| wandb_sweep_project | W&B project for sweep | project_name |
| wandb_sweep_entity | W&B entity (team/username) | None |
| wandb_sweep_id | Existing sweep ID to continue | None |
| wandb_run_id | W&B run ID to resume (for external sweep agents) | None |

Search Spaces

Default Search Space

By default, sweeps search over:
  • Learning rate: 1e-5 to 1e-3 (log uniform)
  • Batch size: 2, 4, 8, 16 (categorical)
  • Warmup ratio: 0.0 to 0.2 (uniform)
LoRA rank is NOT included in the default sweep. Add it manually via sweep_params if needed.
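
If you want to reproduce the default space explicitly (for example, to extend it with LoRA rank), a sketch using the sweep_params dict format described in the next section might look like this; the ranges below simply restate the defaults listed above:
import json

# Restates the default search space explicitly and adds lora_r on top of it.
sweep_params = json.dumps({
    "lr": {"type": "loguniform", "low": 1e-5, "high": 1e-3},
    "batch_size": {"type": "categorical", "values": [2, 4, 8, 16]},
    "warmup_ratio": {"type": "uniform", "low": 0.0, "high": 0.2},
    "lora_r": {"type": "categorical", "values": [8, 16, 32, 64]},
})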

Custom Search Space

The sweep_params parameter expects a JSON string. Both list and dict formats are supported:
import json

# Dict format (recommended) - explicit type specification
sweep_params = json.dumps({
    "lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
    "batch_size": {"type": "categorical", "values": [2, 4, 8]},
    "lora_r": {"type": "categorical", "values": [8, 16, 32, 64]},
    "warmup_ratio": {"type": "uniform", "low": 0.0, "high": 0.2},
    "epochs": {"type": "int", "low": 1, "high": 5},
})

# List format (shorthand) - for categorical values only
sweep_params = json.dumps({
    "batch_size": [2, 4, 8],
    "lora_r": [8, 16, 32, 64],
})

params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_params=sweep_params,  # JSON string
)
Supported dict types:
| Type | Description | Parameters |
|---|---|---|
| categorical | Choose from list | values: list of options |
| loguniform | Log-uniform distribution | low, high |
| uniform | Uniform distribution | low, high |
| int | Integer range | low, high |

Sweep Backends

Optuna

Efficient Bayesian optimization (the default backend):
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_backend="optuna",
)

Grid

Exhaustive search over all combinations:
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_backend="grid",
)
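
Because grid search tries every combination, it pairs naturally with the categorical list shorthand for sweep_params. Assuming the backend enumerates exactly the listed values, the effective number of trials is the product of the list lengths:
import json

# 3 batch sizes x 2 LoRA ranks = 6 combinations in total
sweep_params = json.dumps({
    "batch_size": [2, 4, 8],
    "lora_r": [8, 16],
})

params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_backend="grid",
    sweep_params=sweep_params,
)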

Random

Random sampling from search space:
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_backend="random",
)
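
Random search draws independent samples from the search space, so continuous distributions are a natural fit. A sketch using only the sweep parameters documented above; the trial count and ranges are illustrative:
import json

# Each of the 30 trials samples lr and warmup_ratio independently.
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_backend="random",
    sweep_n_trials=30,
    sweep_params=json.dumps({
        "lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
        "warmup_ratio": {"type": "uniform", "low": 0.0, "high": 0.2},
    }),
)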

Metrics

Standard Metrics

| Metric | Description |
|---|---|
| eval_loss | Validation loss |
| train_loss | Training loss |
| accuracy | Classification accuracy |
| perplexity | Language model perplexity |
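
When sweeping on a score rather than a loss, remember to flip sweep_direction. A minimal sketch using the parameters documented above:
# Maximize accuracy instead of minimizing eval_loss.
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_metric="accuracy",
    sweep_direction="maximize",
)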

Enhanced Evaluation Metrics

Enable use_enhanced_eval to access additional metrics:
| Metric | Description |
|---|---|
| perplexity | Language model perplexity (default) |
| bleu | BLEU score for translation/generation |
| rouge | ROUGE score for summarization |
| bertscore | BERTScore for semantic similarity |
| accuracy | Classification accuracy |
| f1 | F1 score |
| exact_match | Exact match accuracy |
| meteor | METEOR score |

Enhanced Evaluation Parameters

| Parameter | Description | Default |
|---|---|---|
| use_enhanced_eval | Enable enhanced metrics | False |
| eval_metrics | Comma-separated metrics | "perplexity" |
| eval_strategy | When to evaluate (epoch, steps, no) | "epoch" |
| eval_batch_size | Batch size for evaluation | 8 |
| eval_dataset_path | Path to eval dataset (if different) | None |
| eval_save_predictions | Save predictions during eval | False |
| eval_benchmark | Run standard benchmark | None |
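
A sketch combining these parameters; the file path and the specific values are illustrative, not defaults:
params = LLMTrainingParams(
    ...
    use_enhanced_eval=True,
    eval_metrics="perplexity,f1",
    eval_strategy="steps",
    eval_batch_size=16,
    eval_dataset_path="./eval.jsonl",   # hypothetical separate eval set
    eval_save_predictions=True,
)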

Standard Benchmarks

Use eval_benchmark to run standard LLM benchmarks:
| Benchmark | Description |
|---|---|
| mmlu | Massive Multitask Language Understanding |
| hellaswag | HellaSwag commonsense reasoning |
| arc | AI2 Reasoning Challenge |
| truthfulqa | TruthfulQA factuality |
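
For example, to score runs against MMLU (assuming eval_benchmark accepts the names listed above):
params = LLMTrainingParams(
    ...
    use_enhanced_eval=True,  # assumed to be enabled alongside eval_benchmark
    eval_benchmark="mmlu",
)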

Custom Metrics Example

params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_metric="bleu",
    use_enhanced_eval=True,
    eval_metrics="bleu,rouge,bertscore",
    eval_batch_size=8,
)

Example: Find Best LR

import json

params = LLMTrainingParams(
    model="google/gemma-3-270m",
    data_path="./data.jsonl",
    project_name="lr-sweep",

    use_sweep=True,
    sweep_n_trials=10,
    sweep_params=json.dumps({
        "lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
    }),

    # Fixed parameters
    trainer="sft",
    epochs=1,
    batch_size=4,
)

Viewing Results

Optuna Dashboard

pip install optuna-dashboard
optuna-dashboard sqlite:///optuna.db

W&B Native Sweep Dashboard

By default, sweeps run locally and only log individual runs to W&B. Enable native W&B sweep integration to get aggregated views, parallel coordinates plots, and parameter importance analysis in a dedicated sweep dashboard.
Local vs W&B Sweeps: Without wandb_sweep=True, each trial logs as a separate W&B run. With wandb_sweep=True, all trials are grouped under a single sweep dashboard with unified visualizations.

Enabling W&B Sweeps

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data.jsonl \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-backend optuna \
  --sweep-n-trials 20 \
  --log wandb \
  --wandb-sweep \
  --wandb-sweep-project my-sweep-project \
  --wandb-sweep-entity my-team
Or in Python:
params = LLMTrainingParams(
    model="google/gemma-3-270m",
    data_path="./data.jsonl",
    project_name="sweep-experiment",

    use_sweep=True,
    sweep_backend="optuna",
    sweep_n_trials=20,

    # Enable W&B native sweep
    log="wandb",
    wandb_sweep=True,
    wandb_sweep_project="my-sweep-project",
    wandb_sweep_entity="my-team",
)

W&B Sweep Parameters

| Parameter | Description | Default |
|---|---|---|
| wandb_sweep | Enable W&B native sweep dashboard | False |
| wandb_sweep_project | W&B project name for sweep | Uses project_name |
| wandb_sweep_entity | W&B entity (team/username) | None (uses default) |
| wandb_sweep_id | Existing sweep ID to continue | None (creates new) |

Continuing an Existing Sweep

To add more trials to an existing sweep instead of creating a new one, pass the sweep ID:
# First run creates sweep (prints "Created W&B sweep: abc123xyz")
aitraining llm --train \
  --use-sweep --sweep-n-trials 10 \
  --wandb-sweep --wandb-sweep-project my-project

# Later, continue the same sweep with more trials
aitraining llm --train \
  --use-sweep --sweep-n-trials 10 \
  --wandb-sweep --wandb-sweep-project my-project \
  --wandb-sweep-id abc123xyz
If you don’t pass wandb_sweep_id, a new sweep is created every time. The sweep ID is printed in the logs when the sweep starts (look for “Created W&B sweep: ”).
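
The Python equivalent uses the same parameters; the sweep ID below is the placeholder from the CLI example:
params = LLMTrainingParams(
    ...
    use_sweep=True,
    sweep_n_trials=10,
    log="wandb",
    wandb_sweep=True,
    wandb_sweep_project="my-project",
    wandb_sweep_id="abc123xyz",  # ID printed when the sweep was first created
)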

Accessing the Sweep Dashboard

  1. Go to wandb.ai and open your project
  2. Click the Sweep icon (broom) in the left panel
  3. Select your sweep from the list

Built-in Visualizations

W&B automatically generates three visualizations:
| Visualization | Description |
|---|---|
| Parallel Coordinates Plot | Shows relationships between hyperparameters and metrics at a glance |
| Scatter Plot | Compares all runs to identify patterns |
| Parameter Importance | Ranks which hyperparameters most affect your metric |
Each panel has an Edit button to customize axes and behavior.
The parallel coordinates plot is especially useful for identifying which hyperparameter combinations lead to the best results. You can drag on any axis to filter runs.

Using with External W&B Sweep Agents

If you’re running AITraining from an external W&B sweep agent (not AITraining’s built-in sweep), use --wandb-run-id to resume the agent’s run instead of creating a duplicate:
# External W&B sweep agent calls AITraining with run ID
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data.jsonl \
  --wandb-run-id $WANDB_RUN_ID \
  --lr $SWEEP_LR \
  --batch-size $SWEEP_BATCH_SIZE
When --wandb-run-id is set, AITraining automatically sets WANDB_RESUME=allow so the trainer resumes the specified run instead of creating a new one.
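
If you drive training from Python instead of the CLI, the same idea applies. A hedged sketch that assumes the agent exports WANDB_RUN_ID plus the swept values used in the CLI example above:
import os

from autotrain.trainers.clm.params import LLMTrainingParams

params = LLMTrainingParams(
    model="google/gemma-3-270m",
    data_path="./data.jsonl",
    project_name="agent-run",  # hypothetical project name
    wandb_run_id=os.environ["WANDB_RUN_ID"],         # resume the agent's run
    lr=float(os.environ["SWEEP_LR"]),                # value sampled by the agent
    batch_size=int(os.environ["SWEEP_BATCH_SIZE"]),
)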

Important Notes

  • Requires W&B login: Run wandb login before using W&B sweeps
  • Sweep ID is logged: Look for “Created W&B sweep: ” in the logs
  • Trials are grouped: Each trial appears as a run with group={sweep_id} for aggregation
  • Optuna still manages search: W&B is for visualization only; Optuna/grid/random handles the actual hyperparameter search

Post-Trial Actions

Execute custom actions after each trial completes, such as committing checkpoints to git, sending notifications, or syncing to remote storage.

CLI Usage

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data.jsonl \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-n-trials 10 \
  --post-trial-script 'echo "Trial $TRIAL_NUMBER completed with metric $TRIAL_METRIC_VALUE"'

Environment Variables

The post-trial script receives these environment variables:
| Variable | Description | Example |
|---|---|---|
| TRIAL_NUMBER | Trial index (0-based) | 0, 1, 2 |
| TRIAL_METRIC_VALUE | Metric value for this trial | 0.234 |
| TRIAL_IS_BEST | Whether this is the best trial so far | true or false |
| TRIAL_OUTPUT_DIR | Output directory for the trial | /path/to/sweep/trial_0 |
| TRIAL_PARAMS | Trial parameters as string | {'lr': 0.0001, 'batch_size': 8} |
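
Because the hook is just a shell command, it can also call a small script, for example a hypothetical notify.py invoked with --post-trial-script 'python notify.py':
# notify.py (hypothetical helper; reads the variables listed above)
import os

trial = os.environ.get("TRIAL_NUMBER")
metric = os.environ.get("TRIAL_METRIC_VALUE")
is_best = os.environ.get("TRIAL_IS_BEST") == "true"
output_dir = os.environ.get("TRIAL_OUTPUT_DIR")

print(f"Trial {trial}: metric={metric}, best={is_best}, output={output_dir}")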

Example: Git Commit Best Models

aitraining llm --train \
  --use-sweep \
  --sweep-n-trials 20 \
  --post-trial-script 'if [ "$TRIAL_IS_BEST" = "true" ]; then git add . && git commit -m "Best model: trial $TRIAL_NUMBER, metric $TRIAL_METRIC_VALUE"; fi'

Example: Slack Notification

aitraining llm --train \
  --use-sweep \
  --sweep-n-trials 10 \
  --post-trial-script 'curl -X POST -H "Content-type: application/json" --data "{\"text\":\"Trial $TRIAL_NUMBER: $TRIAL_METRIC_VALUE\"}" $SLACK_WEBHOOK_URL'

Python API with Callback

For more control, use the Python API with a callback function:
from autotrain.utils import HyperparameterSweep, SweepConfig, TrialInfo

def on_trial_complete(trial_info: TrialInfo):
    """Called after each trial completes."""
    print(f"Trial {trial_info.trial_number} completed")
    print(f"  Params: {trial_info.params}")
    print(f"  Metric: {trial_info.metric_value}")
    print(f"  Is best: {trial_info.is_best}")

    if trial_info.is_best:
        # Do something special for best trials
        save_best_model(trial_info.output_dir)

config = SweepConfig(
    parameters={"lr": (1e-5, 1e-3, "log_uniform")},
    n_trials=10,
    backend="optuna",
    post_trial_callback=on_trial_complete,
)

sweep = HyperparameterSweep(config, train_function)
result = sweep.run()
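
train_function and save_best_model above are user-supplied. The sketch below is purely illustrative and assumes, without this being the documented contract, that the sweep calls your function with the sampled hyperparameters and treats its return value as the metric:
def train_function(trial_params):
    # Assumed contract: receive the sampled hyperparameters for one trial
    # and return the metric value that the sweep optimizes.
    ...  # launch one training run with trial_params and evaluate it
    return 0.0  # placeholder for the measured eval_loss

def save_best_model(output_dir):
    # User-defined helper: e.g. copy or upload the best checkpoint directory.
    print(f"Keeping best checkpoint from {output_dir}")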

TrialInfo Fields

| Field | Type | Description |
|---|---|---|
| trial_number | int | Trial index (0-based) |
| params | Dict[str, Any] | Hyperparameters used in this trial |
| metric_value | float | Metric value achieved |
| output_dir | Optional[str] | Path to trial output directory |
| is_best | bool | Whether this is the best trial so far |
| all_metrics | Optional[Dict[str, float]] | All metrics if available |
Post-trial actions are non-blocking. If a callback or script fails, a warning is logged but the sweep continues. This ensures that sweep progress isn’t lost due to callback errors.

Best Practices

  1. Start small - 10-20 trials for initial exploration
  2. Use early stopping - Stop bad trials early
  3. Fix what you know - Only sweep uncertain params
  4. Use validation data - Always have eval split
  5. Use post-trial scripts - Automate checkpointing or notifications

Next Steps