LoRA & PEFT
Parameter-Efficient Fine-Tuning (PEFT) lets you train large models with far less memory than full fine-tuning.
What is LoRA?
LoRA (Low-Rank Adaptation) adds small trainable matrices to the model while keeping base weights frozen. This dramatically reduces memory usage and training time.
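Conceptually, a LoRA layer leaves the frozen weight W untouched and learns a low-rank update: the effective weight becomes W + (alpha / r) * B @ A, where A and B have rank r. A minimal PyTorch sketch of the idea (illustrative only, not the peft implementation; shapes and initialization are simplified):

import torch

# Frozen base weight plus a trainable rank-r update (a sketch of the LoRA idea).
d_out, d_in, r, alpha = 4096, 4096, 16, 32

W = torch.randn(d_out, d_in)                   # frozen base weight, never updated
A = torch.randn(r, d_in, requires_grad=True)   # trainable low-rank factor, shape (r, d_in)
B = torch.zeros(d_out, r, requires_grad=True)  # trainable, starts at zero so the initial update is zero

def lora_linear(x):
    # Effective weight is W + (alpha / r) * (B @ A); only A and B receive gradients.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)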
Quick Start
aitraining llm --train \
--model meta-llama/Llama-3.2-1B \
--data-path ./data.jsonl \
--project-name lora-model \
--peft \
--lora-r 16 \
--lora-alpha 32
Python API
from autotrain.trainers.clm.params import LLMTrainingParams
from autotrain.project import AutoTrainProject
params = LLMTrainingParams(
model="meta-llama/Llama-3.2-1B",
data_path="./data.jsonl",
project_name="lora-model",
trainer="sft",
# LoRA configuration
peft=True,
lora_r=16,
lora_alpha=32,
lora_dropout=0.05,
target_modules="all-linear", # Default: all-linear
epochs=3,
batch_size=4,
lr=2e-4, # Higher LR works with LoRA
)
project = AutoTrainProject(params=params, backend="local", process=True)
project.create()
Parameters
| Parameter | Description | Default |
|---|---|---|
| peft | Enable LoRA | False |
| lora_r | Rank (size of adapters) | 16 |
| lora_alpha | Scaling factor | 32 |
| lora_dropout | Dropout rate | 0.05 |
| target_modules | Modules to adapt | all-linear |
Rank (lora_r)
Higher rank = more parameters = more capacity:
| Rank | Use Case |
|---|---|
| 8 | Simple tasks, very memory constrained |
| 16 | Standard (recommended) |
| 32-64 | Complex tasks, more memory available |
| 128+ | Near full fine-tuning capacity |
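For a sense of scale, each adapted linear layer adds roughly r * (d_in + d_out) trainable parameters, so adapter size grows linearly with rank. A quick back-of-the-envelope check for a hypothetical 4096x4096 projection:

# Trainable adapter parameters per linear layer, compared with the frozen base weight.
d_in = d_out = 4096                 # hypothetical projection size
full = d_in * d_out                 # parameters in the frozen base weight

for r in (8, 16, 32, 64, 128):
    lora = r * (d_in + d_out)
    print(f"r={r:<4} adapter params: {lora:>10,}  ({100 * lora / full:.2f}% of the base weight)")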
Alpha
The alpha/rank ratio affects learning:
# Standard scaling
lora_r=16
lora_alpha=32 # alpha/r = 2
# More aggressive
lora_r=16
lora_alpha=64 # alpha/r = 4
# Conservative
lora_r=16
lora_alpha=16 # alpha/r = 1
Target Modules
By default, LoRA targets all linear layers (all-linear). You can customize:
# All linear layers (default)
target_modules="all-linear"
# Attention layers only
target_modules="q_proj,k_proj,v_proj,o_proj"
# Include MLP
target_modules="q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"
With Quantization
Combine LoRA with quantization for maximum memory savings:
aitraining llm --train \
--model meta-llama/Llama-3.2-8B \
--data-path ./data.jsonl \
--project-name quantized-lora \
--peft \
--quantization int4
params = LLMTrainingParams(
model="meta-llama/Llama-3.2-8B",
data_path="./data.jsonl",
project_name="quantized-lora",
peft=True,
lora_r=16,
quantization="int4", # or "int8"
)
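This combination is essentially the QLoRA recipe: the base weights sit in 4-bit precision while only the LoRA adapters train. If you want to reproduce the same setup directly with transformers and peft (a sketch under those assumptions, not the exact code path the trainer uses):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit (NF4) quantized weights; requires bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-8B", quantization_config=bnb_config
)

# Prepare the quantized model for training, then attach the LoRA adapters.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"))
model.print_trainable_parameters()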
Memory Comparison
| Model | Full Fine-tune | LoRA | LoRA + 4bit |
|---|---|---|---|
| 1B | 8 GB | 4 GB | 3 GB |
| 7B | 56 GB | 16 GB | 8 GB |
| 13B | 104 GB | 32 GB | 16 GB |
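These figures are rough estimates; actual usage depends on sequence length, batch size, and gradient checkpointing. To measure what your own run really needs, you can ask PyTorch for the peak allocation:

import torch

# Report the peak GPU memory PyTorch allocated during (or after) a training run.
if torch.cuda.is_available():
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak GPU memory allocated: {peak_gb:.1f} GB")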
Merging Adapters
By default, LoRA adapters are automatically merged into the base model after training. This makes inference simpler: you get a single, self-contained model with no separate adapter files.
Default Behavior (Merged)
params = LLMTrainingParams(
...
peft=True,
# merge_adapter=True is the default
)
Save Adapters Only
To save only the adapter files (much smaller, but the base model is required at inference time):
aitraining llm --train \
--model meta-llama/Llama-3.2-1B \
--data-path ./data.jsonl \
--project-name lora-model \
--peft \
--no-merge-adapter
Manual Merge Later
aitraining tools merge-llm-adapter \
--base-model-path meta-llama/Llama-3.2-1B \
--adapter-path ./lora-model \
--output-folder ./merged-model
You must specify either --output-folder to save locally or --push-to-hub to upload to the Hugging Face Hub.
| Parameter | Description | Required |
|---|---|---|
| --base-model-path | Base model to merge the adapter into | Yes |
| --adapter-path | Path to the LoRA adapter | Yes |
| --output-folder | Local output directory | One of --output-folder / --push-to-hub |
| --push-to-hub | Push the merged model to the Hugging Face Hub | One of --output-folder / --push-to-hub |
| --token | Hugging Face token (for Hub push) | No |
| --pad-to-multiple-of | Pad vocabulary size to a multiple of this value | No |
Or in Python:
from peft import PeftModel
from transformers import AutoModelForCausalLM
# Load base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
# Load and merge adapter
model = PeftModel.from_pretrained(model, "./lora-model")
model = model.merge_and_unload()
# Save merged model
model.save_pretrained("./merged-model")
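If the merged folder should be usable on its own, also save the tokenizer next to the weights (assumed here to be the base model's tokenizer):

from transformers import AutoTokenizer

# Save the tokenizer alongside the merged weights so ./merged-model is self-contained.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.save_pretrained("./merged-model")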
Kohya Conversion
Convert LoRA adapters to a Kohya-compatible .safetensors format:
aitraining tools convert_to_kohya \
--adapter-path ./lora-model \
--output-path ./kohya-lora.safetensors
Loading Adapters
Use adapters without merging:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
# Load adapter
model = PeftModel.from_pretrained(model, "./lora-model")
# Use for inference
inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Best Practices
Training
- Use a higher learning rate than for full fine-tuning (2e-4 to 1e-3)
- LoRA benefits from longer training
- Consider targeting all linear layers for complex tasks
Memory
- Start with lora_r=16
- Add quantization if needed
- Use gradient checkpointing (on by default)
Quality
- Higher rank generally = better quality
- Test on your specific task
- Compare with full fine-tuning if memory allows
Next Steps