Loading Models

The chat interface can load models from local paths or Hugging Face.

Loading a Local Model

After training with AITraining, your model is saved locally. To load it:
  1. Find your model path (e.g., ./my-project/)
  2. Enter the path in the model selector
  3. Click “Load Model”
Model path: ./my-project
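
Loading a local checkpoint this way is roughly equivalent to a standard transformers load. A minimal sketch, assuming the checkpoint is transformers-compatible (the path matches the example above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Point both the tokenizer and the model at the local checkpoint directory.
model_path = "./my-project"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```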

What to Look For

Your trained model directory should contain:
  • config.json - Model configuration
  • model.safetensors or pytorch_model.bin - Model weights
  • tokenizer.json and related tokenizer files
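
If you are not sure a directory is a complete checkpoint, a quick check along these lines can help. This is only a sketch that looks for the files listed above (sharded checkpoints may use split weight files instead):

```python
from pathlib import Path

model_dir = Path("./my-project")

has_config = (model_dir / "config.json").exists()
has_weights = (
    (model_dir / "model.safetensors").exists()
    or (model_dir / "pytorch_model.bin").exists()
)
has_tokenizer = (model_dir / "tokenizer.json").exists()

print(f"config: {has_config}, weights: {has_weights}, tokenizer: {has_tokenizer}")
```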

Loading from Hugging Face

Load any compatible model from the Hugging Face Hub:
Model path: meta-llama/Llama-3.2-1B
Popular models:
  • meta-llama/Llama-3.2-1B - Small, fast Llama
  • mistralai/Mistral-7B-v0.1 - Efficient 7B model
  • google/gemma-2b - Google’s Gemma
Large models require significant GPU memory; a 7B model needs roughly 14GB of VRAM at 16-bit precision.
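
Loading from the Hub works the same way, just with a model ID instead of a path. A minimal sketch, assuming transformers (with accelerate installed for device_map) and a CUDA GPU; 16-bit weights keep the 1B model at roughly 2GB:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bfloat16 halves the memory footprint compared to float32;
# device_map="auto" places the weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```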

Loading LoRA Adapters

PEFT/LoRA adapters are detected and loaded automatically. Simply provide the path to your adapter directory:
Model path: ./my-lora-model
The chat interface automatically:
  1. Detects the adapter_config.json file
  2. Loads the base model specified in the adapter config
  3. Applies the LoRA adapters
If you trained with --merge-adapter (the default), your model is already merged and loads like any standard model.
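
As a rough sketch, the same three-step sequence can be written with the peft library (the directory name matches the example above; the chat interface's internal code may differ):

```python
import json
from pathlib import Path

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = Path("./my-lora-model")

# 1. Read the adapter config to find the base model it was trained on.
adapter_config = json.loads((adapter_dir / "adapter_config.json").read_text())
base_id = adapter_config["base_model_name_or_path"]

# 2. Load the base model and its tokenizer.
base_model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# 3. Apply the LoRA adapters on top of the base weights.
model = PeftModel.from_pretrained(base_model, str(adapter_dir))
```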

Memory Requirements

Model Size | Approximate VRAM (16-bit)
-----------|--------------------------
1B         | ~2GB
3B         | ~6GB
7B         | ~14GB
13B        | ~26GB
Use quantized models (int4/int8) to reduce memory by 2-4x.
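
As a rough illustration, 4-bit loading with transformers and bitsandbytes (assuming both are installed and a CUDA GPU is available) looks like this:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization cuts weight memory to roughly a quarter of fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
```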

Switching Models

To switch to a different model:
  1. Enter the new model path
  2. Click “Load Model”
  3. The previous model is unloaded
Note: Conversation history is cleared when you switch models.

Troubleshooting

If a model fails to load, check:
  • The path is correct and exists
  • For Hugging Face models, the model ID is spelled correctly
  • You have access to the model (some models are gated and require authentication; see the sketch below)
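
For gated models (for example the meta-llama checkpoints), authenticate with the Hub before loading. A minimal sketch using huggingface_hub; the token value is a placeholder:

```python
from huggingface_hub import login

# Paste your Hugging Face access token (placeholder below).
# Alternatively, run `huggingface-cli login` once in a terminal.
login(token="hf_your_token_here")
```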
If you run out of GPU memory, try:
  • A smaller model
  • A quantized version of the model
  • Closing other applications that use the GPU
First load downloads model weights. Subsequent loads are faster. Large models (7B+) take 30-60 seconds to load.

Next Steps