Rate Limits

AITraining rate limits apply when using cloud resources.

Local Training

Local training has no rate limits - you’re only limited by your hardware.

Hugging Face Hub

When pushing to or pulling from the Hub:
| Operation         | Rate Limit |
|-------------------|------------|
| Model downloads   | Fair use   |
| Dataset downloads | Fair use   |
| Model uploads     | Fair use   |

Handling Rate Limits

If you hit rate limits, retry with an increasing delay:

```python
import time
from huggingface_hub import HfApi

def download_with_retry(model_id, max_retries=3):
    api = HfApi()
    for attempt in range(max_retries):
        try:
            return api.model_info(model_id)
        except Exception as e:
            # Only back off on rate-limit errors, and re-raise once
            # retries are exhausted instead of silently returning None.
            if "rate limit" in str(e).lower() and attempt < max_retries - 1:
                wait = 60 * (attempt + 1)  # 60s, then 120s, then 180s...
                print(f"Rate limited, waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
```
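The linear backoff above is simple; exponential backoff with a little random jitter spreads retries out further and avoids many clients retrying in lockstep. A minimal sketch (the `backoff_delay` helper is illustrative, not part of any library):

```python
import random

def backoff_delay(attempt, base=60, cap=600, jitter=0.1):
    """Exponential backoff with random jitter, capped at `cap` seconds.

    attempt 0 -> ~60s, attempt 1 -> ~120s, attempt 2 -> ~240s, ...
    """
    delay = min(cap, base * (2 ** attempt))
    return delay * (1 + random.uniform(-jitter, jitter))
```

You would call `time.sleep(backoff_delay(attempt))` in place of the fixed `60 * (attempt + 1)` wait.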

W&B Logging

Weights & Biases has logging limits based on your plan:
| Plan       | Logged Hours/Month |
|------------|--------------------|
| Free       | 200                |
| Teams      | Unlimited          |
| Enterprise | Unlimited          |

Reducing Log Volume

```python
# LLMTrainingParams comes from your AITraining installation
params = LLMTrainingParams(
    model="google/gemma-3-270m",
    data_path="./data.jsonl",
    project_name="my-model",
    log="wandb",
    logging_steps=50,  # Log less frequently
)
```
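To pick a `logging_steps` value that fits a logging budget, work backwards from the number of points you want on each chart. A quick sanity check (variable names are illustrative):

```python
# For a 10,000-step run, targeting ~200 logged points per metric:
total_steps = 10_000
target_points = 200
logging_steps = max(1, total_steps // target_points)  # -> 50
logged_points = total_steps // logging_steps          # -> 200, not 10,000
```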

GPU Cloud Services

If using cloud GPUs (not applicable to local training):

Hugging Face Spaces

  • Limited by your Spaces quota
  • Persistent storage limits apply

Other Clouds

Check your cloud provider’s quotas for:
  • GPU hours
  • Storage
  • Network bandwidth

Best Practices

  1. Cache models locally - Don’t re-download
  2. Log efficiently - Don’t log every step
  3. Use checkpoints - Resume instead of restart
  4. Batch operations - Reduce API calls
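The first practice, caching locally, can be sketched as a small wrapper that only fetches on a cache miss. This is a minimal illustration of the idea, not AITraining's implementation; the `cached_fetch` helper and cache path are ours (the Hugging Face libraries already cache downloads this way under `~/.cache/huggingface`):

```python
import hashlib
import os

CACHE_DIR = os.path.expanduser("~/.cache/my-models")  # illustrative path

def cached_fetch(url, fetch):
    """Return the cached bytes for `url`; call `fetch(url)` only on a cache miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest())
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(fetch(url))  # the only network call; later calls hit the cache
    with open(path, "rb") as f:
        return f.read()
```

The second call with the same URL reads from disk and never touches the network, which is exactly why repeated runs should not re-download model weights.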

Next Steps