# Understanding Model Types

Different AI tasks require different model architectures. Think of it like choosing the right tool for the job - you wouldn’t use a hammer to paint a wall.

## Language Models (LLMs)

The most versatile models that understand and generate human language.

### What They Do

Language models can:

- Answer questions
- Write content
- Translate languages
- Summarize text
- Generate code
- Follow instructions
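
Most of these capabilities are a few lines of code with the Hugging Face `transformers` library (a common choice for the models in the table below). A minimal sketch; the model and prompt are just examples:

```python
from transformers import pipeline

# Generate text with the small 124M-parameter GPT-2 checkpoint.
generator = pipeline("text-generation", model="gpt2")
result = generator("The best way to learn machine learning is", max_new_tokens=30)
print(result[0]["generated_text"])
```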
### Common Models
| Model | Size | Good For | Training Time |
|---|---|---|---|
| GPT-2 | 124M-1.5B | Starting point, quick experiments | Minutes to hours |
| BERT | 110M-340M | Understanding text, classification | Hours |
| T5 | 60M-11B | Text-to-text tasks | Hours to days |
| LLaMA | 7B-70B | General purpose, chat | Days to weeks |
| Mistral | 7B | Efficient, balanced performance | Hours to days |
### When to Use

Choose language models when you need:

- Natural language understanding
- Text generation
- Question answering
- Conversational AI
- Code generation
## Classification Models

Specialized for sorting things into categories.

### Text Classification

Categorize text into predefined groups:

- Sentiment analysis (positive/negative)
- Topic classification
- Intent detection
- Language detection
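
For example, sentiment analysis with the `transformers` pipeline; a sketch, where the default checkpoint happens to be a DistilBERT model fine-tuned for English sentiment:

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("This product exceeded my expectations!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```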
### Image Classification

Identify what’s in an image:

- Object recognition
- Medical diagnosis
- Quality control
- Content moderation
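
A minimal sketch; the file name is hypothetical, and the pipeline also accepts URLs and PIL images:

```python
from transformers import pipeline

# Default model is a ViT checkpoint trained on ImageNet classes.
classifier = pipeline("image-classification")
for prediction in classifier("cat.jpg"):  # hypothetical local image
    print(prediction["label"], round(prediction["score"], 3))
```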
### Multimodal Classification

Handle both text and images:

- Meme understanding
- Document analysis
- Product categorization
## Token Classification

Labels individual words or tokens in text.

### Named Entity Recognition (NER)

Find and label specific information:

- Names of people, places, organizations
- Dates and times
- Product names
- Medical terms
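
A sketch with the `transformers` NER pipeline; `aggregation_strategy="simple"` merges sub-word tokens back into whole entities:

```python
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
for entity in ner("Ada Lovelace worked with Charles Babbage in London."):
    # entity_group is PER, ORG, LOC, or MISC for the default model
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```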
### Part-of-Speech Tagging

Identify grammatical roles:

- Nouns, verbs, adjectives
- Sentence structure analysis
## Sequence-to-Sequence

Transform one sequence into another.

### Translation

Convert text between languages:

- Document translation
- Real-time chat translation
- Code translation
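
A quick sketch using `t5-small`, which supports a handful of language pairs out of the box:

```python
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("The meeting starts at noon.")[0]["translation_text"])
```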
### Summarization

Condense long text:

- Article summaries
- Meeting notes
- Report digests
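
A minimal summarization sketch; the default pipeline checkpoint is a distilled BART model, and `max_length`/`min_length` are measured in tokens:

```python
from transformers import pipeline

summarizer = pipeline("summarization")
article = (
    "Transformers process entire sequences in parallel rather than token by token, "
    "which speeds up training and helps the model capture long-range dependencies. "
    "This architecture now underpins most modern language models."
)
print(summarizer(article, max_length=40, min_length=10)[0]["summary_text"])
```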
### Question Answering

Extract answers from context:

- Customer support
- Document Q&A
- Educational tools
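
Extractive QA takes a question plus a context passage and returns a span copied from that passage. A sketch:

```python
from transformers import pipeline

qa = pipeline("question-answering")
answer = qa(
    question="When was the order shipped?",
    context="Your order was shipped on March 3rd and should arrive within five days.",
)
print(answer["answer"], round(answer["score"], 2))  # e.g. "March 3rd"
```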
## Computer Vision Models

Process and understand images.

### Object Detection

Find and locate objects in images:

- Bounding boxes around objects
- Count items
- Track movement
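
A sketch with the `transformers` object-detection pipeline; the default checkpoint is a DETR model (which may additionally require the `timm` package), and the image path is hypothetical:

```python
from transformers import pipeline

detector = pipeline("object-detection")
for obj in detector("street.jpg"):  # hypothetical local image
    # Each result has a label, a confidence score, and a bounding box
    print(obj["label"], round(obj["score"], 2), obj["box"])  # box: xmin/ymin/xmax/ymax
```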
### Image Segmentation

Pixel-level understanding:

- Medical imaging
- Autonomous driving
- Photo editing
### Image Generation

Create new images:

- Art generation
- Product visualization
- Data augmentation
## Tabular Models

Work with structured data like spreadsheets.

### Regression

Predict continuous values:

- Price prediction
- Sales forecasting
- Risk scoring
### Classification

Categorize rows:

- Customer churn
- Fraud detection
- Disease diagnosis
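
For tabular data, gradient-boosted trees (as in the XGBoost and CatBoost recommendations below) are usually a stronger starting point than deep learning. A minimal churn-style classification sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real customer table (rows = customers).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```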
## Choosing the Right Model

### Consider Your Data
| Data Type | Recommended Models |
|---|---|
| Short text (< 512 tokens) | BERT, DistilBERT |
| Long text (> 512 tokens) | Longformer, BigBird |
| Conversations | DialoGPT, Blenderbot |
| Code | CodeBERT, CodeT5 |
| Multiple languages | mBERT, XLM-RoBERTa |
| Images | ResNet, EfficientNet |
| Images + Text | CLIP, ALIGN |
| Structured data | XGBoost, CatBoost |
### Consider Your Resources

**Limited Resources (< 8GB GPU)**

- DistilBERT (66M parameters)
- MobileBERT (25M parameters)
- TinyBERT (15M parameters)

**Moderate Resources**

- BERT-base (110M parameters)
- GPT-2 small (124M parameters)
- RoBERTa-base (125M parameters)

**High Resources**

- GPT-2 large (774M parameters)
- T5-large (770M parameters)
- LLaMA 7B (7B parameters)
### Consider Your Accuracy Needs

**Speed over accuracy**

- Use distilled models (DistilBERT, DistilGPT-2)
- Smaller architectures
- Quantized models

**Accuracy over speed**

- Use larger models
- Ensemble multiple models
- Longer training times
## Model Sizes and Trade-offs

### Parameter Count

Parameters are the adjustable weights a model learns during training. More parameters usually mean:

- Better understanding
- Higher accuracy
- More memory needed
- Slower inference
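
You can check a model's parameter count directly before committing to it; a sketch with `transformers`:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("distilbert-base-uncased")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # roughly 66M for DistilBERT
```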
### Size Guidelines
| Size | Parameters | Use Case | Training Data Needed |
|---|---|---|---|
| Tiny | < 50M | Mobile apps, real-time | 100s of examples |
| Small | 50M-150M | Standard applications | 1,000s of examples |
| Base | 150M-500M | Production systems | 10,000s of examples |
| Large | 500M-3B | High accuracy needs | 100,000s of examples |
| XL | 3B+ | State-of-the-art | Millions of examples |
## Pre-trained vs From Scratch

### Use Pre-trained Models

99% of the time, start with a pre-trained model:

- Already understands language/images
- Needs less training data
- Faster to train
- Better results
### Train From Scratch Only When
- Working with unique data types
- Special domain (medical, legal)
- Custom architectures
- Research purposes
## Fine-tuning Strategies

### Full Fine-tuning

Update all model parameters:

- Best accuracy
- Needs more memory
- Risk of overfitting
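
In PyTorch terms, full fine-tuning just means the optimizer receives every parameter. A single-step sketch; the model choice and example batch are arbitrary:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(name)

# All parameters are passed to the optimizer, so every weight gets updated.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
loss = model(**batch, labels=torch.tensor([1, 0])).loss
loss.backward()
optimizer.step()
```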
### LoRA (Low-Rank Adaptation)

Update only small adapters:

- 90% less memory
- Faster training
- Slightly lower accuracy
- Perfect for large models
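
One common implementation is the `peft` library. A sketch wrapping GPT-2, where `target_modules` names GPT-2's fused attention projection; other architectures use different module names:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)

# Only the small adapter matrices are trainable; the base weights stay frozen.
model.print_trainable_parameters()
```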
### Prompt Tuning

Train only prompt embeddings:

- Minimal memory
- Very fast
- Good for few-shot learning
### Freeze Strategies

Freeze some layers:

- Freeze early layers: Keep general features
- Freeze late layers: Keep task-specific features
- Gradual unfreezing: Start frozen, slowly unfreeze
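
A freezing sketch for a 12-layer BERT classifier: keep the embeddings and the first eight encoder layers fixed, and train only the top layers plus the classification head.

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze embeddings and the first 8 of 12 encoder layers.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False
```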
## Multi-task Models

Some models can handle multiple tasks.

### T5 Family

T5 treats every task as text-to-text:
- Text summarization
- Translation
- Question answering
- Classification

The task is selected with a short text prefix:

- “summarize: …”
- “translate English to French: …”
- “question: … context: …”
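
A sketch showing the prefix mechanism with `t5-small` (requires the `sentencepiece` package):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Swap the prefix ("summarize:", "question:", ...) to switch tasks.
inputs = tokenizer("translate English to French: The house is blue.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```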
### FLAN Models

Instruction-tuned on a large mix of tasks:

- Better zero-shot performance
- More flexible
- Good instruction following
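
A zero-shot sketch with `google/flan-t5-small`, where the task is described in plain language rather than a fixed prefix:

```python
from transformers import pipeline

pipe = pipeline("text2text-generation", model="google/flan-t5-small")
print(pipe("Classify the sentiment as positive or negative: I loved this film."))
# e.g. [{'generated_text': 'positive'}]
```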
## Specialized Architectures

### Transformers

The current standard:

- Parallel processing
- Long-range dependencies
- Most modern models
### CNNs (Convolutional Neural Networks)

Still great for images:

- Efficient
- Well-understood
- Good for edge devices
### RNNs (Recurrent Neural Networks)

Older but still useful:

- Sequential data
- Time series
- Streaming applications