Generation Parameters
Adjust these settings to control model output.
Key Parameters
Temperature
Controls randomness in responses.
| Value | Effect | Use Case |
|---|---|---|
| 0.0 - 0.3 | Very consistent, deterministic | Factual answers, code |
| 0.5 - 0.7 | Balanced | General conversation |
| 0.8 - 1.0 | More varied, creative | Creative writing |
| 1.0+ | Very random | Brainstorming |
Low temperature (0.3): "The capital of France is Paris."
High temperature (1.2): "Paris, the city of lights, serves as France's bustling capital!"
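Under the hood, temperature divides the model's raw logits before they are turned into probabilities, so low values sharpen the distribution and high values flatten it. A minimal NumPy sketch with made-up logit values:

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Turn raw logits into sampling probabilities at a given temperature."""
    scaled = np.array(logits, dtype=float) / temperature  # divide logits by T
    scaled -= scaled.max()                                 # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [4.0, 2.5, 1.0]               # hypothetical scores for three candidate tokens
print(apply_temperature(logits, 0.3))  # sharply peaked: almost always the top token
print(apply_temperature(logits, 1.2))  # flatter: lower-ranked tokens get real probability
```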
Max Tokens
The maximum number of tokens the model can generate in a single response.
| Value | Typical Use |
|---|---|
| 50-100 | Short answers |
| 256 | Standard responses |
| 512-1024 | Detailed explanations |
| 2048+ | Long-form content |
A larger limit allows longer responses and therefore longer generation time; generation can still end earlier when the model emits an end-of-sequence token.
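As a rough sketch of how the limit is usually passed to a generation call (this assumes a Hugging Face `transformers` text-generation pipeline with a placeholder model; in that stack the setting is called `max_new_tokens`, so adapt the names to your backend):

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2")  # placeholder model name

# A small limit returns quickly; a large limit allows (and may take) much longer.
short = pipe("Explain HTTP in one sentence.", max_new_tokens=50)
long_form = pipe("Explain HTTP in detail.", max_new_tokens=512)
```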
Top-p (Nucleus Sampling)
Limits token selection to the smallest set of tokens whose cumulative probability reaches the threshold.
- 0.95 (UI default) - Consider tokens until 95% probability mass
- 0.9 - Slightly more focused
- 0.5 - Very focused
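The filtering step behind nucleus sampling can be sketched directly: sort tokens by probability, keep the smallest prefix that reaches the threshold, and renormalize. The probability values below are illustrative:

```python
import numpy as np

def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = np.argsort(probs)[::-1]                  # most likely tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1  # how many tokens to keep
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()                 # renormalize over kept tokens

probs = np.array([0.5, 0.3, 0.15, 0.05])             # hypothetical token probabilities
print(top_p_filter(probs, 0.9))                      # the 0.05 tail token is dropped
```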
Top-k
Limits token selection to the k most likely tokens.
- 50 (default) - Consider top 50 tokens
- 10 - Very focused
- 100 - More variety
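Top-k is the same idea with a fixed count instead of a probability threshold; a short sketch over illustrative values:

```python
import numpy as np

def top_k_filter(probs, k=50):
    """Keep only the k most likely tokens, then renormalize."""
    keep = np.argsort(probs)[::-1][:k]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.4, 0.3, 0.2, 0.1])
print(top_k_filter(probs, k=2))   # only the two most likely tokens survive
```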
Parameter Combinations
Factual Q&A
temperature: 0.3
max_tokens: 256
top_p: 0.9
Consistent, accurate answers.
Creative Writing
temperature: 0.9
max_tokens: 1024
top_p: 0.95
Varied, creative output.
Code Generation
temperature: 0.2
max_tokens: 512
top_p: 0.95
Precise, syntactically correct code.
Conversation
temperature: 0.7
max_tokens: 256
top_p: 0.9
Natural, varied responses.
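These presets translate directly into keyword arguments for most generation APIs. A sketch assuming a Hugging Face `transformers` pipeline, where the length limit is called `max_new_tokens` (the model name and preset keys are placeholders):

```python
from transformers import pipeline

PRESETS = {
    "factual":  {"temperature": 0.3, "max_new_tokens": 256,  "top_p": 0.9},
    "creative": {"temperature": 0.9, "max_new_tokens": 1024, "top_p": 0.95},
    "code":     {"temperature": 0.2, "max_new_tokens": 512,  "top_p": 0.95},
    "chat":     {"temperature": 0.7, "max_new_tokens": 256,  "top_p": 0.9},
}

pipe = pipeline("text-generation", model="gpt2")  # placeholder model name

result = pipe("What is the capital of France?", do_sample=True, **PRESETS["factual"])
print(result[0]["generated_text"])
```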
Finding the Right Settings
Start with Defaults
Default settings work for most cases:
- temperature: 0.7
- max_tokens: 256
- top_p: 0.95
- top_k: 50
- do_sample: true
UI Slider Ranges
The chat interface provides these parameter ranges:
| Parameter | Min | Max | Step | Default |
|---|---|---|---|---|
| Temperature | 0 | 2 | 0.1 | 0.7 |
| Max Tokens | 50 | 2048 | 50 | 256 |
| Top P | 0 | 1 | 0.05 | 0.95 |
| Top K | 0 | 100 | 5 | 50 |
Adjust One at a Time
- If responses are too random → lower temperature
- If responses are too repetitive → raise temperature
- If responses are cut off → increase max_tokens
- If responses are too long → decrease max_tokens
Test Systematically
For important applications, run a small parameter sweep (sketched below):
- Pick 5-10 representative test prompts
- Run each prompt under each candidate parameter setting
- Compare the outputs side by side
- Document the settings that work best
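A sketch of such a sweep, assuming a `transformers` pipeline and placeholder prompts and model:

```python
import itertools
import json
from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2")  # placeholder model name

prompts = ["Summarize TCP in two sentences.", "Write a haiku about autumn."]
temperatures = [0.3, 0.7, 1.0]

results = []
for prompt, temp in itertools.product(prompts, temperatures):
    out = pipe(prompt, do_sample=True, temperature=temp, max_new_tokens=128)
    results.append({"prompt": prompt, "temperature": temp,
                    "output": out[0]["generated_text"]})

# Persist the grid so settings can be compared side by side and documented.
with open("param_sweep.json", "w") as f:
    json.dump(results, f, indent=2)
```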
Advanced Parameters
Repetition Penalty
Penalizes tokens that have already appeared in the output, reducing repeated words and phrases.
- 1.0 - No penalty
- 1.1 - Mild penalty (recommended)
- 1.3+ - Strong penalty
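One common formulation (the CTRL-style penalty used by several libraries) rescales the logits of tokens that have already been generated; a simplified sketch:

```python
import numpy as np

def penalize_repeats(logits, generated_token_ids, penalty=1.1):
    """Down-weight tokens that already appear in the generated sequence."""
    logits = np.array(logits, dtype=float)
    for token_id in set(generated_token_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty   # shrink positive scores
        else:
            logits[token_id] *= penalty   # push negative scores further down
    return logits
```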
Stop Sequences
Generation stops as soon as any of these sequences appears in the output.
- Useful for structured output
- Example: `["\n\n", "User:"]`
Do Sample
Controls whether to use sampling or greedy decoding.
- true (default) - Use sampling with temperature/top-p/top-k
- false - Greedy decoding (always pick most likely token)
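The difference between the two modes, sketched over a toy next-token distribution:

```python
import numpy as np

rng = np.random.default_rng()
probs = np.array([0.6, 0.3, 0.1])   # hypothetical next-token probabilities

greedy = int(np.argmax(probs))                  # do_sample=False: always token 0
sampled = int(rng.choice(len(probs), p=probs))  # do_sample=True: usually 0, sometimes 1 or 2
print(greedy, sampled)
```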
System Prompt
Set a system message to guide model behavior. Available in the chat interface settings panel.
Example system prompts:
- "You are a helpful coding assistant. Provide concise code examples."
- "You are a creative writing partner. Be imaginative and descriptive."
- "You are a technical documentation expert. Be precise and thorough."
The system prompt is prepended to the conversation context and influences how the model responds throughout the session.
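In a chat-template workflow this amounts to putting the system message first. A sketch using `transformers`' `apply_chat_template` (the model name is a placeholder; any chat model with a template works the same way):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # placeholder

messages = [
    {"role": "system",
     "content": "You are a helpful coding assistant. Provide concise code examples."},
    {"role": "user", "content": "Show me how to read a JSON file in Python."},
]

# The template prepends the system message to the conversation context.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```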
Parameter Effects Summary
| Parameter | Low Value | High Value |
|---|---|---|
| temperature | Consistent, focused | Random, creative |
| max_tokens | Short responses | Long responses |
| top_p | Focused | Varied |
| top_k | Very focused | More options |
| repetition_penalty | May repeat | Avoids repetition |
Next Steps