Generation Parameters

Adjust these settings to control model output.

Key Parameters

Temperature

Controls randomness in responses.
| Value | Effect | Use Case |
|---|---|---|
| 0.0 - 0.3 | Very consistent, deterministic | Factual answers, code |
| 0.5 - 0.7 | Balanced | General conversation |
| 0.8 - 1.0 | More varied, creative | Creative writing |
| 1.0+ | Very random | Brainstorming |
  • Low temperature (0.3): "The capital of France is Paris."
  • High temperature (1.2): "Paris, the city of lights, serves as France's bustling capital!"
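The same contrast in code, as a minimal sketch assuming a Hugging Face transformers backend (the model name is only an example; substitute whatever your deployment serves):

```python
# Minimal sketch with Hugging Face transformers; "gpt2" is a small example model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The capital of France is", return_tensors="pt")

# Low temperature: sampling concentrates on the most likely tokens.
low = model.generate(**inputs, do_sample=True, temperature=0.3, max_new_tokens=20)
# High temperature: the distribution flattens, so outputs vary run to run.
high = model.generate(**inputs, do_sample=True, temperature=1.2, max_new_tokens=20)

print(tokenizer.decode(low[0], skip_special_tokens=True))
print(tokenizer.decode(high[0], skip_special_tokens=True))
```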

Max Tokens

Maximum length of the response.
| Value | Typical Use |
|---|---|
| 50-100 | Short answers |
| 256 | Standard responses |
| 512-1024 | Detailed explanations |
| 2048+ | Long-form content |
A higher max_tokens cap allows longer responses, and longer responses take longer to generate.
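Tokens are not the same as words, so it helps to check how long your typical prompts and answers actually are. A quick sketch, assuming a transformers tokenizer ("gpt2" is only an example):

```python
# Sketch: counting tokens vs. words so max_tokens budgets are realistic.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "A detailed explanation can easily run to several hundred tokens."
n_tokens = len(tokenizer(text)["input_ids"])
print(len(text.split()), "words ->", n_tokens, "tokens")
```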

Top-p (Nucleus Sampling)

Restricts sampling to the smallest set of tokens whose cumulative probability reaches the top_p threshold.
  • 0.95 (UI default) - Consider tokens until 95% probability mass
  • 0.9 - Slightly more focused
  • 0.5 - Very focused
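The mechanics are easiest to see on a toy distribution. The sketch below is plain NumPy for illustration, not any particular backend's implementation:

```python
# Toy illustration of nucleus (top-p) truncation.
import numpy as np

probs = np.array([0.45, 0.25, 0.15, 0.08, 0.04, 0.02, 0.01])  # sorted, sums to 1
cum = np.cumsum(probs)
top_p = 0.9
cutoff = int(np.searchsorted(cum, top_p)) + 1  # smallest prefix reaching top_p
kept = probs[:cutoff]
# Sampling then happens only over the kept tokens, renormalized:
print("kept", cutoff, "of", len(probs), "tokens:", kept / kept.sum())
```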

Top-k

Limits sampling to the k most likely tokens.
  • 50 (default) - Consider top 50 tokens
  • 10 - Very focused
  • 100 - More variety
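Top-k is even simpler, again shown on a toy distribution rather than a real backend:

```python
# Toy illustration of top-k truncation: keep the k most likely tokens,
# renormalize, then sample only among them.
import numpy as np

probs = np.array([0.45, 0.25, 0.15, 0.08, 0.04, 0.02, 0.01])
k = 3
top = np.argsort(probs)[::-1][:k]        # indices of the k best tokens
renorm = probs[top] / probs[top].sum()   # renormalized sampling distribution
print("kept token indices:", top, "with probs", renorm)
```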

Parameter Combinations

Factual Q&A

temperature: 0.3
max_tokens: 256
top_p: 0.9
Consistent, accurate answers.

Creative Writing

temperature: 0.9
max_tokens: 1024
top_p: 0.95
Varied, creative output.

Code Generation

temperature: 0.2
max_tokens: 512
top_p: 0.95
Precise, syntactically correct code.

Conversation

temperature: 0.7
max_tokens: 256
top_p: 0.9
Natural, varied responses.
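If you call a backend directly rather than using the UI, the presets above translate to keyword arguments. A sketch assuming transformers, where max_new_tokens plays the role of the UI's max_tokens:

```python
# Sketch: the presets above as reusable generate() kwargs.
from transformers import AutoModelForCausalLM, AutoTokenizer

PRESETS = {
    "factual":  dict(temperature=0.3, top_p=0.90, max_new_tokens=256),
    "creative": dict(temperature=0.9, top_p=0.95, max_new_tokens=1024),
    "code":     dict(temperature=0.2, top_p=0.95, max_new_tokens=512),
    "chat":     dict(temperature=0.7, top_p=0.90, max_new_tokens=256),
}

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example model
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("What is the boiling point of water?", return_tensors="pt")
out = model.generate(**inputs, do_sample=True, **PRESETS["factual"])
print(tokenizer.decode(out[0], skip_special_tokens=True))
```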

Finding the Right Settings

Start with Defaults

Default settings work for most cases:
  • temperature: 0.7
  • max_tokens: 256
  • top_p: 0.95
  • top_k: 50
  • do_sample: true
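On a transformers backend, these defaults map onto a GenerationConfig (using max_new_tokens for the UI's max_tokens is an assumption about the deployment):

```python
# Sketch: the defaults above as a transformers GenerationConfig.
from transformers import GenerationConfig

defaults = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    max_new_tokens=256,
    top_p=0.95,
    top_k=50,
)
print(defaults)
```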

UI Slider Ranges

The chat interface provides these parameter ranges:
| Parameter | Min | Max | Step | Default |
|---|---|---|---|---|
| Temperature | 0 | 2 | 0.1 | 0.7 |
| Max Tokens | 50 | 2048 | 50 | 256 |
| Top P | 0 | 1 | 0.05 | 0.95 |
| Top K | 0 | 100 | 5 | 50 |

Adjust One at a Time

  1. If responses are too random → lower temperature
  2. If responses are too repetitive → raise temperature
  3. If responses are cut off → increase max_tokens
  4. If responses are too long → decrease max_tokens

Test Systematically

For important applications:
  1. Pick 5-10 test prompts
  2. Try each parameter setting
  3. Compare outputs
  4. Document what works
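As a starting point for such a sweep, here is a sketch (transformers again; the prompts and the parameter grid are placeholders to replace with your own):

```python
# Sketch: a small systematic sweep over temperature and top_p.
from itertools import product
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["Explain nucleus sampling.", "Write a haiku about rain."]
grid = list(product([0.3, 0.7, 1.0], [0.9, 0.95]))  # (temperature, top_p)

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    for temperature, top_p in grid:
        out = model.generate(**inputs, do_sample=True, temperature=temperature,
                             top_p=top_p, max_new_tokens=64)
        text = tokenizer.decode(out[0], skip_special_tokens=True)
        print(f"T={temperature} p={top_p}: {text!r}")  # log outputs for comparison
```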

Advanced Parameters

Repetition Penalty

Penalizes tokens that have already been generated, reducing repeated phrases.
  • 1.0 - No penalty
  • 1.1 - Mild penalty (recommended)
  • 1.3+ - Strong penalty
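In transformers this is the repetition_penalty keyword; a sketch using the text-generation pipeline (model name is an example):

```python
# Sketch: mild repetition penalty via the transformers pipeline.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")  # example model
result = generate("The cat sat on the mat. The cat",
                  do_sample=True, temperature=0.7,
                  repetition_penalty=1.1, max_new_tokens=60)
print(result[0]["generated_text"])
```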

Stop Sequences

Generation ends as soon as any of these strings appears in the output.
  • Useful for structured output
  • Example: ["\n\n", "User:"]
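Recent transformers releases support this directly via stop_strings; the tokenizer must be passed to generate() so the strings can be matched, and the exact argument should be treated as version-dependent:

```python
# Sketch: stop generation at a blank line or a turn marker.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example model
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Q: What is top-k?\nA:", return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=256,
                     stop_strings=["\n\n", "User:"], tokenizer=tokenizer)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```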

Do Sample

Controls whether to use sampling or greedy decoding.
  • true (default) - Use sampling with temperature/top-p/top-k
  • false - Greedy decoding (always pick most likely token)
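The difference in one sketch; note that greedy decoding ignores temperature, top_p, and top_k entirely:

```python
# Sketch: greedy vs. sampled decoding via the transformers pipeline.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")  # example model

greedy = generate("Once upon a time", do_sample=False, max_new_tokens=30)   # deterministic
sampled = generate("Once upon a time", do_sample=True, temperature=0.7,
                   top_p=0.95, max_new_tokens=30)                           # varies per run
print(greedy[0]["generated_text"])
print(sampled[0]["generated_text"])
```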

System Prompt

Set a system message to guide model behavior. Available in the chat interface settings panel. Example system prompts:
  • “You are a helpful coding assistant. Provide concise code examples.”
  • “You are a creative writing partner. Be imaginative and descriptive.”
  • “You are a technical documentation expert. Be precise and thorough.”
The system prompt is prepended to the conversation context and influences how the model responds throughout the session.
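Outside the UI, the same idea is expressed through a chat template, assuming a chat-tuned model whose template defines a system role (the model name below is only an example):

```python
# Sketch: prepend a system message via the tokenizer's chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceH4/zephyr-7b-beta"  # example chat model with a system role
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

messages = [
    {"role": "system",
     "content": "You are a helpful coding assistant. Provide concise code examples."},
    {"role": "user", "content": "How do I reverse a list in Python?"},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt")
out = model.generate(input_ids, do_sample=True, temperature=0.7, max_new_tokens=256)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```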

Parameter Effects Summary

| Parameter | Low Value | High Value |
|---|---|---|
| temperature | Consistent, focused | Random, creative |
| max_tokens | Short responses | Long responses |
| top_p | Focused | Varied |
| top_k | Very focused | More options |
| repetition_penalty | May repeat | Avoids repetition |

Next Steps