Gemini models support a thinking_level parameter that controls how much internal reasoning (chain-of-thought) the model performs before answering. Gemini 2.5 introduced configurable thinking through a token-based thinking_budget, and the 3.x models expose the level-based thinking_level setting described here. The parameter lets developers trade latency and cost against reasoning depth.
The ThinkingConfig object controls reasoning behavior via the thinking_level parameter:
| Level | Behavior | Best For |
|---|---|---|
| LOW | Minimal internal reasoning, fastest response | Simple lookups, classification, quick answers |
| MEDIUM | Balanced reasoning depth | Standard coding, analysis, summarization |
| HIGH | Maximum reasoning, longest latency | Complex math, multi-step logic, research |

    from vertexai.generative_models import GenerativeModel

    model = GenerativeModel("gemini-3.5-flash")

    # Use LOW thinking for fast classification
    fast_response = model.generate_content(
        "Classify this email as spam or not spam: 'You won a prize!'",
        generation_config={"thinking_config": {"thinking_level": "LOW"}},
    )

    # Use HIGH thinking for complex reasoning
    deep_response = model.generate_content(
        "Prove that there are infinitely many prime numbers.",
        generation_config={"thinking_config": {"thinking_level": "HIGH"}},
    )

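
The table above pairs each level with the tasks it suits, and that choice can be centralized so callers do not hard-code a level on every request. The following is a minimal sketch, not part of any SDK: the task category names, `TASK_TO_LEVEL`, `pick_thinking_level`, and `build_generation_config` are all hypothetical, and the returned dict only mirrors the `generation_config` shape used in the calls above.

```python
from typing import Dict

# Hypothetical task categories, loosely taken from the "Best For" column.
TASK_TO_LEVEL: Dict[str, str] = {
    "lookup": "LOW",
    "classification": "LOW",
    "coding": "MEDIUM",
    "analysis": "MEDIUM",
    "summarization": "MEDIUM",
    "math": "HIGH",
    "multi_step_logic": "HIGH",
    "research": "HIGH",
}

VALID_LEVELS = ("LOW", "MEDIUM", "HIGH")


def pick_thinking_level(task_type: str, default: str = "MEDIUM") -> str:
    """Return the level mapped to task_type, or default for unknown types."""
    if default not in VALID_LEVELS:
        raise ValueError(f"default must be one of {VALID_LEVELS}, got {default!r}")
    return TASK_TO_LEVEL.get(task_type, default)


def build_generation_config(task_type: str) -> dict:
    """Build a generation_config dict in the same shape as the calls above."""
    return {"thinking_config": {"thinking_level": pick_thinking_level(task_type)}}
```

Under these assumptions, a call site would pass `generation_config=build_generation_config("classification")` to `model.generate_content`. Unknown task types fall back to MEDIUM here, which is a design choice for the sketch rather than documented model behavior.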
Deep Think is Gemini 3.5 Pro's advanced reasoning mode. It significantly extends the model's internal chain-of-thought for the most complex tasks, such as mathematical proofs, multi-file code refactoring, and scientific analysis. Deep Think automatically engages when thinking_level is set to HIGH on capable models.
Thinking tokens (the model's internal reasoning) are billed as output tokens. With HIGH thinking, the model may generate thousands of internal tokens before producing a visible answer, so even a short answer can be expensive. Monitor usage and choose thinking_level per task to keep costs under control.
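
To make the billing rule concrete, here is a rough sketch of how thinking tokens add to the output bill. The function names are hypothetical, and the price used in the example is a placeholder rather than a real Gemini rate; substitute the current output-token price for your model.

```python
def estimate_output_cost(
    visible_tokens: int,
    thinking_tokens: int,
    price_per_million_output: float,
) -> float:
    """Estimate output cost when thinking tokens are billed as output tokens."""
    if visible_tokens < 0 or thinking_tokens < 0:
        raise ValueError("token counts must be non-negative")
    billed_tokens = visible_tokens + thinking_tokens
    return billed_tokens * price_per_million_output / 1_000_000


def thinking_share(visible_tokens: int, thinking_tokens: int) -> float:
    """Fraction of billed output tokens spent on internal reasoning."""
    billed_tokens = visible_tokens + thinking_tokens
    return thinking_tokens / billed_tokens if billed_tokens else 0.0


if __name__ == "__main__":
    # Placeholder numbers: a 300-token answer preceded by 4,000 thinking tokens,
    # priced at a made-up 10.0 currency units per million output tokens.
    cost = estimate_output_cost(300, 4_000, price_per_million_output=10.0)
    share = thinking_share(300, 4_000)
    print(f"estimated output cost: {cost:.4f}")
    print(f"share spent on thinking: {share:.1%}")
```

In this illustrative case most of the billed output is reasoning the user never sees, which is why a LOW setting for simple tasks can reduce cost noticeably.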