With models like Claude Opus 4.6 and Sonnet 4.6, you can enable Extended Thinking via the thinking object. This allows the model to 'reason' internally before generating a final answer. This is not just a hidden prompt; it is a distinct compute block where the model can perform chain-of-thought analysis, mathematical verification, and architectural planning.
When enabling thinking, you must set budget_tokens (minimum 1024). These tokens are consumed from your max_tokens limit. If you set max_tokens: 4096 and budget_tokens: 2048, the model has exactly 2048 tokens left to give you a physical response. If it spends its entire budget thinking, it will reach 'Max Tokens' before answering.
In 2026, Claude can now perform interleaved thinking — reasoning in between sequential tool calls. This allows the model to analyze tool outputs, adjust its strategy, and deliberate before making the next action. This is critical for complex multi-step agent workflows where each tool result changes the optimal next step.
For Opus 4.6, Anthropic introduced an effort parameter that controls the balance between reasoning thoroughness and speed. Set effort: "high" for complex analysis, or effort: "low" for straightforward tasks. This gives developers fine-grained control over cost vs. quality tradeoffs at the API level.