Humans don't jump straight to answers — we mutter to ourselves, scribble notes, and reason through problems. The Inner Monologue pattern gives agents the same capability.
Instead of the agent directly outputting actions, you create a structured format where the agent must write out its reasoning before deciding what to do:
## Agent Scratchpad
**Current Goal:** Find the user's order status
**What I Know:**
- User provided order ID: #12345
- I have access to the orders_db tool
**What I Need To Do:**
- Query the database for order #12345
- Check if the order has shipped
**My Confidence:** 9/10 — this is straightforward
**Decision:** Call orders_db.get_status("12345")
| Benefit | Mechanism | Impact |
|---|---|---|
| Reduced Errors | Chain-of-thought forces logical reasoning | 30-50% fewer tool call errors |
| Better Debugging | You can read the agent's reasoning | Find failures in minutes, not hours |
| Self-Monitoring | Confidence scores trigger escalation | Agent knows when to ask for help |
| Auditability | Full reasoning trail is logged | Compliance and post-mortem analysis |
System Prompt: "Before every action, write your reasoning inside <scratchpad> tags. Include: 1. Current sub-goal 2. Information gathered so far 3. Next planned action and why 4. Confidence level (1-10) </scratchpad> Then emit your action."
Claude's native Extended Thinking feature automates this pattern. By enabling thinking: {type: "enabled", budget_tokens: 4000}, Claude shows its reasoning in a dedicated thinking block before the final response — no custom prompting needed.
Use a smaller, cheap model (like Haiku) as the "inner monologue" step, then pass its analysis to the main model for the final decision. This separates reasoning cost from action cost.
| Feature | Custom Scratchpad | Extended Thinking |
|---|---|---|
| Setup | Requires prompt engineering | One parameter toggle |
| Visibility | Visible in output (can be parsed) | Separate thinking block (may not be cacheable) |
| Control | Full control over format | Model decides depth |
| Cost | Counts as output tokens | Separate thinking token budget |