A single agent task might cost $0.05. At 10,000 tasks/day, that's $500/day or $180K/year. Cost optimization isn't optional — it's survival.
| Technique | Savings | Complexity | How It Works |
|---|---|---|---|
| Model Routing | 50-80% | Medium | Use Haiku for simple tasks, Sonnet for complex, Opus for critical |
| Prompt Caching | 60-90% | Low | Cache static prefixes (Anthropic reduces cached token cost by 90%) |
| Tool Result Caching | 20-50% | Low | Cache identical tool calls (same query = cached result) |
| Batch Processing | 50% | Low | Use Batch API for non-real-time tasks (Anthropic: 50% off) |
| Context Compaction | 40-70% | Medium | Summarize old messages, keep recent ones |
| Iteration Caps | Variable | Low | Hard limit on agent loops (prevent infinite spinning) |
// Route tasks to the cheapest capable model:
async function routeToModel(task) {
const complexity = await classifyComplexity(task); // Use Haiku to classify
switch (complexity) {
case "simple": return { model: "haiku", maxTokens: 1024 }; // ~$0.001
case "moderate": return { model: "sonnet", maxTokens: 4096 }; // ~$0.01
case "complex": return { model: "opus", maxTokens: 8192 }; // ~$0.10
}
}