Automatic Prompt Caching
The Responses API automatically caches repeated prompt prefixes. If your requests share a long system prompt or common context, subsequent requests pay reduced input token costs for the cached portion.
How It Works
- The API detects when multiple requests share identical prefix content
- Cached tokens are billed at a discounted rate (typically 50% off)
- No configuration needed — it's automatic with the Responses API
- Cache typically persists for 5-10 minutes between requests
Maximizing Cache Hits
- Put static content first — system prompts, instructions, examples
- Put dynamic content last — user queries, variable data
- Keep system prompts identical across requests
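The guidelines above amount to one rule: build every request from the same byte-identical static block, with only the trailing user input varying. A minimal sketch (the payload shape and model name are illustrative, not a guaranteed SDK signature):

```python
# Sketch: keep the static prefix byte-identical across requests so the
# server-side cache can match it; only the trailing user input varies.
# (Field names and model name are illustrative assumptions.)

SYSTEM_PROMPT = (
    "You are a billing assistant.\n"
    "Rules: ...\n"  # long, static instructions and examples go here
)

def build_request(user_query: str) -> dict:
    return {
        "model": "gpt-4o-mini",          # example model name
        "instructions": SYSTEM_PROMPT,   # static content first
        "input": user_query,             # dynamic content last
    }

r1 = build_request("Why was I charged twice?")
r2 = build_request("Cancel my subscription.")
# The cacheable prefix (instructions) is identical across both requests:
assert r1["instructions"] == r2["instructions"]
```

Even a one-character difference in the static block (a timestamp, a request ID) breaks the prefix match, so keep anything variable out of it.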
Rate Limits & Tiers
| Tier | RPM | TPM | How to Upgrade |
|------|-----|-----|----------------|
| Free | 3 | 40K | — |
| Tier 1 | 500 | 200K | $5 paid |
| Tier 2 | 5,000 | 2M | $50+ paid, 7+ days |
| Tier 3 | 5,000 | 10M | $100+ paid, 7+ days |
| Tier 4+ | 10,000 | 50M+ | $250+ paid, 14+ days |
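When a request exceeds your tier's RPM or TPM limit, the API returns a rate-limit error (HTTP 429). The common handling pattern is jittered exponential backoff; here is a generic sketch (the exception type is a stand-in, since the exact SDK error class is not specified here):

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0):
    """Retry `call` on rate-limit-style errors with jittered exponential
    backoff: waits of roughly base*1, base*2, base*4, ... seconds."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for the SDK's rate-limit error class
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base * (2 ** attempt + random.random()))

# Usage (hypothetical client call):
#     with_backoff(lambda: client.responses.create(...))
```

Backoff smooths over brief bursts; sustained 429s mean you need a higher tier.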
🎯 Cost Formula: Total cost = (uncached input tokens × input rate) + (cached tokens × 0.5 × input rate) + (output tokens × output rate). Structure prompts to maximize cache hits.
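The formula above translates directly into code. The per-token rates below are illustrative placeholders, not actual pricing:

```python
def request_cost(uncached_in, cached_in, out_tokens,
                 input_rate, output_rate, cache_discount=0.5):
    """Cost of one request per the formula above; rates are $ per token.
    cache_discount=0.5 reflects the typical 50% cached-token discount."""
    return (uncached_in * input_rate
            + cached_in * cache_discount * input_rate
            + out_tokens * output_rate)

# Example: 1,000 uncached + 9,000 cached input tokens, 500 output tokens,
# at illustrative rates of $2.50/M input and $10.00/M output:
cost = request_cost(1_000, 9_000, 500,
                    input_rate=2.50 / 1e6, output_rate=10.00 / 1e6)
print(f"${cost:.6f}")  # → $0.018750
```

Note that without caching the same request would cost 10,000 × input rate for input; the 9,000 cached tokens here cut the input portion nearly in half.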