
Prompt Caching & Cost Control

💰 Production & Cost Optimization 12 min 250 BASE XP

Automatic Prompt Caching

The Responses API automatically caches repeated prompt prefixes. If your requests share a long system prompt or common context, subsequent requests pay reduced input token costs for the cached portion.

How It Works

  • The API detects when multiple requests share identical prefix content
  • Cached tokens are billed at a discounted rate (typically 50% off)
  • No configuration needed — it's automatic with the Responses API
  • Cache typically persists for 5-10 minutes between requests
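The prefix matching described above can be illustrated with a toy sketch. The real matching happens server-side on token sequences, not characters, and the names here are hypothetical; the point is simply that only identical *leading* content earns the discount:

```python
def shared_prefix_chars(a: str, b: str) -> int:
    """Length of the identical leading run of two strings (a toy,
    character-level stand-in for the server-side token-prefix match)."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

SYSTEM = "You are a support assistant for Acme Corp. Always answer politely.\n"

req_1 = SYSTEM + "User question: How do I reset my password?"
req_2 = SYSTEM + "User question: Where is my invoice?"

# The shared prefix covers the entire system prompt, so a second request
# arriving within the cache window can reuse that cached portion.
assert shared_prefix_chars(req_1, req_2) >= len(SYSTEM)
```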

Maximizing Cache Hits

  1. Put static content first — system prompts, instructions, examples
  2. Put dynamic content last — user queries, variable data
  3. Keep system prompts identical across requests
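A minimal sketch of the static-first layout, assuming the `openai` Python SDK's Responses API (`client.responses.create` with `instructions` and `input` parameters). The payload is built as a plain dict here so the structure is visible without making a network call; the model name and prompt text are placeholders:

```python
STATIC_INSTRUCTIONS = (
    "You are a billing assistant. Follow these rules when answering.\n"
    "Example 1: ...\n"
    "Example 2: ..."
)

def build_request(user_query: str) -> dict:
    """Keyword arguments for client.responses.create: static content
    first (identical across requests, so it stays cacheable), the
    dynamic per-request query last."""
    return {
        "model": "gpt-4o-mini",               # placeholder model choice
        "instructions": STATIC_INSTRUCTIONS,  # static prefix -> cache hits
        "input": user_query,                  # dynamic content last
    }

a = build_request("Why was I charged twice?")
b = build_request("Can I get a refund?")

# Only the trailing user input differs between requests,
# so the long shared prefix remains eligible for caching.
assert a["instructions"] == b["instructions"]
```

With this layout you would call `client.responses.create(**build_request(query))`; keeping `STATIC_INSTRUCTIONS` byte-identical across requests is what makes the prefix reusable.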

Rate Limits & Tiers

Tier      RPM      TPM     How to Upgrade
Free      3        40K
Tier 1    500      200K    $5 paid
Tier 2    5,000    2M      $50+ paid, 7+ days
Tier 3    5,000    10M     $100+ paid, 7+ days
Tier 4+   10,000   50M+    $250+ paid, 14+ days
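When you exceed the RPM/TPM ceilings above, the API rejects requests with a rate-limit error; the standard mitigation is exponential backoff with jitter. A minimal sketch (the `RateLimitError` class here is a local stand-in for `openai.RateLimitError`, and the retry helper itself is hypothetical, not part of the SDK):

```python
import random
import time

class RateLimitError(Exception):
    """Local stand-in for openai.RateLimitError."""

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Exponential backoff plus jitter to avoid retry stampedes.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

In practice you would pass a closure over your API call, e.g. `call_with_backoff(lambda: client.responses.create(**kwargs))`.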
🎯 Cost Formula: Total cost = (Uncached input tokens × rate) + (Cached tokens × 0.5 × rate) + (Output tokens × rate). Structure prompts for maximum cache hits.
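The cost formula above, expressed as a small helper. The per-token rates used in the example are placeholder numbers for illustration, not real pricing:

```python
def request_cost(uncached_input, cached_input, output_tokens,
                 input_rate, output_rate, cache_discount=0.5):
    """Total cost = (uncached input x rate) + (cached input x 0.5 x rate)
    + (output tokens x output rate), matching the formula above."""
    return (uncached_input * input_rate
            + cached_input * cache_discount * input_rate
            + output_tokens * output_rate)

# Placeholder rates: $2.50 per 1M input tokens, $10 per 1M output tokens.
IN_RATE = 2.50 / 1_000_000
OUT_RATE = 10.00 / 1_000_000

# A 10K-token prompt where 9K tokens hit the cache, plus 500 output tokens:
cost = request_cost(1_000, 9_000, 500, IN_RATE, OUT_RATE)
# 0.0025 + 0.01125 + 0.005 = $0.01875 with these placeholder rates
```

Note how the cached 9K tokens cost half what they would uncached, which is why prompt structure (static prefix first) directly shows up on the bill.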
SYNAPSE VERIFICATION
QUERY 1 // 3
How do you enable prompt caching in the Responses API?
Set cache: true
It's automatic — no configuration needed
Use a special header
Pay for a premium plan