Thresholds, Costs, ZDR

Prompt Caching Framework | 15 min | 400 BASE XP

Economic and Technical Limits

Caching is not free: the first request that writes a context to the cache pays a write premium (25% above the base input-token price for the default 5-minute cache). Every subsequent read of that cached prefix is billed at a 90% discount. To justify the overhead, Anthropic enforces a minimum cacheable length, given here as >1,000 tokens (unified across all models as of mid-2026). If the marked context is shorter than this, the request is processed normally and the cache marker (cache_control) is simply ignored: nothing is cached and no error is raised.
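The arithmetic behind that trade-off can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a billing formula: the 1.25x write multiplier assumes the default 5-minute cache, the threshold constant is the figure quoted in this lesson, every read is assumed to arrive before the cache expires, and prefix_cost and base_price_per_token are hypothetical names introduced only for illustration.

```python
# Assumed multipliers, relative to the base input-token price.
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: 25% write premium (default 5-minute cache)
CACHE_READ_MULTIPLIER = 0.10   # 90% discount on cache reads
MIN_CACHEABLE_TOKENS = 1000    # threshold as quoted in this lesson; check it for your model


def prefix_cost(prefix_tokens: int, requests: int, base_price_per_token: float):
    """Return (cost_without_cache, cost_with_cache) for the shared prefix only.

    Hypothetical helper: it ignores output tokens and any uncached suffix,
    and assumes each read lands before the cache entry expires.
    """
    uncached = prefix_tokens * requests * base_price_per_token
    if requests == 0 or prefix_tokens <= MIN_CACHEABLE_TOKENS:
        # Below the minimum the cache marker is ignored: no premium, no discount.
        return uncached, uncached
    write = prefix_tokens * CACHE_WRITE_MULTIPLIER * base_price_per_token
    reads = (requests - 1) * prefix_tokens * CACHE_READ_MULTIPLIER * base_price_per_token
    return uncached, write + reads


if __name__ == "__main__":
    # Price of 1.0 per token keeps the output in "token-equivalents".
    for n in (1, 2, 10):
        plain, cached = prefix_cost(10_000, n, 1.0)
        print(f"{n:>2} requests: uncached={plain:.0f} cached={cached:.0f}")
```

Under these assumptions a single request costs more with caching (1.25x the prefix), but the second request already tips the balance: 1.25 + 0.10 = 1.35x the prefix price versus 2.0x without caching.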

Lifecycle & Compliance

Caches are ephemeral, with a default TTL (time to live) of 5 minutes. Every cache read resets the TTL timer, so a prefix that is reused at least once every 5 minutes stays warm. For enterprises focused on security, the system is compatible with Zero Data Retention (ZDR): cached data is held only temporarily within the inference boundary and is discarded when the entry expires, so the cache itself generates no persistent logs.
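The reset-on-read behavior is easiest to see with a toy model. The sketch below is only a client-side way of reasoning about whether a prefix is still warm; it is not how the service is implemented, and EphemeralCacheEntry is a hypothetical class invented for this illustration. Timestamps are plain seconds passed in by the caller so the example stays deterministic.

```python
class EphemeralCacheEntry:
    """Toy model of a cache entry whose TTL resets on every read (illustrative only)."""

    def __init__(self, created_at: float, ttl_seconds: float = 300.0):
        # 300 seconds mirrors the default 5-minute TTL described above.
        self.ttl_seconds = ttl_seconds
        self.expires_at = created_at + ttl_seconds

    def read(self, now: float) -> bool:
        """Return True on a hit (and refresh the TTL), False if the entry has expired."""
        if now >= self.expires_at:
            return False
        self.expires_at = now + self.ttl_seconds
        return True


if __name__ == "__main__":
    entry = EphemeralCacheEntry(created_at=0.0)
    print(entry.read(now=240.0))  # read inside the first 5-minute window
    print(entry.read(now=500.0))  # still inside the window refreshed at t=240
    print(entry.read(now=840.0))  # more than 5 minutes after the last read
```

In this model a miss simply means the next request would pay the write premium again to recreate the entry.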

SYNAPSE VERIFICATION
QUERY 1 // 3
If you send a 500-token prompt with a cache header to Sonnet, what happens?
It caches successfully
The system ignores the header because the context is below the >1,000-token minimum
It costs double tokens
It throws a 400 error
Thresholds, Costs, ZDR | Prompt Caching Framework — Claude Academy