Caching is not free; it involves a write premium during the initial serialization of the context. However, all subsequent 'reads' of that cache hit a massive 90% discount. To justify the overhead, Anthropic enforces minimum token thresholds: 1024 tokens for Claude Sonnet 4.6 and 2048 tokens for Claude Opus 4.7. If your context is smaller than this, the cache header is simply ignored.
Caches are ephemeral and have a default TTL (Time To Live) of 5 minutes. Every time a cache is 'read', the TTL timer resets. For enterprises focused on security, this system is fully compatible with Zero Data Retention (ZDR)—the cached bits are held recursively in the inference boundary and vaporize immediately upon expiry, ensuring no persistent logs are generated natively.