Anthropic's Prompt Caching lets developers reuse large, stable prefixes (such as system instructions or tool definitions) across requests, so the model does not reprocess them every time. Unlike automatic caching systems, Anthropic requires explicit markers: you attach a cache_control object set to {"type": "ephemeral"} to the content blocks you want to mark as cache breakpoints.
A single API request can contain a maximum of 4 cache breakpoints. This limit forces developers to be strategic: typically, you would cache your system prompt at breakpoint 1, your tool definitions at breakpoint 2, and perhaps a large set of reference "knowledge documents" at breakpoint 3. This leaves the final user turn volatile while keeping the heavy, repetitive context warm in the cache.
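As a rough sketch of that layout, the helper below assembles a Messages API payload with three breakpoints; the model id, prompt text, and tool definition are placeholders, and the exact field shapes should be checked against Anthropic's current API reference. Note that cache_control goes on the *last* block of each section you want cached as a prefix.

```python
def build_request(system_text, tools, documents, user_query):
    """Assemble a request dict with up to three cache breakpoints.

    Placeholder sketch: model id and content are illustrative only.
    """
    # Breakpoint 1: cache the large, stable system prompt.
    system = [{
        "type": "text",
        "text": system_text,
        "cache_control": {"type": "ephemeral"},
    }]

    # Breakpoint 2: cache the tool definitions by marking the last tool,
    # which caches everything up to and including that block.
    tools = [dict(t) for t in tools]
    if tools:
        tools[-1]["cache_control"] = {"type": "ephemeral"}

    # Breakpoint 3: cache the reference documents; the final user turn
    # appended after them carries no marker, so it stays volatile.
    content = [{"type": "text", "text": doc} for doc in documents]
    if content:
        content[-1]["cache_control"] = {"type": "ephemeral"}
    content.append({"type": "text", "text": user_query})

    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model id
        "max_tokens": 1024,
        "system": system,
        "tools": tools,
        "messages": [{"role": "user", "content": content}],
    }
```

Because only the last user-turn block is unmarked, repeated requests that share the same system prompt, tools, and documents hit the cached prefix and pay full input cost only for the new query.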