Server-Side Context Compaction
Context Compaction (Beta, 2026) is a server-side feature that automatically summarizes older parts of a conversation as it approaches the context window limit. This effectively extends the usable context well beyond the raw window size for long-running agent sessions, at the cost of some detail in the summarized portions.
How It Works
- Monitoring: Anthropic's infrastructure monitors the conversation's token usage in real-time.
- Triggering: When usage exceeds ~80% of the context window, compaction is triggered.
- Summarization: Older messages are replaced with a dense, LLM-generated summary that preserves key decisions, facts, and action items.
- Continuation: The conversation continues seamlessly with the compacted context + recent messages.
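The steps above can be sketched client-side. This is a hypothetical illustration, not Anthropic's actual server implementation: the token estimator, the `summarize` stub, and the constants (`CONTEXT_LIMIT`, `TRIGGER_RATIO`, `KEEP_RECENT`) are all assumptions for the sketch.

```python
# Hypothetical sketch of the monitor -> trigger -> summarize -> continue
# flow. The real feature runs on Anthropic's infrastructure; here the
# summarizer is a stub and token counting is a rough heuristic.

CONTEXT_LIMIT = 200_000   # assumed context window, in tokens
TRIGGER_RATIO = 0.8       # compaction fires at ~80% usage
KEEP_RECENT = 4           # most recent messages left uncompacted

def estimate_tokens(messages):
    """Rough stand-in for real token counting: ~4 characters per token."""
    return sum(len(m["content"]) // 4 for m in messages)

def summarize(messages):
    """Stand-in for the dense, LLM-generated summary of older turns."""
    return {"role": "user",
            "content": f"[Summary of {len(messages)} earlier messages]"}

def compact_if_needed(messages):
    """Replace older messages with a summary once usage crosses the threshold."""
    if estimate_tokens(messages) <= CONTEXT_LIMIT * TRIGGER_RATIO:
        return messages  # under threshold: conversation continues untouched
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    return [summarize(older)] + recent
```

The key property to notice: recent messages survive verbatim, so the model keeps full fidelity on the active part of the conversation while older turns collapse into one summary message.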
Developer vs Server Compaction
| Approach | Who Manages | Summary Control | Best For |
|---|---|---|---|
| Manual (client-side) | Your code | Full control over summary quality | Production agents needing deterministic summaries |
| Automatic (server-side) | Anthropic | Transparent; handled in the background | Rapid prototyping, long chat sessions, Managed Agents |
🚧 Important: Server-side compaction is lossy by nature. For applications where every detail matters (legal, medical), implement your own compaction logic with explicit preservation rules rather than relying on automatic summarization.
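One way to implement the "explicit preservation rules" recommended above is to exempt messages matching domain-critical patterns from summarization. This is a minimal sketch under stated assumptions: the patterns, the summarizer stub, and `compact_with_rules` are illustrative, not a production rule set.

```python
import re

# Hypothetical client-side compaction with explicit preservation rules.
# Messages matching any pattern survive verbatim; only routine turns
# are collapsed into a summary. Patterns are illustrative assumptions.

PRESERVE_PATTERNS = [
    re.compile(r"\bDECISION:", re.IGNORECASE),  # recorded decisions
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),       # ISO dates (deadlines)
    re.compile(r"[$\u20ac\u00a3]\s?\d"),        # monetary amounts
]

def must_preserve(message):
    """True if the message matches any preservation rule."""
    return any(p.search(message["content"]) for p in PRESERVE_PATTERNS)

def compact_with_rules(messages, keep_recent=4):
    """Summarize only routine older messages; keep rule matches verbatim."""
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    preserved = [m for m in older if must_preserve(m)]
    dropped = len(older) - len(preserved)
    summary = {"role": "user",
               "content": f"[Summary of {dropped} routine messages]"}
    return [summary] + preserved + recent
```

Because preserved messages pass through untouched, this approach trades some token savings for a guarantee that rule-matched details are never lost to summarization, which is the property automatic server-side compaction cannot give you.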