
Caching & Conversation Compaction

🧠 Agent Memory · 10 min · 90 BASE XP

Keeping Agents Fast and Cheap

Without caching and compaction, every request re-sends the full conversation history: per-request cost grows linearly with conversation length, and cumulative cost grows quadratically with the number of turns. A 50-turn conversation can cost 100x what it should.
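To make that growth concrete, here is a back-of-the-envelope sketch. The 500-tokens-per-turn figure is an illustrative assumption, not a benchmark:

```javascript
// Illustrative cost-growth model. Assumption: each turn adds ~500 tokens,
// and every request re-sends the entire history so far.
const TOKENS_PER_TURN = 500;

// Total input tokens processed across `turns` requests with no compaction:
// request i re-sends i turns of history, so the sum is quadratic in `turns`.
function cumulativeTokens(turns) {
  let total = 0;
  for (let i = 1; i <= turns; i++) total += i * TOKENS_PER_TURN;
  return total;
}

// With compaction to a fixed-size context (say ~2,000 tokens), the same
// 50 turns process only 50 * 2000 = 100,000 tokens — growth is linear.
```

`cumulativeTokens(50)` works out to 637,500 tokens, versus 100,000 for a fixed 2,000-token compacted context — and prompt caching shrinks the paid portion further.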

Three Caching Strategies

| Strategy | How It Works | Savings | Trade-off |
| --- | --- | --- | --- |
| Prompt Caching | Cache the system prompt + tool definitions (Anthropic charges 90% less for cached prefixes) | 60-90% on repeated calls | Must maintain prefix stability |
| Result Caching | Cache tool results (e.g., same API call = cached response) | 100% for repeated queries | Stale data risk |
| Embedding Caching | Cache query embeddings to skip re-embedding identical queries | 50-70% on embedding costs | Cache invalidation complexity |
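The "Result Caching" row can be sketched as a TTL-bounded memo over tool calls. Class and helper names here are illustrative; a Redis-backed version has the same shape:

```javascript
// Sketch of an in-memory tool-result cache with a TTL.
// Assumption: tool results are JSON-serializable and safe to reuse
// within the TTL window (the "stale data risk" trade-off above).
class ToolResultCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.store = new Map(); // key -> { value, expiresAt }
  }

  key(toolName, args) {
    // Deterministic key: same tool + same args = same entry.
    // (JSON.stringify is order-sensitive; fine for a sketch.)
    return `${toolName}:${JSON.stringify(args)}`;
  }

  get(toolName, args) {
    const k = this.key(toolName, args);
    const entry = this.store.get(k);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(k); // stale: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(toolName, args, value) {
    this.store.set(this.key(toolName, args), {
      value,
      expiresAt: Date.now() + this.ttlMs,
    });
  }
}

// Wrap a tool call so repeated identical queries skip execution entirely.
async function cachedToolCall(cache, toolName, args, execute) {
  const hit = cache.get(toolName, args);
  if (hit !== undefined) return hit;
  const result = await execute(args);
  cache.set(toolName, args, result);
  return result;
}
```

The second identical call returns the cached value without invoking the tool — that is the "100% for repeated queries" saving, bounded by the TTL you choose.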

Conversation Compaction

When a conversation exceeds 80% of the context window, compact it:

// Conversation compaction: summarize older messages, keep recent ones.
// Before: 120 messages (~80K tokens)
// After:  1 summary (~2K tokens) + last 10 messages
// Assumes helpers defined elsewhere: tokenCount(messages), MAX_TOKENS
// (the model's context limit), and llm.generate(), which is assumed
// to return the generated text as a string.

async function compactConversation(messages) {
  // Only compact once we cross 80% of the context window.
  if (tokenCount(messages) < MAX_TOKENS * 0.8) return messages;

  const oldMessages = messages.slice(0, -10);   // everything but the last 10
  const recentMessages = messages.slice(-10);   // kept verbatim

  const summary = await llm.generate({
    system: "Summarize this conversation. Keep ALL key decisions, facts, and action items. Be thorough.",
    user: JSON.stringify(oldMessages)
  });

  return [
    { role: "system", content: `Previous conversation summary: ${summary}` },
    ...recentMessages
  ];
}
🚧 Warning: Compaction is lossy. Important details CAN be lost in summarization. Always include an explicit instruction in the summary prompt: "Keep ALL key decisions, user preferences, and commitments. When in doubt, include the detail."
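Compaction also interacts with prompt caching: the cached prefix (system prompt + tool definitions) must stay byte-identical across calls, so compact only the message list, never the prefix. A minimal sketch of the request shape, assuming an Anthropic-style `cache_control` breakpoint (the model id is a placeholder):

```javascript
// Build a request whose stable prefix is marked cacheable.
// Assumption: Anthropic-style payload; the model id is a placeholder.
function buildCachedRequest(systemPrompt, tools, messages) {
  return {
    model: "claude-model-placeholder",
    max_tokens: 1024,
    tools, // tool definitions are part of the cached prefix
    system: [
      {
        type: "text",
        text: systemPrompt,
        // Breakpoint: everything up to and including this block is cached,
        // so it must stay byte-identical across calls.
        cache_control: { type: "ephemeral" },
      },
    ],
    messages, // the volatile part — compacted messages go here
  };
}
```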

Cost Optimization Matrix

| Technique | Implementation Effort | Typical Savings |
| --- | --- | --- |
| Prompt Caching (Anthropic) | Low (add cache_control breakpoints) | 60-90% |
| Conversation Compaction | Medium (summarization logic) | 40-70% |
| Tool Result Caching | Low (Redis/in-memory cache) | 20-50% |
| Model Routing (Haiku for easy tasks) | Medium (classifier needed) | 50-80% |
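The model-routing row can be sketched with a cheap heuristic classifier; in practice you might use a small model as the classifier instead. The model ids, keyword signals, and length threshold below are all illustrative assumptions:

```javascript
// Sketch of heuristic model routing: cheap tasks go to a small model
// (e.g. Haiku), complex ones to a larger model. Ids are placeholders.
const SMALL_MODEL = "small-model-placeholder";
const LARGE_MODEL = "large-model-placeholder";

function routeModel(task) {
  // Illustrative complexity signals; tune these against real traffic.
  const hardSignals = [/analyze/i, /plan/i, /multi-step/i, /debug/i];
  const looksHard =
    task.length > 400 || hardSignals.some((re) => re.test(task));
  return looksHard ? LARGE_MODEL : SMALL_MODEL;
}
```

The savings come from the traffic mix: if most turns are simple and a small model costs a fraction of the large one, routing cuts the blended cost sharply — but a misrouting classifier shows up directly as quality loss, so measure both.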