
How Context Caching Works


Slashing Costs by 70%

When you cache a large prompt (such as a full codebase or a 1-hour video), Vertex AI processes the input once and stores the resulting Key-Value (KV) attention cache in memory.

Subsequent queries against that cached content skip the initial processing phase. This results in:

  • Up to 70% lower input token costs.
  • Significantly reduced time-to-first-token (TTFT), since the cached prefix is not re-processed.
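To see where the savings come from, here is a back-of-the-envelope cost sketch. The per-token price is a made-up placeholder (not published Vertex AI pricing), and the 70% discount is simply the figure quoted above; real billing also adds a per-hour cache storage charge that is omitted here.

```python
# Illustrative cost model for context caching. All prices are placeholders.
CACHED_PREFIX_TOKENS = 500_000   # e.g. a 1-hour video prompt
QUERY_TOKENS = 200               # small follow-up question per request
PRICE_PER_TOKEN = 1.0e-6         # hypothetical $/input token
CACHE_DISCOUNT = 0.70            # the "up to 70% lower input token costs" above

def cost_without_cache(n_queries: int) -> float:
    # The full prefix is billed at the full rate on every request.
    return n_queries * (CACHED_PREFIX_TOKENS + QUERY_TOKENS) * PRICE_PER_TOKEN

def cost_with_cache(n_queries: int) -> float:
    # Cached prefix tokens are billed at the discounted rate;
    # only the new query tokens pay the full rate.
    per_query = CACHED_PREFIX_TOKENS * (1 - CACHE_DISCOUNT) + QUERY_TOKENS
    return n_queries * per_query * PRICE_PER_TOKEN

print(cost_without_cache(100))  # -> ~50.02
print(cost_with_cache(100))     # -> ~15.02
```

At 100 queries against the same 500k-token prefix, the cached path costs roughly 30% of the uncached path, and the gap widens as the prefix grows relative to the per-query text.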
import datetime
import vertexai
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel, Part

vertexai.init(project="your-project", location="us-central1")  # placeholders

# Cache a massive 1-hour video (minimum 32k input tokens required)
video_part = Part.from_uri("gs://your-bucket/lecture.mp4", mime_type="video/mp4")
cache = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    system_instruction="You are a video analyst.",
    contents=[video_part],
    ttl=datetime.timedelta(minutes=60),
)

# Subsequent queries reuse the cached prefix instead of re-sending it
model = GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize the key events in the video.")
SYNAPSE VERIFICATION
QUERY 1 // 2

What is the primary benefit of Context Caching in Vertex AI?

  • It trains the model on your data
  • It significantly reduces latency and cost when repeatedly querying massive prompts
  • It generates better images
  • It automatically translates the prompt into 50 languages
Google Vertex AI Academy | Free Interactive Course | Infinity AI