The Gemini Live API enables real-time, bidirectional streaming of audio, video, and text over a persistent WebSocket connection. It is designed for interactive applications such as voice assistants, live translation, video analysis, and collaborative coding.
| Feature | Description |
|---|---|
| Audio Streaming | Send and receive audio in real time for voice conversations |
| Video Streaming | Stream camera or screen feeds for live visual analysis |
| Text Streaming | Real-time text generation with token-level streaming |
| Barge-in | Users can interrupt the model mid-response; it stops its output and responds to the new input |
| 70+ Languages | Native multilingual support for global voice applications |
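```python
import asyncio
import base64
import json
import os

import websockets

# Assumption: the API key is read from a GEMINI_API_KEY environment variable
# and passed as a query parameter; adjust this to your authentication setup.
URI = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
    f"?key={os.environ.get('GEMINI_API_KEY', '')}"
)


async def live_session(pcm_bytes: bytes) -> None:
    async with websockets.connect(URI) as ws:
        # Send the setup message first: it selects the model and output voice
        await ws.send(json.dumps({
            "setup": {
                "model": "models/gemini-3.5-flash",
                "generation_config": {
                    "response_modalities": ["AUDIO"],
                    "speech_config": {"voice_config": {"prebuilt_voice_config": {"voice_name": "Puck"}}},
                },
            }
        }))
        # Send one base64-encoded audio chunk; a real client sends many in a loop
        audio_b64 = base64.b64encode(pcm_bytes).decode("ascii")
        await ws.send(json.dumps({
            "realtime_input": {"media_chunks": [{"data": audio_b64, "mime_type": "audio/pcm"}]}
        }))
        # Receive one server message (the first is typically a setup
        # acknowledgement); a real client keeps reading until the turn ends
        response = await ws.recv()
        print(json.loads(response))


if __name__ == "__main__":
    # Placeholder input: 100 ms of silence, assuming 16 kHz, 16-bit mono PCM
    asyncio.run(live_session(b"\x00\x00" * 1600))
```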
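The example above sends a single audio chunk. For continuous streaming, a client would typically split captured audio into short chunks and send one `realtime_input` message per chunk. The sketch below shows one way to build those messages; the helper name `realtime_input_messages`, the 100 ms chunk size, and the 16 kHz, 16-bit mono PCM format are assumptions for illustration, not values taken from the text above.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed input sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM


def realtime_input_messages(pcm: bytes, chunk_ms: int = 100) -> Iterator[str]:
    """Yield one JSON-encoded realtime_input message per audio chunk.

    Hypothetical helper: the message shape mirrors the example above.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "realtime_input": {
                "media_chunks": [{
                    "data": base64.b64encode(chunk).decode("ascii"),
                    "mime_type": "audio/pcm",
                }]
            }
        })
```

Inside an open session, each message could then be sent with `await ws.send(message)` as it is produced, so the model starts receiving audio before the user has finished speaking.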
Barge-in allows users to interrupt the model while it is still generating a response. The model detects the interruption, stops its current output, and immediately processes the new input, which keeps the exchange close to a natural conversation.
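On the client side, barge-in usually also means discarding audio that was received but not yet played, so the user does not keep hearing the old response. The following is a minimal sketch of that idea. It assumes server messages arrive as JSON objects with a `serverContent` field carrying either an `interrupted` flag or `modelTurn` parts with base64 `inlineData`; treat those field names, and the `PlaybackBuffer` class itself, as assumptions to check against the API reference.

```python
import base64
from collections import deque


class PlaybackBuffer:
    """Holds decoded audio chunks waiting to be played (hypothetical helper)."""

    def __init__(self) -> None:
        self.chunks = deque()

    def handle_server_message(self, message: dict) -> bool:
        """Queue audio from a message, or flush the queue on interruption.

        Returns True if the message signalled an interruption.
        """
        content = message.get("serverContent", {})
        if content.get("interrupted"):
            # The user barged in: drop audio that has not been played yet.
            self.chunks.clear()
            return True
        # Assumed response shape: audio arrives as base64 inline data parts.
        for part in content.get("modelTurn", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline and "data" in inline:
                self.chunks.append(base64.b64decode(inline["data"]))
        return False
```

A receive loop would pass each parsed message to `handle_server_message` and stop the audio output device whenever it returns `True`, while a separate playback task drains `chunks`.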