Speech Services & Voice Live

🔧 Foundry Tools (AI Services) 7 min 60 BASE XP

Speech-to-Text & Real-Time Voice

Foundry's Speech services enable voice-powered AI applications with high-quality transcription and synthesis.

Speech Capabilities

Service | Function | Key Features
--------------------------------------------------
Speech-to-Text | Transcribe audio to text | Real-time & batch, 100+ languages, custom models
Text-to-Speech | Convert text to natural speech | 400+ neural voices, custom voice cloning
Voice Live | Real-time speech-to-speech | Fully managed runtime, noise suppression, barge-in (New in 2026)
Speaker Recognition | Identify speakers by voice | Verification and identification modes

Building Voice-Enabled Agents

Combine Speech services with the Agent Service to build voice-controlled AI assistants. With the 2026 Voice Live integration, the flow reduces to four steps:

  1. User speaks → Voice Live captures the audio and handles noise suppression natively
  2. Direct integration → The audio is sent to a Foundry Agent (e.g. GPT-4o Audio) for processing
  3. Agent responds → Voice Live streams the synthesized speech immediately
  4. User interrupts ("barge-in") → Voice Live handles the interruption seamlessly
🎯 Pro Tip: For interactive conversational agents, use the fully managed Voice Live runtime rather than building custom STT/TTS pipelines. The runtime natively handles complex edge cases such as user interruptions ("barge-in") and echo cancellation.
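
To make the turn-taking behind these steps concrete, here is a minimal sketch of the conversation loop as a small state machine. It is a toy model for illustration only and does not call the Voice Live API: the class name `VoiceTurnLoop`, the state names, and the event strings are all hypothetical, and a managed runtime would handle these transitions for you.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()  # step 1: capturing the user's audio
    THINKING = auto()   # step 2: the agent is processing the utterance
    SPEAKING = auto()   # step 3: the synthesized reply is streaming


@dataclass
class VoiceTurnLoop:
    """Toy turn-taking model (hypothetical; not the Voice Live API)."""

    state: State = State.LISTENING
    log: list = field(default_factory=list)

    def on_event(self, event: str) -> State:
        """Advance the loop for one event; unknown events are ignored."""
        if self.state is State.LISTENING and event == "user_speech_end":
            self.state = State.THINKING
        elif self.state is State.THINKING and event == "agent_reply_ready":
            self.state = State.SPEAKING
        elif self.state is State.SPEAKING and event == "user_speech_start":
            # Step 4, barge-in: cancel playback and listen to the user again.
            self.log.append("playback_cancelled")
            self.state = State.LISTENING
        elif self.state is State.SPEAKING and event == "playback_done":
            self.state = State.LISTENING
        return self.state


def run_demo() -> VoiceTurnLoop:
    """Simulate one turn in which the user interrupts the agent's reply."""
    loop = VoiceTurnLoop()
    for event in ("user_speech_end", "agent_reply_ready", "user_speech_start"):
        loop.on_event(event)
    return loop


if __name__ == "__main__":
    demo = run_demo()
    print(demo.state.name, demo.log)  # LISTENING ['playback_cancelled']
```

The key transition is the one out of `SPEAKING`: when the user starts talking mid-reply, playback is cancelled and the loop returns to `LISTENING`. A custom STT/TTS pipeline would have to implement and tune this edge case by hand.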
FOUNDRY VERIFICATION
QUERY 1 // 2
What is the primary advantage of the new Voice Live runtime for agents?
It provides video streaming
It natively handles real-time speech-to-speech with features like barge-in and noise suppression
It translates text faster
It reduces token costs by 50%