
Multimodal & Audio Content

📚 MCP in 2026 · 7 min · 120 XP

Beyond Text and Images

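Audio content blocks joined text and images in the MCP specification starting with the 2025-03-26 revision. They let MCP servers interface with voice analysis, transcription, and text-to-speech (TTS) APIs by returning audio as base64-encoded data tagged with a MIME type.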

Audio Content Blocks

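// Returning audio from a TTS tool.
// Assumes the MCP TypeScript SDK and zod. `ttsEngine` stands in for your own
// TTS client; its synthesize() is expected to resolve to a Node.js Buffer.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Server name and version are placeholders.
const server = new McpServer({ name: "tts-server", version: "1.0.0" });

server.tool("text_to_speech", "Convert text to speech",
  { text: z.string(), voice: z.string().optional() },
  async ({ text, voice }) => {
    const audioBuffer = await ttsEngine.synthesize(text, voice);
    return {
      content: [{
        type: "audio",
        data: audioBuffer.toString("base64"), // raw audio bytes, base64-encoded
        mimeType: "audio/wav"                 // must match the actual encoding of the bytes
      }]
    };
  }
);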
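
On the receiving side, a client has to decode the base64 payload before it can play or save the audio. The following is a minimal sketch of that step, assuming a Node.js runtime. The `AudioContent` interface only mirrors the fields used in the tool example above, and `extensionFor` and `decodeAudioBlock` are hypothetical helper names, not SDK functions.

```typescript
import { Buffer } from "node:buffer";

// Fields taken from the audio block in the tool example above.
interface AudioContent {
  type: "audio";
  data: string;      // base64-encoded audio bytes
  mimeType: string;  // e.g. "audio/wav"
}

// Hypothetical helper: map a MIME type to a file extension for saving.
function extensionFor(mimeType: string): string {
  const known: Record<string, string> = {
    "audio/wav": "wav",
    "audio/mpeg": "mp3",
    "audio/ogg": "ogg",
  };
  return known[mimeType] || "bin";
}

// Hypothetical helper: turn the base64 payload back into raw bytes.
function decodeAudioBlock(block: AudioContent): { bytes: Buffer; extension: string } {
  const bytes = Buffer.from(block.data, "base64");
  if (bytes.length === 0) {
    throw new Error("audio block contained no decodable data");
  }
  return { bytes, extension: extensionFor(block.mimeType) };
}

// Stand-in block built from the four ASCII bytes "RIFF" (the start of a WAV header).
const sample: AudioContent = {
  type: "audio",
  data: Buffer.from("RIFF").toString("base64"),
  mimeType: "audio/wav",
};
const decoded = decodeAudioBlock(sample);
console.log(decoded.bytes.length, decoded.extension);
```

From here the bytes could be written to disk or handed to an audio player; which one is appropriate depends on the host application.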

Content Block Types (2026)

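| Type     | Use Case                         | Format               |
|----------|----------------------------------|----------------------|
| text     | Responses, logs, data            | Plain text / Markdown |
| image    | Charts, screenshots, photos      | Base64 PNG/JPEG/WebP |
| audio    | TTS, voice analysis, recordings  | Base64 WAV/MP3/OGG   |
| resource | Embedded resource references     | URI + text/blob      |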
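
A consumer of tool results typically branches on the `type` field of each block. The sketch below models the four types from the table as a TypeScript discriminated union. These are simplified shapes written for illustration, not the SDK's own type definitions, and `base64ByteLength` and `describeBlock` are hypothetical helpers.

```typescript
// Simplified shapes for the four block types in the table above.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }
  | { type: "audio"; data: string; mimeType: string }
  | { type: "resource"; resource: { uri: string; mimeType?: string; text?: string; blob?: string } };

// Decoded size of a padded base64 string, accounting for "=" padding.
function base64ByteLength(data: string): number {
  const padding = data.endsWith("==") ? 2 : data.endsWith("=") ? 1 : 0;
  return (data.length / 4) * 3 - padding;
}

// Produce a one-line summary; the switch narrows `block` to the matching shape.
function describeBlock(block: ContentBlock): string {
  switch (block.type) {
    case "text":
      return `text (${block.text.length} chars)`;
    case "image":
    case "audio":
      return `${block.type} ${block.mimeType} (${base64ByteLength(block.data)} bytes)`;
    case "resource":
      return `resource ${block.resource.uri}`;
  }
}

// Example blocks; the URI and payloads are made up for the demonstration.
const blocks: ContentBlock[] = [
  { type: "text", text: "hello" },
  { type: "audio", data: "UklGRg==", mimeType: "audio/wav" },
  { type: "resource", resource: { uri: "file:///notes.txt", text: "agenda" } },
];
const summaries = blocks.map(describeBlock);
console.log(summaries.join("\n"));
```

Because image and audio blocks share the same `data` plus `mimeType` layout, they can be handled by one branch, with the MIME type deciding how the bytes are interpreted.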
💡 Key Insight: Audio support opens MCP to voice-first applications — imagine an AI assistant that can listen to a meeting recording via MCP, transcribe it, and create action items.
🧪 Knowledge Check
What new content modality was added to MCP in 2025-2026?
Video streaming
Audio content blocks
3D models
Holographic data
Multimodal & Audio Content | MCP in 2026 — MCP Academy