The 2025-03-26 revision of the MCP specification added audio content blocks alongside the existing text and image types. This lets MCP servers wrap voice-analysis, transcription, and text-to-speech (TTS) APIs and return audio directly in tool results.
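```typescript
// Returning audio from a TTS tool.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Hypothetical TTS client; substitute your own engine or vendor SDK.
declare const ttsEngine: {
  synthesize(text: string, voice?: string): Promise<Buffer>;
};

const server = new McpServer({ name: "tts-server", version: "1.0.0" });

server.tool(
  "text_to_speech",
  "Convert text to speech",
  { text: z.string(), voice: z.string().optional() },
  async ({ text, voice }) => {
    const audioBuffer = await ttsEngine.synthesize(text, voice);
    return {
      content: [{
        type: "audio",
        data: audioBuffer.toString("base64"), // binary audio must be base64-encoded
        mimeType: "audio/wav",
      }],
    };
  }
);
```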
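On the client side, an audio block arrives as a base64 string plus a MIME type, so consuming it means filtering the result's `content` array and decoding the payload. The sketch below assumes a Node.js runtime (for `Buffer`) and uses simplified, hand-written interfaces rather than the SDK's own types; `extractAudio` is a hypothetical helper name.

```typescript
// Simplified shapes of two MCP content blocks (hand-written for illustration,
// not imported from the SDK).
interface AudioContent {
  type: "audio";
  data: string;     // base64-encoded audio bytes
  mimeType: string; // e.g. "audio/wav"
}

interface TextContent {
  type: "text";
  text: string;
}

type ToolContent = AudioContent | TextContent;

// Hypothetical helper: keep only audio blocks and decode each payload to raw bytes.
function extractAudio(content: ToolContent[]): { mimeType: string; bytes: Buffer }[] {
  return content
    .filter((block): block is AudioContent => block.type === "audio")
    .map((block) => ({
      mimeType: block.mimeType,
      bytes: Buffer.from(block.data, "base64"),
    }));
}

// Usage: a fake tool result with a caption and four bytes of "audio"
// (the ASCII header "RIFF" that every WAV file starts with).
const result: ToolContent[] = [
  { type: "text", text: "Synthesized 1 clip" },
  { type: "audio", data: Buffer.from([82, 73, 70, 70]).toString("base64"), mimeType: "audio/wav" },
];

const clips = extractAudio(result);
console.log(clips.length, clips[0].mimeType, clips[0].bytes.toString("ascii")); // prints: 1 audio/wav RIFF
```

In a real client the decoded bytes would typically be written to a file or handed to an audio player chosen by `mimeType`.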
| Type | Use Case | Format |
|---|---|---|
| `text` | Responses, logs, data | Plain text / Markdown |
| `image` | Charts, screenshots, photos | Base64 data + `mimeType` (e.g. PNG, JPEG, WebP) |
| `audio` | TTS output, voice analysis, recordings | Base64 data + `mimeType` (e.g. WAV, MP3, OGG) |
| `resource` | Embedded resource contents | `uri` plus `text` or base64 `blob` |
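Because every block carries a `type` discriminator, the four kinds in the table map naturally onto a TypeScript discriminated union. The following is an illustrative sketch: `ContentBlock`, `describe`, and `base64ByteLength` are hypothetical names, and the shapes are simplified from the table rather than copied from the SDK's exported types.

```typescript
// Simplified content-block shapes mirroring the table (illustrative, not the SDK's types).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }
  | { type: "audio"; data: string; mimeType: string }
  | {
      type: "resource";
      resource: { uri: string; mimeType?: string } & ({ text: string } | { blob: string });
    };

// Decoded size of a padded base64 string (assumes no whitespace or line breaks).
function base64ByteLength(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return (b64.length * 3) / 4 - padding;
}

// One-line summary of a block, e.g. for logging tool results.
function describe(block: ContentBlock): string {
  switch (block.type) {
    case "text":
      return `text (${block.text.length} chars)`;
    case "image":
    case "audio":
      return `${block.type} ${block.mimeType} (${base64ByteLength(block.data)} bytes)`;
    case "resource":
      return `resource ${block.resource.uri}`;
  }
}

// Usage: one block of each kind. "iVBORw==" is the 4-byte PNG signature prefix,
// "UklGRg==" is the 4-byte WAV "RIFF" header.
const blocks: ContentBlock[] = [
  { type: "text", text: "Quarterly summary" },
  { type: "image", data: "iVBORw==", mimeType: "image/png" },
  { type: "audio", data: "UklGRg==", mimeType: "audio/wav" },
  { type: "resource", resource: { uri: "file:///report.md", mimeType: "text/markdown", text: "# Report" } },
];

blocks.map(describe).forEach((line) => console.log(line));
```

Switching on `type` lets the compiler narrow each branch to the right fields, and under `strict` settings adding a new member to the union without a matching `case` makes `describe` fail to compile.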