The gpt-4o-mini-tts model generates natural-sounding speech and lets you steer tone, emotion, and delivery style with a plain-text `instructions` parameter.
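```javascript
import OpenAI from "openai";

// The client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

const audio = await openai.audio.speech.create({
  model: "gpt-4o-mini-tts",
  voice: "coral",
  input: "Welcome to the Infinity Tech Stack Academy!",
  // Free-form guidance on delivery, not text to be spoken.
  instructions: "Speak with enthusiasm and energy, like a tech conference host.",
  response_format: "mp3",
});
```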
Unlike traditional TTS systems, which read text with a fixed delivery, gpt-4o-mini-tts accepts an `instructions` parameter that controls how the text is spoken: tone, pacing, emotion, accent, and emphasis.
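As a minimal sketch of how such instructions might be assembled, the hypothetical helpers below join separate style fields into one string and build a request body without sending it. The field names (`tone`, `pacing`, `emotion`, `emphasis`) and both function names are assumptions made for illustration; the API itself only receives one free-form `instructions` string.

```javascript
// Hypothetical helper: the API takes a single free-form `instructions` string,
// so the field names below are an illustrative convention, not API parameters.
function buildInstructions(style) {
  const labels = { tone: "Tone", pacing: "Pacing", emotion: "Emotion", emphasis: "Emphasis" };
  return Object.keys(labels)
    .filter((key) => typeof style[key] === "string" && style[key].trim() !== "")
    .map((key) => `${labels[key]}: ${style[key].trim()}.`)
    .join(" ");
}

// Hypothetical helper: builds the request body for
// openai.audio.speech.create() without sending it.
function buildSpeechRequest(input, style, voice = "coral") {
  return {
    model: "gpt-4o-mini-tts",
    voice,
    input,
    instructions: buildInstructions(style),
    response_format: "mp3",
  };
}

const request = buildSpeechRequest("Your build finished successfully.", {
  tone: "calm and reassuring",
  pacing: "slow, with short pauses between sentences",
});
console.log(request.instructions);
// Tone: calm and reassuring. Pacing: slow, with short pauses between sentences.
```

Keeping the style fields separate makes it easier to vary one aspect, such as pacing, while holding the rest constant when comparing outputs.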
| Voice | Character |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Soft, calm |
| coral | Clear, professional |
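
The table can also be kept in code so that a voice is chosen by the character an application needs. The sketch below is one possible approach: `VOICES` copies the descriptions from the table above, and `pickVoice` is a hypothetical helper, not part of the OpenAI SDK.

```javascript
// Voice names and descriptions copied from the table above.
const VOICES = {
  alloy: "Neutral, balanced",
  echo: "Warm, conversational",
  fable: "Expressive, storytelling",
  onyx: "Deep, authoritative",
  nova: "Friendly, upbeat",
  shimmer: "Soft, calm",
  coral: "Clear, professional",
};

// Hypothetical helper: returns the first voice whose description lists the
// requested trait, or the fallback voice when nothing matches.
function pickVoice(trait, fallback = "alloy") {
  const wanted = trait.trim().toLowerCase();
  const match = Object.entries(VOICES).find(([, description]) =>
    description.toLowerCase().split(", ").includes(wanted)
  );
  return match ? match[0] : fallback;
}

console.log(pickVoice("authoritative")); // onyx
console.log(pickVoice("robotic")); // alloy (fallback, no voice lists this trait)
```

The descriptions are subjective, so it is worth listening to a short sample of each voice before fixing a default.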
For low-latency voice applications, the Realtime API keeps a persistent WebSocket connection open for bidirectional audio streaming with gpt-realtime-1.5.
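
The following sketch shows how the connection details might be prepared. It assumes the `wss://api.openai.com/v1/realtime` endpoint with a `model` query parameter and bearer-token authentication; the helper names `buildRealtimeConnection` and `encodeEvent` are hypothetical, `sk-example` is a placeholder key, and the details should be checked against the current Realtime API reference.

```javascript
// Assumption: endpoint and query parameter follow OpenAI's published
// Realtime WebSocket URL format; verify against the current API reference.
function buildRealtimeConnection(apiKey, model = "gpt-realtime-1.5") {
  const url = new URL("wss://api.openai.com/v1/realtime");
  url.searchParams.set("model", model);
  return {
    url: url.toString(),
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Serializes a client event; events travel over the socket as JSON text.
function encodeEvent(type, payload = {}) {
  return JSON.stringify({ type, ...payload });
}

// "sk-example" is a placeholder, not a real key.
const connection = buildRealtimeConnection("sk-example");
console.log(connection.url);

// With a WebSocket client that supports custom headers (for example the `ws`
// npm package), the connection could then be opened along these lines:
//   const socket = new WebSocket(connection.url, { headers: connection.headers });
//   socket.on("open", () => socket.send(encodeEvent("response.create")));
```

Because the socket stays open, audio can be sent and received in both directions without a new HTTP request per turn, which is where the latency saving comes from.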