OpenAI offers several speech-to-text transcription models, ranging from the legacy Whisper model to the newer GPT-4o-based models.

| Model | Quality | Speed | Best For |
|---|---|---|---|
| gpt-4o-mini-transcribe | Highest accuracy | Fast | Production transcription (recommended) |
| gpt-4o-transcribe | Very high | Medium | Complex audio, heavy accents |
| whisper-1 | Good | Fast | Legacy, basic transcription |
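If your application picks the model at runtime, the trade-offs in the table can be written as a small decision function. This is a minimal sketch. `chooseTranscriptionOptions` and its `needTimestamps` / `difficultAudio` flags are hypothetical names, not part of the OpenAI SDK. The sketch also assumes that timestamped output (`verbose_json`) is available only from `whisper-1`, while the GPT-4o transcribe models return plain `json`.

```javascript
// Hypothetical helper (not an SDK function): maps application needs to
// request options for openai.audio.transcriptions.create().
// Assumption: only whisper-1 supports "verbose_json" (timestamps);
// the GPT-4o transcribe models return "json" ({ text }).
function chooseTranscriptionOptions({ needTimestamps = false, difficultAudio = false } = {}) {
  if (needTimestamps) {
    // Segment timestamps require whisper-1 with verbose_json
    return { model: "whisper-1", response_format: "verbose_json" };
  }
  if (difficultAudio) {
    // Complex audio or heavy accents
    return { model: "gpt-4o-transcribe", response_format: "json" };
  }
  // Default production choice recommended in this guide
  return { model: "gpt-4o-mini-transcribe", response_format: "json" };
}

console.log(chooseTranscriptionOptions({ needTimestamps: true }));
```

The returned object can be spread into the request, for example `openai.audio.transcriptions.create({ file, ...chooseTranscriptionOptions() })`, so the model choice stays in one place.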
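
```javascript
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("meeting.mp3"),
  model: "gpt-4o-mini-transcribe",
  response_format: "json", // GPT-4o transcribe models do not support "verbose_json"; timestamps require whisper-1
  language: "en", // optional ISO-639-1 language hint
});

console.log(transcription.text);
```
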
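If you need timestamps, request `response_format: "verbose_json"` from `whisper-1`; its response includes a `segments` array where each segment has `start` and `end` times in seconds and its `text`. The sketch below formats those segments into readable lines. `formatTime` and `formatSegments` are hypothetical helper names, and `sample` is an illustrative object shaped like a `verbose_json` response, not real API output.

```javascript
// Convert a time in seconds (e.g. 83.5) to "mm:ss.mmm"
function formatTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const minutes = Math.floor(totalMs / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${String(minutes).padStart(2, "0")}:${String(secs).padStart(2, "0")}.${String(ms).padStart(3, "0")}`;
}

// Turn the `segments` of a whisper-1 verbose_json response into timestamped lines.
// Returns an empty array if the response has no segments (e.g. plain "json" output).
function formatSegments(transcription) {
  return (transcription.segments ?? []).map(
    (seg) => `[${formatTime(seg.start)} --> ${formatTime(seg.end)}] ${seg.text.trim()}`
  );
}

// Illustrative input only. In practice it would come from:
// await openai.audio.transcriptions.create({ file, model: "whisper-1", response_format: "verbose_json" })
const sample = {
  text: "Welcome everyone. Let's get started.",
  segments: [
    { start: 0, end: 2.4, text: " Welcome everyone." },
    { start: 2.4, end: 4.05, text: " Let's get started." },
  ],
};

console.log(formatSegments(sample).join("\n"));
// [00:00.000 --> 00:02.400] Welcome everyone.
// [00:02.400 --> 00:04.050] Let's get started.
```

Keeping the formatting separate from the API call makes it easy to reuse with stored responses.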
Use gpt-4o-mini-transcribe for best results. It significantly outperforms legacy Whisper on noisy audio, accented speech, and alphanumeric content such as phone numbers and codes.