Speech-to-Text (Transcription)

🎙️ Speech & Audio APIs · 12 min

Audio Transcription Models

OpenAI offers several transcription models for converting speech to text, ranging from the legacy whisper-1 (Whisper) to the newer GPT-4o-based models.

Available Models (2026)

Model | Quality | Speed | Best For
gpt-4o-mini-transcribe | Highest accuracy | Fast | Production transcription (recommended)
gpt-4o-transcribe | Very high | Medium | Complex audio, heavy accents
whisper-1 | Good | Fast | Legacy, basic transcription
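import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("meeting.mp3"),
  model: "gpt-4o-mini-transcribe",
  // The gpt-4o transcribe models accept "json" or "text"; "verbose_json" (timestamps) is documented for whisper-1 only
  response_format: "json",
  language: "en"  // Optional ISO-639-1 hint; omit to auto-detect
});
console.log(transcription.text);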
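
Timestamped output arrives as structured JSON rather than a single string, so it usually needs a small formatting step. The following is a minimal sketch under two assumptions: that timestamps are requested from whisper-1 with response_format "verbose_json" and timestamp_granularities (which is how the API documents them at the time of writing), and that the response carries a segments array whose entries have start, end (in seconds), and text. The helper names formatTimestamp and formatSegments are hypothetical, and the sample object is made-up illustration data, not real API output.

```javascript
// Hypothetical helpers: turn a verbose_json transcription into timestamped lines.

// Format seconds as MM:SS.ss, rounding to hundredths first so that
// values such as 59.999 roll over to the next minute correctly.
function formatTimestamp(seconds) {
  const totalCs = Math.round(seconds * 100); // centiseconds
  const mins = Math.floor(totalCs / 6000);
  const secs = ((totalCs % 6000) / 100).toFixed(2);
  return `${String(mins).padStart(2, "0")}:${secs.padStart(5, "0")}`;
}

// Build one "[start - end] text" line per segment.
function formatSegments(transcription) {
  // segments is absent when the response was not verbose_json
  const segments = transcription.segments ?? [];
  return segments.map(
    (s) => `[${formatTimestamp(s.start)} - ${formatTimestamp(s.end)}] ${s.text.trim()}`
  );
}

// Request parameters that ask for segment-level timing (assumes whisper-1).
const timestampParams = {
  model: "whisper-1",
  response_format: "verbose_json",
  timestamp_granularities: ["segment"],
};

// Made-up sample shaped like a verbose_json response, for illustration only.
const sample = {
  text: "Welcome everyone. Let's get started.",
  segments: [
    { start: 0.0, end: 1.8, text: " Welcome everyone." },
    { start: 1.8, end: 3.2, text: " Let's get started." },
  ],
};

console.log(formatSegments(sample).join("\n"));
```

In a real call, timestampParams would be spread into openai.audio.transcriptions.create({ file, ...timestampParams }) and the result passed to formatSegments.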

Key Features

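  • Timestamps: Get word-level or segment-level timing. At the time of writing, the API documents this for whisper-1 only (response_format: "verbose_json" together with timestamp_granularities).
  • Language Detection: The spoken language is detected automatically; passing an ISO-639-1 code via language (for example "en") can improve accuracy and latency.
  • Translation: Translate non-English audio directly to English text through the separate translations endpoint, which uses whisper-1.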
💡 Pro Tip: Use gpt-4o-mini-transcribe for best results. It significantly outperforms legacy Whisper on noisy audio, accented speech, and alphanumeric content (phone numbers, codes).
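
The language and translation features listed above come down to which endpoint is called and which parameters are sent. The sketch below shows one way to encode that choice. buildAudioRequest is a hypothetical helper, not part of the OpenAI SDK, and it assumes that translation to English goes through the translations endpoint with whisper-1 while ordinary transcription uses gpt-4o-mini-transcribe.

```javascript
// Hypothetical helper: pick the endpoint and parameters for an audio request.
function buildAudioRequest({ task = "transcribe", language } = {}) {
  if (task === "translate") {
    // Assumption: translations always produce English, so no language hint is sent.
    return { endpoint: "translations", params: { model: "whisper-1" } };
  }
  const params = { model: "gpt-4o-mini-transcribe" };
  // A manual ISO-639-1 code (e.g. "de") acts as a hint; omit it to auto-detect.
  if (language) params.language = language;
  return { endpoint: "transcriptions", params };
}

console.log(buildAudioRequest());                      // auto-detect language
console.log(buildAudioRequest({ language: "de" }));    // manual language hint
console.log(buildAudioRequest({ task: "translate" })); // translate to English
```

The result can then be passed to the SDK as openai.audio[endpoint].create({ file, ...params }), since both openai.audio.transcriptions and openai.audio.translations expose a create method.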
Speech-to-Text (Transcription) | Speech & Audio APIs — OpenAI Academy