Whisper
Speech Recognitionby OpenAI
OpenAI's automatic speech recognition model that transcribes and translates audio in multiple languages with high accuracy.
Registration & API Key Steps
Create OpenAI Account
Visit platform.openai.com and sign up. Same account for all OpenAI APIs.
Open linkUse Free Credits or Add Payment
New accounts get $5 free credits (3 months). At $0.006/min, that is ~833 minutes of transcription.
Pricing
| Tier | Price | Features |
|---|---|---|
| Whisper | $0.006 / minute | High accuracy, Multi-language, Translation to English |
| GPT-4o Transcribe | $0.006 / minute | Better accuracy, Speaker diarization, Better punctuation |
| GPT-4o Mini Transcribe | $0.003 / minute | 50% cheaper, Good accuracy, Best for bulk |
Application Tips
Self-Host for High Volume
Whisper is open-source. Break-even at ~500 hours/month vs API.
Use GPT-4o Mini for Savings
GPT-4o Mini Transcribe costs 50% less at $0.003/min with good accuracy.
Max 25MB File Size
Supports mp3, mp4, wav, webm etc. Max 25MB. Split longer files.
China Access Solutions
API Proxy
Use third-party proxy services for Whisper API.
Self-Host Whisper
Deploy open-source Whisper locally. No API needed.
Code Example
import OpenAI from 'openai';
import fs from 'fs';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function transcribeAudio() {
const transcription = await openai.audio.transcriptions.create({
file: fs.createReadStream('audio.mp3'),
model: 'whisper-1',
language: 'en',
response_format: 'verbose_json',
});
console.log('Text:', transcription.text);
}
transcribeAudio();Rate Limits
| Tier | Limits |
|---|---|
| Tier 1 | 50 RPM |
| Tier 3+ | 500+ RPM |
Recommended Use Cases
Related API Guides
OpenAI GPT-4o / GPT-4.1 / o3
OpenAI
OpenAI's flagship LLM family including GPT-4o for multimodal tasks, GPT-4.1 for long-context coding, and o3 for advanced reasoning. Industry-leading models with the largest developer ecosystem.
Anthropic Claude (Sonnet 4.5 / Opus 4.5)
Anthropic
Anthropic's Claude model family excels in nuanced reasoning, safety, and long-context tasks. Claude Sonnet 4.5 offers the best balance of cost and performance, while Opus 4.5 delivers frontier intelligence.
Google Gemini (2.5 Pro / 2.5 Flash)
Google's Gemini models offer a generous free tier, 1M token context window, and strong multimodal capabilities. Gemini 2.5 Pro leads in reasoning, while Flash models provide cost-effective alternatives.
Meta Llama 4 (Scout / Maverick)
Meta
Meta's open-source Llama 4 models are free to use and available through multiple cloud providers. Llama 4 Scout and Maverick offer competitive performance at extremely low cost through partner APIs.