Whisper API pricing in 2026: cheaper all-in than OpenAI's official API
If you've looked at OpenAI's transcription pricing recently, you've seen $0.006/min for Whisper. That's more than 8× cheaper than LessRec's $0.05/min. So why does LessRec exist?
Because $0.006/min is the price of the model API call. It's not the price of getting a usable transcript. Once you account for what's actually included, the math flips for most use cases.
The all-in cost comparison
Imagine you're transcribing 1 hour of podcast audio (~60 minutes, ~50 MB MP3).
| Path | Headline price | Real cost |
|---|---|---|
| OpenAI Whisper API (raw) | $0.006/min × 60 = $0.36 | $0.36 + your dev time |
| OpenAI + your wrapper code | $0.36 | $0.36 once your wrapper exists |
| LessRec | $0.05/min × 60 = $3.00 | $3.00, all-in |
| Rev (auto) | $0.25/min × 60 = $15 | $15 |
| Otter Pro flat | $20/mo | $20 even if you transcribe 1 file |
OpenAI is unambiguously cheapest per minute of API time. LessRec is 8× more. But "API time" is the cheap part of running a transcription service. Here's everything OpenAI doesn't include:
What OpenAI's $0.006/min skips
1. The 25 MB upload cap
OpenAI's Whisper endpoint caps each request at 25 MB. A 1-hour podcast at a standard MP3 bitrate is ~50 MB — over the limit. Workarounds: compress hard (a 1-hour file has to drop to roughly 55 kbps to fit under 25 MB), or split the file. Both are doable, but they're your code to write. LessRec accepts up to 1 GB per file (about 6 hours of audio).
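As a back-of-envelope check, here's a sketch of the sizing math (the 25 MB cap is OpenAI's documented limit; the bitrates are illustrative):

```python
import math

CAP_BYTES = 25 * 1024 * 1024  # OpenAI's per-request upload cap

def seconds_under_cap(bitrate_kbps: int, cap_bytes: int = CAP_BYTES) -> int:
    """How many seconds of audio fit under the cap at a given bitrate."""
    return (cap_bytes * 8) // (bitrate_kbps * 1000)

def chunks_needed(duration_s: int, bitrate_kbps: int) -> int:
    """Minimum number of uploads for a file of this length and bitrate."""
    return math.ceil(duration_s / seconds_under_cap(bitrate_kbps))
```

At 128 kbps you get about 27 minutes per chunk, so a 1-hour file needs 3 uploads; even re-encoded at 64 kbps it still needs 2.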
2. .docx and .srt output
OpenAI returns JSON or plain text. If you want a Word doc with paragraph breaks, you write a converter. If you want SRT subtitles with the right timestamps, you write a converter. LessRec ships .txt, .docx, and .srt in the response.
3. Job queue and retry logic
OpenAI's API call is synchronous — you upload, wait (up to several minutes for long files), get the result. If the connection drops, you re-upload. Building a proper job queue (background worker, retry, idempotency keys, status polling) takes a developer 1-2 days of work. LessRec has all of this and exposes a GET /api/job/:id endpoint.
4. No web UI
Your end users can't drag-and-drop into OpenAI's API directly. You need a web app on top. LessRec is that web app.
5. No CPU-tier pricing
OpenAI runs Whisper on GPUs and charges GPU prices. LessRec runs Whisper large-v3, INT8-quantized via faster-whisper, on CPUs — the same model weights, with accuracy close to the full-precision run, at 5-10% of the per-minute compute cost. Our $0.05/min covers compute, storage, bandwidth, and the UI; OpenAI's $0.006/min is just the GPU API call.
When OpenAI's API is the right call
You should use OpenAI's API directly if:
- You're a developer integrating transcription into another product (chatbot, voice notes app, video editor)
- You can write the upload-chunking and job-queue code (~1-2 days)
- You're already paying for OpenAI for other reasons (saves you a vendor)
- You need very low latency (~real-time) and your audio is small (<25 MB chunks)
- You expect 10,000+ minutes per month of volume — at that scale, the $0.044/min savings adds up to $440+/mo, which is more than enough to justify the engineering work
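The break-even arithmetic behind that last bullet, using the wrapper build-cost range quoted later in this post (the exact figures are assumptions):

```python
volume = 10_000                            # minutes transcribed per month
savings_per_min = 0.05 - 0.006             # $0.044/min rate difference
monthly_savings = volume * savings_per_min # $440/mo at this volume
payback_low = 5_000 / monthly_savings      # months to recoup a $5k build
payback_high = 15_000 / monthly_savings    # months to recoup a $15k build
```

So at 10,000 min/mo the wrapper pays for itself in roughly one to three years — and faster the further your volume climbs past that threshold, since the savings scale linearly.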
When LessRec is the right call
- You're an end user (podcaster, journalist, lawyer, course creator) who doesn't want to write code
- You transcribe long files (1-6 hours each) and don't want to chunk them by hand
- You need .docx or .srt output without writing converters
- You don't want a $20/mo subscription if you only transcribe a few hours per month
- Volume is <1,000 minutes/month — the engineering time to build your own wrapper would cost more than 2 years of LessRec usage
The hidden cost developers forget
Many developers see $0.006/min and start building. Then they hit:
- Files > 25 MB → need ffmpeg-based chunking with overlap to preserve timestamps across chunks
- Long jobs → need background queue (BullMQ / Sidekiq / Celery), Redis, retry storms
- Customer needs SRT → need to convert OpenAI's segments to SRT (handle timestamp accuracy, formatting)
- Customer needs speaker labels → OpenAI doesn't do diarization at all; need pyannote.audio or alternative, GPU costs explode
- Storage → uploaded files need to live somewhere (S3) until transcription completes; bandwidth costs
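The first bullet is representative of the whole list: each item is a day of unglamorous glue code. Here's a sketch of just the chunking step — building ffmpeg invocations with a few seconds of overlap so boundary timestamps can be reconciled later (chunk length, overlap, bitrate, and output names are all illustrative):

```python
def chunk_commands(src: str, total_s: int, chunk_s: int = 1500, overlap_s: int = 5):
    """Build ffmpeg command lines that split `src` into overlapping chunks.
    Commands are returned rather than executed, so the policy is testable."""
    cmds, start, idx = [], 0, 0
    while start < total_s:
        cmds.append([
            "ffmpeg", "-ss", str(start),      # seek to chunk start
            "-i", src,
            "-t", str(chunk_s + overlap_s),   # chunk length plus overlap
            "-b:a", "64k",                    # re-encode to stay under the cap
            f"chunk_{idx:03d}.mp3",
        ])
        start += chunk_s
        idx += 1
    return cmds
```

Each returned list can be handed to subprocess.run; the overlap means you still have to dedupe words that appear at the tail of one chunk and the head of the next, which is exactly the kind of edge case that eats the second week.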
By the time you ship a production-grade wrapper, you've spent 1-2 weeks of engineer time (~$5,000-15,000 fully loaded) and you're maintaining infrastructure forever. Whether that's worth it depends on your volume.
What LessRec actually offers (under the hood)
For transparency: we run Whisper large-v3 INT8 (faster-whisper port) on Hetzner CX43 dedicated servers. CPU-only inference at ~50-100× realtime. Job queue is SQLite + a simple Python worker. Storage is local disk with 7-day cleanup. Stripe for billing. Express for the API. The whole stack is ~1,500 lines of code maintained solo. We charge $0.05/min and the marginal cost per minute is ~$0.001, so the margin pays for the engineering time you'd otherwise spend.
Try LessRec API or web upload
10 free minutes, no signup. API key on request after first paid pack.
Try it →

FAQ
Do you have an API endpoint or just web upload?
POST /api/transcribe with multipart form-data, GET /api/job/:id for status, GET /api/job/:id/download for output. API keys are issued after first paid pack — email hello@lessrec.com.
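In code, that's three requests. A minimal sketch of the call shapes — no network here, just the pieces you'd hand to an HTTP client; the base URL and Bearer auth scheme are assumptions, so check the docs that come with your key:

```python
BASE = "https://lessrec.com"  # illustrative base URL

def transcribe_call(api_key: str):
    """url + headers for POST /api/transcribe; attach the audio as
    multipart form-data (field name assumed to be 'file')."""
    return f"{BASE}/api/transcribe", {"Authorization": f"Bearer {api_key}"}

def status_call(api_key: str, job_id: str):
    """url + headers for GET /api/job/:id."""
    return f"{BASE}/api/job/{job_id}", {"Authorization": f"Bearer {api_key}"}
```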
Can I self-host Whisper instead?
Absolutely. faster-whisper is open source. On an M-series Mac it runs at 5-10× realtime (free, your machine). On a $40/mo Hetzner CX22 it runs at ~25× realtime. Worth it if you transcribe enough volume to justify the setup time.
Why CPU instead of GPU?
For Whisper large-v3 INT8, a CPU at 50-100× realtime costs $0.001/min in compute. A GPU at 200× realtime costs $0.005/min. Both are "instant" from the user's perspective. We chose CPU because it lets us undercut everyone else and still have margin.
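The arithmetic behind that choice is one formula: hourly machine cost divided by how many audio-minutes the machine clears per wall-clock minute. The rates below are illustrative, not our actual bills:

```python
def compute_cost_per_audio_min(machine_cost_per_hour: float,
                               realtime_factor: float) -> float:
    """Dollars of raw compute per minute of transcribed audio.
    One wall-clock minute processes `realtime_factor` minutes of audio."""
    return machine_cost_per_hour / 60 / realtime_factor

cpu = compute_cost_per_audio_min(0.06, 75)   # cheap CPU box at 75x realtime
gpu = compute_cost_per_audio_min(1.50, 200)  # rented GPU at 200x realtime
```

Quoted per-minute figures run higher than this raw number because they also have to carry storage, bandwidth, idle capacity, and the UI — but the CPU/GPU gap survives all of that.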