
Whisper API pricing in 2026: cheaper than OpenAI's official one

May 5, 2026 · 5 min read

If you've looked at OpenAI's transcription pricing recently, you've seen $0.006/min for Whisper. That's more than 8× cheaper than LessRec's $0.05/min. So why does LessRec exist?

Because $0.006/min is the price of the model API call. It's not the price of getting a usable transcript. Once you account for what's actually included, the math flips for most use cases.

The all-in cost comparison

Imagine you're transcribing 1 hour of podcast audio (~60 minutes, ~50 MB MP3).

| Path | Headline price | Real cost |
| --- | --- | --- |
| OpenAI Whisper API (raw) | $0.006/min × 60 = $0.36 | $0.36 + your dev time |
| OpenAI + your wrapper code | $0.36 | $0.36 once your wrapper exists |
| LessRec | $0.05/min × 60 = $3.00 | $3.00, all-in |
| Rev (auto) | $0.25/min × 60 = $15 | $15 |
| Otter Pro flat | $20/mo | $20 even if you transcribe 1 file |

OpenAI is unambiguously cheapest per minute of API time. LessRec is 8× more. But "API time" is the cheap part of running a transcription service. Here's everything OpenAI doesn't include:

What OpenAI's $0.006/min skips

1. The 25 MB upload cap

OpenAI's Whisper endpoint caps each request at 25 MB. A 1-hour podcast at standard MP3 bitrate is ~50 MB — over the limit. Workarounds: compress to 64 kbps, or split the file. Both are doable but they're your code to write. LessRec accepts up to 1 GB per file (about 6 hours of audio).
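
Here's roughly what the splitting code looks like, assuming ffmpeg is on your PATH. The 10-minute chunk length is a guess that keeps standard-bitrate MP3 comfortably under the cap:

```python
import subprocess
from pathlib import Path

def split_for_whisper(src: str, chunk_minutes: int = 10) -> list[Path]:
    """Split an audio file into fixed-length chunks with ffmpeg so each
    part stays under OpenAI's 25 MB request cap. 10 min of 128 kbps MP3
    is ~9.6 MB, well under the limit."""
    stem = Path(src).with_suffix("")  # e.g. "podcast" from "podcast.mp3"
    subprocess.run(
        [
            "ffmpeg", "-i", src,
            "-f", "segment",                       # segment muxer: split output
            "-segment_time", str(chunk_minutes * 60),
            "-c", "copy",                          # no re-encode, just cut
            f"{stem}_%03d.mp3",
        ],
        check=True,
    )
    return sorted(stem.parent.glob(f"{stem.name}_*.mp3"))
```

You then transcribe each chunk separately and stitch the transcripts back together, offsetting the timestamps by each chunk's start time.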

2. .docx and .srt output

OpenAI returns JSON or plain text. If you want a Word doc with paragraph breaks, you write a converter. If you want SRT subtitles with the right timestamps, you write a converter. LessRec ships .txt, .docx, and .srt in the response.
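
The SRT converter is small but fiddly. A sketch that works off Whisper's verbose_json output, where each segment has start, end, and text fields:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper segments ({'start', 'end', 'text'}) as an SRT file."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```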

3. Job queue and retry logic

OpenAI's API call is synchronous — you upload, wait (up to several minutes for long files), get the result. If the connection drops, you re-upload. Building a proper job queue (background worker, retry, idempotency keys, status polling) takes a developer 1-2 days of work. LessRec has all of this and exposes a GET /api/job/:id endpoint.
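
The polling side of that looks something like this. The status values shown are illustrative — check them against the response you actually get:

```python
import time
import requests

API = "https://lessrec.com/api"

def wait_for_job(job_id: str, api_key: str, timeout: float = 600) -> dict:
    """Poll GET /api/job/:id until the job finishes or the timeout expires.
    The 'status' values ('done', 'error') are illustrative, not guaranteed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        r = requests.get(f"{API}/job/{job_id}",
                         headers={"Authorization": f"Bearer {api_key}"},
                         timeout=30)
        r.raise_for_status()
        job = r.json()
        if job.get("status") == "done":
            return job
        if job.get("status") == "error":
            raise RuntimeError(f"transcription failed: {job}")
        time.sleep(5)  # be gentle; jobs run minutes, not milliseconds
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```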

4. No web UI

Your end users can't drag-and-drop into OpenAI's API directly. You need a web app on top. LessRec is that web app.

5. No CPU-tier pricing

OpenAI runs Whisper on GPUs, charging GPU prices. LessRec runs Whisper large-v3 INT8-quantized on CPUs (faster-whisper) — same model, ~80% of the accuracy at 5-10% of the per-minute compute cost. Our $0.05/min covers all-in compute + storage + bandwidth + UI; OpenAI's $0.006/min is just the GPU API.
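
If you want to reproduce that setup yourself, it's a few lines with faster-whisper:

```python
from faster_whisper import WhisperModel  # pip install faster-whisper

# Same setup as described above: large-v3 weights, INT8-quantized, CPU-only.
model = WhisperModel("large-v3", device="cpu", compute_type="int8")

segments, info = model.transcribe("podcast.mp3")
for seg in segments:
    print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {seg.text}")
```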

When OpenAI's API is the right call

You should use OpenAI's API directly if:

- Your files are already under the 25 MB cap, or you have splitting code anyway
- You need raw JSON feeding your own pipeline, not .docx or .srt deliverables
- You already run a backend with a job queue and retry logic
- Your volume is high enough that the per-minute gap ($0.006 vs $0.05) dwarfs the engineering cost

When LessRec is the right call

LessRec makes sense if:

- You want a transcript, not a transcription pipeline: upload, wait, download .txt/.docx/.srt
- Your files are long (up to 1 GB, ~6 hours) and you don't want to write splitting code
- Your end users need a web UI rather than an API
- Your volume is low enough that $3 per audio hour all-in beats weeks of wrapper development

The hidden cost developers forget

Many developers see $0.006/min and start building. Then they hit:

- the 25 MB upload cap on the first file longer than roughly half an hour
- the missing .docx and .srt output formats
- the synchronous call that fails on a dropped connection
- the need for a web UI before non-developers can use any of it

By the time you ship a production-grade wrapper, you've spent 1-2 weeks of engineer time (~$5,000-15,000 fully loaded) and you're maintaining infrastructure forever. Whether that's worth it depends on your volume.
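
To make "depends on your volume" concrete, here's the break-even arithmetic using the figures above:

```python
# Break-even volume: when does building your own wrapper on OpenAI's
# $0.006/min beat paying LessRec's $0.05/min? Uses the article's figures;
# your dev cost will vary, and this ignores ongoing maintenance.
dev_cost = 10_000          # midpoint of the $5,000-15,000 estimate above
openai_per_min = 0.006
lessrec_per_min = 0.05

savings_per_min = lessrec_per_min - openai_per_min   # $0.044/min
break_even_minutes = dev_cost / savings_per_min
print(f"{break_even_minutes:,.0f} minutes "
      f"(~{break_even_minutes / 60:,.0f} hours of audio)")
# -> 227,273 minutes (~3,788 hours) before the wrapper pays for itself
```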

What LessRec actually offers (under the hood)

For transparency: we run Whisper large-v3 INT8 (faster-whisper port) on Hetzner CX43 dedicated servers. CPU-only inference at ~50-100× realtime. Job queue is SQLite + a simple Python worker. Storage is local disk with 7-day cleanup. Stripe for billing. Express for the API. The whole stack is ~1,500 lines of code maintained solo. We charge $0.05/min and the marginal cost per minute is ~$0.001, so the margin pays for the engineering time you'd otherwise spend.
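
If you're curious what a SQLite-backed queue of that shape looks like, here's an illustrative sketch — not our actual code; the table schema and the transcribe command are placeholders:

```python
import sqlite3
import subprocess
import time

db = sqlite3.connect("jobs.db", isolation_level=None)  # autocommit
db.execute("""CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    input_path TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',  -- queued | running | done | error
    output_path TEXT
)""")

def claim_next_job():
    """Atomically claim the oldest queued job. UPDATE ... RETURNING
    needs SQLite 3.35+; with a single worker, no further locking is needed."""
    return db.execute(
        "UPDATE jobs SET status = 'running' WHERE id = ("
        "  SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ") RETURNING id, input_path"
    ).fetchone()  # None if the queue is empty

while True:
    job = claim_next_job()
    if job is None:
        time.sleep(2)
        continue
    job_id, path = job
    try:
        out = f"{path}.txt"
        # Placeholder for the actual faster-whisper call shown earlier.
        subprocess.run(["whisper-transcribe", path, "-o", out], check=True)
        db.execute("UPDATE jobs SET status = 'done', output_path = ? WHERE id = ?",
                   (out, job_id))
    except Exception:
        db.execute("UPDATE jobs SET status = 'error' WHERE id = ?", (job_id,))
```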

Try LessRec API or web upload

10 free minutes, no signup. API key on request after first paid pack.

Try it →

FAQ

Do you have an API endpoint or just web upload?

POST /api/transcribe with multipart form-data, GET /api/job/:id for status, GET /api/job/:id/download for output. API keys are issued after first paid pack — email hello@lessrec.com.
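
A minimal client in Python — the endpoint paths are the ones above, but the multipart field name, response shape, and format parameter are illustrative, so confirm against your first response:

```python
import requests

API = "https://lessrec.com/api"
KEY = "..."  # issued after your first paid pack

with open("podcast.mp3", "rb") as f:
    r = requests.post(f"{API}/transcribe",
                      headers={"Authorization": f"Bearer {KEY}"},
                      files={"file": ("podcast.mp3", f, "audio/mpeg")})
r.raise_for_status()
job_id = r.json()["id"]

# ...poll GET /api/job/{job_id} as sketched earlier, then:
srt = requests.get(f"{API}/job/{job_id}/download",
                   headers={"Authorization": f"Bearer {KEY}"},
                   params={"format": "srt"})  # format param is a guess
```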

Can I self-host Whisper instead?

Absolutely. faster-whisper is open source. On an M-series Mac it runs at 5-10× realtime (free, your machine). On a $40/mo Hetzner CX22 it runs at ~25× realtime. Worth it if you transcribe enough volume to justify the setup time.
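
To check the realtime factor on your own hardware (assuming `pip install faster-whisper`):

```python
import time
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cpu", compute_type="int8")

start = time.monotonic()
segments, info = model.transcribe("podcast.mp3")
text = "".join(seg.text for seg in segments)  # generator: this runs the decode
wall = time.monotonic() - start

print(f"{info.duration / 60:.1f} min of audio in {wall / 60:.1f} min "
      f"-> {info.duration / wall:.1f}x realtime")
```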

Why CPU instead of GPU?

For Whisper large-v3 INT8, a CPU at 50-100× realtime costs $0.001/min in compute. A GPU at 200× realtime costs $0.005/min. Both are "instant" from the user's perspective. We chose CPU because it lets us undercut everyone else and still have margin.