How to transcribe a 4-hour podcast for $12 (Otter would charge $80)
Long-form audio is where transcription gets expensive fast. A 4-hour podcast episode = 240 minutes. On Otter Pro you'd burn through your monthly 1,200-minute cap in 5 episodes. On Rev you'd pay $60 for automated transcription ($0.25/min) or $360 for human transcription ($1.50/min). On Descript Pro you'd pay $24/mo plus overage.
Here's the cheapest reliable way to get clean transcripts for long files in 2026.
The cost comparison
| Service | Cost for 240 minutes | Notes |
|---|---|---|
| Rev (auto) | $60 | $0.25/min flat |
| Otter Pro overage | $20 base + extra | monthly subscription, caps at 1,200 min |
| Descript | $24/mo + overage | full editor, overkill if you don't edit video |
| OpenAI Whisper API | $1.44 | $0.006/min, but you build the upload/queue/output yourself |
| LessRec | $12 | $0.05/min, just upload |
OpenAI's Whisper API is technically cheaper at $1.44, but it has a 25 MB file size cap, no UI, no .docx output, no progress tracking — you have to build the wrapper yourself. Worth it if you're a developer and have other reasons to integrate; not worth it if you just want a transcript.
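To make the table's arithmetic and the 25 MB cap concrete, here is a minimal sketch. It assumes a 64 kbps constant-bitrate file and treats 25 MB as 25,000,000 bytes (both assumptions; how the API counts the limit may differ). The function names are illustrative only.

```python
import math

# Per-minute rates quoted in the comparison table (USD).
RATES_PER_MIN = {
    "Rev (auto)": 0.25,
    "OpenAI Whisper API": 0.006,
    "LessRec": 0.05,
}

def cost_usd(minutes: float, rate_per_min: float) -> float:
    """Flat per-minute pricing, rounded to cents."""
    return round(minutes * rate_per_min, 2)

def chunks_needed(minutes: float, bitrate_kbps: int = 64,
                  cap_bytes: int = 25_000_000) -> int:
    """Number of pieces needed to keep each upload under a size cap.

    Assumes constant-bitrate audio and ignores container overhead.
    """
    total_bytes = minutes * 60 * bitrate_kbps * 1000 / 8
    return math.ceil(total_bytes / cap_bytes)

for name, rate in RATES_PER_MIN.items():
    print(f"{name}: ${cost_usd(240, rate):.2f}")
print("Chunks for the Whisper API at 64 kbps:", chunks_needed(240))
```

Under those assumptions a 4-hour episode has to be cut into at least five pieces before the API will accept it, which is most of the "build the wrapper yourself" work.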
The 5-step workflow
1. Compress your audio first (optional, smart)
Most people upload uncompressed WAV files (roughly 1.3 GB mono or 2.5 GB stereo for 4 hours at 16-bit, 44.1 kHz). Whisper doesn't need that quality: it resamples everything to 16 kHz mono internally, so a 64 kbps MP3 works about as well. Convert with ffmpeg:
ffmpeg -i source.wav -b:a 64k -ac 1 podcast.mp3
4 hours of audio at 64 kbps mono comes to about 115 MB (roughly 110 MiB). That uploads in about 30 seconds on a 30 Mbps uplink and transcribes about as accurately.
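If you want to check the size and upload time for your own bitrate and connection, the arithmetic is short. This is a rough sketch: it assumes constant-bitrate audio, ignores container and protocol overhead, and the 30 Mbps uplink is just an example figure.

```python
def audio_size_mb(hours: float, bitrate_kbps: float) -> float:
    """Approximate size in MB (10^6 bytes) of constant-bitrate audio."""
    seconds = hours * 3600
    return seconds * bitrate_kbps * 1000 / 8 / 1_000_000

def upload_seconds(size_mb: float, uplink_mbps: float) -> float:
    """Best-case upload time in seconds for a given uplink speed."""
    return size_mb * 8 / uplink_mbps

size = audio_size_mb(4, 64)
print(f"4 h at 64 kbps: {size:.1f} MB")
print(f"Upload at 30 Mbps: {upload_seconds(size, 30):.0f} s")
```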
2. Upload to LessRec
Drag the file into lessrec.com. No signup is required for the first 10 minutes, but for a 4-hour file you'll need credits. A $25 pack buys 500 minutes, so this transcript uses $12 of it and leaves you with $13 (260 minutes) of credit.
The uploader supports files up to 1 GB. For longer audio (8+ hours), split it into 2-3 segments and upload them separately.
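One way to do the split without re-encoding is ffmpeg's segment muxer. The sketch below only builds the command so you can inspect it before running it. The file names and the 4-hour segment length are assumptions for illustration, not LessRec requirements.

```python
import shlex

def split_command(src: str, segment_hours: float, out_pattern: str) -> list[str]:
    """Build an ffmpeg command that cuts `src` into equal-length pieces.

    Uses the segment muxer with stream copy, so the audio is not re-encoded.
    """
    seconds = int(segment_hours * 3600)
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", str(seconds),
        "-c", "copy",
        out_pattern,  # %02d is replaced by the segment number
    ]

cmd = split_command("podcast.mp3", 4, "podcast_part%02d.mp3")
print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` to actually run it. Cuts fall at fixed times, so a sentence may straddle two segments; picking a boundary near a natural pause gives cleaner transcripts.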
3. Wait (about 5 minutes)
Whisper large-v3 INT8 on our servers runs at roughly 50-100x realtime, so a 240-minute audio file transcribes in about 2.5-5 minutes of wall-clock time, not 240. We'll notify you by Telegram, or you can just keep the page open.
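The wait time is simply audio length divided by the realtime factor. A quick sketch, using the 50-100x range quoted above (actual throughput depends on queueing and server load, which this ignores):

```python
def wall_clock_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Processing time for audio transcribed at `realtime_factor` x realtime."""
    return audio_minutes / realtime_factor

for factor in (50, 100):
    print(f"{factor}x realtime: {wall_clock_minutes(240, factor):.1f} min")
```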
4. Download outputs
You get three formats:
- .txt: raw transcript, no timestamps
- .docx: formatted, with paragraph breaks, ready to edit in Word
- .srt: timestamped subtitles for embedding in video
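If you haven't worked with .srt before, each subtitle is a numbered cue with a start and end time. The sketch below shows the general SubRip layout, not LessRec's exact output, and the sample sentence is made up.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SubRip (.srt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One cue: index line, time range line, text line, trailing newline."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_cue(1, 0.0, 3.5, "Welcome back to the show."))
```

Cues in a file are separated by a blank line, which is why video editors and players can read the format without any extra metadata.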
5. (Optional) clean up speaker turns by hand
LessRec doesn't do speaker diarization yet, so every speaker's words land in the same paragraph stream. For a single-host podcast that's fine; for a 2-host show you'll spend ~10 minutes adding speaker labels in your editor of choice. We're shipping diarization in the next 30 days; until then, this is the workflow.
Why this is "good enough" for most podcasts
On clean podcast audio (single mic, no music bed, native English speakers), Whisper large-v3 is typically around 95-97% word-accurate. That means roughly 1 wrong word in every 20 to 33. Most of the errors are easy to spot (homophones, technical terms, names), which are exactly the words you'd review anyway when publishing show notes.
For interviews with multiple voices, distant mics, or accented English, accuracy drops to 88-93%. Still usable as a starting draft; budget 30-60 minutes of human cleanup per hour of audio if you publish word-for-word transcripts.
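To turn those percentages into a workload, here is a back-of-envelope sketch. The 150 words-per-minute speaking rate is an assumption for illustration; measure your own show if you want a tighter number.

```python
def words_per_error(accuracy: float) -> float:
    """Average number of words between mistakes at a given word accuracy."""
    return 1 / (1 - accuracy)

def expected_errors(word_count: int, accuracy: float) -> int:
    """Expected number of wrong words in a transcript."""
    return round(word_count * (1 - accuracy))

def cleanup_hours(audio_hours: float, minutes_per_audio_hour: float) -> float:
    """Human editing time, given a cleanup budget per hour of audio."""
    return audio_hours * minutes_per_audio_hour / 60

WORDS_PER_MINUTE = 150            # assumed speaking rate
words = 240 * WORDS_PER_MINUTE    # a 4-hour episode

for acc in (0.97, 0.95, 0.90):
    print(f"{acc:.0%}: 1 error per {words_per_error(acc):.0f} words, "
          f"about {expected_errors(words, acc)} errors in the episode")
print(f"Cleanup at 30-60 min/hour: {cleanup_hours(4, 30):.0f}-"
      f"{cleanup_hours(4, 60):.0f} hours")
```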
Where this falls apart
- Court reporting / depositions: You need a certified human reporter. Whisper output generally won't be accepted as an official record.
- Multi-language switches mid-conversation: Whisper handles each chunk in its dominant language; quick code-switches confuse it.
- Heavy music/SFX backgrounds: Run audio through a vocal isolator first (RX, Lalal.ai, or open-source spleeter).
- Real-time: LessRec is upload-only. For live captions during a Zoom call, use Otter.
FAQ
Will the file finish if I close the tab?
Yes. Once the upload completes, the job runs server-side. You'll get a URL to come back to for the download, and we'll email you the link if you provide an address (optional).
Can I run this on my own machine instead?
Yes — install faster-whisper and run Whisper large-v3 locally. On an M-series Mac it's about 5-10x realtime (slower than our optimized server but free). For occasional use this is great. For weekly long-form work, paying $0.05/min is cheaper than your time.
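A minimal local setup might look like the sketch below. It assumes `pip install faster-whisper` and enough disk space for the large-v3 weights, which download on first use. The `device="cpu"` and `compute_type="int8"` settings and the helper names are choices made for this example, not a description of how LessRec runs the model.

```python
def join_segments(texts) -> str:
    """Join segment texts into one string, dropping empty pieces."""
    return " ".join(t.strip() for t in texts if t.strip())

def transcribe_local(audio_path: str, model_size: str = "large-v3") -> str:
    """Transcribe a file with faster-whisper and return plain text."""
    # Imported here so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, beam_size=5)
    # `segments` is a generator: the work happens as it is consumed.
    return join_segments(segment.text for segment in segments)

# Example (needs the package, the model download, and a real file):
# print(transcribe_local("podcast.mp3"))
```

Each segment also carries `start` and `end` times in seconds, so the same loop can produce timestamped output if you need it.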
What about Hindi, Tamil, Cantonese, or other low-resource languages?
Whisper was trained on roughly 100 languages, but accuracy varies. English, Spanish, French, Russian, Portuguese, German, Italian, Japanese, Mandarin, and Korean are all very strong. Tamil, Bengali, and Vietnamese are usable, but expect more errors. Test with the 10 free minutes before committing.