How to transcribe a 4-hour podcast for $12 (Otter would charge $80)

May 5, 2026 · 4 min read

Long-form audio is where transcription gets expensive fast. A 4-hour podcast episode is 240 minutes. On Otter Pro, five such episodes use up the entire 1,200-minute monthly cap. On Rev you'd pay $60 at the automated rate ($0.25/min; human transcription at $1.50/min would be $360). On Descript Pro you'd pay $24/mo plus overage.

Here's the cheapest reliable way to get clean transcripts for long files in 2026.

The cost comparison

Service	Cost for 240 minutes	Notes
Rev (auto)	$60	$0.25/min flat
Otter Pro overage	$20 base + extra	monthly subscription, caps at 1,200 min
Descript	$24/mo + overage	full editor, overkill if you don't edit video
OpenAI Whisper API	$1.44	$0.006/min, but you build the upload/queue/output yourself
LessRec	$12	$0.05/min, just upload
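
The pay-per-minute rows in the table are plain multiplication, so they are easy to re-run for your own episode length. A minimal sketch, using only the per-minute rates quoted in the table (the subscription rows are left out because their overage pricing isn't given here; the function name is ours):

```python
# Per-minute rates copied from the comparison table (USD per audio minute).
RATES_PER_MIN = {
    "Rev (auto)": 0.25,
    "OpenAI Whisper API": 0.006,
    "LessRec": 0.05,
}


def cost_for(minutes: float, rate_per_min: float) -> float:
    """Return the pay-per-minute cost in dollars, rounded to cents."""
    return round(minutes * rate_per_min, 2)


episode_minutes = 4 * 60  # a 4-hour episode
for service, rate in RATES_PER_MIN.items():
    print(f"{service}: ${cost_for(episode_minutes, rate):.2f}")
```

Change `episode_minutes` to see how the gap scales with longer or shorter files.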

OpenAI's Whisper API is technically cheaper at $1.44, but it has a 25 MB file size cap, no UI, no .docx output, no progress tracking — you have to build the wrapper yourself. Worth it if you're a developer and have other reasons to integrate; not worth it if you just want a transcript.
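
To get a feel for what the 25 MB cap means in practice, here is a rough estimate of how many pieces a long file has to be cut into before it can go through the API. It assumes a constant-bitrate 64 kbps encode, treats "25 MB" as 25,000,000 bytes, and ignores container overhead, so treat the result as a lower bound rather than an exact figure.

```python
import math


def max_minutes_per_chunk(cap_mb: float, bitrate_kbps: float) -> float:
    """Minutes of audio that fit under a size cap at a constant bitrate."""
    cap_bits = cap_mb * 1_000_000 * 8
    seconds = cap_bits / (bitrate_kbps * 1000)
    return seconds / 60


def chunks_needed(total_minutes: float, cap_mb: float, bitrate_kbps: float) -> int:
    """Smallest number of chunks that keeps every chunk under the cap."""
    return math.ceil(total_minutes / max_minutes_per_chunk(cap_mb, bitrate_kbps))


print(f"{max_minutes_per_chunk(25, 64):.1f} minutes fit in one chunk")
print(f"{chunks_needed(240, 25, 64)} chunks for a 4-hour episode")
```

At 64 kbps that is about 52 minutes per request, so a 4-hour episode means at least five uploads plus the code to stitch the results back together.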

The 5-step workflow

1. Compress your audio first (optional, smart)

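Most people upload uncompressed WAV files, which for 4 hours run from roughly 600 MB (22 kHz mono) to about 2.5 GB (CD-quality stereo). Whisper doesn't need that quality: it resamples everything to 16 kHz mono internally, so it works about as well on 64 kbps mono MP3. Convert with ffmpeg: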

ffmpeg -i source.wav -b:a 64k -ac 1 podcast.mp3

4 hours of audio at 64 kbps mono comes to about 115 MB (roughly 110 MiB). That uploads in about 30 seconds on a 30 Mbps uplink, and it transcribes just as accurately.
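
The size and upload-time figures come from simple bitrate arithmetic, sketched below so you can plug in your own duration and connection speed. The 30 Mbps uplink is an assumption for illustration; the function names are ours.

```python
def encoded_size_mb(duration_hours: float, bitrate_kbps: float) -> float:
    """Approximate size in decimal megabytes of a constant-bitrate file."""
    seconds = duration_hours * 3600
    total_bits = bitrate_kbps * 1000 * seconds
    return total_bits / 8 / 1_000_000


def upload_seconds(size_mb: float, uplink_mbps: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and congestion."""
    return size_mb * 8 / uplink_mbps


size = encoded_size_mb(4, 64)
print(f"{size:.1f} MB, about {upload_seconds(size, 30):.0f} s at 30 Mbps")
```

On a slower uplink the same file takes proportionally longer, which is one more reason to compress before uploading.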

2. Upload to LessRec

Drag the file into lessrec.com. No signup is required for the first 10 minutes, but a 4-hour file needs credits. A $25 pack covers 500 minutes; this transcript uses 240 of them ($12), leaving 260 minutes ($13) of credit.

The uploader supports files up to 1 GB. For longer audio (8+ hours), split the file into 2-3 segments and upload them separately.
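
One way to do that split is to let a small script work out equal-length segments and build the matching ffmpeg commands. This is a sketch under a few assumptions: the output names (`part1.mp3`, ...) are placeholders, the cut points are purely time-based, and `-c copy` cuts without re-encoding, so a boundary can land mid-word. Nudge the boundaries to a pause if that matters.

```python
def split_plan(total_seconds: float, parts: int) -> list[tuple[float, float]]:
    """Return (start, duration) pairs, in seconds, for equal-length segments."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    length = total_seconds / parts
    return [(i * length, length) for i in range(parts)]


def ffmpeg_split_commands(src: str, total_seconds: float, parts: int) -> list[list[str]]:
    """Build one ffmpeg argument list per segment (nothing is executed here)."""
    commands = []
    for n, (start, duration) in enumerate(split_plan(total_seconds, parts), start=1):
        commands.append([
            "ffmpeg",
            "-ss", str(start),    # seek to the segment start
            "-t", str(duration),  # read only this many seconds
            "-i", src,
            "-c", "copy",         # copy the stream, no re-encode
            f"part{n}.mp3",
        ])
    return commands


for cmd in ffmpeg_split_commands("podcast.mp3", 8 * 3600, 3):
    print(" ".join(cmd))
```

Each printed line can be pasted into a terminal, or the lists can be passed to `subprocess.run` as they are.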

3. Wait (about 5 minutes)

Whisper large-v3 INT8 on a CPU runs at roughly 50-100x realtime, so a 240-minute file transcribes in about 2.5 to 5 minutes of wall-clock time, not 240. We can notify you by Telegram, or you can just keep the page open.
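
The wait estimate is just the audio length divided by the realtime factor. A tiny sketch of that arithmetic, using the 50x and 100x bounds quoted above (the function name is ours):

```python
def wall_clock_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Minutes of processing for audio transcribed at N-times realtime."""
    return audio_minutes / realtime_factor


for factor in (50, 100):
    print(f"{factor}x realtime: {wall_clock_minutes(240, factor):.1f} min")
```

For a 240-minute file that gives 4.8 minutes at 50x and 2.4 minutes at 100x, before any queueing or upload time.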

4. Download outputs

You get three formats:

.txt — raw transcript, no timestamps
.docx — formatted, paragraph breaks, ready for Word edit
.srt — timestamped subtitles for embedding in video
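
If you haven't worked with .srt before, it is a plain-text format: a counter, a `start --> end` timestamp line, the caption text, then a blank line. The sketch below shows how timestamped segments map onto that layout. It illustrates the file format only and is not a description of how LessRec generates its files; the `(start, end, text)` tuple shape is an assumption.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Each cue ends with a newline; joining with another one adds the blank separator line.
    return "\n".join(cues)


print(to_srt([(0.0, 2.5, "Welcome back."), (2.5, 5.0, "Today's guest needs no intro.")]))
```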

5. (Optional) clean up speaker turns by hand

LessRec doesn't do speaker diarization yet — every speaker dumps into the same paragraph stream. For a single-host podcast that's fine; for a 2-host show you'll spend ~10 minutes adding speaker labels in your editor of choice. We're shipping diarization in the next 30 days; until then, this is the workflow.

Why this is "good enough" for most podcasts

Whisper large-v3 on clean podcast audio (single mic, no music bed, native English speakers) is typically around 95-97% word-accurate. That means roughly 1 wrong word in every 20 to 33. Most of the errors are easy to spot (homophones, technical terms, names), and those are exactly the words you'd review anyway when publishing show notes.

For interviews with multiple voices, distant mics, or accented English, accuracy drops to 88-93%. Still usable as a starting draft; budget 30-60 minutes of human cleanup per hour of audio if you publish word-for-word transcripts.
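
To check these numbers on your own audio, hand-correct a minute or two of transcript and compare it with the raw output. The usual metric is word error rate (WER): word-level edit distance divided by the number of words in the corrected reference, with accuracy being 1 minus WER. A minimal sketch follows; it lowercases and splits on whitespace only, so punctuation differences count as errors unless you strip them first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER {wer:.2%}, accuracy {1 - wer:.2%}")
```

A short sample is noisy, so measure a few different stretches of the episode before drawing conclusions.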

Where this falls apart

Court reporting / depositions: These need a certified human reporter; a Whisper transcript is not legally admissible.
Multi-language switches mid-conversation: Whisper handles each chunk in its dominant language; quick code-switches confuse it.
Heavy music/SFX backgrounds: Run audio through a vocal isolator first (RX, Lalal.ai, or open-source spleeter).
Real-time: LessRec is upload-only. For live captions during a Zoom call, use Otter.

Try with 10 free minutes

Drop a file, see the accuracy on your actual audio.

Upload now →

FAQ

Will the file finish if I close the tab?

Yes. Once the upload completes, the job runs server-side. You'll get a URL to come back to for the download. If you provide an email address (optional), we'll also email you the link.

Can I run this on my own machine instead?

Yes — install faster-whisper and run Whisper large-v3 locally. On an M-series Mac it's about 5-10x realtime (slower than our optimized server but free). For occasional use this is great. For weekly long-form work, paying $0.05/min is cheaper than your time.
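
A minimal local run with faster-whisper might look like the sketch below. It assumes the package is installed (`pip install faster-whisper`) and that the model weights can be downloaded on first use; `device="cpu"` with `compute_type="int8"` is one reasonable setting for a laptop, not a tuned configuration. The helper names are ours.

```python
def transcribe_locally(audio_path: str, model_size: str = "large-v3") -> list[tuple[float, float, str]]:
    """Transcribe a file on this machine and return (start, end, text) segments."""
    # Imported lazily so the helper below works even without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    # `segments` is a generator: the actual transcription runs as it is consumed.
    return [(s.start, s.end, s.text.strip()) for s in segments]


def realtime_factor(audio_seconds: float, elapsed_seconds: float) -> float:
    """How many seconds of audio were processed per second of wall-clock time."""
    return audio_seconds / elapsed_seconds
```

Call `transcribe_locally("podcast.mp3")`, time it with `time.perf_counter()`, and pass the audio length and elapsed time to `realtime_factor` to see where your machine lands against the 5-10x figure.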

What about Hindi, Tamil, Cantonese, or other low-resource languages?

Whisper is trained on 100+ languages but accuracy varies. English/Spanish/French/Russian/Portuguese/German/Italian/Japanese/Mandarin/Korean — all very strong. Tamil/Bengali/Vietnamese — usable but expect more errors. Test with the 10 free minutes before committing.