Most transcription errors come from the audio chain, not the transcriber. A 20-minute cleanup pass on a difficult recording often saves 2-3 hours of editor time and reduces error rate by half. This guide is the practical playbook for what to fix yourself, what to leave to the transcription service, and the tools that actually work.
Why audio quality matters
Speech recognition models (the same Whisper-class models covered in AI vs human review) train on clean speech. Every dB of background noise increases word error rate (WER) measurably. The rough numbers:
- Clean studio audio (SNR > 20 dB): 4-7% WER on a good model.
- Average meeting audio (SNR 10-15 dB): 10-18% WER.
- Phone codec + background (SNR < 10 dB): 25-40% WER.
SNR is the signal-to-noise ratio. Each error is a word the human editor has to fix manually: a WER of 0.5% on a 5,000-word transcript is 25 fixes, and 3% is 150. Editing time scales roughly linearly with the error count, which is why 20 minutes of cleanup before sending can save 2-3 hours downstream.
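The arithmetic above can be sketched in a few lines of Python. The function names and sample values are illustrative assumptions, not part of any transcription tool, and the SNR estimate assumes you can isolate a stretch of room tone and a stretch of speech from the same recording.

```python
import math


def mean_power(samples):
    """Average power of a block of samples (floats in the range -1.0 to 1.0)."""
    return sum(s * s for s in samples) / len(samples)


def estimate_snr_db(speech, room_tone):
    """Rough SNR in dB from a speech segment and a noise-only segment.

    The speech segment also contains the noise, so the noise power is
    subtracted from it before taking the ratio.
    """
    noise_power = mean_power(room_tone)
    signal_power = max(mean_power(speech) - noise_power, 1e-12)
    return 10 * math.log10(signal_power / max(noise_power, 1e-12))


def expected_fixes(word_count, wer_percent):
    """Number of words an editor would have to correct at a given WER."""
    return round(word_count * wer_percent / 100)


print(expected_fixes(5000, 0.5))  # 25
print(expected_fixes(5000, 3))    # 150
```

Treat the SNR figure as a ballpark only: it depends heavily on which segments you pick.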
Fix these before sending
Worth doing yourself. None require pro tools.
- Normalize peaks to -3 dB. Free tools (Audacity → Effect → Normalize) raise the overall level so quiet speakers are easier to hear, while leaving headroom so loud peaks do not clip. The single most impactful 30-second action.
- Apply mild noise reduction. Audacity’s built-in Noise Reduction with a 4-6 dB profile cut removes HVAC, computer fans, and most room hum. Stay below 10 dB or you start introducing artifacts.
- Trim the intro and outro. The first 30 seconds are usually silence or setup chatter, not content. Trimming them saves the transcriber the time spent finding where the content starts.
- Declip if needed. If the waveform shows flat-top clipping (loud peaks squared off), Audacity’s Clip Fix or iZotope RX’s De-clip restore some of the lost data. Better than nothing.
- Adobe Podcast Enhance Speech. Free web tool — drag a file in, get back a cleaned version. Aggressive enough that you only want to use it on truly bad source audio.
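For the curious, peak normalization to -3 dB comes down to one multiplication. This is a minimal sketch of the idea, not Audacity's actual implementation. It assumes samples are floats between -1.0 and 1.0, and the function names are made up for illustration.

```python
def db_to_linear(db):
    """Convert a level in dB relative to full scale to a linear amplitude."""
    return 10 ** (db / 20)


def normalize_peak(samples, target_db=-3.0):
    """Scale samples so the loudest one sits at target_db.

    One gain value is applied to the whole file, so the balance between
    speakers does not change; everything just gets louder or quieter.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = db_to_linear(target_db) / peak
    return [s * gain for s in samples]


quiet_recording = [0.02, -0.10, 0.25, -0.18]
normalized = normalize_peak(quiet_recording)
print(round(max(abs(s) for s in normalized), 4))  # 0.7079
```

A peak of about 0.708 of full scale is what -3 dB means in linear terms, which is the headroom that keeps loud moments from clipping.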
Leave these to the editor
Skip these — we have better tools for them on our side, and over-processing introduces worse artifacts than the original problem.
- Deep noise reduction. Anything beyond a 10 dB reduction starts to garble speech. We have iZotope RX with dialogue-specific models that do this without artifacts.
- Speaker separation. Don’t try to manually split crosstalk. The transcriber uses both AI-assisted separation tools and ears trained on overlap.
- Equalization. EQ tuning is dialect-specific and risk-prone. We do a light EQ pass during cleanup based on the recording device.
- Reverb removal. RX De-reverb is genuinely good but tricky to tune. Send the recording with reverb intact — we know how to dial it back without hollowing out consonants.
- Compression / leveling. Heavy compression makes things sound louder but bunches up dynamics. Light normalization is enough; leave the rest to us.
Tools that actually work
| Tool | Cost | What it does well |
|---|---|---|
| Audacity | Free | Normalize, light NR, declip, trim. The baseline. |
| Adobe Podcast (Enhance Speech) | Free / web | One-click broadcast quality. Works best on solo voice — multi-speaker results vary. |
| iZotope RX Elements | $129 | Dialogue-grade NR, de-reverb, declip. Worth it for journalists, lawyers, researchers who handle messy audio weekly. |
| Krisp | $8-15/mo | Real-time NR on calls. The best fix is at the source: record the call with Krisp already running. |
| Adobe Audition | $22.99/mo | Pro suite. Overkill unless you’re also editing video. |
| FFmpeg + arnndn filter | Free / CLI | Scriptable noise reduction. For programmers handling many files; not user-friendly. |
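If you go the FFmpeg route, a small script can build one command per file. The sketch below only constructs the commands and does not run them. The folder names and `model.rnnn` are hypothetical, and the `arnndn` filter needs an RNNoise model file that you supply separately.

```python
from pathlib import Path


def arnndn_command(src, dst, model):
    """Build an ffmpeg argument list that denoises src into dst.

    model is the path to an RNNoise model file (assumption: you already
    have one on disk). Paths containing special characters such as ':'
    would need escaping inside the filter string; that is not handled here.
    """
    return [
        "ffmpeg",
        "-n",                        # never overwrite an existing output
        "-i", str(src),
        "-af", f"arnndn=m={model}",  # m= points the filter at the model file
        str(dst),
    ]


def batch_commands(in_dir, out_dir, model):
    """One command per WAV file in in_dir, writing to out_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    return [
        arnndn_command(src, out_dir / src.name, model)
        for src in sorted(in_dir.glob("*.wav"))
    ]


print(arnndn_command("raw/interview.wav", "cleaned/interview.wav", "model.rnnn"))
```

Once the printed commands look right, each list can be passed to `subprocess.run(cmd, check=True)`. Listen to one or two outputs before trusting the whole batch.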
A 10-minute cleanup workflow
For an average meeting or interview recording, this sequence catches 90% of issues in under 10 minutes total:
- 0-1 min. Open the file in Audacity. Listen to the first 30 seconds and the last 30 seconds.
- 1-2 min. Select all → Effect → Normalize to -3 dB.
- 2-4 min. Select a 2-second segment of room tone only (no speech). Effect → Noise Reduction → Get Noise Profile. Select all → Noise Reduction with the reduction amount set to 4-6 dB, the range recommended above, and the other settings left at their defaults.
- 4-6 min. Spot-check three random 30-second segments in the middle. Listen for over-processing artifacts.
- 6-8 min. Trim silent intro/outro. Add a 1-second fade-in and fade-out.
- 8-10 min. Export as WAV 16-bit / 44.1 kHz or M4A 192 kbps. Send to /upload.
For phone recordings, also run the file through Adobe Podcast Enhance Speech before the Normalize step. It’s the only thing that meaningfully recovers narrowband phone audio.
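If you export WAV, the format from the last step can be double-checked with Python's standard `wave` module before uploading. This is a minimal sketch. The function name is made up, and the defaults simply mirror the 16-bit / 44.1 kHz setting above.

```python
import wave


def check_wav_format(path, expected_rate=44100, expected_bits=16):
    """Return a list of problems with a WAV file's format (empty if none)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if bits != expected_bits:
            problems.append(f"bit depth is {bits}-bit, expected {expected_bits}-bit")
        if wav.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

Calling `check_wav_format("interview_clean.wav")` (a hypothetical file name) returns an empty list when the export matches. It only reads uncompressed WAV, so it will not work on M4A files.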
When to rerecord
Audio cleanup has a floor. If the recording has any of these properties, no amount of processing will save it — better to schedule a re-record if you can:
- Speaker more than 3 meters from the mic in a reverberant room.
- Audio level peaking continuously at 0 dB (hard clipping, not just loud).
- Background music or TV at conversation volume in the same frequency range as speech.
- Wireless mic dropouts more than once per minute.
- Phone codec audio of a key speaker you must quote verbatim.
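The hard-clipping item is the easiest of these to check mechanically. The sketch below counts samples that sit in flat-topped runs at full scale. The threshold and minimum run length are illustrative guesses, not calibrated limits, and how large a fraction counts as "continuous" is left to your judgment.

```python
def clipped_fraction(samples, threshold=0.999, min_run=3):
    """Fraction of samples that sit in flat-topped runs at full scale.

    A run counts as clipping when at least min_run consecutive samples
    have an absolute value at or above threshold. Samples are floats in
    the range -1.0 to 1.0.
    """
    clipped = 0
    run = 0
    for s in samples:
        if abs(s) >= threshold:
            run += 1
        else:
            if run >= min_run:
                clipped += run
            run = 0
    if run >= min_run:  # a run that reaches the end of the file
        clipped += run
    return clipped / len(samples) if samples else 0.0


loud_but_clean = [0.2, 0.9, 1.0, 0.7, -0.4, -1.0, -0.8, 0.1]
hard_clipped = [0.5, 1.0, 1.0, 1.0, 1.0, 0.3, -1.0, -1.0, -1.0, -0.2]
print(clipped_fraction(loud_but_clean))  # 0.0
print(clipped_fraction(hard_clipped))    # 0.7
```

A single sample touching full scale is just a loud peak. Squared-off runs are the ones that mark lost audio.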
For court-admissible work, the standard is higher — see court transcript mistakes. Verbatim certified transcripts of bad audio are not impossible but the cost climbs steeply.
FAQ
Should I run noise reduction before sending?
Light NR helps. Aggressive NR introduces artifacts the transcriber will hear as garbled words — counterproductive.
Does normalizing the audio help?
Yes — normalize peaks to -3 dB so quiet speakers are audible without clipping loud ones.
Can the transcriber recover audio I can’t hear?
Sometimes. We have iZotope RX and other tools but they can’t recreate information that isn’t there.
Does cleanup add to the transcript cost?
Mild cleanup is built into the multi-speaker tier. Aggressive cleanup shows up as a +50% audio-quality surcharge.
Have a recording? Send it.
Upload audio or video. We’ll send a transparent estimate within an hour and confirm the deadline before you pay.