Repairing Spoken-Word Recordings with iZotope Rx 8 Advanced – A Real-World Review

Producer/engineer Rob Tavaglione explains how he repairs spoken-word recordings with iZotope Rx 8 Advanced in this real-world review.

iZotope Rx 8 Advanced offers a number of modules for cleaning up spoken-word material, including (seen here) De-plosive, Spectral Repair, Voice De-noise, Dialogue De-reverb, Breath Control and Mouth De-click.

With hosts, guests and talent comfortably seated in your purpose-built and acoustically treated isolation room, the discussions are easily recorded with minimal bleed, defined clarity and no intrusion or distraction from unwanted sounds and noise. Yeah right! The truth is that for most of us voice recordists producing podcasts, audio books and interviews, our audio is polluted with unwanted sounds of numerous varieties that must be prevented or removed if we expect rapt attention to the content. It can be hard to stay focused on dialogue when competing voices, air conditioning, ground hums, noisy appliances, passing trucks, airplanes, sibilance, plosives, ticks, clicks and massive vortexes (actually breaths, amplified to ridiculous levels) are stealing our attention. Luckily, today’s digital, analytical software programs (often aided by machine learning) and certain hardware pieces are capable of not only mitigating, but downright removing, extraneous noise. I’ll be using iZotope Rx 8 Advanced premium software in my examples, but there are a number of competing programs that accomplish the same goals, sometimes in similar manners.

Before we delve into fixes, some effort should be spent making sure we are capturing the best audio we can before applying processing. The cleanest and purest signal possible means less severe processing, more successful processing and results that go undetected.


Capturing the cleanest signal possible helps ensure better processing results later on. Start with a room that allows spacing between people, with each speaker ideally seated in a circular pattern.

The Room Is Key

Start with a room that allows spacing between people, with each speaker in the null(s) of the others’ mic(s) and, for large groups, seated in a circular pattern, which encourages interaction, allows more visual communication and rejects mic bleed. Make sure the room is devoid of anything that creates sound, including fridges, air purifiers, computers, cell phones or even hummy DC adaptor “wall warts”. Seal up windows and cover them with blankets, seal up doorways with weather-proofing (whether an interior or exterior door) and dampen air ducts. Reduce unwanted room ambience with acoustic treatments (e.g. foam/fiberglass/cloth absorbent panels) and place diffusors on the closest walls to scatter sound waves and stop the dull muddiness that results from too much absorptive treatment in a small space.

All speaking talent will need closed-back headphones that don’t leak sound excessively, a pop filter on their mic to reduce plosives and enough room to back off the mic when needed. It is wise to have water available in glasses (plastic bottles can be noisy), a notepad (to aid memory and reduce unwanted interjections) and clean cloths (to dampen sneezing/coughing or other personal disasters).

Basic Mixing

Some basic mixing technique may be corrective enough to reduce any noise problems to the inconsequential. The big three tools here are equalization, compression/limiting and automation.

Assuming you’ve captured clean mic signal, sometimes a little filtering is all we need for high fidelity. If you used a bright condenser mic, realize that too much crispy, treble-y, high-end definition can be irritating (if not nearly painful). If so, employ either a high-frequency shelf somewhere around 8 kHz (attenuating three or four dB, or to taste) or engage a gently sloped low-pass filter somewhere between 12 and 18 kHz to remove the really high stuff.
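
For the curious, here is a rough sketch of what that high shelf does to the frequency response. It is not how any particular EQ plug-in is implemented; it assumes the widely published "Audio EQ Cookbook" biquad formulas, a 48 kHz sample rate and a -4 dB shelf at 8 kHz, and the function names are my own.

```python
import cmath
import math


def high_shelf_coeffs(f0_hz, gain_db, sample_rate=48000.0, slope=1.0):
    """Biquad high-shelf coefficients (Audio EQ Cookbook formulas).

    Returns (b, a) as unnormalized 3-tuples. All defaults are assumptions.
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt(
        (amp + 1.0 / amp) * (1.0 / slope - 1.0) + 2.0
    )
    k = 2.0 * math.sqrt(amp) * alpha
    b = (
        amp * ((amp + 1.0) + (amp - 1.0) * cos_w0 + k),
        -2.0 * amp * ((amp - 1.0) + (amp + 1.0) * cos_w0),
        amp * ((amp + 1.0) + (amp - 1.0) * cos_w0 - k),
    )
    a = (
        (amp + 1.0) - (amp - 1.0) * cos_w0 + k,
        2.0 * ((amp - 1.0) - (amp + 1.0) * cos_w0),
        (amp + 1.0) - (amp - 1.0) * cos_w0 - k,
    )
    return b, a


def response_db(b, a, freq_hz, sample_rate=48000.0):
    """Magnitude of the biquad's frequency response at freq_hz, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    z2 = z1 * z1
    h = (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    b, a = high_shelf_coeffs(8000.0, -4.0)
    for freq in (100.0, 1000.0, 8000.0, 16000.0):
        print(f"{freq:>7.0f} Hz: {response_db(b, a, freq):+.2f} dB")
```

The low end is left essentially untouched, the corner frequency sits at about half the requested cut, and the full cut only arrives well above the corner, which is why a shelf sounds gentler than its dB figure suggests.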

Conversely, much of the environmental noise causing us trouble is found down in the low frequencies (e.g. heating/cooling rumble, passing jets, appliance rumble, footsteps, traffic noise, a mic placed too close, etc.), so a high-pass filter is essential. Engaging one around 80 Hz will remove noise without reducing “chesty-ness”, but don’t be afraid to filter up to almost 200 Hz if severe low-frequency noise is persistent. The cumulative effect across multiple mics can be amazing!
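
To put a number on what an 80 Hz high-pass buys you, here is a minimal sketch that runs test tones through a second-order filter and measures how much each one drops. It assumes a Butterworth-style biquad from the "Audio EQ Cookbook" formulas and a 48 kHz sample rate; your DAW's filter may well use a different slope, and every name below is hypothetical.

```python
import math


def highpass_coeffs(cutoff_hz, sample_rate=48000.0, q=0.7071):
    """Second-order high-pass biquad, normalized so that a[0] == 1."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Run samples through the filter (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def sine(freq_hz, seconds=1.0, sample_rate=48000):
    """Full-scale test tone as a list of floats."""
    count = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


def rms_db(samples):
    """RMS level in dB relative to full scale."""
    return 10.0 * math.log10(sum(s * s for s in samples) / len(samples))


def attenuation_db(freq_hz, cutoff_hz=80.0):
    """How many dB a tone at freq_hz loses through the high-pass."""
    b, a = highpass_coeffs(cutoff_hz)
    dry = sine(freq_hz)
    wet = biquad(dry, b, a)
    half = len(dry) // 2  # skip the filter's start-up transient
    return rms_db(dry[half:]) - rms_db(wet[half:])


if __name__ == "__main__":
    for freq in (30.0, 60.0, 100.0, 1000.0):
        print(f"{freq:>6.0f} Hz loses {attenuation_db(freq):.1f} dB")
```

With these settings a 30 Hz rumble loses around 17 dB while a 1 kHz tone passes essentially unchanged, so the voice keeps its body while the rumble falls away.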

Many trained speakers are quite talented at maintaining ideal levels, knowing how to “stage whisper” or “half yell” for maximum effect without issue. Most people are terrible at such skills, so try to capture voice with a touch of compression to smooth out levels as you record. If your minimal set-up doesn’t allow this, consider getting a mic preamp and a compressor (or an all-in-one unit), or purchasing a modern recording interface that allows software-based compression as you record. Then, still compress the track liberally in your DAW and apply limiting (severe compression) when stray peaks are stubbornly popping into the red.
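
If the difference between compression and limiting feels fuzzy, the arithmetic is simple enough to sketch. This shows only the static threshold/ratio math (no attack, release or knee), and the -18 dB threshold and 3:1 ratio are illustrative assumptions, not recommended settings.

```python
def compressed_level_db(input_db, threshold_db=-18.0, ratio=3.0):
    """Output level for a given input level, both in dB.

    Below the threshold the signal passes unchanged; above it, every
    `ratio` dB of input yields only 1 dB more output.
    """
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio


def gain_reduction_db(input_db, threshold_db=-18.0, ratio=3.0):
    """How many dB the compressor turns the signal down."""
    return input_db - compressed_level_db(input_db, threshold_db, ratio)


if __name__ == "__main__":
    for level in (-30.0, -18.0, -12.0, -6.0, 0.0):
        out = compressed_level_db(level)
        print(f"in {level:+6.1f} dB -> out {out:+6.1f} dB")
    # A limiter is the same math with a very high ratio (assumed 20:1 here).
    print(compressed_level_db(0.0, threshold_db=-3.0, ratio=20.0))
```

A shout peaking at -6 dB comes out at -14 dB with these settings, while a quiet passage at -30 dB is left alone; crank the ratio up and the same function behaves like a limiter.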

Perhaps most importantly, you should automate levels in key moments to maintain consistent thematic focus. Like riding a physical fader for volume, automation can truly feature the right persons at the right moments; I aim to reduce the volume of any given track that is not currently in use (or is relegated to only “ok’s” and “uh-huh’s”) by about 4 to 8 dB. This maintains a consistent “air” and “presence” even as speakers take turns, but can still focus attention where you want it. If your noise problems were minimal to start, with only occasional major disruptions (e.g. sirens or sneezes), such automation might be enough to forgo the use of any corrective software.
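
In case the dB figures seem abstract, here is a minimal sketch of what ducking the idle tracks means numerically. It assumes each track is simply a list of float samples and applies one fixed gain per track; real automation would ramp between levels to avoid audible jumps, and the function names are hypothetical.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear multiplier."""
    return 10.0 ** (db / 20.0)


def duck_inactive(tracks, active_index, duck_db=6.0):
    """Return copies of the tracks with every idle one turned down.

    tracks: list of sample lists; active_index: the speaker to feature;
    duck_db: how far to pull the others down (a positive number of dB).
    """
    idle_gain = db_to_gain(-duck_db)
    return [
        list(track) if index == active_index else [s * idle_gain for s in track]
        for index, track in enumerate(tracks)
    ]


if __name__ == "__main__":
    host = [0.5, -0.5, 0.25]
    guest = [0.1, -0.1, 0.05]
    mixed = duck_inactive([host, guest], active_index=0, duck_db=6.0)
    print(mixed)
```

A 6 dB cut roughly halves the sample values, which is enough to push an idle mic's room noise into the background without making it vanish.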


Software Handles the Rest

If problems still persist, there is more we could do with surgical editing, extreme EQ and extreme automation, but why dig that deep when intelligent software can do the job more quickly, with less effort and with likely better results?

The HVAC of summer and winter makes more noise than is acceptable to reach Amazon/Audible’s audio book technical standards, so I clean each voice track with iZotope Rx 8 Advanced using the Voice De-noise module. This process requires a brief sample of the noise problem, so I use a few seconds of pre-roll audio for analysis. Once De-noise has “learned” both the noise and the vocal timbre, it can neatly remove noise for the entirety of the track. I often find -12 dB of reduction (set for Dialogue and Gentle) to be sufficient to achieve noise-floor standards, but a second, less-intense pass can be done for bad problems. Leave a few seconds of each track’s pre-roll noisy so you can clearly verify the improvement the processing has made and check for “liquid” artifacts (if you hear them, you’ll need to undo and process less severely). Remember to clean each track with its own uniquely learned noise profile, as there can be significant differences from track to track.
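
Before and after de-noising, it helps to measure the pre-roll rather than trust your ears alone. The sketch below is one way to do that outside of Rx: it assumes the pre-roll has already been decoded to float samples between -1.0 and 1.0, and the -60 dB target is only a placeholder assumption, so check the current submission requirements for the real figure.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def extra_reduction_needed_db(noise_floor_db, target_db=-60.0):
    """How many more dB of noise reduction are needed to reach the target."""
    return max(0.0, noise_floor_db - target_db)


if __name__ == "__main__":
    # Stand-in for a few seconds of room noise; real data comes from your file.
    pre_roll = [0.0025, -0.0025] * 24000
    floor = rms_dbfs(pre_roll)
    print(f"noise floor: {floor:.1f} dBFS")
    print(f"still needed: {extra_reduction_needed_db(floor):.1f} dB")
```

If the measured floor is already within a few dB of the target, a gentle pass is all you need, and a gentle pass is less likely to leave artifacts.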

Popping plosives and prominent breaths are horrible when monitored via today’s subwoofers and earbuds, so make sure you’ve removed all of them … or should I say reduced them to proper levels, as P’s, B’s and W’s require a little burst of air to function. The De-plosive module sure is quick and effective. Simply highlight only the offending plosives (they are usually easy to spot, with a big ole wave of low-frequency energy in the waveform) and adjust the Sensitivity and Strength parameters depending on the severity of the issue. A second pass can be done, but you’ll seldom need it. Breath Control is similarly useful in that it does a great job of moderating breaths without starkly removing them. Beats the heck out of severe editing and placing “room tone” in the gaps!

The opposite of plosives, excessive sibilance can make S’s, C’s and K’s sound like little knives in your ears, and they only get worse with bad earbuds and distorted playback devices. Treble reduction won’t cut it and side-chained compression is too complicated, but many de-esser plug-ins will manage sibilance quite effectively. However, since Rx 8 has a De-ess module, it’s faster to make the fixes there. You’ll likely only need adjustments to Threshold and Cut-Off Frequency to get great results. As with the De-plosive module, process only the moments with problems, not the whole track top-to-tail as you would with Voice De-noise.

If you just couldn’t get the room right, or recorded in a big reverby space, the Dialogue De-reverb module works miracles. More importantly, I have found that this module works quite well at reducing short ambiences that are far quicker than reverb tails, in places like bedrooms, offices and meeting rooms. Some expertise and experimenting will be in order here, with adjustments to Reduction, Sensitivity, Ambience Preservation and stereo “linkage”. Fear not, with a little practice it works way better than it ought to.

De-hum and De-rustle are both effective at their respective eponymous functions and easy enough to use, but Mouth De-click deserves the MVP award. My bane is those nasty, irritating mouth clicks/snaps that permeate vocal tracks from under-hydrated talent, so I’m elated that these modules actually work! You won’t need to adjust Sensitivity or Click Widening much, but do expect to require two, even three, passes on the worst offenders.

I routinely use all of the above modules to achieve compliance, but sometimes situations call for even more aggressive processing. Spectral Repair allows the targeted removal and replacement of unwanted sonic events (closing doors, dropped objects, sneezes, etc.) in both the time domain and the frequency domain. That is, you can select a lasso tool and neatly draw around the problem noise, using a waveform and a spectrogram to clearly see the issue, then either simply attenuate it or replace the problem with audio from before or after the disturbance. Once you’ve experimented with Strength and Bands, you’ll find this easier than my description suggests, and miraculous.

There are three other highly specialized modules that use machine learning for the odd difficult tasks that you may rarely, if ever, encounter. For multi-location productions or (especially) interviews where the locations sound distractingly different, Ambience Match is a life-saver. You’ll need to have it learn the ambience you desire from another track, but it’s worth the effort to reduce such issues. If you receive audio from another location that is digitally ruined (e.g. by lossy encoding or a low sample rate), Spectral Recovery can actually replace lost high-end information intelligently. Finally, Dialogue Isolate can save really poor audio (surely not recorded by you!) that suffers from too much background noise. Careful adjustments must be made to Sensitivity and Ambience Preservation and multiple passes may be required, but this is your last stop … if your voice audio isn’t clean enough after all this, you might need to refocus on better tracking.

One For the Road

It takes a lot of effort to ensure voice audio is good enough that the audio itself becomes a non-factor, invisible and not even thought of. It is only then that we can truly maximize the goal of enabling communication and conveying ideas. You know that you’ve run your sessions right and used iZotope Rx 8 Advanced to edit/process/clean properly when all anyone can talk about is the actual content itself.
