To the uninitiated, recording an announcer or “voice-over” artist
would seem to be relatively simple compared to other things audio. But
for those who have done it, it’s a creative/technical task not to be
taken lightly. Speech sounds are harmonically and dynamically
complicated because of the way vocals are produced: by the chest; lungs;
diaphragm; larynx; the oral cavity, including the tongue, hard and soft
palates, teeth and lips; the nasal cavities; and by the dynamic
interaction of all those elements through time. Explosions of air
bursting from the mouth, the lips and tongue can sound wet, and “sss”
sounds can overmodulate a track.
Voice-over artists understand these factors, and the best know how
to use them to produce their own voice character. As professionals,
they can be counted on to back off for louder passages, to suppress
hard plosives like P and T, and to stay a consistent distance and angle
from the business end of the mic. Nonetheless, the engineer on the
other side of the glass has to have a keen ear, a good technical
understanding of how to capture the voice cleanly and a well-developed
sense of how to interact with both the VO artist and the clients.
Mix interviewed five people (see “The VO Panel” sidebar) who
record voice-overs and who edit and mix radio and TV commercials and
long-format programs, such as documentaries for The History
What are the defining characteristics of a good voice-over
Michael Mason: VOs have to be very present, which starts with
the acoustics of the room. You want an absolutely dead room, and you
want it large enough so that any reflected sound has had a chance to
travel out a way, and then return. You want it dead because, while it’s
always possible to make a sound more “live,” as yet there’s no
James von Buelow: We’re looking for people who are good
storytellers and who have a voice that’s not too sibilant or too dull.
Mouth noises and the like can be taken care of during editing, even
little clicks in the middle of words. But you’ve got to start with a VO
artist who has a voice that possesses clarity and a pleasant
Joe Casalino: It’s clean and quiet, with suppressed mouth
noises, not overmodulated and not overly compressed.
Wouter van Herwerden: You want as little external
interference as possible: just a clean feed from the mic, which lets
you treat it however you want to in post. You don’t want the talent to
get too close and crowd the mic, for then you risk certain distortions.
These include popping, mouth noise and too much modulation of the
diaphragm of the mic.
Could you describe your announce studio?
Von Buelow: It has a floating floor and walls, so it’s very
quiet for a New York booth. It’s prefabricated, about 7 by 10 feet
with an 8-foot ceiling, so it’s fairly small. The air-conditioning
piped in there can’t be heard when it kicks in. Acoustically, the
booth’s pretty dead, with fabric-covered walls. It’s so dead, in fact,
that I’m often surprised when I go in there just how softly the talent
is speaking, even when they appear to be loud when heard through the
Casalino: It’s 8 by 12 feet with a 9-foot ceiling. It was
custom-built, with 6-inch multilayered walls that float, and with a
floating floor and acoustically sealed door. The window facing the
control room has triple half-inch panes, so I can work at reasonably
high levels. There’s also a window in the studio looking uptown toward
the Empire State Building.
Van Herwerden: The dimensions are about 10 feet by 25 feet
with acoustic panels on the walls and ceilings, as well as carpet on
the floor. There are also fixed diffusors to scatter sound and suppress
What microphone and preamp combination do you
Butler: On the East Coast, a typical mic configuration is a
[Neumann] U87 with a hunk of foam over it. I’ve found this to muffle
the sound, so I tend to use as little pop filtering as possible. A
nylon screen is about as far as I’ll go. I like U87s on females and
thin-voiced guys. But I prefer a Sennheiser 416, which has a lot more
punch to it. It’s my primary VO mic. I hate console mic pre’s. I like
Focusrite preamps, the Gold Channel from TC Electronic and the
Mason: I use, on average, three different mics—a U87, a
Sennheiser 416 and a Neumann U89. Basically you want a very quiet mic
that allows you to get a lot of noise-free gain out of the mic preamp,
which means a gain setting of no more than 45 dB while recording
conversational-level speech. With an 87, I don’t use the highpass
filter, leaving that to the controls on the console. I don’t use
outboard preamps because the mic preamps in the Euphonix console are
awesome. All the EQ and dynamics are just outstanding.
Von Buelow: We do about 95% of our work with a Neumann 89,
leaving it flat. I use a Millennia Media Model HV3, with the gain at
about 12 o’clock and no filtering.
Casalino: It’s a Neumann U87 in cardioid, to a Focusrite
Green preamp and compressor, patched into the 02R console, where it’s
bused directly into the AudioFile.
Van Herwerden: The mic we use most of the time is the
Sennheiser 416. We also have a Neumann 87 here. It’s very sensitive and
has a wide pickup in the cardioid field, so there’s more likelihood of
it picking up the room acoustic than if you use a short rifle, like the
416. It’s not that sonically I prefer the 416 to the Neumann, but it’ll
give me a cleaner voice sound. We’re trying to get a specific vocal
sound that will cut through whatever else is going on in the track
without having to do a lot of extra processing. We’re using a TC
Electronic Gold Channel [for the mic pre’s]. The Euphonix CS2000
consoles we employ also have preamps, but we prefer the TC Golds.
They’re pleasing sonically, they have more headroom, and they have some
built-in features, which are nice.
How do you position the mic relative to the
Butler: The mic’s capsule is right on line with the talent’s
mouth and parallel to his or her face. I’d say from their lips to the
actual capsule is about 4 inches. The pop screen is about 1.5 inches or
so from the capsule, and then it’s about 1.5 inches from the pop screen
to their mouth. If “P pops” are a problem, which they can be with a
U87, I’ll put it into figure-eight or omni. The broader the pattern,
the less popping. Sometimes I’ll do that if it’s relatively
tight-miked, quiet VO where I won’t get too much bounce around the
room. If the 87 is inverted and comes in from above on a boom, you’ll
get a slightly brighter pickup than if the mic is used upright.
Mason: On both a U87 and 416, I’ll mount the mic on a boom
coming in from above. They’re generally about 6 inches away. I prefer
to have the mic capsule’s lower edge on line with their mouth but just
above their upper lip. Using the 416, you’ve got to back up a little
more, because it’s a shotgun. I use a nylon pop filter to avoid pops.
If that doesn’t work, I’ll angle the mic off to the side to suppress
popping, though you have to be careful off-axis, because that does
change the sound.
Von Buelow: In long-form work, because they’re sitting down,
it’s maybe 6 to 8 inches from their mouth, and for commercial work, it
varies. It will go from very close, 3 to 4 inches, to maybe a foot for a
very loud speaker. For the latter, I would tend to use the mic’s pad to
protect the front end of the mic.
Casalino: It’s to the talent’s side, turned toward the
talent, at about a 40-degree angle from straight on to the mouth.
They’re not talking directly into it, which can help with “pops.”
Generally, it’s about 6 inches away. I often use a nylon pop filter
that they work right up against.
Van Herwerden: You try to come in from the side and get it
reasonably close without intruding too much on their space. I angle it
about 20 degrees from their mouth axis and place it 2 to 4 inches away.
I get good presence that way, without excessive danger of pops. Since
the acceptance angle of the 416 is probably about 20 to 30 degrees, the
artist has to stay “on mic” to maintain consistent results, but that’s
not a problem with pros.
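The working distances quoted above (2 to 4 inches, 6 inches, up to a foot) translate into sizable level differences. As a rough illustration only, the sketch below applies the free-field inverse-square law for a point source; that is an assumption, since a mouth a few inches from a directional mic in a treated booth will deviate from it, and the function name is hypothetical.

```python
import math


def level_change_db(old_distance, new_distance):
    """Approximate level change in dB when the source moves from
    old_distance to new_distance (same units), assuming an ideal
    point source in a free field (inverse-square law)."""
    if old_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(old_distance / new_distance)


# Distances in inches, taken from the ranges the panel mentions.
close_to_far = level_change_db(2, 4)    # talent backs off from 2 to 4 inches
close_to_foot = level_change_db(4, 12)  # from 4 inches out to a foot

print(round(close_to_far, 1))
print(round(close_to_foot, 1))
```

Under this idealization, doubling the distance costs about 6 dB, which is one reason a talent who drifts off position is audible in the track.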
How do you set up the gain structure of your system when
recording a VO track and when doing the mix?
Butler: Because I want to preserve headroom, I record
significantly lower, probably 10 dB lower, when recording a VO than I do on
the final mix. Voice actors tend to become popular because of a unique
harmonic structure in their voice, and part of that package seems to be
a fair amount of transient information. That stuff can get clipped off
or distorted rather easily. So I tend to record relatively low, with
peaks 15 dB below 0 VU, which I could never have done with tape because
Van Herwerden: During the first rehearsals, you’ll get an
idea of what kind of signal you’re dealing with and adjust the headroom
accordingly. The dynamic range isn’t all that great: You’re working
with a 2, 4 or 5 dB range. We operate here, like anywhere else, at a
+8 dB peak. We don’t record VOs anywhere near that level, because we
don’t need to with digital systems. As long as we get it down cleanly
into the system, we can deal with the odd peak or shout as long as it
doesn’t exceed that +8 level.
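The headroom arithmetic behind these answers can be sketched in a few lines. This is an illustration, not anyone's actual metering setup: the +8 dB ceiling is the figure Van Herwerden cites, but the -4 dB voice peak is an assumed value on the same scale, and both function names are hypothetical.

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def headroom_db(ceiling_db, peak_db):
    """Margin in dB between the loudest recorded peak and the ceiling."""
    return ceiling_db - peak_db


ceiling = 8.0      # peak operating level quoted in the interview
voice_peak = -4.0  # ASSUMED: loudest peak seen during rehearsal

margin = headroom_db(ceiling, voice_peak)
print(margin)                                    # dB of safety margin
print(round(db_to_amplitude_ratio(margin), 2))   # same margin as a ratio
print(round(db_to_amplitude_ratio(-10.0), 3))    # recording 10 dB lower
```

With these assumed numbers, the odd shout could be roughly four times the rehearsal peak amplitude before reaching the ceiling. Recording 10 dB lower, as Butler describes, means working at about a third of the amplitude.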
What kind of signal processing do you use during initial
recording of a voice-over?
Butler: I tend to always record flat; if I record 60 people a
month, I’d bet that 59 would be flat. In the mix, I may add just a
sprinkling of EQ, but if you’ve got your mic placement and choice
right, you shouldn’t need much. Once in a while, I’ll go through a
highpass if there’s some kind of problem. And if the talent is
sibilant, you’ve got to change the mic. I don’t EQ that out, and I
almost never use a de-esser. But sometimes the talent has such a wicked
“s” that I’ll apply some in post, though I never use it while
recording. [For dynamics,] I might sometimes run just a stitch of
limiting, just a smidgen. Down 2 or 3 dB, max. And I set attack and
release times by ear, using the console compressors.
Mason: Overall, an 87 is a little too dull, and it needs some
brightness generally in the 5k range, with some cut at 300 Hz or so.
Often, I’ll use a highpass filter to get rid of some of that
subharmonic stuff beginning at 80 Hz, because it just eats up headroom
without being heard. Concerning compression, I find it’s better to
compress a VO at 2-to-1, both when recording and when mixing, than to
compress it once at 4-to-1. I prefer a fairly fast attack and a slower
release, because I think a fast release tends to be heard. Regarding
de-essing, during the mix I’ll use the algorithms set up in my Euphonix
console dynamics. Or if it’s really nasty, I’ll throw it into Pro
Tools and use some of the plug-ins to deal with it.
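Mason's preference for two passes at 2-to-1 over one pass at 4-to-1 can be examined with a static compressor curve. The sketch below is a simplification under stated assumptions: a hard-knee curve with no attack or release behavior, the same threshold in both passes, and made-up levels (-20 dB threshold, -8 dB peak). The function name is hypothetical.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static hard-knee compressor curve: levels above the threshold
    are scaled so that `ratio` dB of input yields 1 dB of output."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1")
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


threshold = -20.0  # ASSUMED threshold, same for every pass
peak = -8.0        # ASSUMED input peak, 12 dB over the threshold

stage1 = compress_db(peak, threshold, 2.0)    # pass at recording
stage2 = compress_db(stage1, threshold, 2.0)  # pass at the mix
single = compress_db(peak, threshold, 4.0)    # one pass at 4-to-1

print(stage1, stage2, single)
print(peak - stage1, stage1 - stage2)  # gain reduction per pass
```

On paper, the two gentle passes land on the same level as the single 4-to-1 pass, but neither stage has to pull the signal down as far at once (6 dB and 3 dB here, versus 9 dB). The audible difference Mason describes would therefore come from how each stage's attack and release behave, which this static model does not capture.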
Von Buelow: I plug it into a channel on an 02R, which I use
to boost 3 kHz and 10 kHz about +2 dB to brighten it.
Casalino: On the 02R console, I’ll dial in a very steep
highpass filter at 94 or 105 Hz and below, so it just goes away. I
don’t use a whole lot of compression, 2 or 3 dB at the most. But I
don’t change it a lot and try to concentrate on consistency of
microphone position and sound in the booth. I only EQ at the mix stage.
Van Herwerden: In the normal day-to-day recording sessions,
we don’t apply any processing at all, for the simple reason that if we
have to continue on another day, in another room, or with someone else,
the voice will sound the same from session to session. Later,
during the mix we’ll do processing and EQ, but during initial recording
absolutely nothing gets added, other than maybe a little compression or
During the finished mix, how do you handle EQ and
compression/limiting of the VO track?
Butler: One of the ways that commercial clients judge the mix is
how loud their spot is perceived to be compared to others on the air. I
have several stages of compression to achieve this. There will be a
small amount of console limiting on the VO input of the mix. Then I
might have an 1176 compressor/limiter on an insert, as well. I’ll also
have just a little bit of bus compression. And last, I’ll patch in a TC
Electronic Finalizer Plus, which is a 3-band compressor. With that I
can give 4 or 5 dB more level to the DAT. I’m trying to maintain levels
that don’t exceed -7 or -8 on the meter of a Sony 7030 DAT, while
achieving an average mix level of +1 on a VU meter. If you can achieve
both of those goals, you’ve got a pretty hot mix.
Von Buelow: You tend to end up with 3 to 5 dB of boost at 3.5
kHz, or that area, and then a little boost at 8 to 10 kHz. That
midrange and high end really seems to do the trick on television.
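A presence boost like the one Von Buelow describes is commonly realized in digital systems as a peaking biquad filter. The sketch below uses the widely published peaking-EQ formulas from Robert Bristow-Johnson's Audio EQ Cookbook. It is not a description of any panelist's console: the 48 kHz sample rate, the Q of 1.0 and the 4 dB gain (the middle of the quoted 3 to 5 dB range) are all assumptions, and the function names are hypothetical.

```python
import cmath
import math


def peaking_eq_coefficients(sample_rate, center_hz, gain_db, q):
    """Biquad coefficients (b, a) for a peaking EQ, following the
    Audio EQ Cookbook formulas. Coefficients are not normalized by a[0]."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return b, a


def gain_at_db(b, a, sample_rate, freq_hz):
    """Magnitude response of the biquad at freq_hz, in dB."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(numerator / denominator))


SAMPLE_RATE = 48000.0  # ASSUMED sample rate
b, a = peaking_eq_coefficients(SAMPLE_RATE, 3500.0, 4.0, 1.0)

for freq in (100.0, 1000.0, 3500.0, 10000.0):
    print(freq, round(gain_at_db(b, a, SAMPLE_RATE, freq), 2))
```

The response reaches the full boost at the 3.5 kHz center and falls back toward 0 dB well away from it, so the low end of the voice is left essentially untouched.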
Casalino: I only EQ at the mix stage. I’ll start at 60 Hz and
pull that back to avoid “tubbiness.” I’ll start lifting the top at 5
kHz maybe, on up to 8k. But I’ll avoid 2 to 3 kHz; that can be a little
Van Herwerden: The talent have a particular kind of vocal
quality you’re trying to maintain in the mix. With Pro Tools, you can
save EQ setups, so I can recall them for the artists they were created
for. Specifically, I’d be using a TDM plug-in within the virtual mixing
page input channels. The same thing goes for compression and de-essing.
My processing is “virtual,” not hardware, and the nice thing about that
is if the mix has to go to another room, as long as it has the
plug-ins, too, they can just load the entire session from our backup
CD-R and re-create everything I’ve done in the original session.
CAN WE ALL AGREE?
Common Technique in Voice-Over Recording
Each panel member has his own distinct approach to VO recording, but there are a few fundamentals that all can agree upon.
The main difference between short-form (commercials) and long-form (documentaries, audio books, etc.) is the total amount of compression used. There’s much more in the case of commercials, so as to make them “loudness competitive” with adjacent spots. Long-form readings are usually done with the talent seated, but there are exceptions because of personal taste (and endurance). Commercial spot VO artists usually stand; the body English makes for a better performance. Generally, standing while reading results in better vocal control because the diaphragm is free to move. Headphones were used by everyone, although Tim Butler felt they contributed to the talent being too concerned with the sound of their own voices.
Scripts are almost always placed on a music stand that’s padded and angled to avoid reflections back into the microphone. Usually the goal is to place the active part of the script high enough to avoid the talent looking down at it and getting off mic.
There were several other areas of complete agreement between everyone interviewed.
• All record to the hard drive of a digital workstation with a DAT backup.
• All have a console with digitally stored control settings to enable recall of session parameters.
• All edit most of their own material, and all perform final mixing on projects.
• All are of the opinion that women’s voices are more likely than men’s to offer sibilance problems.
All the participants had several monitoring options, using Tannoys or Genelecs for large monitors (especially useful for revealing low-end thumps, pops, etc.), NS-10s and Auratones as small speaker references, and some kind of 2- or 3-inch television speaker as the final test of what works on the air. Wouter van Herwerden had some illuminating comments about this last piece of equipment, which he calls “Mr. Crappy.”
“My driving force is narration, so I’ll use him to help establish an EQ for the VO that gives me the sound that I want out of a 2-inch speaker,” he says. “Once I’ve set that, I’ll go to the NS-10s or bigger speakers and start doing my first pass on the mix, referencing everything to my narration track. While doing this, I’ll keep referring back to Mr. Crappy because he’s the final arbitrator of all the stages of our work here, that is until 5.1 takes hold much more widely. At the end of the day, a 2-inch speaker is what it all comes down to.”