Artifact Audibility

As professional recording engineers, our job is to capture musical performances, especially acoustic music, as clearly and accurately as possible. Sure, when mixing pop and rock tunes, we often mangle sounds to make them more interesting. Creative use of EQ, compression, distortion and other effects is an important part of the job. But once a mix has been honed and approved, we aim to maintain the same tonality and clarity through to the listener’s loudspeakers.

Two things conspire to degrade quality and clarity: frequency response errors, and various distortions and noises that I call, collectively, “artifacts.” These include harmonic and intermodulation distortion, tape hiss, AC power hum and buzz, jitter noise, and other unwanted sounds. The most important property of an artifact is how loud it is. With practice we can learn to identify some types of artifacts at a fairly low volume, though at some level they’ll be too soft to hear at all, even to the most well-trained ears.

Besides the overall volume of artifacts in relation to the music, two other factors affect their audibility: their frequency content and the masking effect. The equal loudness curves developed by Fletcher and Munson in the 1930s show that our ears are more sensitive to some frequencies than to others, and that the difference depends on how loud the sound is. In short, our hearing is most sensitive at frequencies around 2 to 4 kHz, and much less so at low frequencies. So at the same level, artifacts containing treble frequencies are more audible than low-frequency rumble.

The masking effect is equally important because it determines how well we hear one sound in the presence of another. When standing next to a loud jackhammer, you won’t hear someone talking softly 10 feet away. Masking is strongest when both sounds have similar frequency ranges. Therefore, when playing a cassette tape you might hear tape hiss during a bass solo, but not when cymbals or violins are prominent. Likewise, you’ll easily hear low-frequency hum when only a tambourine plays, but maybe not during a floor tom solo. Note that masking affects only our ears. Spectrum analyzers can easily identify any frequency in the presence of any other frequency, even when one is 100 dB softer.
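To illustrate that last point, here is a minimal Python sketch of measuring one tone that sits 100 dB below another. It is not how any particular analyzer works: it correlates the signal against a single frequency (a one-bin DFT with no window), and it assumes a 48 kHz sample rate and a one-second signal so that both tones fit a whole number of cycles. The function name `tone_level_db` and the test frequencies are my own choices for the example.

```python
import cmath
import math

def tone_level_db(samples, rate, freq):
    """Level of one frequency in dB relative to a sine of amplitude 1.0."""
    n = len(samples)
    # Correlate the signal against a complex sinusoid at the target frequency.
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / rate)
              for i, x in enumerate(samples))
    amplitude = 2.0 * abs(acc) / n
    return 20.0 * math.log10(amplitude)

rate = 48000                       # assumed sample rate
quiet = 10.0 ** (-100.0 / 20.0)    # amplitude ratio for -100 dB
# One second of a full-level 100 Hz tone plus a 3 kHz tone 100 dB softer.
signal = [math.sin(2 * math.pi * 100 * i / rate)
          + quiet * math.sin(2 * math.pi * 3000 * i / rate)
          for i in range(rate)]

loud_db = tone_level_db(signal, rate, 100)    # close to 0 dB
soft_db = tone_level_db(signal, rate, 3000)   # close to -100 dB
```

The measurement recovers the soft tone's level even though the loud tone is present the whole time. With real recordings the frequencies rarely fit the analysis length so neatly, so a practical analyzer would apply a window function before the transform.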

Earlier I mentioned that jitter manifests as noise, though it may or may not be random like tape hiss. Depending on its cause, jitter may appear as FM sidebands, which makes it more like IM distortion or digital aliasing. But jitter is still an artifact, and as such its audibility can be assessed like any other artifact. Some people believe jitter narrows the stereo width and harms bass response, though I’ve never witnessed compelling proof from a proper blind test. Similarly, truncation distortion occurs when 24-bit audio files are reduced to 16 bits without dither, and some people believe this too affects fullness and imaging. However, fullness is a function of frequency response, which is easily measured. And good imaging depends more on room acoustics and untamed reflections than on low-level distortion and noise. Jitter artifacts are typically 100 dB or more below the music, and at that level they simply can’t have that effect.

A few years ago I experimented to learn at what level distortion and other artifacts are audible. I created a 100 Hz sine wave in Sony Sound Forge, then added a 3 kHz tone at various levels below the 100 Hz tone. I picked those two frequencies because they’re far apart to minimize masking, and our ears are most sensitive around 3 kHz. I inserted the 3 kHz tone as a series of pulses that turn on and off once per second, making it even easier to spot. Tests like this are simple to do, and I urge everyone to experiment.
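If you don’t have an audio editor handy, a test file along these lines can be sketched in a few lines of Python using only the standard library. This is not the exact procedure I used in Sound Forge. The half-second-on, half-second-off gating, the -6 dBFS level for the 100 Hz tone, the 24-bit output, and the names `make_test_signal` and `write_wav_24bit` are all assumptions made for the example.

```python
import math
import wave

def db_to_gain(db):
    """Convert a level in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)

def make_test_signal(tone_db=-60.0, seconds=4, rate=44100, base_db=-6.0):
    """100 Hz sine plus a pulsed 3 kHz tone that is tone_db relative to it."""
    base_amp = db_to_gain(base_db)
    tone_amp = base_amp * db_to_gain(tone_db)
    samples = []
    for n in range(seconds * rate):
        t = n / rate
        base = base_amp * math.sin(2 * math.pi * 100.0 * t)
        # Assumed pulse shape: on for the first half of each second, then off.
        gate = 1.0 if (t % 1.0) < 0.5 else 0.0
        tone = gate * tone_amp * math.sin(2 * math.pi * 3000.0 * t)
        samples.append(base + tone)
    return samples

def write_wav_24bit(path, samples, rate=44100):
    """Write mono float samples in the range -1..1 as a 24-bit WAV file."""
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, x))          # guard against clipping
        frames += int(round(x * 8388607)).to_bytes(3, "little", signed=True)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(3)                   # 3 bytes per sample = 24 bits
        f.setframerate(rate)
        f.writeframes(bytes(frames))

if __name__ == "__main__":
    # Example: 3 kHz pulses 60 dB below the 100 Hz tone.
    write_wav_24bit("pulse_test_-60dB.wav", make_test_signal(tone_db=-60.0))
```

Change `tone_db` to -40, -60, -80 and so on, then note the level at which you can no longer hear the pulses. Writing 24-bit samples keeps quantization error well below the softest tones you are likely to try.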

Note that when playing high-frequency sine waves through loudspeakers, you should move your head slightly while you listen. Otherwise you might miss a high frequency that’s present but falls in an acoustic null at your listening position. Even when a room is acoustically treated, nulls at high frequencies can exist every few inches, especially when playing a mono source through two loudspeakers at once. You can easily hear this by playing a 3 kHz tone by itself, then moving your head a few inches in any direction. I also created a special noise file you can loop and insert at various levels behind music:

http://www.ethanwiner.com/noise.wav

This noise contains treble frequencies where our ears are most sensitive, so it’s biased to favor those who believe very soft artifacts such as jitter are audible. If you can’t hear it mixed under music at -70 dB, it’s unlikely that jitter, which is much softer, will have any effect. Note that this file peaks at -20 dBFS because it’s very irritating to hear at a normal volume. So when you mix it with music, it’s actually 20 dB softer than the level control indicates.
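The level arithmetic is easy to get backward, so here is a small Python sketch of it, along with a hypothetical way to loop the noise under a music track. The function names are made up for the example, and it assumes both signals are plain lists of float samples at the same sample rate, with levels expressed relative to full scale.

```python
def db_to_gain(db):
    """Convert a level in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)

def effective_level_db(fader_db, file_peak_dbfs=-20.0):
    """Peak level of the noise after the fader, given the file's own peak."""
    return fader_db + file_peak_dbfs

def fader_for_target(target_db, file_peak_dbfs=-20.0):
    """Fader setting needed for the noise to peak at target_db."""
    return target_db - file_peak_dbfs

def mix_under(music, noise, fader_db):
    """Loop the noise for the length of the music and add it at fader_db."""
    gain = db_to_gain(fader_db)
    return [m + gain * noise[i % len(noise)] for i, m in enumerate(music)]

# A fader reading of -50 dB puts the noise peaks at -70 dBFS.
peak_at_minus_50 = effective_level_db(-50.0)
# To have the noise peak at -70 dBFS, set the fader to -50 dB.
fader_needed = fader_for_target(-70.0)
```

Keep in mind these figures are relative to digital full scale. If your music peaks well below 0 dBFS, the noise will sit that much closer to it.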

I’m convinced that some people believe jitter affects fullness and imaging because of peaks and nulls in their listening room. Even in a well-treated room the response varies substantially at all frequencies over small distances. I’ve measured differences greater than 12 dB at 70 Hz in locations only four inches apart. At higher frequencies the level differences are even larger. So when switching your converter’s clock, unless you keep your head in the exact same place within half an inch, the fullness or clarity really can vary, but not because the amount of jitter changed!

Ethan Winer has been an audio engineer and professional musician for more than 40 years. His new book, The Audio Expert, is published by Focal Press.