What's the Holdup?

In the stone age of computer music (the 1970s), latency was measured in minutes, hours or days. Computer musicians entered their compositions on punch cards or QWERTY keyboards and then went out for coffee or perhaps got a good night’s sleep while the CPU computed the audio file.

By the late ’90s, latency (the time lag between when you, the human, would like to hear something and when you actually hear it) was whittled down to less than a second. But that was still way too long. Latency was a serious, nagging issue for anyone who wanted to use a computer for audio recording and mixing, much less for real-time effects processing.

Since then, manufacturers have made giant strides in developing lightning-fast ways to handle computer-based audio. For example, Steinberg’s ASIO and Apple’s Core Audio provide very low-latency audio I/O. (If you’re still using MME on a Windows system, don’t even bother reading the rest of this article. Go out and buy an ASIO interface.) The fact that the computer’s CPU speed has increased by a factor of 10 in the past decade has also helped. Today, latency isn’t a problem — unless it’s a problem. When it’s a problem, there are solutions you can deploy. But to find the optimum solution, you need to understand the nature of the problem.

In this article, I look at some scenarios in which latency can become audible, explore the technical issues involved and discuss some strategies to deal with it.

THAT WAS THEN, THIS IS NOW
We’re used to thinking of today’s computers as being extremely fast, but “old-fashioned” analog audio is much faster. In an analog system, audio signals travel — for all practical purposes — at the speed of light. In contrast, in a digital audio system, audio signals are represented by streams or packets of numbers. Moving the numbers from place to place takes time; processing them takes still more time.

We know that in a computer audio system, latency is caused by the time it takes for the system to move audio data from place to place, alter it in some way or both. A typical audio process — for instance, reading a track from the hard drive, passing it through a reverb, mixing it with other tracks and sending it to the audio interface for monitoring — occurs in several stages, and each stage introduces some latency. George Radai of M-Audio compares latency to the noise floor in an audio system: Each component may introduce only a little noise, but the noise build-up is cumulative. With latency, the time lag introduced by any one stage may not be perceptible, but when several stages follow one another in a chain, the system’s total latency may become not only perceptible, but maddening.

Digital audio is not usually handled one byte or sample word at a time. Instead, it’s handled in chunks or packets. At the beginning of the process, for instance, the A/D converter turns the signal from a mic or analog bus into digital audio. Typically, the audio interface fills a buffer with data and then sends an interrupt request (IRQ) to the computer’s operating system, asking it to pick up the buffer’s contents. Once the OS has done so, the interface begins filling the buffer again. The actual situation may be somewhat more complex than this simplified scenario, but that’s the essence of what happens.

For purposes of discussion, let’s assume that the interface is sampling an incoming audio signal at a 96kHz rate with 24-bit resolution. Let’s also assume that the buffer is large enough to hold 1,024 sample words of data (3,072 bytes) for each monaural audio stream that is being digitized. At that rate, the first sample word arriving in the buffer will have to sit there for 1,024/96,000 second (about 10.7 ms) before the full buffer is passed on to the OS.
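The arithmetic behind that figure is easy to sketch. The short Python example below assumes packed 24-bit samples (3 bytes each); many drivers actually pad 24-bit audio into 32-bit words, which would change the byte count but not the timing. The function names are illustrative, not taken from any driver or DAW API.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """How long the first sample waits while the buffer fills, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


def buffer_bytes(buffer_samples: int, bits_per_sample: int, channels: int = 1) -> int:
    """Size of one buffer, assuming samples are packed with no padding bytes."""
    return buffer_samples * (bits_per_sample // 8) * channels


if __name__ == "__main__":
    # The example from the text: 1,024 mono samples at 96kHz, 24-bit.
    print(f"{buffer_latency_ms(1024, 96_000):.2f} ms")   # 10.67 ms
    print(buffer_bytes(1024, 24), "bytes")                # 3072 bytes
```

Plugging in 64 samples at 48kHz gives the 1.3 ms figure cited later in the article.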

The musician will most often be listening to previously recorded tracks while laying down a new one. Assuming the output buffer is the same size as the input buffer, those backing tracks spend about 10 ms in it on their way out, so if the musician’s timing is perfect, he or she will be playing each note of the new track 10 ms after the DAW sent out the corresponding point in the backing tracks. The newly recorded material then spends another 10 ms in the input buffer before arriving in the DAW, for a total of 20 ms of latency. If the DAW simply lays down the new track the way a multitrack tape machine would, the new track will be 20 ms late.

Twenty milliseconds of latency would be very audible, but today’s DAWs compensate for this time lag so that you don’t hear it. The DAW “knows” the size of the interface’s buffer, so it advances the newly recorded audio data by the necessary amount when placing it in a track. Thus, the track will be perfectly synchronized with other tracks that were being monitored while the recording was being made. Problem solved.
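Here is a minimal sketch of that compensation under the simplified model described above: the take lags by one output buffer plus one input buffer, and the DAW slides it earlier by that many samples. The function names are hypothetical, and a real DAW typically also accounts for converter and driver delays, which this sketch ignores.

```python
def recorded_offset_samples(output_buffer: int, input_buffer: int) -> int:
    """Samples by which a new take lags the backing tracks (simplified model).

    The backing tracks wait one output buffer before the musician hears them,
    and the performance waits one input buffer before the DAW receives it.
    Converter and driver delays are ignored here.
    """
    return output_buffer + input_buffer


def align_take(take: list[float], offset: int) -> list[float]:
    """Advance a recorded take by `offset` samples, keeping its length.

    The first `offset` samples are dropped and the end is padded with silence.
    """
    if offset <= 0:
        return list(take)
    return take[offset:] + [0.0] * min(offset, len(take))


if __name__ == "__main__":
    offset = recorded_offset_samples(1024, 1024)
    print(offset, "samples =", round(1000 * offset / 96_000, 2), "ms at 96kHz")
```

Because the DAW knows both buffer sizes, it can apply this shift automatically every time it writes a take to disk.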

Figure: The file on top, recorded in Steinberg Cubase SX 3, was bounced to the next track through Cycling ’74 Pluggo’s Dynamical compressor effect. Dynamical was set to a look-ahead time of 29.66 ms. Not unexpectedly, the bounced track’s observed latency (which would be the same as the latency of the original track’s live playback) matches the look-ahead time.

WHAT’S IN THE CANS?
A more serious issue can arise when musicians want to hear what they are singing or playing through the headphones during the recording process. If this live signal is entering the computer, being mixed with the other tracks and then sent back to the computer’s output for monitoring, then the live signal will be delayed twice: once going into the computer and again on its way out. This in-to-out latency can be perceptible and distracting to the musician.

Two easy solutions are available, and a third is more expensive. First, many audio interfaces are equipped with zero-latency through monitoring. This is an analog bus that loops directly from the interface’s input to its output without being digitized or passed through the computer. Once you’ve set up this routing in your interface’s control panel applet, the musician will hear his or her live performance without delay, alongside the backing tracks, and can stay in sync.

If your interface doesn’t have zero-latency through monitoring, you can accomplish the same thing through your hardware console; a bit of repatching may be required. For instance, you may need to connect the interface to an aux or bus output on the mixer to avoid recording the entire temp mix into the new track. This solution should also work with a digital console: While there is inevitably some latency in digital mixers, it’s kept very low thanks to an OS that’s optimized for the job. Yamaha reports, for example, an in-to-out latency of less than 2 ms for the 02R96 Version 2.

Most computer audio interfaces allow you to set the size of the I/O buffer. You may be wondering, “Why can’t I just reduce the buffer size to its minimum to squash the in-to-out latency?” Feel free to try it, but the smaller the buffer, the harder the CPU has to work, because it must service the buffer more often. At a certain point (which you’ll find by experimenting with your system), reducing the buffer size further introduces crackling noises. These noises crop up when the CPU can’t finish processing each buffer before the interface needs it, so the interface plays a gap or stale data instead of fresh audio (a buffer underrun).
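A toy model shows why there is a floor. Assume, hypothetically, that each buffer costs the CPU a fixed overhead plus a per-sample processing cost. The buffer’s deadline grows in proportion to its size but the overhead does not, so below some size the work no longer fits in the time available. The overhead and per-sample figures are invented for illustration; your system’s real limit still has to be found by experiment.

```python
from typing import Optional

BUFFER_SIZES = [32, 64, 128, 256, 512, 1024]


def underruns(buffer_samples: int, sample_rate_hz: int,
              overhead_us: float, cost_us_per_sample: float) -> bool:
    """True if processing one buffer takes longer than the buffer lasts."""
    deadline_us = 1e6 * buffer_samples / sample_rate_hz
    work_us = overhead_us + cost_us_per_sample * buffer_samples
    return work_us > deadline_us


def smallest_safe_buffer(sample_rate_hz: int, overhead_us: float,
                         cost_us_per_sample: float) -> Optional[int]:
    """Smallest size in BUFFER_SIZES that avoids underruns, or None if none does."""
    for size in BUFFER_SIZES:
        if not underruns(size, sample_rate_hz, overhead_us, cost_us_per_sample):
            return size
    return None  # the CPU can't keep up at any size in this model


if __name__ == "__main__":
    # Hypothetical load: 500 us overhead per buffer, 15 us of work per sample.
    print(smallest_safe_buffer(48_000, 500.0, 15.0))
```

With those made-up numbers the first safe size at 48kHz is 128 samples; raising the per-sample cost (more tracks, more plug-ins) pushes the safe size up, which matches the behavior described in the next paragraph.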

In that situation, the solution is to buy a faster computer. Today, fast systems can get the I/O buffer size down to 64 samples (1.3 ms at a 48kHz sampling rate) without trouble. But as you add more tracks and more processing, and especially when you’re running at a higher sampling rate, the CPU will have to work harder, which means you may need to increase the buffer size to prevent dropouts.

MIDI SLUDGE
It’s important to keep the buffer size as small as you can without allowing the audio to break up when you’re using a hardware MIDI controller to play a software synthesizer. In this scenario, there’s no way to compensate for the MIDI-in-to-audio-out latency. When you press a key on the keyboard, the keyboard needs 1 ms or so to create the MIDI message and the OS may hold onto it for another couple of milliseconds before passing it on to the DAW. The DAW then passes it to the plug-in (that step will probably be very fast) and the plug-in begins filling its own buffer with audio data. Depending on the buffer’s size, a few more milliseconds may pass before the plug-in sends the beginning of the synthesized note back to the DAW. The DAW then mixes it with the data stream being sent to the interface, where it waits in the interface’s output buffer for a few more milliseconds before being sent to the D/A converter.
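That chain can be tallied the same way as the noise-floor analogy earlier in the article: each stage adds a little, and the total is what the player feels. The sketch below uses illustrative figures only (1 ms for the keyboard, 2 ms for the OS, and hypothetical 128-sample plug-in and output buffers at 48kHz); substitute numbers measured on your own system.

```python
def buffer_ms(samples: int, sample_rate_hz: int) -> float:
    """Duration of one buffer, in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


def midi_to_audio_latency_ms(fixed_stages_ms: dict[str, float],
                             buffers: dict[str, int],
                             sample_rate_hz: int) -> float:
    """Sum fixed per-stage delays and buffer delays along the MIDI-to-audio path."""
    total = sum(fixed_stages_ms.values())
    total += sum(buffer_ms(n, sample_rate_hz) for n in buffers.values())
    return total


if __name__ == "__main__":
    fixed = {"keyboard builds MIDI message": 1.0, "OS passes it to the DAW": 2.0}
    buffers = {"plug-in buffer": 128, "interface output buffer": 128}
    print(f"{midi_to_audio_latency_ms(fixed, buffers, 48_000):.1f} ms")
```

With these assumed values the total comes to a little over 8 ms, and the two buffers account for most of it, which is why keeping them small matters here.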

In practice, MIDI-in-to-audio-out latency can be slightly perceptible, but it’s not usually a major problem. For one thing, keyboard players are used to a little latency. When a pipe organist plays in a cathedral, the latency caused by the speed of sound and the distance from the organ console to the furthest ranks of pipes can be more than ¼ second. A stage amp 20 feet from the player introduces nearly 20 ms of latency (sound travels roughly a foot per millisecond), and no one complains.
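The acoustic delays in those examples follow directly from the speed of sound, roughly 343 m/s (about 1,125 feet per second) in room-temperature air. A quick sketch, assuming that single fixed speed:

```python
SPEED_OF_SOUND_M_PER_S = 343.0   # dry air at roughly 20 degrees C
METERS_PER_FOOT = 0.3048


def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to travel `distance_m` meters, in milliseconds."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_PER_S


def acoustic_delay_ms_from_feet(distance_ft: float) -> float:
    """Same calculation for a distance given in feet."""
    return acoustic_delay_ms(distance_ft * METERS_PER_FOOT)


def distance_m_for_delay(delay_ms: float) -> float:
    """Distance sound covers in `delay_ms` milliseconds."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0
```

Twenty feet works out to just under 18 ms, and a quarter-second delay corresponds to roughly 86 m (about 280 feet) of travel.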

In the days when “MIDI synthesizer” meant hardware, the bandwidth of the MIDI data stream itself could become a source of latency. Each note-on message in a MIDI cable requires about 1 ms to transmit (three bytes at MIDI’s 31.25 kbit/s rate), so when a 10-note chord spread across several MIDI channels was sent down a cable, the last note would be delayed by as much as 10 ms.
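The timing comes from MIDI’s serial format: 31,250 bits per second, with each byte framed by a start bit and a stop bit for 10 bits on the wire, so a byte takes 0.32 ms and a three-byte note-on takes 0.96 ms. The sketch below also models running status, a MIDI feature that lets a message omit its status byte when it repeats the previous one (same message type and channel); that is why a chord spread across several channels, as in the example above, was the worst case. It ignores any other traffic on the cable.

```python
MIDI_BITS_PER_SECOND = 31_250
BITS_PER_BYTE_ON_WIRE = 10  # start bit + 8 data bits + stop bit

BYTE_TIME_MS = 1000.0 * BITS_PER_BYTE_ON_WIRE / MIDI_BITS_PER_SECOND  # 0.32 ms


def chord_arrival_ms(channels: list[int]) -> list[float]:
    """Time (ms) at which each note-on of a chord has fully arrived.

    One note-on per entry in `channels`, sent back to back on a single cable.
    With running status, a note-on on the same channel as the previous
    message can omit its status byte (2 bytes instead of 3).
    """
    arrivals = []
    elapsed = 0.0
    previous_channel = None
    for channel in channels:
        n_bytes = 2 if channel == previous_channel else 3
        elapsed += n_bytes * BYTE_TIME_MS
        arrivals.append(elapsed)
        previous_channel = channel
    return arrivals
```

A 10-note chord spread over ten channels finishes arriving after 9.6 ms; the same chord on a single channel, using running status, takes 6.72 ms.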

With software synths, this source of latency has been eliminated. The speed with which MIDI messages can be transmitted to a plug-in instrument is far higher. This can cause problems, however, if a hardware synth is layered with a software synth. The hardware instrument might easily lag a few milliseconds behind the software, which can cause smearing of attack transients. In live use, however, the situation is reversed: The soft synth may be a few milliseconds late.

In the studio, the solution is simple: After recording the hardware synth’s output to a new track, advance the track a bit while listening or observing the waveforms.
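If eyeballing waveforms gets tedious, the offset can also be estimated numerically. The sketch below uses plain cross-correlation, a standard signal-processing technique rather than anything the article prescribes: it slides one track against the other and keeps the lag where they match best. Because a hardware synth and a software synth produce different timbres, the estimate is most trustworthy on material with sharp attacks and is worth confirming by ear.

```python
def estimate_lag(reference: list[float], other: list[float], max_lag: int) -> int:
    """Lag (in samples) at which `other` best matches `reference`.

    Positive: `other` is late and should be advanced by that many samples.
    Negative: `other` is early.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate reference[i] with other[i + lag] wherever both exist.
        score = sum(
            x * other[i + lag]
            for i, x in enumerate(reference)
            if 0 <= i + lag < len(other)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

This brute-force loop is fine for a short excerpt around a few attacks; longer material would normally use an FFT-based correlation.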

PLUG-IN CHURN
Software-based effects processors can be another source of latency. When the plug-in receives a packet of audio data from the DAW’s mixer, it may have to spend a few milliseconds chewing on the data before sending it back to the mixer. As a result, a track that is rhythmically aligned with the rest of the music when monitored dry may lag slightly when an insert effect is applied. (See the figure on page 100.)

According to Jim Cooper of MOTU, today’s third-party plug-ins (at least on the Mac) universally make their latency figures available to the host DAW. MOTU’s Digital Performer automatically compensates for the latency by sending the track audio to the plug-in early so that the plug-in’s output will be in sync with the rest of the tracks.
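The principle behind that automatic compensation can be sketched with a stand-in plug-in that does nothing but delay its input by its reported latency. Feeding the plug-in that many samples early, which in an offline render amounts to discarding that many samples from the front of its output, puts the processed track back on the grid. This illustrates the idea under that pure-delay assumption; it is not how Digital Performer or any other host is actually implemented, and the function names are hypothetical.

```python
def fake_plugin(signal: list[float], latency: int) -> list[float]:
    """Stand-in for a plug-in whose only effect is `latency` samples of delay."""
    if latency <= 0:
        return list(signal)
    n = len(signal)
    kept = signal[: max(0, n - latency)]
    return [0.0] * (n - len(kept)) + kept  # same length, shifted later


def render_with_compensation(signal: list[float], latency: int) -> list[float]:
    """Process `signal` as if it had been sent to the plug-in `latency` samples early."""
    padded = list(signal) + [0.0] * latency  # let the tail come out of the plug-in
    return fake_plugin(padded, latency)[latency:]
```

The uncompensated call returns a copy of the track that lags by the plug-in’s latency, while the compensated one lines up sample for sample. Advancing the track by hand achieves the same result for a plug-in that doesn’t report its latency.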

If you’re using a plug-in with latency that isn’t being compensated, you can do the same thing by hand: Advance the audio segments in the track a few milliseconds at a time (either by dragging the audio segments or with a track-advance parameter) until the processed sound locks in rhythmically. At this point, the effect’s Bypass button will create the opposite problem: When the effect is bypassed, the track will be early. If you want to do a lot of A/B’ing of the track with and without the effect, the workaround is to duplicate the track and use the Track Mute buttons instead of the effect’s Bypass button.

Advancing the track’s audio data can also be a viable solution if you’re using a cherished hardware effect in your mix. In this case, you’ll need to compensate for the interface’s out-to-in latency, which will probably remain even if you’re feeding the rack processor a digital audio signal rather than going through a DA/AD conversion. If you want the hardware effect to be on a bus rather than functioning as an insert, advancing the track data won’t work, but there are still workarounds. For instance, you might pass the entire mix (except the effect return) through a wet-only delay line set to a few milliseconds of delay.
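A wet-only delay line is simply a pure delay with no dry signal mixed back in. Here is a minimal ring-buffer sketch; the 3 ms round trip in the example is hypothetical, and in practice you would set the delay to the out-to-in latency you measure for your own interface.

```python
class DelayLine:
    """Pure (100% wet) delay of a fixed number of samples, using a ring buffer."""

    def __init__(self, delay_samples: int) -> None:
        self.buffer = [0.0] * delay_samples
        self.position = 0

    def process(self, sample: float) -> float:
        if not self.buffer:                   # zero delay: pass straight through
            return sample
        out = self.buffer[self.position]      # sample written delay_samples ago
        self.buffer[self.position] = sample
        self.position = (self.position + 1) % len(self.buffer)
        return out


def ms_to_samples(ms: float, sample_rate_hz: int) -> int:
    """Nearest whole number of samples for a delay given in milliseconds."""
    return round(ms * sample_rate_hz / 1000.0)


if __name__ == "__main__":
    line = DelayLine(ms_to_samples(3.0, 48_000))  # hypothetical 3 ms round trip
    print(len(line.buffer), "samples of delay")
```

Running every part of the mix except the effect return through one such delay lets the returned hardware effect line up with everything else.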

In practice, plug-in latency is likely to be a problem only with extremely DSP-intensive effects such as convolution reverbs and FFT-based processing. I tested a convolution reverb (WizooVerb W2) and found that the output of its dry signal path was sample-aligned with the input. Inevitably, the convolution process delayed the wet signal, but a few milliseconds of pre-delay in a reverb are probably not going to hurt anything.

SCREEN REDRAW WOES
The most noticeable form of latency in today’s computer-based recorders is not audio latency but visual latency. When the DAW is busy handling a number of audio streams, it will quite sensibly put screen updates at the bottom of its to-do list so as not to risk audio dropouts. As a result, the meters may get jerky. Any faders being moved by automation data may jump from place to place rather than moving smoothly.

If the meter’s overload indicator depends on what the on-screen meter displays rather than on the actual audio level, then sluggish meters could cause you to miss a moment of clipping. If the DAW’s code is well-written, that won’t be a problem. But we’re not quite out of the woods yet. If you’re using the mouse to record automation data, then the DAW may not read the mouse position often enough when the CPU load gets heavy. The same thing might happen with a hardware control surface.
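The metering issue can be demonstrated with a synthetic signal. If the display reads the audio level only once per screen update, a one-sample overload between updates never shows up; computing each update’s peak from every sample in the block, one way a well-written meter can avoid the problem, catches it. The signal, the full-scale threshold of 1.0 and the update interval below are all made up for illustration.

```python
def clipped(levels: list[float], full_scale: float = 1.0) -> bool:
    """True if any level reaches full scale (assumed 1.0 for float audio)."""
    return any(abs(x) >= full_scale for x in levels)


def snapshot_meter(samples: list[float], every: int) -> list[float]:
    """A sluggish meter: reads one instantaneous sample per screen update."""
    return samples[::every]


def peak_meter(samples: list[float], every: int) -> list[float]:
    """A safer meter: reports the largest magnitude in each block between updates."""
    return [max(abs(x) for x in samples[i:i + every])
            for i in range(0, len(samples), every)]


if __name__ == "__main__":
    audio = [0.5] * 100
    audio[37] = 1.0  # a single overloaded sample between screen updates
    print("snapshot meter sees clip:", clipped(snapshot_meter(audio, 10)))  # False
    print("peak meter sees clip:", clipped(peak_meter(audio, 10)))          # True
```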

To test your own configuration for this problem, load a CPU-intensive project, record some automation moves and inspect the data for jumps. Then mute, bypass or delete a bunch of stuff so that the CPU load drastically drops, record more automation and compare the results. Within a few minutes, you’ll have a handle on whether your system needs babysitting in this situation.
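If your DAW can export or list automation breakpoints, the inspection step can be partly automated. The sketch below assumes the data is available as (time in seconds, value from 0 to 1) pairs, a hypothetical format, and flags any spot where the value leaps by more than a chosen step or where breakpoints are spaced too far apart. Run it on the automation from both the heavy and the light passes and compare the results.

```python
def find_jumps(points: list[tuple[float, float]],
               max_step: float, max_gap_s: float) -> list[tuple[float, float]]:
    """Return (start, end) times of suspicious segments in an automation lane.

    A segment is flagged when the value changes by more than `max_step`
    between consecutive breakpoints, or when the breakpoints are more than
    `max_gap_s` seconds apart (possibly a sign that the DAW fell behind in
    reading the mouse or control surface).
    """
    jumps = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if abs(v1 - v0) > max_step or (t1 - t0) > max_gap_s:
            jumps.append((t0, t1))
    return jumps
```

Some DAWs thin out breakpoints where a value holds steady, so a time gap on its own is only a hint; a large value step during a continuous fader move is the more telling sign.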

Jerky automation data can usually be smoothed out by hand, either with a pencil tool or by editing a few envelope breakpoints. If your studio handles a lot of tight-deadline projects, though, a better solution is to buy a faster computer.

LAST BUT NOT LEAST
Manufacturers are naturally eager to get the last ounce of performance out of computers, so CPUs are always going to be pushed to the wall. Whenever the CPU is working hard, increasing the audio I/O buffer size can become necessary to prevent glitching. It will be a few more years before audio I/O buffering drops below 1 ms and stays there. The good news is, there’s no need to sit around and wait. You can lick latency right now.

Jim Aikin is a regular contributor to Electronic Musician and other music technology magazines.