Surrounding the Audience

In last month’s column, I looked at some of the issues involved in putting together surround mixes for live sports events through the eyes of Phil Adler, an old friend of mine who’s been doing freelance sports mixing for 18 years; Ron Scalise, audio project manager for remote operations at ESPN and now ABC; and Jim Hilson, senior broadcast audio specialist at Dolby Labs. As we saw, it’s hard enough getting the mixes to make sense at the site, but this month, we look at what happens and what can go wrong when the sound gets sent on its way.

In 1999, the folks at Dolby Labs came up with a way to handle surround audio that fit the way broadcasters — well, most of them — wanted to work, and overcame one of the major obstacles that stood in the way of wide acceptance of surround remote broadcasting. Very simply, the problem was, says Adler, “There’s no way to handle six discrete channels out of the truck, through the routers and over the broadcast path. That’s why they came up with Dolby E.”

Dolby’s Hilson adds, “Most videotape recorders only have four channels of audio. To do surround, we need six channels, or eight if we also want a separate stereo mix at the same time. So Dolby E was developed as a way to get 5.1-channel mixes from point to point.”

Dolby E is a method of coding up to eight audio channels onto a single AES digital pair. The AES signal is 20-bit/48kHz, which allows room for metadata. Its data rate is 1.92 Mbps. It’s a lossy codec, but according to Hilson, “It’s good for up to 10 generations of encode/decode. It’s not as lossy as Dolby Digital. How many you can do depends on the complexity of the signal: Dialog can go several more generations before you notice it, but symphonic music might sound a little weird sooner. In a typical situation, you’re going from the truck to the network, which takes it apart and adds commercials. Then they send it to the affiliate, who also takes it apart and adds commercials. Then they send it to the transmitter, so you’re looking at five or six generations, maximum.”
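The 1.92 Mbps figure falls straight out of the AES numbers above. As a quick check, the sketch below multiplies word length, sample rate and channel count, then compares the result with what eight discrete channels would need at the same word length. The eight-channel comparison is my own illustration, not a figure from Dolby, and the function name is made up for the example.

```python
def pcm_bit_rate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Raw PCM payload rate in bits per second (no framing overhead counted)."""
    return bits_per_sample * sample_rate_hz * channels


# One AES pair as described above: 20-bit words, 48 kHz, two channels.
aes_pair_bps = pcm_bit_rate(20, 48_000, 2)

# Hypothetical comparison: eight discrete channels at the same word length.
eight_discrete_bps = pcm_bit_rate(20, 48_000, 8)

print(f"AES pair: {aes_pair_bps / 1e6:.2f} Mbps")
print(f"Eight discrete channels: {eight_discrete_bps / 1e6:.2f} Mbps")
print(f"Reduction needed: {eight_discrete_bps / aes_pair_bps:.0f}:1")
```

In other words, fitting eight channels into one pair means squeezing the audio by roughly four to one before any room is set aside for metadata.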

Another advantage of Dolby E is that it is designed from the ground up to be video-editing friendly. “It’s built on frame boundaries,” says Hilson, “so it can be stored on tape in the video signal and edited right along with the picture. There’s a one-frame delay at encode and decode, but if all you’re doing is a cuts-only edit, the audio is in the right position. If you’re doing a lot of production, then when you do the layback, you just offset the audio.”
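To put a number on that offset, here is a rough sketch. It reads Hilson's remark as one frame of delay at the encoder and one more at the decoder, and it assumes an NTSC frame rate of 30000/1001 (about 29.97) frames per second. Neither the frame rate nor the function name comes from the interview.

```python
from fractions import Fraction

# Assumed frame rate (not stated in the interview): NTSC, about 29.97 fps.
NTSC_FRAME_RATE = Fraction(30_000, 1_001)


def codec_delay_ms(frame_rate: Fraction, frames_per_stage: int = 1, stages: int = 2) -> float:
    """Delay in milliseconds for a chain of stages, each adding whole frames."""
    frame_ms = Fraction(1000) / frame_rate
    return float(frame_ms * frames_per_stage * stages)


# One encode plus one decode, one frame each.
round_trip_ms = codec_delay_ms(NTSC_FRAME_RATE)
print(f"Audio offset for one encode/decode pass: {round_trip_ms:.2f} ms")
```

Because the delay is a whole number of frames, the fix at layback is a simple slip of the audio rather than any kind of resampling.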

Yet another advantage is that the metadata automatically follows the signal as it is encoded and decoded. Its functions include keeping the channels separate, conveying compression profiles, and normalizing dialog levels across different sources, such as programs and commercials.

A signal path from a remote truck at a sports event might go like this: The truck generates a 1.5-Gigabit/second uncompressed HD video stream. When it leaves, the signal is compressed as it is fed through an Asynchronous Serial Interface (ASI) to a satellite or fiber-optic network, usually taking the data rate down to between 45 and 60 Megabits/second. Most ASI systems allow two audio “services” to be carried with the video signal. One can be Dolby E and the other is often a 2-channel feed, compressed at the truck using MPEG-2, which takes it down from 1.536 Mbps to 384 kbps, the same data rate as the audio on a Dolby Digital signal going to the home viewer.
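The amount of squeezing on that first hop is easier to appreciate as ratios. The sketch below uses only the rates quoted above, except that it assumes the 1.536 Mbps stereo feed is 16-bit/48kHz PCM, which is the combination that produces that number. The variable names are mine.

```python
# Rates quoted in the text, in bits per second.
HD_UNCOMPRESSED_BPS = 1_500_000_000
ASI_VIDEO_RANGE_BPS = (45_000_000, 60_000_000)
STEREO_COMPRESSED_BPS = 384_000

# Assumption: the 2-channel feed starts as 16-bit/48 kHz PCM.
stereo_pcm_bps = 16 * 48_000 * 2

video_ratios = [HD_UNCOMPRESSED_BPS / rate for rate in ASI_VIDEO_RANGE_BPS]
audio_ratio = stereo_pcm_bps / STEREO_COMPRESSED_BPS

print(f"Stereo PCM: {stereo_pcm_bps / 1e6:.3f} Mbps")
print(f"Video compression: {video_ratios[1]:.0f}:1 to {video_ratios[0]:.0f}:1")
print(f"Stereo audio compression: {audio_ratio:.0f}:1")
```

The picture takes by far the bigger hit. The stereo audio gives up about three-quarters of its bits.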

At the network, everything is decoded: The video is turned into Serial Digital Interface (SDI) or HD-SDI, the forms in which it can be distributed around the building, and the audio is converted to PCM. SDI video lets you embed up to 16 channels of audio, and some networks take advantage of this, but others put the audio on a separate cable. Commercials and other network feeds are added to the stream, and then it goes out again through an ASI on its way to the network’s affiliate stations.

Now let’s jump to the viewer’s set. When it ends up there as part of an HD broadcast, the audio is 384 or 448kbps Dolby Digital. Audio mixers like Dolby Digital. “It is really discrete,” says Adler. “Nothing goes into a channel unless you put it in that channel.” Viewers like it too: “The consumer sees the ‘DD’ light on his $10,000 home theater system,” adds Scalise, “and he’s happy.”

But what goes on during the time between when the signal arrives at the network and when it goes out over an individual station’s transmitter varies a lot, depending on who’s doing it. Says one observer, “Everyone wants to have their own little thing that sets them apart from everyone else.” The on-air networks use ASI — over satellite or fiber — to pass the programs on to their affiliates. But that’s about all they have in common.

CBS, when it distributes the video signal, embeds the Dolby E audio with the video, which the stations can then pull apart and encode into their HD transmissions. At Fox, however, the ASI signal going to the affiliates has already been down-converted to a standard HD video broadcast signal (19.4 Mbps) — a “transport stream” — which could be turned around by the affiliates and transmitted without further conversion. But it’s not quite done that way. First, each station adds local content using a “stream splicer” and then sends the signal to the transmitter.

NBC’s ASI stream, on the other hand, uses a higher 35Mbps bandwidth, which allows it to include eight discrete audio channels in the form of four MPEG-2 pairs. The individual stations decode the audio as PCM, and then can choose which channels they want to include with their video transmissions. Some stations, for example, use only the stereo feed, and stations on the East Coast might want different audio content from those on the West Coast.

ABC is an interesting case: It currently doesn’t use Dolby E at all (although until recently, its truck feeds for Monday Night Football were in Dolby E). A large part of the reason is that the network went on the air with HDTV and surround audio before Dolby E was developed. Working with Dolby, ABC developed a high-rate version of Dolby Digital that runs at 640 kbps as opposed to 384. The high data rate allows more encode/decode stages with less signal degradation. “It’s not necessarily the perfect solution,” says Hilson, “but it was a way for them to get on the air with 5.1. And the difference is pretty hard to tell, unless you had the original audio to compare it with.”

ESPN, which like ABC is owned by Disney, also doesn’t use Dolby E. (Recently, ABC dissolved its remote operations unit and turned everything over to ESPN.) To get the audio from ESPN’s trucks to the network, the network uses a completely different method of encoding: SRS Labs’ Circle Surround matrix encoding. In fact, Scalise’s original system for delivering surround used SRS’ technology for the entire signal chain — from truck to living room. “It was the simplest way to transport the audio and still be compatible,” he says. “It can be decoded with Circle Surround II and with Dolby Pro Logic II, as well as Neural Audio’s decoder. It’s a stereo signal until it’s told not to be by a decoder.

“The SRS system does a little moving of stereo cues from front to rear and vice versa, so you don’t get holes in the surround field,” Scalise continues. “The more separation there is between the left and right channels, the more it gets sent to the rear. Things in the front channels with reverb or echo effects extend to the rear. Even though we’re building the sound design for surround, the music we use is all delivered in stereo, so the music will envelop the user and blend in better with the overall mix. In effect, the decoder is up-mixing it. For example, instruments recorded in stereo or elements like overhead drum mics will be all around you, while the snare in mono will be in the front. Is it true to life, is it the way they recorded it? No. But does it add to the ‘wow’ factor? Yes.”

It also explains why no one, the sponsor included, complains when a stereo commercial is played in the middle of a surround sports broadcast. The Circle Surround system makes sure there’s interesting content in the surround channels even when there was none to begin with. At the network, the Circle Surround signal is decoded to 5.1 and then re-encoded to Dolby Digital. The metadata in the stream is inserted at the network.
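The idea that wide stereo material ends up in the rear while mono material stays in front can be illustrated with a generic passive sum-and-difference matrix. To be clear, this is not SRS’ algorithm, which isn’t described here and is certainly more sophisticated. It is a textbook sketch with made-up sample values, meant only to show why left/right separation drives the surround output.

```python
def passive_matrix_decode(left, right):
    """Generic sum/difference decode: the sum feeds the center, the difference feeds the surround."""
    center = [(l + r) / 2 for l, r in zip(left, right)]
    surround = [(l - r) / 2 for l, r in zip(left, right)]
    return center, surround


# A mono snare: identical in both channels, so the difference is zero.
snare = [0.5, -0.25, 0.75]
snare_center, snare_surround = passive_matrix_decode(snare, snare)

# Wide overheads: left and right differ, so some of the signal lands in the surround.
overheads_left = [0.5, 0.0, -0.5]
overheads_right = [0.0, 0.5, 0.5]
oh_center, oh_surround = passive_matrix_decode(overheads_left, overheads_right)

print("Snare surround:", snare_surround)
print("Overheads surround:", oh_surround)
```

The snare produces nothing in the surround feed and the overheads do, which matches Scalise’s description of the snare up front and the drum kit all around you.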

For ESPN, the priority is to make sure the affiliates — which, in its case, means cable head-ends — have to worry as little as possible about the signal. Because there are tens of thousands of head-ends among the network’s subscribers, and some large systems themselves have as many as 40 “virtual” head-ends, allowing each cable company to decode and re-encode the signal is just asking for trouble, not to mention that the hardware cost would be huge. So after experimenting for a year using SRS in the whole chain, ESPN switched over and now — like other cable networks that do surround, such as HBO and Showtime — sends the same 384kbps Dolby Digital audio stream over its ASIs that the cable systems send their customers. Thus, when the signal is turned around at a cable head-end or by a DIRECTV transmitter, it doesn’t require any audio conversion.

All of what I’ve talked about so far has to do with HD broadcasts. But the great majority of television viewers in the U.S. are still watching in standard definition (SD), and SD has no standard for encoding surround audio. Not surprisingly, different networks use different methods. And where there are multiple surround formats, there are going to be different ways of approaching the stereo mix.

“Some mixers do a totally separate stereo mix off the desk,” says Adler, “or they’ll use a Dolby 563 digital encoder to fold it into Pro Logic II, which can be listened to in either surround or stereo, or they’ll use a Dolby 570 monitoring tool to downmix it.” When Adler is at a game, he’ll generate a separate stereo feed, but, “it’s not really two mixes since the balances are the same,” he says. “I just use left-front, right-front and center. I don’t bother to put the surround channels into it because it’s mostly crowd.”
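Adler’s three-channel stereo feed can be sketched in a few lines. The 3 dB cut on the center channel is my assumption, a common convention when a center signal is split across two speakers. He doesn’t say what gain he uses, and the function name is invented for the example.

```python
# Assumed center gain of -3 dB (roughly 0.708); not a figure from the interview.
CENTER_GAIN = 10 ** (-3 / 20)


def stereo_from_front(left_front, right_front, center, center_gain=CENTER_GAIN):
    """Fold left-front, right-front and center into two channels; surrounds are left out."""
    left_out = [lf + center_gain * c for lf, c in zip(left_front, center)]
    right_out = [rf + center_gain * c for rf, c in zip(right_front, center)]
    return left_out, right_out


# Announcer only in the center, a little crowd in the front pair.
left_front = [0.1, 0.1]
right_front = [0.1, 0.1]
center = [0.6, -0.6]

left_out, right_out = stereo_from_front(left_front, right_front, center)
print("Left out:", left_out)
print("Right out:", right_out)
```

The announcer ends up equally in both output channels, so he still appears in the middle of a stereo image.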

Despite the excellent technology and the best efforts of the network engineers and mixers, there are still many potential pitfalls. “If it’s done right, from conception to reception,” says Adler, “multichannel sound can be very effective. But there’s an awful lot that can go wrong, which is not in our control as mixers. If it leaves the network okay, you have stations broadcasting in both SD and HD during the course of the day, and sometimes their equipment isn’t working right to switch from stereo to surround, or they don’t have the equipment, or they forget to push the right buttons. So they end up broadcasting only channels 1 and 2 — which are left-front, right-front — but not channel 3, which is the center, and now you’ve lost the announcer.

“Until last year,” Adler continues, “all NFL games on CBS were sent on SD with surround audio matrix-encoded in Dolby Pro Logic. But they were getting a lot of complaints from the affiliates related to this — people not hearing the announcers and other problems — so they pulled the encoders out, and now they claim all the complaints went away.”

Says Scalise, “We send SD down-converted from the HD signal. When that left-total/right-total [Lt/Rt] matrixed signal comes back, before it gets decoded, it branches off to the SD feed. That’s where most of the problems lie: Maybe the cable system is sending it out in mono or maybe they have something out of phase. Each transmission channel has its own receiver, and we have to hope that every one of those receivers is set up properly. If they do something wrong, it can take a perfectly good stereo channel and put it out of phase, so your dialog goes away.

“We’re more successful on the HD side — it just passes right through and there’s not much they can do to it,” Scalise continues. “Then the only issue is what the guy at home does. Some receivers have bells and whistles that don’t necessarily do right by surround or even stereo signals. The worst we’ve seen is some TV sets with stereo enhancers; sometimes to push the channels out, they leave a void in the middle.

“The only really safe thing to send out is mono — you can’t screw that up, except someone will turn the bass all the way up and the treble all the way down, and complain he can’t understand what the announcer is saying. We do handle complaints, and sometimes have a task force that goes out to a particular market and solves the problem. It’s all for our own benefit.”
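The disappearing-dialog problem Scalise describes is simple arithmetic. Dialog sits equally in both channels of a stereo pair, so if one channel is flipped in polarity somewhere along the path and the pair is later summed to mono, the dialog cancels. The sketch below uses made-up sample values and a plain average as the mono sum, which is an assumption about how a given receiver combines the channels.

```python
def mono_sum(left, right):
    """Average two channels into one, as a simple mono fold-down."""
    return [(l + r) / 2 for l, r in zip(left, right)]


# Dialog panned dead center: identical in left and right.
dialog = [0.5, -0.25, 0.8]

healthy = mono_sum(dialog, dialog)

# Same pair with the right channel's polarity inverted by a miswired receiver.
inverted_right = [-sample for sample in dialog]
broken = mono_sum(dialog, inverted_right)

print("Healthy mono:", healthy)
print("Out-of-phase mono:", broken)
```

Anything that was different between left and right, such as crowd, survives the fault. The announcer is the one thing that vanishes completely.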

According to Adler, “There are a bunch of guys — hobbyists and pros — who are really into this thing on the ‘AVS’ forum, and they’ll talk about the programs as they’re on the air. Sometimes they’ll talk about the game, but also about sound and picture quality. It’s a small group, but my bosses read that stuff, so we have to pay attention.

“When everyone gets onboard with digital transmission [which the government has now mandated for February 17, 2009, the day all analog television transmission in the U.S. will stop], maybe the problems will all go away. But right now, you’re getting a mishmash; it’s a real mess.”

As I was finishing this two-part series, an item in the local newspaper caught my eye: Smaller college sports conferences — the Ivy League, for example — are moving away from network and even cable coverage of their games as the sponsors are becoming increasingly interested only in the bigger schools and conferences. So what are they doing? They’re moving to Webcasts. What are they going to sound like? One can only imagine.

Paul Lehrman’s anthology of his columns for Mix and other essays and a few jokes, The Insider Audio Bathroom Reader, is now available from your favorite neighborhood and online bookstore, and the Mix Bookshelf.
