Dolby E: A 2-Channel Solution for Multichannel DTV

With Digital TV already into year two of its seven-year, FCC-mandated phase-in, television is undergoing its most profound transformation since the advent of color broadcasting. As part of this transition, the DTV format will bring Dolby Digital, which supports audio ranging from mono to 5.1-channel surround, to both high-definition and standard-resolution television.

Dolby Digital is set to become an integral part of the TV-audio infrastructure, encoded at broadcast stations before transmission and decoded by DTV sets. But before most television programming ever gets to individual stations, it makes its way through multiple stages of the production and distribution chain. And currently, that chain is predominantly set up to handle stereo. The question, then, is how to get 5.1-channel surround program through this chain to the individual stations, where it may then be encoded for Dolby Digital delivery. The answer, according to Dolby, is Dolby E.

TWO APPLICATIONS, TWO FORMATS

The distinction between Dolby E and Dolby Digital is reflected in the way that Dolby refers to the two formats. Dolby E, the company says, is a distribution coding system, while Dolby Digital is an emission coding system. Dolby Digital will be encoded just before transmission, delivered to viewers via broadcast, cable or satellite, and decoded by consumer electronics devices such as televisions and receivers (such decoding is already a mandated capability of DVD-Video players).

In contrast, Dolby E is strictly pro: it’s never directly heard by consumer ears. Essentially, its role is to facilitate the handoff of multichannel audio during the pre-transmission distribution chain, especially where existing methods, such as network feeds or audio tracks on videotape, are not designed to handle the six discrete channels required for 5.1 audio. Dolby developed the format specifically for “applications such as sending a program to a local station for commercial insertion or sending it via satellite to another broadcast facility.” Dolby gives two main reasons that Dolby Digital is not well-suited for the kind of distribution role envisioned for Dolby E. First, Dolby Digital is optimized for maximum quality at low bit rates; thus, it is limited to a single cycle of encoding (transmission) and decoding (reception). Second, Dolby Digital frames do not match video frames, which vastly complicates the editing process when changes are needed after audio and video elements are already combined into a single medium.
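The frame mismatch can be put in numbers. The sketch below is an illustration, not something Dolby supplies: it assumes the 1,536-sample frame length of Dolby Digital (AC-3), a figure that does not appear in this article, along with 48 kHz audio and the NTSC and PAL video frame rates. All function and variable names are made up for the example.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 48_000
# Assumption (not from the article): a Dolby Digital (AC-3) frame carries
# 1,536 samples per channel, i.e. six blocks of 256 samples.
AC3_FRAME_SAMPLES = 1536

NTSC_FRAME_RATE = Fraction(30_000, 1001)  # about 29.97 frames per second
PAL_FRAME_RATE = Fraction(25)


def samples_per_video_frame(frame_rate):
    """Exact count of 48 kHz audio samples spanned by one video frame."""
    return Fraction(SAMPLE_RATE_HZ) / frame_rate


def video_frames_between_alignments(frame_rate):
    """Video frames that elapse before an audio-frame boundary and a
    video-frame boundary coincide again."""
    ratio = samples_per_video_frame(frame_rate) / AC3_FRAME_SAMPLES
    # With ratio = p/q in lowest terms, q video frames span exactly p audio frames.
    return ratio.denominator


ntsc_samples = samples_per_video_frame(NTSC_FRAME_RATE)          # 8008/5 = 1601.6
ntsc_realign = video_frames_between_alignments(NTSC_FRAME_RATE)  # 960
pal_realign = video_frames_between_alignments(PAL_FRAME_RATE)    # 4
```

Under these assumptions, an NTSC video frame spans 1,601.6 audio samples against 1,536 for the audio frame, so the two sets of boundaries line up only once every 960 video frames (roughly 32 seconds). A cut made on any other video frame lands in the middle of an audio frame.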

PERCEPTUAL CODING

Unlike digital audio interfacing standards such as AES/EBU or S/PDIF, formats such as Dolby Digital and Dolby E are designed to reduce the overall volume of data required to store or transmit an audio signal. In both cases, and in most other audio codecs (compression/decompression algorithms) used today, the bit rate reduction is achieved using “perceptual coding.”

Perceptual coding breaks the audio band up into a number of frequency ranges, with the audio activity in each constantly evaluated. Sounds that we are less likely to hear (frequencies that are masked, for instance, by louder frequencies nearby) are discarded, leaving more bits available for allocation to the more important sounds that remain. The higher the available bit rate, the less signal is lost during encoding, and thus the greater fidelity there will be to the original sound.
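As a rough picture of that idea, here is a deliberately simplified toy. It is not Dolby's algorithm or any real codec's psychoacoustic model. It assumes each frequency band is summarized by a single level in dB, treats a band as masked when an adjacent band is louder by more than a fixed margin, and splits the bit budget evenly among the bands that remain. The function name, the 20 dB margin and the sample levels are all invented for the example.

```python
def allocate_bits(band_levels_db, total_bits, masking_margin_db=20.0):
    """Toy bit allocation: give no bits to bands masked by a much louder
    adjacent band, and share the budget evenly among the others."""
    audible = []
    for i, level in enumerate(band_levels_db):
        # Immediate lower and upper neighbours (fewer at the band edges).
        neighbours = band_levels_db[max(0, i - 1):i] + band_levels_db[i + 1:i + 2]
        masked = any(nb - level > masking_margin_db for nb in neighbours)
        audible.append(not masked)

    kept = sum(audible)
    if kept == 0:
        return [0] * len(band_levels_db)
    share = total_bits // kept
    return [share if is_audible else 0 for is_audible in audible]


levels = [60, 35, 58, 50, 20]           # hypothetical band levels in dB
low_rate = allocate_bits(levels, 48)    # [16, 0, 16, 16, 0]
high_rate = allocate_bits(levels, 96)   # [32, 0, 32, 32, 0]
```

The two quiet bands sitting next to much louder ones get nothing, and doubling the budget doubles the precision available to the bands that survive. Real coders use far finer-grained models, but the trade-off is the same one described above.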

Of the two Dolby codecs, Dolby Digital is by far the more “lossy,” using as little as 384 kilobits per second to express a 5.1-channel signal that would take 768 kbps per channel if stored discretely using linear PCM at 16-bit/48kHz resolution. That equates to a data-reduction ratio of up to 12:1.
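The arithmetic behind that ratio is easy to check. The sketch below simply multiplies sample rate, word length and channel count; like the figures above, it counts the low-frequency “.1” channel as a full sixth PCM channel. The function and variable names are illustrative only.

```python
def pcm_bit_rate_bps(sample_rate_hz, word_length_bits, channels=1):
    """Bit rate of uncompressed linear PCM, in bits per second."""
    return sample_rate_hz * word_length_bits * channels


DOLBY_DIGITAL_BPS = 384_000  # the 5.1-channel rate quoted in the article

per_channel_bps = pcm_bit_rate_bps(48_000, 16)         # 768,000
six_channel_bps = pcm_bit_rate_bps(48_000, 16, 6)      # 4,608,000
reduction_ratio = six_channel_bps / DOLBY_DIGITAL_BPS  # 12.0
```

Six channels of 16-bit/48 kHz PCM come to 4.608 Mbps, which is 12 times the 384 kbps Dolby Digital stream.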

Dolby E, on the other hand, is designed for applications where far greater bandwidth is available to store and transmit a multichannel audio signal. These include any video or audio tape format or transmission line that supports two channels of uncompressed digital audio at 48 kHz with at least 16-bit word length.

“Any format, VTR or server, that can record or play back a 16- or 20-bit audio signal without sample rate conversion, dithering, truncation or gain change can record and play back Dolby E,” says Dolby professional products marketing manager Nancy Byers-Teague. “We regularly work with Sony Digital Betacam, new HDCAM and Panasonic HD D5, which all handle Dolby E. And any signal path that can carry a 16- or 20-bit audio signal, SMPTE 302M compatible, can carry Dolby E.”

If the supported word length is 16 bits, the two channels together will offer a bandwidth of 1.536 Mbps, which Dolby E can use to carry six discrete audio channels plus metadata (descriptive and playback-control data related to the audio). If 20-bit word length is available, the two channels will yield a combined bit rate of 1.92 Mbps, which Dolby E uses to carry up to eight channels (perhaps a 6-channel surround mix plus a separate stereo mix), as well as metadata. A 24-bit mode is also specified for the Dolby E format but not yet implemented in encoders and decoders.

Byers-Teague adds that the compression ratio “depends upon the number of channels and their configuration. For eight channels, it is approximately 4:1, but remember that there is also metadata included in the signal, which takes up some of the bandwidth.”
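Those numbers follow from the capacity of the 2-channel carrier. The sketch below is a back-of-the-envelope check, not a Dolby formula: it ignores the share of the stream taken up by metadata, and it assumes the eight-channel source is 20-bit PCM, which the article does not state. All names are illustrative.

```python
SAMPLE_RATE_HZ = 48_000


def carrier_bps(word_length_bits, channels=2):
    """Capacity of the 2-channel PCM path that carries the Dolby E stream."""
    return SAMPLE_RATE_HZ * word_length_bits * channels


def compression_ratio(program_channels, source_word_bits, carrier_word_bits):
    """Source PCM bit rate divided by carrier capacity (metadata ignored)."""
    source_bps = SAMPLE_RATE_HZ * source_word_bits * program_channels
    return source_bps / carrier_bps(carrier_word_bits)


capacity_16 = carrier_bps(16)                   # 1,536,000 bps = 1.536 Mbps
capacity_20 = carrier_bps(20)                   # 1,920,000 bps = 1.92 Mbps
eight_in_20 = compression_ratio(8, 20, 20)      # 4.0
six_in_16 = compression_ratio(6, 16, 16)        # 3.0
six_20bit_in_16 = compression_ratio(6, 20, 16)  # 3.75
```

With eight 20-bit channels in a 20-bit stream the ratio works out to 4:1, in line with the figure Byers-Teague quotes. The six-channel modes come in lower, at 3:1 to 3.75:1 depending on the source word length.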

Viewed solely from the standpoint of compression ratio, Dolby E falls into roughly the same mild data-compression ballpark as consumer electronics codecs such as DTS and Sony’s ATRAC (used in MiniDisc), rather than the more extreme Web-related codecs such as MP3 or RealAudio. But with perceptual coding, the ratio alone is far from the whole story; the devil is in the details of design and implementation. And as one of the more recent audio codecs, Dolby E has presumably benefited from research since the development of earlier approaches.

The first Dolby E production units were shipped last October, so it hasn’t yet been heard widely enough for an industry consensus to develop regarding its audio quality. Dolby is confident enough in the fidelity of the codec to declare that program can “withstand ten or more encode/decode generations” without audible degradation.

“Along with standard audio performance measurements,” says product manager Tim Carroll, “we conducted extensive double-blind listening tests during the development of the system, and we continued during beta testing with material sent to us by potential users. We followed the essential features of ITU-R Recommendation BS.1116 [Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems]. An extensive search for ‘critical program material’ was conducted, so that we would be confident that future day-to-day use would not uncover any new coding artifacts.”

DOLBY E ENCODING

The process of using Dolby E begins with digital audio in linear PCM, normally at the final stage of the production process, when it’s time to record completed mixes onto digital videotape or a video server. Dolby’s encoding unit is the DP571 Dolby E Encoder, which lists for $5,395 (additional units supporting Dolby E are expected to be ready for introduction at this month’s NAB Convention in Las Vegas).

“The source material can be at word lengths of 16 or 20 bits,” Byers-Teague says. “The system is designed to operate at a sample rate of 48 kHz, so converters are incorporated into the input channels to accommodate sources at other rates, such as 44.1 kHz.”

It’s important to recognize that there’s not necessarily a connection between the word length of the source PCM and the word-length mode at which Dolby E encodes. “The output from the Dolby E Encoder will always be at 48 kHz so that it can be synchronized with video signals,” Byers-Teague continues. “But it may be at 16 or 20 bits, depending on the Dolby E mode. These differing word lengths actually refer to the word length of the Dolby E stream, not to the word length of encoded/decoded audio channels.”

As the word length of the audio signal after decoding is always the same as that of the source, a 6-channel 20-bit signal may be encoded into a Dolby E stream that is recorded or carried on a system that only supports a 16-bit signal. “A 16-bit signal keeps its 16-bit resolution,” Carroll says, “and a 20-bit signal keeps its 20-bit resolution.”

EDITING AND METADATA

Once encoded, a Dolby E signal can pass through several steps on its way to the transmission facility. As last-minute changes, insertions and other edits come with the television territory, Byers-Teague says, Dolby E’s frame rate was “designed to match that of the video it accompanies, enabling insert or assemble edits on tape and audio-follow-video cuts between programs, without pops or clicks. This makes it possible to switch, route and perform assemble edits directly on the digital bitstream without decoding and re-encoding.”

Byers-Teague adds, however, “Anytime you want to fundamentally change the audio characteristics, it’s best to decode to baseband audio.” That covers common operations such as changing the EQ or dynamics, or adding effects like reverb. That’s why Dolby stresses the claim that Dolby E survives multiple encode/decode cycles unscathed.

Whether decoding occurs en route or at the final destination, it’s handled by the DP572 Dolby E Decoder (listing for $3,995). If the program is slated for transmission using Dolby Digital, it must first be decoded, then re-encoded into Dolby Digital. The Dolby Digital metadata carried in the Dolby E stream survives intact and is available for that re-encoding.

“A 5.1 channel mix may be played back through a mono, stereo, Dolby Pro Logic or a full 5.1 Dolby Digital system,” Byers-Teague explains. “The mixer can set downmix levels, proper channel configuration, dialog loudness level and compression parameters, and the Dolby Digital metadata carries the mixer’s vision all the way down the line to the consumer.”

To make it easier to generate the metadata, Byers-Teague says, Dolby has “hired the personnel to work with other professional video and audio manufacturers-including workstation and offline editing vendors-to allow their products and systems to work with Dolby E products and for their systems to produce Dolby Digital metadata that can be carried by the Dolby E codec.”

Given the choice between even a mildly lossy codec such as Dolby E and uncompressed PCM, most audio professionals would obviously choose the latter. Unfortunately, that option doesn’t currently exist on the scale needed to bring 5.1-channel surround into common usage for DTV. Other options could possibly surface before the conversion to DTV is complete five years from now, but for the time being, TV broadcasters are still equipped with a 2-channel infrastructure.

Dolby Laboratories, 100 Potrero Ave., San Francisco, CA 94103; 415/558-0200; fax 415/863-1373; www.dolby.com.