It’s no surprise to most of us that, by adopting digital acquisition, processing and distribution, we’ve settled for audio that is audibly degraded relative to its analog antecedents. Emerging acquisition standards, such as high-resolution PCM and DSD, have brought back the easy listening of analog to digital audio. While discriminating engineers sweat the details during production, distributing the finished product is another matter entirely.
Distribution, until recently, dictated degraded quality. Take Justin the Freshman. He’s quite happy listening to MP3’s “near-CD quality,” just as his forefathers were quite satisfied with their hideous 8-track tapes. Since today’s distribution hot button is the ‘Net, something’s gotta go when Justin’s lucky to have 56 kbps. Whether it’s telecom, broadcast or optical delivery, there’s usually not much bandwidth available for our audio data. We can’t all afford xDSL just yet, but we can use some discrimination when the client asks us to “make it fit.”
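To put a rough number on “make it fit,” here is a back-of-the-envelope sketch in Python. The figures are illustrative assumptions: CD-style stereo PCM as the source and a modem that actually delivers its nominal 56 kbps, which real phone lines rarely allow. The function names are made up for this example.

```python
# Back-of-the-envelope bit-rate arithmetic (illustrative figures only).

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw, uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def required_ratio(source_bps, channel_bps):
    """How many times smaller the stream must get to fit the channel."""
    return source_bps / channel_bps

cd_bps = pcm_bit_rate(44_100, 16, 2)   # CD-style stereo PCM
modem_bps = 56_000                     # nominal best case, assumed

print(f"CD PCM: {cd_bps / 1000:.1f} kbps")
print(f"Reduction needed: about {required_ratio(cd_bps, modem_bps):.1f}:1")
```

Under those assumptions the source runs at 1,411.2 kbps, so streaming it in real time calls for a reduction of roughly 25:1, before any allowance for protocol overhead.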
The search for ways to stuff audio through pitifully puny pipes started with short-word-length PCM, ADPCM and IMA, and advanced to the current quagmire of “standards”: MPEG-1 Layer III (MP3), RealSystem G2 and QuickTime 3 (QT3), all of which share the ability to take PCM source files and squeeze out the inherently redundant data, resulting in a smaller file that usually sounds acceptable. “Sounds acceptable?” Now that’s being polite. But all is not lost. Of late, corporate brain trusts have been coming up with new, widely deployed codecs that actually sound good!
Restricted carrying capacity certainly isn’t new. Born in the Analog Era, there are several bandwidth-challenged analog standards you may recognize: NTSC and PAL television, AM/FM radio and pre-digital Plain Ol’ Telephone Service (POTS). Those standards relied on peculiarities of the human perceptual system to deliver just enough information to convey a message over a channel with less than full bandwidth.
The most common codecs in use are classed as perceptual coders. Though invented in the late 1980s at Bell Laboratories, these algorithms have continued to morph just as Bell Labs has morphed into Lucent after the breakup. Engineers at Dolby, QDesign, Lucent and the Fraunhofer Institute for Integrated Circuits (FhG), among others, have thrown every trick in their multidisciplinary book at the development of modern versions. The algorithms these companies developed, as in the analog days of yore, provide bit rate reduction without compromising perceived fidelity. The sidebar below shows some features of the low-data-rate king, QDMC v2, and the high-rate winner, ePAC, along with Advanced Audio Coding (AAC). ePAC is a highly refined dark horse with the power of Lucent behind it. MP3 is included for reference only, as its old-school performance is considerably poorer than the other three.
How do perceptual encoders work? Let’s look at AAC, as it’s not cloaked in secrecy like QDMC and ePAC. Both ePAC and MPEG-2 AAC incorporate algorithms from Lucent’s PAC, dating from 1992. AAC is one of the codecs of the MPEG-2 standard and is a subset of MPEG-4, the new unified family of ISO standards for delivery of rich media. Major contributors included the FhG, AT&T, Dolby and Sony.
As with other offerings, AAC employs perceptual sub-band/transform coding, whereby the signal is first transformed from the time domain into the frequency domain using a variable window or block length. The encoder then applies a psychoacoustic model to estimate whether, in any particular band of frequencies, the signal strength is above or below the perceptual threshold relative to the adjacent bands. If the signal is above the masking threshold, a spectral coefficient or value is generated to represent the signal in that band. Masking threshold means the amplitude threshold below which a spectral component will be “hidden” or masked by louder components at frequencies nearby. It’s a brain thing; just go with it. Once all the valid coefficients are determined, AAC applies additional mechanisms to enhance coding efficiency, including: joint coding, which removes the redundancy between the two channels of a stereo signal; temporal noise shaping, which uses prediction to control how quantization noise is distributed over time; and lossless Huffman coding.
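The masking decision described above can be sketched in a few lines of Python. This is a toy model under loudly stated assumptions, not AAC’s actual psychoacoustic model: it uses a plain DFT on a fixed block rather than AAC’s transform and variable windows, uniform 4-bin bands rather than perceptually spaced ones, and a made-up spreading rule (a masker’s threshold starts 20 dB below its level and drops 15 dB per band of distance). Every constant and function name here is hypothetical.

```python
import cmath
import math

N = 256            # block length in samples (assumed)
BAND_WIDTH = 4     # DFT bins per band (assumed, uniform for simplicity)
OFFSET_DB = 20.0   # how far below a masker its threshold starts (assumed)
SLOPE_DB = 15.0    # extra drop per band of distance from the masker (assumed)
QUIET_DB = -80.0   # stand-in for the absolute threshold of hearing (assumed)

def band_levels_db(block):
    """Time domain -> frequency domain, then energy per band in dB."""
    n = len(block)
    powers = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(block))
        amplitude = abs(coeff) / (n / 2)   # 1.0 for a full-scale tone
        powers.append(amplitude ** 2)
    return [10 * math.log10(sum(powers[b:b + BAND_WIDTH]) + 1e-12)
            for b in range(0, len(powers), BAND_WIDTH)]

def masking_thresholds_db(levels):
    """Each band's threshold is set by the strongest nearby masker."""
    thresholds = []
    for b in range(len(levels)):
        spread = [levels[j] - OFFSET_DB - SLOPE_DB * abs(b - j)
                  for j in range(len(levels)) if j != b]
        thresholds.append(max(spread + [QUIET_DB]))
    return thresholds

def audible_bands(block):
    """Indices of bands whose level rises above the masking threshold."""
    levels = band_levels_db(block)
    thresholds = masking_thresholds_db(levels)
    return [b for b, (lv, th) in enumerate(zip(levels, thresholds)) if lv > th]

def tone(bin_index, amplitude):
    """A test tone that lands exactly on one DFT bin."""
    return [amplitude * math.cos(2 * math.pi * bin_index * i / N)
            for i in range(N)]

loud = tone(21, 1.0)     # band 5
near = tone(25, 0.003)   # band 6, right next to the loud tone
far = tone(90, 0.003)    # band 22, same level but far away
block = [a + b + c for a, b, c in zip(loud, near, far)]

print(audible_bands(block))
```

With these test tones the list contains bands 5 and 22 only. The quiet tone in band 6 sits about 50 dB below its loud neighbor and falls under the threshold, so an encoder working this way would spend no bits on it, while the equally quiet tone far from the masker survives. Remove the loud tone and band 6 becomes audible again.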
Current distribution channels for digital audio, from satellite TV to CD-ROM, largely rely on lossy codecs for the same reason: limited bandwidth. Some examples of lossy codecs in action:
* DVD-V, along with LPCM, has selected Dolby Digital as the codec of choice, with optional support for DTS and MPEG. DVD-V was the first distribution format with support for 96/24. An audiophile somewhere was persistent enough to improve upon the “perfect sound forever” of 44.1/16. Certainly a “professional” audio engineer wouldn’t have suggested such a thing. Most are perfectly happy with the crappy audio quality that we hear every day from gear with “pro” labels. Thank the gods that someone in the pro world listens to acoustic music once in a while, otherwise DVD-A and SACD wouldn’t have been proposed.
* Cinema sound is brought to you via Dolby Digital (AC-3), Digital Theater Systems, Inc. (DTS) or Sony’s 8-channel Sony Dynamic Digital Sound (SDDS), which uses a professional version of the same Adaptive Transform Acoustic Coding (ATRAC) algorithm used by MiniDisc.
* Digital radio services, known as Digital Audio Broadcast (DAB), use various schemes, including Lucent’s Perceptual Audio Coder (PAC), ISO/MPEG Layer II, MPEG-4 or Musicam (Masking pattern adapted Universal Subband Integrated Coding And Multiplexing), the codec of choice for good ol’ ISDN phone patches.
* North American digital television uses, you guessed it, AC-3. No wonder Dolby Labs has been busy hiring in its licensing division.
* The hairball of standards that is the Web has its own collection. Microsoft’s Windows Media format (WMA), the standard for that company’s extensive Active Streaming Format (ASF), and Version 2 of the QDesign codec (QDMC) used in the fabulous, open source QuickTime 4 are examples of audio mechanisms for popular computer operating systems. The SDMI distribution system for secure music purchases supports WMA, MP3, MPEG-2 AAC and Lucent’s Enhanced PAC (ePAC). Liquid Audio’s open, multiformat approach doesn’t play OS favorites. It supports AAC, AC-3 and MP3.
Lossy codecs won’t be going away anytime soon, based on the proliferation of terrestrial (and satellite) distribution of rich media, the emergence of solid-state personal stereos and record labels, large and small, betting on a hybrid revenue model of on-site advertising with the enticement of free track downloads. So, when a project comes knocking that allows your input, do what you’re paid for: Use your ears first, then make an educated choice from the new codec menu. To help you get a feel for which one is appropriate, I’ve posted sample files on my site for your evaluation. Stop by www.seneschal.net and follow the Papers & Articles link to take a listen.
Though I’ve written for Mix as far back as 1987, “The Bitstream” is my first column for this magazine. I’ll try to provoke some thought about the technological foundation for our industry and answer questions you may or should have. The conflux of audio and computers forms the basis for discussion, and there’s plenty out there to cover. I’ll also, on occasion, wander into other subject areas while eluding my editors. Let me know, at [email protected], what computer-based technology issues interest you; they may show up in months to come!
RIAA SUES MP3.COM
On January 21, the Recording Industry Association of America filed copyright infringement litigation against MP3.com, calling for a halt of MP3.com’s new services, Instant Listening and Beam-It. The RIAA charges that these new technologies, which allow users to log in and listen to music they already own on CD (verified via MP3.com’s proprietary software that automatically recognizes and remembers audio CDs placed in a member’s CD-ROM drive), violate copyright law.
Cary Sherman, RIAA general counsel, says that MP3.com’s new services are built on an “unauthorized digital archive…music that is not owned by MP3.com.”
MP3.com’s Michael Robertson defends the new services under fair use provisions: “Our service is nothing more than a virtual CD player. Only the person who buys the CD is entitled to listen to that music through our service.”
CCS SETTLES WITH ONLINE SOFTWARE PIRATES
Hampton Hill, UK-based Copyright Control Services announced that, on behalf of its audio software manufacturing clients, it has reached settlements with a number of individuals who had been offering pirated software for download on the Internet. The CCS, under the 1998 Digital Millennium Copyright Act, tracked down individuals in four states who have separately admitted posting CCS clients’ software in newsgroups. These individuals have agreed to pay settlements to CCS in order to support CCS and its clients’ antipiracy campaign and to settle the copyright and other legal claims of the copyright owners against the infringers.