This month's issue is devoted to sound tagging along with picture, and, glory hallelujah, we audio folks finally come out ahead! Pardon my exuberance and permit me to explain. Yesterday, I strolled the aisles of the MPEG-4 Industry Forum's Workshop and Exhibition and became amused by all of the crawly, noisy video that was being demo'd. Ah, but the audio — though only mono or stereo — had exceptional quality considering its data rate. Welcome to the wonderful world of MPEG-4, where audio finally steps to the front of the line.
Established in 1988, the Moving Picture Experts Group (MPEG) was formed to specify digital-coding schemes for audio and video at low data rates. Its best-known creations are MPEG-2 video, sanctioned by the DVD Forum, and MPEG-1 Layer III, also known as MP3, the current fave of the download crowd. Over the next few years, MP3 audio may be dethroned by MPEG-4 AAC, or Advanced Audio Coding, the codec developed to improve quality without the backward-compatibility restrictions imposed on prior codecs. AAC was designed to provide quality audio that is indistinguishable from a lossless parent file by the majority of listeners. The interesting thing is, the target data rate specified in the mandate was 320 kbits/second for five full-bandwidth channels!
Though MPEG-2 AAC was developed a while ago, it was chosen as the basis for sampled or natural audio in MPEG-4. If you've done any listening tests on lossy codecs, then you know that at high rates they all do an okay job. Some actually sound quite good. However, at dial-up data rates, most codecs fall on their faces, and that's where some help is needed. As a recovering MP3 basher, I've learned that the equation isn't MP3 = the death of quality. Rather, as in all things in life, it's a trade-off. The real questions are which codec you choose, which sample and data rates you encode at, and whether or not pre-processing is warranted. With the exception of pre-processing, MP4 provides a wide range of solutions to the low data-rate dilemma.
MPEG-4 comprehensively describes methods to represent content, and the audio tools available are as varied as the range of distribution methods. Building on previous efforts, the MPEG-4 group has given us geeks more of everything. MP4 is:
- More efficient. The music codec in MPEG-4 is designed to operate at around 64 kbit/second (kbps) per channel. To give you some real-world perspective for that number, MP3 stereo at 192k VBR (96k times two) sounds pretty darn good to me; AAC at Main profile does about the same job at 128k. 'Nuff said. Save space when storing or save bandwidth when streaming; it's your choice. By the way, MP4 AAC can handle from one to 48 channels and includes both downmix capabilities and default channel configurations, including 5.1 multichannel in Dolby's Mode 6 (C|L|R|LS|RS|LFE) assignment. Having all of those channels available means that you can, for instance, do spiffy multimedia presentations with separate mix-minus, VO and effects tracks.
- More scalable. Because MPEG-4 audio allows for a wider range of bit rates, quality can be matched to a wider range of applications. In conjunction with the increased efficiency, applications such as transmission over wireless data networks, Internet streaming, digital audio broadcasting and advanced portable players are more practical. Transmission over best-effort protocols like IP need not cause buffer underflows, because the decoder adapts by simply scaling back on the quality, usually by reducing the audio passband when it's starved for data. Another tool, usually used for speech transmission, provides the same kind of scalability. The TwinVQ coder is a good example: it can encode several partial bitstreams, which can be decoded alone or, if the sustained data throughput is high enough, in concert for higher fidelity.
- More modular. MPEG-4's “object-oriented” approach to content delivery means optimal encoding for each data type. Content creators have a broad range of methods to code audio, though software vendors have yet to bring mature production tools to market. One of the new additions to the MPEG-4 audio toolbox, along with long-term prediction and bit rate-scalability tools, is Perceptual Noise Substitution, or PNS, a feature designed to further optimize bit rate efficiency. PNS is based on the observation that, perceptually, all noise sounds about the same. This means that the actual fine structure of a noise signal isn't that important. Rather than coding that fine structure, the bitstream just flags a region of frequencies as noise-like; additional information defines the total power in that band. In the decoder, randomly generated noise is inserted into the appropriate spectral region according to the power level.
- More extensible. Not locked into the limits of current technology, MPEG-4 can grow as new developments emerge. As the president of the MPEG-4IF says, “The object-based MPEG-4 standard is both state-of-the-art and future-proof; it can easily incorporate improvements in technology, if and when they materialize.”
- More cooperative. Sorry to break the news, but MPEG-4, though a worldwide standard for audio, is more important for video and multimedia. Though preliminary testing and my experience indicate that MPEG-4 won't improve on existing proprietary video codecs like Sorenson, it does produce a much better quality image than MPEG-1 and has the ISO stamp of approval to boot. That, in turn, will go a long way toward widespread market acceptance, as was the case with MPEG-2.
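For the code-curious, the PNS idea from the "More modular" item can be sketched in a few lines of Python. This is only an illustration of the principle, not the MPEG-4 bitstream syntax or any real encoder's method: the function names are made up for this example, and "total power" is assumed here to mean the sum of squared spectral coefficients in the band.

```python
import math
import random


def encode_noise_band(coeffs):
    """Encoder side (hypothetical): describe a noise-like band by one number,
    its total energy, instead of sending every coefficient."""
    return sum(c * c for c in coeffs)


def pns_fill_band(num_bins, band_energy, rng=None):
    """Decoder side (hypothetical): generate random coefficients and scale
    them so their total energy matches the transmitted value."""
    rng = rng or random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_bins)]
    raw_energy = sum(x * x for x in noise)
    if raw_energy == 0.0:
        # Degenerate case: nothing to scale, so return silence.
        return [0.0] * num_bins
    scale = math.sqrt(band_energy / raw_energy)
    return [x * scale for x in noise]


# A noise-like band of 16 coefficients, as the encoder might see it.
source_rng = random.Random(1)
original_band = [source_rng.uniform(-1.0, 1.0) for _ in range(16)]

# Only this single value is "transmitted" for the band.
transmitted_energy = encode_noise_band(original_band)

# The decoder rebuilds a band with different fine structure, same energy.
decoded_band = pns_fill_band(len(original_band), transmitted_energy,
                             random.Random(2))
```

The decoded band shares no sample-level detail with the original, yet carries the same energy, which is the whole bet PNS makes about how we hear noise.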
Designed with interoperability in mind, MPEG-4 is meant to be wedded to MPEG-7, an emerging deep-metadata standard to describe content. Together, they will work more graciously with DRM (Digital Rights Management) and interactive presentation infrastructures (DTV anyone?) as all of this stuff matures. This will reduce confusion for consumers. As an example, MPEG-4 includes a set of standard interfaces to proprietary rights-management systems. If you access protected content, then the MPEG-4 bitstream should contain the information needed to obtain the correct unlocking software.
If you're still awake, you may have noticed that I snuck in the “sampled” qualifier back in paragraph three. The reason is that, along with so-called “t/f (time/frequency) coders” for music and speech, MPEG-4 audio includes tools for synthesized audio among its data objects or types. MP4-SA, or “Structured Audio,” relies on the decoding infrastructure to algorithmically create synthetic programming from very compact instructions. If this sounds like MIDI, you're not far off. MIDI and wavetable synthesis are also supported.
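To get a feel for how little data "very compact instructions" can be, here is a toy Python sketch of the general idea: a short score of note events goes over the wire, and the receiving end renders the samples. It is emphatically not the actual Structured Audio format or its languages; the score layout (start, duration, frequency, amplitude), the sine-wave "instrument" and the 8 kHz sample rate are all assumptions chosen for brevity.

```python
import math

SAMPLE_RATE = 8000  # assumed rate, kept low to keep the example small


def render(score, sample_rate=SAMPLE_RATE):
    """Turn a list of (start_s, duration_s, freq_hz, amplitude) events
    into a list of float samples using a plain sine oscillator."""
    total_seconds = max(start + dur for start, dur, _, _ in score)
    num_samples = int(round(total_seconds * sample_rate))
    out = [0.0] * num_samples
    for start, dur, freq, amp in score:
        first = int(round(start * sample_rate))
        count = int(round(dur * sample_rate))
        for i in range(count):
            idx = first + i
            if idx < num_samples:
                # Mix this note into the output buffer.
                out[idx] += amp * math.sin(2 * math.pi * freq * i / sample_rate)
    return out


# Two notes, eight numbers in all: this is the entire "transmission".
score = [
    (0.0, 0.5, 440.0, 0.3),
    (0.5, 0.5, 660.0, 0.3),
]
samples = render(score)
```

Eight numbers in, one second of audio out; the heavy lifting happens in the decoder, which is the trade Structured Audio makes.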
Well, there's lots more to cover, but that's all for this overview. For those wanting to test the video capabilities of MPEG-4, DivX 5 has been available for a while. Though audio tools are a bit more rare, www.AudioCoding.com has some Win source code and a Winamp plug-in. For cross-platform fun, QuickTime 6 should be out of beta testing so that everyone can begin to hear the benefits of MPEG-4 audio. RealNetworks has also adopted a strategy of interoperability to combat the balkanization of the Web that Microsoft envisions. In a future “Bitstream,” I'll dig into MPEG-7, the future of metadata and MPEG-21, so stay tuned.
In the long term, MPEG-4 will significantly impact us, so let me know if you'd like more depth on this subject by dropping a line to email@example.com. See you next month!
OMas provides tech help to a wide variety of media mavens. In his quieter moments, this column was decoded while under the influence of Nusrat Fateh Ali Khan's Shahen-Shah, along with the classic strains of (Who's Afraid of) The Art of Noise. Links and other useful arcana relating to this month's “Bitstream” are lurking at www.seneschal.net.