The Engineer’s Perspective



Online audio is a huge subject. There are the moral/ethical/legal issues that incite courthouse brawling between major labels and dot-com upstarts. There are the business issues, with their attendant glut of buzzwords like “e-commerce” and “bricks-and-mortar.” And then there are the technical issues, such as bandwidth, telephony infrastructure, perceptual coding and a host of acronymic codecs.

But from an audio engineering perspective, the one issue that doesn’t get enough discussion is audio fidelity. In the rush to squeeze big sound through small pipes, we’re losing the battle to preserve the audio quality that we struggle so long to achieve in the studio. But that’s neither unique to the Internet (think AM radio or the 7-inch 45) nor something over which we have much control.

Eventually, ubiquitous access to broadband and further improvements in coding techniques may minimize the fidelity gap between the Internet and prerecorded media (CDs, DVD-Audio, etc.). In the meantime, the Internet is here to stay as a medium to disseminate sound, no matter what we may think it does to that sound. And its influence on the music business—and thus the recording industry—continues to grow. So there’s no time like the present to examine online audio from the professional audio perspective. If you are already deeply involved in audio Internet applications, then you’re probably already familiar with much of the information below. If not, then here’s your chance to get up to speed.


Identifying why you or your client wants to use audio on a Web site in the first place is essential to making the right choices about how to do it. The opportunities include promoting and selling your own work, as well as doing audio work for someone else’s online enterprises.

The most common type of site on the Web is the promotional site, and it’s as valuable for music and sound professionals as any other line of business. If your site has anything to do with music or audio-related services or products, then it’s natural that it includes not only text and graphics but also sound. In particular, for studio owners, producers and engineers, performers, composers, sound effects designers and voice-over artists, the Web can serve as a kind of online demo tape, a convenient way to expose your portfolio to potential clients.

Online music’s promotional function is handled by posting samples that visitors can click on and listen to as they browse the site. Even if the samples are low-fi, at least they’ll give a suggestion of what you can do. In the core music business, it’s a very effective way for new artists to reach listeners who might enjoy their music, and it lets established artists give fans a taste of their latest work. It also helps music lovers to explore music that is not otherwise promoted, such as back catalogs.

If you have finished sound products to sell—commercial CDs and tapes, “needle-drop” music, sound effects or synth patches, for instance—then you can take a promotional site one step further by allowing visitors to place orders from the site, or at least get an order form to e-mail or fax back to you. Independent artists, record companies and online music retailers are all using Web sites for direct sales of prerecorded audio.

As the MP3 phenomenon has shown, convenience and instant gratification are higher priorities for many people than getting the best possible fidelity. If the target users of your site fall into this category, then you can add digital downloads to give them what they want, without having to press CDs or duplicate cassettes. In this “electronic music distribution” (EMD) model, the content owner isn’t actually selling a physical object embodying the sound, but rather selling (or giving) the buyer a license to copy a digital file of the material. Needless to say, this is the aspect of online audio that has proven most controversial, because not everyone agrees about who has the right to allow others to copy files.

Another type of online audio is Webcasting, where continuous sound is available to end-users in real time. Applications range from large news organizations covering big events to local radio stations simulcasting their broadcast signal. Webcasting is of particular interest if you’re involved in sound for live performances and broadcasts, because it dovetails well with the core package you already offer your clients. However, it requires specialized equipment and a different approach than typical Web sound applications such as playing samples from a page. See “Webcasts on the Rocks,” page 76, for more information.

Finally, the Web offers the opportunity to hire out your services to create sounds that add interest to other people’s sites, even when those sites have nothing to do with music or audio. Flash animations, often with soundtracks, are an increasingly popular way to welcome visitors to a home page. There’s also the possibility of using ambient music to create a pleasant experience for visitors as they browse around. And then there are event-triggered sounds that provide feedback in response to user actions.


Once you determine what you want to do with sound, you can start assessing the various categories of online audio to see how well they apply to your situation. First you have to bear in mind that almost everything depends on bandwidth, the rate at which data can be transmitted from point A to point B across a connection.

“Dial-up” modem connections top out at 56 kilobits per second. “Broadband” connections—generally either cable service or DSL lines for home users—offer much higher bit rates, starting at 256 kbps. Though the number of broadband connections has undoubtedly risen, early 2000 research from The Wall Street Journal found that about 95% of American households with Internet service used dial-up connections.

If your primary purpose is to enable digital downloading, then the bandwidth of your typical user’s connection is not a make-or-break issue, because the main purpose of downloading is to transfer audio files to the client in their entirety, where they remain available for listening, even when the client is no longer connected to the site. But even so, the bandwidth of the user’s connection and the bit rate of the offered file affect the decision of whether it’s worth the time to download.

Consider a four-minute song, which takes up about 42 megabytes as 16-bit/44.1kHz stereo PCM on a CD. Encoded to a 128kbps MP3 file, the song is reduced to a mere 3.84 MB. At the nominal rate of a 56k dial-up connection, that file takes a little over nine minutes (best case) to download, which is about 2.3 times real time. Over a 256k DSL connection, however, the download time drops to two minutes. This seems to be among the reasons that downloading is most prevalent on college campuses, where dorms are frequently served by high-capacity connections, such as T1 or ISDN lines.
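The arithmetic behind those figures is easy to check. The following Python sketch is only an illustration: the function names are made up for this example, it assumes stereo audio and a constant bit rate, and it uses nominal line rates with no protocol overhead, so real transfers would be slower.

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Size of uncompressed PCM audio in bytes (CD audio is 16-bit stereo)."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


def encoded_size_bytes(seconds, bit_rate_kbps):
    """Size of a constant-bit-rate encode in bytes (1 kbps = 1,000 bits/s)."""
    return seconds * bit_rate_kbps * 1000 // 8


def download_seconds(size_bytes, link_kbps):
    """Best-case transfer time at the nominal link rate, ignoring overhead."""
    return size_bytes * 8 / (link_kbps * 1000)


song = 4 * 60                               # four minutes, in seconds
cd_bytes = pcm_size_bytes(song)             # 42,336,000 bytes, about 42 MB
mp3_bytes = encoded_size_bytes(song, 128)   # 3,840,000 bytes, 3.84 MB

for link in (56, 256):
    t = download_seconds(mp3_bytes, link)
    print(f"{link} kbps: {t / 60:.1f} min ({t / song:.2f}x real time)")
# 56 kbps: 9.1 min (2.29x real time)
# 256 kbps: 2.0 min (0.50x real time)
```

Changing the bit rate or the link speed in the last few lines shows how quickly the waiting time grows as files get bigger or pipes get smaller.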

With streaming audio, the listener doesn’t have to wait for the entire file to be transferred onto the client system. Instead, sound begins playing from a data buffer on the client’s hard drive as the file is transmitted from the server to the client. When the bandwidth of the connection is greater than the bit rate of the file, you’ve met the main criterion for streaming audio.
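To see why that criterion matters, here is a toy simulation in Python. It is a rough sketch rather than a model of any real player: the function name, the one-second time step and the five-second prebuffer are all assumptions made for illustration.

```python
def simulate_stream(duration_s, file_kbps, link_kbps, prebuffer_s=5):
    """Step through a stream one second at a time.

    Returns (startup_s, stall_s): seconds spent waiting before playback
    starts, and seconds playback paused later because the buffer ran dry.
    """
    if link_kbps <= 0:
        raise ValueError("link_kbps must be positive")
    total_kbits = duration_s * file_kbps
    received = 0     # kilobits delivered by the server so far
    buffered = 0     # kilobits waiting in the client buffer
    played_s = 0
    startup_s = 0
    stall_s = 0
    started = False
    while played_s < duration_s:
        # One second of network delivery (never more than is left to send).
        arriving = min(link_kbps, total_kbits - received)
        received += arriving
        buffered += arriving
        if not started:
            # Start once the prebuffer is full or the whole file has arrived.
            if buffered >= prebuffer_s * file_kbps or received == total_kbits:
                started = True
            else:
                startup_s += 1
                continue
        if buffered >= file_kbps:
            buffered -= file_kbps    # play one second of audio
            played_s += 1
        else:
            stall_s += 1             # buffer underrun: playback pauses
    return startup_s, stall_s


print(simulate_stream(30, 20, 56))    # low-bit-rate clip over dial-up
print(simulate_stream(30, 128, 56))   # 128 kbps clip over dial-up
```

In the first case the link outruns the file, so after a brief startup the buffer never empties. In the second, the file's bit rate exceeds the connection's bandwidth and playback repeatedly pauses to rebuffer.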

Depending on the software used to play streaming audio, old data may be overwritten by new data as it is recorded into the buffer, meaning that a complete copy of the file might never be present on the client’s hard drive. In other words, the listener might have to be connected to the server to hear the file.

Webcasting is one form of streaming in which the point is generally to listen to the audio as it streams, rather than retaining a copy for subsequent listening. Like tuning in a radio station, the listener selects an audio program that is fed in a continuous stream from the server to their local machine. Perhaps the most common form of streaming in the music business, however, is “on-demand” streaming that is used to play samples of songs in response to listeners’ clicks as they browse a label or artist Web page.

Though streaming and downloading both involve prerecorded performances in the form of digital audio files, ambient music on the Web is well-suited to a MIDI-based approach using software synthesis. With the Beatnik editor, for instance, you can transform MIDI files into Rich Music Format (RMF) files, which play back via the Beatnik plug-in for Internet Explorer (Windows only) and Netscape browsers. RMF files download before they play. But because MIDI data is tiny compared to digital audio, the files start playing almost immediately, using the plug-in’s built-in General MIDI sounds. RMF files may also include digital audio, which you can use for event-based sound effects.


As with other forms of audio data compression, encoding for the Web involves what is known as “perceptual coding.” A “codec” (compression/decompression algorithm) makes the file smaller by discarding parts of the audio information. The elimination of audio material is based on assumptions about what information the human ear is least likely to notice. During playback, the codec decodes the data and re-creates an audio signal for the listener.

Because a codec determines how audio data is compressed, it also determines how it sounds when it is decoded. Codecs may be more or less clever at choosing what data to discard, and there certainly have been major advancements over the last few years. In general, however, a higher bit rate still means better fidelity.

Most codecs allow audio encoding over a range of different bit rates to support transmission over connections of different speeds. For instance, if you’re creating 30-second song samples for an artist’s site, then you can make one set for dial-up users and another higher-fidelity set for broadband users. Depending on the file format, these two versions could be incorporated into a single file. The playback software for that file format can use “bandwidth negotiation” to figure out the connection speed and stream the corresponding version. Popular codecs today include RealAudio, which is primarily used for audio streaming, and codecs that create files at bit rates appropriate for streaming or downloading, including MP3 (MPEG-1 Layer 3), Windows Media Audio and MPEG Advanced Audio Coding (AAC).
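The selection step in bandwidth negotiation can be pictured with a few lines of Python. This is a hypothetical sketch, not the logic of any particular player: the function name, the example bit rates and the 80% headroom factor are assumptions chosen for illustration.

```python
def pick_stream(available_kbps, connection_kbps, headroom=0.8):
    """Choose the highest-bit-rate encode that fits the usable bandwidth.

    `headroom` reserves part of the connection for overhead and congestion
    (an assumed figure). If nothing fits, fall back to the smallest encode.
    """
    usable = connection_kbps * headroom
    fitting = [rate for rate in available_kbps if rate <= usable]
    return max(fitting) if fitting else min(available_kbps)


encodes = [20, 96]                  # illustrative dial-up and broadband versions
print(pick_stream(encodes, 56))     # 20
print(pick_stream(encodes, 256))    # 96
```

A real player would also have to measure the connection speed in the first place, which is the harder half of the problem and is left out here.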

In addition to audio data, many file formats support information for display during playback, in both text (artist name, song title, composer, etc.) and graphic format. Some file formats, including Liquid Audio and Windows Media, let file creators (presumably the owner of the material) define the circumstances that allow an end-user to download—for instance, after entering credit card information. (It is the lack of both of these features, particularly digital rights management capabilities, that makes MP3 a poor format for electronic commerce in music.)

File formats are designed to work hand-in-hand with audio “players.” A player runs the decoding codecs for the file formats it supports, and handles the processing and display of any additional information included in a file. Most players handle playback of downloaded files; some also support streaming. To stream, a player must manage—perhaps in concert with the listener’s Web browser—the real-time flow of data between the server and the client, which is not required for digital downloading.

Software players are plentiful, each supporting various file formats and codecs. The widely used RealPlayer line from RealNetworks supports streaming of RealAudio files, and the company’s RealJukebox plays downloaded files in a variety of additional formats, including MP3, Liquid Audio, WMA and .WAV. Liquid Audio files (using AAC, Dolby Digital and MP3 codecs) are also supported by the Liquid Player, as are files in MP3 and .WAV formats. Apple’s QuickTime supports a variety of formats, including codecs from QDesign for music and Qualcomm for voice. Windows Media Player supports WMA, MP3 and Voxware. Many other players are also available, and the range of supported formats and codecs is continually evolving.

Players may also be stand-alone hardware devices. For a couple of years now, you’ve been able to transfer audio files to Walkman-style portables such as Diamond Rio, RCA Lyra, Creative Nomad and Sony Music Clip. There are also handheld computers from Compaq, Casio and HP (all using Microsoft’s PocketPC operating system) that play Windows Media and MP3 files (the Casio can be fitted with a wireless modem for streaming on-the-go), plus Internet radio devices from companies such as Kerbango, Sonicbox (iM Remote Tuner) and Akoo (Kima). Wireless phones with streaming capabilities are also reportedly in the works from companies such as Motorola.


Although connection bandwidths today generally don’t support high fidelity, there are a few things one can do in advance to mitigate the damage to the original audio. The key steps are filtering, EQ, normalizing and compression, though the specifics of how these are used depend on both the content of the source file and the target format you are optimizing for.

For streaming over dial-ups, the audio preprocessing may be quite radical compared to that used in premastering for CD. Low- and highpass filtering reduce the audio bandwidth of the signal, while EQ can strategically bring out frequency ranges that seem lost in the encoded file. Normalizing ensures that peaks are close to (but not over) maximum, and compression then brings all the music below the peaks up into the higher end of the dynamic range, where it is less likely to be discarded by the perceptual coder.
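The level-related half of that chain, normalizing and compression, can be sketched in a few lines of Python. Treat this strictly as an illustration under simplifying assumptions: the function names, threshold and ratio are invented for the example, samples are floating-point values between -1.0 and 1.0, and the "compressor" is a static, sample-by-sample gain curve, whereas real compressors follow a smoothed signal level with attack and release times. Filtering and EQ are left out entirely.

```python
def normalize(samples, target_peak=0.98):
    """Scale the signal so its loudest sample sits just under full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)         # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def compress(samples, threshold=0.5, ratio=4.0):
    """Crude static compressor: level above the threshold is divided by ratio."""
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            level = threshold + (level - threshold) / ratio
        out.append(level if s >= 0 else -level)
    return out


raw = [0.1, -0.2, 0.5, -0.05]        # made-up samples with one loud peak
normalized = normalize(raw)
# Compress the peaks, then normalize again as make-up gain so that the
# quieter material is lifted toward the top of the dynamic range.
processed = normalize(compress(normalized))
print(normalized)
print(processed)
```

Comparing the two printed lists shows the effect described above: the peak stays just under maximum while the quieter samples end up proportionally louder.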

Unfortunately, you generally can’t directly hear how your preprocessing will affect the sound of a given encoded file; you have to encode it and then listen back. Because level, dynamics and frequency content interact to affect the outcome, truly optimizing for the best possible encode can be a time-consuming process, one best done on a song-by-song basis. However, most organizations that have large quantities of material to encode either don’t preprocess at all or use general presets rather than custom optimizing each song.

Given these circumstances, how can producers and artists ensure that their music is heard at the best possible fidelity in each medium, including over the Internet? By insisting (contractually or otherwise) that they have the same input into mastering the final product for the Web that they have over mastering their releases for CD. Instead of letting someone else throw your music into a batch-processing run with thousands of other files, you can deliver your own optimized, encoded files for each codec and bit rate where the music will be made available online.

Besides developing a “Web mastering” market for studios and engineers, custom-optimized files will also give fans more reason to pay for authorized downloads rather than settle for unauthorized MP3s “ripped” (off) from CDs. By focusing on the “traditional value” of fidelity in music, artists and labels may help keep new online audio technologies from running the music industry out of business.


AAC: The MPEG Advanced Audio Coding codec.

Bandwidth: Data transfer rate, usually expressed as kilobits or megabits per second.

Broadband: A high-speed, high-capacity transmission channel.

Client: An individual computer that is connected to a server on a network.

Codec: Compression/decompression algorithm, generally used to reduce the amount of data needed to transmit and/or store a given type of information, such as digital audio or video.

Dial-up: Internet connection using a modem over a standard telephone line, generally at bit rates of 28.8 to 56 kbps.

Digital Download: Copying a file from a server to a client.

Digital Rights Management (DRM): Mechanisms for controlling the exchange of intellectual property in digital form over the Internet or other electronic media.

EMD: Electronic Music Distribution; see Digital Download.

MP3: MPEG-1, Layer 3 audio.

On-demand streaming: Streaming individual files that are posted on a Web site (rather than Webcast).

Perceptual coding: A data-reduction technique where some source file audio information is discarded, based on human sensory perception models.

Player: A software program or hardware device that runs a decoding program for encoded files and handles additional display information (if any) included.

Secure Digital Music Initiative (SDMI): A record industry initiative to promote the protection of intellectual property rights in digital media.

Server: A specialized computer for storing files and distributing them to client computers over a network.

Streaming: The playback of files on a client as they are transmitted from a server. Data is copied from the server into a buffer on the client; play-back begins as soon as there is sufficient data in the buffer, without waiting for the entire file to be transferred.

Webcast: A program fed in a continuous stream, often live, from a server to a client.