In the spring of 1995, I had no inkling as I entered the world of game sound design that I was about to experience what was popularly referred to back in the ’90s as a “genuine paradigm shift.” With a solid background in music, film and video production, I expected to cruise into the game world with only minimal new techniques to learn. Do not make this mistake: Cobras and grizzly bears both have two eyes and sharp teeth, but that’s as far as the similarities go. Though sound design for games has increasingly made use of traditional post-production techniques, it is, nonetheless, a different animal with a host of new challenges and hurdles.
Although there are many kinds of interactive media, games are generally the toughest to create, so this article will focus primarily on the challenges of game sound, with some examples of the day-to-day reality. Sound for the Web is an entirely different kettle of fish that I shall judiciously ignore, discretion being the better part of valor.
The single largest factor shaping the process of creating sound design for interactive media is unpredictability. You can never be certain what a player will do next, unless you have limited the options to the point where the game’s not really much fun to play. Attempting to respond to the player’s actions as immediately as possible almost by definition means that the computer, its peripherals and, by extension, the programmers, sound designers and other game creators are constantly up against the wall, banging into technical limitations on the desired game experience.
Primarily, technical limitations manifest themselves as competition for computer resources. CPU processing power, bandwidth from the disk on which the game is running, RAM and storage space on the game’s distribution medium are the Four Horsemen of the Interactive Apocalypse, though there can be other horses in the race.
Console platforms, like those from Sony, Sega and Nintendo, present different versions of the same challenges that crop up for games on desktop computers. In particular, memory space is generally much tighter for console games, but there may be more real-time processing available.
BUILDING BACKGROUNDS
The objective for the interactive sound designer is to immerse the player in the universe of the game. From a sound standpoint, this means generating an appropriate ambient environment and creating sounds for interactive elements.
When building ambiences, the designer has to take into account that the player may remain in the space for only a few seconds or hang out for 15 minutes. In the real world, standing in one area for 15 minutes is likely to reveal an ambience that is fairly constant in character but has continual variation that includes both periodic and random elements: in short, infinite variation.
Given that there may be dozens of environments in a game, the challenge is to create ambiences for all of them so that they remain interesting for the hours the game might be played at a single sitting. Immediately, technical limitations start framing the response. Often, the resources allocated for all sound and music tasks amount to 10% (or less) of the CPU’s processing power, a relatively small chunk of memory and a decent, but not huge, amount of space on the game’s disk(s).
In these circumstances, the brute-force approach clearly cannot work: 15-minute stereo ambiences at full bandwidth for two dozen spaces add up to several gigabytes of data, far too much to distribute as part of a game. The advent of multichannel game sound does nothing to diminish this problem.
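That back-of-the-envelope figure is easy to verify. A quick sketch, assuming 16-bit, 44.1kHz stereo PCM (the helper name is mine, not from any tool):

```python
def ambience_bytes(minutes, sample_rate=44100, bytes_per_sample=2, channels=2):
    """Uncompressed PCM size of an ambience of the given length."""
    return minutes * 60 * sample_rate * bytes_per_sample * channels

per_space = ambience_bytes(15)   # roughly 159 MB per 15-minute stereo ambience
total = per_space * 24           # roughly 3.8 GB for two dozen spaces
```

Two dozen such ambiences would not have fit on even a stack of mid-’90s CD-ROMs, let alone left room for the rest of the game.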
Even if lossy compression is used to reduce file size, which can be a useful strategy across the board, disk bandwidth is a stumbling block. Playing a full-length ambience requires continuous streaming, not an easy thing to assure when competing with art and other data that has to pass through the same pipelines. Typically, some streaming can be supported, but the number of streamed channels may be very low on many games; sometimes only a single stereo stream can be played on an ongoing basis.
Another popular strategy is loops. Ambience loops have to be fairly long, at least seven or eight seconds, or they will become obnoxious quickly. Loops are often loaded into memory, another resource for which there is great competition. Variety might be gained by making multiple loops that are variations and then periodically swapping them in and out of memory.
Of course, anytime you modify an ambience, the transition must be smooth, which implies clean and, sometimes, sophisticated crossfading.
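One common recipe for such a crossfade is an equal-power curve, which keeps the perceived loudness roughly constant through the transition instead of dipping in the middle the way a linear fade does. A minimal sketch (the function and its 0-to-1 position parameter are illustrative, not from any particular engine):

```python
import math

def equal_power_crossfade(position):
    """Return (outgoing_gain, incoming_gain) for a fade position in [0, 1].

    cos/sin curves satisfy out^2 + in^2 = 1 at every point, so the
    summed power of the two ambiences stays constant during the swap.
    """
    outgoing = math.cos(position * math.pi / 2)  # fades 1 -> 0
    incoming = math.sin(position * math.pi / 2)  # fades 0 -> 1
    return outgoing, incoming
```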
With a basic bed established, the ambience can be populated with “spice” elements in a number of ways. For instance, in a factory, you might have background loops of general machinery and down-the-hall “echoey” activity. If there are machines the player might move around (but not interact with), small loops could be made for them, with the playback volume of each linked to the player’s proximity.
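That proximity-to-volume link might be sketched like this, assuming a simple linear falloff (real engines may well use a logarithmic or inverse-square curve instead):

```python
def proximity_gain(player_pos, source_pos, max_dist):
    """Volume for a machine loop: 1.0 at the source, fading to 0.0
    at max_dist and beyond. Positions are (x, y) tuples."""
    dx = player_pos[0] - source_pos[0]
    dy = player_pos[1] - source_pos[1]
    dist = (dx * dx + dy * dy) ** 0.5
    return max(0.0, 1.0 - dist / max_dist)
```

Each frame, the game engine would feed the current player position through a function like this for every audible machine and set the loop volumes accordingly.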
To add some randomization to the ambience, other short files could be loaded into memory, and a randomly chosen one could play every so often as a “one-shot.” In the factory example, there might be a door slamming down the hall, a loud “clank” and an unintelligible P.A. announcement, one of which is played every 40 seconds or so.
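A scheduler for those one-shots might be sketched as follows; the 40-second mean and the jitter range are illustrative assumptions, as is the function name:

```python
import random

def schedule_one_shots(one_shots, total_seconds, mean_interval=40.0,
                       jitter=10.0, seed=None):
    """Build a (time, sound) schedule: one randomly chosen one-shot
    roughly every mean_interval seconds, varied by +/- jitter so the
    repetition never becomes mechanical."""
    rng = random.Random(seed)
    schedule, t = [], 0.0
    while True:
        t += mean_interval + rng.uniform(-jitter, jitter)
        if t >= total_seconds:
            break
        schedule.append((round(t, 1), rng.choice(one_shots)))
    return schedule
```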
A fairly small number of ambience elements can be leveraged with whatever real-time processing is possible. Volume is the most obvious, but lowpass filtering and pitch-shifting are also sometimes available and can be used quite powerfully to these ends. Sound cards with real-time DSP, such as Creative Labs’ SoundBlaster LIVE!, have begun to penetrate the market and are supported by interactive environmental processing managers like Creative’s EAX (Environmental Audio Extensions) to Windows’ DirectSound. Real-time environmental processing, which usually includes filtering to simulate occlusion, will soon significantly enhance the quality of ambiences in interactive products.
HE GOT INTERACTIVE GAME
Aside from ambiences, interactive media must incorporate sounds for those parts of the game with which the user can interact. These break down, roughly, into object and interface sounds. Interface sounds convey information to the player, such as that they have picked up points, “power” or an object. These sounds are heard often and must be kept in memory much of the time. To conserve space, as well as to keep them from interfering with the flow of gameplay (which can be very fast-paced), interface sounds are generally kept short. They can’t attract too much attention and must convey the message to the player, yet be of a sufficiently distinctive character that they are not mistaken for any other such sound.
Object sounds present a different challenge. Like interface sounds, they are generally kept short to reduce bandwidth and memory demands. Some objects are simply functional and do not carry any significant meaning: an elevator, for example. To accommodate the reuse of the sound for different elevators, or for a player who might change the speed of the elevator, the sound will typically consist of three elements: start, move and stop, where the move sound is looped. These may be three different files or one file with markers in it.
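The start/move/stop pattern amounts to a tiny state machine. This sketch just records which segments would be triggered (the class and method names are hypothetical, not from any engine):

```python
class ElevatorSound:
    """Sketch of the start/move/stop pattern: play 'start' once,
    loop 'move' while in motion, play 'stop' on arrival."""

    def __init__(self):
        self.state = "idle"
        self.played = []  # log of triggered segments, for illustration

    def trigger_start(self):
        if self.state == "idle":
            self.played.append("start")
            self.state = "moving"

    def tick(self):
        # Called each frame; the move segment repeats while in motion.
        if self.state == "moving":
            self.played.append("move")

    def trigger_stop(self):
        if self.state == "moving":
            self.played.append("stop")
            self.state = "idle"
```

Because the looped middle segment is independent of the endpoints, the same three assets can serve a slow freight elevator or a fast passenger car; the engine simply loops "move" for however long the ride lasts.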
Similarly, the simulation of walking on various surfaces is achieved by delivering files for each kind of footwear (boots, high heels, etc.) and for each kind of surface. Variation can be achieved by supplying several files for each combination of shoe and surface and mixing them up on the fly. For instance, one might deliver ten files for boots on concrete, five intended as left footsteps and five as right, with instructions to the programmer to randomly choose one left and one right for each pair of steps.
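The random pairing the programmer is asked to implement is only a few lines; a minimal sketch (the file names are placeholders):

```python
import random

def footstep_pairs(left_files, right_files, n_pairs, seed=None):
    """Randomly pair one left and one right footstep file per stride,
    so the same short files never play in an obvious fixed pattern."""
    rng = random.Random(seed)
    return [(rng.choice(left_files), rng.choice(right_files))
            for _ in range(n_pairs)]
```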
Many objects are intended to have a more significant role in gameplay, but the message they carry is often more implied than stated. When the player interacts with such an object, the information conveyed is usually about the object’s behavior, and it is up to the player to interpret the implications. The simplest case is a weapon in a shooting game. In addition to some sort of “pickup” sound that indicates the player’s acquisition of the weapon, there is usually a “cocking” or other arming sound, then one or more firing sounds. When players hear a weapon being armed, they know it’s ready to use.
When an object is part of a game puzzle, it can be presented much more subtly. For example, a player might encounter a small forklift and try to start it. The puzzle may require that the player make four attempts to start the forklift before its engine finally turns over and runs. In that case, the sound designer could make a “failed attempt” sound and an “engine finally turns over” sound, but it provides a better clue if, for example, the same “fail” sound is used for the first two attempts, but a different “fail” sound that comes closer to the engine turning over is used for the third attempt. Thus, the player is encouraged to try it the magic fourth time. Using the same sound the first two times keeps the puzzle from being too easy, as well as saves a tiny bit of memory space.
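The forklift puzzle’s sound logic boils down to a simple mapping from attempt number to cue. A sketch, with hypothetical cue names:

```python
def forklift_sound(attempt):
    """Map the player's start attempt to a sound cue (names are illustrative)."""
    if attempt <= 2:
        return "fail_flat"    # same sound twice: don't give the puzzle away
    if attempt == 3:
        return "fail_almost"  # hints that the engine is close to catching
    return "engine_runs"      # the magic fourth try succeeds
```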
BALANCING ACT: THE REAL-TIME MIX
It cannot be known ahead of time what a player will do at any given moment, so you can’t really premix all the elements of the game’s sound in perfect relationship to each other. In fact, any point within the game may sound different given the variables. For example, a player may be in a quiet room all alone at one moment, then activate some machines or other noisemaking objects, then be attacked by enemies. Typically, the music will change in intensity and tone to reflect these conditions. To complicate things even more, the game engine itself is the final determinant of the volume at which any sound is played.
So, the mix cannot be predetermined, may vary with circumstances and is ultimately controlled by a “foreign” (to the sound designer) agent. Given all this, how is the game mixed?
Currently, the answer is usually empirical: The sound designer must play the game and experience the elements in context, then tweak them to get the best compromise fit. The simplest tweak is changing the level of the file that is delivered, but there are various schemes, usually hatched by collusion between the sound designer and game sound programmer. This may take the form of markers in the file (which are interpreted by the game engine), algorithms that factor in circumstances, ducking or whatever other clever schemes can be concocted without requiring too many resources.
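Ducking, one of the simpler of those schemes, might be sketched like this (the channel names and the duck amount are illustrative assumptions):

```python
def apply_ducking(channel_gains, priority_active, duck_amount=0.4):
    """When a high-priority sound (say, dialog) is active, scale down
    the gains of the lower-priority channels so it reads clearly."""
    if not priority_active:
        return dict(channel_gains)
    return {name: gain * duck_amount for name, gain in channel_gains.items()}
```

In practice the sound designer and programmer would also agree on a recovery ramp, so the ducked channels fade back up smoothly rather than snapping.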
One advantage game sound has over sound in other media is that you can predict the position of the listener (in front of the computer monitor or TV). Until fairly recently, only 2-channel surround technologies were practical to use. Dolby Surround, or a simulation using out-of-phase information, was the first available surround technique, but the field then filled up with a variety of “3-D” schemes from companies like Q-Sound, Aureal, Sensaura and Harman International. Recently, true multichannel reproduction has begun to penetrate the market in cards like the SoundBlaster LIVE!, which has a pair of 2-channel outputs. There is also some activity around 5.1 schemes such as Dolby Digital. The 3-D sound companies have scrambled to make their programs capable of true multichannel sound, while retaining their original 2-channel implementations.
All of these possibilities have gotten somewhat messy to deal with. From the basic game engine standpoint, an object’s position in a 3-D game is specified by a simple set of coordinates. That doesn’t change, no matter what the surround playback scheme is. But how can the sound designer know whether the player is using two “multimedia” speakers, headphones, a set of four speakers (plus a subwoofer) or even a full-bandwidth 5.1 system? From the programming standpoint, some technologies require calls to be inserted to divert 3-D sound coordinates to the desired hardware and/or software (although this is one of the things EAX seeks to obviate). Much of this burden falls on the programmers, but sound designers must keep surround in mind.
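Whatever the playback chain, the engine starts from those object coordinates. Even the humblest case, deriving a 2-channel pan from a source’s position relative to the listener, can be sketched as follows (the spread constant is an arbitrary assumption):

```python
def stereo_pan(listener_x, source_x, spread=10.0):
    """Map a source's x offset from the listener into a pan position
    in [-1, 1]: -1 is hard left, 0 is center, +1 is hard right."""
    offset = (source_x - listener_x) / spread
    return max(-1.0, min(1.0, offset))
```

The 3-D schemes mentioned above take the same coordinates and do far more with them, but the division of labor is the same: the game supplies positions, and the audio layer decides what the player’s particular speakers should do.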
In the end, most game sounds have very little dynamic range. As with very thick music or film mixes, limiting the dynamic range of any individual element yields better consistency once a basic balance is obtained. The mix, then, is a time-consuming process of playing the game and tweaking. No way around it.
I GOT ONE FOOT ON THE PLATFORM…
A single game may be released on several platforms, including versions for Windows, Macintosh and one or more consoles. While games for Windows or Mac must run on a variety of machine configurations, consoles can often be the most complicated for developers. In addition to the memory constraints, there is often a specific data format into which sounds must be put, and custom development systems that must be used. When a game is being developed for an as-yet-unreleased console platform, the development systems may not even be available until the process is quite well along.
One of the main challenges of game sound design is the frequency with which game engines change. Console platforms are more stable in this area because new platforms come out comparatively infrequently, but desktop computer games often change engines with every new product or two. The sound designer must suss out, each time, the abilities, limitations, idiosyncrasies and workarounds of each new engine. As if that weren’t enough, it is a moving target: The engine develops over the course of the development cycle, and new bugs are introduced with each addition.
In all cases, issues of file formats, sample rates and bit depth must be resolved. Sometimes it is necessary to put some sounds at different sample rates in order to save space, but not all sound engines accommodate this gracefully. File format issues can heavily affect the particulars of sound design. One example is loops: Though technically capable of containing loop markers, .WAV files never do because you’d be very hard pressed to find off-the-shelf tools that can read and write them properly. If a game engine demands .WAV files, the elevator sound example given previously would have to be in the form of three separate files. On the other hand, it is common for tools that read and write AIFF files to be able to handle loop markers within them. AIFF is also a good choice because it is in common use on Mac and Windows machines, as well as by some of the console systems. Sound Designer 2 files, on the other hand, are easily handled only on the Macintosh and, so, are not desirable for most game applications.
PLUCK HIM ONCE, WHY PLUCK HIM AGAIN?
It is often forgotten that games and interactive media are software and, as such, their development process is in many ways closer to creating the next version of a word processor than it is to a feature film.
For example, sound effects for film are primarily created after the picture has been shot and edited. Although there may be a number of edits after the sound designer has received the picture, it is essentially done, and sound can be designed directly to it. Once the sounds are designed and edited, the film goes to mix, and that’s all she wrote.
Games, in contrast, require a much more parallel development process, where it is not unusual for sound to be required before the picture (or objects) it is intended to portray even exist! Just as the sound designer can mix only by playing the game, the game designer can only build the game by having some sound in it. The solution is for the sound designer to ship a group of rough approximations, commonly called “placeholder” sounds. (I suppose this is somewhat analogous to having “temp” effects for initial screenings of a film.) These sounds don’t have to be finished, or even be the right sound at all, but you’ll save yourself some aggravation at the end of production if you get close near the beginning.
The upside of this craziness is that the sound designer will usually have time to make at least one (and often more) revisions to many of the sounds before the product finally ships. Another plus is that, no matter how long the sound designer takes to complete the job, the programmers will take longer. There is no such thing as bug-free software, so programmers keep working on the game until it is considered acceptable for shipping, at which point it is yanked from their hands. Typically, sound reaches a very acceptable condition well before the rest of the game.
I KNOW IT’S HERE SOMEWHERE…
The unit of currency in interactive media is files. Lots of them. Lots and lots and lots and lots of files. Large adventure games can easily have 6,000 or 7,000 voice files alone, not to mention sound effects and music files. Each of these files may go through several revisions. Clearly, file management and tracking are critical. Databases are absolutely necessary, and it usually falls to the sound designer to design and create them.
Directory structures go hand in hand with databases in keeping track of what is where, what has been delivered, what needs revision and when approval has been given for sounds. A typical sound-effects directory structure might feature subdirectories of each major sound type (e.g., weapons, Foley, interface, doors and hatches, vehicles) for sounds used all over the game and perhaps individual subdirectories for sounds used only in specific places.
Batch processors and mirrored directory structures are two primary tools for dealing with the quantity of files involved. Unfortunately, many programs that work well with 10 to 1,000 files seem to get tripped up, or just plain break down, when asked to handle 5,000. It often comes down to creating custom tools; I can’t tell you how many AppleScript utilities I’ve written to accommodate this sort of need.
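A custom tool of this sort is often just a few lines. Here is a sketch in Python rather than AppleScript, walking a sound directory tree to list files of one type as candidates for batch conversion (the function name and the file extension are illustrative):

```python
import os

def find_files_needing_conversion(root, ext=".sd2"):
    """Walk a sound directory tree and collect files with the given
    extension, e.g. candidates for batch conversion to AIFF."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(ext):
                hits.append(os.path.join(dirpath, name))
    return hits
```

The same walk-and-filter skeleton handles renaming, re-leveling through a command-line processor, or cross-checking the directory tree against the database.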
THE PERSONAL AND THE INTERPERSONAL
The dirty little secret in the game world is that one rarely knows what the hell is really happening with a particular project. It’s a “ride the tiger” proposition all the way down the line. Having a sense of when a milestone will actually be hit (as opposed to when one is told it will be hit), being alert to small asides dropped by other project team members that may reveal important changes in the project, judging one’s degree of completion of a job that may be changing as one goes: these are the scars of experience gained by the accomplished sound designer in the field of interactive media. Personal task management is a difficult skill to acquire under these circumstances, but it is a survival skill.
The fluidity and parallel nature of game development dictate a high degree of interaction with other members of the project team. Often, this interaction takes the form of the sound designer trying to get information or action from fellow team members who are up to their buttocks in large-toothed reptiles. Powers of persuasion and building cordial relations are generally more useful here than screaming, though screaming is sometimes necessary.
Tackling all the variables comprehensively is simply not humanly possible; reaching reasonable compromises takes most people right to the limits of their abilities. Less ambitious projects have a bit of an easier time of it, but it always ends up feeling rather like juggling chipmunks. As frustrating and, sometimes, infuriating as the problems can be, the challenges faced by the sound designer in interactive media are those that come with pioneering any frontier, and there is hardly a more exciting intrigue to be found anywhere in the world of sound.