Sound Integration
THE NEXT REVOLUTION WILL BE PLAYED
3/01/2007 7:00 AM Eastern
A revolution is taking place in the game world: increased budgets, more hardware resources on target platforms and more high-level software integration tools being developed for designers and composers. In the past, audio integration was solely within the grasp of programmers. It was common practice to have a sound designer create content and then hand it off to a programmer, never to hear it again until the game was released. Although this practice still exists in some ways today, the process is changing as tools for designers become more refined.
The increase in overall game budgets has trickled down to the audio production pipeline, as more money is being spent in all facets, from building quality studios to conducting location recording, Foley work and live music sessions, ensuring that games have a level of sonic quality control that rivals film.
The latest generation of game consoles and PCs is all about sheer processing power. This, of course, translates to more audio! From voice counts to output channels and effects processing, current game systems have broken down many barriers that once existed in game audio.
To dig deeper into the issues surrounding game audio integration, we rounded up a group of pros on the frontlines of audio development for some of the hottest game titles being released. For more on our participants, see “The Players” sidebar on page 54.
If you are asked to create specific sounds, what types of sounds do you primarily focus on? Also, what are the challenges associated with the sounds that you are primarily integrating?
Phil Hunter: I dealt with the speech files for Carbon, and integration occurred constantly throughout the development of the project. The greatest amount of my time is spent designing, recording and tweaking the speech. Since Carbon was launched in six languages, integration did take its toll toward the end of the project, as each language contained over 11,000 individual lines of dialog.
Robb Rinard: Our company [2XL Games] is currently focused on creating racing titles. The number-one most time-consuming audio production element comes from the creation of the engine audio. There is a broad range of performance across all the different vehicle types we're including, so I've recorded everything from 30 hp go-karts all the way up to 1,000 hp open-header sand rails. The biggest problem is the mix getting really thick. It's one thing to listen to a high-fidelity race car engine by itself. But once you surround yourself with 10 other vehicles, it's challenging to keep the overall mix sounding good and keep the player vehicle sitting on top of the mix at all times.
Do you have dedicated programmers for audio integration? If so, how often do the designers interact with programmers?
Rinard: Yes, and in our case the audio programmer is also the sound designer and field recording person. I've worked on past games where the audio designer was not a programmer, and that's always a challenge. It's rough when the sound designer has a vision for doing something cool in real time, but the programmer doesn't quite understand that vision and is struggling with the implementation.
Our latest challenge has been connecting the vehicle to the world. In last-gen games, the vehicle typically only has an engine sound, but that's not enough to connect the vehicle to the world in which it exists. Now that we have enough console power and memory, we are adding a host of secondary layers of Foley to the vehicles. These items include a simulation of all the other sounds that the vehicle makes as it moves through the atmosphere, such as tire noise and airflow over the surface of the vehicle. Also included are suspension compression, bottoming out, sliding, skidding, gravel sounds, et cetera.
Hunter: The sound designers and programmers work together on a daily basis throughout the entire project. It's truly a combined effort — we throw ideas against the wall and they tell us what sticks, and vice versa. Having that balance of sound/artist creativity and “coder logic” keeps the ideas fresh and fluid. For example, as the cop speech in the previous Need for Speed title [Most Wanted] proved, a well-designed speech system can significantly enhance the gaming experience for the player. Having the audio programmer and speech designer working closely together from the very beginning eliminates many of the problems that can occur early on. This enables us to focus on getting the player more emotionally involved, which, of course, makes for a better gaming experience.
Are you using a proprietary or middleware solution for your audio implementation needs? What sorts of features do you look for in a toolset/engine?
George Valavanis: Having strong middleware alleviates the need for audio programmers and streamlines the production process so that the sound designer/composer is in complete control of all sound assets in the game. Wwise, from Audiokinetic, is fantastic in that it minimizes the dependency on our engineering department and maximizes my level of control. It has a construct called Real-Time Parameter Controls (RTPCs) that I find very useful. RTPCs are essentially variables that can be mapped to virtually any audio parameter in the tool. Wwise is basically like working with a real-time DAW, only slightly more abstract.
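As a rough illustration of the RTPC idea Valavanis describes, the sketch below maps a game variable to an audio parameter through a piecewise-linear curve. It is plain Python, not the Wwise API; the class name `ParameterCurve`, the engine-RPM variable and the curve points are all hypothetical and chosen only for the example.

```python
from bisect import bisect_right

class ParameterCurve:
    """Piecewise-linear mapping from a game variable to an audio parameter."""

    def __init__(self, points):
        # points: (game_value, parameter_value) pairs, sorted by game_value.
        self.points = sorted(points)
        self.xs = [x for x, _ in self.points]

    def evaluate(self, game_value):
        # Clamp to the first or last point outside the curve's range.
        if game_value <= self.xs[0]:
            return self.points[0][1]
        if game_value >= self.xs[-1]:
            return self.points[-1][1]
        # Linear interpolation between the two surrounding points.
        i = bisect_right(self.xs, game_value)
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        t = (game_value - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# Hypothetical mapping: engine RPM drives a low-pass filter cutoff in Hz.
rpm_to_cutoff = ParameterCurve([(1000, 800.0), (4000, 4000.0), (8000, 16000.0)])
cutoff_hz = rpm_to_cutoff.evaluate(2500)
```

The point of the design is that the curve is data the sound designer can edit, while the game code only has to report the current value of the variable.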
Nick LaMartina: FMOD, created by Firelight Technologies, provides a GUI and work methodology that very closely resembles common sequencers and NLEs, so working with it is a very simple, transitional and familiar experience. The thing we all like the most about it is that nearly all of the behavioral and structural data associated with the sound events is handled in the graphic Designer tool, so audio decisions can be made by audio people and programming decisions can be made by programming people.
Ideally, the programmers will only need to step in if something goes wrong, but we'll see how that goes in the future.
Stephen Miller: The tools and engine that we use are completely done in-house, giving us a customized solution so we can be more efficient with how we want to work. The proprietary solution we used on Path of Neo was both a GUI and a scripting language. Sounds were imported into the game's database through the GUI, and it was a fantastic one at that. It allowed us to import files from Pro Tools sessions and maintain the layers and timing we had created.
Rinard: For the low-level control of the audio hardware, we use the Miles Sound System from RAD. Then we wrote an abstraction layer on top of Miles that allows us to manage the sum of the game's audio in a way that is similar to a mixing console. Our game engine has features that allow the audio designer to layer and mix just about anything you can think of, as well as position the sound in 3-D with total control. We can group collections of sound together into a virtual bus, apply effects to any bus and then feed all the buses into a master bus, apply more effects if needed and then ship it off to the hardware for AC3 encoding. On the Xbox 360, we are typically running between 128 and 160 mono channels of audio at any given time in a race.
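The mixing-console model Rinard describes can be sketched as a small tree of buses. This is not the Miles API or 2XL's engine; the `Bus` class, the gain-only stand-in for an effect and the bus names are hypothetical, and the sketch only illustrates the routing: sounds into buses, effects per bus, buses into a master.

```python
class Bus:
    """A virtual bus: sums its inputs, then runs its effect chain."""

    def __init__(self, name):
        self.name = name
        self.inputs = []    # child buses or raw mono sample blocks
        self.effects = []   # callables: list of samples -> list of samples

    def add_input(self, source):
        self.inputs.append(source)

    def add_effect(self, effect):
        self.effects.append(effect)

    def render(self, block_size):
        # Sum all inputs sample by sample.
        mixed = [0.0] * block_size
        for source in self.inputs:
            block = source.render(block_size) if isinstance(source, Bus) else source
            for i in range(block_size):
                mixed[i] += block[i]
        # Apply this bus's effects in order.
        for effect in self.effects:
            mixed = effect(mixed)
        return mixed

def gain(amount):
    """Stand-in for a real effect: scales every sample by a fixed amount."""
    return lambda samples: [s * amount for s in samples]

engines = Bus("engines")
engines.add_input([0.5, 0.5, 0.5, 0.5])      # player engine
engines.add_input([0.25, 0.25, 0.25, 0.25])  # opponent engine
engines.add_effect(gain(0.5))

foley = Bus("foley")
foley.add_input([0.1, 0.2, 0.1, 0.2])        # tire noise

master = Bus("master")
master.add_input(engines)
master.add_input(foley)
master.add_effect(gain(0.5))

output = master.render(4)
```

Because a bus accepts other buses as inputs, the same `render` call handles a single group or the whole mix, which is roughly what makes the console analogy work.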
Michael Smith: Most of my projects use one middleware solution or another. Sometimes it's Miles, lately more FMOD, but it ultimately depends on the project and the people making it. I prefer to have the programmer working on backside functionality and expanding on the tools. Unless a programmer has a really strong background in audio, they're not going to know how best to integrate the sounds.
The integration time depends a lot on the tools you have: If you have well-designed and well-tested implementation tools, the integration doesn't take long at all, perhaps 20 percent of the overall time. However, if the tools aren't so hot or are nonexistent, the integration will likely take much longer than creating the sounds themselves. It's not just a time issue, either. Weak tools also sap the creativity in the sound design process. It's kind of a soapbox issue with me. If the sound designer is dreading the integration phase, there will be less iteration in the sound design, so the sounds themselves won't be so hot. From there, the sound designer will be less likely to do fancy tricks that create that “wow” factor out of fear of breaking something or fear of learning an obtuse integration system.
What was the most challenging sound integration instance on a recent project?
Ed Lima: On my last project, the biggest challenge was designing and implementing a real-time ducking compressor on voice-over. The problem we encountered was that, having received final dialog assets relatively late in production, we found that the aggressive music and sound design mix left no headroom for voice-over to be heard. We put together a system wherein we tagged voice-over sound assets with a flag that would drop everything else by 4.5 dB. This produced some pretty good results overall, but we then went back and built a second system to bypass the ducking on specific sounds, such as specialized cut-scene sounds, some music cues and so forth.
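A minimal sketch of the flag-based ducking Lima describes follows. The 4.5 dB figure comes from the interview; everything else (the `Sound` class, the `is_voice_over` and `bypass_ducking` field names, the example sound names) is hypothetical. A drop of 4.5 dB corresponds to a linear gain of 10^(-4.5/20), roughly 0.60. The sketch applies the drop instantly; a production system would presumably ramp the gain to avoid audible steps.

```python
from dataclasses import dataclass

DUCK_DB = -4.5  # amount cited in the interview

def db_to_gain(db):
    """Convert decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

@dataclass
class Sound:
    name: str
    is_voice_over: bool = False    # hypothetical flag that triggers ducking
    bypass_ducking: bool = False   # hypothetical flag for exempt sounds

def compute_gains(active_sounds):
    """Return a linear gain per sound name for the current frame."""
    voice_playing = any(s.is_voice_over for s in active_sounds)
    gains = {}
    for s in active_sounds:
        ducked = voice_playing and not s.is_voice_over and not s.bypass_ducking
        gains[s.name] = db_to_gain(DUCK_DB) if ducked else 1.0
    return gains

gains = compute_gains([
    Sound("dialog_line", is_voice_over=True),
    Sound("combat_music"),
    Sound("cutscene_stinger", bypass_ducking=True),
])
```

Here only `combat_music` is attenuated: the dialog line is the trigger, and the cut-scene stinger is exempt through the bypass flag.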
Miller: One of the most challenging sound-integration tasks I have had to do was placing all the sounds to literally thousands of animations. Everything from different kinds of footsteps, cloth movement, whooshes and hits, to gun reloads and sword swishes. When the project was complete, there were over 40,000 sound entries on just the animations.
Overall, how do you approach mixing?
Lima: I think about the mix throughout the entire design process. I generally try to bake some slight equalization curves or tendencies into families of sounds. For instance, explosions might be bottom-heavy, voice-over might occupy a higher band or the music might be designed with specific instruments and frequency bands in mind.
Some of that seems pretty obvious, but if carried through the entire sound production effort, what I'll find is that before I start properly mixing the game, the sounds playing in-game can already be heard to a large extent residing in their own frequency pockets, regardless of their playback volume setting. From there I can start to tweak until I've got everything right where it needs to be.
Adam Boyd: The audio mix for Carbon was done by [audio director] Charles Deenen and myself. We use proprietary mixing software that allows every sound element to be individually controlled at run-time by the game data that is occurring at that particular moment. For instance, we can have the wind noise volume increase proportionally to the car's velocity. This does not need to be hard-coded and will change dynamically as the variables within the game change.
Other mixing systems might allow you to turn down all other elements — sound effects, music, et cetera — within a mix to allow dialog to cut through, but our system has much more detail and control. We can turn down only the elements that interfere with the speech, in terms of frequency masking, which produces a much cleaner and less “crude” mix. Ultimately, our goal is to have our games sound like feature films.
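One way to picture the difference Boyd draws is to duck each element in proportion to how much of its frequency range collides with speech, rather than ducking everything. The sketch below is an assumption-laden illustration, not EA's mixer: the speech band, the maximum attenuation, the per-element bands and all names are hypothetical, and a linear overlap in Hz is a crude stand-in for real masking analysis.

```python
SPEECH_BAND = (300.0, 3400.0)  # assumed speech range in Hz, for illustration
MAX_DUCK_DB = -6.0             # assumed maximum attenuation

def band_overlap(band, other):
    """Fraction of `band` (low_hz, high_hz) that lies inside `other`."""
    low = max(band[0], other[0])
    high = min(band[1], other[1])
    width = band[1] - band[0]
    return max(0.0, high - low) / width

def duck_amounts(elements, speech_active):
    """Map each element name to an attenuation in dB (0.0 means untouched)."""
    result = {}
    for name, band in elements.items():
        overlap = band_overlap(band, SPEECH_BAND) if speech_active else 0.0
        result[name] = MAX_DUCK_DB * overlap
    return result

elements = {
    "sub_rumble": (30.0, 120.0),      # no overlap with speech
    "engine_mid": (300.0, 3400.0),    # fully inside the speech band
    "wind_noise": (2000.0, 4800.0),   # partial overlap
}
amounts = duck_amounts(elements, speech_active=True)
```

With these made-up bands, the rumble is left alone, the mid-range engine layer takes the full cut and the wind noise takes half of it.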
What are some of the challenges you face with surround sound? Also, do you use multichannel streams? If so, are the multichannel streams capable of being panned in 3-D?
Boyd: The biggest challenge with surround sound is to use it in a way so that it doesn't draw a lot of attention to itself. We tend to approach our audio design from a lowest-common-denominator perspective, which means that we want our game to sound great with small speakers through an average TV, as well as have it sound amazing on higher-end systems. Surround is a great tool because it informs the player of where opponents, or police, are in relation to the player's car. It also greatly enhances the player's sense of speed by accentuating the rapid movement of objects around the car.
Many games tend to use too much surround, though, and in the typical living room setup, this can be very distracting. We try to use it in a natural, tasteful way. Regarding multichannel streams, we do use them for our pre-rendered sequences, but not in-game audio. The only multichannel streams we use in-game are environmental ambiences and music. However, every object that emits sound in the game has a position in the 3-D game space. This real-time surround panning is a fundamental component of our in-house audio engine. Through our dynamic mixer, we control what that position is relative to — the player car or the camera — and how these sounds are perceived over distances.
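The idea of positioning a sound relative to either the player car or the camera can be sketched as follows. This is not EA's engine: the function name, the inverse-distance rolloff, the constant-power pan law and the coordinates are all assumptions, and the sketch renders to stereo only for brevity, where a surround renderer would distribute the signal across more speakers.

```python
import math

def spatialize(source, listener, min_distance=1.0):
    """Return (left_gain, right_gain) for a source relative to a listener.

    Positions are (x, z) on the ground plane; the listener faces +z.
    """
    dx = source[0] - listener[0]
    dz = source[1] - listener[1]
    distance = math.hypot(dx, dz)
    # Inverse-distance rolloff, clamped so nearby sources do not blow up.
    attenuation = min_distance / max(distance, min_distance)
    # Azimuth: 0 is straight ahead, +pi/2 is hard right.
    azimuth = math.atan2(dx, dz)
    # sin() folds rear sources onto the front arc and yields pan in [-1, 1].
    pan = math.sin(azimuth)
    # Constant-power pan law.
    angle = (pan + 1.0) * math.pi / 4.0
    return attenuation * math.cos(angle), attenuation * math.sin(angle)

player_car = (0.0, 0.0)
camera = (0.0, -5.0)          # chase camera behind the car
police_siren = (10.0, 0.0)    # directly to the right of the car

relative_to_car = spatialize(police_siren, player_car)
relative_to_camera = spatialize(police_siren, camera)
```

Switching the listener from the car to the chase camera changes both the distance and the angle, so the same siren comes out slightly quieter and a little less hard-right.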
Miller: We often use 5.1-channel streams for movie sequences, but not for in-game. There are several reasons for this; one is we often need looping sounds in game for things such as ambience, but to my knowledge, only with the introduction of Pro Tools 7 was a multitrack editor capable of sample-accurate looping of a 5.1-channel file. Often, the payoff of using 5.1-channel ambience files isn't worth it for in-game use, anyway. Most of the elements we want to play back [are] in a 3-D position that stays fixed to the environment, creating something that the player can interact with, instead of a static scene that does not change position with the player's view.
The amount of data we would have to stream off the disc during the game is also a detracting factor for us. We can get more bang for our buck, so to speak, [by] leaving room to stream other in-game content, such as music or a large explosion sequence that would not otherwise fit into memory. Another problem with having a lot of 6-channel audio is that there is no multiplatform, loopable compression scheme for it. The Xbox 360 does support this with XMA, which is great, so hopefully others will follow this direction. This leaves us with disc space problems, however.
Valavanis: Zoo Tycoon is a PC franchise and our target market dictates our approach to surround mixing. We mainly focus on the stereo mix since most casual PC gamers do not have elaborate surround systems. Because our market is so large and there are those who have the proper systems, it's important to also spend some time on surround-sanity checks. I do a 5.1 pass using Wwise on everything before we ship.
One last question: How do you feel about 7.1?
Lima: Oh, man, let's get 5.1 under control first!
Michel Henein is a game audio consultant, sound designer and entrepreneur based in Phoenix.
THE PLAYERS
Adam Boyd and Phil Hunter: Part of the audio team that worked on Need for Speed: Carbon for EA Canada, Hunter handled the speech design and Boyd was the audio lead on the project.
Nick LaMartina: sound designer at Cheyenne Mountain Entertainment who is currently working on the Stargate Worlds MMORPG (Massively Multiplayer Online Role-Playing Game), which is based on the hit TV show.
Ed Lima: audio director of Gearbox Software. Previously, Lima created the audio for 3D Realms' Prey and Id's Doom 3.
Stephen Miller: audio lead at Infinity Ward, creator of the Call of Duty series from Activision. Miller recently worked on the Matrix: The Path of Neo game for Atari.
Robb Rinard: cofounder of 2XL Games and game designer of ATV Offroad Fury 2 and MX Unleashed. Rinard is currently working on all of the audio development and design of the next-gen racing title for THQ.
Michael Smith: audio director at Sony Online Entertainment. Smith has worked on the EverQuest series, Star Wars Galaxies and The Matrix Online.
George Valavanis: audio director at Blue Fang, maker of the Zoo Tycoon series for Microsoft.