Roots in Music, Career in Film, a Life of Interactive Storytelling

On October 13, at the fifth annual Mix Presents Sound for Film & Television event at Sony Pictures Studios, Scott Gershin will kick the day off by delivering the keynote speech. His name might not be on the nominee list for Best Sound Editing at the Oscars each year, but his sounds have appeared across hundreds of films, a few of them nominees, since he entered the world of sound design in the early 1980s. He was a rock and roll musician out of Berklee College of Music with a deep understanding of the emerging MIDI standard, a great love of tone and an open, conceptual approach to storytelling.

Since then, few sound designers in the world can claim the variety and types of A-list credits across as many mediums as Gershin, from films such as Pacific Rim, Hellboy 2, Nightcrawler, Chronicles of Riddick, Team America, Shrek, Book of Life and American Beauty to major games such as the Gears of War series, Epic Mickey, Final Fantasy, and the Resident Evil and Fable franchises. Gershin is as much at home in front of a roomful of game designers and implementers at the Game Developers Conference as he is on an Oscar sound design panel at the Academy of Motion Picture Arts and Sciences. He speaks the language of sound, music and technology fluently.

So when Mix went looking for someone well-respected with a foot in tradition and both hands in the future, someone who could address an audience of peers and offer compelling insights into the changing nature of workflow, the multi-release demands of modern media, the merging of the big and small screens, and the coming of virtual reality/augmented reality, we called Scott Gershin.

Scott Gershin in his newly revamped sound design mix suite.

From his early days at the famed Soundelux, where his understanding of the art and craft of sound design and sound supervision grew into full maturity, to his recent stint as head of the Interactive Group at Technicolor, and at all stops in between, Gershin’s curious mind and obvious talents have kept him at the leading edge of sound for picture. During our interviews in mid-September he was finally able to talk about his next venture, which has him thinking worldwide and every bit as excited as he was when he recorded and cut his first sound effects.

But first, about that keynote speech. We asked Gershin for a few preliminary thoughts on what he might like to say to a full house at the Cary Grant Theater, what he’s discovering as he bounces from Cubase to Pro Tools to game audio middleware such as Wwise and FMOD, and what it means for storytellers, particularly sound storytellers, in the coming years.

Mix: Let’s start with the brief-bio portion of the interview, Scott, before moving into the future. Music provided your entrance into sound, I understand?

Gershin: I come from a music background, but it was slightly different in that I realized early on that not only do I love notes and melody, but I love tone. I’ve always been fascinated by how they got that sound of an orchestra. Early on, when people were listening to the Beatles, I was listening to Yes, to Emerson Lake & Palmer. So, I was always attracted to music that was rich in tonal structure.

Then when I first saw Star Wars, I thought, “Okay, I’ve now figured out what I want to do with my life. I either want to become a visual effects artist or I want to go into sound.” The sound was so amazing. The two movies that set me on my path were Star Wars and Apocalypse Now. When I heard the fan blades morph into the helicopter while playing “Ride of the Valkyries” ... I just went, “Oh yeah.”

Those are two films that helped to redefine how sound could work in film.

Absolutely. I have since learned that sound design is used in two ways. One is creating great creative sounds that open up new worlds of possibilities, like the lightsaber in Star Wars, the communication and language of R2-D2, Darth Vader’s ship-bys and all those amazing sounds that came out of that film. Then there is the other style, which is more Walter Murch: It’s not only the sounds you choose or design, but how you decide to use them. The Valkyries coming in off the helicopters. The fan blades becoming the helicopter. The accentuated footsteps of crossing a large room in The Hudsucker Proxy. It wasn’t just about the individual sound; it was about how each sound is used to help tell the story.

Gershin outside the inflatable theaters of the "Marvel Experience Mobile Theme Park"

To me, designing sound for a project is like a tango between dialogue, sound design/effects and music. In music, sometimes it’s the sound of an instrument, and sometimes it’s the melody that stands out. The melody in sound design is how you decide to use, or not use, a sound within any given scene.

Here’s a good example: I supervised American Beauty, and when I interviewed for the job, the director, Sam Mendes, looked at me, looked at my credits, looked at me again, a little perplexed, and said, “I’m a little confused. You’re an interesting choice for me. Everything you do is big, bold and loud. This movie is not that.” So I said, “Well, let me equate this to music. You can think of me as a heavy metal guitar player, a musician, and that’s what I’m known for. But I’m good at jazz, too. So when I looked at the script, it’s not about what I decide to put in, but what I decide not to put in. It’s got to be a quiet movie, about negative space. It’s not about quantity, it’s about quality. Like a fine wine, delicate… Every single sound counts.” Ten minutes later, while driving home from the interview, I got the gig.

Quiet can be a lot harder to mix than loud, I understand.

Silence is the loudest sound I can use. In a soundtrack and mix, it always comes down to detail and making things easy to listen to. Just as a writer writes a script and a director of photography captures a visual, the audio team has different tools to use at different times of a project. What do I mean? Well, in any film or script you have an establishing shot, a visual. “Where are we? I see we’re in a room, but is that room in a modern city like New York or off a cobblestone street in old England?” So with backgrounds, which I love to create, we have audio’s version of an establishing shot.

The next thing you have to do is glue in reality, making sure dialogue doesn’t just float against the music. Reality can come from many places, from sound effects and Foley to having a voice and movement with the right amount of reverb on it, so that you buy that they are in a room, not on a set. It’s about the phonograph in the corner of the room. Is it scratchy? Is it new? All these kinds of cues are part of the conversation, and they are all tools that the sound supervisor, mixers and director can use to tell the story.

Gershin recording group walla in a bull ring in Mexico for the movie "Book of Life"

I use sounds to create pace and timings. Once you understand the rhythmical arc of a scene, you can work to start sculpting the sound to extract emotion. Sound really is all about emotion, about how you feel. At the end of the day, as a sound designer, supervisor or mixer, our job is to be an audio psychologist, to understand in any given scene how these sounds will make an audience feel. Will you giggle, be sad, be scared or feel triumphant?

Is that fundamental approach the same whether it’s a film or a game?

Whether I’m doing a big screen, an interactive title, a game, VR or AR, or a theme park ride, my job is to convey one of two things at any one time: the story and the experience. Some things are much more story driven with regards to a beginning, a middle and an end, and some projects are about the experience. “How would it feel to fly a spaceship?” It’s all about fantasy, about being someone else, about experiencing something else. I don’t care if you’re flying over the Grand Canyon like a bird or if you are making believe you’re a soldier in an interstellar battle fighting aliens. Whatever it is, it’s taking a person away from their daily life, for the time that you have them, and having them experience something.

Does that mean you’re presenting them with essentially one version of reality?

Our whole history has been ingesting our entertainment as a voyeur, from the other side of the window, whether it’s the proscenium of a theater or the glass window of a TV. We are always on the other side looking in. We watch what’s in front of us.

Now there are other types of technologies, which are more like what we used to do as children, where we would watch the movie and then go act it out on the playground. With games and VR, you no longer have to go to the playground. You can put on your goggles or play a game, and now you are participating. This is trickling into all forms of entertainment, and right now I think we are at a fork in the road like never before in terms of how we ingest our entertainment. We’ve always been a passive participant, and now we have the ability to be an active participant.

Gershin records airplane sounds for "Chronicles of Riddick"

What does that mean for a more traditional sound editor or mixer? What do they need to know before diving in?

In a film you have a fixed point from where the directors and filmmakers have decided they’re going to put the camera. In a game and in VR, that is now about, “Where does the user want to put the camera? Inside the car? From the back of the car? On the front hood?” The player has choices.

So again, each form of entertainment will dictate how the experience will be experienced. It’s not about what I want. It’s about what sonic perspectives and opportunities there are. So I now have to look at sound from all these positions.

In film, I build elements based on the perspective of the camera and a picture editor’s cuts, but in a game or VR, there are no cuts and no set perspective.

Here’s a good example of the difference. When I go out to record guns, I use many, many, many microphones based on perspective. In film, the sequence of events will always be the same; it’s a linear format. However, in games, when someone is running at me with a gun, I have to design for all possibilities. So in the distance, I could hear a distant gun report. As an opponent gets closer, I’ll hear more perspective differences such as the thud of the gunshot, then as the opponent gets closer, a high-end crack. Then as it gets really close, I’ll start to hear mechanisms and bullet ejects. Or if I run away, that scene will play totally differently. The user is controlling perspective.

In games there are parameters and conditions. I’m telling the game engine that when you get to this distance, I want to trigger these sets of sounds and not these sounds. Then at this distance I want to hear these, but not these, etc. I’m having to build the complexity as if I’m moving a listener anywhere within that sonic space—at a distance we trigger a set of samples, add acoustical attributes based on the distance, perhaps apply EQ and compression, and as it gets closer add different sets of samples.
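As a rough illustration of the distance conditions Gershin describes, here is a minimal Python sketch. It is not Wwise or FMOD code, and it is not his actual setup: the layer names, the distances in metres and the `active_layers` function are all hypothetical, chosen only to mirror the gun example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str        # hypothetical label for a set of samples
    min_dist: float  # metres, inclusive
    max_dist: float  # metres, exclusive


# Hypothetical distance bands for one gunshot. The numbers are invented
# for illustration; a real title would tune them by ear.
GUN_LAYERS = [
    Layer("mechanism_and_ejects", 0.0, 5.0),
    Layer("high_end_crack", 0.0, 30.0),
    Layer("body_thud", 0.0, 100.0),
    Layer("distant_report", 60.0, 500.0),
]


def active_layers(distance, layers=GUN_LAYERS):
    """Return the names of the sample sets to trigger at this distance."""
    return [layer.name for layer in layers
            if layer.min_dist <= distance < layer.max_dist]


if __name__ == "__main__":
    for d in (200.0, 80.0, 20.0, 2.0):
        print(d, active_layers(d))
```

The overlapping bands mean that layers can be stacked as the shooter approaches instead of switching abruptly from one set of samples to another.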

Different acoustical treatments change in real time, driven by parameters based on the player’s perspective relative to any given object. These are things we would automate on a mixing console, but we’re now getting the computer to do it based on our taste and on the player’s perspective.

Remember, in games and VR, there is no static mix! And that is the tricky part, because the parameters of a mix have to start somewhere. It can be based on distance, it can be based on height. There are so many things you can define, and then you have to have the tools to help you. Just as we have Pro Tools for linear, there are tools like Wwise, FMOD and custom programs for interactive entertainment.
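To make the "no static mix" idea concrete, the sketch below maps a single game parameter, distance, onto a few mix settings. It is a simplified assumption of how such a curve might look, written in plain Python instead of any middleware's own interface; the function names, the linear curves and every number in it are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation from a (t = 0) to b (t = 1)."""
    return a + (b - a) * t


def distance_params(distance, near=1.0, far=100.0):
    """Map listener distance (metres) to illustrative mix settings.

    All ranges here are invented for the example: full level and full
    bandwidth up close, quieter, duller and wetter far away.
    """
    d = min(max(distance, near), far)  # clamp into [near, far]
    t = (d - near) / (far - near)      # 0.0 when close, 1.0 when far
    return {
        "gain_db": lerp(0.0, -24.0, t),
        "lowpass_hz": lerp(20000.0, 2000.0, t),
        "reverb_send": lerp(0.1, 0.6, t),
    }


if __name__ == "__main__":
    for d in (1.0, 25.0, 100.0):
        print(d, distance_params(d))
```

A game would re-evaluate a mapping like this continuously as the player moves, and height or any other parameter could drive a similar curve.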

Let’s jump to the present-future. Before they had shuttered the rooms at Technicolor this summer, you had lines on a new gig, doing what you do and maintaining the variety. Can you talk about it now?

I have just joined a new company which will be announced shortly. I’ll be working out of the same room as when I was with Technicolor with the same crew, just a different name on the door. It’s a fascinating company in that it is very flat and spread around the world, with approximately 6,000 employees. It’s about the clients and the content, not about giant structures in a centralized place. We want to be where our clients are. L.A. is a new territory for them. They come from the gaming world supplying localization, testing, technology and graphic design. Now they are expanding into other forms of entertainment with additional services, and I look forward to being part of the team that grows the L.A. market.

I presume that means films, games, VR/AR and all future experiences?

I feel that VR was kind of exciting there for a second, and now it’s gotten a little bit timid as we wait for the technology to catch up. When VR first came out it was overhyped and rushed to market. People were only seeing cardboard with low-quality images that looked like their childhood TV. It can’t compete with the new 4K OLED HDR television in their homes. Eventually the technology will catch up.

The most exciting thing on the VR side for me right now is the theme park industry. That is where you have goggles and a mobile PC in a backpack. These systems have haptics, where you have things you can touch and feel. They’re also using fans for wind and even technology for smell. I did a project called Tree where we manipulated your senses. The Void has great titles where you can experience these things. You can grab a weapon, open a drawer, walk across a plank between buildings. And you believe it because the plank is wobbly. There are some great experiences out there, where we’re getting close to being on a holodeck. That’s very exciting.

AR, in the meantime, has a lot of everyday tools that people are going to like, such as, “I was thinking about buying a couch.” You could photograph the room and overlay the couch in that corner. The value of that is huge. Think of education for sound engineers. I want to learn the Avid S6, I reach out for a button and an overlay is there and you say, “Ah! That’s what that button does.” Medical is very exciting. Doctors can start with visuals in an overlay of where or how they might have to operate. We can start understanding that there are so many great opportunities for this technology.

And they all need audio.

And they all need good audio. And good stories and experiences.