Immersive Sound for Cinema


The introduction in 2005 of the Digital Cinema Initiatives standard brought with it the largest wholesale change in motion picture presentation since the arrival of widescreen cinema and stereophonic sound in 1953. It differed greatly from the past because picture and sound specifications had already been carefully vetted by committees with an eye toward scalability of the DCPs (Digital Cinema Packages) that are sent to theaters.

For the image, this meant 2k resolution was the minimum, but 4k was supported; in sound, all theaters were expected to have basic 5.1 systems, although the standard allowed for a total of 14 channels. Two additional channels are reserved for mono mixes for hearing impaired and visually impaired patrons, the latter being narration on top of the mix.

However, it was inevitable that variations would soon occur, and these were first in picture with various implementations of 3-D. As soon as this was starting to sort itself out in 2012, two different immersive sound formats arrived to break the 7.1 barrier that was the limit for almost all previous DCPs.

First, in January 2012 Auro Technologies, in association with Barco Cinema, introduced Auro-3D with the film Red Tails in Auro 11.1, which was shown in about 2 theaters in the U.S. The development of Auro-3D began seven years prior, with research that CEO Wilfried Van Baelen had done at his Galaxy Studios in Belgium.

The Auro-3D cinema format, in its basic 11.1 cinema iteration, adds a 5.0 height layer (three screen speakers and two upper surround channels) above the standard 5.1 system, plus a top layer comprising a center-ceiling “Voice of God” channel. The system can be expanded to 13.1 with the splitting of the lower surrounds into four channels, as in 7.1.

The additional tracks are encoded with the proprietary Auro Codec in the four least significant bits of a standard 24-bit, 48kHz mix, so that only one 5.1 or 7.1 printmaster needs to be shipped on DCPs, with the additional height and top channels decoded in the cinema.
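To make the idea of carrying extra data in the low bits of a sample concrete, here is a minimal Python sketch. It is not the Auro Codec, which is proprietary and whose internals are not described in this article; it only illustrates the generic technique of overwriting the four least significant bits of a 24-bit sample with a payload and reading that payload back. The function names and the unsigned-integer sample representation are assumptions made for the illustration.

```python
# Generic least-significant-bit embedding, for illustration only.
# This is NOT the proprietary Auro Codec; all names here are hypothetical.

PAYLOAD_BITS = 4                          # low bits given over to extra data
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1    # 0b1111
SAMPLE_MASK = (1 << 24) - 1               # 24-bit sample, treated as unsigned


def embed(sample: int, payload: int) -> int:
    """Replace the 4 low bits of a 24-bit sample with a 4-bit payload."""
    if not 0 <= sample <= SAMPLE_MASK:
        raise ValueError("sample must fit in 24 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 4 bits")
    return (sample & ~PAYLOAD_MASK & SAMPLE_MASK) | payload


def extract(sample: int) -> tuple[int, int]:
    """Split a carrier sample into (upper 20 audio bits, 4-bit payload)."""
    return sample >> PAYLOAD_BITS, sample & PAYLOAD_MASK


carrier = embed(0xABCDEF, 0x5)
audio_bits, payload = extract(carrier)
print(hex(carrier), hex(audio_bits), payload)
```

In a scheme of this general kind, a playback system that knows nothing about the payload still plays the upper 20 bits as ordinary audio, which is consistent with a single printmaster serving both equipped and unequipped theaters.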

Auro Technologies has a complete suite of plug-ins to aid mixers, including the Auro-Panner, to place sounds in the 3-D field, and Auro-Matic Pro, which allows upmixing of mono, stereo and 5.1 elements to their 11.1 and 13.1 formats.

The second “salvo” in the new format wars occurred in June 2012 when Dolby Laboratories introduced its Atmos format for the Pixar animated film Brave on 14 screens. Dolby had been researching expanding cinema speakers for years, going back to 2002 and We Were Soldiers, which utilized an overhead VOG channel.

After years of experimentation with various speaker positions, including screen height as in Auro-3D, Dolby arrived at standards for surround speaker spacing, locations, and dispersion and mounting angles. The side surround speakers begin near the screen, and fill up the first third of the auditorium where normal surrounds are absent.

Timbre matching of surrounds to screen channels is made a reality by employing bass management and by giving surrounds much increased power handling; this, combined with the placement of surrounds closer to the screen, helps smooth out the transition of sounds off the screen. Bass management is not used in all theaters; at Dolby’s screening room in Los Angeles and at the Samuel Goldwyn Theater at the Academy, the existing surrounds were able to go down to 40 Hz, which matches the specified low-end response of screen speakers. The final speakers added in Atmos are two overhead arrays down the length of the theater’s ceiling. Up to 64 speakers are supported by the CP-850 Atmos cinema processor, which went into production in April 2013; before that, theaters were using the studio RMU mastering unit.


Where Auro-3D in its current form is channel-based in the classic stereo film manner, with recorded tracks assigned either to specific speakers or arrays of speakers, Atmos is object-based. In object-based cinema audio, sounds are not necessarily dedicated to specific channels for the length of the program, but instead individual files are placed in the three-dimensional space of the theater via metadata containing level, location XYZ coordinates and start/stop times. (X is left-right across the screen, Y is from the screen to the back wall, and Z is height.)
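As a rough picture of what such per-object metadata might look like, here is a minimal Python sketch. The field names, the 0-to-1 normalization of the coordinates and the single static position per object are all assumptions made for illustration; the actual Atmos metadata format is not described in this article, and real objects can move over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioObject:
    """One sound event plus placement metadata (hypothetical field names)."""
    audio_file: str   # e.g. a mono .wav file
    level_db: float   # playback level
    x: float          # 0.0 = left edge of the screen, 1.0 = right edge
    y: float          # 0.0 = at the screen, 1.0 = at the back wall
    z: float          # 0.0 = listener level, 1.0 = ceiling
    start_s: float    # start time in seconds
    stop_s: float     # stop time in seconds

    def active_at(self, t: float) -> bool:
        """True while the object should be sounding at time t (seconds)."""
        return self.start_s <= t < self.stop_s


# A hypothetical overhead sound, centered left-right and near the screen.
flyover = AudioObject("helicopter.wav", -6.0, x=0.5, y=0.2, z=1.0,
                      start_s=12.0, stop_s=18.5)
print(flyover.active_at(15.0), flyover.active_at(20.0))
```

The point of the structure is that nothing in it names a speaker: where the sound actually comes from is decided later, when the metadata meets a particular room.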

Object-based audio (OBA) is of course the foundation of video games, in which the timing and location of sounds are variable according to where players are in their worlds. For movies, which occur in a linear fashion, OBA is used for two purposes: One, to pinpoint the location of a sound in what otherwise might have been an array (such as a surround theater wall) or a group of speakers (such as behind the screen) or in three-dimensional variations among arrays and speakers. Two, it allows for this accurate panning to take place in various theater configurations and sizes: “halfway down the right side wall” scales to the same position, regardless of whether the wall contains eight or four speakers.
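The scaling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Dolby’s renderer: it assumes speakers evenly spaced along one wall and uses ordinary constant-power panning between the two nearest speakers. The function name `wall_gains` is hypothetical.

```python
import math


def wall_gains(position: float, num_speakers: int) -> list[float]:
    """Gains for speakers evenly spaced along a wall.

    position: 0.0 = speaker nearest the screen, 1.0 = speaker nearest the back.
    Uses constant-power panning between the two nearest speakers.
    """
    if num_speakers < 2:
        raise ValueError("need at least two speakers")
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    scaled = position * (num_speakers - 1)
    lower = min(int(scaled), num_speakers - 2)   # index of the nearer-screen speaker
    frac = scaled - lower                        # 0.0..1.0 between the pair
    gains = [0.0] * num_speakers
    gains[lower] = math.cos(frac * math.pi / 2)
    gains[lower + 1] = math.sin(frac * math.pi / 2)
    return gains


# "Halfway down the right side wall" in a four-speaker and an eight-speaker room.
for n in (4, 8):
    print(n, [round(g, 3) for g in wall_gains(0.5, n)])
```

The same position value of 0.5 lands between the middle pair of speakers in either room, so the mix carries one instruction and each theater’s speaker count determines which speakers carry it.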

Among the first public demonstrations of OBA for cinema were those given in the early part of the last decade by IOSONO, based on research done at the Fraunhofer Institute in Germany. IOSONO was shown in various venues in Los Angeles from 2008-2010, although current IOSONO efforts have primarily been in special venues and corporate events. As of summer 2014 the company is undergoing financial restructuring.

While it is possible to mix Atmos exclusively utilizing objects, standard practice entails mixers using “beds,” which are essentially full-length “static object” tracks dedicated to specific channels. Thus, 7.1 beds for the dialog, music and effects stems at the final mix involve dedicating 24 of the 128 inputs to Dolby’s RMU. Sound effects and music beds are frequently expanded to 9.1, with the stereo overheads as two arrays.

In this example, up to 104 object tracks can be recorded as mono .wav files containing XYZ coordinates and other object metadata. While most mixes may never need 128 simultaneous objects, object tracks (like beds) are dedicated to specific stems, to easily allow the creation of M&Es, not to mention facilitating archiving. When creating the MXF-wrapped file that is in effect the printmaster of Atmos mixes, only the actual audio used in the mix is included, and the silence between the events on all tracks (objects or beds) is deleted to save space. During rendering in the theater, the objects are triggered in sync and placed in the proper locations according to the metadata.
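The input budget described above is simple arithmetic, worked here as a short Python sketch. The 128-input total and the 7.1 and 9.1 bed widths come from the text; the function name and the idea of expressing the budget this way are just an illustration.

```python
RMU_INPUTS = 128                      # total inputs cited in the text
BED_WIDTHS = {"7.1": 8, "9.1": 10}    # channels per bed, counting the LFE


def objects_available(bed_formats: list[str],
                      total_inputs: int = RMU_INPUTS) -> int:
    """Inputs left over for object tracks after the beds are assigned."""
    used = sum(BED_WIDTHS[fmt] for fmt in bed_formats)
    if used > total_inputs:
        raise ValueError("beds alone exceed the available inputs")
    return total_inputs - used


# Dialog, music and effects stems, each with a 7.1 bed: 3 x 8 = 24 inputs.
print(objects_available(["7.1", "7.1", "7.1"]))   # 104

# Same stems, with music and effects beds expanded to 9.1.
print(objects_available(["7.1", "9.1", "9.1"]))   # 100
```

Expanding two of the three beds to 9.1 costs only four more inputs, which leaves the object count largely intact.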

The “scalability” of Dolby’s Atmos specifications applies not only to the surround and overhead arrays, but also to the screen, specifically the left-center and right-center speakers that have recently been largely absent from film mixes, save for certain Sony Dynamic Digital Sound (SDDS) mixes that used all channels in that format. (The configuration of course began in the Fifties with Cinerama, and later continued in Todd-AO.)

Dolby has been strongly recommending Lc and Rc speakers where the screen is wider than 40 feet, and reports that a large percentage of Atmos installations have five screen channels. The presence of those speakers will make itself known by smoothing out pans in wide screens common in today’s cinemas with stadium seating, especially to patrons seated close.

While pans are smoothed even with standard three-screen-speaker mixes, further benefit can be had when mixers create stems with five screen channels or create static objects, assigning elements to the narrow-width Lc and Rc speakers. This is generally regarded as very useful for dialog and effects panning, and for increasing the resolution of the primary screen “proscenium.” Again, just as three-speaker mixes spread out naturally to five, phantom images are created for Lc and Rc objects when there are no speakers present.


When digital sound first came to film exhibition in the early 1990s, three formats competed for the attention of filmmakers and theater owners: Dolby Digital, DTS and SDDS. Initially, distributors were divided into “camps,” so to hear any film in digital stereo, exhibitors had to install all three formats, something that was not practical or cost-efficient.

By about 1995, theater owners were given a “pass,” and studios started to release “quad track” 35mm prints containing all three digital formats, plus a stereo optical analog track. Within a few years, most major studio releases were done this way, although many films continued to be released only with Dolby Digital, which eventually became the most popular format, both in filmmaker and exhibitor acceptance.

As noted earlier, digital cinema was initially a proverbial Switzerland of film sound, and this situation changed with Atmos, whose proprietary immersive format requires that a separate 5.1 or 7.1 PCM mix be included on DCPs. Auro-encoded printmasters, on the other hand, can play in any theater, and this summer for The Amazing Spider-Man 2, the 5.1 PCM mix was 11.1 Auro-3D encoded. When creating a 5.1 mix, the Auro Encoder adjusts the levels of the height and top layers, and these adjustments are “undone” when played back in Auro theaters.

However, the fact still remains that Auro-3D and Atmos, as the first two immersive sound formats in widespread use, are completely incompatible in philosophy, implementation and speaker layout. The situation is worse than had been the case with 35mm digital formats, and theater owners have the most to lose by investing in one system that is unable to play the competition’s track.

In early 2013 the technology committees of the U.S. exhibition trade organization NATO (National Association of Theater Owners) and the European trade group UNIC (Union Internationale des Cinémas), along with DCI, joined forces in an effort to see to it that a “common immersive sound package” be utilized, as opposed to the 35mm quad-track solution of delivering all formats to all theaters.

Answering the call, SMPTE formed a special Working Group (TC-25CSS) to assist in this standardization effort. Auro Technologies and Dolby have both pledged to adapt to the agreed-upon open format. While Auro-3D is not currently object-based, its creative tools suite allows object-based mixes to be made, although they will not be in the same 5.1 or 7.1 PCM format as today. Also, the Barco cinema processors were designed with an upgrade path in mind and with 24 outputs, which presumably would allow the surrounds to be split into more zones.

Essentially, the goal will be for the metadata of any format’s mix to be seen by any cinema processor’s renderer, which is matched to the configuration file of a theater’s specific speaker layout. Indeed, back at the mix stage, there have been mixes that were originally made in Auro or Atmos that have had panning data modified for the other format. The difference to the public would be how much the theater’s system matches that of the mix stage.

One potential solution that has been presented is Multi-Dimensional Audio by DTS. The company, which was originally known for its double-system digital theatrical format, split in two in 2009, with DTS keeping the licensing of consumer software and codecs. (The theatrical business was spun off to a new company, Datasat, which coincidentally manufactures the AP24 processors for Barco on an OEM basis.)

The intellectual property of MDA originally began at SRS Laboratories, before it was acquired by DTS. The “MDA Cinema Proponents Group,” an informal alliance comprised of DTS and companies such as QSC, USL, Barco and Auro Technologies, has gathered to present MDA to TC-25CSS.

While MDA has not been used on any films, it has been tested in the industry, and version 1.0 of the code was released in early August, following up on specifications submitted to 25CSS months earlier. The SMPTE standardization process is famously long and drawn-out, and while the industry wants the format war to end, there’s no reason that filmmakers or equipment manufacturers need to wait for any decision to be made to use MDA in the real world. (After all, Auro Technologies and Dolby didn’t need to wait!)

Object-based like Atmos, MDA is being offered to the industry as an open format, with an SDK available to developers. As an open format, MDA would be license-free, and DTS would make available necessary software for digital audio workstations and console manufacturers. (Auro Technologies and Dolby have been providing similar support to filmmakers.)

Unlike Auro and Atmos, whose basic philosophies demand specific, scalable speaker locations and aiming (with Dolby going a step further in components and EQ), MDA is, by design, speaker agnostic. Indeed, there will presumably be much leeway in its implementation in theaters. For example, USL has come up with a cost-effective way for cinemas to upgrade by rendering the MDA mix to 13.1 channels of PCM files “offline,” distributing those files to the media blocks of the servers in theaters. Rendering would take into account configuration files for individual cinemas.

Once an open object-based format is agreed upon, the next goal for theatrical presentation could be for the immersive file to downmix to 5.1 or 7.1 in theaters, avoiding the need to have a separate channel-based PCM mix on the DCPs. However, because this would mean that the downmixes are done without filmmaker intervention and control, it remains unclear if this is even a practical goal. Items like screen-to-surround panning would make downmix errors especially apparent.


The standard worldwide theatrical license for Dolby Digital in recent years has been around $11,000, giving filmmakers 16 hours of engineering support. A small increase this fall is anticipated, and the number of included engineering hours will increase at the same time. While the need for Dolby consultants on the stage increased greatly with the introduction of Atmos, as mixers become more familiar with the technology, their constant presence has been less needed.

The Dolby CP-850 cinema processor costs $33,750 [all prices in this article are list] and includes the Dolby Commissioning Service: consulting on speaker layout and selection and doing the initial room equalization tuning. The unit can support up to 64 speaker channels, and at least one DAC3201 D-to-A device, at $3,750 each, is needed for the first 32 outputs. The cost for each theater for retrofitting varies greatly, with some theaters north of $150,000 total. As of September 1, Dolby has over 560 Atmos screens worldwide, 175 in North America, and the total number of Atmos mixes stands at 120.

Auro-3D is free to content creators, and the Barco AP24 processor used in all Auro-3D theaters costs $25,000. As of now it has been installed in 215 screens, with the U.S. as the largest base, followed by India, and over 55 films have been released in Auro-3D. Up to this point, Barco is the exclusive licensee of Auro-3D for digital cinema, and Atmos is only available in Dolby’s own units.

All of this is setting the stage for a conundrum waiting to happen: Everyone wants to promise a special experience for the public who leave their homes to go to theaters, yet all the formats are poised to expand to broadcast, Internet, home theaters, gaming, cars, mobile devices and pets. (The last is probably an exaggeration, but one never knows.)

One of the reasons that interoperable standardization is possible in theatrical films is that uncompressed PCM audio is used per the DCI specifications. Not only is it specified, there’s much space on the hard drives that contain DCPs—the picture alone can take up hundreds of gigabytes.

Outside of theatrical presentation, things are different, with limits on media size (50GB on Blu-ray discs) and broadcast bandwidth. Hand-in-hand with this is the decision as to which lossless (such as Dolby TrueHD or DTS-HD Master Audio) or lossy (Dolby Digital Plus or DTS-HD) codec to use.

Auro-3D, in formats such as Blu-ray, would be able to use its original 5.1 theatrical PCM printmaster, where Atmos and MDA mixes will presumably need to have separate home video immersive printmasters created to fit in the smaller bandwidths.

Dolby and Auro Technologies have taken their first steps to get their immersive tracks in AV processors. Auro Technologies announced their own Auro-3D Auriga unit at this year’s CES show, and they have signed up McIntosh, Datasat and Lyngdorf, among other companies, to bring Auro-3D to the home.

Dolby’s serious push for Atmos at home began in August, with demonstrations around the U.S. to the consumer audio press. Atmos for home theaters is scaled down (from a maximum 64 speakers in theaters) to 24 floor speakers and 10 overheads. They expect that most homes will have no more than four overhead speakers, and Dolby has anticipated practical mounting issues by designing “Atmos enabled” speakers that fire up at the ceiling.

DTS will clearly be making a big push for MDA’s use in all media; donating it as an open, free format for cinema exhibition usage has to pay off some time. They and Dolby have dominated the licensing market for home audio for decades now.

Object-based audio at home will allow manipulations like turning announcers off and just listening to the director calling shots in the truck. Or the pit crews at NASCAR. Or just the immersive stereo sound on the field. The possibilities are endless.

With the latest cinema sound format—now, immersive audio—leaving multiplexes and going to the home, the “natural” order has been restored. The prior “standard” format, 7.1, began in home theaters before heading to cinemas.

At the end of the day, everyone wants to make more money, and it will be up to the collective votes of filmmakers, exhibitors, theatrical moviegoers and consumers as to which will prevail. The extreme cost and paradigm shift involved on all fronts make this perhaps the most difficult transition in motion picture sound history.

An Op-Ed on Immersive Sound from the Author

There’s no way for me to write my opinions of the state of immersive sound in cinema without initially stating my prejudices.

First, I think that the 5.1 format that has been standardized since the coming of digital sound to 35mm release prints in the early 1990s is just fine.

In fact, I would almost take this opinion a step further and say that 4.1, with mono surrounds, is only a hair less good than 5.1. (Danger: One should stop operating power tools before reading the next sentence.) I believe that Apocalypse Now would be scarcely harmed with a mono surround mix. The greatness of that sound job lies in its ideas, which won’t be diminished if a Loach doesn’t precisely criss-cross the theater or if the opening ghost helicopter were to start in all surrounds, as opposed to the right surrounds as it does today.

Second, the cost: All of the current formats ask theater owners to spend as much on this upgrade as they did with their original 5.1 systems—at least $50,000, most often closer to $100,000. For perspective, the cost of installing Dolby Digital processors over 20 years ago was less than $10,000.

Third, we must always remember that the drama of the movie is on the screen, within the proscenium. It’s still a sheet on the wall, and even 3-D images do nothing to change this fact. If theatrical film sound must be extended beyond 7.1, there’s plenty of gold to be mined in creative use of the remaining six channels available in a standard DCP, using them for specific speakers (especially left-center and right-center screen channels, or left-wide and right-wide just outside the screen) or additional surround arrays.

In this manner, all sound editing and mixing could be done without any elaborate object-based encoding and rendering. You would have a format that would “immerse” the audience to a degree as to be indistinguishable from Dolby Atmos or DTS MDA for 98 percent of the running time of most films.

I think that the missing two percent are not just an acceptable compromise, they’re a desired one. What object-based mixing does uniquely (move sounds either down the length of the theater or to specific locations away from the screen) is precisely what I don’t like. In the mid-1990s, some mixers took a few movies to get this out of their system, putting silverware Foley or snare drums in the surrounds, and the thought of point-source sounds barking at me from the ceiling or walls is almost too much to bear. In fact, even the 7.1 format, dividing the surround tracks in two, bypasses my ken.

Now that I’ve said this, I admit that the toothpaste is out of the tube, and with hundreds of theaters worldwide putting in immersive sound systems, the question is not if immersive sound should be implemented, but how. First, let’s look at the two approaches: Auro-3D, with its emphasis on height layers, vs. object-based Atmos and MDA.

In the multiple Auro-3D demonstrations that I have attended, I have not heard anything that I consider to be a radical improvement over 5.1. The most impressive parts of their test material were in sections recorded with their custom mic arrays, with various height levels. As good as these sounded, I think that custom mic arrays highlighting the strengths of Atmos and MDA would be much more impressive. Besides, as film sound history has shown us (with CinemaScope’s three-track production recordings, for example), literal reality is often not desired or practical.

The comparison between Dolby’s Atmos and DTS’s MDA is like Newton’s third law of motion. To wit: Atmos’ strengths lie in its rigid specifications that help ensure a match from mixing stage to commercial theater. What they have come up with sounds really good to my ears—bass-managed surrounds, the wide screen “proscenium” speakers, left- and right-center screen speakers in most theaters—this whole gumbo rocks and their unmatched industry support helps see that this is reality.

The downside? This quality is via by far the most expensive immersive system for theaters to install, frequently north of $150,000. And the content creators have to pay a license fee to Dolby, something that doesn’t feel right in this era of digital distribution, where one can put a 5.1 or 7.1 mix on a DCP without any licensing fees or special equipment.

I’ve been friends with the folks from Dolby for decades, beginning first with Ioan Allen and Steve Katz in 1979, and following soon after that with David Gray and Doug Greenfield. My connection to film sound, first as a journalist and later behind a console, has evolved with these gentlemen and their colleagues at Dolby, and it’s a relationship that I treasure. This being said, I no longer want to have to get Dolby’s permission or gear to do a printmaster, and I’d prefer to put that money into additional sound editing or mixing time.

DTS on the other hand, is offering its MDA technology for free to filmmakers. Mixers will be able to start and finish their immersive mixes on their own, no different than the most humble 5.1 mix. Great, right?

Well, not so fast, Jackson. The “equal and opposite reaction” here is that, by having no “enforced” standards of how MDA theaters are set up, you can be assured that many will be installing two speakers overhead and calling it a day. And also calling it “just like Atmos.” (I know this possibility pains my friends from DTS, and they would like to see SMPTE address speaker standards, too.) Of course, since Dolby has pledged to support whatever open format the industry agrees upon, you could get lucky and have your MDA (if that is indeed the open format of the future) mix play in a theater originally equipped for Atmos. Or in a well-equipped MDA-only theater. But, “lucky” is the operative word, and if I’m at any level presenting an experience as “special” to the public, I want to be sure that they’re hearing the film as I mixed it.

Whether a company is charging content creators (like Dolby) or not (like Auro-3D and DTS), they all see the real prize in convincing the home-theater-owning public of the need to keep installing extra speakers and purchasing new AV processors. While there are some unique potential benefits in object-based audio at home (I would love to watch an NFL game listening to the director and technical director calling out shots), I am not looking forward to reading reviews in the audiophile press comparing renderers.

Which way will I go in the future—stay 5.1 forever or “go immersive”? And if the latter, which format? I can’t say right now, but I know it’s going to be a real learning experience finding out.