Is 5.1 playback merely a rest stop on the road to true immersive sound? The film community is sure betting on it, and so is Sennheiser. For the past five years, a team led by Grammy Award-winning engineer Gregor Zielinsky (Best Engineered Recording, Classical, 1991, Leonard Bernstein’s Candide) has been working on a system that offers an additional four channels of material, designed to capture the audio that floats upward during a performance.
This past September, Sennheiser sponsored a concert at London’s Central Hall Westminster featuring the Junge Deutsche Philharmonie orchestra led by Jonathan Stockhammer. Grammy Award-winner Imogen Heap made an appearance at the concert, which was given to an invited audience. Rehearsals were recorded under Zielinsky’s supervision and played for a group of journalists prior to the performance.
The results were startling. Listening to playback off a Pro Tools system set up in a temporary control room next to the performance space (with a DiGiCo SD7 desk, nine Neumann KH 120 monitors and a pair of KH 810 subwoofers), the exceptionally well-played versions of Mendelssohn’s “A Midsummer Night’s Dream” overture and pieces by John Adams, György Ligeti and Philip Glass seemed to dance around the room. Where 5.1 still confines the listener to a sweet spot, the move to 9.1 eliminates that limitation. The addition of “upper floors” allows the music to breathe more naturally.
Following the performance, Mix had a chance to chat with Zielinsky.
If all you’re doing is adding four extra microphones to capture the information in the upper atmosphere, why has it been such a difficult process to develop?
It has been extremely difficult because we wanted to capture the room and the orchestra in its entirety. This meant that we first had to acquire some experience of how this is best done, not only from a technical point of view but also from a psychoacoustic perspective. For example, when setting up the microphones for one of our very first 3D immersive audio recordings, we put up the upper front microphones and suddenly had flutes, oboes and clarinets coming from above, because these instruments project their sound upward at a 90-degree angle from the instrument. There are many technical aspects involved, but even more psychoacoustic phenomena, and we had to experiment and learn where to place which microphones to best capture the orchestra.
At a certain stage we were faced with the task of simplifying the microphone setup. In its largest “extension,” the 3D microphone setup—the so-called Zielinsky Cube—consists of nine MKH 800 Twin microphones. They deliver 18 signals, which is quite a few channels and quite an investment!
Can you share any of the thinking that has gone into the development of this technology?
The Zielinsky Cube is based on the A-B main microphone technique. Just as a spaced A-B pair for stereo recordings corresponds to the loudspeaker positions during reproduction, the 3D immersive audio system used in the recording reflects the loudspeaker setup for 9.1 reproduction. Experience has shown that this equivalent microphone setup gives an extraordinarily good 3D signal.
What is important is that in the cube setup, we not only have a horizontal A-B but also a vertical A-B that reproduces all vertical nuances as such. When we began recording in 3D immersive audio and demoing this new technique, we were often faced with critical remarks such as, “But there is sound coming from above.” The next time you’re sitting in an auditorium listening to an opera, close your eyes for a while: You will be amazed at the amount of sound that is actually coming from above. Our eyes define in a certain way where the sound is coming from. This leads to a sort of recording dilemma. Do I say, “Yes, the bassoon does actually come from above”? Or do I say, “That may well be, but when I’m sitting in the venue I perceive the sound as coming from the front”? You have to find a sort of compromise there.
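Zielinsky’s point that the cube contains both a horizontal A-B and a vertical A-B can be sketched in a few lines of code. The channel names follow a common 9.1 convention and the coordinates are purely illustrative assumptions, not the actual published geometry of the Zielinsky Cube:

```python
# Illustrative sketch of the recording/reproduction equivalence described
# above: nine microphone positions (here stood in for by 9.1 channel names)
# arranged on two layers, so that horizontally spaced pairs form the
# classic A-B and vertically stacked pairs form a vertical A-B.
# Coordinates (x: -1 left / +1 right, y: +1 front / -1 rear,
# z: 0 ear level / 1 upper layer) are invented for illustration only.

CUBE = {
    "FL":  (-1,  1, 0), "FC": (0, 1, 0), "FR": (1,  1, 0),
    "SL":  (-1, -1, 0),                  "SR": (1, -1, 0),
    "TFL": (-1,  1, 1),                  "TFR": (1,  1, 1),
    "TRL": (-1, -1, 1),                  "TRR": (1, -1, 1),
}

def vertical_pairs(layout):
    """Return (lower, upper) channel pairs that share the same horizontal
    position, i.e. the vertical A-B pairs of the cube."""
    lower = {(x, y): ch for ch, (x, y, z) in layout.items() if z == 0}
    return [(lower[(x, y)], ch)
            for ch, (x, y, z) in layout.items()
            if z == 1 and (x, y) in lower]
```

Calling `vertical_pairs(CUBE)` yields the four stacked pairs (front left over front left, and so on); the center channel has no upper partner in this assumed layout.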
Peter Brandt of Remote Recording Network at the DiGiCo SD7 in the makeshift control room.
You said that a purist mix of a 9.1 recording, transferring the audio exactly as it was recorded in the cube, does not yield the most satisfying results. Can you clarify this remark?
We are talking of musical phenomenology in this respect, as developed by the conductor Sergiu Celibidache. Phenomenology focuses on the question: “What arrives in the brain? What does the brain receive and perceive?” Celibidache once said that the musical notation does not matter, nor does what the conductor does, or what the violinist does. All that matters is what reaches the head, what our head perceives. And this is exactly how I consider a recording technique and recording equipment. What is important is that when you’re sitting in front of your loudspeakers, we simulate a situation for the brain such that you feel as if you are sitting in a concert hall, including the optical information; that is, the orchestra sitting in front of you. We want to produce exactly the perception in the brain that the audience has in a concert hall, seeing the orchestra in front of them. This is why you need to change a few things in the mix.
Additionally, there are some dynamic aspects, or sometimes you would like to add a spectacular effect, for example making the drums a bit louder and crisper in some places than they actually sound in the hall. That’s what this is about. On the one hand, we have the pure reception that we want to re-create, and then there are some aesthetic aspects where we would like to put a certain emphasis. Of course, this does not mean that you will put the orchestra upside down in your mix, take the back to the front and the like. Putting the gran cassa in front of the conductor would certainly be overdoing it…
You said that you’re working on an algorithm that will hopefully allow some of the material recorded in 9.1 to translate into a stereo mix.
This is actually not about an algorithm or a process that translates 9.1 loudspeakers to stereo loudspeakers. This is what every tonmeister has to do on his or her own. Either you do a completely separate mix for stereo, or you do a downmix from 9.1 to stereo and have to see how you can translate to stereo the many things you were able to do in 3D.
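The downmix Zielinsky mentions can be sketched as a simple static fold-down. The channel names and gain coefficients below are illustrative assumptions, not Zielinsky’s or Sennheiser’s actual practice; as he says, a tonmeister would tune, or entirely replace, such a matrix by ear:

```python
# Naive static downmix from an assumed 9.1 channel bed to stereo.
# All gains are illustrative; -3 dB on shared channels is a common
# pairwise-downmix attenuation, nothing more.

import math

# Assumed bed: front L/C/R, surrounds, four height channels, LFE.
CHANNELS = ["FL", "FC", "FR", "SL", "SR", "TFL", "TFR", "TRL", "TRR", "LFE"]

ATT = 1 / math.sqrt(2)  # -3 dB

# Gain of each source channel into the stereo (left, right) buses.
DOWNMIX = {
    "FL":  (1.0, 0.0), "FC": (ATT, ATT), "FR": (0.0, 1.0),
    "SL":  (ATT, 0.0), "SR": (0.0, ATT),
    "TFL": (0.5, 0.0), "TFR": (0.0, 0.5),  # heights folded down quietly
    "TRL": (0.5, 0.0), "TRR": (0.0, 0.5),
    "LFE": (ATT, ATT),
}

def downmix_sample(frame):
    """Fold one 10-channel frame (dict: channel -> sample) to (left, right)."""
    left = sum(frame[ch] * DOWNMIX[ch][0] for ch in CHANNELS)
    right = sum(frame[ch] * DOWNMIX[ch][1] for ch in CHANNELS)
    return left, right
```

Even this toy version makes the trade-off visible: the four height channels collapse onto the stereo pair, so whatever vertical staging the 9.1 mix achieved simply disappears, which is why a separate stereo mix is often the better route.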
What I referred to was virtualization, a technique that Sennheiser developed for virtually representing signals in a three-dimensional space over headphones. I was their listening partner, so to speak.
If there is still a sweet spot in 5.1 playback, how does adding four extra fields in the “upper atmosphere” remove it?
The sweet spot in 5.1 reproduction results from the fact that the signals often fall apart, dividing into front signals and rear signals; I call this the front-rear effect. There is also a hole at the sides, which is due to the phantom sound source in the middle. 9.1 is different. Via the cube, the signal positions itself. Every signal is reproduced by at least three, four or even more loudspeakers, and thus you can clearly hear where the signal is coming from. For example, if a signal comes from the bottom right, it will be played on the front right and rear right, and its “counterpart signal” (a mix of direct sound and reflection) will come from the top left rear and top left front. Also, the remainder of the speakers will play this sound in some way or other. So both ears and the brain get involved in a three-dimensional space.
A given signal will be reproduced by all speakers; not the very same signal, of course, but all room reflections. All loudspeakers will play the reflections that this signal has in the room. And here we’re back at the original idea of 3D, which is to reproduce the concert hall as it is. We’re exactly reproducing the room, and this is why we do not have a sweet spot. Of course, the signal changes if you move to the front; it will become more direct, just as it would if you moved farther forward in a real-life concert hall. If I move farther to the front, the orchestra will become more direct; if I move toward the back, it will become more indirect. And when I choose to sit at the very rear underneath the balcony, it will sound crappy! So it would be good if you chose a seat in the first third of the hall or in the middle, as this would also be the best seat in a concert hall. But there is no sweet-spot effect, because the brain gets more information on where a signal source is located, quite unlike with stereo or 5.1 recordings.