Surround recording is an area many are exploring nowadays, and each of those who do so has their own applications and reasons for doing it, but it is likely that Riccardo Mazza is alone in taking on this topic to justify elephant rides in Sri Lanka and river trips in Thailand. The wife of this high-spirited, 37-year-old, Turin, Italy-based musician and engineer likes to travel, and she’s not so keen on anything as nearby as Venice or Florence.
As Mazza watched his travel bills mount, he realized he had to find a way to make this wanderlust earn money instead of hemorrhaging it. The answer lay in recording the sounds of the far-flung destinations. Mazza had already spent nearly a year devising software to let him experiment with surround motion techniques in the music he was writing. Now, recording in surround seemed the answer that would justify his perambulations and their costs, but there was a catch: Portable multitrack digital recorders were not as easy to come by in 1997 when he started the effort as was his trusty Sony portable DAT deck.
The challenge was plain: Find a way to make immersive surround recordings on two channels that could then be decoded in post-production to recapture the surround image.
The solution Mazza came up with was a family of multi-mic recording techniques and a phase-based matrix that encoded the microphone outputs into a 2-channel signal capable of being decoded by a standard Dolby Pro Logic unit. The development ignited in him a burst of technological and musical activity that produced a unique set of tools and techniques, as well as the world’s first Dolby Surround-encoded sound effects library.
Mazza began his musical career as a drummer, but that didn’t last past his teens, because, as Mazza explains, “I was crazy, rolling all the time and giving fire to the cymbals. Nobody wanted to play with me at all!”
In his 20s, Mazza began studying technology. Although he continued composing and playing music, including an abortive madcap attempt at rock ‘n’ roll stardom in Los Angeles, he also began an intensive study in sound and audio technology. While immersing himself academically, Mazza kept close to “the street,” engineering and sound designing for major Italian recording artists, and providing technical support for companies like KS Waves, Opcode Systems and Creamware. In 1995, Sony Italy released a solo album of Mazza’s “progressive pop.”
By 1996, Mazza began teaching at the Scuola di Alto Perfezionamento di Saluzzo, one of Italy’s most highly regarded schools of music, an activity he continues today. (He’s also involved with conservatories in Bologna and Milan.) Then the surround bug bit, and within a year, Mazza had created his first surround software tool, 3D Total Surround (3DTS), for use in a performance at Milan’s Magazzini Generali of a suite he composed and conducted. “It was a major step for me,” says Mazza, “experimenting with dynamic motion and the reactions to various curves of different source and speaker positions.”
3DTS is a stand-alone application that controls a DAW or digital mixer with MIDI control change commands to manipulate up to eight signals playing over up to eight virtual speakers placed within a defined space. The behavior of each speaker is specified and can be varied in real time. A built-in sequencer can record and play mouse gestures, and OMS compatibility allows MIDI controllers to be used and recorded.
In 3DTS, the user first defines the dimensions of the room in which the sound will be played (a PICT of an actual space can be pasted into the display for convenience and realism) and then the placement of speakers within the space. Next, the user creates dynamic response characteristics for each speaker by specifying a maximum output level (which roughly correlates to its throw, or distance coverage) and a table containing a curve onto which signal volume for the speaker will be mapped.
Both of these parameters are very flexible: The maximum volume can be varied in real time to effectively change the amount of leakage or separation between speakers, while the curves in the tables can be drawn in any shape. With the environment and its response created, the user then makes and records movement gestures for the sound.
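To make the idea concrete, here is a minimal sketch of how a drawn curve and a maximum level might turn a source position into per-speaker levels sent as MIDI control change values. It is an illustration only, not 3DTS itself: the assumption that distance indexes the curve table, the nearest-entry lookup, and every name (`speaker_gain`, `to_midi_cc`, the room and speaker coordinates) are hypothetical.

```python
import math

def speaker_gain(source, speaker, curve, max_level, room_diagonal):
    """Gain for one speaker, read from a user-drawn curve table.

    Assumption: the table is indexed by source-to-speaker distance,
    normalized against the room diagonal.
    """
    dist = math.hypot(source[0] - speaker[0], source[1] - speaker[1])
    pos = min(dist / room_diagonal, 1.0)       # 0 = on the speaker, 1 = far corner
    index = round(pos * (len(curve) - 1))      # nearest table entry
    return max_level * curve[index]

def to_midi_cc(gain):
    """Quantize a 0..1 gain to a 7-bit MIDI control change value."""
    return max(0, min(127, round(gain * 127)))

room = (10.0, 8.0)                             # hypothetical room, in meters
diagonal = math.hypot(*room)
speakers = {"FL": (0.0, 8.0), "FR": (10.0, 8.0),
            "SL": (0.0, 0.0), "SR": (10.0, 0.0)}
curve = [1.0, 0.8, 0.5, 0.2, 0.0]              # loud when close, silent when far

source = (0.0, 8.0)                            # source sitting on the front-left speaker
cc_values = {name: to_midi_cc(speaker_gain(source, pos, curve, 1.0, diagonal))
             for name, pos in speakers.items()}
```

Lowering `max_level` for a speaker shrinks its reach in this sketch, which is one way to picture the leakage-versus-separation control described above.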
3DTS typifies a very important aspect of Mazza’s work in surround: His viewpoint is musical, and so his primary concern in the tools he makes and their usage is creative rather than scientific. His mic techniques often do not produce the most accurate representation of an acoustic event, 3DTS is not designed to be a true emulation of speaker response in a room, and many of the sounds on the CD library are not mono-compatible. Instead, his aim is to achieve the greatest dramatic impact that surround sound promises. (More information on 3DTS and the rest of Mazza’s innovations can be found at www.renaissancesfx.com. A 4-in/4-out version of 3DTS can be downloaded from www.riccardomazza.com.)
MIKING TECHNIQUES AND X-MAT
With 3DTS up and running, Mazza started developing his mic techniques, which he called “X-Technologies,” as a research project that further exploited some of the curves and ideas he explored with 3DTS, and combined them with the surround effects produced by out-of-phase material. His experiments started with the distance formula:
$$d = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}$$
Where $d$ = distance; $X_1, Y_1$ = listener’s position coordinates; and $X_2, Y_2$ = coordinates of a sound source moving through space.
The result of this was mapped onto a mathematical matrix that assumed 360° pickup (by any number of miking techniques) and combined a time-variant response curve with a linear mapping of the space definition to yield a value representing angular distance from a directional microphone.
Using this system, a source moving past a mic array will produce an out-of-phase signal over a certain range of coverage angles and distance. When the mic outputs are properly summed, this produces a 2-channel, Dolby Surround-compatible signal without any of the lowpass filtering and other limitations of Dolby Surround encoding.
Mazza leveraged Dolby Surround steering by carefully combining microphone pickup patterns and placement, level and phase control. His research resulted in four X-techniques intended to allow acoustic events to be recorded with maximum impact.
“X-Techniques are a derivation of Mid-Side technology,” Mazza points out. “You can use many different mic techniques; the key is the principle of sum and difference combination with a figure-8 or a cardioid microphone.
“In order to use this technology, I need a double-encoding of phase. I have phase control on the first section of the channel and another phase switch on the destination, so summing and differencing both phases I get what I call ‘double-sided’ coding.”
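The sum-and-difference principle Mazza refers to can be sketched in a few lines. This shows only conventional Mid-Side encoding and decoding, not the "double-sided" coding itself, whose details the article does not spell out; the function names and the halving in the encoder are assumptions made for the illustration.

```python
def ms_encode(left, right):
    """Turn left/right sample lists into mid (sum) and side (difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left/right: left is mid plus side, right is mid minus side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left_in = [0.5, 0.25, -0.5]
right_in = [0.25, 0.25, 0.5]
mid, side = ms_encode(left_in, right_in)
left_out, right_out = ms_decode(mid, side)
```

Flipping the polarity of the side signal before decoding swaps left and right, which is why phase switches at different points in a chain change where a source lands.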
Putting these ideas into action required building a means of performing the careful, phase-correlated level and phase manipulations necessary to get the desired effect. Mazza programmed the necessary encoding matrix, which he calls “X-Mat,” as a plug-in for Creamware’s powerful Scope environment. (Scope consists of a PCI plug-in card containing 15 Analog Devices SHARC DSP chips and a software library of audio processing modules from which you can build virtually any custom circuit.)
X-Mat has five channels for microphone inputs, each with a level control and assignment to one or more of four buses, which are then summed down to a 2-channel signal. Phase reversal is possible at numerous points in the process: independently on each channel, on pairs of channels (1/2 and 3/4) and for each of the buses.
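As a rough picture of that signal flow, the sketch below routes channels through gain, polarity and bus assignment, then folds four buses into two outputs. It is not Mazza's Scope plug-in: the `Channel` class, the `encode` function and, in particular, the assumption that odd buses feed Lt and even buses feed Rt are all hypothetical, and the pair-level phase switches are left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class Channel:
    gain: float = 1.0
    invert: bool = False
    buses: tuple = ()          # bus numbers, 1 through 4

def encode(samples, channels, bus_invert=(False, False, False, False)):
    """Mix one sample per channel down to an (Lt, Rt) pair."""
    buses = [0.0, 0.0, 0.0, 0.0]
    for x, ch in zip(samples, channels):
        y = x * ch.gain * (-1.0 if ch.invert else 1.0)
        for b in ch.buses:
            buses[b - 1] += y
    # Per-bus polarity reversal happens after the channels are summed.
    buses = [-v if inv else v for v, inv in zip(buses, bus_invert)]
    lt = buses[0] + buses[2]   # assumption: odd buses feed Lt
    rt = buses[1] + buses[3]   # assumption: even buses feed Rt
    return lt, rt

# Channel A goes in-phase to both outputs; channel B goes out-of-phase
# because bus 3 is inverted.
channels = [Channel(buses=(1, 2)), Channel(buses=(3, 4))]
lt, rt = encode([0.5, 0.25], channels, bus_invert=(False, False, True, False))
```

In this example, the sum of the two outputs carries only channel A and the difference carries only channel B, which is the property a sum-and-difference surround decoder relies on.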
Mazza’s “I” format is a good example of how his X-Techniques work. The I format puts an omnidirectional microphone, a shotgun and bidirectional (figure-8) mic in a near-coincident M/S-like array, where the figure-8 is at a 90° angle to the shotgun. In the X-Mat matrix, the omni and the figure-8 are assigned to buses 1 and 2, and the shotgun to buses 3 and 4. The buses are combined into Lt-Rt, but the phase of bus 3 is flipped before combining. The result is:
Lt = O+L-R+S=L+R+C+S+L-R-S=2L+C-S
Rt = O-L+R+S=L+R+C+S-L+R+S=2R+C+S
Which, once Dolby Surround-decoded, yields:
L=2L, R=2R, C=2C, S=2S
This configuration excels at capturing front-to-back motion, but yields a weak stereo image and suffers from proximity effect. Consequently, this format is optimal for recording large events, like explosions or gunshots, at a distance.
The X-O format, on the other hand, is intended to produce an accurate 360° image. It requires five microphones: four cardioids and an omni. The cardioids are placed with one facing each direction: One pair is placed at right angles to each other and captures the front L/R, while the other pair faces the rear and captures SL/SR. The two pairs are placed coincident to each other, while the omni is placed in the middle of the whole affair. In the X-Mat matrix, the cardioids, appearing on channels 1 through 4, are assigned to buses 1 and 2 (which then feed Lt/Rt outputs) in odd/even pairs (FL/FR and SL/SR to 1/2), but the phase on SL is flipped. The omni is assigned to both buses 1 and 2. Defining the pickup of the omni as (L+C+R+S), we get:
Lt = L-Ls+Omni = L-Ls+(L+R+C+S) = 2L+R+2C+(S-Ls)
Rt = R+Rs+Omni = R+Rs+(L+R+C+S) = 2R+L+2C+(S+Rs)
To make this work, the Ls signal must be normalized to compensate for the summing of Rs with the S component of the omni.
The Y-8 format uses two cardioid mics and a figure-8 to capture a 270° soundfield, best for capturing environments rather than motion. The Y-Hyper format substitutes a hypercardioid for the figure-8 and yields a 360° soundfield.
OUT IN THE FIELD
By 1998, Mazza felt it was time to take some of his ideas out in the field and put them to the test. To be practical for field recording in exotic locales, a number of compromises were necessary. The biggest was that it was obviously not possible to take a Scope system with him, eliminating the use of his X-Mat software. In its place, Mazza constructed a very simple, passive electronic circuit that accepted four inputs, summing the transformer-coupled left and right channels with an out-of-phase surround and the center channel (which was attenuated by 3 dB through empirical determination). Thus, the left-channel output was:
L+S+(C-3 dB), while the right was R-S+(C-3 dB)
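Those two sums are easy to check numerically. The sketch below encodes four inputs as described and then applies a plain sum-and-difference decode to show that center and surround separate cleanly. The decode step is a simplified passive stand-in, not the steering logic of an actual Pro Logic unit, and the function names are hypothetical.

```python
import math

CENTER_GAIN = 10.0 ** (-3.0 / 20.0)    # -3 dB as a linear factor, about 0.708

def encode(left, right, center, surround):
    """Two-channel encode: surround in opposite polarity, center at -3 dB."""
    lt = left + surround + CENTER_GAIN * center
    rt = right - surround + CENTER_GAIN * center
    return lt, rt

def passive_decode(lt, rt):
    """Simplified decode: center from the sum, surround from the difference."""
    return {"L": lt, "R": rt, "C": lt + rt, "S": lt - rt}

surround_only = passive_decode(*encode(0.0, 0.0, 0.0, 1.0))
center_only = passive_decode(*encode(0.0, 0.0, 1.0, 0.0))
```

A surround-only input cancels in the center feed, and a center-only input cancels in the surround feed, which is what lets a two-track DAT carry a four-channel image.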
Although today Mazza’s circuit is nicely mounted in a project box, the original passive matrix was nothing more than loose circuitry: Just the thing to concern airport security in far corners of the globe. The hardware box’s stripped-down nature limited the X-Techniques he could use with it, resulting in his relying most often on his Y-Hyper format.
For microphones, Mazza used various combinations of the AKG 300 Blue Line Series (for their robustness), a Shure VP88 MS mic, Earthworks SR77s, Schoeps boundary-layer (on less distant and dangerous outings), and Oktava MC 012s (mostly on gunshots because of their hardware level pad). ATI Nanoamp mic preamps fed a Sony SBM (SuperBitMapping) converter, and the output was recorded on a Sony DAT deck. Using a homemade carrying case and a mic mount, Mazza struck a deal with his wife (a photographer) that for every picture she shot, he would record a sound.
Between trips, Mazza, who considers himself to be primarily a composer, continued building tools for creative surround production. Eventually, these were brought to bear on the field recordings, as well as musical works and commercial jobs.
Mazza programmed two more applications, the Matrix and the LFE synthesizer, as plug-ins for the Scope environment. These surround tools were integrated into his production studio by connecting the Scope system to a Pro Tools Mix Plus system through an ADAT bridge.
“The Matrix creates a 5.1 environment from a stereo source, or even a mono source,” describes Mazza. “The Matrix was originally designed for me as a keyboard player. I normally use two keyboards to make one sound, so I created the Matrix because I needed it to do surround keyboards. Then I tried using the Matrix with stereo sounds, and I found that it worked very well, so I started using it for surround in normal production.”
The Matrix is a multichannel morphing filter intended to produce interesting motion effects. It accepts four inputs configured as two stereo pairs (originally often a stereo synthesizer and a sampler) and produces 5.1 output.
There are six sections to the Matrix. The first is the Input section, which provides gain and phase control, metering and bypass (for in/out comparisons) for the two input pairs.
Next, the signals are fed to the Filter section. Each stereo signal is provided a lowpass filter, a highpass filter and a comb filter, all of which can be configured in a variety of ways. For instance, the LPF could be applied to the left channel, HPF to the right channel and the comb section (which is stereo) fed in parallel, with the outputs of the whole mess being combined at the end.
The LFO section contains four sine LFOs, which cause the motion in the signals. Each LFO is assignable to one or more parameters of the Filter section. The Vector section has two virtual “mod wheels” to which MIDI controllers are mapped in order to crossfade level and pan of the two stereo channels. The level wheel simply crossfades the two stereo channels, while the pan wheel exchanges pan positions between stereo channels (stereo channel 1 L/R moves to the pan positions previously occupied by channel 2 L/R, and vice versa) or within channels (channel 1 L moves to the location of channel 1 R, and vice versa, the same thing happening with channel 2 L and R). Because the wheels are controlled by MIDI controllers, the moves can be recorded and edited by a sequencer.
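The level wheel's behavior can be pictured with a short sketch in which a wheel position between 0 and 1 blends two stereo pairs, and a sine LFO can stand in for the MIDI controller. The linear blend law and the names `crossfade` and `lfo_wheel` are assumptions for illustration; the article does not say which crossfade law the Matrix uses.

```python
import math

def lfo_wheel(t, rate_hz):
    """Sine LFO scaled to the 0..1 range of a virtual wheel."""
    return 0.5 + 0.5 * math.sin(2.0 * math.pi * rate_hz * t)

def crossfade(pair_a, pair_b, wheel):
    """Linear blend of two stereo pairs: wheel 0 is all A, wheel 1 is all B."""
    return tuple((1.0 - wheel) * a + wheel * b for a, b in zip(pair_a, pair_b))

synth = (0.8, 0.6)       # hypothetical stereo pair 1 (left, right)
sampler = (0.2, 0.4)     # hypothetical stereo pair 2 (left, right)

halfway = crossfade(synth, sampler, lfo_wheel(0.0, 0.5))
all_sampler = crossfade(synth, sampler, 1.0)
```

Because the wheel is just a number over time, recording it in a sequencer or generating it from an LFO are interchangeable, as the text describes.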
Another LFO is provided as an alternative to using MIDI to control the level wheel. This LFO is more sophisticated than those in the LFO section, with definable waveform, fade in and a Sync mode.
More MIDI control is available in the MIDI Modulation section, which lets key position, velocity or aftertouch be mapped to filter parameters. Response curves for the controllers can be drawn and edited.
Last, but far from least, is the Assignment and Master section. In 5.1 mode, each of the four inputs you’ve by this point completely warped can be assigned to one or more of the five main outputs. The ability to assign an input to more than one output yields the ability to create complex interactions that will cause dramatic movement in the surround space. This mode also produces excellent Dolby Surround-compatible output. The Master section also has a Stereo mode.
The Master section contains two LFE processors, which use Tartini’s third sound principle to produce very low-frequency signals. (Tartini was an Italian violinist who documented different tones in his mid-18th century harmony manual, one of the first discussions of what we now call the “missing fundamental effect.”)
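Tartini's third sound is a difference tone: two pitches at f1 and f2 give rise to a third at f2 - f1. The sketch below shows the arithmetic with a simple squaring nonlinearity, which is a stand-in chosen for this illustration; the article does not describe how the Matrix's LFE processors actually derive their output. The tone frequencies and function names are likewise hypothetical.

```python
import math

SAMPLE_RATE = 8000
F1, F2 = 200.0, 240.0                  # two tones 40 Hz apart

def amplitude_at(signal, freq):
    """Amplitude of one frequency component, found by correlation."""
    n = len(signal)
    re = sum(x * math.cos(2.0 * math.pi * freq * i / SAMPLE_RATE)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE)
             for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

# One second of the two tones summed.
tones = [math.sin(2.0 * math.pi * F1 * i / SAMPLE_RATE) +
         math.sin(2.0 * math.pi * F2 * i / SAMPLE_RATE)
         for i in range(SAMPLE_RATE)]
squared = [x * x for x in tones]       # stand-in nonlinearity

before = amplitude_at(tones, F2 - F1)  # no 40 Hz energy in the plain sum
after = amplitude_at(squared, F2 - F1) # a 40 Hz component appears
```

The plain sum contains nothing at 40 Hz, while the squared signal contains a full-strength component there, since the product of the two sines includes a term at the difference frequency.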
There are three versions of the Matrix: Matrix F (Full), M (Medium) and S (Small). The difference is simply the addition of a final delay section. Because delay can take up a lot of memory, the S version has no delay, while the F version provides the ability for each of L, R, SL and SR to be delayed and sent to any of the other three channels.
Taken as a whole, Mazza’s Matrix provides a nearly overwhelming wealth of options for manipulation of two stereo signals to create a rich, moving surround soundfield.
BUILDING THE LIBRARY
Three years ago, Mazza met Turin businessman Pietro Giola, a former commercial composer now running a musical rights licensing concern, and the two decided to create and market a surround sound effects library with Mazza’s tools and field recordings. A multi-CD set was planned, and Renaissance Sound Technologies was formed to produce it.
Mazza had always felt that the LFE channel played a crucial role in providing impact for surround sound, so it was decided that the CD library should contain an entire disc of LFE elements. To facilitate production of this disc, Mazza created the LFE Synthesizer (LFE-S), again using the Creamware Scope system as his development environment.
The LFE-S is a toolkit that exploits several psychoacoustic effects and synthesis techniques to generate very low-frequency materials. Five oscillators (one of which can be switched to accept an external input signal for processing) and two noise sources can be combined and assigned as modulators in an FM synthesis circuit with a sine wave carrier. The modulating oscillators can be defined by a number of parameters and further altered with waveshaping sync circuits.
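The core of such a circuit, several modulators acting on a sine carrier, can be sketched as follows. This is the textbook form of FM synthesis (strictly, modulation of the carrier's phase) and not the LFE-S itself; the carrier and modulator frequencies, the modulation indexes and the function names are hypothetical values picked for the example.

```python
import math

def fm_sample(t, carrier_hz, modulators):
    """One output sample; modulators is a list of (freq_hz, index) pairs."""
    phase = 2.0 * math.pi * carrier_hz * t
    # Each modulator adds its own sine-shaped deviation to the carrier phase.
    phase += sum(index * math.sin(2.0 * math.pi * freq * t)
                 for freq, index in modulators)
    return math.sin(phase)

def render(duration_s, sample_rate, carrier_hz, modulators):
    """Render a list of samples at the given rate."""
    count = round(duration_s * sample_rate)
    return [fm_sample(i / sample_rate, carrier_hz, modulators)
            for i in range(count)]

plain = render(0.1, 1000, 30.0, [])                          # bare 30 Hz sine
rumble = render(0.1, 1000, 30.0, [(7.0, 2.0), (11.0, 0.5)])  # two modulators
```

With no modulators the output is the bare carrier; raising a modulator's index spreads energy into sidebands around the carrier, which is where the added complexity of the low-frequency material comes from.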
From this point, the output of the FM synth and the modulating oscillators themselves are split into two signal paths. One path feeds a filtering section with 24dB/octave highpass and lowpass filters, the output of which feeds a VCA. The other path goes to an LFE section, which is an adaptation of the LFE algorithm used in the Matrix, but tuned to lower frequencies.
A sophisticated modulation section with two full-featured LFOs, two envelopes and comprehensive MIDI mapping capabilities (the LFE-S is fully MIDI-controllable) provides more resources for complexity in the circuit.
Finally, the whole shebang: All of the various synth outputs (summed oscillators, waveshaping, FM, subtractive filtering) are combined with the outputs from the LFE section in the final output section.
With production of the CD library underway, RST began producing surround sound scores for conventions of Italian industrial giants ranging from Martini Bacardi and Fila to Alfa Romeo. Surround sound in a live, commercial event was a revelation to many clients, as Giola recalls: “Renaissance Sound Technologies started to enter the market for original music production in surround for B2B [business-to-business] purposes, [which was] something very new for the market of special events, conventions, multimedia events, exhibitions, etc.
“Marketing directors, creative managers — they know about surround production just on the cinema side; they usually don’t think to use surround sound in special events or other similar applications. When we introduce them to what surround sound is, they don’t believe it is possible that audio can have so much impact on an audience [as we tell them it can have]. As soon as they sit in Riccardo’s studio and listen to his surround-spatialized effects, all of them immediately change their point of view on audio and music and understand the real added value of surround sound production. For us, it is fundamental that they understand that surround audio is not just a better way of listening, but a new way of listening. It’s a way to be ‘in the sound,’ and can be the key for developing new creative communication ideas and products. In this way, you can give your audience new ‘sensations.’ Every time we produce original surround sound, production clients feel there is something they didn’t feel before.
“To illustrate the point, in the fall of 2000, we produced original surround sound installations for an international art exhibition on the Etruscan people at the Palazzo Grassi in Venice. In the ‘war room,’ where old Etruscan arms were exhibited, we re-created a battle environment in surround sound to convey to visitors the sensation of being in the middle of an Etruscan battle.
“On opening day, the exhibition’s official press release started with: ‘Impactful battle sounds come from the war room of the Etruscan exhibition at Palazzo Grassi in Venice opening this Sunday. [You’ll hear] really involving and fascinating sound effects of roars, screams of wounded soldiers, clanging of swords, horses running and neighing: In this room, it’s impossible to forget that Etruscan people were a warrior people as well.’”
RENAISSANCE SFX CD LIBRARY
By the end of 1999, the Renaissance SFX library was released and quickly garnered worldwide interest and distribution (with North American distribution through Sound Ideas). The Dolby Surround-encoded library, when originally released, contained seven discs (it now has 11), with a combination of natural field-recorded ambiences and effects and highly manipulated or synthetically generated ambiences and effects. The set is notable for its variety and out-of-the-ordinariness, with ambiences ranging from a monastery in the mountains of Thailand to a mall in Sao Paulo, a train in Prague and, of course, a pizza restaurant on the Italian Riviera. Alpine streams and horse pass-bys recorded in the Pampas of Uruguay illustrate the effectiveness of Mazza’s miking techniques, while science fiction sounds, ghost and goblin voices, and crazy and evil laughs demonstrate the usefulness of the post-production tools in studio work. An entire disc is also dedicated to surround musical elements: loops, pads and hits.
To differentiate effects intended to have a natural sound from those that are deliberately shaped, the sounds are identified in the library’s catalog as belonging to one of three classes, which Mazza details: “We have ‘natural’ sounds, which are from the X-Techniques, that we used on city tracks and so forth. Then we have what we call ‘motion’ sounds, which are sounds that do not have an exact spot motion but they have a character or personality, like a ghost flying around you. Then we have sounds with exact motions, named ‘panned,’ which have a defined path, like ‘gunshot LCRS,’ which means from left through center to right and into the surrounds.”
Mazza labored extensively in the authoring of the discs to ensure that the sounds not only worked individually but also in combination. “Every sound has been treated independently so that any sound can be layered with any other sound without losing phase-based images,” explains Mazza. “You can do many layers of the sounds in the library and still get the correct motion. If I have two versions of a sound, I treat one such that it will have a little phase difference from the other, so that you can always layer the sounds and build your own surround environment.” Extensive testing of layered combinations of sounds in the library was conducted to ensure that phase cancellation would be unlikely.
But the phase-based nature of Dolby Surround did force some difficult decisions. “We choose not to be completely mono-compatible,” notes Mazza. “Natural sounds are mono-compatible, but the panned sounds are not mono-compatible. We do that so that you can get the maximum motion.”
The choice of the Dolby Surround format, as opposed to 5.1, was largely a practical one. For starters, releasing 5.1 audio discs could only be done on DVD, as opposed to standard CD-DA, which, in 1999 especially, was the more established format. Additionally, the library as released can be used in stereo without downmixing.
The release of the CD library was only the first volley, and production of more discs was ongoing. Since the original release, four more discs were added to the set, one being the LFE disc mentioned earlier. Mazza also wanted to release more musical elements on disc and continued developing tools toward that effort. The other new discs contain some of the results of this work.
His most recent invention is the Reflection Chamber Emulator, or RCE. Once again, Mazza took an unorthodox approach: Where recording engineers usually struggle to eliminate standing waves in recording spaces, the RCE was created to generate them. The RCE consists of a slotted Plexiglas sheet as a “floor” in the X plane and two sheets mounted at 90° to it as “walls,” the two walls being at a 30° angle to each other. On the floor, Plexiglas “boards” of carefully chosen heights are arranged so that their placement results in standing waves of desired frequencies and amplitudes. The boards can be changed to accommodate the needs of the application. Generally, Mazza arranges the RCE to produce harmonically related standing waves calculated to complement the tonality of the musical piece he is recording, and he has written a program for his Psion PDA to calculate the height and placement of the boards for a given set of desired resonances.
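The article does not describe the calculation Mazza's Psion program performs, and the RCE's angled walls make its real acoustics more involved than any simple formula. As a starting point only, the sketch below uses the textbook relation for a standing wave between two parallel reflecting surfaces, f = c / (2d), to find the spacing that supports a given frequency and its harmonics. The function names and the 110 Hz example are hypothetical.

```python
SPEED_OF_SOUND = 343.0   # meters per second in air at roughly 20 degrees C

def spacing_for(freq_hz, mode=1):
    """Distance between parallel surfaces whose mode-th resonance is freq_hz."""
    return mode * SPEED_OF_SOUND / (2.0 * freq_hz)

def harmonic_spacings(fundamental_hz, count):
    """Spacings for the fundamental and its first harmonics, in meters."""
    return [spacing_for(fundamental_hz * k) for k in range(1, count + 1)]

# Resonances at 110, 220 and 330 Hz, as might suit a piece centered on A.
spacings = harmonic_spacings(110.0, 3)
```

Each doubling of frequency halves the required spacing, so harmonically related resonances call for dimensions in simple ratios.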
The RCE is positioned in front of the musicians and then miked with either an inverse M/S arrangement or one of Mazza’s X-Techniques, and recorded onto two tracks. In post-production, an M/S matrix or X-Mat is used to decode the material back into surround. Mazza’s idea is to create a rich, harmonic surround environment for musical elements, and three new CDs have been created with the help of the RCE.
All of Mazza’s tools and techniques have been built to serve his music, and even with the development time, travel and library production, he has kept up a continuous stream of performance and installation projects. In 1998, he created a system of sensors linked to works of art in Turin’s Chiesa S. Filippo gallery. In 2000, RST created an interactive surround spatialization environment for the ARTISSIMA Art Exhibition in Turin, a multisensor-controlled surround installation at an art exhibition in Milan, and the Palazzo Grassi installations described earlier, while 2001 brought yet more multisensor surround works with a performance at the Contemporary Art Museum in Turin and two installations at the Experimenta scientific exhibition, plus a surround soundtrack for the Giocathlon interactive exhibition on sports. Whew!
Mazza’s work with sensors spurred the development of yet another tool: an interactive music language he calls CSXL (Coding Source Extended Language). “It’s basically a sequencer of ideas,” explains Mazza, “where you can have placement of sound components algorithmically controlled by external factors like sensors.”
Even with all of his tools, techniques and credits, Mazza still focuses on doing what it takes to get the job done, even when it calls for extraordinary measures to keep harmony in his household, as illustrated by a recent recording done for the CD library: “In order to record breaking dishes [used in the library], I had to wait for my wife to go and see her mother, since I wanted to use all the marble surfaces in my house. The difficult part was cleaning everything up before she returned!” Oh, were we not supposed to print that part?
Larry the O is a musician and sound designer who has contributed to Mix since 1984.