The Emperor's New Sampling Rate
ARE CDS ACTUALLY GOOD ENOUGH?
4/01/2008 8:00 AM Eastern
The arguments about sampling rates and word lengths in digital audio are long over with, aren't they? I mean, no less a personage than James A. “Andy” Moorer — former director of Stanford's CCRMA, co-founder of Sonic Solutions, recipient of a Lifetime Achievement Award from the AES and now senior scientist at Adobe — wrote the following in an unpublished (but oft-quoted) paper a dozen years ago: “Let us start with observations that are largely beyond question. These observations are not a subject of debate, but they beg further discussion: Ninety-six-kHz audio universally sounds better than 48- or 44.1kHz audio” (his emphasis). The great unwashed consumer base hasn't caught on to this because we're still waiting for that new medium to come along that will prove it to them and begin a long overdue renaissance in high-end audio, right?
Well, SACD and DVD-A have been on the scene for some time, but haven't made much of a splash in the consumer market. Direct Stream Digital (DSD) is being used quite a bit as a recording format in high-end classical and jazz circles; Telarc's doing everything in DSD these days. However, the problems of editing, processing and mixing recordings in DSD have never been solved well enough for the format to be adopted by the pop music world. Yet no matter how good they sound at the mastering level, the truth remains: The vast majority of DSD recordings are still delivered to the public on ordinary CDs.
According to a remarkable new study, however, the failure of new audio formats — at least the ones that claim superiority thanks to higher sample rates — to succeed commercially may in reality be meaningless. The study basically says that (with apologies to Firesign Theatre) everything you, I, Moorer and everyone else know about how much better high-sample-rate audio sounds is wrong.
The study was published in this past September's Journal of the Audio Engineering Society under the title “Audibility of a CD-Standard A/D/A Loop Inserted Into High-Resolution Audio Playback.” The study blew me away for a number of reasons. One is that it was almost identical to a study I proposed some years ago at the school where I was teaching, although my version never got past the proposal stage. Another is that the two authors of the study, David Moran and Brad Meyer, happen to be people whom I've known for several decades (we were all part of the crew covering audio and other technologies at The Boston Phoenix when I was starting out as a writer), but I had little idea what they were up to these days.
The main reason it knocked the wind out of me was its conclusions. It was designed to show whether real people, with good ears, can hear any differences between “high-resolution” audio and the 44.1kHz/16-bit CD standard. And the answer Moran and Meyer came up with, after hundreds of trials with dozens of subjects using four different top-tier systems playing a wide variety of music, is, “No, they can't.”
The experiment was wonderfully simple: The authors set up a double-blind comparison system in which one position played high-end SACDs and DVD-As through state-of-the-art preamps, power amps and speakers. At the other position, the output from the SACD player was first passed through the AD/DA converters of an HHB CD recorder and then through the same signal chain. The levels of the two sides were matched to within 0.1 dB, with the amplifier doing the matching in series with the CD recorder so no one could claim that it degraded the SACD signal. The test subjects used an “A/B/X” comparator to switch the signals, meaning that in some of the tests, when the subjects hit the Change button they didn't know if the signal actually changed.
There were 60 subjects, almost all of whom were people who know how to listen to recorded music: recording professionals, nonprofessional audiophiles and college students in a well-regarded recording program. In all, there were 554 trials during a period of a year. The experiment was done on four different systems, all employing high-end components and all in very quiet rooms designed for listening in both private homes and pro facilities. All subjects were given brief hearing tests to determine their response to signals above 15 kHz. That data, as well as the subject's gender and professional experience, was tabulated with the results.
The number of times out of 554 that the listeners correctly identified which system was which was 276, or 49.82 percent, essentially what would have happened if they had based their responses on flipping a coin. Audiophiles and working engineers did slightly better, at 52.7 percent correct, while those who could hear above 15 kHz actually did worse, at 45.3 percent. Women, who were involved in less than 10 percent of the trials, did relatively poorly, getting just 37.5 percent right.
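For readers who want to check the coin-flip comparison themselves, here is a minimal sketch. It assumes every trial is an independent guess with a 50 percent chance of being right, which is a simplification made for this example and not necessarily the analysis the authors used. The function name is made up for illustration.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value against a fair-coin (p = 0.5) null hypothesis."""
    expected = trials / 2
    deviation = abs(successes - expected)
    # Count every outcome at least as far from the expected value as the one observed.
    tail = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= deviation
    )
    return tail / 2 ** trials

trials = 554   # total trials reported in the article
correct = 276  # correct identifications reported in the article

print(f"{correct / trials:.2%} correct")
print(f"two-sided p-value vs. pure guessing: {binomial_two_sided_p(correct, trials):.3f}")
```

Under those assumptions the p-value lands far above the customary 0.05 threshold, which is another way of saying that the overall score is indistinguishable from guessing.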
So how did the audio community respond to this? Meyer tells me that he got a lot of “thank you” and “it's about time” responses. He also says that the article passed through the Journal's rigorous review process without any argument. But some loud screams were heard from various members in the audio-tweak community, and a number of heated and sometimes nasty flame wars erupted on several audio forums within hours of the article's release — many of them started by people who hadn't bothered to read it first.
Most of the objections were based on the fact that the authors didn't include in their paper the list of equipment and recordings that they used. Meyer explains that part of the reason for the omission was to keep the article from getting too long. But anyone familiar with the type of debate that often occurs in tweak circles knows that had the authors been specific about the components, they would have immediately been attacked on the basis that their equipment was, of course, inferior to what they should have used, and so, of course, no one would hear any difference.
In fact, Meyer and Moran posted all the information about the signal chains and the source material within a couple of weeks of the article's publication on the Website of the Boston Audio Society, a venerable 37-year-old, independent non-profit organization, in which both authors have long been active. The equipment list included amplifiers from high-end manufacturers like Adcom, Carver, Sim Audio and Stage Accompany, and speakers from Snell and Bag End, as well as the oft-worshipped Quad ESL-989 electrostatics, which are supposed to have usable response up to 23 kHz — which is, of course, above the Nyquist frequency of the HHB recorder's converters. The subjects listened to discs that covered a wide range of material and included classical instrumental, choral, jazz, rock and pop, from audiophile labels like Mobile Fidelity, Telarc and Chesky.
So the objectors really didn't have much to object to. But if you think about it, the exact equipment list is largely irrelevant. If you assume the equipment, the listening environment and the listeners' critical faculties are all at least good, then what's most amazing about their findings is that the results were always the same, no matter what equipment they used or who was listening to it or what they were listening to. Not one listener, under any circumstances, could consistently distinguish between high-resolution audio that was passed through the 44.1kHz/16-bit CD “bottleneck” and audio that wasn't.
Does this mean that someone else couldn't do a similar experiment and end up with different results? Not at all — and Meyer and Moran are urging others to do just that. After all, this is what the scientific method is all about: If your experiment comes up with a certain result, then by publishing it you are inviting the rest of the world to copy (or expand on) what you've done and to see if their results agree or disagree with yours. I would love to see this experiment duplicated often, and I would be delighted to see someone come up with different results.
But wait a minute — haven't we all heard the superiority of high-sample-rate audio? Leaving the tweak-heads aside, there are a huge number of people in this field for whom I have real respect — Moorer among them — who have experienced high-sample-rate audio to sound more “spacious” or “detailed” or “enveloping.” You might even be one of them.
As it happens, I'm not, which is not to say I think everyone else is full of beans; I've just never experienced it in an environment that I feel was controlled enough for me to be comfortable making that kind of judgment. It's not that I'm lazy: As Meyer and Moran realized, setting up a test that could really be considered objective is not trivial. Even if I were the sole subject of the test, I'd still want lots of time, multiple music sources, incontrovertibly great equipment, an excellent level-matching system and a very quiet (and consistent) room.
I have had one experience that came close to this, but the result was inconclusive. At the press roll-out a dozen years ago of DSD at Sony's studios in New York, a group of audio writers got a demonstration of how the new system compared with a 20-bit PCM digital stream, as well as with a direct analog feed from a live band in the studio. I could hear some differences. Yet how to describe them — or whether I would hear them again in another time and place — I couldn't tell you. I did, however, mention a preference at the session for the way instrument decays sounded in PCM, to which David Smith (R.I.P.) replied, “We've heard that from others. In fact, you'd be very flattered if you knew who else said that same thing.” What the significance of that was, I guess I'll never know, but it didn't seem to get in the way of DSD ending up with plenty of fans among the recording community.
But something is causing people to say they are hearing differences. If a double-blind test can't confirm those differences, then what's going on? For one possible reason, let's go back to Moorer's paper that I quoted earlier (called “New Audio Formats: A Time of Change and a Time of Opportunity,” which can be found on his Website, www.jamminpower.com). Later in the paper, Moorer noted that humans can distinguish time delays — when they involve the difference between their two ears — of 15 microseconds or less. Do the math, and you can see that while the sampling interval at 48 kHz is longer than 15 µs, the sampling interval at 96 kHz is shorter. Therefore, he says, we prefer higher sampling rates because “probably [my emphasis] some kind of time-domain resolution between the left- and right-ear signals is more accurately preserved at 96 kHz.” It's an interesting starting point for a discussion, but to my knowledge it's never gotten past that point — as a theory, it has never been expanded upon or tested. And judging from the results of Meyer and Moran's experiment, it doesn't seem to be a factor.
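The “do the math” step is just the reciprocal of the sample rate. A minimal sketch follows, with the 15-microsecond figure taken from Moorer's paper as quoted above and the 44.1kHz CD rate added for comparison. The constant and function names are made up for this example.

```python
ITD_THRESHOLD_US = 15.0  # interaural delay figure cited from Moorer's paper

def sampling_interval_us(sample_rate_hz: float) -> float:
    """Time between successive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz

for rate_hz in (44_100, 48_000, 96_000):
    interval = sampling_interval_us(rate_hz)
    relation = "longer" if interval > ITD_THRESHOLD_US else "shorter"
    print(f"{rate_hz / 1000:g} kHz: {interval:.2f} µs per sample, "
          f"{relation} than {ITD_THRESHOLD_US:g} µs")
```

This only reproduces the arithmetic behind the comparison. Whether the spacing between samples actually limits how well left-right timing is preserved is exactly the part of the theory that has not been tested.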
Some folks think it's all simply wishful thinking on everybody's part: The system costs more and has better specs; therefore, we make ourselves believe it sounds better. There's something to that reasoning. Humans are a notoriously imperfect lot and tend to see and hear what we want to see and hear. Another very plausible reason is something that the authors discovered in their research. Despite the fact that no one could hear the difference in playback systems, they reported that “virtually all of the SACD and DVD-A recordings sounded better than most CDs — sometimes much better.” As it wasn't the technology itself that was responsible for this, what was? The authors' conclusion is that the recordings are simply engineered better. Because high-end recordings are a niche market, “Engineers and producers are being given the freedom to produce recordings that sound as good as they can make them, without having to compress or equalize the signal to suit lesser systems and casual listening conditions. These recordings seem to have been made with great care and manifest affection by engineers trying to please themselves and their peers.”
But there's one more reason worth examining. Among its proponents is Ethan Winer, a musician, engineer, studio owner, manufacturer and iconoclast who's been in the recording business for some 40 years. Winer is definitely of the “show-me” school of audio theory and is an outspoken critic of “subjectivism,” the school of thought that encourages people to discuss the performance of audio components and systems using vaguely definable and often irrelevant adjectives instead of hard data. Winer's company, RealTraps, manufactures modestly priced acoustic treatment products for studios, so it's not surprising that he contends that anomalies caused by the listening space and our place in it far outweigh any possible subtleties we might be picking up when we change sample rates.
In an article on his Website (www.ethanwiner.com), Winer points out that in a typical room, moving one's head or listening position as little as four inches can result in huge changes in the frequency-response curves one is hearing. What could be a 10dB dip in one spot at one frequency could be a 6dB boost a couple of inches away. These wide variations are caused primarily by comb-filtering effects from the speakers and from the various reflections bouncing around the room, which are present no matter how well the room is acoustically treated. Winer blames this phenomenon for most of the unquantifiable differences people report hearing when they are testing high-end gear.
He writes, “I am convinced that comb filtering is at the root of people reporting a change in the sound of cables and electronics, even when no significant change is likely. If someone listens to their system using one pair of cables, then gets up and switches cables and sits down again, the frequency response heard is sure to be very different because it's impossible to sit down again in exactly the same place. So the sound really did change, but probably not because the cables sound different!”
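To get a feel for why a few inches can matter, here is a deliberately simplified sketch of comb filtering: the direct sound plus a single reflection that arrives slightly later. A real room has many reflections, and the speed of sound, the reflection gain of 0.7 and the two path differences below are all hypothetical values chosen for illustration, not figures from Winer's article.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value, roughly room temperature

def comb_response_db(freq_hz: float, path_difference_m: float,
                     reflection_gain: float) -> float:
    """Level of direct sound plus one delayed reflection, in dB relative to direct sound alone."""
    delay_s = path_difference_m / SPEED_OF_SOUND_M_S
    # The reflection is a scaled copy of the direct sound, phase-shifted by its delay.
    combined = 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(combined))

FREQ_HZ = 2000.0
GAIN = 0.7  # hypothetical: reflection arrives at 70 percent of the direct level

# Two hypothetical seats whose path differences differ by under four inches.
for label, path_difference_m in (("seat A", 0.42875), ("seat B", 0.51450)):
    level = comb_response_db(FREQ_HZ, path_difference_m, GAIN)
    print(f"{label}: {level:+.1f} dB at {FREQ_HZ:g} Hz")
```

In this toy model the first seat sits in a cancellation of roughly 10 dB at 2 kHz and the second sits in a reinforcement of several dB, even though the two path differences are only about 8.6 cm apart.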
The test subjects in the Meyer/Moran experiment didn't get up and move around, and so the fact that they couldn't discern any differences in the two signal paths fits nicely into Winer's theory. In fact, his response when I sent him the article was, “Nothing in here surprises me.”
Am I sure that Winer is right? No, although I think he's onto something, the way I think Moorer's thoughts about microscopic phase differences may be important in some way we haven't yet figured out. But I am delighted to read Meyer and Moran's paper for two reasons: It confirms something I've long suspected and it throws down the gauntlet for further research to be done.
Paul D. Lehrman doesn't have much frequency response above 10 kHz, but considers himself more aware than ever.