Why is Double Blind Testing Controversial?


I noticed that the concept of "double blind testing" of cables is a controversial topic. Why? A/B switching seems like the only definitive way of determining how one cable compares to another, or any other component such as speakers, for example. While A/B testing (and particularly double blind testing, where you don't know which cable is A or B) does not show the long term listenability of a cable or other component, it does show the specific and immediate differences between the two. It shows whether there are differences at all, how slight they are, how important they are, etc. It seems obvious that without knowing which cable you are listening to, you eliminate bias and preconceived notions as well. So, why is this a controversial notion?
moto_man
Rzado: My point on retesting is this: If something really is audible, sooner or later somebody is going to hear it, and get a significant response, for the same reason that sooner or later, somebody is going to flip heads instead of tails. If you keep getting tails, eventually you start to suspect that maybe this coin doesn't have a heads. Similarly, if you keep getting non-significant results in a DBT, it becomes reasonable to infer that you probably (and we can only say probably) can't hear a difference.

As for published studies, the ones I've seen (which may not be the same ones you've seen) generally did get the statistics right. What usually happens is that readers misinterpret those studies--and both sides of The Great Debate have been guilty of that.
Good post, Bomarc - I agree with 98% of what you had to say. I guess the one thing I'm not sure about is the point you are making with respect to multiple inconclusive tests leading to a strong inference that a difference is inaudible. If you have multiple tests with high Type 2 error (e.g. Beta ~.4-.7), I do not believe this is accurate. However, if you have multiple tests where you take steps to minimize Type 2 error (high N trials), I can see where you are going. But you are correct, that can start getting messy.
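
To put rough numbers on that, here is a minimal Python sketch. It assumes the repeated tests are independent and that each one has the same Type 2 error, which is a simplification (repeat tests by the same listener are not really independent). The function name and the 0.14 value are illustrative choices; .4 and .7 are the Beta values mentioned above.

```python
def prob_all_miss(beta: float, n_tests: int) -> float:
    """Chance that n_tests independent tests ALL come back non-significant
    even though a real audible difference exists (Type 2 error = beta each)."""
    return beta ** n_tests

# 0.4 and 0.7 are the "high Type 2 error" range; 0.14 is an assumed low-beta test.
for beta in (0.7, 0.4, 0.14):
    row = ", ".join(f"{n} tests: {prob_all_miss(beta, n):.4f}" for n in (1, 3, 5, 10))
    print(f"beta = {beta}: {row}")
```

With a high beta, a handful of misses in a row is still quite likely even when the difference is real; with a low beta, a run of misses becomes hard to explain that way.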

Thanks for clarifying your point about statistics, though. In general, I tend to give experimenters the benefit of the doubt with respect to setting up the DBT, unless I have a specific problem with the setup. But I agree, there are numerous ways to screw it up.

However, the few studies in high-end audio with which I am familiar (e.g. the ones done by Audio magazine back in the 80's) in general suffered from the problems outlined above (small N leading to high Type 2 error, erroneous conclusions based on non-rejection of null hypothesis due to tests not achieving p value < .05). There have been a couple of AES studies with which I'm familiar where the setup was such that p_u was probably no better than chance - in that circumstance, you can say either the setup is screwed up or the interpretation of the statistics is screwed up. At least one or two studies, though, were pretty demonstrative (e.g. the test of the Genesis Digital Lens, which resulted in 124 out of 124 correct identifications).

My biggest beef with DBT in Audio is that you just need to do the work - i.e. use high N trials - which is a lot easier said than done.
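
For a feel of how much work "high N" means, the sketch below searches for the first trial count at which an exact one-sided binomial test keeps Type 2 error under a target. The 0.05 significance level and the p_u values of 0.7 and 0.6 come from this thread; the 0.2 target for Type 2 error and all function names are my own assumptions. Exact binomial power moves in a sawtooth as N grows, so "first N that qualifies" is only a rough guide.

```python
from math import comb

def pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k correct answers in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def critical_count(n: int, alpha: float = 0.05, p_chance: float = 0.5) -> int:
    """Smallest number correct whose chance-only tail probability is below alpha."""
    cum = 0.0
    for k in range(n, -1, -1):
        cum += pmf(k, n, p_chance)  # cum is now P(X >= k) under pure guessing
        if cum >= alpha:
            return k + 1            # k is not significant, so k + 1 is the cutoff
    return 0

def type2_error(n: int, p_u: float, alpha: float = 0.05) -> float:
    """Probability of scoring below the cutoff when the true success rate is p_u."""
    c = critical_count(n, alpha)
    return sum(pmf(k, n, p_u) for k in range(c))

def smallest_n(p_u: float, max_beta: float, alpha: float = 0.05, n_max: int = 500):
    """First trial count (up to n_max) whose Type 2 error is at most max_beta."""
    for n in range(1, n_max + 1):
        if type2_error(n, p_u, alpha) <= max_beta:
            return n
    return None

for p_u in (0.7, 0.6):
    n = smallest_n(p_u, max_beta=0.2)
    print(f"p_u = {p_u}: first N with Type 2 error <= 0.2 is {n}")
```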
Thanks, Rzado, for the refresher course. Let me try to summarize for anyone who fell asleep in class. In a DBT, if you get a statistically significant result (at least 12 correct out of 16 in one of Rzado's examples), you can safely conclude that you heard a difference between the two sounds you were comparing. If you don't score that high, however, you can't be sure whether you heard a difference or not. And the fewer trials you do, the more uncertain you should be.

This doesn't mean that DBTs are hopelessly inconclusive, however. Some, especially those that use a panel of subjects, involve a much higher number of trials. Also, there's nothing to stop anyone who gets an inconclusive result from conducting the test again. This can get statistically messy, because the tests aren't independent, and if you repeat the test often enough you're liable to get a significant result through dumb luck. But if you keep getting inconclusive results, the probability that you're missing something audible goes way down.

To summarize, a single DBT can prove that a difference is audible. A thousand DBTs can't prove that it's inaudible--but the inference is pretty strong.
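
The "dumb luck" problem with repeating a test can be sketched in a few lines of Python. This assumes the repeats are independent and each uses the same significance level, which is an idealization (as noted above, real repeat tests aren't independent). The function name is just illustrative.

```python
def prob_any_false_positive(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n_tests independent tests comes out
    'significant' at level alpha when there is truly nothing to hear."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 5, 10, 20):
    print(f"{n_tests:2d} tests at alpha = 0.05: "
          f"{prob_any_false_positive(0.05, n_tests):.3f}")
```

So a lone significant result fished out of many repeats deserves less weight than the same result from a single planned test.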

As for my statement about statistics not being the weak link, I meant that there are numerous ways to do a DBT poorly. There are also numerous ways to misinterpret statistics, in this or any other field. Most of the published results that I am familiar with handle the statistics properly, however.
I think Hearhere summed up the issue well in his last post, but I would come at it from a slightly different angle. Simply put, DBT is not, in and of itself, "controversial." However, there is a great deal of misunderstanding/ disagreement regarding its use and applicability. More particularly, DBT is simply a tool, the results of which are interpreted based on statistical analysis, and must be understood in that context. While DBT does have some applicability in the audio context, it is not the be-all and end-all that some make it out to be.

There are two main problems with how DBTs are used/viewed by certain audiophiles. First and foremost, what many do not understand (but what anyone with experience in statistics can tell you) is that if there is a non statistically significant result, the DBT has not “proven” there are no differences between conditions! Rather, all that can be concluded is that the DBT failed to reject the null hypothesis in favor of the alternative hypothesis.

Second, small-trial (aka "small-N") listening tests analyzed at commonly used statistical significance levels (e.g. <.05) lead to large Type 2 error risks, thereby masking the very differences the tests are supposed to reveal.

Now breaking that down into English is a pain, but I'll give it a shot (I’m an engineer, as opposed to a statistician - thus any stats guys feel free to correct me). In a simple DBT, one attempts to determine if there are audible differences between two conditions (such as by inserting a new interconnect in a given system). This is more commonly called a hypothesis test - the goal is to determine whether you can reject a "null hypothesis" (there are in fact no differences between the two conditions) in favor of a "conjectured hypothesis" (there are in fact differences between the two conditions).

In a DBT, there are four possible results: 1) there are differences and the listener correctly identifies that there are differences; 2) there are no differences and the listener correctly identifies there are no differences; 3) there are no differences, but the listener believes there are differences; and 4) there are differences, but the listener believes there are no differences. Obviously, 1 and 2 are correct results. Circumstance 3 (concluding that differences exist when in reality they don’t) is commonly referred to as "Type 1 error". Circumstance 4 (missing a true difference) is commonly referred to as "Type 2 error". Put in terms of the hypothesis test stated above, type I error occurs when the null hypothesis is true and wrongly rejected, and type II error occurs when the null hypothesis is wrongly accepted when false.

Now, things get a little complicated. First we need to introduce a variable, p_u, which is the probability of success of the underlying process. In the listening context, this is the probability that a listener can identify a difference between conditions, which is based on the acuity of the listener, the magnitude of the differences, and the conditions of the trial (e.g. the quality of the components, recording, ambient noise, etc). Unfortunately, we can never “know” p_u, but can only make reasonable guesses at it.

We also need to introduce the variable "alpha". Alpha, or the significance level, is the level at which we can reject the null hypothesis in favor of the alternative hypothesis. By selecting a suitable significance level during the data analysis, you can select a risk of Type 1 error that you are willing to tolerate. A common significance level used in DBT testings is .05.

Finally, we need to look at the probability value. In hypothesis testing, the probability value is the probability of obtaining data as extreme or more extreme than the results achieved by the experiment assuming the null hypothesis is true (put another way, it is the likelihood of an observed statistic occurring on the basis of the sampling distribution).

Once the DBT is performed, one compares the probability value to alpha to determine whether the result of the test is statistically significant, such that we can reject the null hypothesis. In our example, if the null hypothesis is rejected, we can conclude there are in fact audible differences between ICs.
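
As a minimal sketch of that comparison in Python: the p value here is the exact one-sided binomial tail, assuming each trial is an independent guess with a 0.5 chance of being right under the null hypothesis. The function name and the 10-trial scores are made up purely for illustration.

```python
from math import comb

def p_value(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """P(X >= correct) if the listener is only guessing (null hypothesis true)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

ALPHA = 0.05  # significance level, chosen before the test is run

for correct in (7, 9):
    p = p_value(correct, 10)
    verdict = "reject the null" if p < ALPHA else "fail to reject the null"
    print(f"{correct}/10 correct: p = {p:.4f} -> {verdict}")
```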

Now, here comes the fun part. It might seem that you want to set the smallest possible significance level to test the data, thereby producing the smallest possible risk of Type 1 error (i.e., set alpha to .01 as opposed to .05). However, this doesn’t work, because, as you reduce the risk of Type 1 error (lower alpha), the risk of Type 2 error necessarily increases.

A further, and greater, impediment to practical DBT testing is that the risk of Type 2 error increases not only as you reduce Type 1 error risk, but also with reductions in the number of trials (N) and in the listener's true ability to hear the differences under test. Since you really never know p_u, and can only speculate on how to increase it (e.g., by selecting only high quality recordings of unamplified music using a high quality system to test the ICs), the best ways to reduce the risk of Type 2 error in a practical listening test are to increase either N or the risk of Type 1 error.
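
One way to see why these trade-offs hold is to write both error risks as binomial sums. This is a sketch under the assumption that trials are independent and that guessing succeeds with probability 1/2; the symbol c (the smallest number of correct answers counted as significant) is my own notation.

```latex
\alpha(c) = \sum_{k=c}^{N} \binom{N}{k} \left(\tfrac{1}{2}\right)^{N},
\qquad
\beta(c) = \sum_{k=0}^{c-1} \binom{N}{k}\, p_u^{\,k}\, (1 - p_u)^{N-k}
```

Raising c removes terms from the first sum (lower Type 1 risk) and adds terms to the second (higher Type 2 risk). A smaller p_u shifts probability toward low scores, which inflates the second sum. A larger N pulls the guessing and true-ability distributions apart, so a cutoff can keep both sums small.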

Now for some examples. Let's assume we use 16 trials on the IC in question. For purposes of the example, further assume that the probability of randomly guessing correctly whether the new IC was inserted is 0.5. Finally, we must make a guess at “p_u”, which we could say is 0.7. In this instance, the minimum number of correct results for the probability value to fall below .05 is 12 (our type I error in this case is 0.0384). However, our type II error in this case goes through the roof - in this example, it is .5501, which is huge! Thus, this test suffers from a high level of type 2 error, and is therefore unlikely to resolve differences that actually exist between the interconnects.
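
These figures can be checked with a short Python sketch using exact binomial sums. It assumes independent trials; the function names are mine, and the inputs (16 trials, alpha of .05, p_u of 0.7) are the ones from the example.

```python
from math import comb

def tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def critical_count(n: int, alpha: float, p_chance: float = 0.5) -> int:
    """Smallest number correct whose chance-only tail probability is below alpha."""
    for c in range(n + 1):
        if tail(c, n, p_chance) < alpha:
            return c
    return n + 1  # even a perfect score would not be significant

N, ALPHA, P_U = 16, 0.05, 0.7

c = critical_count(N, ALPHA)
type1 = tail(c, N, 0.5)      # risk of a false positive when purely guessing
type2 = 1 - tail(c, N, P_U)  # risk of missing a real difference at p_u = 0.7

print(c, round(type1, 4), round(type2, 4))  # -> 12 0.0384 0.5501
```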

What happens if there were only 11 correct results? Our p value is then .1051, which exceeds alpha. Thus, we are not able to reject the null hypothesis in favor of the alternative hypothesis, since the p value is greater than alpha. However, this does not allow us to conclude that there are in fact no audible differences between ICs. In other words, data not sufficient to show convincingly that a difference between conditions is not zero do not prove that the difference is zero.

So now let's increase the number of trials to 50. Now, the number of correct results needed to yield statistically significant results is 32 (p value = .0325). Assuming again p_u is 70%, our Type 2 error drops to ~ 0.14, which is more acceptable, and thus differences between conditions are more likely to be revealed by the test.

OK, one last variation. Let’s assume that the differences are really minor, or we are using a boom box to test the interconnects, such that p_u is only 60%. What happens to Type II error? It goes up - in the 50 trial example above, it goes from .1406 to .6644 - again, the test likely masks any true difference between ICs.
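
To see how strongly the 50 trial test depends on the guess at p_u, here is a rough Python sweep using exact binomial sums, again assuming independent trials. The 0.8 value is an extra illustrative point that is not from the examples above, and the names are my own.

```python
from math import comb

def tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

N, ALPHA = 50, 0.05

# Smallest score whose chance-only tail probability falls below ALPHA.
critical = next(c for c in range(N + 2) if tail(c, N, 0.5) < ALPHA)
print(f"need {critical}/{N} correct, Type 1 risk = {tail(critical, N, 0.5):.4f}")

def type2(p_u: float) -> float:
    """Risk of scoring below the cutoff when the true success rate is p_u."""
    return 1 - tail(critical, N, p_u)

for p_u in (0.6, 0.7, 0.8):
    print(f"p_u = {p_u}: Type 2 risk = {type2(p_u):.4f}")
```

The output should land close to the .0325, .6644 and .1406 figures quoted in the examples.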

To sum up, DBT is a tool that can be very useful in the audio context if used and understood correctly. Indeed, this is where I take issue with Bomarc, when he says "I don't want to get into statistics, except to say that's usually not the weak link in a DBT". Rather, the (mis)understanding of statistics is precisely the weak link in applicability of DBTs.
Socrates: You've asked a mouthful of questions. I'd suggest you start out with this site: www.pcabx.com, where you can download software that will allow you to conduct your own DBTs.

I don't want to get into statistics, except to say that's usually not the weak link in a DBT. As for the "fallibility of science," that's not the way I'd put it. I'd say that science is never finished, and it can always discover something new, or that something once "proven" to be right is in fact wrong. Science, in short, is the best explanation we have right now for whatever phenomenon we wish to explain. But you can't just wish it away. Current knowledge stands as knowledge--as fact--until somebody comes along with new knowledge that refutes it.

That said, anyone--and I mean anyone--who does serious research on either human hearing or sound reproduction uses DBTs--and ONLY DBTs. No one in the scientific community would think of doing a listening test any other way, because such tests are absolutely necessary to isolate and compare only the sound.
All this talk of DBT, could anyone provide a link to any such reputable, controlled testing done in audio, including the statistical manipulations used to obtain the conclusions, so we at least know what we are arguing here? Anyone care to philosophize on the great fallibility of science in general, or even specifically on the science and statistics involved in any such testing? Alan Chalmers, anyone?
To answer the original question, DBT is controversial because there are widely divergent views of its accuracy and applicability. One group of people feels that DBTs as a test methodology are inherently incapable of demonstrating audible differences. Another group feels the opposite. A fertile topic for discussion, in my opinion.

The rancor comes, unfortunately, when fringes on one side or the other feel the need to characterize those with whom they disagree as either "meter readers with no hearing/bad systems/no experience/etc." or as "delusional and indulging in wasteful fantasy". Neither is correct (well, not in most cases), nor productive to meaningful discussion of the subject at hand.

Why some feel that this particular topic - why the controversy over DBTs - is unsuitable for discussion mystifies me. Well, not really . . .
Banning topics such as this is a very bad idea. Despite limited regressions into philosophy and politics Audiogon has consistently shown that intelligent and polite dialog regarding audio is possible.
Huh? Redkiwi and Seandt were obviously joking. None of the recent posters seem upset. And, this topic is only off limits in the cable asylum. No rule against mentioning or discussing DBT's in the general or other specialized asylums.

Simple courtesy should be sufficient here. If someone asks, as in the initiation of this thread, "what's so bad about DBT's?," it should be obvious that he doesn't think anything is wrong with the subject, and if someone does, he might either ignore the thread or give his point of view without picking a fight.
This is exactly why this topic is off limits at Audio Asylum. I think Audiogon should follow their excellent lead.
did I hit a nerve ? Or did I miss sarcasm in Redkiwi's response ? It was a joke guys .. let's laugh at ourselves occasionally. HiFi is a very unimportant topic in a world full of war, famine and death, and certainly not worth getting worked up about.
Tinged spectacles. Are we getting carried away with this? What's next white canes and seeing eye dogs. At least the dogs might be able to confirm extreme frequency responses as well as finding the stuff that really stinks.
This is a very important proposal and needs urgent consideration - and should not be treated flippantly Sean. I for one am very concerned that readers of posts are likely to be biased against individual posters - for example, had I a prejudice against the Irish I may very well have not taken the last post as seriously as it deserves. The danger is that if we do not have double blind posting - and I believe posts should be scattered randomly about the site, just to be sure - then readers will not get the true meaning of what is posted here, as they will read with tinged spectacles.
Wellfed, in no way did I mean to imply that audiophiles are particularly susceptible to deception, either external or internal. It really does seem to be the case that humans - all of us - are wired for sensory "over detection"; nothing bad or good about it, that's just the way we are.

Don't underestimate the other attributes of audio components - things like build quality, reliability, corporate reputation, ergonomics, visual presentation/industrial design, price, etc. are all perfectly valid areas upon which to base and build preferences. Nothing bad about that, either. I strongly suspect that my Nordost ICs don't sound any better (or worse) than the overwhelming majority of alternatives (and haven't heard any differences, either), but enjoy the fact that they are technically one of the best out there.
I have a proposal ... double blind posting. Audiogon allows us to post our views with anonymity. Other Agoners then guess who posted which post, according to the content of the post, and the (extreme) opinions therein.
Suggestions for starter threads : "Power cords make no difference" and "SACD is killing digital".
To address Sean's and other's points about pulling random folks in off the street for DBTs, you are quite correct. You can't just grab someone off the street, put the music on and test away. In truly valid tests - and these are in the minority, I suspect - there is a training component. During this time, if I recall correctly, some screening is also done.

The best candidates for DBTs may be well-trained 10-year-olds!
A true double blind test wouldn't be easy to set up, but as a scientific experiment it really isn't very hard. As far as interpreting the results of an experiment goes, I don't think you can design an experiment that isn't open to misinterpretation. It's a very different statement and attitude to say that a DBT has to be well designed and rigorous than to say DBTs serve no useful purpose.
Onhwy61 -- it is specifically the motivations that yield the controversy. A DBT could, with great effort, be designed to yield reasonably valid results. The posts above suggest that such effort is seldom made because the test sponsors don't understand or believe that effort is required. Without going through all that work, the test becomes a "double deaf test" as Unsound suggests.
Those that are administering the tests do not know what is being tested at the very specific time of the DBT, but that does not mean that there isn't some type of log or record that couldn't be taken of such an event by an outside source such as a computer, etc.. Sean
>
Sean, I always thought one of the key features of a true DBT is that the subject does not know the nature of the variable element. When applied to audio the listener should not know what to listen for.

I find it interesting that both sides of this issue are strongly suspicious of the motives of the other side. It seems that the interpretation of the results is more controversial than the actual DBT itself.
Wellfed: I was referring to people collected off the street as "knuckleheads". I should have said "average joe's", etc... I do agree with your point though : )

I can't understand why one would want to perform tests on subjects that have no idea as to what they are listening for or how to discern the differences. That is, unless one wanted to promote a certain ideology with the less than optimized test individuals and conditions. Most of the test results that are foisted upon us are those performed upon random individuals, not those that know how to listen and not just "hear". There is a BIG difference as far as i'm concerned.

Even within the ranks of "skilled listeners" you'll have variances as to what people can hear and what they listen for in terms of sonic cues and signatures. As such, if one wanted to make some type of "final statement" as to what was audible and what wasn't, you would have to assemble a very large group of individuals from all walks of life and go from there. At that point, one could start off with simple ( highly audible ) test differences and weed the crowd out from there. As the tests became harder, the "cream of the crop" would be left. At that point, we might be able to say that the average person off of the street will only make it from Point A up to point M in terms of audible discernment. Those that fell short of Point M would be considered to be below average in hearing and / or listening abilities. A select few might make it up to Point S, but anything beyond that would truly require excellent ears and trained listening skills. Beyond that point, it is possible to hear from point A to point Z under ideal conditions by a person with excellent hearing and listening skills. It would be these people that i would use as "guinea pigs" when trying to draw the line between what the human ear and brain is capable of detecting and processing in a linear manner. Does this make sense ?

This approach allows room for growth AND reduction based upon the individual. Obviously hearing and listening skills vary from person to person AND change over time. To me such testing would be logical and i might tend to believe the results a bit more.

If i had to pick and choose an individual to represent "audiophiles" as a group in terms of hearing acuity and listening skills, i would have gone with Enid Lumley circa the late 1970's and early 1980's. I have no doubt in my mind that she was a very skilled listener and had excellent hearing. I'm also 100% certain that she could hear things that i ( and probably most others ) can't. As such, her test results would give me a point of reference as to just how much one could hear and how much i was actually missing. Sean
>
Sean, my response to Hearhere pertained to his/her first post. While Hearhere, in my eyes, appears to be one of the more honorable, reasonable, and sincere of the DBT supporters, there is still a significant insinuation that audiophiles, as a group, are subject to powerful forces of deception, along with the insinuation that such deception is prevalent. As for knuckleheads, I would suspect that they are present in both camps. It would be nice if someone could devise a DBT to determine who the knuckleheads are, but for the time being I think it best to determine these by simple subjective discernment. I truly hope I am not getting too nasty with my commentary, sigh.
Wellfed: I think that Hearhere was saying that well conducted DBT's are supposed to be able to allow researchers to identify if there is a discernable difference, not that there aren't discernable differences. Once they can verify that differences are detectable on a repeated basis under comfortable conditions, they can then dig in and try to understand exactly what those differences are and why they exist.

Personally, i have no problem with this type of test so long as suitable subjects are used. I do have a problem with knuckleheads selected at random being forced to make decisions at the drop of a hat under less than ideal / uncomfortable conditions with products / materials that they are unfamiliar with and the results from those "tests" being force-fed to us as being "the truth". Sean
>
Hearhere, again why the need to insist or imply that the average audiophile is deceived? What are the "many factors that drive preference in addition to sound" besides ergonomics, convenience, and build quality?

In a Utopian sense I love the idea of DBT, I doubt I would ever take the time to evaluate a component this way however. It just isn't efficient or necessary for me to do so. I have no difficulty accepting, even admiring, someone's efforts to evaluate upgrades in this manner, but so many of the conclusions/preconceptions of some DBT disciples (or claimed disciples; see below) are so clearly absurd, then to have these faulty conclusions presented so forcefully as truth is somewhat vexing, if allowed under one's skin. I would expect the converse to be true as well when called names.

I also believe that many identified as DBT disciples probably are mislabeled and really should be identified more accurately as skeptics hitching themselves to the DBT banner. It would be interesting to find out how many in this category truly practice DBT methodology. Presumably these are the ones that subjective audiophiles find to be ignorant and grating when addressing issues that contradict truths revealed to the subjective disciple based on their personal experience. The subjective conclusion would be supported by DBT if all variables were controlled, assuming of course that the subjective disciple is not deceived. What was this thread all about again?
Sean, Sean, Sean. DBTs exist to serve ". . . those that need an explanation for all things and don't believe in things they can't explain"?? Before explaining a phenomenon, perhaps it is a good idea to demonstrate that the phenomenon exists in the first place. DBTs are used for exactly that purpose.
It's controversial mainly because most people don't understand the methodology of a DB test or even how to interpret the results.

First, DBTs are probably not very useful as a means of selecting components for most people. Not because they wouldn't reveal audible differences, but simply that there are many factors that drive preference in addition to sound. Even in cases where there are no *audible* differences between components (a lot more common than most A'goners will admit, clearly), that doesn't preclude differences in other attributes that lead to real, valid, non-questionable preferences for one component over the other.

Second, the claims that DBTs inherently obscure differences, or that you can't hear differences in a DBT format, just factually don't fly. DBTs have been shown to resolve differences down to the theoretical limits of hearing.

What's really hilarious, though, are the claims that those who support DBTs do so to avoid buying high-priced gear. That somehow they're all so confused by the vast array of components that they run and bury their heads in scientific sand. No, the real reason that DBTs exist is the well documented tendency for people to see things that aren't there (and the converse) and to hear things that don't exist (and the converse there as well). Humans seem to be wired this way - to "over detect" - and DBTs work to eliminate this effect, apparently to the discomfort of many.
Sean, I was not referring specifically to cables, though it seems the tone of this discourse went in that direction.

I was talking about blind testing,in general.

Jim
Oh, Sean, I didn't mean anything by it. I was referring to another thread in which I had said that salespeople couldn't tell things apart. This was, at the time of my post above, a relatively friendly thread, and I was just trying to be friendly despite our differences.

Back on topic, obviously, a lot of people feel threatened by the very mention of a DBT, apparently afraid of something. I am not an advocate of DBT's for the average audiophile. Too inconvenient for one thing, and for another, I've never participated in one. I just listen to stuff, get up, change the cables and listen to the other component. If I am acting on my imagination rather than perfectly accurate hearing, I don't really care, because I'm not designing, reviewing or selling anything, and my imagination so far is fairly consistent. Oh, yeah, there's that expectation thing - what the hell, whatever works.

I think those who hate the concept of a DBT should just go about their business secure that no one will ever force them into undergoing such an unpleasant procedure. Nothing to be afraid of. But I am pleased that the designers of the audio components and speakers I buy use them in their work and that a few legitimate audio reviewers also use them.

Paul
Paulwp: Why specifically drag me into this ? I made no comment either way. I think everybody here that has read more than a few of my posts would know where i stand and i was willing to leave it at that. Having said that....

Jwrobinson: Why would listening levels change within a system if a cable was changed ? I am talking RCA vs RCA and XLR vs XLR ( apples to apples, no change in system gain, etc... ). So long as the cables are of reasonable design ( adequate gauge so as not to incur voltage drop due to series resistance ), there should be NO change. That is, IF "wire" really is "wire" and conductors are conductors.

The only reasonable explanation would be that the equipment is loading up differently. Since it is loading up differently, wouldn't it be logical that the response of said equipment has been altered to what is a measurable, and quite possibly, an audible extent ?

As far as your concerns regarding "cleaning the connections when cables are swapped" possibly altering our sonic perceptions, any type of "reasonable" connection that has recently been plugged / unplugged should measure less than a few hundred milli-ohms. If a few hundred milli-ohms can alter our sonic perception and is audible, why wouldn't something so large as what could be a drastic change in capacitance and / or inductance due to differences in cable design have the same effect ?

You are willing to apply specific arguments as to why specific changes are not audible, but when you are asked to apply that same logic as to why they "could" be audible, those variables and equations are no longer acceptable.

THIS is the main reason that most audiophiles and "music lovers" abhor these threads and this topic. Most DBT enthusiasts are simply hypocrites with closed minds. To be fair though, there are those that do perform such tests with open minds under very controlled and realistic conditions. They do this in order to further our understanding, knowing that there is much that we do not know and still need to learn. To those folks, i say "kudo's" and "keep up the good work".

To those that need an explanation for all things and don't believe in things they can't explain, i can only respond with the following passage: "Claiming themselves to be wise, they were made as fools". Just because we can't physically see moisture and condensation being wicked up into the atmosphere and collecting in clouds, that does not mean that it doesn't rain. Just because they believed the Earth to be flat, people did not fall off the edge when travelling "too far" in one direction. Just because they believed that the Earth was the center of the universe, the planets in our solar system did not stop revolving around the Sun. As such, just because we as humans don't understand or have the knowledge to explain does not mean "it is so". If some of you can't grasp the reality of that and think that mankind knows all that we need to know, i feel sorry for you. Sean
>
It was like that with me. Some of the things that I have purchased are expensive, to me that is, probably not to some of you. I would look at the new expensive thing and want to like it. I've not just listened for a few minutes either. Try months. About the time I'm sure that I now have something better, I bring my wife into the game. She knows nothing about this stuff, could care less. Put simply: When I'm doing the switching I can tell the difference every time. When my wife does it I fail half the time.
I don't mind the double blind reviews. The ones that bother me are the ones that eventually prove to be the product of double deaf.
Most people who claim to hear differences in cables, or whatever, are doing something wrong in their methodology. Most likely, they are not level matching to within 0.1 dB.
This is essential for fair play.

If there is a difference between cables, those differences can be explained by two things: the RCL characteristics and the cleansing effects that you get when unplugging and replugging cables.

When you know what you are listening to, you want very badly to hear a difference, especially if you just paid hundreds or thousands of dollars for a few feet of wire.

We wouldn't be having this conversation if listeners would quit using terms like 'jaw dropping' and such.

I've heard differences in noise levels and other such anomalies in phono sections, but I just don't hear the things some claim to hear.

Jim
to audition equipment in their home with their equipment using the DBT method. Thus controversy exists.
Gs5556, of course we did that. Do you think that a couple of young audio salesmen would not try that trick? We tried fooling each other any way we could. That was part of the fun.
Because when you DBT, some people will hear differences where none exist and some will not hear differences when they do exist. This makes any so-called scientific study invalid since it cannot show which is right - those who say that there are no differences or those who say there are. It is much harder (maybe impossible) to prove by testing that you cannot discern differences than to say you can.

Why? I think it's because the brain will discern differences in sound when it wants to. It's part of our survival instinct (a police officer chasing an armed person can think a twig snapping is a gun being cocked and will react defensively). Also, not to pick on Twl, if you take Twl and his co-workers' informal scientific, but fun, experiment: if I were to tell each of them that I replaced one component when I in fact did not, I would bet that more than one (maybe all) would have identified something as being changed. It's the way the brain works - if it thinks there is a difference, it will try its best to find one (especially with the peer-pressure factor thrown in). Conversely, if it thinks there is no difference, whether out of prejudice or justification, then it won't find one. But if you tell the brain there may be or may not be a difference, then it's up to the listening skills to be invoked - and that too varies wildly among people, to the point where one person may hear a difference in a component in a DBT and another may not. That's the controversy IMO.
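
The two kinds of mistake described above (hearing a difference that isn't there, and missing one that is) can at least be put into numbers. Below is a rough Python sketch, assuming a simple forced-choice test where pure guessing is right half the time. The 16 trials, the pass mark of 12, and the 70% "true" hit rate are illustrative assumptions, not figures from any real test.

```python
from math import comb

def tail_prob(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def error_rates(n: int, k_min: int, true_hit_rate: float) -> tuple[float, float]:
    """Return (alpha, beta) for a test that 'passes' at k_min or more correct.

    alpha: chance a pure guesser (p = 0.5) passes anyway (false positive).
    beta:  chance a listener with the given true hit rate fails (false negative).
    """
    alpha = tail_prob(n, k_min, 0.5)
    beta = 1 - tail_prob(n, k_min, true_hit_rate)
    return alpha, beta

# Assumed example: 16 trials, pass at 12 or more correct, and a listener who
# really does hear the difference 70% of the time.
alpha, beta = error_rates(n=16, k_min=12, true_hit_rate=0.7)
print(f"false positive rate: {alpha:.3f}")
print(f"false negative rate: {beta:.3f}")
```

With these assumed numbers the false positive rate is small (under 5%), but the false negative rate comes out a bit above 50%, so a single short test says much more when it is passed than when it is failed.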
Labtec, just because I have used blind testing in the past, does not mean that I consider it terribly useful. I can much more easily get the answer I need regarding the performance of a product, by just doing a simple listening test, and dispensing with all the blindfolds, and mystique of blind testing. It is simply not necessary for me. If I find that something sounds so close to what I have, that I can't really tell, I don't really need to be considering an upgrade to that product. Very simple really.

As far as blind testing being used by others, I don't think that it is used for testing the sound of equipment by very many people at all. I think that it is used for the purposes I stated, which is to "refer" to blind-testing, as a diversion from the real issue.

I am making a distinction here, between actual blind testing, and the way it is "referred to" in the context of this discussion. It is referred to as "scientific" reasoning that blind testing will show that audiophiles cannot hear what they claim to hear, and that is what I take issue with. It has been my experience that blind testing actually bears out the statements that audiophiles make about hearing differences in equipment. Not in all cases, because some equipment is not different enough for many, or any, to hear. In most cases that I have seen, it is different enough to hear.

I have never claimed that there are differences in ALL cases, and this is borne out in regular informal listening tests as well as it may be in blind testing.

So, I feel that blind testing is not yielding any more information than we already can obtain by simple listening testing. But I also feel that blind testing is being used for an entirely different purpose, in the context of these discussions. In these discussions, it is often claimed that no difference in sound exists between low and high priced cables, or amplifiers, which are the 2 items most often cited in this context. It is then generally stated that if we were to be placed in a blind testing situation, that we would not be able to tell the difference, and thus are wasting money by buying higher priced "audiophile" cables and amplifiers. I feel that since it is obvious, not only by my experience, but by simple empirical inspection, that items made with different designs, and different components will have a different sound, that this line of argument is not related to equipment at all, but must have another purpose. And I have stated what I feel that purpose is.

But it also has another effect. The haranguing of audiophiles that they are imagining things, and "scientific" testing will prove that they are incapable of making sonic decisions, serves only to shake the confidence of a listener, and make him feel that he is imagining things that he is actually hearing. This is a destructive effect, that may in fact lead him to make a bad decision regarding the quality of his audio gear, if he is sufficiently swayed by this line of thinking. If there is truly anyone here who thinks that lamp cord is the best speaker cable obtainable, then they need to take up another hobby.

This is really an extension of a 25-year-old battle, which started out in the 70s with the "measurement" people. They read a couple of ads, and thought they knew everything. Then they went around proclaiming that a Technics receiver would be as good as anything you could buy, because it had 0.0000001% distortion, which we now know was provided by gross amounts of negative feedback, and had hugely destructive effects on the sound quality. But, did this stop the measurement people from running around telling everyone that they were wasting their money on "audiophile" amplifiers? No. In fact they continued to rant about how no high-end amp could be worth the money, when it had higher "measured" distortion than the Technics receiver. And that anything different we heard in these high-end amps was either our imaginations, or it was "euphonic" types of distortion, which is some unknown type of distortion which somehow makes us like the sound more, while still actually being bad. They are still at it with this one. I hear references to "euphonic" distortion all the time on these pages, whereby the implication is made that a tube amp cannot be accurate, because of distortion measurements. After the world caught on to the gag, these measurement people had to "hole-up" for a while. Now they are out again in full force with the cable thing, and they are throwing the amps in again too, in case they can get any more "mileage" out of that one.

While certain measurements may be a decent indicator of whether a component will perform in certain ways, the final arbiter is the ear. This equipment is made to be listened to by ears, and the ears are the judge, regardless of whether the measurements line up or not. The most perfectly measuring amp in the world is not worth a thing, if it sounds like crap. Conversely, an amp that sounds fantastic is worth plenty, regardless of what the measurements are.

This is the same thing as we are seeing here. And the same techniques are being used. Point at a measurement or "scientific" test, and proclaim that anything that you hear that doesn't line up with this data is imaginary. And to make matters worse, proclaim that if there is no measurable data, it cannot exist. This is not scientific at all. In the case of blind testing, it is used as a "scare tactic" to make audiophiles think that if they were actually to be placed in a blind testing position, they would fail miserably, and that their entire system is nothing more than an imaginary delusion.

Why would people do this? It serves no purpose but one. To justify their decisions to not make high end purchases, by concocting this story. It truly amazes me that this gets any mileage at all. It is so patently absurd. It flies in the face of what is obvious. They want you to disregard what you experience, and replace it with a reliance on some number, or worse yet, replace it with another school of thought, based upon the "fear" that they try to instill in you that you could not "hold up" under the scrutiny of this blind testing. I find this very objectionable. For someone to imply to me, that I am incapable of making my own decisions based on my own experience, is not going to fly too far with me.

You can all make up your own minds on this matter. And anyone who has the schools of thought that I negatively referred to has every right to maintain them. They have every right to express them. As do I. It is up to you to separate the wheat from the chaff. And the decisions you make will ultimately define what your system sounds like.
I advocate this testing as a way of attempting to control variables in subjective evaluations, not as a means of disproving the merits of high-end, high-priced components.
TWL, I'm not following your logic or maybe I just didn't read it carefully enough.

First, you make a point that DBT proved to be virtually the same as non-DBT for you. (BTW, congrats on the ~ 100% results. I wish more "reviewers" would do the same exercise.)

Then, you basically claim that DBT is primarily for people that want to save money (or claim that their lower priced system is just as good as any other.) This seems to be a stretch to me.

First, I doubt that most DBT proponents advocate it to justify their inferior system. You're taking a cynical view of an opposing viewpoint and making a generalization. Regardless, even for those who use DBT to solely justify a lower priced product, what other method would you recommend that is better? You, yourself, said the results were identical.

Bottom line -- DBT is either an accurate test or not. If it isn't, then your 100% non-DBT concurrence with DBT tests would be a bad thing wouldn't it???
Blind and double-blind is a way, as we all know, to attempt to remove the subjectivity from what is hoped to be an objective evaluation. It's great for testing some things in which there is a clear hierarchy of "poor" to "excellent." Eyes, ears, and taste buds, however, don't conform to objectivity. Try a blind wine-tasting to see how many diverse opinions are being swished around the tongue and spat out into the bucket.

I tried a blind (but not deaf!) test last night, as it so happens, of some Vampire vs. Nordost vs. van den Hul interconnects.

It's REALLY hard to change cables with one's eyes shut! I knocked my turntable onto the floor, stepped onto the open tray of my CD player, connected the phono stage to the tape deck, smashed my 180gm Dylan LP, and shoved my index finger through the cone of my Cabasse.

After all that, all the cables sounded like s**t. In truth, the van den Hul bested them all, followed (at some distance) by the other two. The $15 Vampire sounded no different than the $150 Nordost.

Blind testing is fun, as Twl notes, but very difficult to implement in real life.
What an interesting thread. I've often wondered if there would be value in SB/DB testing and why it isn't done more often. It seems to me that the reason for such tests is twofold: firstly, to identify on an objective basis specific differences between components - e.g. transparency, ambiance, soundstage, resolution, focus, brightness, transient attack, clarity, etc.; and secondly, to identify which component gives greater musicality over a period of time - e.g. which gives the most pleasure, which is nearest to the original recording, which is nearest to live sound. The problem is that everyone hears sound differently. That's why some like the sound of the Festival Hall, and can't bear the sound of the same orchestra playing the same piece under the same conductor in the Albert Hall (I live in the UK). I suggest the success or failure of SB/DB testing would depend on its ability to allow the listener to consistently pick out the component which gave them the greatest pleasure, or produced the specific type of ambiance, transparency, etc. that they wanted.

Most people can tell the difference between listening to components placed on a bog standard shelf in the living room, or on specifically designed audio stands such as the Sistrum, or Townshend: whether they prefer one over the others is dependent on how they hear sound. It's the same with cable burn-in: I defy anyone with normal hearing not to agree there is a difference between a fully burnt-in cable and a virgin cable. But that difference may be totally unimportant to them.

So maybe SB/DB testing at an individual level would have value, if one had the time and the money to spend on it. Otherwise, perhaps the knowledge - gained by quick A/B comparison, reviews, and most importantly people's opinions on forums such as this one - that component A has slightly more of what we want in terms of ambiance, transparency, etc. is about as far as we can reasonably expect to go, and very probably good enough for all but the most "golden-eared" of us.
Twl, thanks for taking the time to clearly articulate what is obvious to many of us: some products are superior to others, and the decision is usually whether the improvement is worth the cost. Also, thank God, there ARE superior products that cost less than the competition. Then there is also the issue of synergy and the proper matching of components. Judging from your system profile, it's obvious you are doing your homework. God bless.

TRUTH is controversial, to be sure - follow this link Audio Asylum post concerning hyperbole
I have done blind testing myself, on many occasions, as I have mentioned before, in an earlier thread on this subject. When I worked in a high-end audio shop, during slow times when nobody was in the store, the other employees and I would do it for fun. It was like a contest. We would set up equipment when 1 person was out of the room, blindfold him, and bring him in to listen. Then we would see how many components in the system he could identify. Everyone was surprisingly good at this, but nobody was 100%. Of course, we were going for multiple components at the same time. With only 1 component in a known system, we all could identify it virtually every time. Now, we were very familiar with the sound of all of the gear we tested this way, so that made it easier. But, it wasn't that tough to do. It was in a fairly relaxed environment with no real pressure. Just fun.

So, I am not afraid of what would be the result of these tests, in my case. I cannot say I could be 100% in any test, but I am sure that I can be accurate enough to satisfy any tester that I am not guessing. Even with unknown equipment, I can identify differences accurately, in a short time span. I think many people can. I also think that some people cannot. This would not be the lack of difference in equipment, but a difference in people. It is not scientific to label the results as "no difference" in equipment, when it is an inability on the part of the listener to perceive the difference. Many people think a car radio sounds like a good audio system. Clearly they are not listening to the entire presentation, but just the superficial aspects of the sound. They are not aware of how to listen for "differences" in the sounds of items. They are simply "superficial listeners". This type of listener cannot be relied upon to discern differences in equipment. They listen as "background music". A person who knows what to listen for will easily tell the differences in components.

And again, I will state that I think that this whole issue is a "red herring", that is brought up by those who have convinced themselves that there is no difference in equipment, or cables, or whatever, in order to satisfy their own minds that they do not have to spend money on such things, and still have the best. In my opinion, the desire to save money is stronger, even in the most ardent audiophile, than the desire to spend a lot of money fruitlessly. I would say that I, and everyone else here, would rather spend less, to get the same performance, if we could do it. And there are ways to do it. The most expensive item is not always the best sounding item. But sometimes it is. And making claims that "in a real scientific environment we couldn't tell the difference" is simply a diversion.

In virtually every case where this is mentioned, it is in a context of "wasting money on snake oil". This is the crux of the matter. It is a matter of justifying expenditures.

So now, we have a whole different scenario. Now we have a subject brought up, which is the real center of the matter, which is, "Is it worth it to me to spend a lot of money to get a certain level of performance increase, when I am not sure of the outcome?"

There's such a plethora of cables and components, and claims of grandeur, that some people cannot cope with it, and punt. Instead they divert their attentions to claiming that there is no difference between these things, and stick to it. They use a "scientific" argument that they know nobody is going to use, to back-up their idea.

In some cases, they will be right, and there will be no noticeable differences in some items. This only adds to the confusion, because it lends credence to the extrapolation that there is no difference in anything.

If these folks want to believe that, then that is their prerogative, and they are entitled to believe that. But to tell the rest of us that we are "deluded" by our unquenchable desires to spend money, that we would manufacture these differences in our heads, just so we can spend more money, is not passing the "smell" test.

I'll use myself as an example. I "claim" that I can hear differences. But I don't want to spend any more than I have to, in order to get the sound quality I want. I can honestly say that I may hear very small, or even no, sonic difference between certain items that have significantly different prices. When I arrive at a situation like that, I call the lower priced item a "bargain", and I buy that one. Or I may say, "I like that one better, but the small difference is not worth the extra money to me." Isn't that a more "measured" approach than what we are seeing here? That is what everyone else does. When my girlfriend goes shopping for a dress, and she sees one she likes for a lot less than a similar one, she remarks on what a "good deal" she got. She doesn't come home proclaiming that "there is no difference between dresses" so she bought the cheapest one in the store. After all, they all perform the same function of a covering, right?

All of this is much ado about nothing. People will buy what they want. Then they will justify it to themselves, or others. That is life.
Well said Bomarc. I'll go a little further along that road. Most people who argue in favor of DBT's in these threads imply that audiophiles don't want to discover that they can't tell one thing from another. I think audiophiles believe that DBT's are not good because that's what they've been told by the buff mag writers (aka "reviewers," but really just story tellers), who don't want to be exposed. You see, Sean, in a DBT, Mikey can't tell those preamps apart either.

There are reviewers with real scientific credentials and experience who advocate and use DBT's, but not many.

The tests don't have to be short. You can listen to whole pieces of music. And you can start by familiarizing yourself with the components under test until you are sure of what it is that makes them sound different, then try to tell them apart blind, not trying to prove there is no difference, but to confirm your hypothesis that they are different. At home, one obvious difficulty is exact matching of SPLs.
We've covered this before. As such, all i'll add to this thread is that Craig and Albert cracked me up. If DBT threads are going to be good for laughs like that, bring them on : ) Sean
The use of blind and double-blind procedures presumes one is employing the logic of hypothesis testing. That is, that there is a null hypothesis (i.e., that there are no differences between two treatments, in this case two sets of interconnects) and an alternate hypothesis (there is indeed a difference). Experimenters are more than experimental custodians. Their biases and expectations can profoundly influence a study. To the extent that all people (including experimenters) have biases, one would double-blind the treatments to reduce among other things "experimenter effects." It’s surprisingly easy for an experimenter to influence a study (e.g., Stanley Milgram’s famous obedience studies). It is also easy for other participants (formerly known as “subjects”) to influence each other (e.g., Asch’s line judgment experiments where participants tended to agree with Asch’s confederates that clearly dissimilar lines were the same).

There is a famous researcher/psychologist/statistician by the name of Robert Rosenthal who once told his students that he had obtained two breeds of rats from another famous researcher. One type of rat was called “maze smart” and the other was “maze dull.” Dr. Rosenthal asked the students to teach these rats to run through mazes (ah, the power of cheese). After a few weeks or so the students were asked to show off their rats’ maze prowess (as it were). The “maze smart” rats performed significantly better than their “dumb” counterparts. The kicker here is that the rats were ALL OF THE SAME BREED. One cannot infer that the students intentionally influenced the training, but the effect most certainly was measurable. Moreover, when the experimenter bias was measured it turned out that the “smart” rats’ owners had "imparted" a greater positive measurement bias than did the “dumb” rats’ owners negative measurement bias.

There are probably much better examples than these, but I’m in a hurry to go downstairs for dinner :-) so I’ll wrap this up soon.

Something else to consider is that “different” does not mean “better.” People’s ability to remember sounds and colors varies greatly but rarely is the memory accurate after a short decay period. With audio equipment evaluation, it tends to result in a bias for a certain “sound” regardless of whether or not that sound is authentic. When it comes to making a decision as to whether one component is better than another, it probably makes the most sense to have a reference. In the case of audio, I’d say that reference should be THE REAL THING. It’s not practical to have a live orchestra tag along on equipment tests but it doesn’t hurt to keep that in mind. Some people go on and on about how they prefer one cable to another because their favorite is “warm” or whatever. Real sounds from an orchestra or a band are not necessarily “warm.”

All that said, if one believes that a $6,000 set of interconnects sounds better (they just might sound *different*) than a $70 pair then let ‘em. The more expensive cable might even sound closer to reality. One would hope that the more expensive cables aren’t just mostly cosmetics and markup.

--Paul

p.s. and yes, spending time with a set of cables or anything else in the system is a great way to know if one really likes the sound. On a marginally related note, a friend of mine once said “I’ve never owned a handheld device that I liked after having it for a week.”
Well, you don't need to use the blind test to make your decision. The idea is to isolate, at least for the period of the test, the contribution of knowing who the manufacturer is, what the product costs, what it looks like, and so forth. Getting those variables out of the way at some point during your evaluation, even briefly, might be helpful, don't you think? Doesn't mean you won't choose the higher priced product in the end, but at least you will have the benefit of some calibration between what you hear and what you perhaps expect or hope to hear.
Tobias and Redkiwi hit it on the head. It takes prolonged exposure (post burn-in, which a lot of A/B's may not do) to really absorb the differences. "Night and day differences" aside, most A/B's are a matter of trade-offs -- e.g., a little more bass punch vs. a little less definition. What seems very nice in the short term may not be preferred longer term. What seems like a strength listening to one recording may not seem so while listening to another. In reality, all of our choices are A/B. We are replacing what we have with something else we believe is "better" after prolonged listening. That's why companies offer 30 day trials rather than 30 minute trials. From my understanding, most A/B tests tend to be 3 minute trials. Sorry, I'm just not that good.

By the way, none of this is to say that I don't harbor a secret fear that I'd pick Bose ;-) But, I have heard too many say that, with prolonged exposure, they can pick out component signatures from a distance without seeing the actual unit/cable being used.