My Analogy Must Be Wrong

Here's a thought experiment.

Flip a quarter 20 times. It comes up heads 15 times. Hmmm. I was expecting heads 10 times.

Flip the same quarter 100 times. It comes up heads 65 times. Well, that's closer to what I expected.

Now flip the same quarter 500 times. It comes up heads 275 times. Well, closer still.

The lesson is that more data points are good. They allow trends to be seen and increase our confidence level.
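For anyone who wants to put numbers on the quarter example, here is a minimal Python sketch. It assumes a fair coin (a 50% chance of heads, which is an assumption, not something the flips prove) and uses the textbook standard error of a proportion, sqrt(0.5 * 0.5 / n). The function name heads_summary is made up for illustration.

```python
import math

def heads_summary(heads, flips):
    """Return (observed share of heads, standard error assuming a fair coin)."""
    share = heads / flips
    # Standard error of a proportion when the true chance of heads is 0.5
    std_err = math.sqrt(0.5 * 0.5 / flips)
    return share, std_err

# The three runs from the thought experiment: (heads, flips)
runs = [(15, 20), (65, 100), (275, 500)]

for heads, flips in runs:
    share, std_err = heads_summary(heads, flips)
    print(f"{flips:>3} flips: {share:.0%} heads, typical wobble +/- {std_err:.1%}")
```

The observed share drifts toward one half as the flips pile up, and the expected wobble shrinks with the square root of the number of flips.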

Here are two situations to consider.

Scenario 1a: Have 1 person judge the goodness of 10 amps.
Scenario 1b: Have 10 people judge the goodness of 10 amps.
Scenario 1c: Have 100 people judge the goodness of 10 amps.

Of the 3 scenarios, which one would you place the most confidence in? Here's another situation to consider.

Scenario 2a: Have 1 person judge the goodness of 100 amps.
Scenario 2b: Have 10 people judge the goodness of 100 amps.
Scenario 2c: Have 100 people judge the goodness of 100 amps.

Of the 3 scenarios, which one would you place the most confidence in?

Based on the lesson with a quarter, Scenario 1c should warrant more confidence than 1a and 1b, and Scenario 2c should warrant more confidence than 2a and 2b.

Should Scenario 2c warrant more confidence than 1c? Since they have the same number of data points - 100 people in both cases - I don't see how.

Well, what about person A having listened to 20 amps? How much confidence would you place in that person's judgement? It's a single data point, so it's better than no data, but it effectively has no value whatsoever. It's the equivalent of flipping a quarter one time, having it come up heads, and concluding it's a two-headed coin.

Well, what about person B having listened to 200 amps? How much confidence would you place in that person's judgement? Again, it's a single data point, so it's better than no data, but it effectively has no value whatsoever.

Does the judgement of person B with 10 times the listening experience warrant any more confidence from you than person A?
I consulted my Magic 8 Ball and it told me "reply hazy, try again."
I don't pay much attention to testimonials. I try to give technical data to support my contentions, because what one person likes can understandably be totally different from what another likes. In other words, I try to remove the coin toss factor by giving valid reasons or justifications for the statements I make. This often includes links to testimonials by well-known and highly respected "experts", or a link to test measurements.

As for self-proclaimed "experts" who just make blanket statements that A is better than B, and that if you can't hear the difference you are either a moron or have such a crap system you can't tell: to me, these kinds of condescending statements provide very little value.
The moral of the story here is:

Trust your own ears...and you won't have to worry about the coin flip or test data points!
Someone once said: "A smart man knows what he knows, but a wise man knows what he knows, and knows what he doesn't know."

Now I'm not trying to sound like a "wise guy"....but I don't know?

Good question though.

The analogy is not appropriate. Results of coin flips are completely random whereas results of amp auditions are completely subjective with the outcomes dependent on a number of variables such as listener biases/skills. I agree with trusting your own ears.
I have wrestled with this concept for many years, on and off, and it assumes that all of the data points are of equal value. Some of the data will be more insightful and/or reliable than the rest, and therefore skew one's assumptions accordingly. In other words, sometimes one person's evaluation of a piece (or pieces) of gear will have more value than another's. Life is essentially a bell curve event: a few evaluations will be misleading, a few will be extremely valuable, and most will leave you more confused! Even so, I firmly believe that there are certain fundamental characteristics of well-reproduced sound that should be important to all so-called "Audiophiles", i.e. accurate tone compared to live acoustic instruments, frequency extension covering said instruments, soundstaging and clarity (transparency/focus), dynamics (micro and macro), and lack of compression/hardening of the soundfield. Most people can recognise live music when they hear it...the closer a Hi Fi system can come to capturing that ephemeral quality, the better it is!
Like Ray said, "How does it SOUND, man?"
Trust your own ears, not others.
No...your analogy is correct. The "law of large numbers" means that with many independent listeners, individual errors tend to cancel out, so there is little likelihood that they all distort the result in the same direction.

Even if the listeners used are tin ear junkophiles.

For example, if a ruler factory puts out defective rulers -- some too long, some too short -- a single measurement of two objects using two rulers might lead someone to conclude that a tennis ball is 100 miles high and that a T-rex is only 2 inches tall. However, it is very unlikely that 100 people using 100 different defective rulers will always measure the tennis ball with the "it's big" rulers and the T-rex with the "it's small" rulers. More likely, there will be some mixing and matching, and even with radically defective rulers, it's likely that there will be, at worst, an inconclusive result. To analogize to audio: if 100 people listen to 100 amps, and they all think that the Cary MB500 is awesome and the Rowland 201 is a little anemic given its ostensible power rating (as I do...), there might be something to that finding. If 2 people listen to 100 amps, and both agree that Cary500>Rowland201, that preference may be idiosyncratic. However, if 100 people listen to two amps and conclude that Cary500>Rowland201, there's probably something to that preference. If, however, half prefer one and half prefer the other, you also have excellent data -- that the choice of amp is a personal preference.
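To put a rough number on "there's probably something to that preference", here is a small Python sketch. It assumes every listener is an independent coin flip between the two amps (p = 0.5 means no real difference), which is a simplifying assumption and not a claim about real listeners. The function name chance_of_agreement is made up for illustration.

```python
from math import comb

def chance_of_agreement(n_listeners, n_prefer, p=0.5):
    """Probability that at least n_prefer of n_listeners pick amp A
    when each listener independently picks A with probability p."""
    return sum(
        comb(n_listeners, k) * p**k * (1 - p) ** (n_listeners - k)
        for k in range(n_prefer, n_listeners + 1)
    )

# Two listeners agreeing on amp A by luck alone is not rare at all
print(f"2 of 2 by chance:     {chance_of_agreement(2, 2):.2f}")
# One hundred listeners all agreeing by luck alone is practically impossible
print(f"100 of 100 by chance: {chance_of_agreement(100, 100):.1e}")
# A half-and-half split is exactly what pure personal preference would look like
print(f"50+ of 100 by chance: {chance_of_agreement(100, 50):.2f}")
```

Under that assumption, two people agreeing tells you very little, while a hundred people agreeing is very hard to explain as luck.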

So there you go. For more on this, check out the books "Innumeracy" and "Calculated Risks: How to Know When Numbers Deceive You."

But you should still listen to gear before buying it, if you can.

There is a kind of reality-based analogy to your scenario: the test of time. Look at the prices old Mac gear gets, for example. This is the result of 1000s of people 'voting'. Unfortunately, this method takes.... time.
I wouldn't place confidence in any of the scenarios. We buy based on what sounds best to our ears - and it may or may not be what the consensus proclaims is the "best". Those hundred-people test points have different preferences and may listen at different volumes, who knows...

But to play the mathematics game, I would choose 1c, provided that each of the 100 listens to the 10 amps in a different or random order. Otherwise everyone will say the last one or two sound the best. Law of Recency, and all that.
First, 100 people listening to 10 amps and each choosing 1 best amp means you have 100 choices (top rank points) out of 1000 samples. 100 people listening to 100 amps still gives you 100 top-rank points, with 10,000 samples behind them. Sonically, the result may be the same; mathematically, it is not.
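A quick Python sketch of the "different or random order" idea, for what it's worth. It only shuffles the listening order per listener; the amp labels, the function name listening_orders, and the seed parameter are all made up for illustration.

```python
import random

def listening_orders(amps, n_listeners, seed=None):
    """Give each listener an independent random listening order,
    so no single amp always lands in the favoured last slot."""
    rng = random.Random(seed)
    # rng.sample with k=len(amps) returns a shuffled copy and leaves amps untouched
    return [rng.sample(amps, k=len(amps)) for _ in range(n_listeners)]

# Hypothetical labels for the 10 amps in Scenario 1c
amps = [f"amp_{i}" for i in range(1, 11)]
orders = listening_orders(amps, n_listeners=100, seed=1)

# Count how often each amp ends up being heard last
last_counts = {amp: 0 for amp in amps}
for order in orders:
    last_counts[order[-1]] += 1
print(last_counts)
```

With random orders the "heard last" slot gets spread across the amps, so a recency effect adds noise instead of favouring one particular amp.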

I am not sure the "analogy" works in any case because people bring so many other biases to the table (as is evident from the responses so far).

If you start with 2 amps, and ask 1 person, then 10 people, then 100,000 people which one is better, which answer will you "trust"? The answer is going to depend on whether you put your faith in crowds or not. If you trust a large-scale average subjective opinion more than one person's, the answer is clear. Next, extend the amp sample to 3 amps, then 6 amps, and assume there is no "Law of Recency" effect. If it is top amp of 2 or top amp of 6, or top amp of 73, do you trust that "topness" more with one set or the other?

The perfect rationalist would say large experimenter sample + large choice sample = best result. The rest of us are probably too biased for it to make much of a difference.
Oh, sorry. I thought this was going to be about Oversampling CD Players...
Is this a trick question? The answer seems to me to clearly depend on whether the observer's observations get better, get worse, or stay the same as a result of experiencing more amps. If the observer learns nothing then the answer to your question is obvious. If the observer becomes a better observer as a result of observing more amps then the answer is obvious. And if the observer gets confused and tired as a result of observing more amps, then again the answer is obvious. But I could have missed something through having read too many posts tonight.
Flipping quarters and listening to amplifiers. This is my kind of world. Sure beats chopping wood, carrying water and planting beans. I wonder what they talked about in the evenings.....
Forgot the obvious: no one piece of gear can be evaluated without hooking it up to other gear under variable conditions. This is not unlike quantum physics: once you observe the system, the system is altered. The best we can do is use all our resources, match them to our needs and then experience the choices under as many conditions as possible. Looking back, it was far more fun being an impulsive neophyte who chose by budget and gut and then was blissfully happy!!
I like the comparison of High Fidelity audio to Quantum Physics. Makes me feel smart!

The test can be done by replacing pennies with people, preferably politicians, since they are two-faced and flip. But the results will be suspect.

I would never trust 10 or even 100 people "off the street" to determine what good sound is. In my experience large groups do not have the ability to quickly and accurately determine what is salient. In college early on I attended a group study session precisely one time. I found that the group felt the need to cover all the information without regard to whether that information was actually important. I could consistently through school know what a prof. would or would not consider relevant information. I could say, "This is important; he'll want me to know that," or "This is superfluous and not worth memorizing..." The group had no ability to do so as there were always people who didn't get it, couldn't see what was critical information and what wasn't. So, they tried to consider it all as critical and thereby wasted incredible amounts of time.

Now, imagine a group trying to assess sound the same way, with no clue what is good and what is poor. You'd have people who are tone deaf, people who have never owned a stereo in their life, people who live on iPod sound, people who are blowing their ears out with car audio bass, people who think 8 watts and a six inch full range driver is good sound (HA! Just kidding, all you SET fans!)... And that's the group that's supposed to dictate what sounds good? NO WAY!
Remember, there is Mean Reversion in statistical analysis. So, according to your testing methodology what would be revealed is the most average sounding amp - what appeals to the masses, not necessarily the best sounding amp according to an individual's preferences.

As has also been pointed out, so what if an amp sounds good to 100 people with specific gear? Change the associated gear and the entire test is worthless. So the cross application of the information is worth about nil. Further, because the persons setting up the test may have followed James Randi and used stock power cords, the ENTIRE ENDEAVOR IS FRAUGHT WITH ERROR! This is because anyone who knows anything about audio knows that aftermarket power cords (especially really expensive ones!) are necessary to actually HEAR what an amp does! All attempts to assess equipment without proper PCs are fruitless! ;)

When I go to buy a car, I don't ask 100 people what they think of it. I do look at Consumer Reports for one thing: reliability. That's not a factor in the hypothesized scenario, but it would be the most logical factor if it were sought.

And we haven't even considered the reality that initial impressions of sound change over time. Why base one's choice of equipment on strangers who have a very brief encounter with gear in an environment not familiar? The ideal is to hear gear in one's own environment.

If I found one person who consistently heard things (NO, not THAT KIND of hearing things, i.e. voices!) the way I do, who had experience with a range of equipment (e.g. 20 amps) and who had an intimate knowledge of what I prefer, I think I would trust their judgment over any large group system of evaluation.

Not having found that person, I trust my ears above all other input. I do have some friends who understand my preferred sound, but they do not have experience with enough equipment to make such judgments.

So, the "Mindless Masses Method" is not too appealing to me. Maybe this is why reviewing was invented? (Disclaimer: YES, I'm a reviewer)
Douglas has augmented my views in a complementary fashion. Power cords can make or break a component/system, as can the room you inhabit for listening. What bothers me about most reviewers, however, is just how ill-conceived and patched together many of their systems tend to be! One reviewer comes to mind who recently got up enough scratch to afford an MIT Magnum interconnect...sheeesh, and he was reviewing high end gear? Some reviewers' residences leave quite a bit to be desired as well. The biggest problem for reviewers seems to be their ability to tell the truth. Most mainstream reviewers bend over backwards to accommodate manufacturers' design flaws. In other words, a piece of crap is seldom called a piece of crap; it is presented as unique and for the few who can appreciate what it does do right. I am a would-be reviewer (if I didn't have a full-time job taking care of my 3 girls), but I understand that with the first review I submitted that did not conform to the unwritten rule of not rockin' the boat (not pissin' off the advertisers), I would be out of work! What say you, Doug?
and to top it off they all have their own agenda! $$$$$$$$
I say there's very little in high end audio that should be called "a piece of crap", especially since usually a fairly fine sounding combination of gear can be made with nearly every piece I have reviewed.

The vast range of tastes among audiophiles needs to be taken into account as well. I have reviewed and enjoyed hearing some components which I would never own, since I do not care for that sound. There are very strong feelings about what sounds "right", ranging from low powered SET amps with extremely high efficiency speakers to multi-amped monster speakers with arrays of drivers. And then there are all the differences in opinion regarding sources, cabling, etc. So, literally, just because I do not care for a sound does not mean others would not like it. I try to keep from condemning a piece unless it has obvious build quality or operational issues.

I want to build a reputation based on direct, honest description of what I hear of the sonic nature of pieces and their strengths/weaknesses. Virtually no pieces I have seen are "crap". Some are less of a value, but crap?

I have worked hard to ramp up my equipment in a short period of time. It is not difficult for a reviewer to spend tens of thousands of dollars to acquire their reference system, something which is not easy for the majority of people to finance. Consider that when changing components it is not hard to have an imbalance between the perceived absolute quality of them, i.e. a $300 IC and a $6k cdp. It takes years to find the absolute best, and be able to afford it!

This also is supposed to be an "average guy" hobby, and if the reviewers are elitist and have gear priced stratospherically it reinforces the stereotype of the "out of touch" audiophile as well as the idea that great gear can't be found under X dollars.

Just some thoughts.
Average audio enthusiasts have limited exposure to top audio. I wouldn't value their opinion too much. In addition, if you pick anything that everybody likes (movies, art, etc.), it probably won't be very interesting. The fact that 100 people liked something doesn't mean anything. I prefer an expert's opinion before I listen myself.

Don't bring statistics to audio. It was already proven that tattoos are a major cause of motorcycle accidents.
Douglas, by "crap" I mean an overpriced piece of gear that proclaims superiority over mass-marketed gear but does not deliver on the basic set of parameters all gear should attempt to address. Granted, most gear may measure up well enough not to be included in the crap category, but when a stinker hits the reviewer, or the price of admission is above and beyond the norm vs its capabilities...well, one should be man enough to throw up the window and say, "I'm mad as hell and I'm not gonna recommend you buy this gear!" Apologizing for a manufacturer does no one any good. By the way, someone reviewing gear should not be discovering or piecing together equipment for the first time. Time, gear and experience come first. I'd rather hear what someone with a well-matched reference system has to say than the trivial anecdotal ramblings of an overly excited, unseasoned would-be audiophile.
Dave, you haven't seen this kind of article very often?
Kudos to you, Douglas. My audiophile experience has taught me first hand how to read between the lines in most reviews. There are a handful of reviewers around, mostly on the web, whom I respect and who are consistent in their evaluations. Keep up the good work!
Mr. Schroeder made some excellent points. He also made what I have found to be an error. I wouldn't depend on Consumer Reports for anything. Their method of reviewing puts far too much emphasis on the cost of items. I have read reviews of vacuums and lawnmowers, as examples, where they were so far off the mark it wasn't funny. In both cases I think they chose Sears products. I went with my Lawn-Boy and am still using the 21-year-old overpriced lawnmower (it still starts on the 1st pull).

Age and experience have taught me much in regards to audio. 1) One man's holy grail won't be everyone's, 2) I don't care what anybody thinks about the way audio equipment sounds (except me), and finally 3) I will probably change my mind on the matter as my human components wear out some more. Have a great weekend, guys.
In response to the obvious "trust your own ears"...

Yes, of course, in an ideal world listen to every product being manufactured with every possible permutation of associated component. Most don't have the time or money.

In lieu of the brute force approach, we typically use some sort of filtering to narrow the search and that's what this thread is about - the relative merit of a large number of reviews versus a single review regardless of the experience of the reviewers.

The flipping quarters exercise was simply a reminder.
How does a reviewer's experience provide you with any more confidence that what they consider good will also be considered good by you?
Bob, if you like the same equipment as a given reviewer, over time you would have more confidence that they can recommend "good" components for you. Much better chance of finding a winner than by random selection. Today it is relatively easy to find such reviewers. Take any piece of gear that you have heard, do a search and see what reviewer(s) say. If they are on target, that's one point for them. If they are not, in your opinion, that's one point subtracted from them.

Need three points to score goal!
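If you want to keep score the way described above, a minimal Python sketch could look like this. The function names and the example verdicts are made up for illustration; the only things taken from the post are plus one for a hit, minus one for a miss, and three points for a goal.

```python
GOAL = 3  # "Need three points to score goal!"

def reviewer_score(verdicts):
    """verdicts: list of booleans, True when the reviewer's take on a piece
    of gear matched what you heard yourself, False when it did not."""
    # One point added for each hit, one point subtracted for each miss
    return sum(1 if agreed else -1 for agreed in verdicts)

def reached_goal(verdicts, goal=GOAL):
    """True once the reviewer's running tally has reached the goal."""
    return reviewer_score(verdicts) >= goal

# Hypothetical track record: four hits and one miss on gear you know well
track_record = [True, True, False, True, True]
print(reviewer_score(track_record), reached_goal(track_record))
```

This is just bookkeeping for the same idea: a reviewer earns your confidence one matched opinion at a time, and loses it the same way.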