Amir and Blind Testing


Let me start by saying I like watching Amir from ASR, so please let’s not get harsh or the thread will be deleted. Many times, Amir has noted that when we insert a new component in our system, our brains go into (to paraphrase) “analytical mode” and we start hearing imaginary improvements. He has reiterated this many times, saying that when he switched to an expensive cable he heard improvements, but when he switched back to the cheap one, he also heard improvements, because the brain switches from “music enjoyment mode” to “analytical mode.” Following this logic, which I agree with, wouldn’t blind testing, or any A/B testing, be compromised because our brains are always in analytical mode during the test and are therefore feeding us inaccurate data? Seems to me you need to relax for a few hours at least and listen to a variety of music before your brain can accurately assess whether something is an actual improvement. Perhaps A/B testing is a strawman argument, because the human brain is not a spectrum analyzer. We are too affected by our biases to come up with any valid data. Maybe.

chayro

I read somewhere Amir is a famous Egyptian movie star who grows prize winning watermelons in his spare time. It's a confirmed fact.

Unless one does an actual blind test, every, and I mean every, change will only result in a subjective impression that won't stand up to scrutiny.

I read somewhere Amir is a famous Egyptian movie star who grows prize winning watermelons in his spare time. It's a confirmed fact.

This anecdote has merit as a credible and verifiable snippet of gossip (for the erudite, technically known as hearsay).

I remember reading somewhere (and now I forget where) that A/B testing relies on our short-term memory, which isn't all that reliable.

Not sure if that's true, but I thought it was an interesting point. It would at least explain why a lot of people like to take their time before deciding whether something sounds better/different/worse.

Amir and others on this thread are absolutely right: A/B comparisons are notoriously flawed by expectation bias; that's just how our brains work. In my profession (drug discovery) we therefore use "double-blind" evaluations, where neither the experimenter (e.g. the audio dealer) nor the patient (i.e. the customer) knows whether the patient is receiving a new treatment, a standard treatment, or (in non-critical cases) a sugar pill (i.e. placebo). Only such an evaluation would either confirm or call into question Amir's well-intended measurements, in the sense of whether or not the data he measures are relevant to human musical enjoyment and thus would indicate, before you buy it, whether a particular piece of gear enhances or diminishes such pleasure (which is, I suppose, what this exercise should be all about). The respective measurements would indeed have to track with the "enjoyment score" given after listening to a hidden piece of gear, new or old, while neither the listener nor the dealer knows what gear is being listened to. In that sense, Harry Pearson was correct in his criticism of relying solely on either measurements or A/B comparisons. He also knew that a new piece of equipment might sound spectacular at the outset, only to become fatiguing after a few hours or even days, no matter how "good" the measured data were. Psychoacoustics was a budding discipline in his early days, and we are still just beginning to understand how we make esthetic decisions, and what part THD plays in this puzzle, if any at all.
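
For anyone who wants to try a blind comparison at home, the result still has to be scored against plain guessing. Here is a minimal sketch in Python, assuming a simple forced-choice setup where each trial is a coin flip if the listener hears no difference. The function name and the 12-out-of-16 session are made up for illustration; this is not anyone's official test protocol.

```python
from math import comb


def guess_probability(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` of `trials` right
    by pure guessing, assuming a 50/50 chance on every trial."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more hits, out of 2**trials
    # equally likely guess sequences.
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials


# Hypothetical session: 12 correct identifications in 16 blind trials.
p = guess_probability(12, 16)
print(f"Chance of doing this well by guessing: {p:.4f}")
```

With those made-up numbers the chance of a pure guesser doing that well comes out a bit under 4%. It says nothing about long-term fatigue or enjoyment, only whether a difference was heard at all.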