Dec 98 issue of Sound Off


Volume VIII Number 1
June 2000

The World's first article explaining Blind-Testing and the associated mathematics. No, it's not perfect but no one else has attempted it. In other words a simplified explanation of what's involved with blind testing including the mathematics.

Imagine you are contemplating purchasing one of two amplifiers. Both have equal output power capabilities but one costs $500 and the other costs $5,000. The $5,000 amplifier has received rave reviews by the leading audiophile publication in which said $5,000 amplifier manufacturer advertises on a regular full page spread basis. The $500 amplifier manufacturer advertises in the same magazine but only occasionally with a modest 1/8 page spread in the bowels of the magazine, words only, no pictures. You want to see (hear!) if there's an audible difference between the sound reproduction capability of the two amplifiers. Or, to put it in a more familiar way, do the amplifiers sound different. To answer this question, and possibly save yourself $4,500, you proceed as follows.

First, you make sure that the two amplifiers, which we will call T and H, are set to the same level (volume) and the channels of each amplifier are balanced (the right and left channels of each amplifier are producing equal output). You listen to both amplifiers, using your favorite CDs, and swear that there's a difference in sound between the two amplifiers. Now for the tough part. You're going to determine which amplifier is playing without peeping.

Your decision as to which amplifier is playing will be based on the sound and the sound only. Comparing two amplifiers without some sort of check might result in a preference for one over the other, a preference which has nothing to do with sound quality. After all, if you make a comparison knowing which amplifier is playing you just might be swayed by looks, brand, price, weight, circuitry (tube or solid state), advertising, a rave review, and other assorted things that have nothing to do with sound quality.

Without you knowing which is which, one of the amplifiers is chosen (randomly) and you have to make a decision as to which amplifier is playing. You write down your decision. This process is repeated a minimum of ten times. At any time between decisions you are allowed to casually compare the sound of the amplifiers to get a better reading on what the sound differences are, but at decision time the identity of the amplifier that is playing is withheld from you. As you note your choice of T or H on each try, a list is also being kept of which amplifier was actually playing on each of the tries. After a minimum of ten tries you compare your list of choices to the list of which amplifier was actually playing. Most reasonable people would accept that if there is a difference between the two amplifiers then one should be able to hear this difference, i.e., your correct choices will outnumber your incorrect choices. It's at this point that statistical analysis determines (the correct word is infers) how many corrects are necessary to indicate that a difference in sound between the two amplifiers is actually audible. Now on to the math.
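For readers who like to see a procedure written out, here is a minimal Python sketch of the bookkeeping just described. It is only an illustration: the function names and the two stand-in "listeners" are made up for this example and are not part of any real test rig.

```python
import random

def run_blind_test(listener, trials=10, seed=None):
    """Run `trials` blind tries; return (answer_key, choices, corrects).

    `listener` is a hypothetical stand-in for the person being tested:
    it is handed the amplifier that is really playing ("T" or "H") and
    returns the letter that person writes down.
    """
    rng = random.Random(seed)
    # The hidden list of which amplifier was actually playing on each try.
    answer_key = [rng.choice("TH") for _ in range(trials)]
    # The listener's written decisions, one per try.
    choices = [listener(playing) for playing in answer_key]
    # Compare the two lists only after all tries are finished.
    corrects = sum(c == a for c, a in zip(choices, answer_key))
    return answer_key, choices, corrects

def perfect_listener(playing):
    # Assumption: this listener really does hear the difference every time.
    return playing

def guessing_listener(playing):
    # Assumption: this listener hears nothing and flips a mental coin.
    return random.choice("TH")

key, picks, score = run_blind_test(perfect_listener, trials=10, seed=1)
print("".join(key), "".join(picks), score)
```

The perfect listener always scores 10 out of 10. Swap in `guessing_listener` and the score wanders from run to run, which is exactly why the math that follows is needed.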

Imagine you have a coin which is perfectly balanced, has the letter H on one side and the letter T on the other side. It doesn't take an Ed Witten to figure out the following: If you flip the coin and let it land on the floor the result will be the H or T side of the coin showing. We are absolutely sure the coin will land with the H or T side showing. In the world of statistics something that positively, absolutely will occur, like our coin showing H or T after a flip, is assigned a numerical value of 1. In the jargon of the statistical world we say the probability of a coin toss resulting in an H or T showing is equal to 1.

A particularly interesting but unstated probability of 1 for every living human being (past, present, and future) has resulted in religion! To wit: The probability of yours truly (and everyone else on earth!) dying has a value of 1; it's a certainty which I don't eagerly anticipate, but whether I like it or not the probability remains a 1. This probability is so intuitively understood and so terrifying to members of the human race that something called religion was invented to cushion the impact of dying being a certainty. Religion in turn gave us heaven to further cushion the impact of dying being a certainty. As you can see, an intuitive form of statistical analysis has been around since day one, although, unlike religion, not formalized until fairly recently! But I digress...back to the real world.

Another question about our coin flipping is what is the probability of flipping the coin and having an H showing when the coin comes to rest? Again, common sense and real world experience indicate that the coin landing with an H showing has an equal chance of occurring as the coin landing with a T showing. If that's the case then the probability of the coin landing with an H showing is one half, or 0.5. And the probability of the coin landing with a T showing is also one half, or 0.5. The probability of the coin landing with an H or T showing is 1, an absolute certainty. The probability of the coin landing with an H and T showing is nonexistent, or 0 (zero). Note carefully the meaning of "or" and "and" in the last two statements. Now for a quantum leap in relating a coin toss to the double-blind testing of two audio components.

Let's suppose you are listening to amplifiers H and T (one at a time, of course) playing at the same level (volume) and you're able to switch back and forth between the amplifiers as often as you wish. You swear there's a distinct difference between the sound of the two amplifiers. A friend wants to see if you are really hearing a difference or if you only think there's a difference. He mutes the sound, and without you seeing selects amplifier T, and restores the sound. Why did he select amplifier T? Simple. He flipped a coin and picked the amplifier according to how the coin landed! In this case he picked amplifier T because the coin landed with a T showing. Now it's your turn to make a determination as to which amplifier is playing!

If there is a perceptible difference between the two amplifiers you will correctly state amplifier T is playing. If you can't hear a difference (or, very importantly, think you can hear a difference but actually can't) you will be guessing (even though you may think otherwise) and pick amplifier H as playing. But if you're guessing you could just as easily pick amplifier T as playing! There's a 50-50 chance that you will pick the correct amplifier simply by guessing! The result of one attempt at determining which amplifier is playing really doesn't tell us anything.

So your friend flips the coin a second time and the T side comes up again. He again mutes the sound, selects amplifier T without you seeing, and restores the sound. Now you have to make a second determination as to which amplifier is playing. If there's a perceptible difference you'll correctly indicate amplifier T is playing. If there's not a perceptible difference you will be guessing (even though you may think otherwise). The probability of a correct guess is again 50-50, but here's the important point: You now have made two attempts at determining which amplifier is playing. The probability of getting a correct guess on each individual attempt is 0.5. However, the probability of getting a correct on the first attempt and a correct on the second attempt is now reduced to 0.25! This 0.25 probability is a result of the fact that when a coin is flipped two times there are four possible outcomes (HH, HT, TH, and TT) and the probability of each outcome is 1 in 4, or 1/4 = 0.25. Those are still pretty good odds, i.e., you could have guessed and had a probability of 25% of being correct for two attempts at determining which amplifier is playing. If you don't think a 25% chance is high then you're the one I want to set my lottery odds!

Let's do the exercise a third time of selecting and determining which amplifier is playing, only this time the coin toss results in an H showing, so amplifier H is selected. If you hear a difference you select H. If you don't actually hear a difference you are in effect guessing and you can guess H or T with a probability of 50% of being correct. But(!) the probability of you guessing correctly a T on the first try, a T on the second try, and an H on the third try is only 12.5%, or 0.125. The 0.125 is arrived at because three coin tosses result in 8 possible outcomes (HHH, HHT, HTH, HTT, THH, THT, TTH, and TTT), and only one of these possibilities (TTH) is correct, or 1 chance in 8, still reasonably good odds. With the Maryland lottery, as an example, you have to correctly guess 6 of 49 numbers; the chance of doing this is only 1 in 13,983,816! Numerically this is equal to 0.000000072, or 0.0000072%, which makes 12.5% look like a sure thing by comparison.
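The counting above can be checked by brute force. The short Python sketch below simply lists every possible sequence of H and T for one, two, and three tries, then counts the 6-of-49 lottery combinations; the function name is invented for this illustration.

```python
from itertools import product
from math import comb

def chance_of_one_exact_sequence(tries):
    """Probability that pure guessing matches one specific H/T sequence."""
    # Every possible sequence of H and T of the given length,
    # e.g. HH, HT, TH, TT for two tries.
    outcomes = list(product("HT", repeat=tries))
    # Only one of these equally likely sequences is the correct one.
    return 1 / len(outcomes)

for tries in (1, 2, 3):
    print(tries, "tries:", chance_of_one_exact_sequence(tries))

# Number of ways to pick 6 numbers out of 49 (order does not matter).
lottery_tickets = comb(49, 6)
print("lottery: 1 chance in", lottery_tickets)
```

Each extra try doubles the number of possible sequences, so the chance of guessing the whole sequence correctly is cut in half every time.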

As you increase the number of attempts (referred to as trials in the statistical world) at determining which amplifier is playing, the odds of you guessing a high number of corrects become smaller and smaller. For example, with 10 attempts the probability of guessing all 10 trials correctly is rather small. It's 0.00098, or 0.098%, or 1 chance in 1,024. Getting 10 out of 10 trials correct would indicate one of two possibilities. One, there was a perceptible difference between the two amplifiers, you heard this difference, and therefore you weren't guessing. Or...you guessed but by chance alone guessed correctly 10 out of 10 times, a highly unlikely scenario!

Realizing that requiring a 100% score (all corrects) to confirm a difference in sound between the amplifiers is a bit stringent, statisticians, through a rather elaborate process (both theoretical and empirical), have established that if someone gets a number of corrects which by chance alone would have a probability of 5% or less of occurring, then it can be assumed that the individual is very likely hearing a difference. For example, getting 10 out of 10 (100%) trials correct has a probability of occurring by chance alone of 0.00098, or 0.098%, meeting the 5% or less criterion. Getting at least 9 out of 10 (90%) trials correct by chance alone has a probability of 0.01074, or 1.07%, also meeting the 5% criterion. However, getting at least 8 out of 10 corrects (80%) by chance alone has a probability of 0.05469, or 5.47%, exceeding the 5% criterion. This 5% figure is referred to as the "significance level."
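The "at least so many corrects by chance alone" figures come from adding up the coin-flip (binomial) probabilities for that score and every better score. A minimal Python sketch follows, assuming a pure guesser with a 0.5 chance on every trial; the function name `prob_at_least` is made up for this example.

```python
from math import comb

def prob_at_least(corrects, trials, p=0.5):
    """Chance of scoring `corrects` or more out of `trials` by guessing.

    Adds the binomial probability of every score from `corrects` up to
    `trials`; p is the chance of a correct guess on a single trial.
    """
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(corrects, trials + 1)
    )

for corrects in (10, 9, 8):
    chance = prob_at_least(corrects, 10)
    verdict = "meets" if chance <= 0.05 else "misses"
    print(f"at least {corrects}/10: {chance * 100:.3f}% ({verdict} the 5% level)")
```

For 10 trials the three values work out to 1/1024, 11/1024, and 56/1024, matching the figures quoted above.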

If we move the number of trials to 16 then the number of corrects needed to meet the "less than or equal to" 5% criterion is at least 12. Or to put it another way the probability of getting at least 12 corrects by chance alone is equal to 0.03841, or 3.8%. The 3.8% meets the criterion of 5% or less for our test. So if an individual gets at least 12 corrects out of 16 attempts, while comparing our two amplifiers, the results would be statistically significant, i.e., someone who was only guessing would score that well less than 5% of the time, so guessing is an unlikely explanation and it is reasonable to conclude that he heard a difference.
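To find the passing score for any number of trials, you can start from a perfect score and work downward until the chance of doing that well by guessing climbs above 5%. The Python sketch below does this for a fair 50-50 guess; `min_corrects` is a name invented for this illustration, and the 5% default is the significance level discussed above.

```python
from math import comb

def min_corrects(trials, level=0.05):
    """Smallest score whose chance by pure guessing is `level` or less.

    Returns None when even a perfect score is too likely by chance
    (this happens when there are too few trials).
    """
    total = 2**trials          # number of equally likely H/T sequences
    tail = 0                   # sequences giving this score or better
    for score in range(trials, -1, -1):
        tail += comb(trials, score)
        if tail / total > level:
            # This score is too easy to reach by luck; the previous
            # (higher) score was the last one that passed.
            passing = score + 1
            return passing if passing <= trials else None
    return 0

for trials in (4, 10, 16):
    print(trials, "trials ->", min_corrects(trials))
```

With 10 trials the passing score is 9 and with 16 trials it is 12. With only 4 trials no score passes, since even 4 out of 4 happens by luck 1 time in 16, which is one reason a minimum of ten trials is asked for.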

As I mentioned in the title of this article, it is a common sense, everyday language presentation explaining blind testing and the associated math. There is more to blind testing, much more. But underlying all of this is a simple premise: If someone makes a claim that component T sounds better than component H then they should be prepared to prove it.
