Introduction - Flip a Coin
Toss a coin repeatedly.
It produces heads and tails in a binomial distribution,
as I'll describe below. One readable description is
"Bernoulli Experiments, Binomial Distribution"
by David Galvin of Notre Dame.
The binomial distribution describes any series of independent two-outcome trials,
whether yes-no, heads-tails, pass-fail, valid-invalid,
make-or-miss a field goal, or win-or-lose a game.
The traditional options are heads and tails;
in n trials the number of heads is h.
The number of tails is n−h.
Coins are usually thought of as unbiased with heads and tails equally likely.
In practice, most real coins are biased slightly due to
the weight distribution of their designs.
If you are flipping a US state quarter a thousand times,
you may want to choose the state carefully; each quarter has a unique bias.
In other situations like kicking field goals or playing a game,
one outcome may be more probable than the other.
In the binomial distribution, we denote the underlying bias by r,
the probability of heads on a single flip, ranging from 0.0 to 1.0.
At r = 0.0, the coin is completely biased;
every flip shows tails.
With bias r=1.0 every flip shows heads.
Our concern in this note is to determine if the coin (game, or whatever)
is biased.
What is the probability that the coin's bias r is near 0.5?
Formula for the Binomial Term
The probability that two independent events both occur is the product of their probabilities.
If we drill two oil wells, each with probability 0.2 of success,
then the probability of both succeeding is 0.2 * 0.2 = 0.04.
The probability of failure for one well is 1 − (probability of success)
= 1 − 0.2 = 0.8. The probability of one gusher and one dry hole, in that order, is
0.2 * (1 − 0.2) = 0.2 * 0.8 = 0.16.
And the probability of two failures is $(1 - 0.2)^2 = 0.64$.
With success and failure as the only possibilities,
the probabilities must sum to 1.0:
0.04 + 0.16 + 0.16 + 0.64 = 1.00.
So why add 0.16 twice? Because one success and one failure can happen
in two different orders.
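The arithmetic above can be checked by listing all four outcomes. This is only a minimal sketch, assuming the two wells are independent; the class name `TwoWells` and the method `sequenceProbability` are made up for illustration.

```java
final class TwoWells {
    // Probability of one specific ordered sequence of outcomes,
    // assuming each well succeeds independently with probability p.
    static double sequenceProbability(boolean[] successes, double p) {
        double prob = 1.0;
        for (boolean s : successes) {
            prob *= s ? p : 1.0 - p;
        }
        return prob;
    }

    public static void main(String[] args) {
        double p = 0.2;
        boolean[][] outcomes = {
            {true, true}, {true, false}, {false, true}, {false, false}
        };
        double total = 0.0;
        for (boolean[] o : outcomes) {
            double prob = sequenceProbability(o, p);
            total += prob;
            System.out.printf("%-6s then %-6s : %.2f%n",
                o[0] ? "gusher" : "dry", o[1] ? "gusher" : "dry", prob);
        }
        System.out.printf("total : %.2f%n", total);
    }
}
```

The two mixed outcomes appear as separate rows, which is why 0.16 is counted twice.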
For n yes-no trials with bias r,
the probability of one specific sequence containing exactly h successes
is the product of h success probabilities and n−h failure probabilities:
$$r^h (1-r)^{n-h}$$
The h successes may occur in any order among the n trials,
so in adding up the total probability of all possibilities,
we need to multiply the above term by
the number of combinations of n items taken h at a time.
This quantity is denoted by nCr(n, h), so for any specific h
the probability is
$$\mathrm{BinTerm}(n, h) = \mathrm{nCr}(n, h)\, r^h (1-r)^{n-h}$$
Note that for any given h and n
BinTerm is still a function of r, the bias.
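As a rough illustration, the term might be coded as below. This is a sketch, not the program used for this note; the class name `BinTermSketch` and the extra parameter `r` on `binTerm` are assumptions made so the function can be evaluated at a chosen bias.

```java
final class BinTermSketch {
    // Number of combinations of n items taken h at a time,
    // built up multiplicatively to avoid large factorials.
    static double nCr(int n, int h) {
        double c = 1.0;
        for (int i = 1; i <= h; i++) {
            c = c * (n - h + i) / i;
        }
        return c;
    }

    // BinTerm(n, h) evaluated at bias r: nCr(n, h) * r^h * (1-r)^(n-h).
    static double binTerm(int n, int h, double r) {
        return nCr(n, h) * Math.pow(r, h) * Math.pow(1.0 - r, n - h);
    }

    public static void main(String[] args) {
        // Tabulate BinTerm(10, 8) at r = 0.0, 0.1, ..., 1.0.
        for (int i = 0; i <= 10; i++) {
            double r = i / 10.0;
            System.out.printf("r=%.1f  BinTerm(10,8)=%.5f%n", r, binTerm(10, 8, r));
        }
    }
}
```

Computing nCr as a running product rather than from factorials keeps the intermediate values small, which matters once n grows past twenty or so.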
One Hump
Suppose we toss a coin 10 times and heads arise 8 times. Is the coin biased?
We have n=10 and h=8.
We plot BinTerm(10,8) over all values of r from 0 to 1:
The graph shows for each bias r the probability that r
is the actual bias.
The graph is a hump with a maximum at r=0.8 = h/n.
(To see that the hump's peak must be at h/n,
set the derivative of BinTerm(n,h) with respect to r to 0 and solve for r.)
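For readers who want the step spelled out, here is one way the derivative works out, assuming 0 < h < n so that the peak is in the interior of the range. The constant nCr(n, h) does not depend on r, so it can be dropped.

```latex
\frac{d}{dr}\left[ r^h (1-r)^{n-h} \right]
  = h\, r^{h-1} (1-r)^{n-h} - (n-h)\, r^h (1-r)^{n-h-1}
  = r^{h-1} (1-r)^{n-h-1} \left[ h(1-r) - (n-h)\, r \right]
  = r^{h-1} (1-r)^{n-h-1} \,( h - n r )
```

For r strictly between 0 and 1 the leading factors are positive, so the derivative is zero only where h − nr = 0, that is, at r = h/n.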
n+1 Humps
If we add up the probabilities for all values of r we get only 0.091,
one eleventh of 1.0. Probabilities must add to 1.0.
The other ten elevenths are in another ten humps, one for each remaining value of h.
This montage shows all eleven humps from plotting BinTerm(10, h)
where bias r ranges from 0 to 1.
(The peaks of the end humps extend off the charts, approaching 1.0.)
The area under each hump is 1/(n+1).
For any given n, the probabilities of all outcomes
must sum to 1.0. With n+1 humps this is (n+1) * 1/(n+1) = 1.0.
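The area of a single hump can also be obtained in closed form. The sketch below relies on the standard Beta-function integral, which is not derived in this note.

```latex
\int_0^1 r^h (1-r)^{n-h} \, dr = \frac{h!\,(n-h)!}{(n+1)!}
\qquad\Longrightarrow\qquad
\int_0^1 \mathrm{nCr}(n,h)\, r^h (1-r)^{n-h} \, dr
  = \frac{n!}{h!\,(n-h)!} \cdot \frac{h!\,(n-h)!}{(n+1)!}
  = \frac{1}{n+1}
```

For n = 10 this gives 1/11, or about 0.091, for every value of h from 0 to 10.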
Probability of a Range
What we really want is the probability that the
bias is in a range near a point. I find it anti-intuitive, but the
total probability in a range is given by the area under the curve in that range.
This does make sense if we consider that the value at point r is
a zero-width rectangle. That one can add zero-width rectangles
is one of the findings of integral calculus.
Two ranges of interest for BinTerm(10, 8) are illustrated here:
For the statistics below I consider that the range around r is 0.1 wide,
extending from r−0.05 to r+0.05.
To prepare this note, I measured the area under the curve with slices.
For a minute let's denote by bt(r) the value of BinTerm(n, h)
at r. Shave the area under the curve into a thousand slices
of width .001. The slice at q has a left side of bt(q)
and a right side of bt(q+.001). The function bt is well-behaved,
so these values are close. In adding the areas I just added a rectangle
of width .001 and height in the middle of the slice: bt(q+.0005).
I also used this scheme to measure the total area under each hump.
Each had area 1/(n+1), for a total area under all n+1 humps of 1.0.
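The slicing scheme described above is the midpoint rule. A minimal sketch of it follows; the names `HumpArea` and `area` are illustrative, and this is not the original program.

```java
final class HumpArea {
    // Number of combinations of n items taken h at a time.
    static double nCr(int n, int h) {
        double c = 1.0;
        for (int i = 1; i <= h; i++) {
            c = c * (n - h + i) / i;
        }
        return c;
    }

    // BinTerm(n, h) evaluated at bias r.
    static double binTerm(int n, int h, double r) {
        return nCr(n, h) * Math.pow(r, h) * Math.pow(1.0 - r, n - h);
    }

    // Midpoint rule: split [lo, hi] into equal slices and add one rectangle
    // per slice, with height taken at the middle of the slice.
    static double area(int n, int h, double lo, double hi, int slices) {
        double width = (hi - lo) / slices;
        double sum = 0.0;
        for (int i = 0; i < slices; i++) {
            double mid = lo + (i + 0.5) * width;
            sum += binTerm(n, h, mid) * width;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10;
        double total = 0.0;
        for (int h = 0; h <= n; h++) {
            double a = area(n, h, 0.0, 1.0, 1000); // a thousand slices of width .001
            total += a;
            System.out.printf("h=%2d  area=%.4f%n", h, a);
        }
        System.out.printf("total=%.4f%n", total);
    }
}
```

Because the curve is smooth, the midpoint height is a good stand-in for the whole slice, and a thousand slices is far more than needed at this precision.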
Is Even-Bias a Possible Explanation?
Most often, statisticians deploy the binomial distribution
to design protocols for testing manufactured parts.
They solve for n to answer questions like,
"How many parts must be tested
to ensure that 99.5 percent meet specifications?"
Here we are asking a different question. We have conducted n trials
with h successes. Is the test biased? Could the results have been
produced by an unbiased process?
Since we know both n and h, we need only examine
BinTerm(n, h).
We noted above that the most likely bias r is h/n.
But is it possible that
the actual test is instead unbiased and really has r=0.5?
The usual jargon is that we want to reject the null hypothesis,
where that hypothesis is that r=0.5. In social science
rejection is claimed when the likelihood of the null hypothesis is less than 5%.
Physicists require five sigma, which maps to 0.00003%.
Physical probabilities are a lot easier to measure than those in social science
(but more expensive if the tool is the LHC at CERN!).
[Figures: "Social Science" and "Physics"]
So we ask whether the probability around r=0.5 under the BinTerm(10,8) curve above
is more or less than 5% of that hump's total area.
A little Java code shows that the area around 0.5 is about 5%,
statistically insignificant.
Meanwhile the area around 0.8 is 33%, much more likely.
It is hard to believe that a process
yielding BinTerm(10,8) is unbiased.
The same approach can be used to evaluate BinTerm for any n and h.
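As a sketch of how that evaluation might go for arbitrary n and h, the code below computes the share of a hump's area lying within 0.05 of a chosen bias. The names `BiasCheck` and `shareNear` are hypothetical, and the slice count is an arbitrary choice; this is not the original program.

```java
final class BiasCheck {
    // Number of combinations of n items taken h at a time.
    static double nCr(int n, int h) {
        double c = 1.0;
        for (int i = 1; i <= h; i++) {
            c = c * (n - h + i) / i;
        }
        return c;
    }

    // BinTerm(n, h) evaluated at bias r.
    static double binTerm(int n, int h, double r) {
        return nCr(n, h) * Math.pow(r, h) * Math.pow(1.0 - r, n - h);
    }

    // Midpoint-rule area under BinTerm(n, h) between lo and hi.
    static double area(int n, int h, double lo, double hi, int slices) {
        double width = (hi - lo) / slices;
        double sum = 0.0;
        for (int i = 0; i < slices; i++) {
            sum += binTerm(n, h, lo + (i + 0.5) * width) * width;
        }
        return sum;
    }

    // Fraction of the hump's total area within halfWidth of center,
    // with the range clipped to [0, 1].
    static double shareNear(int n, int h, double center, double halfWidth) {
        double lo = Math.max(0.0, center - halfWidth);
        double hi = Math.min(1.0, center + halfWidth);
        return area(n, h, lo, hi, 1000) / area(n, h, 0.0, 1.0, 1000);
    }

    public static void main(String[] args) {
        int n = 10, h = 8;
        System.out.printf("share near 0.5: %.3f%n", shareNear(n, h, 0.5, 0.05));
        System.out.printf("share near h/n: %.3f%n", shareNear(n, h, (double) h / n, 0.05));
    }
}
```

For n=10 and h=8 the two shares should land close to the figures quoted above. Changing `n` and `h` in `main` repeats the comparison for any other experiment.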
My thanks to Bill Beggs for help and advice.
Data items above were generated with a small Java program.
The graphs were drawn with
Desmos calculator
with the expression
where n and h were varied for each graph.
(The ImageMagick commands to extract the graphs and create the montage
are embedded in the source code of this page.)