Puzzled By Statistics

Dr Adequate · Aug 4, 2008

I never did statistics past A Level, which has worked fine so far. However, I now find myself scratching my little head.

Suppose I have a hypothesis that a coin is fair. I toss it a thousand times, and 990 times it comes down heads. I can work out the probabilty of that, and say: now, this would only happen one time in (some large number I can't be bothered to calculate) if the hypothesis is correct, therefore the hypothesis only stands those odds of being correct.

Now, this seems to me to be fair and reasonable.

Only I can also look at any sequence of a thousand tosses, and say: now, this would only happen one time in 2¹⁰⁰⁰ if the hypothesis is correct, therefore the hypothesis only stands those odds of being correct.

Now, the reasoning is the same in both cases, but in the second case it obviously doesn't work. Therefore, something is wrong with the reasoning.

I can see why the reasoning must be untrue in the second case: for the statement "this would only happen one time in 2¹⁰⁰⁰ if the hypothesis is correct" would be true however the coins fell, whether or not the coin was fair, and so cannot be held to contradict the hypothesis.

Hence the reasoning in the first case is incorrect and incomplete. How do I remedy this? Could I see the reasoning and the math?

There must be something somewhere which explains how one's definition of "improbable" relates to the testing of a statistical hypothesis: a formal and well-founded way to tell how significant to a statistical hypothesis an improbable event really is. All events are improbable to some degree, depending on how you do the counting. What is the right way to do the counting, and how is this demonstrated?

Thanks.

Dancing David · Aug 5, 2008

Um, this is the way I would look at it.

one toss: comes up heads, somewhat probable 1/2

ten tosses: all come up heads: somewhat less probable (1/2)¹⁰
one hundred tosses: all come up heads, not very likely (1/2)¹⁰⁰
one thousand tosses: all come up heads, very unlikely (1/2)¹⁰⁰⁰

So while a fair coin has a fairly decent chance of coming up heads ten times in a row .0009765, that says to me that if we threw ten tosses 1000 times, we should get ten heads nine times.(on average)

But the chance of geting 100 heads in a row is much smaller 7.88 x 10^-31, so you would have to have 10 ³¹ trial of one hundred tosses to get about seven that were all hundreds. That is a huge number of trials like:
10,000,000,000,000,000,000,000,000,000,000. (on average)

I think it would be easier to discuss the improbable event and thenw e can discuss how to figure the odds. I like sampling mor ethan just calculation.

Dunstan · Aug 5, 2008

Dr Adequate said:
Suppose I have a hypothesis that a coin is fair. I toss it a thousand times, and 990 times it comes down heads. I can work out the probabilty of that, and say: now, this would only happen one time in (some large number I can't be bothered to calculate) if the hypothesis is correct, therefore the hypothesis only stands those odds of being correct.

Now, this seems to me to be fair and reasonable.

Only I can also look at any sequence of a thousand tosses, and say: now, this would only happen one time in 2¹⁰⁰⁰ if the hypothesis is correct, therefore the hypothesis only stands those odds of being correct.

I think you need to be careful with your terms. In your second case, what is the hypothesis? That the coin is fair (i.e. the same hypothesis as in your first case)?

And when you say "this would only happen" one in 2^1000 times, what is the "this"? Presumably you mean "this precise sequence of Hs and Ts." But that isn't the same as your hypothesis. There are many other sequences that are consistent with your hypothesis.

But anyway, it is a little trickier to speak of the confidence with which a hypothesis is true. I think what you want to do is look up statistical measures of the "power" of a test. At some point I think you may get into the whole frequentist vs. Bayesian debate, which I will leave to others.

leon_heller · Aug 5, 2008

Have a look for the "Runs Test".

Leon

sol invictus · Aug 5, 2008

Dunstan is right - the key is to define your hypothesis carefully. If by "fair" you mean only that the total number of heads follows a 50/50 binomial distribution, you are done - the first result is incredibly unlikely, while the second is quite likely so long as the total is close to 500.

If you want to be anal you can plug that into Bayes' formula - for example compare the fair theory to a theory where the total is 99% weighted towards heads, with equal priors for those two theories, and you'll conclude that (with 990 heads) the weighted theory is much more probable. Or you can take a single continuous parameter describing the degree of weighting and get a confidence interval.

Where it starts to get really sticky is when you try to include order (which is what confused you, I think). For example if you got 500 heads in a row and then 500 tails in a row, clearly something is going on, but the above definition of a "fair" coin will never detect it...

Looking for patterns like that in data is just hard (well, not in that case, but in real data it often is). There are as many patterns as there are possible data sets, and you obviously can't test for them all without getting hugely significant false positives. I don't think there's any rigorous way to answer that deeper question lurking behind what you asked - how we know when a sequence is really random, or instead has some underlying order. In fact I think it's formally undecidable.

ben m · Aug 5, 2008

Dr Adequate said:
All events are improbable to some degree, depending on how you do the counting. What is the right way to do the counting, and how is this demonstrated?

The correct thing to do is to think about what hypotheses you're trying to tell apart. You should choose to "count" whatever it is which differs mot strongly between the various hypotheses.

Hypotheses you could test on a coin flip:
"This coin tends to repeat itself" (as opposed to having independent flips.)
"This coin never gives a run of four heads" (as opposed to having independent flips.)
"This coin's flips tends to spell out ln(2) in binary" (as opposed to being random.)
"This coin is weighted to make heads slightly more likely." (as opposed to giving exactly 50-50 odds.)
"This coin has heads on both sides." (as opposed to one head, one tail.)

The measurement that's best at telling you the odds---i.e. confirming or ruling-out the "weighted" hypothesis---is simply counting heads and tails; the probability that you get from this test is a simple and useful one that relates straighforwardly to the hypothesis. What measurement would you make to rule out the "tends to repeat itself" hypothesis? Counting wouldn't work---you'd have to count conditional results. (Count "same as last flip" and "different than last flip") And so on.

If you have a hypothesis in mind, though, your statistic---the entire list of results---is indeed statistically worthwhile. Just to illustrate: for a ten-flip experiment, the exact sequence HHHTHHHTHH occurs 0.09765% of the time for a fair coin, but 0.0236% of the time for a coin with a 60% bias towards tails, and 0.26873% of the time for a coin with a 60% bias towards heads, and (at worst) 0.67% of the time for a coin with an 80% bias towards heads. Thus, this sequence is 7x more likely for a biased coin than for a fair coin---or, alternatively, it's consistent with the fair-coin hypothesis at the 15% level.

GreedyAlgorithm · Aug 5, 2008

Dr Adequate said:
Only I can also look at any sequence of a thousand tosses, and say: now, this would only happen one time in 2¹⁰⁰⁰ if the hypothesis is correct, therefore the hypothesis only stands those odds of being correct.

This seems to be where you're getting off track. You have said that you see a result (R) which has probability 1/2¹⁰⁰⁰ according to a hypothesis (H):

P(R|H) = 1/2¹⁰⁰⁰
But then you say that the hypothesis has only those odds of being correct:

?? P(H|R) = 1/2¹⁰⁰⁰ ??

That conclusion is not warranted. The actual relationship is given by Bayes' Theorem:

P(H|R) = P(R|H)*P(H)/P(R)

That's hard to compute because it involves all possible hypotheses ("not H" will appear in calculations). In an everyday evolutionary environment it happens that of the hypotheses that are useful to compare quickly to avoid being eaten by a tiger, P(H)/P(R) has order of magnitude much closer than 1 than close to 2¹⁰⁰⁰. Our brains don't intuit numbers that large, and it is true that if P(H)/P(R) is a sane value, then seeing P(R|H) be incredibly low does mean P(H|R) is really low too. But that's only if you need to take a computational shortcut to avoid being eaten by a tiger.

Dr Adequate · Aug 5, 2008

Dancing David said:
Um, this is the way I would look at it.

one toss: comes up heads, somewhat probable 1/2

ten tosses: all come up heads: somewhat less probable (1/2)¹⁰
one hundred tosses: all come up heads, not very likely (1/2)¹⁰⁰
one thousand tosses: all come up heads, very unlikely (1/2)¹⁰⁰⁰

So while a fair coin has a fairly decent chance of coming up heads ten times in a row .0009765, that says to me that if we threw ten tosses 1000 times, we should get ten heads nine times.(on average)

But the chance of geting 100 heads in a row is much smaller 7.88 x 10^-31, so you would have to have 10 ³¹ trial of one hundred tosses to get about seven that were all hundreds. That is a huge number of trials like:
10,000,000,000,000,000,000,000,000,000,000. (on average)

But my chances of getting any sequence of heads and tails is one in 2¹⁰⁰⁰.

You're answering the wrong question.

Why is one of these probabilities important in deciding whether our coin is fair, and the other is not?

Yes, I know that we both understand why. But that's just our common sense, I want to see the reasoning and the math.

jsfisher · Aug 5, 2008

Dr Adequate said:
But my chances of getting any sequence of heads and tails is one in 2^^1000p.

You're answering the wrong question.

Why is one of these probabilities important in deciding whether our coin is fair, and the other is not?

Yes, I know that we both understand why. But that's just our common sense, I want to see the reasoning and the math.

That some sequence occurred tells you nothing about whether the coin is fair or not. Yes, a particular sequence is highly improbable, but it is just as improbable as any other sequence, so it, in and of itself, contributes nothing to the question of fairness.

You need to look at other characteristics of the sequence. The ratio of heads to tails is such a characteristic. With a fair coin, we can expect the ratio to approach 50%. The further from 50%, the more probable the coin is weighted.

Dr Adequate · Aug 5, 2008

GreedyAlgorithm said:
This seems to be where you're getting off track. You have said that you see a result (R) which has probability 1/2¹⁰⁰⁰ according to a hypothesis (H):

P(R|H) = 1/2¹⁰⁰⁰
But then you say that the hypothesis has only those odds of being correct:

?? P(H|R) = 1/2¹⁰⁰⁰ ??

That conclusion is not warranted. The actual relationship is given by Bayes' Theorem:

P(H|R) = P(R|H)*P(H)/P(R)

That's hard to compute because it involves all possible hypotheses ("not H" will appear in calculations). In an everyday evolutionary environment it happens that of the hypotheses that are useful to compare quickly to avoid being eaten by a tiger, P(H)/P(R) has order of magnitude much closer than 1 than close to 2¹⁰⁰⁰. Our brains don't intuit numbers that large, and it is true that if P(H)/P(R) is a sane value, then seeing P(R|H) be incredibly low does mean P(H|R) is really low too. But that's only if you need to take a computational shortcut to avoid being eaten by a tiger.

I stared hard at Bayes' Theorem before making my OP, in the belief that it might be relevant. I'm still not quite seeing it.

Can you work through the questions about the coins and show why 990/1000 heads is strange, whereas a sequence of H and T that's 1 in 2¹⁰⁰⁰ is not strange?

If you could help me that far, I think I'd grasp the whole principle of the thing.

Dr Adequate · Aug 5, 2008

jsfisher said:
That some sequence occurred tells you nothing about whether the coin is fair or not. Yes, a particular sequence is highly improbable, but it is just as improbable as any other sequence, so it, in and of itself, contributes nothing to the question of fairness.

I know. I explained that in my OP.

You need to look at other characteristics of the sequence. The ratio of heads to tails is such a characteristic. With a fair coin, we can expect the ratio to approach 50%. The further from 50%, the more probable the coin is weighted.

I know that. But how do I know which is the right statistic to look at? I just do, it makes sense. But why? Why is that the statistic I need to look at? How do I know except for common sense?

sol invictus · Aug 5, 2008

Dr Adequate said:
Can you work through the questions about the coins and show why 990/1000 heads is strange, whereas a sequence of H and T that's 1 in 2¹⁰⁰⁰ is not strange?

Sure.

Compare two theories: theory A, that the coin is fair, meaning what I said above, and B, that the coin is weighted 99% towards heads. Let's assign equal priors (1/2) to the two theories.

Consider first the data of 990 heads, call that D1. P(A|D1)/P(B|D1) = P(A)P(D1|A)/(P(B) P(D1|B)) = P(D1|A)/P(D1|B) = some very small number I'm too lazy to compute.

Now consider a "normal" random sequence D2 with roughly as many heads as tails, and that ratio will be some very large number I'm too lazy to compute.

Dunstan · Aug 5, 2008

Dr Adequate said:
Can you work through the questions about the coins and show why 990/1000 heads is strange, whereas a sequence of H and T that's 1 in 2¹⁰⁰⁰ is not strange?

If you could help me that far, I think I'd grasp the whole principle of the thing.

Well, in one sense, they are both equally "strange." Both sequences are equally unlikely.

But in another sense, they aren't. We have reasons to believe that coins are "fair" and thus tend to produce H and T with equal frequency. Your first sequence one of a small set of sequences that are inconsistent with that prior belief or hypothesis, whereas the second sequence is perfectly consistent with the "fair coin" prior or hypothesis.

Thus the first sequence causes us to re-evaluate our belief, whereas the second doesn't. I'm tempted to say that the first provides new information and the second doesn't, but I'm not familiar enough with information theory to do more than throw it out there for someone else to correct.

technoextreme · Aug 5, 2008

jsfisher said:
That some sequence occurred tells you nothing about whether the coin is fair or not. Yes, a particular sequence is highly improbable, but it is just as improbable as any other sequence, so it, in and of itself, contributes nothing to the question of fairness.

You need to look at other characteristics of the sequence. The ratio of heads to tails is such a characteristic. With a fair coin, we can expect the ratio to approach 50%. The further from 50%, the more probable the coin is weighted.

Yup. That's all you need to know. Unfortunately, Dr. A picked an example where calculating the mean of a statistical phenomenon is trivial.

I know that. But how do I know which is the right statistic to look at? I just do, it makes sense. But why? Why is that the statistic I need to look at? How do I know except for common sense?

The mean is the value you want. Though techincally in this case the mean would be between heads and tails. Then you would need the standard deviation.

sol invictus · Aug 5, 2008

Dr Adequate said:
I know that. But how do I know which is the right statistic to look at? I just do, it makes sense. But why? Why is that the statistic I need to look at? How do I know except for common sense?

technoextreme said:
The mean is the value you want.

What kind of answer is that?

The real answer is that the hypothesis you are testing concerns the mean. When we say a coin is weighted, we are presenting a theory which predicts the coin is more likely to flip heads than tails (say). So when we compute the probability of some data given the theory, the answer depends on the mean of that data, but not on its sequence.

You might want to include in your theory some more ingredients, e.g. that successive flips are independent. That's entirely up to you. If you do so, those will also figure in to computing P(data|theory). Regardless, you can always apply Bayes' theorem so long as your theory about the coin is well-defined.

technoextreme · Aug 5, 2008

sol invictus said:
What kind of answer is that?

It was just as good answer as yours is. If he picked an example that was a little less trivial I'd be able to show him what he needs to be looking at. I'm getting a vibe that if the coin was fair the central limit theorem would come into play and then if your distribution does not look Gaussian after a really large number of events you have a problem.
EDIT:
Just checked wikipeida your answer is just as good as my two word answer was. The Central Limit Theorem can come into play and it's a relatively simplistic manner to determine whether or not a coin is fair.
EDIT:
Yup. I have all the pieces to the puzzle. Unfortunately, it's eleven at night and I fee like my ability to communicate will suffer.

Jeff Corey · Aug 5, 2008

Dr Adequate said:
I never did statistics past A Level, which has worked fine so far. However, I now find myself scratching my little head.

Suppose I have a hypothesis that a coin is fair. I toss it a thousand times, and 990 times it comes down heads. I can work out the probabilty of that, and say: now, this would only happen one time in (some large number I can't be bothered to calculate) if the hypothesis is correct, therefore the hypothesis only stands those odds of being correct.

Now, this seems to me to be fair and reasonable.

Only I can also look at any sequence of a thousand tosses, and say: now, this would only happen one time in 2¹⁰⁰⁰ if the hypothesis is correct, therefore the hypothesis only stands those odds of being correct.

Now, the reasoning is the same in both cases, but in the second case it obviously doesn't work. Therefore, something is wrong with the reasoning.

I can see why the reasoning must be untrue in the second case: for the statement "this would only happen one time in 2¹⁰⁰⁰ if the hypothesis is correct" would be true however the coins fell, whether or not the coin was fair, and so cannot be held to contradict the hypothesis.

Hence the reasoning in the first case is incorrect and incomplete. How do I remedy this? Could I see the reasoning and the math?

There must be something somewhere which explains how one's definition of "improbable" relates to the testing of a statistical hypothesis: a formal and well-founded way to tell how significant to a statistical hypothesis an improbable event really is. All events are improbable to some degree, depending on how you do the counting. What is the right way to do the counting, and how is this demonstrated?

Thanks.

It's like getting dealt all spades in a bridge hand. All outcomes are probable, some are more impressive than others.

Vorpal · Aug 5, 2008

Dr Adequate said:
I know that. But how do I know which is the right statistic to look at? I just do, it makes sense. But why? Why is that the statistic I need to look at? How do I know except for common sense?

Once you define your null and alternative hypotheses, the main issue would be if your statistic can adequately distinguish them (bad statistic: has a head, y/n?); otherwise, you'll just get no reasonable conclusion. Continuing Sol's post, let's use the one you're worried about: the entire sequence. You'll see that under the assumption that the flips are independent, the exact sequence won't matter, just the number of heads and tails.

Scenario: a coin was flipped 1000 times, and gave some particular sequence of (for simplicity) exactly 500 heads and 500 tails. Call the particular sequence data D1.
Null hypothesis: the coin is fair (theory A).
Alternative hypothesis: the coin is biased with probability p≥2/3 of giving heads (theory B).
Observation: the polynomial p^k(1-p)^k is strictly decreasing on 1/2<p<1.
P(D1|A) = 2^-1000 ≅ 9.3E-302
P(D1|B) ≤ 2⁵⁰⁰/3¹⁰⁰⁰ ≅ 2.5E-327
That's a difference of over two dozen orders of magnitude. [Edit: Somehow, 'dozen' was swallowed by a black hole... ugh.]

Dancing David · Aug 5, 2008

Dr Adequate said:
But my chances of getting any sequence of heads and tails is one in 2¹⁰⁰⁰.

You're answering the wrong question.

Why is one of these probabilities important in deciding whether our coin is fair, and the other is not?

Yes, I know that we both understand why. But that's just our common sense, I want to see the reasoning and the math.

i would use a frequentist model to test the coin. As to a particular sequences , that changes the probabilities and gets into all the factorial stuff.

If the hypothesis that is being tested is:

Is this coin fair?

Then the easiest method is to run trials of say twenty flips and record the numbers of heads and tails. Idealy using different people as flippers. If after the use of say five trials of twenty tosses by twenty people, the distribution approackes 50%/50% then I would say that it is a fair coin.

What is the exact question?

Walter Wayne · Aug 5, 2008

If we had a different question, maybe it would be easier to see why the sequence is less important.

Lets suppose your friend tells you that the road out front his house is a well-travelled road. So you decide sit out front his house and count cars for a day. If you sat there the whole time and saw nothing, you would suspect that it isn't a well travelled road. All things being equal it is very unlikely that a well travelled would have 0 cars pass during a given day.

Now suppose you saw thousands of cars, would the fact that they were a red Honda, blue Ford, a green mercedes and so on have any affect on your hypothesis that it is a well travelled road? That particular sequence of cars may be astronimically small, but it is irrelevent to your thesis that the road is well travelled.

The same is true for the coins. If the null-hypothesis is "this is a fair coin" you don't care about the odds of a fair coin coming up with a particular sequence, so much as the odds that a fair coin would perform as well as it did on your particular test for fairness.

Walt

Puzzled By Statistics

Banned

Penultimate Amazing

Illuminator

Graduate Poster

Philosopher

Unregistered

Muse

Banned

ETcorngods survivor

Banned

Banned

Philosopher

Illuminator

Illuminator

Philosopher

Illuminator

New York Skeptic

Extrapolate!

Penultimate Amazing

Wayne's Words