4
$\begingroup$

Recently I faced a problem that I can't explain to my colleagues.

The thing is that they want to conduct a survey with one question, and the question may have only two answers. They asked me how many responses are enough for statistical significance. I tried to explain that for statistical significance you need a hypothesis about the approximate distribution of answers, which you then accept or reject, but they don't understand, and I cannot explain it with easy examples in non-mathematical language.

Help me with the explanation, or explain it to me if I'm wrong.

Thanks

  • 1
The most important thing is to point out that statistical significance is never a proof. We can only have a high probability that some hypothesis is false. 2017-02-28
  • 0
I am not knowledgeable enough to be of much help, but +1 for being willing to help them out, and also for looking for good answers. The statistics SE site might also be helpful. 2017-02-28
  • 0
Moreover, the level of significance is not fixed. You can take $5\%$, $1\%$, $0.1\%$, $\cdots$. Whether a hypothesis will be rejected can depend on the level used. 2017-02-28
  • 0
@Peter they understand the error assumptions and p-values. What they don't understand is that for different questions, with different answer distributions, the number of samples sufficient for statistical significance can differ. 2017-02-28

1 Answer

3

In the frequentist perspective, "statistical significance" is an inference based on some kind of burden of proof.

One way to explain a hypothesis test to a non-statistical audience is to give an example about coin tossing.

Suppose I have a coin. I know whether it is fair or unfair; i.e., I know whether or not it lands heads or tails with equal frequency. I let you borrow the coin, but I don't tell you about its fairness. How would you try to infer this property of the coin?

Naturally, your intuition tells you to flip it, preferably "many" times, and see whether the coin lands heads or tails with "approximately" equal frequency. But how many is "many"? How "approximately equal" do the frequencies of outcomes need to be, or how "approximately unequal" do they need to be, for you to be confident in asserting that the coin is unfair?

Let's address this second question first by momentarily supposing that I have only allowed you to keep the coin for ten flips, after which I will want it back. So you perform the experiment, flip the coin ten times, and you observe that the coin has landed heads 9 times and tails once. This intuitively seems very odd to you. You might think, "wow, this coin seems biased." But is it? And if it is, how confident are you of this assertion?

After all, even a fair coin could behave this way. It may not be very likely, but it isn't impossible.

This leads us to ask: if the coin had indeed been fair, what was the chance of observing an outcome this extreme? That is to say, how likely is it that a fair coin, when flipped ten times, could show any of the following:

  • 9 heads, 1 tail
  • 10 heads, 0 tails
  • 9 tails, 1 head
  • 10 tails, 0 heads?

We count all of these outcomes because in a sense, you would be just as surprised to see the last three outcomes as you would be to see the first. 9 tails and 1 head from a fair coin is just as surprising as 9 heads and 1 tail. And 10 of the same outcome is even more surprising.

Well, all 10 heads has a chance of $1/2^{10} = 1/1024$ of happening. Exactly 9 heads and 1 tail has a chance of $10/2^{10} = 10/1024$. So by symmetry, the total probability of an outcome at least as extreme as what you observed, assuming the coin was fair, is $$\frac{1+10+10+1}{1024} = \frac{11}{512} \approx 0.0214844 \approx 2.15\%.$$ This means that if you had a hundred people perform the same experiment on my coin as you did, you should expect about $2$ of those people to see the same face (heads/tails) at least 9 times in 10 trials.
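As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python (a sketch, using only the standard library):

```python
from math import comb

# Probability that a fair coin, flipped 10 times, shows an outcome at
# least as extreme as 9 heads / 1 tail: exactly 0, 1, 9, or 10 heads.
n = 10
extreme = sum(comb(n, k) for k in (0, 1, 9, 10))  # 1 + 10 + 10 + 1 = 22
p_extreme = extreme / 2**n
print(p_extreme)  # 22/1024 = 11/512 ≈ 0.0215
```

This is exactly the (two-sided) p-value of the observed result under the null hypothesis that the coin is fair.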

So, unlikely, but not impossible. And this speaks to your level of confidence in your assertion that the coin is unfair: "statistically significant" might mean that you are willing to be wrong about the coin (in the sense of erroneously concluding it is unfair when it actually was fair) as much as $5\%$ of the time, in which case, your experiment met your definition of significance. But if I now tell you, "if you make an incorrect claim that the coin is unfair, you will be penalized \$1000," you might be much less willing to take a 5 percent chance of being wrong in this way, especially if there is no penalty for failing to detect that the coin is unfair. You would naturally want to demand that you be allowed to flip the coin more times, and your burden of proof would be higher: perhaps you'd need to see at least 19 out of 20 heads, or 99 out of 100 heads, before you're willing to assert the coin is unfair.

And this goes back to our first question: how many tosses is "enough"? If you say you need to be at least 99% confident, then clearly, being allowed to toss the coin only five times is nowhere near good enough: even if the outcome is all heads, or all tails, there is a $1/32 + 1/32 = 1/16 = 0.0625 > 0.01$ chance of a fair coin giving such an outcome.
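The five-toss arithmetic can be checked the same way (a minimal sketch):

```python
# With only five tosses, even the most extreme possible outcome
# (all heads or all tails) is not rare enough for a 99% confidence bar.
p_all_same = 1 / 2**5 + 1 / 2**5  # 1/32 + 1/32 = 1/16
print(p_all_same)         # 0.0625
print(p_all_same > 0.01)  # True: a fair coin clears this bar too often
```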

Without going into too many mathematical details, then, the sample size is clearly related to our error tolerance. If I let you toss the coin a thousand times, and your standard of statistical significance was that you need to be at least 99% confident that the coin is unfair, you would need to observe at least 542 heads or 542 tails out of 1000 tosses to conclude that the coin is unfair.
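The 542 figure can be verified directly with an exact binomial computation (a sketch; it searches for the smallest two-sided rejection threshold, using only integer arithmetic to avoid rounding):

```python
from math import comb

# Find the smallest head count k such that the two-sided probability of
# an outcome at least that extreme (at least k heads OR at least k
# tails) under a fair coin in n = 1000 tosses is at most 1%.
n = 1000
total = 2**n
k = n // 2 + 1  # start just above the expected count of 500
# By symmetry, P(X >= k) + P(X <= n - k) = 2 * P(X >= k) for k > n/2,
# and 2 * P(X >= k) > 0.01 is equivalent to 200 * tail_count > 2**n.
while 200 * sum(comb(n, j) for j in range(k, n + 1)) > total:
    k += 1
print(k)  # 542
```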

  • 0
Thank you for your full answer. The coin with 10 flips is what I was thinking about too, but I'm pretty sure my colleagues' response to this example will be something like "that means 10 flips is not enough, but maybe 100 is". 2017-03-01
  • 0
It's a very good explanation of why a given number of experiments can simultaneously be enough and not enough for statistical significance. But what's more interesting is how to explain that there is no single number of flips that is sufficient for every distribution. 2017-03-01