
Say the probability of success on each trial is 56%, and we want the probability that the observed success rate lands on the 60% mark. For example:

Say we have a population of 10, the probability of success is 56%, and we want to know how likely it is to get exactly 6 successes. Whatever that probability is, it is higher than the probability of hitting the 600 mark when the population is 1000. Why is this?

I realize the SD in the population = 10 scenario is smaller than in the population = 1000 scenario, but why?

The SD when the population is 1000 is $$ \sqrt{246.4} \approx 15.7 $$ versus, when the population is 10, $$ \sqrt{2.464} \approx 1.57 $$

What's the intuition behind the SD being bigger when n is bigger?

2 Answers

1

While smaller populations do have a smaller standard deviation than large ones, they have a larger standard deviation as a percentage of the population size.

You can consider the number of successes as the sum of independent random variables $$ N = X_1+X_2+\ldots +X_n$$ where each $X_i=1$ if the $i$-th trial is a success and $X_i=0$ if it is a failure. The mean number is of course $$E(N) = nE(X) = np$$ where $p$ is the probability of success. For independent variables, the variance adds as well $$ \mathrm{Var}(N) = n\mathrm{Var}(X) = np(1-p).$$

So, as you said, the standard deviation is larger for larger samples, and in fact it increases in proportion to $\sqrt{n}.$

However, for a question like the one you started with (what is the probability of having more than $60\%$ successes?), you aren't interested in the standard deviation itself; you are interested in the standard deviation relative to $n.$ This is $\sqrt{\mathrm{Var}(N)}/n = \sqrt{p(1-p)/n},$ which is proportional to $1/\sqrt{n}$ and so goes down with $n$ (in other words, it measures the size of percentage fluctuations, not absolute fluctuations). This is why there is a higher likelihood of more than $60\%$ successes for small samples when $p=.56$.
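To make the two scalings concrete, here is a minimal Python sketch that evaluates $\sqrt{np(1-p)}$ and the same quantity divided by $n$ for a few sample sizes. The helper name `binomial_sd` and the choice of sample sizes are only for illustration; $p=0.56$ is taken from the question.

```python
import math

def binomial_sd(n: int, p: float) -> float:
    """Standard deviation of the number of successes in n independent trials."""
    return math.sqrt(n * p * (1 - p))

p = 0.56  # success probability from the question
for n in (10, 100, 1000):  # illustrative sample sizes
    sd = binomial_sd(n, p)
    # The absolute SD grows like sqrt(n); the SD as a fraction of n shrinks like 1/sqrt(n).
    print(f"n={n:5d}  sd={sd:8.3f}  sd/n={sd / n:.4f}")
```

Going from $n=10$ to $n=1000$ multiplies the first column by ten and divides the second by ten.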

The intuitive reason why the percentage fluctuations go down with $n$ is that as you take a larger and larger sample, you expect the observed percentage of successes to be very close to its long-run average (things tend to average out to their true means; this is the law of large numbers). In other words, the width of the distribution of the success percentage approaches zero.
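One way to see this narrowing numerically is to compute the exact binomial probability of reaching at least $60\%$ successes for growing $n$. The sketch below is an illustration under stated assumptions, not part of the original argument: the function names are made up for the example, and the probabilities are computed through logarithms (`math.lgamma`) so the large binomial coefficients stay within floating-point range.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of P(N = k) for N ~ Binomial(n, p), computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def prob_at_least(k_min: int, n: int, p: float) -> float:
    """P(N >= k_min), summing the exact probabilities term by term."""
    return sum(math.exp(log_binom_pmf(k, n, p)) for k in range(k_min, n + 1))

p = 0.56  # success probability from the question
for n in (10, 100, 1000):
    k_min = 6 * n // 10  # smallest count that is at least 60% of n
    print(f"n={n:5d}  P(at least 60% successes) = {prob_at_least(k_min, n, p):.4f}")
```

The printed probability falls as $n$ grows, because $60\%$ sits a fixed distance above the mean of $56\%$ while the spread of the observed percentage keeps shrinking.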

The intuitive reason why your standard deviation goes up as $n$ increases is that there are more possible combinations of successes and failures as $n$ increases, and even though they tend to cancel out toward the mean, the amount by which they miss the mean increases with $n$. For $1000$ trials with $p=.5$ so that the mean is $500$, it could over or undershoot the mean by up to $500$ whereas for $10$ trials it could only over/undershoot by $5.$ Now, it's very rare for it to overshoot by that much, particularly in the $n=1000$ case, so things are a bit more complicated than this, but it's still true that the distribution is wider by a factor of $10$ in the case with $n=1000$ compared to $n=10.$

  • 0
    This was clear, thanks. The takeaway for me is that as n gets larger, the absolute value of the SD goes up in proportion to $\sqrt{n}$, but the percentage SD goes down in proportion to $1/\sqrt{n}$. (2017-02-17)
0

There are actually several related things happening here:

  • Repeat the process more times and the range of possible results spreads out

  • The size of that spread is proportional to the square root of the sample size

  • The proportion of successes gets closer to the expectation

As an illustration of the first effect, the probability of seeing $6$ out of $10$ if you have $p=0.6$ is about $0.251$. But the probability of seeing exactly $600$ out of $1000$ if you have $p=0.6$ is about $0.0257$. This shouldn't be a surprise: it would be a more precise result and so is less likely. If you allowed a wider range of possibilities, then with $p=0.6$ you would have a probability of about $0.253$ of seeing say from $596$ through to $605$ out of $1000$, roughly the same as seeing $6$ out of $10$.
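These figures can be checked with exact binomial probabilities. The following is a minimal sketch, assuming only the Python standard library; `binom_pmf` and `prob_between` are names chosen for this example, and the probability mass is evaluated through logarithms to avoid overflow at $n=1000$.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(N = k) for N ~ Binomial(n, p), evaluated through logs for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def prob_between(lo: int, hi: int, n: int, p: float) -> float:
    """P(lo <= N <= hi), both ends inclusive."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

p = 0.6
print("exactly 6 of 10:       ", binom_pmf(6, 10, p))
print("exactly 600 of 1000:   ", binom_pmf(600, 1000, p))
print("596 through 605 of 1000:", prob_between(596, 605, 1000, p))
```

The window of ten counts is used because the standard deviation at $n=1000$ is ten times the one at $n=10$, so a single count at $n=10$ covers about as much of the distribution as ten counts at $n=1000$.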

The second effect is shown in your standard deviation calculation. Multiplying the sample size by $100$ increased the standard deviation by a factor of ten, which is why it was sensible to take ten values in the illustration. More generally, the standard deviation is $\sqrt{np(1-p)}$, which you can calculate using the variance of the sum of independent samples.

As an illustration of the third effect, take your example of $p=0.56$. This would make $6$ out of $10$ about $0.255$ standard deviations from the expectation of $5.6$, while it would make $600$ out of $1000$ about $2.55$ standard deviations away from the expectation of $560$. So this time the probability of seeing exactly $6$ out of $10$ is about $0.243$, but (allowing for the first effect) the probability of seeing from $596$ through to $605$ out of $1000$ is about $0.0099$, which is much less. With $1000$ attempts, being about $0.255$ standard deviations above $560$ would be equivalent to $564$, and allowing for the first effect might suggest looking at seeing from $560$ through to $569$ out of $1000$, which has a probability of about $0.240$, much closer to the probability of seeing exactly $6$ out of $10$.
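The distances in standard deviations used here can be reproduced with a few lines of arithmetic. This is a small sketch under the same binomial assumptions as above; the helper `z_score` is a name introduced only for this example.

```python
import math

def z_score(k: int, n: int, p: float) -> float:
    """Number of standard deviations between k successes and the mean n*p."""
    return (k - n * p) / math.sqrt(n * p * (1 - p))

p = 0.56  # success probability from the question
z_small = z_score(6, 10, p)      # 6 of 10 against a mean of 5.6
z_large = z_score(600, 1000, p)  # 600 of 1000 against a mean of 560

# Count at n = 1000 that sits as many SDs above the mean as 6 of 10 does.
equivalent = 1000 * p + z_small * math.sqrt(1000 * p * (1 - p))

print(f"6 of 10 is {z_small:.3f} SDs above the mean")
print(f"600 of 1000 is {z_large:.3f} SDs above the mean")
print(f"same z-score at n=1000 corresponds to about {equivalent:.0f} successes")
```

Because the gap to the mean grows by a factor of $100$ while the standard deviation grows only by a factor of ten, the same $60\%$ mark is ten times as many standard deviations away at $n=1000$.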