
Consider a Bernoulli experiment with $n$ trials and success probability $p$ in each trial. According to the sigma rules, the probability that the number of successes $k$ lies in the interval $[\mu - 1.96\sigma,\ \mu + 1.96\sigma]$, where $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, is approximately 95 percent. That is, with this probability the following inequality holds:

$$ np - 1.96 \sqrt{np(1-p)} \leq k \leq np + 1.96 \sqrt{np(1-p)} $$

or with $h = \frac{k}{n}$:

$$ p - 1.96\sqrt{\frac{p(1-p)}{n}} \leq h \leq p + 1.96 \sqrt{\frac{p(1-p)}{n}} $$

Given an observed $h$, one obtains the limits of this interval by solving the following equation for $p$:

$$ p \pm 1.96\sqrt{\frac{p(1-p)}{n}} = h $$
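Solving this exactly means squaring both sides, which yields a quadratic in $p$; its two roots are the interval limits (this is known as the Wilson score interval). A small numerical sketch, where the function name and the sample values $h = 0.4$, $n = 100$ are my own, not from the book:

```python
import math

def wilson_interval(h, n, z=1.96):
    """Exact solutions of p +/- z*sqrt(p(1-p)/n) = h for p.

    Squaring both sides gives a quadratic in p; its two roots
    are the interval limits (the Wilson score interval).
    """
    denom = 1 + z**2 / n
    center = h + z**2 / (2 * n)
    half = z * math.sqrt(h * (1 - h) / n + z**2 / (4 * n**2))
    return (center - half) / denom, (center + half) / denom

lo, hi = wilson_interval(h=0.4, n=100)
print(lo, hi)  # roughly 0.309 and 0.498; note the asymmetry about h
```

Unlike the approximation below, the exact interval is not symmetric about $h$.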

This derivation is from a German high-school level math book. The book then proceeds: to avoid the pain of solving this equation exactly, one can easily obtain an approximate solution as follows.

If $n$ is very large ($n > 1000$), or $h$ lies between $0.3$ and $0.7$, or $\sigma = \sqrt{np(1-p)}$ is larger than $3$, one may use the approximation

$$ p(1-p) \approx h(1-h), $$

i.e.

$$ p = h \mp 1.96\sqrt{\frac{h(1-h)}{n}} $$
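To see how good this shortcut is, one can plug the approximate limits back into the exact equation and check that they nearly reproduce $h$ when $n$ is large. A quick sketch with illustrative values of my own choosing:

```python
import math

z, n, h = 1.96, 1000, 0.5  # illustrative values, not from the book

# Approximate limits using h(1-h) in place of p(1-p)
half = z * math.sqrt(h * (1 - h) / n)
p_lo, p_hi = h - half, h + half

# Plugging each limit back into the exact equation
# p +/- z*sqrt(p(1-p)/n) = h should nearly reproduce h:
back_lo = p_lo + z * math.sqrt(p_lo * (1 - p_lo) / n)
back_hi = p_hi - z * math.sqrt(p_hi * (1 - p_hi) / n)
print(p_lo, p_hi)        # approximate interval for p
print(back_lo, back_hi)  # both close to h = 0.5
```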

Why is this approximation valid? I don't see how it follows from any of the three conditions.

  • For large $n$ it follows from the [law of large numbers](https://en.m.wikipedia.org/wiki/Law_of_large_numbers#Examples), so you may look into that. I'm not sure where the other conditions came from, but hopefully someone will have a good answer. (2017-02-08)

1 Answer


First, recall the Central Limit Theorem for Bernoulli trials:

If $k = k(n)$ is the number of successes in $n$ trials with probability of success $p$ in each trial, then the distribution of the r.v. $\frac{k-np}{\sqrt{np(1-p)}}$ approaches the standard normal distribution as $n \to \infty$.

We can write this fact as $$\frac{k-np}{\sqrt{np(1-p)}}\stackrel{d}{\to} \mathcal N(0,1).$$

This statement means in particular that the probability that the l.h.s. lies in the interval $(-1.96,\ 1.96)$ tends to the corresponding probability for the standard normal distribution, which is $0.95$.

For $n$ sufficiently large, $$P\left(-1.96\leq\frac{k-np}{\sqrt{np(1-p)}}\leq 1.96\right)\approx 0.95.$$

The accuracy of this approximation depends on the value of $p$: if the denominator $\sqrt{np(1-p)}$ is small, the accuracy is poor. So the normal approximation is accurate for large $n$ and for $p$ bounded away from $0$ and $1$.
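This can be checked by simulation. A minimal Monte Carlo sketch (the `coverage` helper and the chosen parameters are mine, not part of the answer):

```python
import random

def coverage(n, p, z=1.96, trials=5000, seed=0):
    """Monte Carlo estimate of P(|k - np| <= z * sqrt(np(1-p)))
    for k ~ Binomial(n, p)."""
    rng = random.Random(seed)
    sigma = (n * p * (1 - p)) ** 0.5
    hits = 0
    for _ in range(trials):
        k = sum(rng.random() < p for _ in range(n))
        if abs(k - n * p) <= z * sigma:
            hits += 1
    return hits / trials

print(coverage(500, 0.5))    # close to 0.95
print(coverage(500, 0.002))  # sigma ~ 1 < 3 here: noticeably below 0.95
```

The second case illustrates the book's condition $\sigma > 3$: with $p$ very close to $0$, the normal approximation loses accuracy.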

We also know that $h=\frac{k}{n}$ tends to $p$ in probability as $n$ increases; this is Bernoulli's Law of Large Numbers. Therefore $h(1-h)$ tends in probability to $p(1-p)$, and the ratio $$\frac{\sqrt{np(1-p)}}{\sqrt{nh(1-h)}}$$ also tends to $1$ in probability.
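A quick simulation illustrates this convergence (the helper function and the values $p = 0.3$ and the sample sizes are illustrative):

```python
import random

def ratio(n, p, seed=2):
    """sqrt(p(1-p)) / sqrt(h(1-h)) for a single simulated h = k/n."""
    rng = random.Random(seed)
    k = sum(rng.random() < p for _ in range(n))
    h = k / n
    return (p * (1 - p)) ** 0.5 / (h * (1 - h)) ** 0.5

for n in (100, 10000, 1000000):
    print(n, ratio(n, 0.3))  # approaches 1 as n grows
```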

By Slutsky's Theorem, the product of these two r.v.'s also converges in distribution: $$\frac{\sqrt{np(1-p)}}{\sqrt{nh(1-h)}}\cdot \frac{k-np}{\sqrt{np(1-p)}}\stackrel{d}{\to} \mathcal N(0,1).$$

Cancelling the numerator of the first fraction against the denominator of the second, we have effectively replaced $p(1-p)$ by $h(1-h)$:

$$\frac{k-np}{\sqrt{nh(1-h)}}\stackrel{d}{\to} \mathcal N(0,1),$$ so $$-1.96 \leq \frac{k-np}{\sqrt{nh(1-h)}} \leq 1.96$$ holds with approximately the same probability as above.
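The practical consequence is that the interval $h \mp 1.96\sqrt{h(1-h)/n}$ covers the true $p$ with probability close to $0.95$, which one can again check by simulation (the helper and parameters are illustrative, not from the answer):

```python
import random

def wald_coverage(n, p, z=1.96, trials=2000, seed=1):
    """Fraction of simulated experiments in which the interval
    h -/+ z*sqrt(h(1-h)/n) contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        k = sum(rng.random() < p for _ in range(n))
        h = k / n
        half = z * (h * (1 - h) / n) ** 0.5
        if h - half <= p <= h + half:
            hits += 1
    return hits / trials

print(wald_coverage(1000, 0.4))    # close to 0.95
print(wald_coverage(1000, 0.005))  # h near 0: well below 0.95
```

The second call shows why $h$ should not be too close to $0$ or $1$, as discussed next.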

When $p$ is unknown, the only way to check that $p$ is separated from $0$ and $1$ (as needed for a good approximation) is to look at $h$, which is a consistent estimator of $p$. If $n$ is large and $h$ is not too close to $0$ or $1$, we can hope that the approximation is quite good.

Hence the recipe given in the book: $n$ very large ($n > 1000$), or $h$ between $0.3$ and $0.7$. These rules of thumb can differ slightly between sources.