I have been told that an event occurs about once in 50 times. In my experience, it is more like once in 40 or 45, which would not be insignificant (if correct). However, because this event does not happen very often, I cannot be sure. What sample size would be sufficient for me to state that it's once in 40, for example?
What sample size do I need to justify my suspicions?
- 
(Note that this depends on the degree of certainty [e.g. 95%] you require before stating your conviction, which could depend on a number of things. See also: http://xkcd.com/1132/) – 2012-11-26
- 
Yes. Let's say I need 95% confidence, if that helps. – 2012-11-27
1 Answer
Are the events (assumed to be) independent? Do you expect the event to occur with the same probability in every trial? If so, you can set $X$ = # of events in $n$ trials, $n$ = # of trials, and $p$ = probability of the event occurring. Under the above assumptions, $X\sim \mathrm{Binomial}(n,p)$, and if both $np>5$ and $n(1-p)>5$ we can apply the normal approximation to the binomial distribution (Hogg et al. (2005), page 222). Writing the normal distribution with its mean and standard deviation as parameters, the distribution of $X$ is approximately
$$ X\sim N(np,\sqrt{np(1-p)}) $$
or equivalently for the sample proportion $\hat{p}=X/n$
$$ \hat{p}\sim N(p, \sqrt{p(1-p)/n}) $$
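As a quick illustration of the rule of thumb and of how the spread of $\hat{p}$ shrinks with $n$, here is a minimal Python sketch. The function names `normal_approx_ok` and `phat_sd` are made up for this example, and the sample sizes are arbitrary.

```python
from math import sqrt

def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb from the answer: np > 5 and n(1-p) > 5."""
    return n * p > threshold and n * (1 - p) > threshold

def phat_sd(n, p):
    """Standard deviation of the sample proportion, sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

p = 1 / 50  # the claimed probability from the question
for n in (100, 300, 1000):  # arbitrary illustrative sample sizes
    print(n, normal_approx_ok(n, p), round(phat_sd(n, p), 5))
```

For an event this rare, the condition $np>5$ is the binding one, so a few hundred trials are needed before the approximation is even considered usable.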
We can estimate the standard deviation of $\hat{p}$ as $s.e.\{\hat{p}\}=\sqrt{\hat{p}(1-\hat{p})/n}$ and then an approximate 95% confidence interval for $p$ would be
$$ \hat{p}\pm z_{0.975}\cdot \sqrt{\hat{p}(1-\hat{p})/n} $$
where $z_{0.975}$ is the 97.5% quantile of the standard normal distribution. Using calculus, $\hat{p}(1-\hat{p})\le\frac{1}{4}$ (with the maximum at $\hat{p}=\frac{1}{2}$), so we can bound $s.e.\{\hat{p}\}$ by $\frac{1}{\sqrt{4n}}$. Then for any desired margin of error $M$ we need
$$ z_{0.975}\cdot \frac{1}{\sqrt{4n}} < M $$
or
$$ n>\left(\frac{z_{0.975}}{2M}\right)^2 $$
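The bound on $n$ is easy to evaluate numerically. The sketch below uses only the Python standard library; `min_sample_size` is a hypothetical helper name. The optional `p` argument is an addition that is not part of the derivation above: it replaces the worst-case bound $\frac{1}{4}$ with $p(1-p)$ for a guessed value of $p$, which gives a much smaller $n$ when the event is rare but relies on that guess being roughly right.

```python
from math import floor
from statistics import NormalDist

def min_sample_size(margin, confidence=0.95, p=None):
    """Smallest integer n with z * sqrt(v / n) < margin.

    v = 1/4 is the worst-case bound used in the derivation; if a guess
    for p is supplied (an assumption, not part of the derivation),
    v = p(1-p) is used instead.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{0.975} for 95%
    variance = 0.25 if p is None else p * (1 - p)
    # Strict inequality n > z^2 * v / margin^2, hence floor(...) + 1.
    return floor(z**2 * variance / margin**2) + 1

margin = 1 / 40 - 1 / 50
print("worst-case bound:", min_sample_size(margin))
print("plug-in with p = 1/40:", min_sample_size(margin, p=1 / 40))
```

The worst-case version is very conservative here, because $p(1-p)$ is far below $\frac{1}{4}$ when $p$ is near $\frac{1}{50}$.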
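In your case, write $p_0=\frac{1}{50}$ for the claimed probability and $p_1=\frac{1}{40}$ for the value you suspect, and set $M=p_1-p_0=\frac{1}{40}-\frac{1}{50}$. With $n$ that large, the 95% CI has half-width less than $M$, so an interval centered at $\hat{p}=\frac{1}{40}$ would not contain $p_0=\frac{1}{50}$. Checking whether $p_0$ lies in the CI is equivalent to a two-sided hypothesis test of $H_0: p=\frac{1}{50}$ against $H_1: p\neq \frac{1}{50}$; if you only care about whether the event is more frequent than claimed, the one-sided alternative is $H_1: p> \frac{1}{50}$. Note that while a large enough $n$ ensures that the margin of error is small, whether you can reject $H_0$ will depend on the true probability $p$, which is unknown, and on where $\hat{p}$ happens to fall.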
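To see how the chance of rejecting $H_0$ depends on both $n$ and the true $p$, one can compute the approximate power of a one-sided test. This is a standard normal-approximation formula rather than something derived above, and `approx_power`, the one-sided level `alpha=0.05`, and the sample sizes in the loop are assumptions made for the sake of the example.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n, p0, p1, alpha=0.05):
    """Approximate chance of rejecting H0: p = p0 in favour of p > p0
    when the true probability is p1 (one-sided z-test, normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejection, computed under H0.
    cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    # Probability that p_hat exceeds the cutoff when the true value is p1.
    return 1 - std_normal.cdf((cutoff - p1) / sqrt(p1 * (1 - p1) / n))

for n in (1000, 5000, 10000, 40000):  # arbitrary illustrative sample sizes
    print(n, round(approx_power(n, 1 / 50, 1 / 40), 3))
```

If the true probability were closer to $\frac{1}{45}$ than to $\frac{1}{40}$, the same $n$ would give a noticeably lower chance of rejection, which is the dependence on the unknown $p$ mentioned above.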
- 
Pretty good analysis! My problem comes from the real world and I am willing to accept a practical solution, even an inaccurate one. Could it be that H1 is p < 1/50 instead? Doing the math in the last formula it follows that n > 6972 or so. Wow! Really big number. – 2012-11-27
- 
Yes, that would be the alternative hypothesis in your case. – 2012-11-28
