5

I think I just need to be pushed toward the right formula or algorithm... Imagine you've got a "real" die which is not a perfectly fair one, and you want to know the confidence intervals for the probability of each result. So you rolled the die a number of times and got the following absolute frequencies:

    #eyes   #occurrences
    --------------------
    1       10
    2       11
    3       24
    4       13
    5       14
    6       11

You actually want to know whether, e.g., getting 3 eyes 24 times is just a random result or whether a 3 is really more probable. If so, how much more probable is it (for sure)? So I would like to calculate a 99% confidence interval for the probabilities.

How do I calculate this? I probably learned this in statistics at university, but I just forgot it... so you don't need to go into too much detail. I just need the right formula/algorithm to look for...

Thanks for your help.

--- edit --- Just to make clear why I do not simply look up "Confidence Interval" on Wikipedia: I would know how to calculate everything if there were only two cases (e.g. like a coin... 0 and 1). Then I would be able to apply the formula, but I haven't used such statistics for some years now and don't see how to reduce the problem. I am thinking about taking the result in question (e.g. 3 eyes) as "$p$" and all other results as "not $p$"; does that work?

3 Answers

2

Let $\{X_i\}_{i=1}^n$ be independent, identically distributed random variables, where $X_i$ is the score shown on the $i$-th throw of the die. Let $N_k = \sum_{i=1}^n [ X_i = k ]$ be the number of occurrences of score $k$.

The vector $(N_1, N_2, \ldots, N_6)$ follows a multinomial distribution $\operatorname{Mult}(n, \{p_1, p_2, \ldots, p_6\})$, where $p_1 = \mathbb{P}(X=1)$, $p_2 = \mathbb{P}(X=2)$, etc.

Consider the following statistic: $$ S = \left( \frac{N_1}{n} - p_1\right)^2+\left( \frac{N_2}{n} - p_2\right)^2 + \cdots + \left( \frac{N_6}{n} - p_6\right)^2 $$

For the current sample, $n=83$ and $S=0.01957$, and under the null hypothesis of a fair die ($p_1 = \cdots = p_6 = 1/6$), the probability is $ \mathbb{P}(S > 0.01957) \approx 0.08 $. Thus the hypothesis of a fair die cannot be rejected at the 5% level, but can be rejected at the 10% level.
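As a rough check of the numbers above, here is a minimal Python sketch that recomputes $S$ from the counts in the question and estimates $\mathbb{P}(S > 0.01957)$ under a fair die by simulation. The Monte Carlo approach, the helper names `s_statistic` and `simulate_p_value`, the number of simulations and the seed are my own assumptions; the answer does not say how its 0.08 was obtained.

```python
import random

# Observed occurrences of faces 1..6, taken from the question.
counts = [10, 11, 24, 13, 14, 11]
n = sum(counts)          # 83 rolls
fair = [1 / 6] * 6       # null hypothesis: a fair die


def s_statistic(counts, probs):
    """Sum of squared differences between observed frequencies and probs."""
    total = sum(counts)
    return sum((c / total - p) ** 2 for c, p in zip(counts, probs))


def simulate_p_value(s_obs, n, n_sims=20000, seed=0):
    """Estimate P(S > s_obs) for a fair die by simulating n rolls n_sims times.

    n_sims and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        sim = [0] * 6
        for face in rng.choices(range(6), k=n):
            sim[face] += 1
        if s_statistic(sim, fair) > s_obs:
            exceed += 1
    return exceed / n_sims


s_obs = s_statistic(counts, fair)
p_value = simulate_p_value(s_obs, n)
print(f"S = {s_obs:.5f}, estimated P(S > S_obs) = {p_value:.3f}")
```

Because all six null probabilities are equal, $S$ here is just Pearson's chi-squared statistic divided by $6n$, so a chi-squared test with 5 degrees of freedom should give a similar tail probability.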

  • 0
    @StefanK. Yes, assuming Sasha got 0.08 for a two-sided test, you would reject at the one-sided 0.05 level. (2012-08-21)
5

The most common way to do this is to use a binomial proportion confidence interval. Let $p$ be the probability that the die shows 3. This probability is unknown, but in your experiment $24$ out of $83$ rolls came up 3, so the estimate of $p$ is $\hat{p} = 24/83 \approx 0.2892$. The approximate $(1-\alpha)$ confidence interval for $p$ is $\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\alpha$ is the error level, $z_x$ is the $x$-quantile of the standard normal distribution, and $n$ is the number of trials. For a 95% interval, $\alpha=5\%$ and $z_{1-\alpha/2}=1.96$; for the 99% interval asked about, $\alpha=1\%$ and $z_{1-\alpha/2}\approx 2.576$.
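The formula can be sketched in a few lines of Python using only the standard library (Python 3.8+ for `statistics.NormalDist`). The function name `wald_interval` is just an illustrative choice, and the normal approximation behind the formula can be rough for small samples or extreme proportions.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, alpha):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for alpha = 0.05
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 24 threes in 83 rolls, as in the question.
lo95, hi95 = wald_interval(24, 83, alpha=0.05)
lo99, hi99 = wald_interval(24, 83, alpha=0.01)
print(f"95% interval: ({lo95:.4f}, {hi95:.4f})")
print(f"99% interval: ({lo99:.4f}, {hi99:.4f})")
```

Comparing the printed bounds with $1/6 \approx 0.1667$ shows whether a fair die is still compatible with the data at each level.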

Check the Wikipedia page on binomial proportion confidence intervals for reference.

  • 1
    Thanks... even more detail than needed ;-) That even saves me the time of looking that formula up. (2012-08-21)
4

The outcome can be dichotomized into the events A = {you roll a 3} and B = {you roll something other than a 3}. Let $p$ be the probability of rolling a three. Then the number of threes rolled is binomial with $n=83$ (in the example) and $p$ unknown. Using the binomial distribution you can construct your 99% confidence interval for $p$. If the lower bound of the interval is above $1/6$ (so the interval does not contain $1/6$), you can conclude at the $\alpha=0.01$ level that the die tends to roll more 3s than you would get by chance.
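One way to build such an interval directly from the binomial distribution is the exact (Clopper-Pearson) construction; the answer does not name a specific method, so treat this choice as an assumption. The sketch below uses only the standard library and finds the bounds by bisection. The helper names `binom_cdf`, `solve_increasing` and `clopper_pearson` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, iters=60):
    """Bisection on [0, 1] for the root of a function that increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# 24 threes in 83 rolls, 99% confidence.
lower, upper = clopper_pearson(24, 83, alpha=0.01)
print(f"99% exact interval for p: ({lower:.4f}, {upper:.4f})")
print("lower bound above 1/6:", lower > 1 / 6)
```

The last line applies the decision rule from the answer: only if the lower bound exceeds $1/6$ would you conclude at the 0.01 level that 3s come up more often than on a fair die.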

  • 0
    I'm accepting this answer, as it's more high level... a bit better adjusted to my needs. (2012-08-21)