3

We throw a fair die $n$ times. For each digit between 1 and 6, we count its frequency, i.e. the number of times the die landed on this digit. What is the probability that at least two digits have the same frequency?

Initially I thought it is simple... the frequency of each digit is just a binomial random variable with $p=1/6$, so the probability that its frequency is $k$ is just:

$${n\choose k} (1/6)^k(5/6)^{n-k}$$

We can find, for each $k$, the probability that at least two digits have frequency $k$, then sum over all $k$.

The problem is, the variables are dependent, since the sum of all frequencies is $n$.
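
A small check makes the dependence concrete. Writing $X_i$ for the frequency of digit $i$ (notation assumed here for illustration), take $n=1$: each of the events $X_1=1$ and $X_2=1$ has probability $1/6$, but they cannot both occur, so

$$\mathbb{P}[X_1=1,\,X_2=1]=0\neq\frac{1}{36}=\mathbb{P}[X_1=1]\,\mathbb{P}[X_2=1]$$

More generally, for a fair die the frequencies of two different digits are negatively correlated: $\operatorname{Cov}(X_i,X_j)=-n/36$ for $i\neq j$.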

So what is the solution?

  • 1
    Are the $2$ digits chosen already before throwing? Do you mean: **exactly** $2$ digits? I suspect the answer to both questions is "no", so that the answer equals $1$ minus the probability that all digits have different frequencies. Is that so? 2017-02-24
  • 0
    @drhab You are right. I clarified the question. 2017-02-24

1 Answer

2

It's easiest to think of the multinomial distribution here. Let $X_i$ be the number of times you roll $i$ in your $n$ trials, so that the probability that $X_i=x_i$ for all $i$ (with $\sum_i x_i=n$) is

$$\frac{n!}{x_1!x_2!x_3!x_4!x_5!x_6!}\frac{1}{6^n}$$

You can then proceed in two ways. The first is to find the probability of the complement:

$$\mathbb{P}[X_i\neq X_j\text{ for }i\neq j]=\sum_{x_i\neq x_j\text{ for }i\neq j,\ \sum_ix_i=n}\frac{n!}{x_1!x_2!x_3!x_4!x_5!x_6!}\frac{1}{6^n}$$

What may be easier is to find the probability of exactly $k$ numbers having equal frequency. The probability that they're all equal is easily seen to be $\frac{n!}{\left(\left(\frac{n}{6}\right)!\right)^6}\frac{1}{6^n}$ if $n$ is divisible by $6$ and 0 otherwise. The probability that exactly 5 of them are equal is

$$6\,\mathbb{P}[X_1=X_2=X_3=X_4=X_5\neq X_6]=\frac{n!}{6^{n-1}}\sum_{k:k\neq \frac{n}{6}}\frac{1}{\left(\left(\frac{n-k}{5}\right)!\right)^5k!}$$

where $k$ is the value of the sixth frequency and we only sum over the $k$ for which the factorial arguments are non-negative integers, of which there are roughly $\frac{n}{5}$. Proceeding this way (with sums that have roughly $\frac{n^{6-k}}{k}$ terms for the probability that exactly $k$ are equal) you have a computer algorithm with $O(n^5)$ completion time. You also have something that you can work with to find asymptotic probabilities as $n\rightarrow\infty$, but getting closed forms of the above sums might be asking too much.
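
For small $n$, the complement sum can be checked by brute force. The following is a minimal sketch, not an optimized algorithm: it enumerates every set of six distinct frequency values summing to $n$, weights each by its multinomial coefficient, and multiplies by $6!$ for the ways of assigning the values to the faces. The function names `prob_all_distinct` and `prob_some_equal` are made up for this illustration, and the naive enumeration is only practical for modest $n$.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def prob_all_distinct(n: int) -> Fraction:
    """Exact P[all six frequencies are pairwise different] for n rolls of a fair die."""
    total = 0
    # Pick six distinct frequency values (as an increasing tuple) that sum to n.
    for values in combinations(range(n + 1), 6):
        if sum(values) != n:
            continue
        # Multinomial coefficient n! / (v1! v2! ... v6!); each division is exact.
        ways = factorial(n)
        for v in values:
            ways //= factorial(v)
        # 6! ways to assign the six distinct values to the six faces.
        total += factorial(6) * ways
    return Fraction(total, 6 ** n)


def prob_some_equal(n: int) -> Fraction:
    """Exact P[at least two digits share a frequency]."""
    return 1 - prob_all_distinct(n)


if __name__ == "__main__":
    for n in (14, 15, 20, 30):
        p = prob_some_equal(n)
        print(n, p, float(p))
```

One sanity check: six distinct non-negative integers sum to at least $0+1+2+3+4+5=15$, so for $n<15$ the probability that at least two digits share a frequency is exactly 1, and $n=15$ is the first case where it drops below 1.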

  • 0
    As an aside, the vector $(X_i)/n$ converges to the Dirichlet distribution, so you can show that for any $a_n=o(n)$, $\mathbb{P}[|X_i-X_j|<a_n]\rightarrow 0$ as $n\rightarrow\infty$, and for $a_n=sn$ for some $s<1$ it's positive and easy enough to calculate in the limit. – 2017-02-24