
Given $N$ trials of a die roll, where we have defined $D$ as the number of distinct outcomes, what would be the mean and standard deviation of $D$?

If we have defined $I(k)$ as an indicator random variable which equals 1 if outcome $k$ (such as 6) appears at least once, and 0 otherwise, for $k\in\{ 1,\dots,6\}$, then by definition $$D = \sum\limits_{k=1}^6 I(k)$$ How do the dependencies between the $I(k)$ play into the solution? (Which is the part that is tripping me up the most.)

  • Are you able to find the expectations, variances, and covariances of these indicator random variables? – 2017-02-10

2 Answers


We approach this problem from a combinatorial perspective. The number of $n$-roll sequences with $k$ distinct values ($1\le k\le6$), out of $6^n$ sequences total, is $$D(n,k)=\binom6kk!\left\{n\atop k\right\}=\binom6k\sum_{j=0}^k(-1)^{k-j}\binom kjj^n$$ where $\left\{n\atop k\right\}$ is the Stirling number of the second kind and counts the number of ways to partition the rolls into homogeneous subsets, and $\binom6kk!$ is the number of ways to fill those subsets with dice values. Letting $n$ vary across the positive integers, we get

$$D(n,1)=6$$
$$D(n,2)=15\cdot2^n-30$$
$$D(n,3)=-60\cdot2^n+20\cdot3^n+60$$
$$D(n,4)=90\cdot2^n-60\cdot3^n+15\cdot4^n-60$$
$$D(n,5)=-60\cdot2^n+60\cdot3^n-30\cdot4^n+6\cdot5^n+30$$
$$D(n,6)=15\cdot2^n-20\cdot3^n+15\cdot4^n-6\cdot5^n+6^n-6$$

$\frac{D(n,k)}{6^n}$ then gives the probability that an $n$-roll sequence has exactly $k$ distinct values. The expected value of $D$ for a given $n$ is then $$\mu_n=\sum_{k=1}^6k\cdot\frac{D(n,k)}{6^n}=6\left(1-\left(\frac56\right)^n\right)$$ and the standard deviation is $$\sigma_n=\sqrt{\sum_{k=1}^6\frac{D(n,k)}{6^n}(k-\mu_n)^2}=\sqrt{\frac{5\cdot144^n-6\cdot150^n+180^n}{6^{3n-1}}}$$ Note that $$\lim_{n\to\infty}\mu_n=6\text{ and }\lim_{n\to\infty}\sigma_n=0$$ which match our intuition, since for large $n$ almost all roll sequences should contain all six outcomes.

The SymPy code that generated these results can be found here.
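The linked code is not reproduced here, but the results can be recomputed along the same lines. The following is a sketch (my own, not the author's original code) that builds $P(D=k)$ from the Stirling-number formula above and checks the closed forms for $\mu_n$ and $\sigma_n^2$ at one concrete $n$:

```python
# A sketch (not the author's original code) recomputing the mean and
# variance of D with SymPy, using the Stirling-number formula above.
from sympy import Rational, binomial, factorial
from sympy.functions.combinatorial.numbers import stirling

n = 5  # number of rolls; any positive integer works here

def p(k):
    """P(D = k) = C(6,k) * k! * S(n,k) / 6^n."""
    return binomial(6, k) * factorial(k) * stirling(n, k) / Rational(6)**n

mu = sum(k * p(k) for k in range(1, 7))
var = sum((k - mu)**2 * p(k) for k in range(1, 7))

# Check against the closed forms derived above (sigma_n squared).
assert mu == 6 * (1 - Rational(5, 6)**n)
assert var == (5*144**n - 6*150**n + 180**n) / Rational(6)**(3*n - 1)
```

Everything is done in exact rational arithmetic, so the two assertions are exact identities at the chosen $n$, not floating-point approximations.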

  • Awesome! This looks really interesting. One question, though: how would you calculate $\mu$ if $n$ were known (let's say $n=4$, for example)? The limit as $n$ approaches infinity makes a lot of sense to me. – 2017-02-10
  • @Anthony Because I have obtained the formulas for $\mu_n$ and $\sigma_n$, all you need to do is substitute $n$ into them. For example, when $n=3$, $\mu_n=91/36$. – 2017-02-10
  • Oh wow, that was a complete oversight on my part. Thanks! – 2017-02-10

Good tactic. The distributions of the indicators are identical, though not independent, as you noticed. Fortunately, that is not a critical issue.

A really useful, though counterintuitive, property of expectation is that it is linear, and this holds whether the random variables are independent or not. So the expectation of the count of distinct results is the sum of the expectations of these indicator random variables.

$$\begin{align}\mathsf E(D) ~&=~ \sum_{k=1}^6 \mathsf E(I(k))\\[1ex] & =~ 6~\mathsf E(I(1))\end{align}$$
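As a quick sanity check on this linearity argument (a sketch of mine, not part of the answer), a Monte Carlo estimate of $\mathsf E(D)$ should agree with $6\,\mathsf E(I(1)) = 6\left(1-\left(\tfrac56\right)^N\right)$ despite the dependence between the indicators:

```python
# Monte Carlo sanity check (a sketch, not from the answer): by linearity,
# E(D) = 6*E(I(1)) = 6*(1 - (5/6)**N), dependence notwithstanding.
import random

N = 4             # number of rolls per trial (any small N works)
TRIALS = 200_000
random.seed(0)

total = 0
for _ in range(TRIALS):
    rolls = [random.randint(1, 6) for _ in range(N)]
    total += len(set(rolls))  # D = number of distinct outcomes observed

estimate = total / TRIALS
exact = 6 * (1 - (5 / 6) ** N)
```

For $N=4$ the exact value is $671/216\approx3.106$, and with this many trials the simulated mean should land within a couple of hundredths of it.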

Unfortunately, this does not quite apply to variance; however, covariance is bilinear, which is almost as useful.

$$\begin{align}\mathsf {Var}(D) ~&=~ \sum_{k=1}^6 \mathsf {Var}(I(k))~+ \mathop{2~~\sum}\limits_{k,j~:~1\leq k < j\leq 6}\mathsf {Cov}(I(k), I(j))\\[1ex] &=~ 6~\mathsf {Var}(I(1)) + 30~\mathsf{Cov}(I(1),I(2)) \end{align}$$

You can also use the definition:

$$\begin{align}\mathsf{Var}(D) ~&=~ \mathsf E\left(\left(\sum_{k=1}^6 I(k)\right)^2\right)-\left(\mathsf E\left(\sum_{k=1}^6 I(k)\right)\right)^2 \\[1ex] &=~ 6~\mathsf E\left(I(1)^2\right) + 30~\mathsf E(I(1)I(2))- 36~\mathsf E(I(1))^2 \end{align}$$

So find $\mathsf E(I(1)), \mathsf {Var}(I(1)), \text{ and }\mathsf {Cov}(I(1),I(2))$ and apply them.

Shall we leave that up to you?


You should have for all supported $k$ that $\mathsf E(I(k)^2)=\mathsf E(I(k)) = 1-(\tfrac 56)^N$, and for all supported $j\neq k$ that $\mathsf E(I(j)\cdot I(k))=1-2(\tfrac 56)^N+(\tfrac 46)^N$.
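Plugging these expectations into the bilinearity formula above, a short sketch in exact rational arithmetic (helper names are mine) reproduces the closed form from the other answer:

```python
# Sketch: assemble Var(D) = 6*Var(I(1)) + 30*Cov(I(1), I(2)) from the
# indicator expectations stated above, using exact rational arithmetic.
from fractions import Fraction

def var_D(N):
    p = 1 - Fraction(5, 6)**N                         # E(I(k)) = E(I(k)^2)
    q = 1 - 2*Fraction(5, 6)**N + Fraction(4, 6)**N   # E(I(j)I(k)), j != k
    var_I = p - p**2                                  # Var(I(1))
    cov = q - p**2                                    # Cov(I(1), I(2))
    return 6*var_I + 30*cov

def var_D_closed(N):
    # sigma_N^2 from the combinatorial answer, for comparison
    return Fraction(5*144**N - 6*150**N + 180**N, 6**(3*N - 1))

for N in range(1, 8):
    assert var_D(N) == var_D_closed(N)
```

The loop confirms the two routes agree exactly for small $N$; for instance, $N=1$ gives variance $0$, as it must, since a single roll always shows exactly one distinct value.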

  • Hey! Thanks for this! A couple of questions: how exactly does the $k,j$ reduce to $1,2$ in the last step? Also, in this approach to the same question (http://imgur.com/a/3p2cv), would it be possible to calculate $\mathsf E(D^2)$ by having $i^2$ in the summation instead of $i$, then calculate $\mathsf{Var}(D)$ as $\mathsf E(D^2)-\mathsf E(D)^2$, given that we just calculated $\mathsf E(D)$? Thanks! – 2017-02-10
  • @Anthony, (1) The identicalness of the distributions means $\mathsf E(X_1)=\mathsf E(X_2) =\ldots$ and so forth. (2) No, not quite. Remember you are dealing with the square of a sum, not the sum of squares. $$\mathsf E\left[\left(\sum_k I(k)\right)^2\right]-\mathsf E\left[\sum_k I(k)\right]^2 = \sum_i\sum_j \mathsf E[I(i)\cdot I(j)]-\left(\sum_k\mathsf E[I(k)]\right)^2$$ – 2017-02-10
  • I understand that $\mathsf E(I(1)I(2)) = \mathsf{Cov}(I(1),I(2)) + \mathsf E(I(1))\,\mathsf E(I(2))$ by the definition of covariance, and the last term is just simple multiplication. But I'm having a lot of trouble calculating the covariance in this case. I don't really have an idea where to start, so I'd like a push in the right direction! – 2017-02-11
  • It's easier to find $\mathsf E(I(1)\cdot I(2)) =\mathsf P(\text{both numbers appear at least once})$. – 2017-02-11