9
$\begingroup$

Here's a question I've been considering: Suppose you roll a usual 6-sided die 10 times and sum up the results of your rolls. What's the probability that it's divisible by 10?

I've managed to solve it in a somewhat ugly fashion using the following generating series: $(x+x^2+x^3+x^4+x^5+x^6)^{10} = x^{10}(x^6 - 1)(1+x+x^2+\cdots)^{10}$ which makes finding the probability somewhat doable if I have a calculator or lots of free time to evaluate binomials.

What's interesting though is that the probability ends up being just short of $\frac{1}{10}$ (in fact, it's about 0.099748). If instead, I roll the die $n$ times and find whether the sum is divisible by $n$, the probability is well approximated by $\frac{1}{n} - \epsilon$.

Does anyone know how I can find the "error" term $\epsilon$ in terms of $n$?

  • 0
    It turns out that for large $n$ the approximation $\frac{1}{n}$ is not very good (for example for $n=100$ the probability is about $0.0006344$ rather than $0.01$) so $\epsilon$ is about $\frac{1}{n}$ or, if you prefer, $\epsilon n \to 1$.2012-03-07

3 Answers 3

6

Define $$ \begin{align} P_n(x) &=\left(\frac{x+x^2+x^3+x^4+x^5+x^6}{6}\right)^n\\ &=\sum_{j=n}^{6n}a_j\,x^j\tag{1} \end{align} $$ The probability that the the sum, when rolling a $6$-sided die $n$ times, is divisible by $n$ is the sum of the $a_k$ where $k$ is a multiple of $n$. To find the sum of those coefficients, we will compute $$ \begin{align} \frac1n\sum_{k=0}^{n-1}P(e^{i2\pi k/n}) &=\frac1n\sum_{k=0}^{n-1}\sum_{j=n}^{6n}a_k\,e^{i2\pi jk/n}\\ &=\sum_{j=n}^{6n}a_j\left(\frac1n\sum_{k=0}^{n-1}e^{i2\pi jk/n}\right)\tag{2} \end{align} $$ because $$ \frac1n\sum_{k=0}^{n-1}e^{i2\pi jk/n}=\left\{\begin{array}{}1&\text{if }n\,\vert \,j\\0&\text{if }n\!\!\not{\vert}\,j\end{array}\right.\tag{3} $$ Note that $$ P_n(x)=\left(\frac16\frac{x^7-x}{x-1}\right)^n\tag{4} $$ so that when we set $x=e^{i2\pi k/n}$, we get $$ \begin{align} P_n(e^{i2\pi k/n}) &=\left(\frac16\frac{\sin(6\pi k/n)}{\sin(\pi k/n)}e^{i7\pi k/n}\right)^n\\ &=(-1)^k\left(\frac{\sin(6\pi k/n)}{6\sin(\pi k/n)}\right)^n\tag{5} \end{align} $$ Thus, the probability that the the sum, when rolling a $6$-sided die $n$ times, is divisible by $n$ is $$ \begin{align} \frac1n\sum_{k=0}^{n-1}P_n(e^{i2\pi k/n}) &=\frac1n\sum_{k=0}^{n-1}(-1)^k\left(\frac{\sin(6\pi k/n)}{6\sin(\pi k/n)}\right)^n\\ &=\frac1n+\frac1n\sum_{k=1}^{n-1}(-1)^k\left(\frac{\sin(6\pi k/n)}{6\sin(\pi k/n)}\right)^n\tag{6} \end{align} $$ For large $n$, $\left(\dfrac{\sin(6\pi k/n)}{6\sin(\pi k/n)}\right)^n$ is very close to $1$ for a large range of $k$, so I do not see a way to get a simple estimate for the error, but the error is given exactly by $$ \frac1n\sum_{k=1}^{n-1}(-1)^k\left(\frac{\sin(6\pi k/n)}{6\sin(\pi k/n)}\right)^n\tag{7} $$ Staring at $(7)$, it is evident why, for $n\in\{1,2,3,6\}$, $\frac1n$ is exact, as Henry notes in his answer (Hint: consider $\sin(6\pi k/n)$).

  • 1
    @Aryabhata: I did compute the series $\frac{\sin(6x)}{6\sin(x)}=1-\frac{35}{6}x^2+O(x^4)$. For the OP's example of $n=10$, the terms in $(7)$ for $k=1$ and $k=n-1$ dominate the rest and $-\frac2n\left(\frac{\sin(6\pi/n)}{6\sin(\pi/n)}\right)^n$ looks like a good estimate for the error. However, for larger $n$, the terms in $(7)$ with $k$ further from $0$ and $n$ start being significant, so that simple estimate breaks down. Furthermore, even at $n=10$, the series expansion needs several terms to become decent. Larger $n$, more terms in $(7)$; smaller $n$, more terms in the series.2012-03-06
  • 0
    Yeah, I had a mistake earlier (so I deleted the comment).2012-03-06
  • 0
    Wow, that's a really cool way of approaching it, and thanks for showing me how to find the exact error!2012-03-07
  • 1
    @HermanChau: what is a bit surprising to me is that $$\sum_{k=1}^{n-1}(-1)^k\left(\frac{\sin(6\pi k/n)}{\sin(\pi k/n)}\right)^n\in\mathbb{Z}$$2012-03-07
2

One way would be to calculate it: it is not difficult using a spreadsheet and recursion, and a short piece of R code is shown below.

$\epsilon$ seems to be zero for $n=1,2,3$, to be positive $\frac{1}{648}$ for $n=4$, negative $- \frac{1}{9270}$ for $n=5$, zero for $n=6$, and then to be positive and increasing up to a maximum of about $0.01223$ for $n=52$ after which it starts to decline.

Rather more seriously $\frac{1}{n}$ soon becomes a very poor approximation to whether the sum is divisible by $n$. For $n \gt 43$ it is more than twice the true value and then gets rapidly worse. There is a better alternative.

Added

A better approximation for large $n$ would be to assume the central limit theorem applies well enough and work out the probability of achieving $3$ times the number of rolls or $4$ times the number of rolls ($1$, $2$, $5$, and $6$ times the number of rolls are too unlikely to be worth worrying about). These are half the number of rolls away from the mean, and the variance is $\frac{35}{12}$ times the number of rolls. So a possible approximation for the required probability is

$$\sqrt{ \frac{24}{35 \pi n} } \exp\left( - \frac{3 n}{70} \right).$$

It is a good approximation for a wide range of values as shown in the chart drawn by the following R code which calculates the actual values. It is a better approximation than $\frac{1}{n}$ for $n \ge 14$.

maxrolls <- 1000
probmultofrolls <- rep(0,maxrolls)
rolls <- 1:maxrolls
probs <- 1 # after 0 rolls score is zero with probability 1
for (r in rolls) {
    probs <- (c(0,probs) + c(0,0,probs) + c(0,0,0,probs) + c(0,0,0,0,probs) + 
              c(0,0,0,0,0,probs) + c(0,0,0,0,0,0,probs) ) / 6
    probmultofrolls[r] <- sum( probs[r * (1:6) + 1] ) # +1 because started at 0
   }     
plot(  probmultofrolls ~ rolls, log="xy" )
lines( 1/rolls ~ rolls, col="blue" )
lines( exp( -3*rolls/70) * sqrt(24 / (35 * pi * rolls) ) ~  rolls, col="red" ) 

which produces the following chart with logarithmic scales and $1 \le n \le 1000$: black circles are the actual values; the blue line is $\frac{1}{n}$; and the red line is the approximation based on the central limit theorem.

enter image description here

Even this may not hold for even larger $n$, since the central limit theorem is often not a good approximation in the extreme tails of a distribution. It is within 5% of the true figure when $9 \le n \le 191$.

  • 0
    Indeed, for $n\in\{1,2,3,6\}$, the error is $0$. I have explained this in my answer.2012-03-06
1

The distribution of the sum converges to normal distribution with speed (if I remember correctly) of $n^{-1/2}$ and that error term could be dominating other terms (since you have got just constant numbers (=six) of samples out of resulting distribution). However, there is a small problem -- probability that your sum will be divisible by $n$ tends to $0$ with speed $n^{-1}$. Still, I typed something into Mathematica and this is what I've got:

In[1]:= 
    X := Sum[
         Erfc[(6*Sqrt[2]*(7n/2 - k*n+1))/(35n)]/2
        -Erfc[(6*Sqrt[2]*(7n/2 - k*n))/(35n)]/2, 
         {k, {1,2,3,4,5,6}}
    ]
In[2]:= Limit[X, n -> Infinity]
Out[2]= 0
In[3]:= N[Limit[X*n, n -> Infinity]]
Out[3]= -0.698703
In[4]:= N[Limit[(X+1/n)*n, n -> Infinity]]
Out[4]= 0.301297

The Erfc is the cumulative probability function of normal distribution, the formula inside is to adjust for mean and variation. $X$ should be approximation of what you are looking for. What it shows (In[3] and In[4]) is that there is a typo in my formula or this does not converge to what you think it should (in fact that may be true (I am not sure), i.e. in each sixth part you are always off from center (or wherever the mean of that part value is) by constant margin). Hope that helps ;-)