5
$\begingroup$

Say I collect 40 perfectly random integers between 1 and 400. What's the chance that any integer is repeated consecutively six times in such a random draw?

What I'm looking for is the chance of sequences like [372, 193, 42, 42, 42, 42, 42, 42, 274, 42, 7, ...], [372, 193, 42, 42, 42, 42, 42, 42, 274, 242, 7, ...], or [372, 193, 42, 42, 42, 42, 42, 42, 42, 42, 42, ...] as they all fulfills what I'm looking for. As a counter example [372, 193, 42, 42, 42, 42, 42, 77, 274, 42, 7, ...] does not satisfy my conditions because the six 42's are not consecutively repeated.

The Birthday problem gives that it's an 87% chance that two of the 40 are the same number but I'm failing to go from that to calculating the chance of a certain integer consecutively repeating itself n number of times in the random collection.

  • 0
    @amWhy `from a pool of 400` - no, not a pool. Is it more clear if I call it a collection of 40 random integers between 1 and 400?2011-06-30

3 Answers 3

9

You can attack this with a Markov chain, with six states (the number of consecutive numbers of the last run), and with an absorbent final state. So the stochastic matrix is given by

 N = 400  P =  [ 1-1.0/N   1.0/N    0.0      0.0      0.0      0.0    ;         1-1.0/N   0.0      1.0/N    0.0      0.0      0.0    ;         1-1.0/N   0.0      0.0      1.0/N    0.0      0.0    ;         1-1.0/N   0.0      0.0      0.0      1.0/N    0.0    ;         1-1.0/N   0.0      0.0      0.0      0.0      1.0/N  ;         0.0       0.0      0.0      0.0      0.0      1.0    ] 

And so the desired probability is

octave > ([1 0 0 0 0 0 ] * P^39 )(6) ans = 3.4097e-012 

Because the probabilities are so low, the most naive approximation (just compute the probability that some 6-run have consecutive numbers: the upper bound in Yuval's answer) is quite precise. You might want to try with other numbers to get more interesting/enlightening results.

Added : An asymtotic approach, for large values of tries $L$ and not so low probabilities: Think of the lenghts of the runs as random variables. Approximate them by $m$ iid geometric variables $x_i$ , with $p=1-1/N$, and $m$ chosen so that $m E(x_i) = L$, and compute the probability that all $x_i < k$ (in the original question $N=400$, $L=40$, $k=6$). This approximation (barring errors) gives :

$ \approx 1 - \left( 1 - \frac{1}{N^{k-1}}\right)^{\frac{N-1}{N} L }$

  • 1
    @TomasVana The Markov chain does not compute the length of the longest run, but the probability that it's at least 6. It has an absorbing state, corresponding to "a run of length 6 was found", so, it will be at that state at the end if that event happened at any time.2013-06-10
4

You can use inclusion-exclusion to get bounds on this. For $A \subset [40]$ of length $6$, let $S_A$ be the event that all numbers in $A$ are the same. The probability $p$ that you want is then bounded by $ \sum_A \Pr[S_A] - \sum_{A \neq B} \Pr[S_A \text{ and } S_B] \leq p \leq \sum_A \Pr[S_A]. $ Since $\Pr[S_A] = 400^{-5}$, it is easily to calculate the upper bound: it is $ \binom{40}{6} 400^{-5} \approx 0.000000375.$ Calculating the lower bound is more messy so I'll leave it to you. My guess is that it will be pretty close to the upper bound. If it isn't, you can always take more terms in the inclusion-exclusion formula.

  • 3
    @Jonas: Imagine the 40 tries numbered from 1 to 40. The probability of a 6-run starting at (say) position 1 is $400^{-5}$. The same if we consider the starting position 2, 3... 35. You have 40-k+1=35 posible positions for the 6-run. The approximation comes from just summing them, and that's only reasonable because, having so low probabities, the events we are "forgetting" (more than one 6-run) are negligible.2011-06-30
1

Here's my simplistic approach to calculate the probability.

We can look at this like rolling a dice with 400 sides 40 times. It's 1/400 to draw any one number. Gives $1/(400^k)$ chance to draw a certain number $k$ times. We are not interested in a certain number but any of the 400 will do and that gives $400/(400^k)$.

We have $40-6+1$ chances to hit a streak of $6$ consecutive repeated numbers.

$ (L-k+1)N/N^{k} $

$N=400, L=40, k=6$

$ (40-6+1)400/400^{6} \approx 3,418e{-12}$

But as leonbloy and Yuval already pointed out $N/N^{k}$ is equal to $N^{1-k}$ and thus it can be written as

$ (L-k+1)N^{1-k} $

$ 35*400^{-5} \approx 3,418e{-12}$

That's a tiny chance indeed.

  • 0
    Ah, thanks! A classic off-by-one error.2011-06-30