7
$\begingroup$

Given an infinite sequence $a_n$ of uniformly random integers $0$ to $9$, what is the probability there exist an integer $m$ such that the sequence $a_1$ to $a_m$ is equal to that from $a_{m+1}$ to $a_{2m}$?

What if we restrict to two symbols, or $k$ symbols?

  • 0
    @Fixee: I don't -- disjoint events are the opposite of independent. If $A$ and $B$ are independent, observing $A$ gives you no information about $B$. If $A$ and $B$ are disjoint, observing $A$ gives you complete information about $B$ (namely that it didn't occur). In numbers, independence means $p(A\cap B)=p(A)p(B)$, whereas disjointness means $p(A\cap B)=0$ -- the two conditions are mutually exclusive (unless one of the events is certain not to occur).2012-10-24

2 Answers 2

6

I can't give an exact answer, but an interesting connection and some numerical results.

Some dabbling with short examples shows that the events for different integers $m$ are nearly but not exactly independent. For instance, some sequences that don't repeat at $m=3$ repeat at $m=2$ but not at $m=1$, whereas for sequences that do repeat at $m=3$ repetition at $m=2$ implies repetition at $m=1$. However, the effect from the dependence appears quite small, so

$ \prod_{m=1}^\infty(1-k^{-m}) $

should be a good approximation to the probability that there is no repetition. Incidentally, if $k$ is a prime power, this is the probability that a large random square matrix over $\mathbb F_k$ is invertible; see Probability that a random binary matrix is invertible. In that answer, I computed the product for $k=2$ to be approximately $0.288788$, which would make the probability for a binary sequence to exhibit repetition approximately $0.711212$.

Here are numerical results for $k=2$ for repetitions up to $m=17$. Each row contains the number of strings of length of $2m$ that repeat, the number of strings that don't repeat, the proportion $p_m$ of strings that repeat, and a value $\alpha_m=-\log_2(1-(1-p_m)/(1-p_{m-1}))$ that would be $m$ if repetition at $m$ were independent of previous repetitions (here's the code):

$ \begin{array}{rrrlr} m&\text{repeating}&\text{non-repeating}&\text{proportion}&\alpha_m\\\hline 1 & 2 & 2 & 0.5 & 1.00000\\ 2 & 10 & 6 & 0.625 & 2.00000\\ 3 & 44 & 20 & 0.6875 & 2.58496\\ 4 & 182 & 74 & 0.7109375 & 3.73697\\ 5 & 738 & 286 & 0.720703125 & 4.88753\\ 6 & 2972 & 1124 & 0.7255859375 & 5.83794\\ 7 & 11924 & 4460 & 0.7277832031 & 6.96450\\ 8 & 47768 & 17768 & 0.7288818359 & 7.95290\\ 9 & 191214 & 70930 & 0.7294235229 & 8.96725\\ 10 & 765136 & 283440 & 0.7296905518 & 9.98483\\ 11 & 3061104 & 1133200 & 0.7298240662 & 10.98340\\ 12 & 12245530 & 4531686 & 0.7298904657 & 11.99044\\ 13 & 48984342 & 18124522 & 0.7299235761 & 12.99397\\ 14 & 195941804 & 72493652 & 0.7299401015 & 13.99640\\ 15 & 783776080 & 289965744 & 0.7299483567 & 14.99761\\ 16 & 3135122038 & 1159845258 & 0.7299524820 & 15.99838\\ 17 & 12540523572 & 4639345612 & 0.7299545438 & 16.99901\\ \end{array} $

The agreement with $0.711212$ is OK but not marvelous. As suggested by the $m=3$ example, the dependence slightly increases the probability of repetition because a repetition implies correlations between the conditions of repetition at lower values of $m$; but this effect is strongest at $m=3$ and becomes negligible at higher values of $m$, where $\alpha_m\approx m$ shows that almost exactly the expected proportion of sequences repeat for the first time. Thus we can get an accurate estimate of the limit probability from $p\approx p_{17}+(p_{17}-p_{16})\approx0.7299566056$, which should be accurate to eight or nine digits.

I checked OEIS for the sequences of both the repeating and the non-repeating counts; no hit.

P.S.: Note that as t.b. pointed out in a comment under the answer linked to above, by the pentagonal number theorem the above product is given by

$ \begin{align} \prod_{m=1}^\infty(1-k^{-m})&=\sum_{j=-\infty}^\infty(-1)^jk^{-j(3j-1)/2)}\\&=1-k^{-1}-k^{-2}+k^{-5}+k^{-7}-k^{-12}-k^{-15}+k^{-22}+k^{-26}-\dotso\;, \end{align} $

which leads to an interesting pattern of digits in its representation in base $k$:

$ \begin{align} \prod_{m=1}^\infty(1-2^{-m})&=0.01001001111011100000010000111111110111110000000000100\ldots_2\;, \\ \prod_{m=1}^\infty(1-10^{-m})&=0.89001009999899900000010000999999998999990000000000100\ldots_{10}\;. \end{align} $

-2

(I'm not an expert on this (I'm a newbie), but I suddenly hit something while reading the question, it may be wrong)

Assuming $_m$ is a constant:

It will depend on how big $_m$ is. If the $_m = 1$, then the probability would be $\frac1{10}$. If $_m = 2$, then $\frac1{100}$. (ie if you had $19$ as the first two, there are $100$ possibilties of the second two, so the probability of $19$ coming up is $\frac1{100}$.

Therefore, the probablity is $\frac1{10^m}$

If $_m$ can be any number:

Then..... I don't know how to work that out. (Unless I suddenly have a brainwave...)

(This is my first answer, I like mathjax.)

  • 0
    @Jean-Sébastien, yeah I think you're right. However, if $_m$ can change, I see all sorts of compliacted stuff, and I don't think I can work that out.2012-10-23