2
$\begingroup$

I tried to solve this problem, and I used something similar to the probability of the Bernoulli distribution to find how many tosses you need to obtain a probability of almost one for one occurrence. Then I just divide the number of tosses in the problem statement by the number of tosses I get.

Is there a better way to do this?

Edit: What I tried is

m = $12$

N = $10^6$

Probability of having X = "m heads followed by m tails" is $\frac{1}{2^m}$

Probability of X at exactly at position i is $1/2^m*(1-1/2^m)^i$

Probability of X after p tries is $\sum_{i=0}^{p}1/2^m*(1-1/2^m)^i =1-(1-1/2^m)^{p+1}$

Now when p is big enough the probability is 1. And then I can just divide . But it depends what you define as big enough and lead to speculative results. There must be a solution based on statistics but It's been a while and I forgot quite a bit.

  • 0
    You might find this article about the distribution of the length of the longest run in coin tossing of interest: http://mathdl.maa.org/images/upload_library/22/Polya/07468342.di020742.02p0021g.pdf2012-02-19

2 Answers 2

4

Well, your question suggest that you want to calculate the expected value of the number of substrings of the form $h^6t^6$ in your million-character string. However your calculations seem like trying to calculate the expected value of an indicator saying if there is a $h^6t^6$ substring (one or more!) in the million-character string.

If I were to answer your question, then I would repeat the hints given above: the probability of $h^6t^6$ at $i^{th}$ position is $1/2^m$, so the expected value of such an indicator is $1/2^m$ and the result is (by linearity) the sum of expected values of all the indicators ($10^6-m+1$ of them).

The problem you are trying to solve is a little bit harder. First, you shouldn't write things like:

Probability of X at exactly at position i is $1/2^m*(1-1/2^m)^i$

because the events "no $h^6t^6$ at $k$" and "$h^6t^6$ at $k+1$" aren't independent, so you can't multiply their probabilities to compute the probability of the intersection.

If the numbers were smaller, then we could use the inclusion-exclusion principle to get a precise answer. In this case I can think of a Markov chain with states $nothing, h, hh, hhh, hhhh, h^5, h^{6+}, h^6t, h^6t^2, h^6t^3, h^6t^4, h^6t^5, success$. You can prepare a 13 x 13 transition matrix $P$ (it is not hard since it is very regular and there are mainly zeros in it, some 1/2 and one 1) and use a computer to compute it's $10^{6th}$ power $R$, multiply $e_1^T \cdot R$ and then the last coordinate of the result vector is the answer. Maybe it is possible to avoid using a computer here because of the regularity of $P$, but I don't feel like checking it now :)

0

Let's take a simpler example to get the gist:

flip m times (m = 5), how many times can we get a 2H followed by 2T (n = 2 + 2 = 4). If you think about it, this can only happen in two circumstances, which are: HHTT_ or _HHTT (where _ could either be a H or a T and it would not matter). This can be deduced by m - n + 1. The Expected number of times that this could happen is the E(x) = m-n+1 * prob(HHTT) = 2/(2^4).

Similarly for m = 1000000 and n = 12 we have E(x) = (1000000-11)/(2^12)