Suppose I have 2,096,896 randomly generated letters (they were actually derived from pi). How can I calculate the probability that a word of length N will appear? I took discrete math a few years ago, but I am a bit rusty...could someone help me out here? Thanks! :)
Calculating probability, given any word of length N?
1 Answers
Note: As @Henry noted below, this answer is wrong. I confused conditional independence with regular independence. The probability that you draw a word starting at index i is related to the probability that you draw a word starting at i+1.
Assuming you have $m$ different letters (with equal probability), the probability of generating a length $n$ word from the first letter is $p = \left(\frac{1}{m}\right)^n$. Each trial is independent, and you have to get the letters exactly right, so you can just multiply the probabilities.
You can then simplify your question by asking for each index, what is the probability that I will get my word from this index onwards? This is the same as above, of course, although it only works for $2,096,896 - n + 1$ places, because after that there's no more space for the full word.
So now you have $2,096,896 - n + 1$ independent trials. The easiest approach is to ask what the probability is that the word doesn't appear, which is $1-p$ for each index, and $(1-p)^{2,096,896 - n + 1}$. Subtract this from $1$ and you have the probability that your word appears once or more.
If you want to actually compute these values, you may want to take their logarithms, as they can go to zero very quickly.
-
0You're absolutely right. I've edited my answer. – 2012-03-15