2
$\begingroup$

I have two random strings over an $N$-letter alphabet: one is a shorter $M$-letter string, and one is a longer $L$-letter string. Assuming that two or more instances of the shorter string can overlap, what is the probability, in general, that the $M$-letter string appears $k$ times in the $L$-letter string?

To clarify the comment about overlapping short strings, consider two strings over a binary alphabet: a longer string "000000", and a shorter string "00000". Here we would say that there are two overlapping instances of the short string in the longer string, with a four-character overlap.

  • 2
    @Mark: Yes: the expected number of times the word $w$ appears is the sum over the positions $i$ of the probability $p(w,i)$ that $w$ appears beginning at $i$. The sequence is stationary hence $p(w,i)=p(w)$ does not depend on $i$. The letters are i.i.d. and uniform hence $p(w)=N^{-|w|}$ where $|w|$ is the length of $w$. QED. (Remember: the expectation of a sum is the sum of the expectations, even for dependent random variables.)2012-06-21

0 Answers 0