1
$\begingroup$

I'm working on a problem to find the expected frequency of a pattern. Say there is a sequence of alphabets - A, B, C and D.

The sequence is: ABDACDBADA.

I want to find the expected frequency of a pattern ACD given the sequence above. So I calculated the frequencies of A (0.4), B(0.2), C(0.1) and D(0.3) separately.

Initially I thought, multiplying the frequencies of A, C and D would suffice, i.e., 0.4 * 0.1 * 0.3 = 0.012. But, this is not what i need as I need to conserve the order of ACD.

Can anyone tell me how to proceed with this?

Thanks!!

  • 0
    Basically, to make it short I just need know to the probability of the pattern, 'ACD' from the given frequencies of each alphabet.2012-08-27

2 Answers 2

1

If A,C,D have probabilities $4/10$, $1/10$, $3/10$ respectively of appearing in any position, independent of what appears elsewhere, then any given triple of distinct positions has probability $.4 \times .1 \times .3 = .012$ of getting ACD (in that order).

0

Assume that the letters $A_i$ $\ (1\leq i\leq m)$ have given a-priori probabilities $p_i\,$, and that at each position of the string one of these letters appears with the given probability and independently of everything else.

Now let a keyword $w:=A_{i_1}A_{i_2}\ldots A_{i_r}$ be given (repetitions allowed; in your case $w:=\,$ACD), and consider a random string of $N$ letters. The probability that $w$ appears in this string starting at a given position $k$ between $1$ and $N-r+1$ inclusive computes to $p=\prod_{\ell=1}^r p_{i_\ell}\ .$ In particular the expected number of appearances of this word at this particular place is $p$. Since this word may start at $N-r+1$ positions in all, by linearity of expectation the expected total number of appearances of $w$ is given by $E=(N-r+1) p\ .$