This is too long for a comment to the answer by @joriki, but it's intended to fill in some details for those commenters who questioned whether the sum of squared probabilities is correct (it is).
For an alphabet of $k$ letters $\{a_1,\ldots,a_k\}$, we're given two random words on that alphabet, $X_1\ldots X_m$ and $Y_1\ldots Y_n$ with $m\le n$, all of whose letters are drawn independently according to the same distribution, with $P(a_i)=p_i$. Using Iverson brackets:
$$E\bigg(\text{number of occurrences of }X_1\ldots X_m\text{ in }Y_1\ldots Y_n\bigg)\\
\begin{align}
&= E\bigg(\sum_{i=1}^{n-m+1}[Y_i\ldots Y_{i+m-1}=X_1\ldots X_m]\bigg)\\
&= \sum_{i=1}^{n-m+1}P(Y_i\ldots Y_{i+m-1}=X_1\ldots X_m)\\
&=(n-m+1)\ P(Y_1\ldots Y_m=X_1\ldots X_m)\\
&=(n-m+1)\ P\bigg(\bigcap_{i=1}^m \{Y_i=X_i\}\bigg)\\
&=(n-m+1)\prod_{i=1}^m P(Y_i=X_i)\\
&=(n-m+1)\big( P(Y_1=X_1)\big)^m\\
&=(n-m+1)\bigg( \sum_{i=1}^k P\big(\{Y_1=a_i\}\cap\{X_1=a_i\}\big)\bigg)^m\\
&=(n-m+1)\bigg( \sum_{i=1}^k P(Y_1=a_i)P(X_1=a_i)\bigg)^m\\
&=(n-m+1)\bigg( \sum_{i=1}^k p_i^2\bigg)^m\\
\end{align}
$$
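The final formula is easy to sanity-check by brute force: for a tiny alphabet and short words, enumerate every pair of words $(x,y)$, weight each pair by its probability, and count occurrences directly. Here is a sketch in Python (the function names are mine, not from the answer above):

```python
from itertools import product

def formula(p, m, n):
    # (n - m + 1) * (sum of p_i^2)^m, the closed form derived above
    return (n - m + 1) * sum(q * q for q in p) ** m

def word_prob(p, w):
    # Probability of a word w, letters drawn i.i.d. with P(letter i) = p[i]
    prob = 1.0
    for letter in w:
        prob *= p[letter]
    return prob

def count_occurrences(x, y):
    # Number of (possibly overlapping) occurrences of x in y
    m = len(x)
    return sum(y[i:i + m] == x for i in range(len(y) - m + 1))

def exact_expectation(p, m, n):
    # E(occurrences) by summing over ALL pairs (x, y);
    # only feasible for very small k, m, n
    letters = range(len(p))
    return sum(
        word_prob(p, x) * word_prob(p, y) * count_occurrences(x, y)
        for x in product(letters, repeat=m)
        for y in product(letters, repeat=n)
    )
```

For example, with $p=(0.6,0.4)$, $m=2$, $n=5$, both `formula` and `exact_expectation` give $(5-2+1)(0.36+0.16)^2 = 4\cdot 0.52^2 = 1.0816$.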
If the shorter word were fixed instead of random, say $x_1\ldots x_m$, then a similar derivation gives
$$E\bigg(\text{number of occurrences of }x_1\ldots x_m\text{ in }Y_1\ldots Y_n\bigg)\\
\begin{align}
&=\qquad ...\\
&=(n-m+1)\prod_{i=1}^m P(Y_i=x_i)\\
&=(n-m+1)\prod_{i=1}^m p(x_i)\\
\end{align}
$$
where $p(\cdot)$ is the probability mass function for the given distribution on the finite alphabet.
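The fixed-word case can be checked the same way, by enumerating all words $y$ of length $n$ (names below are illustrative):

```python
from itertools import product

def formula_fixed(p, x, n):
    # (n - m + 1) * prod_i p(x_i), the closed form for a fixed word x
    m = len(x)
    prob = 1.0
    for letter in x:
        prob *= p[letter]
    return (n - m + 1) * prob

def exact_expectation_fixed(p, x, n):
    # E(occurrences of fixed x in random y), summing over ALL words y;
    # only feasible for very small alphabets and n
    m = len(x)
    x = tuple(x)
    total = 0.0
    for y in product(range(len(p)), repeat=n):
        py = 1.0
        for letter in y:
            py *= p[letter]
        count = sum(y[i:i + m] == x for i in range(n - m + 1))
        total += py * count
    return total
```

With $p=(0.6,0.4)$ and $x=(a_1,a_2,a_1)$, $n=5$: both functions give $3\cdot 0.6\cdot 0.4\cdot 0.6 = 0.432$.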