One can explain the ordering of the two mean first hitting times as follows.
The appearance of a word may be modeled through the sequence of the lengths of the longest prefix of this word that coincides with a suffix of the tosses performed so far, recorded after each toss. If the word to reach is $U=HHHH$, then the word $W=HTTHTHHTTTTH$ of your example yields the sequence $(0,1,0,0,1,0,1,2,0,0,0,0,1)$. If the word to reach is $V=HTHT$, then $W$ yields the sequence $(0,1,2,0,1,2,3,1,2,0,0,0,1)$.
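For concreteness, these sequences can be computed mechanically; here is a brute-force sketch (the function name is mine) that reproduces the two sequences above:

```python
def prefix_lengths(pattern, word):
    """After each toss, the length of the longest suffix of the tosses
    so far that is a prefix of `pattern`."""
    seq = [0]  # before any toss, nothing is matched
    for i in range(1, len(word) + 1):
        seq.append(max(k for k in range(min(i, len(pattern)) + 1)
                       if word[i - k:i] == pattern[:k]))
    return seq

print(prefix_lengths("HHHH", "HTTHTHHTTTTH"))  # [0, 1, 0, 0, 1, 0, 1, 2, 0, 0, 0, 0, 1]
print(prefix_lengths("HTHT", "HTTHTHHTTTTH"))  # [0, 1, 2, 0, 1, 2, 3, 1, 2, 0, 0, 0, 1]
```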
In both cases the resulting sequence is a Markov chain on the state space $\{0,1,2,3,4\}$, and one asks for the mean hitting time of state $4$ starting from state $0$. The transition probabilities of the valid transitions are all $\frac12$. The transitions valid in both models are $0\to0$, $0\to1$, $1\to2$, $2\to0$, $2\to3$ and $3\to4$. The transitions valid in model $U$ only are $1\to0$ and $3\to0$. The transitions valid in model $V$ only are $1\to1$ and $3\to1$.
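One can check the resulting mean hitting times by solving the first-step equations $h(n)=1+\frac12 h(n_H)+\frac12 h(n_T)$ with $h(4)=0$. A sketch (the dictionaries encode the transitions listed above as (heads target, tails target); all names are mine), which recovers the classical values $30$ for $HHHH$ and $20$ for $HTHT$:

```python
from fractions import Fraction

# Transitions (heads_target, tails_target) for states 0..3; state 4 is absorbing.
U_next = {0: (1, 0), 1: (2, 0), 2: (3, 0), 3: (4, 0)}   # word HHHH
V_next = {0: (1, 0), 1: (1, 2), 2: (3, 0), 3: (1, 4)}   # word HTHT

def mean_hitting_time(next_states):
    """Solve h(n) = 1 + (h(a) + h(b))/2, h(4) = 0, by Gauss-Jordan elimination."""
    n = 4
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]  # augmented matrix
    for s, (a, b) in next_states.items():
        A[s][s] += 1
        A[s][n] = Fraction(1)
        for t in (a, b):
            if t < n:
                A[s][t] -= Fraction(1, 2)
    for i in range(n):
        piv = A[i][i]
        A[i] = [x / piv for x in A[i]]
        for j in range(n):
            if j != i and A[j][i]:
                A[j] = [x - A[j][i] * y for x, y in zip(A[j], A[i])]
    return A[0][n]  # expected time to reach state 4 from state 0

print(mean_hitting_time(U_next))  # 30
print(mean_hitting_time(V_next))  # 20
```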
One can couple the two Markov chains in such a way that they both perform the steps $n\to n+1$ simultaneously. Then the position of the $V$ Markov chain is at least as high as the position of the $U$ Markov chain at the same time: indeed, on each simultaneous down step the $U$ chain falls to $0$ while the $V$ chain falls to $0$ or $1$, so the ordering is preserved. Hence the $V$ chain reaches $4$ sooner, in the mean.
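A quick simulation sketch of this coupling (all names are mine): a single fair coin decides, at each step, whether both chains move up or both move down; the marginals are correct because every up-move has probability $\frac12$ from every state.

```python
import random

# Down-moves read off the transition lists above; every up-move is n -> n+1.
U_down = {0: 0, 1: 0, 2: 0, 3: 0}   # HHHH: any mismatch sends the chain to 0
V_down = {0: 0, 1: 1, 2: 0, 3: 1}   # HTHT: mismatches from 1 and 3 land at 1

def coupled_hitting_times(rng):
    """Drive both chains with a single coin deciding up/down for both.
    State 4 is absorbing; returns the hitting times (t_u, t_v) of state 4."""
    u = v = t = 0
    t_u = t_v = None
    while t_u is None:
        t += 1
        up = rng.random() < 0.5
        if v < 4:
            v = v + 1 if up else V_down[v]
            if v == 4:
                t_v = t
        if u < 4:
            u = u + 1 if up else U_down[u]
            if u == 4:
                t_u = t
        assert v >= u   # the V chain never falls below the U chain
    return t_u, t_v

rng = random.Random(1)
times = [coupled_hitting_times(rng) for _ in range(20000)]
assert all(t_v <= t_u for t_u, t_v in times)      # pathwise ordering of hitting times
print(sum(t_u for t_u, _ in times) / len(times))  # close to 30
print(sum(t_v for _, t_v in times) / len(times))  # close to 20
```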
Caveat: this coupling does not mean that the $V$ sequence of a given word is always at least as high as the $U$ sequence, only that one can find two processes $(U_n)_n$ and $(V_n)_n$ such that $(U_n)_n$ is distributed like the sequence of the lengths of the prefixes of $U$, $(V_n)_n$ is distributed like the sequence of the lengths of the prefixes of $V$, and $V_n\geqslant U_n$ almost surely, for every $n$.
A consequence of this simultaneous construction of the hitting times $T_U$ and $T_V$ is that $T_V\leqslant T_U$ almost surely, hence, for every $t$, $\mathrm P(T_V\geqslant t)\leqslant\mathrm P(T_U\geqslant t)$. Therefore $\mathrm E(T_U^k)\geqslant\mathrm E(T_V^k)$ for every nonnegative $k$ and, more generally, $\mathrm E(a(T_U))\geqslant\mathrm E(a(T_V))$, for every nondecreasing function $a$.