
In order to learn HMMs thoroughly, I am implementing (in Matlab) the various algorithms for the basic HMM problems. I've implemented the Viterbi, posterior-decoding, and forward-backward algorithms successfully, but I have one question regarding the Baum-Welch algorithm for the estimation of the HMM parameters.

In the classic paper by Rabiner, the re-estimation of the transition probabilities matrix is given in equation (95), in terms of the scaled forward ($\hat\alpha$) and backward ($\hat\beta$) variables. The numerator is $ \sum_{t=1}^{T-1}\hat\alpha_t(i)a_{ij}b_j(O_{t+1})\hat\beta_{t+1}(j) $ where $a$ is the transition matrix, $b$ the observation matrix, and $O$ the observation sequence.

However, in this HMM project guide, section 4.4 (also by Rabiner), as well as in the implementation in Matlab's function hmmtrain.m (from the Statistics Toolbox), there is an extra factor of $1/c_{t+1}$ in the numerator, where $c_t$ is the scaling factor of time step $t$. I followed the algebra of the definition of the re-estimation of $a$, and I still fail to understand where this factor is coming from. Any help is appreciated.
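For concreteness, here is a minimal sketch of the scaled forward-backward pass in Rabiner's convention (written in Python/NumPy rather than Matlab; the HMM numbers are made up for illustration, and $c_t$ is chosen so that $\hat\alpha_t$ sums to one):

```python
import numpy as np

# Toy HMM, purely illustrative numbers (not from the paper):
a  = np.array([[0.7, 0.3],
               [0.4, 0.6]])      # transition matrix a_ij
b  = np.array([[0.9, 0.1],
               [0.2, 0.8]])      # emission matrix, b[i, k] = P(O_t = k | state i)
pi = np.array([0.6, 0.4])        # initial state distribution
O  = [0, 1, 0, 0]                # observation sequence
T, N = len(O), a.shape[0]

# Scaled forward pass: c_t = 1 / sum_i alpha_t(i), so alpha_hat_t sums to 1.
alpha_hat = np.zeros((T, N))
c = np.zeros(T)
alpha_hat[0] = pi * b[:, O[0]]
c[0] = 1.0 / alpha_hat[0].sum()
alpha_hat[0] *= c[0]
for t in range(1, T):
    alpha_hat[t] = (alpha_hat[t - 1] @ a) * b[:, O[t]]
    c[t] = 1.0 / alpha_hat[t].sum()
    alpha_hat[t] *= c[t]

# Scaled backward pass with the SAME c_t (the convention of eq. (97) in the paper):
beta_hat = np.zeros((T, N))
beta_hat[T - 1] = c[T - 1]
for t in range(T - 2, -1, -1):
    beta_hat[t] = c[t] * (a @ (b[:, O[t + 1]] * beta_hat[t + 1]))

# Sanity check: log P(O | model) = -sum_t log c_t
log_likelihood = -np.sum(np.log(c))
```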

  • @Didier Piau: No, since there is summation over $t$, and the factor depends on $t$, so it is not simply cancelled out. (2012-01-05)

1 Answer


The answer is that in Rabiner's paper $\hat\beta_t$ is scaled with $c_t$, while in the implementation I was looking at it is scaled with $c_{t+1}$. This introduces a "shift" in the scaling time index, so a factor of $c_{t+1}$ must be introduced in order to get the $C_tD_{t+1}$ term (where $C_t=\prod_{s=1}^tc_s$ and $D_t=\prod_{s=t}^Tc_s$), which can be cancelled from both the numerator and the denominator, since it does not depend on $t$.

With this scaling convention, equation (97) in Rabiner's paper becomes $ \hat\beta_{t+1}(j)=\left[\prod_{s=t+2}^Tc_s\right]\beta_{t+1}(j)=D_{t+2}\beta_{t+1}(j)=D_{t+1}\beta_{t+1}(j)/c_{t+1}. $
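The shift can also be checked numerically. The sketch below (Python/NumPy rather than Matlab, with made-up HMM numbers) computes $\hat\beta$ under both conventions and shows that the two re-estimation numerators agree once the shifted convention gets the extra $c_{t+1}$ factor:

```python
import numpy as np

# Toy HMM with made-up numbers, just to check the scaling algebra numerically.
a  = np.array([[0.7, 0.3],
               [0.4, 0.6]])      # transition matrix
b  = np.array([[0.9, 0.1],
               [0.2, 0.8]])      # emission matrix
pi = np.array([0.6, 0.4])        # initial distribution
O  = [0, 1, 0, 0]                # observation sequence
T, N = len(O), a.shape[0]

# Scaled forward pass, c_t = 1 / sum_i alpha_t(i)
alpha_hat = np.zeros((T, N))
c = np.zeros(T)
alpha_hat[0] = pi * b[:, O[0]]
c[0] = 1.0 / alpha_hat[0].sum()
alpha_hat[0] *= c[0]
for t in range(1, T):
    alpha_hat[t] = (alpha_hat[t - 1] @ a) * b[:, O[t]]
    c[t] = 1.0 / alpha_hat[t].sum()
    alpha_hat[t] *= c[t]

# Backward pass, paper convention: beta_hat_t carries c_t (so beta_hat_t = D_t beta_t)
beta_paper = np.zeros((T, N))
beta_paper[T - 1] = c[T - 1]
for t in range(T - 2, -1, -1):
    beta_paper[t] = c[t] * (a @ (b[:, O[t + 1]] * beta_paper[t + 1]))

# Backward pass, shifted convention: beta_hat_t carries c_{t+1} (beta_hat_t = D_{t+1} beta_t)
beta_impl = np.zeros((T, N))
beta_impl[T - 1] = 1.0
for t in range(T - 2, -1, -1):
    beta_impl[t] = c[t + 1] * (a @ (b[:, O[t + 1]] * beta_impl[t + 1]))

# Numerator of the a_ij re-estimate computed both ways; the shifted
# convention needs the extra c_{t+1} factor to match equation (95).
num_paper = np.zeros((N, N))
num_impl  = np.zeros((N, N))
for t in range(T - 1):
    outer = alpha_hat[t][:, None] * a * b[:, O[t + 1]][None, :]
    num_paper += outer * beta_paper[t + 1][None, :]
    num_impl  += outer * beta_impl[t + 1][None, :] * c[t + 1]
```

Both sums carry the same overall scale $\prod_{s=1}^T c_s$ in every term, which is why the constant cancels between numerator and denominator.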