To learn HMMs thoroughly, I am implementing (in Matlab) the algorithms that solve the basic HMM problems. I've implemented the Viterbi, posterior-decoding, and forward-backward algorithms successfully, but I have one question about the Baum-Welch algorithm for estimating the HMM parameters.
In the classic paper by Rabiner, the re-estimation of the transition probabilities matrix is given in equation (95), in terms of the scaled forward ($\hat\alpha$) and backward ($\hat\beta$) variables. The numerator is $ \sum_{t=1}^{T-1}\hat\alpha_t(i)a_{ij}b_j(O_{t+1})\hat\beta_{t+1}(j) $ where $a$ is the transition matrix, $b$ the observation matrix, and $O$ the observation sequence.
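To show exactly which quantities I mean, here is a minimal sketch of the scaled forward-backward recursions as I understand them from the paper, written in Python/NumPy rather than Matlab for brevity. The 2-state, 2-symbol model (`A`, `B`, `pi`, `O`) is made up purely for illustration, and the scaling convention (normalize $\hat\alpha_t$ to sum to one, reuse the same $c_t$ for $\hat\beta_t$) is my reading of the paper:

```python
import numpy as np

# Toy 2-state, 2-symbol HMM (arbitrary numbers, for illustration only).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])      # transition matrix a_ij
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # observation matrix b_j(k)
pi = np.array([0.6, 0.4])       # initial state distribution
O = np.array([0, 1, 1, 0, 1])   # observation sequence
T, N = len(O), A.shape[0]

# Scaled forward pass: each hat_alpha_t sums to 1, c_t is the normalizer.
alpha_hat = np.zeros((T, N))
c = np.zeros(T)
a = pi * B[:, O[0]]
c[0] = 1.0 / a.sum()
alpha_hat[0] = c[0] * a
for t in range(1, T):
    a = (alpha_hat[t - 1] @ A) * B[:, O[t]]
    c[t] = 1.0 / a.sum()
    alpha_hat[t] = c[t] * a

# Scaled backward pass, reusing the same scale factors c_t.
beta_hat = np.zeros((T, N))
beta_hat[T - 1] = c[T - 1]
for t in range(T - 2, -1, -1):
    beta_hat[t] = c[t] * (A @ (B[:, O[t + 1]] * beta_hat[t + 1]))

# Per-t term of the eq. (95) numerator:
# hat_alpha_t(i) * a_ij * b_j(O_{t+1}) * hat_beta_{t+1}(j)
num = np.array([np.outer(alpha_hat[t], B[:, O[t + 1]] * beta_hat[t + 1]) * A
                for t in range(T - 1)])

# With this scaling convention, the term at each t already sums to 1
# over (i, j), i.e. it is exactly the posterior xi_t(i, j).
print(num.sum(axis=(1, 2)))
```

With this convention the products of scale factors cancel against $P(O\mid\lambda)$, which is why I expected no extra factor in the numerator.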
However, in section 4.4 of this HMM project guide (also by Rabiner), as well as in the implementation in Matlab's function hmmtrain.m (from the Statistics Toolbox), there is an extra factor of $1/c_{t+1}$ in the numerator, where $c_t$ is the scaling factor at time step $t$. I have followed the algebra in the derivation of the re-estimation formula for $a$, and I still fail to understand where this factor comes from. Any help is appreciated.
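For reference, here are the two numerators side by side, with the same symbols as above:

$$\text{paper, eq. (95):}\quad \sum_{t=1}^{T-1}\hat\alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\hat\beta_{t+1}(j)$$

$$\text{guide / hmmtrain.m:}\quad \sum_{t=1}^{T-1}\frac{1}{c_{t+1}}\,\hat\alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\hat\beta_{t+1}(j)$$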