
I am trying to derive the conditional distribution of the visible variables, $\rho(v_i^k| h_{1:F})$, for the Replicated Softmax Model (RSM) or equivalently, the Restricted Boltzmann Machine (RBM) for word counts, according to the paper: "Replicated Softmax: an Undirected Topic Model" by Salakhutdinov and Hinton.

Paper can be found at: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=B04C8D67D381B8106FF6FA4203A86264?doi=10.1.1.164.71&rep=rep1&type=pdf

However, despite my best efforts, I have been unable to see how the conditional turns out to be a softmax distribution:
$\rho(v_i^k| h_{1:F}) = \frac{\exp(b_i^k + \sum_{j=1}^F h_j W_{i,j}^k)}{\sum_{q=1}^K \exp(b_i^q + \sum_{j=1}^F h_j W_{i,j}^q)}$

Also, I am unsure whether $W_{i,j}^k$ is a 3D array and $b_i^k$ a 2D matrix, or whether they are instead a 2D matrix and a vector, respectively. I believe it is the latter. I am hoping someone can demonstrate the derivation.

1 Answer

To demonstrate the derivation, we start from the full conditional distribution: \begin{equation}\begin{split} \rho(\mathbf{v}|\mathbf{h}) & = \frac{\rho(\mathbf{v},\mathbf{h})}{\sum_{\mathbf{v}}\rho(\mathbf{v},\mathbf{h})} \\ & = \frac{\frac{1}{Z}\exp(\sum_jh_ja_j)\exp\big(\sum_i\sum_kv_i^k(b_i^k + \sum_jh_jW_{i,j}^k)\big)} {\frac{1}{Z}\exp(\sum_jh_ja_j)\sum_{\mathbf{v}}\exp\big(\sum_i\sum_kv_i^k(b_i^k + \sum_jh_jW_{i,j}^k)\big)} \end{split}\end{equation}
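The factor $\frac{1}{Z}\exp(\sum_jh_ja_j)$ does not depend on $\mathbf{v}$, so it cancels. From here on, the subscripts $i$ are dropped from the model parameters to reduce clutter. This is valid because the parameters are shared across positions: $W_{i,j}^k = W_j^k$ and $b_i^k = b^k$ for every $i$, so $W$ is a 2D matrix and $b$ is a vector. The exponent is a sum over $i$ of terms that each depend only on $v_i$, so both the numerator and the sum over $\mathbf{v}$ in the denominator factorize over $i$: \begin{equation}\begin{split} \rho(\mathbf{v}|\mathbf{h}) & = \frac{\exp\big(\sum_i\sum_kv_i^k(b^k + \sum_jh_jW_{j}^k)\big)} {\sum_{\mathbf{v}}\exp\big(\sum_i\sum_kv_i^k(b^k + \sum_jh_jW_{j}^k)\big)} \\ & = \frac{\prod_{i=1}^D\prod_{k=1}^K\exp\big(v_i^k(b^k + \sum_jh_jW_{j}^k)\big)} {\prod_{i=1}^D\sum_{v_i}\prod_{k=1}^K\exp\big(v_i^k(b^k + \sum_jh_jW_{j}^k)\big)} \\ & = \prod\limits_{i=1}^D \frac{\prod_{k=1}^K\exp\big(v_i^k(b^k + \sum_jh_jW_{j}^k)\big)} {\sum_{v_i}\prod_{k=1}^K\exp\big(v_i^k(b^k + \sum_jh_jW_{j}^k)\big)} \end{split}\end{equation}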

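This is the part that confused me. For each $i$, $v_i = (v_i^1, \dots, v_i^K)$ is a categorical r.v. (equivalently, a multinomial r.v. with a single trial), i.e. a one-hot vector with exactly one entry $v_i^k = 1$. The sum over $v_i$ in the denominator therefore runs over the $K$ possible one-hot vectors, and each one contributes a single exponential: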
\begin{equation}\begin{split} \sum_{v_i}\prod_{k=1}^K\exp\big(v_i^k(b^k + \sum_jh_jW_{j}^k)\big) &= \exp(b^1 + \sum_jh_jW_{j}^1) + \exp(b^2 + \sum_jh_jW_{j}^2)+ \\ & \ \ \ \ \dots + \exp(b^K + \sum_jh_jW_{j}^K) \\ & = \sum_{q=1}^K\exp\big(b^q + \sum_jh_jW_{j}^q\big) \end{split}\end{equation} Hence, $\rho(\mathbf{v}|\mathbf{h})$ is: \begin{equation}\begin{split} \rho(\mathbf{v}|\mathbf{h}) & = \prod\limits_{i=1}^D \frac{\prod_{k=1}^K\exp\big(v_i^k(b^k + \sum_jh_jW_{j}^k)\big)} {\sum_{q=1}^K\exp\big(b^q + \sum_jh_jW_{j}^q\big)} \end{split}\end{equation} Lastly, each factor of this product is the distribution of a single $v_i$. Setting $v_i^k = 1$ (and hence all other entries of $v_i$ to $0$) gives, for each $i$ and $k$: \begin{equation}\begin{split} \rho(v_i^k = 1|\mathbf{h}) & = \frac{\exp\big(b^k + \sum_jh_jW_{j}^k\big)} {\sum_{q=1}^K\exp\big(b^q + \sum_jh_jW_{j}^q\big)} \end{split}\end{equation} which is a softmax over the $K$ possible values.
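
The factorization can also be checked numerically. The following is only a minimal sketch under assumed toy sizes ($D=3$, $K=4$, $F=2$) and random parameters; the names `softmax_conditional` and `brute_force_conditional` are made up for this illustration and do not come from the paper. It enumerates all $K^D$ one-hot assignments, normalizes the unnormalized probabilities by brute force, and compares the result with the product of per-position softmaxes. The hidden-bias term $\exp(\sum_j h_j a_j)$ is omitted because it cancels in the conditional.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
D, K, F = 3, 4, 2                 # assumed toy sizes: positions, vocabulary, hidden units
W = rng.normal(size=(K, F))       # W[k, j] = W_j^k, shared across positions i
b = rng.normal(size=K)            # b[k] = b^k, shared across positions i
h = rng.integers(0, 2, size=F).astype(float)  # one binary hidden configuration

def softmax_conditional(W, b, h):
    """rho(v_i^k = 1 | h); identical for every position i."""
    logits = b + W @ h
    logits = logits - logits.max()    # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def brute_force_conditional(W, b, h, D):
    """rho(v | h) by enumerating all K**D assignments of one word index per position."""
    K = len(b)
    logits = b + W @ h
    configs = list(itertools.product(range(K), repeat=D))
    # With one-hot v_i, sum_i sum_k v_i^k * logits[k] is just the sum of the chosen logits.
    unnorm = np.array([np.exp(sum(logits[k] for k in cfg)) for cfg in configs])
    return configs, unnorm / unnorm.sum()

p_word = softmax_conditional(W, b, h)
configs, p_joint = brute_force_conditional(W, b, h, D)

# The joint conditional should equal the product of per-position softmax probabilities.
p_fact = np.array([np.prod(p_word[list(cfg)]) for cfg in configs])
assert np.allclose(p_joint, p_fact)

# Marginalizing the brute-force joint over all positions but the first recovers the softmax.
marg0 = np.zeros(K)
for cfg, p in zip(configs, p_joint):
    marg0[cfg[0]] += p
assert np.allclose(marg0, p_word)
```

Both assertions passing for a given random draw is consistent with the derivation above, though it is a sanity check rather than a proof.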