
I have a small problem with a step in this video, https://youtu.be/2pEkWk-LHmU?t=12m59s, which proves that minimizing the KL divergence is the same as maximizing the ELBO.

The equation goes:

$$KL(q(z) \,\|\, p(z \mid x)) = \mathbb{E}_q[\log q(z)] - \mathbb{E}_q[\log p(z \mid x)]$$

I know that $p(z \mid x) = p(z,x)/p(x)$, so the latter half should expand to

$$
\begin{aligned}
\mathbb{E}_q\left[\log \frac{p(z,x)}{p(x)}\right] &= \mathbb{E}_q[\log p(z,x) - \log p(x)] \\
&= \mathbb{E}_q[\log p(z,x)] - \mathbb{E}_q[\log p(x)]
\end{aligned}
$$

But in the video, the latter term is shown as $\log p(x)$, with the expectation dropped. Why can't we drop the expectation on the first term, but we can on the second?


As he states in the video,

"The expectation of $q$ goes away on this last term here because there's no $q$ here."

The expectation $\mathbb{E}_q$ is with respect to the randomness in $z$, which follows the distribution $q$. Since $\log p(x)$ does not involve $z$, it is deterministic (a constant), so the expectation can be dropped.
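Written out as an integral over $z$ (a sketch in the question's notation, assuming $q$ is a density so that $\int q(z)\,dz = 1$), the step is:

$$
\mathbb{E}_q[\log p(x)] = \int q(z)\,\log p(x)\,dz = \log p(x) \int q(z)\,dz = \log p(x),
$$

so the KL from the question becomes

$$
KL(q(z) \,\|\, p(z \mid x)) = \mathbb{E}_q[\log q(z)] - \mathbb{E}_q[\log p(z,x)] + \log p(x).
$$

With the usual definition $\text{ELBO} = \mathbb{E}_q[\log p(z,x)] - \mathbb{E}_q[\log q(z)]$, this reads $KL = \log p(x) - \text{ELBO}$. The first term $\mathbb{E}_q[\log p(z,x)]$ keeps its expectation because $\log p(z,x)$ still depends on $z$; only the $z$-free term collapses. Since $\log p(x)$ does not depend on $q$ either, minimizing the KL over $q$ is the same as maximizing the ELBO.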


Edit for clarification: throughout the derivation, $x$ is constant. However, $z$ is a random variable. In the derivation above, the density of $z$ is $q$. [Note that it is important to clarify this because $z$ can follow other distributions. For example, in the latent variable model the distribution of $z$ is $p$, not $q$.]

So, $\mathbb{E}_q[z]$ is just the expectation of $z$ when it follows the distribution $q$. More generally, for any function $f$, $\mathbb{E}_q[f(z)]$ is the expectation of $f(z)$ when $z$ follows the distribution $q$. For example, you begin with $\mathbb{E}_q[\log p(z \mid x)]$ which is a special case where $f(z):= \log p(z \mid x)$.

Now, if $c$ is some constant (deterministic, does not depend on the random variable $z$), then $\mathbb{E}_q[c]=c$. This is the case here with $c=\log p(x)$; since $x$ is constant, $\log p(x)$ is a constant.
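A minimal numerical check of this, in Python (stdlib only). The toy model is an assumption for illustration and is not the model from the video: prior $z \sim N(0,1)$ and likelihood $x \mid z \sim N(z,1)$, so $p(x) = N(x; 0, 2)$ and $p(z \mid x) = N(x/2, 1/2)$ are known exactly. The observed value $x = 1.5$ and the choice $q = N(0.3, 0.8)$ are made up, and the helper names (`log_normal`, `kl_normal`, `check`) are hypothetical. For every sample $z \sim q$, $\log p(z,x) - \log p(z \mid x)$ comes out as the same number $\log p(x)$, so averaging it over $q$ changes nothing. The Monte Carlo ELBO should also satisfy $\log p(x) - \text{ELBO} \approx KL$.

```python
import math
import random

# Hypothetical toy model (an assumption, not the model from the video):
#   prior       z ~ N(0, 1)
#   likelihood  x | z ~ N(z, 1)
# so the evidence is p(x) = N(x; 0, 2) and the exact posterior is p(z | x) = N(x/2, 1/2).

def log_normal(v, mean, var):
    """Log density of N(mean, var) evaluated at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def kl_normal(m1, v1, m2, v2):
    """Closed-form KL( N(m1, v1) || N(m2, v2) )."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def check(x, q_mean, q_var, n_samples=100_000, seed=0):
    rng = random.Random(seed)
    log_px = log_normal(x, 0.0, 2.0)        # log p(x): contains no z at all
    post_mean, post_var = x / 2.0, 0.5      # exact posterior p(z | x)
    elbo_sum = 0.0
    max_gap = 0.0
    for _ in range(n_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))                      # z ~ q
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)  # log p(z, x)
        log_post = log_normal(z, post_mean, post_var)                # log p(z | x)
        # log p(z, x) - log p(z | x) equals log p(x) for every z,
        # so averaging it over z ~ q just returns log p(x).
        max_gap = max(max_gap, abs(log_joint - log_post - log_px))
        elbo_sum += log_joint - log_normal(z, q_mean, q_var)         # log p(z, x) - log q(z)
    elbo = elbo_sum / n_samples                                      # Monte Carlo ELBO
    kl = kl_normal(q_mean, q_var, post_mean, post_var)               # exact KL(q || p(z | x))
    return log_px, elbo, kl, max_gap

log_px, elbo, kl, max_gap = check(x=1.5, q_mean=0.3, q_var=0.8)
print(f"log p(x)        = {log_px:.4f}")
print(f"ELBO (MC)       = {elbo:.4f}")
print(f"log p(x) - ELBO = {log_px - elbo:.4f}")
print(f"KL (exact)      = {kl:.4f}")
print(f"max |log p(z,x) - log p(z|x) - log p(x)| = {max_gap:.1e}")
```

The last line should sit at floating-point rounding level, whereas $\log p(x) - \text{ELBO}$ only matches the exact KL up to Monte Carlo error, because the ELBO term still genuinely depends on $z$.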

  • how come a distribution has some respect to randomness of a variable? (sorry...) (2017-01-03)
  • @AllenNie See my edit (2017-01-03)
  • Thank you for the clarification! so basically $E_q[z] = \int_z q*z$? (2017-01-03)
  • @AllenNie I am not familiar with your notation, but I think that is right. I would write it as $\mathbb{E}_q[z] = \int z \cdot q(z) \mathop{dz}$, where the integral ranges over all possible values $z$ can take. (2017-01-03)
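To make the formula $\mathbb{E}_q[f(z)] = \int f(z)\,q(z)\,dz$ concrete, here is a small Python sketch (stdlib only) that computes it with a midpoint Riemann sum and compares the result against a plain sample average over draws from $q$. The choice $q = N(0.3, 0.8)$ and the constant $c = -2$ (standing in for $\log p(x)$) are assumptions for illustration, and the function names are hypothetical.

```python
import math
import random

def normal_pdf(z, mean, var):
    """Density of N(mean, var) at z."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def expect_by_integral(f, mean, var, half_width=12.0, n_steps=20_000):
    """Approximate E_q[f(z)] = integral of f(z) q(z) dz for q = N(mean, var)
    with a midpoint Riemann sum over mean +/- half_width standard deviations."""
    sd = math.sqrt(var)
    lo = mean - half_width * sd
    dz = 2 * half_width * sd / n_steps
    total = 0.0
    for i in range(n_steps):
        z = lo + (i + 0.5) * dz
        total += f(z) * normal_pdf(z, mean, var) * dz
    return total

def expect_by_sampling(f, mean, var, n_samples=100_000, seed=1):
    """Approximate E_q[f(z)] by averaging f over draws z ~ N(mean, var)."""
    rng = random.Random(seed)
    sd = math.sqrt(var)
    return sum(f(rng.gauss(mean, sd)) for _ in range(n_samples)) / n_samples

q_mean, q_var = 0.3, 0.8   # assumed q = N(0.3, 0.8)
c = -2.0                   # hypothetical constant playing the role of log p(x)

ez_int = expect_by_integral(lambda z: z, q_mean, q_var)   # E_q[z] as an integral
ez_mc = expect_by_sampling(lambda z: z, q_mean, q_var)    # E_q[z] as a sample mean
ec_int = expect_by_integral(lambda z: c, q_mean, q_var)   # E_q[c] = c * (integral of q) = c

print(f"E_q[z]: integral = {ez_int:.6f}, sample mean = {ez_mc:.4f}")
print(f"E_q[c]: integral = {ec_int:.6f}, c = {c}")
```

For $f(z) = c$ the integral reduces to $c \int q(z)\,dz = c$, which is the $\mathbb{E}_q[c] = c$ step used in the answer.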