[Image: the derivation from the post; not reproduced here]

According to the post, the second term disappears because the expected gradient is not affected by the baseline $b$. But since the expectation is not taken with respect to $b$, I don't see why that would make the second term vanish. Instead, I think it is because the expectation is taken over the same distribution twice: the first expectation yields a number, so the second one takes it to 0.

Is my reasoning correct?

  • I'm not familiar with the subject, so I'm going to need a bit more information. Are we taking the expectation with respect to the distribution given by $p_{\theta}(\tau)$? Is that a distribution over $\tau$, or a distribution over $X$ parameterised by $\theta(\tau)$? And is $\pi$ another density over $x$? (2017-02-25)

1 Answer

I think you misread the statement. If the gradient is not affected by the baseline, then its derivative with respect to $b$ should be zero.

But it is difficult to say from this snippet, so treat my answer with care.
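For what it is worth, here is a sketch of the usual argument. It assumes the second term has the standard policy-gradient form $\mathbb{E}_{\tau \sim p_{\theta}(\tau)}\left[\nabla_{\theta} \log p_{\theta}(\tau)\, b\right]$ and that $b$ is a constant that does not depend on $\tau$. Both are assumptions on my part, since the equation itself is not visible here:

$$
\begin{aligned}
\mathbb{E}_{\tau \sim p_{\theta}(\tau)}\left[\nabla_{\theta} \log p_{\theta}(\tau)\, b\right]
&= \int p_{\theta}(\tau)\, \nabla_{\theta} \log p_{\theta}(\tau)\, b \, d\tau \\
&= b \int \nabla_{\theta}\, p_{\theta}(\tau)\, d\tau \\
&= b\, \nabla_{\theta} \int p_{\theta}(\tau)\, d\tau \\
&= b\, \nabla_{\theta} 1 = 0 .
\end{aligned}
$$

The second line uses $p_{\theta}(\tau)\, \nabla_{\theta} \log p_{\theta}(\tau) = \nabla_{\theta}\, p_{\theta}(\tau)$, and the third assumes the gradient and the integral can be interchanged. Under these assumptions the term vanishes because $p_{\theta}(\tau)$ integrates to 1 for every $\theta$, not because an expectation is applied twice.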

Anyway, taking the expectation twice yields the same result as taking it only once, since the expected value no longer depends on the variable over which the expectation was taken. Example: the expected value of a fair 6-sided die is 3.5, and taking the expectation of the number 3.5 yields 3.5 again.
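The die example can be checked with a few lines of Python. This is only a minimal sketch: the helper `expectation` is a name made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expectation(values, probs):
    """Expected value of a discrete random variable (hypothetical helper)."""
    return sum(v * p for v, p in zip(values, probs))

faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6  # fair die: every face equally likely

# First expectation: E[X] = 7/2, i.e. 3.5
mean = expectation(faces, probs)

# Second expectation: the constant E[X] under the same distribution.
# Every outcome now maps to the same number, so the result is unchanged.
mean_again = expectation([mean] * len(faces), probs)

print(mean, mean_again)
```

Both values come out as 7/2, so applying the expectation a second time returns the same constant rather than 0.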