
I am trying to take the derivative of the following expression with respect to $w$, because it is needed in an optimization problem, and I want to make sure I am on the right track. The expression is:

$$ -3 \mathbb E[(w^Tz)^2]^2 $$

So my question is, what is:

$$ \frac{\partial (-3 \mathbb E[(w^Tz)^2]^2)}{\partial w} = ? $$

Please assume that $w$ and $z$ are both 2-dimensional column vectors, where $z$ is a zero-mean, unit-variance (joint) random vector and $w$ is a deterministic vector.

I would like a breakdown of the steps for evaluating the derivative here. I half suspect the chain rule is involved, but I am getting thrown off by the presence of the expectation operator.

Thanks!

1 Answer

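Notice that $(w^T z)^2 = w^T z z^T w = w^T (z z^T) w$. Since $w$ is deterministic, linearity of expectation gives $f(w) = E_z[(w^T z)^2] = w^T E_z (z z^T) w$. As a reminder, for a constant matrix $A$, $\frac{\partial (w^TAw)}{\partial w} = w^T(A+A^T)$ when the derivative is written as a row vector (its transpose, $(A+A^T)w$, is the gradient), and this becomes $2w^TA$ when $A$ is symmetric. Since $E_z (z z^T)$ is symmetric, the derivative is $\frac{\partial f(w)}{\partial w} = 2 w^T E_z (z z^T)$.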

You want the derivative of $\phi(w) = -3 f(w)^2$. By the chain rule, $\frac{\partial \phi(w)}{\partial w} = -6 f(w) \frac{\partial f(w)}{\partial w} = -12 \, (w^T E_z (z z^T) w) \, w^T E_z (z z^T)$.

Now $E_z(zz^T) = I$, because $z$ is zero mean with covariance $I$ (this is how I read "unit variance (joint)" in the question). Hence, the final answer is $-12 \, (w^T w) \, w^T = -12\|w\|_2^2 w^T$, where $\|w\|_2$ is the Euclidean ($\ell_2$) norm of $w$.
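If it helps to sanity-check the result numerically, here is a minimal NumPy sketch. The helper names (`phi`, `grad_phi`, `numerical_grad`) and the test point `w` are made up for illustration, and the gradient is written in column form (the transpose of the row vector above). It compares the closed form against central finite differences, and against a Monte Carlo estimate that assumes $z$ is standard Gaussian, which is only one distribution satisfying the zero-mean, identity-covariance assumption.

```python
import numpy as np

def phi(w, second_moment):
    """phi(w) = -3 * (w^T M w)^2, where M stands in for E[z z^T]."""
    return -3.0 * (w @ second_moment @ w) ** 2

def grad_phi(w, second_moment):
    """Analytic gradient in column form: -12 * (w^T M w) * M w (M symmetric)."""
    return -12.0 * (w @ second_moment @ w) * (second_moment @ w)

def numerical_grad(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (f(w + step) - f(w - step)) / (2.0 * eps)
    return g

w = np.array([0.7, -1.3])  # arbitrary test point (hypothetical)

# Case 1: exact second moment E[z z^T] = I.
M = np.eye(2)
closed_form = -12.0 * (w @ w) * w            # -12 ||w||^2 w
analytic = grad_phi(w, M)
numeric = numerical_grad(lambda v: phi(v, M), w)

# Case 2: replace E[z z^T] by a sample average (assumes Gaussian z).
rng = np.random.default_rng(0)
Z = rng.standard_normal((200_000, 2))        # rows are samples of z
M_hat = Z.T @ Z / Z.shape[0]                 # sample estimate of E[z z^T]
mc_grad = grad_phi(w, M_hat)

print("closed form       :", closed_form)
print("finite differences:", numeric)
print("Monte Carlo       :", mc_grad)
```

The finite-difference gradient should agree with the closed form up to discretization error, while the Monte Carlo version only agrees up to sampling error in `M_hat`.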

  • 0
    +1. And $\phi(w)$ is simply $\phi(w)=-3(w^Tw)^2=-3\|w\|_2^4$. 2012-08-11
  • 0
    Thanks @Kartik Audhkhasi for the additional contribution. I missed the unit variance statement in the question. 2012-08-11
  • 0
    Thank you copper.hat and @Kartik Audhkhasi for that; it is in fact the correct answer. I see that what you did here is 'open up' $f(w)$ a little before taking the derivative. What I am not clear about is how we generally treat the expectation operator in relation to derivatives. Would it have been wrong to move the $\frac{\partial }{\partial w}$ inside the expectation and go from there? When is it OK to do so, and when is it not? Thank you. 2012-08-11
  • 0
    For some more general conditions under which you can exchange differentiation and expectation, look at Qiaochu Yuan's answer here: http://math.stackexchange.com/questions/12909/will-moving-differentiation-from-inside-to-outside-an-integral-change-the-resu 2012-08-11
  • 0
    @copper.hat I took a look but did not quite get it. Let me phrase it another way. Why didn't you take the derivative operator inside the expectation in this example? Is it because the outside squaring makes it nonlinear? 2012-08-12
  • 0
    Not really; it was just simpler to eliminate the integral completely, which reduces the problem to a simpler one. If I let $\phi(w,z) = (w^T z)^2$, then the link above shows that $\frac{\partial E_z(\phi(w,z))}{\partial w} = E_z(\frac{\partial \phi(w,z)}{\partial w})$. The inside 'squaring' remains, and the outside 'squaring' is dealt with using the usual chain (or product) rule. Both squarings result in nonlinearities. 2012-08-12
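
For reference, here is a sketch of the alternative route discussed in these comments, where the derivative is moved inside the expectation. It assumes the interchange is justified, which holds here as long as $z$ has finite second moments, and it uses the same row-vector convention and the same assumption $E_z(zz^T) = I$ as the answer:

$$ \frac{\partial}{\partial w} E_z\left[(w^T z)^2\right] = E_z\left[\frac{\partial}{\partial w}(w^T z)^2\right] = E_z\left[2\,(w^T z)\, z^T\right] = 2\, w^T E_z(z z^T) = 2\, w^T. $$

Applying the chain rule to the outer square then gives

$$ \frac{\partial}{\partial w}\left(-3\, E_z\left[(w^T z)^2\right]^2\right) = -6\, E_z\left[(w^T z)^2\right] \cdot 2\, w^T = -12\, \|w\|_2^2\, w^T, $$

which matches the result obtained by eliminating the expectation first.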