
I want to compute the gradient of the following function with respect to $\beta$:

$$L(\beta) = \sum_{i=1}^n (y_i - \phi(x_i)^T \cdot \beta)^2$$

where $\beta$ and the $x_i$ are vectors and each $y_i$ is a scalar (otherwise the square in $L$ would not be defined). The map $\phi$ simply adds additional components, with the result that $\beta$ and $\phi(x_i)$ are both $\in \mathbb{R}^d$.
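To make the setup concrete, here is a small NumPy sketch of $L(\beta)$. The feature map $\phi$ is not specified in the question, so the one below (prepending a constant $1$, i.e. an intercept term) is only an illustrative assumption, as are the data values:

```python
import numpy as np

def phi(x):
    # Hypothetical feature map: prepend a constant 1 (intercept),
    # mapping R^{d-1} into R^d. The actual phi is unspecified.
    return np.concatenate(([1.0], x))

def L(beta, xs, ys):
    # L(beta) = sum_i (y_i - phi(x_i)^T beta)^2
    return sum((y - phi(x) @ beta) ** 2 for x, y in zip(xs, ys))

# Made-up data: x_i in R^2, so phi(x_i) and beta are in R^3
xs = [np.array([1.0, 2.0]), np.array([0.5, -1.0])]
ys = [3.0, 0.0]
beta = np.array([0.1, 0.2, 0.3])
print(L(beta, xs, ys))
```

Note that each summand $y_i - \phi(x_i)^T \beta$ is a scalar, which is why squaring it is well defined.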

Here is my approach so far:

\begin{align*}
\frac{\partial}{\partial \beta} L(\beta)
&= \sum_{i=1}^n \left( \frac{\partial}{\partial \beta} y_i - \frac{\partial}{\partial \beta}\bigl( \phi(x_i)^T \cdot \beta\bigr)\right)^2\\
&= \sum_{i=1}^n \left( 0 - \frac{\partial}{\partial \beta}\bigl( \phi(x_i)^T \cdot \beta\bigr)\right)^2\\
&= - \sum_{i=1}^n \bigl( \partial \phi(x_i)^T \cdot \beta + \phi(x_i)^T \cdot \partial \beta\bigr)^2\\
&= - \sum_{i=1}^n \bigl( 0 \cdot \beta + \phi(x_i)^T \cdot \mathbf{I}\bigr)^2\\
&= - \sum_{i=1}^n \bigl( \phi(x_i)^T \cdot \mathbf{I}\bigr)^2
\end{align*}

But what do I do with the power of two? Have I made any mistakes? It seems that $\phi(x_i)^T \cdot \mathbf{I} \in \mathbb{R}^{1 \times d}$.

Edit: applying the chain rule instead, I get

$$= - 2 \sum_{i=1}^n \phi(x_i)^T$$

  • Do you know about the [chain rule](http://en.wikipedia.org/wiki/Chain_rule)? – Chris Taylor, 2012-05-04
  • @ChrisTaylor Sure, I added something to the end of my question. – 2012-05-04

1 Answer


Vector differentiation can be tricky when you're not used to it. One way to get around that is to use summation notation until you're confident enough to perform the derivatives without it.

To begin with, let's define $X_i = \phi(x_i)$ since it will save some typing, and let $X_{mi}$ be the $m$th component of the vector $X_i$. Component indices like $m$, $n$, $k$ run from $1$ to $d$, while the data index $i$ runs from $1$ to $n$.

Using summation notation, with an implicit sum over every repeated index (including the data index $i$), you have

$$\begin{align} L(\beta) & = (y_i-X_{mi}\beta_m)(y_i - X_{ni}\beta_n) \\ & = y_i y_i - 2 y_i X_{mi} \beta_m + X_{mi}X_{ni}\beta_m\beta_n \end{align}$$

To take the derivative with respect to $\beta_k$, use $\partial \beta_m / \partial \beta_k = \delta_{km}$:

$$\begin{align} \frac{\partial}{\partial\beta_k} L(\beta) & = -2 y_i X_{mi} \delta_{km} + 2 X_{mi} X_{ni} \beta_n \delta_{km} \\ & = -2 X_{mi} \delta_{km} (y_i - X_{ni} \beta_n) \\ & = -2 X_{ki} (y_i - X_{ni} \beta_n) \end{align}$$

which you can then translate back into vector notation:

$$ \frac{\partial}{\partial\beta} L(\beta) = -2 \sum_{i=1}^n X_i (y_i - X_i^T\beta) $$
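As a sanity check on this final formula, here is a small NumPy sketch (the random data, and the identification of the rows of a matrix $X$ with the vectors $X_i^T = \phi(x_i)^T$, are made up for illustration) comparing the analytic gradient with central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))   # row i plays the role of phi(x_i)^T
y = rng.normal(size=n)
beta = rng.normal(size=d)

def L(b):
    # L(beta) = sum_i (y_i - X_i^T beta)^2
    return np.sum((y - X @ b) ** 2)

# Analytic gradient: -2 * sum_i X_i (y_i - X_i^T beta)
grad_analytic = -2 * X.T @ (y - X @ beta)

# Central finite differences as an independent check
eps = 1e-6
grad_fd = np.array([
    (L(beta + eps * e) - L(beta - eps * e)) / (2 * eps)
    for e in np.eye(d)
])

print(np.max(np.abs(grad_analytic - grad_fd)))  # should be tiny
```

The two gradients agree to within finite-difference error, which is good evidence that the sign and the factor of $2$ are right.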

Does that help?

  • Yes indeed, thanks for the explanation in an alternative notation; this will help me a lot in the future. – 2012-05-04
  • I have redone the derivation in my original notation and found it quite easy: just compute the derivative $\frac{\partial}{\partial \beta}(y - x^T \cdot \beta) = -x$ and apply the chain rule, which gives the same result as your final line. – 2012-05-04
  • Glad to hear it! – 2012-05-04