
I want to compute the gradient of the following function with respect to $\beta$:

$L(\beta) = \sum_{i=1}^n (y_i - \phi(x_i)^T \cdot \beta)^2$

Here $\beta$ and the $x_i$ are vectors, while each $y_i$ is a scalar (since $\phi(x_i)^T \cdot \beta$ is a scalar). The map $\phi$ simply adds extra coefficients to $x_i$, so $\beta$ and $\phi(x_i)$ both lie in $\mathbb{R}^d$.
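To make the setup concrete, here is a minimal sketch of $L(\beta)$ in plain Python. The feature map `phi` is a hypothetical stand-in for $\phi$ (it prepends a constant 1 to $x_i$), the data are made up, and each $y_i$ is treated as a scalar; none of these choices come from the question itself.

```python
def phi(x):
    """Hypothetical feature map: prepend a constant 1 (a bias coefficient) to x."""
    return [1.0] + list(x)


def dot(u, v):
    """Inner product of two vectors of the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def loss(beta, xs, ys, feature_map=phi):
    """L(beta) = sum_i (y_i - phi(x_i)^T beta)^2 for scalar targets y_i."""
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - dot(feature_map(x), beta)  # scalar residual
        total += residual ** 2
    return total


# Made-up data: x_i in R^2, so phi(x_i) and beta are in R^3 (d = 3).
xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
ys = [1.0, 2.0, -1.0]
beta = [0.5, 1.0, -0.5]
print(loss(beta, xs, ys))  # 20.75
```

Each residual $y_i - \phi(x_i)^T\beta$ is a single number, which is why squaring it is well defined.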

Here is my approach so far:

\begin{align*}
\frac{\partial}{\partial \beta} L(\beta) &= \sum_{i=1}^n \left( \frac{\partial}{\partial \beta} y_i - \frac{\partial}{\partial \beta} \left( \phi(x_i)^T \cdot \beta \right) \right)^2\\
&= \sum_{i=1}^n \left( 0 - \frac{\partial}{\partial \beta} \left( \phi(x_i)^T \cdot \beta \right) \right)^2\\
&= - \sum_{i=1}^n \left( \partial \phi(x_i)^T \cdot \beta + \phi(x_i)^T \cdot \partial \beta \right)^2\\
&= - \sum_{i=1}^n \left( 0 \cdot \beta + \phi(x_i)^T \cdot \textbf{I} \right)^2\\
&= - \sum_{i=1}^n \left( \phi(x_i)^T \cdot \textbf{I} \right)^2
\end{align*}

But what should I do with the square? Have I made any mistakes? I ask because $\phi(x_i)^T \cdot \textbf I$ seems to be in $\mathbb{R}^{1 \times d}$.

Edit: continuing the derivation, I get

$= - 2 \sum_{i=1}^n \phi(x_i)^T$

  • @ChrisTaylor Sure, I added something to the end of my question. (2012-05-04)

1 Answer


Vector differentiation can be tricky when you're not used to it. One way to get around that is to use summation notation until you're confident enough to perform the derivatives without it.

To begin with, let's define $X_i=\phi(x_i)$ since it will save some typing, and let $X_{ni}$ be the $n$th component of the vector $X_i$.

Using summation notation, in which every repeated index ($i$, $m$ and $n$ below) is summed over, you have

$\begin{align} L(\beta) & = (y_i-X_{mi}\beta_m)(y_i - X_{ni}\beta_n) \\ & = y_i y_i - 2 y_i X_{mi} \beta_m + X_{mi}X_{ni}\beta_m\beta_n \end{align}$
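A quick way to convince yourself of the expansion above is to evaluate both sides numerically. This sketch (plain Python, made-up data) turns every repeated index into an explicit loop; storing $X_{mi}$ as `X[i][m]` is an assumption of the sketch, not part of the derivation.

```python
import math

# Made-up data. Row X[i] holds X_i = phi(x_i), so X[i][m] is the component X_{mi}.
X = [[1.0, 1.0, 2.0], [1.0, 0.5, -1.0], [1.0, 3.0, 0.0]]
y = [1.0, 2.0, -1.0]
beta = [0.5, 1.0, -0.5]


def loss_direct(X, y, beta):
    """sum_i (y_i - X_i^T beta)^2, computed one residual at a time."""
    d = len(beta)
    return sum((y[i] - sum(X[i][m] * beta[m] for m in range(d))) ** 2
               for i in range(len(y)))


def loss_expanded(X, y, beta):
    """y_i y_i - 2 y_i X_mi beta_m + X_mi X_ni beta_m beta_n, summing each repeated index."""
    samples, d = range(len(y)), range(len(beta))
    term1 = sum(y[i] * y[i] for i in samples)
    term2 = sum(y[i] * X[i][m] * beta[m] for i in samples for m in d)
    term3 = sum(X[i][m] * X[i][n] * beta[m] * beta[n]
                for i in samples for m in d for n in d)
    return term1 - 2 * term2 + term3


print(loss_direct(X, y, beta), loss_expanded(X, y, beta))  # 20.75 20.75
assert math.isclose(loss_direct(X, y, beta), loss_expanded(X, y, beta))
```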

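To take the derivative with respect to $\beta_k$, use $\partial \beta_m / \partial \beta_k = \delta_{km}$ (the Kronecker delta). Differentiating $\beta_m \beta_n$ gives $\delta_{km}\beta_n + \beta_m\delta_{kn}$, and the two resulting terms are equal after swapping the dummy indices $m$ and $n$, which is where the factor of 2 in the second term comes from: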

$\begin{align} \frac{\partial}{\partial\beta_k} L(\beta) & = -2 y_i X_{mi} \delta_{km} + 2 X_{mi} X_{ni} \beta_n \delta_{km} \\ & = -2 X_{mi} \delta_{km} (y_i - X_{ni} \beta_n) \\ & = -2 X_{ki} (y_i - X_{ni} \beta_n) \end{align}$
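The index formula can be checked against central finite differences. In this sketch (plain Python, made-up data, with `X[i][k]` holding $X_{ki}$), `grad_index` implements $-2 X_{ki}(y_i - X_{ni}\beta_n)$ with explicit loops over $i$ and $n$, and `grad_central_difference` is a hypothetical helper that perturbs one $\beta_k$ at a time.

```python
import math

# Made-up data. Row X[i] holds X_i = phi(x_i), so X[i][k] is the component X_{ki}.
X = [[1.0, 1.0, 2.0], [1.0, 0.5, -1.0], [1.0, 3.0, 0.0]]
y = [1.0, 2.0, -1.0]
beta = [0.5, 1.0, -0.5]


def loss(X, y, beta):
    """L(beta) = sum_i (y_i - X_i^T beta)^2."""
    return sum((y_i - sum(a * b for a, b in zip(x_i, beta))) ** 2
               for x_i, y_i in zip(X, y))


def grad_index(X, y, beta):
    """dL/dbeta_k = -2 X_ki (y_i - X_ni beta_n), summed over i and n."""
    grad = []
    for k in range(len(beta)):
        g = 0.0
        for x_i, y_i in zip(X, y):
            residual = y_i - sum(x_i[n] * beta[n] for n in range(len(beta)))
            g += -2.0 * x_i[k] * residual
        grad.append(g)
    return grad


def grad_central_difference(X, y, beta, h=1e-6):
    """Approximate dL/dbeta_k by (L(beta + h e_k) - L(beta - h e_k)) / (2h)."""
    grad = []
    for k in range(len(beta)):
        plus, minus = list(beta), list(beta)
        plus[k] += h
        minus[k] -= h
        grad.append((loss(X, y, plus) - loss(X, y, minus)) / (2 * h))
    return grad


analytic = grad_index(X, y, beta)
numeric = grad_central_difference(X, y, beta)
print(analytic)  # [7.0, 25.5, -1.0]
assert all(math.isclose(a, b, rel_tol=1e-6, abs_tol=1e-6) for a, b in zip(analytic, numeric))
```

Because $L$ is quadratic in $\beta$, the central difference is exact up to floating-point rounding, so the two gradients should agree closely.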

which you can then translate back into vector notation:

$ \frac{\partial}{\partial\beta} L(\beta) = -2 \sum_{i=1}^n X_i (y_i - X_i^T\beta) $
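As a cross-check, the same result follows from the chain rule applied to a single term of the sum, using $\partial (X_i^T\beta)/\partial\beta = X_i$ with the gradient written as a column vector. The square acts on the scalar residual $y_i - X_i^T\beta$, so differentiating it produces the factor $2(y_i - X_i^T\beta)$ rather than a square of the vector $X_i$:

$\begin{align} \frac{\partial}{\partial\beta} (y_i - X_i^T\beta)^2 & = 2 (y_i - X_i^T\beta) \frac{\partial}{\partial\beta} (y_i - X_i^T\beta) \\ & = 2 (y_i - X_i^T\beta)(-X_i) \\ & = -2 X_i (y_i - X_i^T\beta) \end{align}$

Summing over $i$ recovers the vector expression above.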

Does that help?

  • Glad to hear it! (2012-05-04)