
I want to understand the derivative of the cost function in linear regression with Ridge regularization. The cost function is:

$L^{\text{Ridge}}(\beta) = \sum_{i=1}^n (y_i - \phi(x_i)^T\beta)^2 + \lambda \sum_{j=1}^k \beta_j^2$

The sum of squares can be rewritten in matrix form, where $X$ is the design matrix with rows $\phi(x_i)^T$:

$L^{\text{Ridge}}(\beta) = ||y-X\beta||^2 + \lambda \sum_{j=1}^k \beta_j^2$

To find the optimum, the derivative is set to zero, which leads to this solution:

$\beta^{\text{Ridge}} = (X^TX + \lambda I)^{-1} X^T y$
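
As a numerical sanity check of this formula, here is a minimal numpy sketch (the toy data, dimensions, and $\lambda$ are arbitrary choices for illustration): it computes $\beta^{\text{Ridge}}$ from the closed form and verifies that random perturbations only increase the penalized loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: n observations, k features (values are arbitrary).
n, k = 50, 5
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
lam = 0.1

def ridge_loss(beta):
    """Penalized sum of squares: ||y - X beta||^2 + lam * ||beta||^2."""
    r = y - X @ beta
    return r @ r + lam * beta @ beta

# Closed form: beta = (X^T X + lam I)^{-1} X^T y, computed with a
# linear solve rather than an explicit matrix inverse.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# If beta_ridge is the minimizer, every perturbation increases the loss.
base = ridge_loss(beta_ridge)
for _ in range(1000):
    assert ridge_loss(beta_ridge + 1e-3 * rng.normal(size=k)) > base
```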


Now I would like to understand this and derive it myself; here's what I've got:

Since $||x||^2 = x^Tx$ and $\frac{\partial}{\partial x} [x^Tx] = 2x^T$, this can be applied using the chain rule:

\begin{align*}
\frac{\partial}{\partial \beta} L^{\text{Ridge}}(\beta) = 0^T &= -2(y - X \beta)^T X + 2 \lambda I\\
0 &= -2(y - X \beta) X^T + 2 \lambda I\\
0 &= -2X^Ty + 2X^TX\beta + 2 \lambda I\\
0 &= -X^Ty + X^TX\beta + 2 \lambda I\\
&= X^TX\beta + 2 \lambda I\\
(X^TX + \lambda I)^{-1} X^Ty &= \beta
\end{align*}

Where I struggle is the next-to-last equation: I multiply by $(X^TX + \lambda I)^{-1}$, and I don't think that leads to a correct equation.

What have I done wrong?

1 Answer


You have differentiated $L$ incorrectly, specifically the $\lambda ||\beta||^2$ term: since $\lambda ||\beta||^2 = \lambda \beta^T\beta$, its derivative with respect to $\beta$ is $2\lambda\beta^T$, not $2\lambda I$. The correct expression is

$\frac{\partial L(\beta)}{\partial \beta} = 2\left(( X \beta - y)^T X + \lambda \beta^T\right),$

from which the desired result follows by equating to zero and taking transposes:

$2\left(X^T( X \beta - y) + \lambda \beta\right) = 0 \quad\Longrightarrow\quad (X^TX + \lambda I)\beta = X^Ty.$
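
To make the correction concrete, here is a minimal numpy sketch (toy data, all values arbitrary) that checks the corrected analytic gradient $2((X\beta - y)^T X + \lambda \beta^T)$ against a central finite-difference approximation of the loss; the two should agree to numerical precision.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 4
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
lam = 0.5

def ridge_loss(beta):
    r = y - X @ beta
    return r @ r + lam * beta @ beta

def analytic_grad(beta):
    # Corrected gradient: 2 * ((X beta - y)^T X + lam * beta^T).
    return 2 * ((X @ beta - y) @ X + lam * beta)

beta = rng.normal(size=k)
eps = 1e-6
# Central finite differences, one coordinate at a time.
fd_grad = np.array([
    (ridge_loss(beta + eps * e) - ridge_loss(beta - eps * e)) / (2 * eps)
    for e in np.eye(k)
])

# Close to zero: the loss is quadratic, so central differences are
# nearly exact, up to floating-point rounding.
print(np.max(np.abs(analytic_grad(beta) - fd_grad)))
```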

  • ah I see that now, thanks – 2012-07-02