
I want to compute the gradient for the following function:

$L(\beta) = \sum_{i=1}^n (y_i - \phi(x_i)^T \cdot \beta)^2 + \sum_{j = 1}^k l(\beta_j)$

where $l(\beta_j) = \begin{cases} \beta_j - \varepsilon/2 & \textbf{if } |\beta_j| \geq \varepsilon\\ \beta_j^2 / (2\varepsilon) & \textbf{if } |\beta_j| < \varepsilon\\ \end{cases}$

Here $y$, $\beta$, and each $\phi(x_i)$ are vectors in $\mathbb{R}^n$, and $\varepsilon$ is just a small number, like $0.00001$.
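For concreteness, here is a minimal NumPy sketch of $L(\beta)$ exactly as written above; the names (`Phi` for the matrix whose rows are $\phi(x_i)^T$, `penalty`, `loss`, and the parameter `k`) are illustrative choices, not part of the question.

```python
import numpy as np

def penalty(beta, eps):
    """The piecewise function l, applied componentwise, literally as defined
    above (note: the first branch uses beta, not |beta|)."""
    return np.where(np.abs(beta) >= eps,
                    beta - eps / 2,       # linear branch, |beta_j| >= eps
                    beta**2 / (2 * eps))  # quadratic branch, |beta_j| < eps

def loss(beta, Phi, y, eps, k):
    """L(beta): squared residuals plus the penalty on the first k components."""
    return np.sum((y - Phi @ beta)**2) + np.sum(penalty(beta[:k], eps))
```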

Computing the gradient of the first term caused no problem:

$ \frac{\partial}{\partial \beta} \sum_{i=1}^n (y_i - \phi(x_i)^T \cdot \beta)^2 = -2 \sum_{i=1}^n \phi(x_i) \cdot (y_i - \phi(x_i)^T \cdot \beta)$
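In matrix form this is $-2\,\Phi^T(y - \Phi\beta)$, with $\Phi$ the matrix whose rows are $\phi(x_i)^T$. A hedged sketch with a finite-difference sanity check, reusing the illustrative names from the snippet above:

```python
def grad_lsq(beta, Phi, y):
    """Gradient of sum_i (y_i - phi(x_i)^T beta)^2, i.e. -2 Phi^T (y - Phi beta)."""
    return -2 * Phi.T @ (y - Phi @ beta)

# finite-difference check on random data
rng = np.random.default_rng(0)
n = 5
Phi = rng.normal(size=(n, n))
y = rng.normal(size=n)
beta0 = rng.normal(size=n)
h = 1e-6
num = np.array([(np.sum((y - Phi @ (beta0 + h * e))**2)
                 - np.sum((y - Phi @ (beta0 - h * e))**2)) / (2 * h)
                for e in np.eye(n)])
assert np.allclose(num, grad_lsq(beta0, Phi, y), atol=1e-4)
```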

But I have no idea how to handle the second term, which is the sum of the function $l$ applied to each single component of the vector $\beta$: it no longer involves the whole vector, and the additional piecewise function confuses me. How can I proceed here?


Or is it as easy as just computing the derivative like this:

$\frac{\partial}{\partial \beta_j} l(\beta_j) = \begin{cases} 1 & \textbf{if } |\beta_j| \geq \varepsilon\\ \beta_j / \varepsilon & \textbf{if } |\beta_j| < \varepsilon\\ \end{cases}$

But then I still see the problem that the gradient of the first term results in a vector, while the second does not.
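For what it's worth, the proposed componentwise derivative can be checked numerically away from the problem points discussed in the answer below; a minimal sketch, reusing the hypothetical `penalty` helper from the first snippet (the slope is $1$ on both linear branches because the question's definition writes $\beta_j$, not $|\beta_j|$, in the first case):

```python
def dpenalty(beta, eps):
    """Componentwise derivative of l, wherever l is differentiable."""
    return np.where(np.abs(beta) >= eps, 1.0, beta / eps)

eps = 1e-5
for b in [2 * eps, eps / 3, -eps / 2, -3 * eps]:
    hh = eps * 1e-4  # small enough that b +/- hh stays on one branch
    num = (penalty(b + hh, eps) - penalty(b - hh, eps)) / (2 * hh)
    assert np.isclose(num, dpenalty(b, eps))
```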

1 Answer


This function doesn't have a gradient everywhere, since it's discontinuous (and thus not differentiable) whenever one of the $\beta_j$ is $-\epsilon$. Perhaps you omitted some absolute value signs in the definition of $l$?

At all points with $\beta_m\ne-\epsilon$ for all $m\le k$, the function has a gradient, and you can indeed determine it using the calculation in the second part of the question. I don't understand your statement that the gradient of the second term doesn't result in a vector: the gradient is by definition a vector, so it cannot fail to be one. You've calculated the derivatives $\frac{\partial}{\partial\beta_m} l(\beta_m)$; the derivatives $\frac{\partial}{\partial\beta_m} l(\beta_j)$ for $j\ne m$ are trivially zero. The sum in the second term contains a term $l(\beta_m)$ if and only if $m\le k$. Thus the components of the gradient of the second term are

$ \frac{\partial}{\partial\beta_m}\sum_{j=1}^kl(\beta_j)= \begin{cases} 0&m\gt k\;,\\ 1&m\le k\;,|\beta_m|\ge\epsilon\;,\\ \beta_m/\epsilon&m\le k\;,|\beta_m|\lt\epsilon\;. \end{cases} $
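Assembling both pieces according to this formula, here is a sketch of the full gradient (reusing the illustrative `Phi`, `y`, `n`, `h`, `loss`, `penalty`, and `dpenalty` from the earlier snippets), checked by finite differences at a point where no component sits at the problematic value $-\epsilon$:

```python
def grad_loss(beta, Phi, y, eps, k):
    g = -2 * Phi.T @ (y - Phi @ beta)  # least-squares part, all n components
    g[:k] += dpenalty(beta[:k], eps)   # penalty part; contributes zero for m > k
    return g

eps, k = 1e-5, 3
# components hit both branches of l; none equals -eps, where L is discontinuous
beta = np.array([0.5, -1.0, eps / 2, -eps / 3, 2.0])
num = np.array([(loss(beta + h * e, Phi, y, eps, k)
                 - loss(beta - h * e, Phi, y, eps, k)) / (2 * h)
                for e in np.eye(n)])
assert np.allclose(num, grad_loss(beta, Phi, y, eps, k), atol=1e-4)
```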

If this seems slightly confusing, I suggest you first treat the easier case $k=n$, where the sum contains one term for each component of $\beta$, and then think about what happens when $k\lt n$ and some of the terms are missing.
