I want to compute the gradient of the following function:
$L(\beta) = \sum_{i=1}^n (y_i - \phi(x_i)^T \cdot \beta)^2 + \sum_{j = 1}^k l(\beta_j)$
where $l(\beta_j) = \begin{cases} |\beta_j| - \varepsilon/2 & \textbf{if } |\beta_j| \geq \varepsilon\\ \beta_j^2 / (2\varepsilon) & \textbf{if } |\beta_j| < \varepsilon\\ \end{cases}$
Here $y \in \mathbb{R}^n$, while $\beta$ and each $\phi(x_i)$ are vectors in $\mathbb{R}^k$, and $\varepsilon$ is just a small number, like $0.00001$.
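For concreteness, this is how I would write the loss in NumPy (the names `Phi`, `huber_penalty` and `loss` are my own; `Phi` is the $n \times k$ matrix whose rows are the $\phi(x_i)^T$):

```python
import numpy as np

def huber_penalty(b, eps=1e-5):
    # l(beta_j): quadratic near zero, linear with slope 1 beyond eps
    return np.where(np.abs(b) >= eps, np.abs(b) - eps / 2, b**2 / (2 * eps))

def loss(beta, Phi, y, eps=1e-5):
    # sum of squared residuals plus the componentwise penalty
    residuals = y - Phi @ beta
    return np.sum(residuals**2) + np.sum(huber_penalty(beta, eps))
```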
Computing the gradient of the first term was no problem:
$ \frac{\partial}{\partial \beta} \sum_{i=1}^n (y_i - \phi(x_i)^T \cdot \beta)^2 = -2 \sum_{i=1}^n \phi(x_i) \cdot (y_i - \phi(x_i)^T \cdot \beta)$
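In matrix form this is $-2\,\Phi^T(y - \Phi\beta)$, which I can check numerically against finite differences (again a sketch with my own names, `Phi` defined as above):

```python
import numpy as np

def grad_squared_error(beta, Phi, y):
    # gradient of sum_i (y_i - phi(x_i)^T beta)^2, i.e. -2 Phi^T (y - Phi beta)
    return -2 * Phi.T @ (y - Phi @ beta)

# finite-difference sanity check on random data
rng = np.random.default_rng(0)
n, k = 50, 5
Phi = rng.normal(size=(n, k))
y = rng.normal(size=n)
beta = rng.normal(size=k)

h = 1e-6
f = lambda b: np.sum((y - Phi @ b)**2)
numeric = np.array([(f(beta + h * np.eye(k)[j]) - f(beta - h * np.eye(k)[j])) / (2 * h)
                    for j in range(k)])
print(np.allclose(numeric, grad_squared_error(beta, Phi, y), atol=1e-4))  # True
```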
But I have no idea how to handle the second term, which is the sum of the function $l$ applied to each component of the vector $\beta$: it no longer involves the whole vector, only its single components, and the additional piecewise function confuses me. How can I proceed here?
Or is it as easy as just computing the derivative componentwise, like this:
$\frac{\partial}{\partial \beta_j} l(\beta_j) = \begin{cases} \operatorname{sign}(\beta_j) & \textbf{if } |\beta_j| \geq \varepsilon\\ \beta_j / \varepsilon & \textbf{if } |\beta_j| < \varepsilon\\ \end{cases}$
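If that componentwise rule is correct, the natural implementation would just collect the values $l'(\beta_j)$ into a vector (a sketch under that assumption, with my own naming; I am not sure this stacking is legitimate, which is my remaining question below):

```python
import numpy as np

def grad_penalty(beta, eps=1e-5):
    # componentwise l'(beta_j): sign(beta_j) where |beta_j| >= eps,
    # beta_j / eps otherwise; the two branches agree at |beta_j| = eps
    return np.where(np.abs(beta) >= eps, np.sign(beta), beta / eps)
```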
But then I still see the problem that the gradient of the first term is naturally a vector, while the second term only gives me one scalar derivative per component, and I am unsure whether simply stacking them is justified.