
I am given a high-dimensional input $x \in \mathbb{R}^m$ where $m$ is a big number. Linear regression can be applied, but in general it is expected that many of these dimensions are actually irrelevant.

I ought to find a method that models the function $y = f(x)$ and at the same time uncovers which dimensions contribute to the output. The given hint is to apply $L_1$-norm (lasso) regularization:

$$L^{\text{lasso}}(\beta) = \sum_{i=1}^n (y_i - \phi(x_i)^T \beta)^2 + \lambda \sum_{j = 1}^k | \beta_j |$$
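As a sanity check, the objective itself is easy to evaluate numerically; here is a minimal sketch, assuming for illustration that $\phi$ is the identity feature map, so the rows of `Phi` are just the $x_i$.

```python
import numpy as np

def lasso_objective(beta, Phi, y, lam):
    """Lasso objective: squared error plus lambda times the L1 penalty.

    Phi : (n, k) array whose rows are the feature vectors phi(x_i)
    y   : (n,) array of targets
    """
    residual = y - Phi @ beta
    return residual @ residual + lam * np.sum(np.abs(beta))
```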

Minimizing $L^{\text{lasso}}$ is hard in general, and for that reason I should apply gradient descent. My approach so far is the following:

In order to minimize the term, I chose to compute the gradient and set it to $0$, i.e.

$$\frac{\partial}{\partial \beta} L^{\text{lasso}}(\beta) = -2 \sum_{i = 1}^n \phi(x_i) (y_i - \phi(x_i)^T \beta)$$

Since this cannot be solved in closed form, I want to apply gradient descent here. Gradient descent takes a function $f(x)$ as input, and $L^{\text{lasso}}(\beta)$ takes that place here.
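In matrix form, stacking the $\phi(x_i)^T$ as rows of a matrix $\Phi$, this gradient is $-2 \Phi^T (y - \Phi \beta)$. Here is a minimal sketch of plain gradient descent on the smooth part alone (the penalty still left out); the step size and iteration count are arbitrary placeholders.

```python
import numpy as np

def squared_error_grad(beta, Phi, y):
    """Gradient of sum_i (y_i - phi(x_i)^T beta)^2, i.e. -2 * Phi^T (y - Phi beta)."""
    return -2.0 * Phi.T @ (y - Phi @ beta)

def gradient_descent(Phi, y, alpha=1e-3, n_steps=1000):
    """Plain gradient descent on the squared-error part only."""
    beta = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        beta = beta - alpha * squared_error_grad(beta, Phi, y)
    return beta
```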

My problems are:

  • is the gradient I computed correct? I left out the regularization term at the end, which I guess is a problem, but I also have trouble computing the gradient of that term

    In addition I may replace $| \beta_j |$ by the smoothed function $l(x)$, defined as $l(x) = |x| - \varepsilon/2$ if $|x| \geq \varepsilon$ and $l(x) = x^2/(2 \varepsilon)$ otherwise (see the sketch after this list)

  • one step of the gradient descent is $\beta \leftarrow \beta - \alpha \cdot \frac{g}{\lVert g \rVert}$, where $g = \left(\frac{\partial}{\partial \beta} f(x)\right)^T$

    I don't understand this one: when I transpose the gradient I have computed, the update cannot be evaluated because of a dimension mismatch
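To make the two points above concrete, here is a minimal sketch of the smoothed absolute value from the first bullet and the normalized update from the second; the function names and the value of $\varepsilon$ are purely illustrative.

```python
import numpy as np

def smooth_abs(x, eps=1e-3):
    """Smoothed |x|: quadratic for |x| < eps, shifted absolute value otherwise."""
    return np.where(np.abs(x) >= eps, np.abs(x) - eps / 2.0, x**2 / (2.0 * eps))

def smooth_abs_grad(x, eps=1e-3):
    """Derivative of smooth_abs: sign(x) outside the band, x / eps inside it."""
    return np.where(np.abs(x) >= eps, np.sign(x), x / eps)

def normalized_step(beta, g, alpha):
    """One step beta <- beta - alpha * g / ||g||; g has the same shape as beta."""
    return beta - alpha * g / np.linalg.norm(g)
```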

  • 1. Yes, your gradient should include the regularization term as well. But I guess you have already realized that the regularization term is not differentiable. So how does one go about it? Subgradient descent is a reasonably good choice. You will see a lot of discussion on this [here](http://www.di.ens.fr/~fbach/mlss08_fbach.pdf). 2. I don't understand your second problem. (2012-05-04)
  • Boyd and Parikh have recently published a monograph about proximal methods that explains techniques for solving non-differentiable convex optimization problems like this. For example, instead of using gradient descent one could use the proximal gradient method, which can handle the non-differentiable term. (2013-11-30)
  • I agree with littleO. Since $L^{\mathrm{lasso}}(\beta)$ is not differentiable, you should have a look at proximal gradient methods. Also have a look at this paper: http://bodonoghue.org/publications/adap_restart.pdf (especially the "sparse regression" part). (2014-01-30)
  • If doing gradient descent, yes, you have to include all terms. The partial derivatives of the $L_1$ term should not be too difficult to calculate. Another possibility would be iteratively reweighted least squares. (2015-10-05)
  • Yes, you can use the Huber function to get a smooth approximation of $L_1$: https://en.wikipedia.org/wiki/Huber_loss (2015-10-05)
  • Many software packages have LASSO built in. (2016-12-31)
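Following up on the proximal-gradient suggestions in the comments above, here is a minimal ISTA-style sketch; it again assumes $\phi$ is the identity map, and the step size and iteration count are illustrative only.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, lam, alpha=1e-3, n_steps=1000):
    """Proximal gradient (ISTA): a gradient step on the squared-error part,
    followed by soft-thresholding to handle the non-differentiable L1 term."""
    beta = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        g = -2.0 * Phi.T @ (y - Phi @ beta)  # gradient of the smooth part
        beta = soft_threshold(beta - alpha * g, alpha * lam)
    return beta
```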

2 Answers