I have given a high dimensional input $x \in \mathbb{R}^m$ where $m$ is a big number. Linear regression can be applied, but in generel it is expected, that a lot of these dimensions are actually irrelevant.
I ought to find a method to model the function $y = f(x)$ and at the same time uncover which dimensions contribute to the output. The overall hint is to apply the $L_1$-norm Lasso regularization.
$$L^{\text lasso}(\beta) = \sum_{i=1}^n (y_i - \phi(x_i)^T \beta)^2 + \lambda \sum_{j = 1}^k | \beta_j |$$
Minimizing $L^{\text lasso}$ is in general hard, for that reason I should apply gradient descent. My approach so far is the following:
In order to minimize the term, I chose to compute the gradient and set it $0$, i.e.
$$\frac{\partial}{\partial \beta} L^{\text lasso}(\beta) = -2 \sum_{i = 1}^n \phi(x_i)(y_i - \beta)$$
Since this cannot be computed simply, I want to apply the gradient descent here. The gradient descent takes a function $f(x)$ as input, so in place of that will be put the $L(\beta)$ function.
My problems are:
- is the gradient I computed correct? I left out the regularizationt erm in the end, which I guess is a problem, but I also have problems to compute the gradient for this one - In addition I may replace $| \beta_j |$ by the function $l(x)$, which is $l(x) = x - \varepsilon/2$ if $|x| \geq \epsilon$, else $x^2/(2 \varepsilon)$ 
- one step in the gradient descent is: $\beta \leftarrow \beta - \alpha \cdot \frac{g}{|g|}$, where $g = \frac{\partial}{\partial \beta}f(x))^T$ - I don't get this one, when I transpose the gradient I have computed the term cannot be resolved due to dimension mismatch 
