
Consider the following sparse optimization problem:

$\min\{\|x\|_1\,:\,\|Ax-b\|_2\leq\delta,\,x\in\mathbb{R}^n\}$

This problem is equivalent to several other formulations of the LASSO; see: http://www.math.ucla.edu/~wotaoyin/summer2013/slides/Lec02_BasicSparseOptimizationModels.pdf

I want to show that the dual problem is:

$\max\{\langle b,y\rangle-\delta\|y\|_2\,:\,\|A^Ty\|_{\infty}\leq 1,\,y\in\mathbb{R}^m\}$

I tried writing the Lagrangian (using a Lagrange multiplier times the constraint $\|Ax-b\|_2-\delta\leq 0$), but it does not seem to lead in the right direction. I found an article that proves a theorem about uniqueness of the solution to this problem (http://link.springer.com/article/10.1007/s10957-014-0581-z), but their proof is much more abstract than the one I'm looking for. Any other suggestions?

  • Does [this paper](http://www.optimization-online.org/DB_FILE/2016/09/5638.pdf) help? (2017-01-14)

3 Answers


It helps to know that $\inf_u f(u) = - \sup_u \left(-f(u) \right)$. We use that a lot when deriving dual problems. Also, look for conjugate functions to appear when forming dual problems. Another tip is to sometimes reformulate the primal problem a bit, perhaps introducing a new variable, before deriving the dual problem. The example below is pretty typical.
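As a concrete illustration of the conjugate machinery, here is a quick plain-Python sanity check (the dimension, seed, and sample counts are arbitrary). By Hölder's inequality, $\langle z,x\rangle \le \|z\|_\infty\|x\|_1$, so when $\|z\|_\infty \le 1$ the supremum defining the conjugate of the $\ell_1$-norm is $0$ (attained at $x=0$), while any coordinate with $|z_i|>1$ makes it $+\infty$:

```python
import random

def l1(x): return sum(abs(v) for v in x)
def linf(z): return max(abs(v) for v in z)
def dot(u, v): return sum(a*b for a, b in zip(u, v))

random.seed(0)
n = 5

# Case 1: ||z||_inf <= 1  =>  <z,x> - ||x||_1 <= 0 for every x,
# so the supremum defining the conjugate is 0 (attained at x = 0).
z = [random.uniform(-1, 1) for _ in range(n)]
assert linf(z) <= 1
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert dot(z, x) - l1(x) <= 1e-12

# Case 2: some |z_i| > 1  =>  the objective is unbounded above
# along the coordinate ray x = t * e_i.
z = [2.0] + [0.0] * (n - 1)
t = 1e6
x = [t] + [0.0] * (n - 1)
assert dot(z, x) - l1(x) == t  # grows without bound as t increases
```

This is exactly the fact that the conjugate of the $\ell_1$-norm is the indicator of the $\ell_\infty$-unit ball, which appears in the derivation below.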

First rewrite your problem as \begin{align} \operatorname{minimize}_{x,y} & \quad \| x \|_1 \\ \text{subject to} & \quad \| y - b \|_2 \leq \delta \\ & \quad y = Ax. \end{align} (We have introduced a new variable $y$ which is constrained to be equal to $Ax$.)

The Lagrangian is \begin{align} L(x,y,w,z) &= \| x \|_1 + \langle z, Ax - y \rangle + w \| y - b \|_2 - w \delta \\ &= \| x \|_1 - \langle -A^T z, x \rangle - \langle z, y \rangle + w \| y - b \|_2 - w \delta. \end{align} The dual function is \begin{align} G(w,z) &= \inf_{x,y} \quad L(x,y,w,z) \\ &= -w \delta + \left(\inf_x \, \| x \|_1 - \langle -A^T z, x \rangle \right) + \inf_y \, \left( -\langle z, y \rangle + w \| y - b \|_2 \right) \\ &= -w \delta - \left( \sup_x \, \langle -A^T z, x \rangle - \| x \|_1 \right) - \sup_y \, \left( \langle z, y \rangle - w \| y - b \|_2 \right). \end{align}

In the first supremum, we're evaluating the conjugate of the $\ell_1$-norm at $-A^Tz$, which is the indicator function of the $\ell_\infty$-norm unit ball. In the second supremum, after the change of variables $u = y - b$, we're evaluating the conjugate of the scaled norm $w\|\cdot\|_2$, which is the indicator function of the $\ell_2$-ball of radius $w$; the shift also produces a term $\langle z, b \rangle$. So it should be straightforward (if a bit messy) to finish the calculation from here.
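For completeness, one way to finish the calculation: plugging the two conjugates into the expression for the dual function gives $$ G(w,z) = \begin{cases} -w\delta - \langle z, b \rangle & \text{if } \|A^Tz\|_\infty \le 1 \text{ and } \|z\|_2 \le w, \\ -\infty & \text{otherwise.} \end{cases} $$ Since $-w\delta$ is decreasing in $w$, at optimality $w = \|z\|_2$, and the substitution $y = -z$ (which leaves $\|A^Tz\|_\infty$ unchanged) turns $\max\{-\langle z,b\rangle - \delta\|z\|_2 \,:\, \|A^Tz\|_\infty \le 1\}$ into $\max\{\langle b,y\rangle - \delta\|y\|_2 \,:\, \|A^Ty\|_\infty \le 1\}$, as desired.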


This is where you normally pull out the Fenchel-Rockafellar duality theorem (FRDT) and solve the problem in one line. The previous answers by Michael Grant and littleO (BTW, they're wonderful answers, self-contained, and in some sense much better than mine :) ) have more or less re-derived this theorem from first principles.

Let $A$ be an $m$-by-$n$ matrix. Your problem can be written as

$$ \underset{x \in \mathbb R^n}{\text{minimize }}f(x) + g(Ax), $$

where $f$ is the $\ell_1$-norm on $\mathbb R^n$ and $g = i_{\{z \in \mathbb R^m \,:\, \|z-b\|_2 \le \delta\}}$ is the indicator function of the feasible ball.

Exercise: Show that $f^* = i_{\{x \in \mathbb R^n \,:\, \|x\|_\infty \le 1\}}$ and $g^*(z) = \langle b,z\rangle + \delta\|z\|_2$, where $$ f^*(x) := \sup_{y}x^Ty-f(y) $$ defines the convex conjugate of $f$.

Now, by the FRDT (it's easy to check that sufficient constraint qualification conditions are verified), we have

$$ \begin{split} \min_{x \in \mathbb R^n,\;\|Ax-b\|_2\le\delta}\|x\|_1 &= \min_{x \in \mathbb R^n}f(x) + g(Ax)\\ &= \max_{z \in \mathbb R^m}-f^*(A^Tz) - g^*(-z) \\ &= \max_{z \in \mathbb R^m}-i_{\{\|A^Tz\|_\infty \le 1\}} + b^Tz - \delta \|z\|_2 \\ &= \max_{z\in \mathbb R^m,\; \|A^Tz\|_\infty \le 1}b^Tz - \delta\|z\|_2. \end{split} $$

Thus the dual problem is to maximize $b^Tz - \delta\|z\|_2$ over the polyhedron $$ \{z \in \mathbb R^m \,:\, \|A^Tz\|_\infty \le 1\}. $$
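Weak duality for this pair can be sanity-checked numerically. The plain-Python sketch below (dimensions, $\delta$, and seed are arbitrary) draws random primal-feasible $x$ (by choosing $b$ within $\delta$ of $Ax$) and random dual-feasible $z$ (rescaled so $\|A^Tz\|_\infty \le 1$), and verifies that the dual objective never exceeds the primal one:

```python
import random

def dot(u, v): return sum(a*b for a, b in zip(u, v))
def l1(v): return sum(abs(a) for a in v)
def l2(v): return sum(a*a for a in v) ** 0.5
def linf(v): return max(abs(a) for a in v)
def matvec(M, v): return [dot(row, v) for row in M]
def mat_t_vec(M, v):  # computes A^T z
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]

random.seed(1)
m, n, delta = 4, 6, 0.5
for _ in range(200):
    A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    # primal-feasible x: choose x, then pick b with ||Ax - b||_2 <= delta
    x = [random.gauss(0, 1) for _ in range(n)]
    noise = [random.gauss(0, 1) for _ in range(m)]
    scale = delta * random.random() / (l2(noise) + 1e-12)
    b = [ax + scale * e for ax, e in zip(matvec(A, x), noise)]
    # dual-feasible z: rescale so that ||A^T z||_inf <= 1
    z = [random.gauss(0, 1) for _ in range(m)]
    s = linf(mat_t_vec(A, z)) + 1e-12
    z = [zi / s for zi in z]
    # weak duality: dual objective <= primal objective
    assert dot(b, z) - delta * l2(z) <= l1(x) + 1e-9
```

This only checks weak duality, of course; equality of the optimal values is what the constraint qualification in the FRDT guarantees.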

  • Nothing wrong with having multiple approaches, and I did concede that my approach is likely strange ;-) (2017-01-15)

I'm kind of strange but I prefer to use a conic approach. Rewrite the original as \begin{array}{ll} \text{minimize} & t \\ \text{subject to} & \|x\|_1 \leq t \\ & \|Ax-b\|_2 \leq \delta \end{array} and then like so: \begin{array}{ll} \text{minimize} & t \\ \text{subject to} & (x,t) \in \mathcal{K}_{\ell_1} \\ & (Ax-b,\delta) \in \mathcal{K}_{\ell_2} \end{array} where $\mathcal{K}_{\ell_1}$ and $\mathcal{K}_{\ell_2}$ are the cones defined by the epigraphs of the $\ell_1$ and $\ell_2$ norms, respectively.

The Lagrangian becomes \begin{aligned} L(x,t,z_1,z_2,z_3,z_4) &= t - \langle (x,t), (z_1,z_2) \rangle - \langle (Ax-b,\delta), (z_3,z_4) \rangle \\ &= t - x^Tz_1 - tz_2 - (Ax-b)^T z_3 - \delta z_4. \end{aligned} Our Lagrange multipliers lie in the dual cones: $$(z_1,z_2)\in\mathcal{K}_{\ell_1}^* = \mathcal{K}_{\ell_\infty}, \quad (z_3,z_4)\in\mathcal{K}_{\ell_2}^* = \mathcal{K}_{\ell_2},$$ which means that $$\|z_1\|_\infty \leq z_2, \quad \|z_3\|_2 \leq z_4.$$

Since $L$ is linear in $x$ and $t$, the dual function $g(z_1,z_2,z_3,z_4) = \inf_{x,t} L(x,t,z_1,z_2,z_3,z_4)$ is finite only when the terms involving $x$ and $t$ vanish. This leads to the dual constraints $$1 - z_2 = 0, \qquad -z_1-A^T z_3 = 0.$$ So the dual problem becomes \begin{array}{ll} \text{maximize} & b^T z_3 - \delta z_4 \\ \text{subject to} & z_2 = 1 \\ & z_1 = - A^T z_3 \\ & \|z_1\|_\infty \leq z_2 \\ & \|z_3\|_2 \leq z_4 \end{array}

Eliminating $z_1$ and $z_2$, and noting that at optimality $z_4 = \|z_3\|_2$, yields \begin{array}{ll} \text{maximize} & b^T z_3 - \delta \|z_3\|_2 \\ \text{subject to} & \|-A^Tz_3\|_\infty \leq 1 \\ \end{array} So setting $y=z_3$, and dropping the superfluous negative sign in the dual constraint (since $\|-A^Tz_3\|_\infty = \|A^Tz_3\|_\infty$), gets you what you want.
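The dual-cone fact used above, $\mathcal{K}_{\ell_1}^* = \mathcal{K}_{\ell_\infty}$, can also be sanity-checked numerically. The plain-Python sketch below (size and seed are arbitrary) verifies the inclusion $\mathcal{K}_{\ell_\infty} \subseteq \mathcal{K}_{\ell_1}^*$: the pairing $\langle x, z_1\rangle + t z_2$ is nonnegative whenever $\|x\|_1 \le t$ and $\|z_1\|_\infty \le z_2$, which follows from $x^Tz_1 \ge -\|x\|_1\|z_1\|_\infty \ge -t z_2$:

```python
import random

def dot(u, v): return sum(a*b for a, b in zip(u, v))
def l1(v): return sum(abs(a) for a in v)
def linf(v): return max(abs(a) for a in v)

random.seed(2)
n = 5
for _ in range(500):
    # (x, t) in K_l1 (epigraph of the l1 norm): ||x||_1 <= t
    x = [random.gauss(0, 1) for _ in range(n)]
    t = l1(x) + random.random()
    # (z1, z2) in K_linf (epigraph of the linf norm): ||z1||_inf <= z2
    z1 = [random.gauss(0, 1) for _ in range(n)]
    z2 = linf(z1) + random.random()
    # pairing of a cone member with a dual-cone member is nonnegative
    assert dot(x, z1) + t * z2 >= -1e-12
```

The reverse inclusion (every dual-cone element satisfies $\|z_1\|_\infty \le z_2$) is the part worth proving by hand, by testing the pairing against extremal points $(\pm e_i, 1)$ of the $\ell_1$-epigraph.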