
I'm trying to interpret the two results when deriving the dual of the convex optimization problem:

$$\text{minimize }\|x\|_2$$ $$\text{subject to }Ax=b$$

where we assume that $x \in \mathbb{R}^n$, so that the domain $D$ of the problem is all of $\mathbb{R}^n$.

We begin by writing the problem's Lagrangian:
$$L(x, \nu)=\|x\|_2+\nu^TAx - \nu^Tb$$

and the dual function: $$g(\nu)=\underset{x \in D}{\inf}L(x,\nu)=\underset{x \in D}{\inf}(\|x\|_2+\nu^TAx - \nu^Tb)$$
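Before going further, the two regimes of this dual function can be sanity-checked numerically. The sketch below uses illustrative random $A$ and $b$ (not tied to any particular problem instance): when $\|A^T\nu\|_2 < 1$ the Lagrangian is bounded below by $-b^T\nu$ (by Cauchy–Schwarz, $\nu^TAx \ge -\|A^T\nu\|_2\|x\|_2$), while when $\|A^T\nu\|_2 > 1$ it decreases without bound along the ray $x = -t\,A^T\nu$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))   # illustrative data
b = rng.standard_normal(2)

def lagrangian(x, nu):
    # L(x, nu) = ||x||_2 + nu^T (A x - b)
    return np.linalg.norm(x) + nu @ (A @ x - b)

# Case ||A^T nu||_2 < 1: L(x, nu) >= -b^T nu for every x
nu = rng.standard_normal(2)
nu = nu / (2 * np.linalg.norm(A.T @ nu))   # scale so ||A^T nu||_2 = 1/2
assert np.linalg.norm(A.T @ nu) < 1
for _ in range(1000):
    x = rng.standard_normal(4) * 10
    assert lagrangian(x, nu) >= -b @ nu - 1e-9

# Case ||A^T nu||_2 > 1: L is unbounded below along x = -t A^T nu
nu2 = 4 * nu                               # now ||A^T nu2||_2 = 2
vals = [lagrangian(-t * (A.T @ nu2), nu2) for t in (1, 10, 100)]
assert vals[0] > vals[1] > vals[2]         # strictly decreasing in t
```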


Solution #1: minimization by zero gradient condition

Standard practice at this point for differentiable $L(x,\nu)$ seems to be to find the optimal $x^*$ that satisfies: $$\nabla_xL(x,\nu)=\frac{x}{\|x\|_2} + A^T\nu=0$$

which gives: $$x^* = \left\{\begin{array}{ll} -A^T\nu & \|A^T\nu\|_2=1 \\ \text{no stationary point} & \text{otherwise} \end{array}\right.$$

Then, substituting $x^*$ into $g(\nu)$ we have one (incomplete) interpretation of the dual function (for $\|A^T\nu\|_2=1$): $$\begin{align} g(\nu) &= \|-A^T\nu\|_2 - \|A^T\nu\|^2_2-b^T\nu\\ &= (1-1)-b^T\nu\\ &= \left\{\begin{array}{ll} -b^T\nu & \|A^T\nu\|_2=1\\ -\infty & \text{otherwise} \end{array}\right. \end{align}$$
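This substitution can be verified numerically. In the sketch below, $A$ and $b$ are illustrative random data and $\nu$ is scaled so that $\|A^T\nu\|_2 = 1$; the gradient then vanishes at $x^* = -A^T\nu$ and the Lagrangian there equals $-b^T\nu$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))   # illustrative data
b = rng.standard_normal(2)

nu = rng.standard_normal(2)
nu /= np.linalg.norm(A.T @ nu)    # normalise so ||A^T nu||_2 = 1
x_star = -A.T @ nu

# zero-gradient condition: x/||x||_2 + A^T nu = 0 at x*
grad = x_star / np.linalg.norm(x_star) + A.T @ nu
# Lagrangian value at x*: ||x*||_2 + nu^T (A x* - b) = 1 - 1 - b^T nu
L_star = np.linalg.norm(x_star) + nu @ (A @ x_star - b)

assert np.allclose(grad, 0)
assert np.isclose(L_star, -b @ nu)
```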


Solution #2: general minimization over x

Alternatively, we can minimize the Lagrangian over the whole domain of $x$ to get a more complete picture of the dual function. Again we start with the definition of the dual function: $$g(\nu) = \underset{x\in D}{\inf}L(x,\nu)=\underset{x\in D}{\inf}(\|x\|_2+\nu^TAx - b^T\nu) $$

and we choose $x=-t\, A^T\nu$ with $t\ge0$ (though $x=t\, A^T\nu$ with $t\le0$ works similarly) and substitute into $g(\nu)$, giving: $$ \begin{align} g(\nu) &= \underset{t\ge0}{\inf}(t\cdot(\|A^T\nu\|_2 - \|A^T\nu\|_2^2) - b^T\nu) \\ &= \left\{\begin{array}{ll} -b^T\nu & \|A^T\nu\|_2 \le 1 \\ -\infty & \text{otherwise} \end{array}\right. \end{align} $$
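Along this ray the Lagrangian is linear in $t$ with slope $\|A^T\nu\|_2 - \|A^T\nu\|_2^2$, so for $\|A^T\nu\|_2 \le 1$ the infimum over $t \ge 0$ sits at $t=0$. A quick check of this (illustrative random $A$ and $b$, with $\nu$ scaled so $\|A^T\nu\|_2 = 1/2$):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 4))   # illustrative data
b = rng.standard_normal(2)
nu = rng.standard_normal(2)
nu /= 2 * np.linalg.norm(A.T @ nu)   # ||A^T nu||_2 = 1/2 < 1

def ray_value(t):
    # L(-t A^T nu, nu) for t >= 0
    x = -t * (A.T @ nu)
    return np.linalg.norm(x) + nu @ (A @ x - b)

s = np.linalg.norm(A.T @ nu)
ts = np.linspace(0, 10, 101)
vals = np.array([ray_value(t) for t in ts])

# closed form along the ray: t*(s - s^2) - b^T nu
assert np.allclose(vals, ts * (s - s**2) - b @ nu)
assert np.argmin(vals) == 0          # s <= 1: infimum attained at t = 0
```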


Question:

In solution #2 we get the additional finite expression, $-b^T\nu$, for the dual function when $\|A^T\nu\|_2\lt 1$.

My questions are:

  1. what is the interpretation of the different solutions when $\|A^T\nu\|_2\lt1$?
  2. why is it necessary to restrict $t$ to be non-negative in the substitution step for solution #2? How can we justify this constraint, and does this constraint influence the answer to question #1?

I'm looking for some geometric intuition into this problem that can explain the apparent difference here. I appreciate the help!

  • Why make it more difficult on yourself than necessary? Minimizing $\|x\|_2^2$ is equivalent, and it's a heck of a lot easier to work with. (2017-02-18)
  • Thanks @MichaelGrant, I'll keep that in mind. The example was presented to me as the unsquared $L_2$ norm, probably as a demonstration of the zero-gradient condition failing for this choice of non-differentiable objective, so I was just looking for some clarification of the understanding. (2017-02-18)
  • I would take Michael's suggestion here as it simplifies things. If not, for #1, note that the objective is not differentiable, so you have the condition $0 \in \partial_x L(x, \nu)$ (subdifferential). A little work shows that if $\|A^* \nu \| <1$ then $x^* = 0$. We have $\partial_x L(0, \nu) = \overline{B}(0,1) + \{ A^* \nu \}$. (2017-02-20)

1 Answer


This is an elaboration of my comment.

Michael's suggestion hits the nail on the head as it avoids the pitfalls encountered in both Solution #1 & Solution #2.

In both cases, the issue is that the objective is not differentiable everywhere, and so a little care is needed in the characterisation of a solution. To get some intuition, plot the function $x \mapsto |x|+bx$ for $|b|<1, |b|=1, |b|>1$ to see how the infimal value varies with $b$.
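That one-dimensional picture can be checked numerically as well; in this sketch the grid and the values of $b$ are illustrative. For $|b|<1$ the infimum of $|x|+bx$ is $0$ (at $x=0$); for $|b|=1$ it is still $0$ (attained on a whole ray); for $|b|>1$ the function decreases without bound.

```python
import numpy as np

def f(x, b):
    # the one-dimensional model function |x| + b*x
    return np.abs(x) + b * x

xs = np.linspace(-100, 100, 20001)
assert np.isclose(f(xs, 0.5).min(), 0.0)   # |b| < 1: inf is 0, at x = 0
assert np.isclose(f(xs, 1.0).min(), 0.0)   # |b| = 1: inf is 0, for all x <= 0
assert f(xs, 1.5).min() < -40              # |b| > 1: decreases without bound
```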

Regarding Question #1: let $\lambda(x) = L(x,\nu)$; we are looking for the value $\inf_x \lambda(x)$. Note that $\lambda$ is not differentiable everywhere, so we need to take care at the points where it is not. Since it is convex and defined everywhere, we can use the subdifferential instead.

Since $\lambda$ is convex, we have a minimiser (that is, some $\hat{x}$ such that $\lambda(\hat{x})=\inf_x \lambda(x)$) iff $0 \in \partial \lambda (x)$ for some $x$, where $\partial \lambda (x)$ is the subdifferential at $x$.

We have $\partial \lambda (x) = \partial \|\cdot\|_2 (x) + \{ A^T \nu \} = \begin{cases} \overline{B}(0,1)+ \{ A^T \nu \}, & x=0 \\ \{ { x \over \|x\|_2 } \} + \{ A^T \nu \}, & \text{otherwise}\end{cases}$. From this we see that there is no minimiser if $\|A^T \nu \|_2 > 1$, the minimiser is $\hat{x}=- A^T \nu$ if $\|A^T \nu \|_2 = 1$, and the minimiser is $\hat{x}=0$ if $\|A^T \nu \|_2 < 1$.
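The case $\|A^T \nu\|_2 < 1$ can be checked directly: $0 \in \overline{B}(0,1) + \{A^T\nu\}$ exactly when $\|A^T\nu\|_2 \le 1$, and then $\hat{x}=0$ is a global minimiser. A sketch with illustrative random $A$ and $b$, scaling $\nu$ so that $\|A^T\nu\|_2 = 1/2$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 4))   # illustrative data
b = rng.standard_normal(2)
nu = rng.standard_normal(2)
nu /= 2 * np.linalg.norm(A.T @ nu)   # ||A^T nu||_2 = 1/2 < 1

def lam(x):
    return np.linalg.norm(x) + nu @ (A @ x - b)

# 0 in subdifferential at x = 0  <=>  -A^T nu lies in the closed unit ball
assert np.linalg.norm(-(A.T @ nu)) <= 1
# hence x_hat = 0 is a global minimiser:
for _ in range(1000):
    x = rng.standard_normal(4) * rng.uniform(0, 100)
    assert lam(x) >= lam(np.zeros(4)) - 1e-9
```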

From this we get $\inf_x \lambda(x) = \begin{cases} -\nu^T b, & \|A^T \nu \|_2 \le 1\\ -\infty, & \text{otherwise}\end{cases}$, which matches Solution #2. (Note that the non existence of a minimiser does not imply that the function is unbounded below, but it is easy to explicitly choose an $x$ in this case that shows that the function is unbounded below.)

Regarding Question #2, as you have (mostly) observed, we have $\inf_x \lambda(x) = \inf_t \lambda(-t A^T \nu) $. Expanding, $\lambda(-t A^T \nu) = |t| \|A^T \nu\|_2 - t \|A^T \nu \|_2^2 - \nu^T b$ (note again the non-differentiability at $t=0$ because of the $|t|$ term).

We can either perform a case analysis or compute the subdifferential as above to obtain $\inf_t \lambda(-t A^T \nu) = \begin{cases} -\nu^T b, & \|A^T \nu \|_2 \le 1\\ -\infty, & \text{otherwise}\end{cases}$.
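The expansion along the ray can itself be verified numerically for $t$ of either sign (again with illustrative random data; note $t$ is unrestricted here, which is why no non-negativity constraint is needed):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 4))   # illustrative data
b = rng.standard_normal(2)
nu = rng.standard_normal(2)
s = np.linalg.norm(A.T @ nu)

def lam(x):
    return np.linalg.norm(x) + nu @ (A @ x - b)

# lambda(-t A^T nu) = |t| s - t s^2 - nu^T b, for t of either sign
for t in (-3.0, -1.0, 0.0, 2.0, 5.0):
    assert np.isclose(lam(-t * (A.T @ nu)), abs(t) * s - t * s**2 - nu @ b)
```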