I'm trying to reconcile two different results I get when deriving the dual of the convex optimization problem:
$$\text{minimize }\|x\|_2$$ $$\text{subject to }Ax=b$$
where $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, and the domain $D$ of the problem is all of $\mathbb{R}^n$.
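To have concrete numbers to experiment with, here is a minimal sketch. It assumes NumPy is available and that $A$ has full row rank, so $Ax=b$ is feasible. The sizes and the random `A`, `b` are hypothetical illustration data. For the Euclidean norm, the primal minimizer is the least-norm solution $x^\star = A^T(AA^T)^{-1}b$, which `np.linalg.pinv` computes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                      # hypothetical sizes: fewer equations than unknowns
A = rng.standard_normal((m, n))  # illustrative data; full row rank with probability 1
b = rng.standard_normal(m)

# Least-norm solution of Ax = b: x* = A^T (A A^T)^{-1} b = pinv(A) b
x_star = np.linalg.pinv(A) @ b
p_star = np.linalg.norm(x_star)  # optimal primal value ||x*||_2

print(np.allclose(A @ x_star, b))  # feasibility check
print(p_star)
```

Any other feasible point differs from $x^\star$ by a null-space vector of $A$, which is orthogonal to $x^\star$ and can only increase the norm.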
We begin by writing the problem's Lagrangian:
$$L(x, \nu)=\|x\|_2+\nu^TAx - \nu^Tb$$
and the dual function: $$g(\nu)=\underset{x \in D}{\inf}L(x,\nu)=\underset{x \in D}{\inf}(\|x\|_2+\nu^TAx - \nu^Tb)$$
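As a quick sanity check on the Lagrangian, the sketch below (NumPy assumed; the helper `lagrangian` and the random data are hypothetical) evaluates $L(x,\nu)$ at a feasible point. There the term $\nu^T(Ax-b)$ vanishes for every $\nu$, so $L(x,\nu)=\|x\|_2$, which is why $g(\nu)\le p^\star$ (weak duality).

```python
import numpy as np

def lagrangian(x, nu, A, b):
    """L(x, nu) = ||x||_2 + nu^T (A x - b)."""
    return np.linalg.norm(x) + nu @ (A @ x - b)

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # illustrative data
b = rng.standard_normal(3)
x_feas = np.linalg.pinv(A) @ b    # a feasible point (A x = b)

checks = []
for _ in range(3):
    nu = rng.standard_normal(3)   # arbitrary multiplier
    # nu^T (A x - b) is zero at a feasible x, so L equals the objective there
    checks.append(np.isclose(lagrangian(x_feas, nu, A, b), np.linalg.norm(x_feas)))
print(checks)
```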
Solution #1: minimization via the zero-gradient condition
Standard practice at this point, where $L(x,\nu)$ is differentiable (that is, for $x \neq 0$), seems to be to find the optimal $x^*$ that satisfies: $$\nabla_xL(x,\nu)=\frac{x}{\|x\|_2} + A^T\nu=0$$
which gives: $$x^* = \left\{\begin{array}{ll} -A^T\nu & \|A^T\nu\|_2=1 \\ -\infty & \text{otherwise} \end{array}\right.$$
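The gradient formula can be checked numerically. This is a minimal sketch assuming NumPy; `lagrangian`, `grad_lagrangian`, and `numerical_grad` are hypothetical helpers, and the data are random. It compares the analytic gradient with central finite differences at a point $x \neq 0$, then confirms that $x = -A^T\nu$ is stationary once $\nu$ is rescaled so that $\|A^T\nu\|_2 = 1$.

```python
import numpy as np

def lagrangian(x, nu, A, b):
    return np.linalg.norm(x) + nu @ (A @ x - b)

def grad_lagrangian(x, nu, A):
    # Valid only for x != 0, where ||x||_2 is differentiable.
    # x / ||x||_2 always has unit norm, so the gradient can vanish
    # only if ||A^T nu||_2 = 1.
    return x / np.linalg.norm(x) + A.T @ nu

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
nu = rng.standard_normal(3)
x = rng.standard_normal(5)

fd = numerical_grad(lambda z: lagrangian(z, nu, A, b), x)
print(np.allclose(fd, grad_lagrangian(x, nu, A), atol=1e-6))

nu_unit = nu / np.linalg.norm(A.T @ nu)   # now ||A^T nu_unit||_2 = 1
# -A^T nu_unit is stationary; so is any positive multiple of it
print(np.allclose(grad_lagrangian(-A.T @ nu_unit, nu_unit, A), 0))
print(np.allclose(grad_lagrangian(-3 * A.T @ nu_unit, nu_unit, A), 0))
```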
Then, substituting $x^*$ into $g(\nu)$, we get one (incomplete) expression for the dual function (for $\|A^T\nu\|_2=1$): $$\begin{align} g(\nu) &= \|-A^T\nu\|_2 - \|A^T\nu\|^2_2-b^T\nu\\ &= (1-1)-b^T\nu\\ &= \left\{\begin{array}{ll} -b^T\nu & \|A^T\nu\|_2=1\\ -\infty & \text{otherwise} \end{array}\right. \end{align}$$
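The substitution step can be verified with random data. In this minimal sketch (NumPy assumed, data hypothetical), $\nu$ is rescaled so that $\|A^T\nu\|_2 = 1$; then $\|x^*\|_2 = 1$ and $\nu^TAx^* = -\|A^T\nu\|_2^2 = -1$, so the Lagrangian at $x^* = -A^T\nu$ should equal $-b^T\nu$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))      # illustrative data
b = rng.standard_normal(3)
nu = rng.standard_normal(3)
nu /= np.linalg.norm(A.T @ nu)       # rescale so that ||A^T nu||_2 = 1

x_star = -A.T @ nu                   # the stationary point used above
L_val = np.linalg.norm(x_star) + nu @ (A @ x_star) - nu @ b
print(np.isclose(L_val, -b @ nu))
```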
Solution #2: general minimization over x
Alternatively, we can minimize the Lagrangian directly over the whole domain of $x$, without relying on the gradient, to get a more complete picture of the dual function. Again we start with the definition of the dual function: $$g(\nu) = \underset{x\in D}{\inf}L(x,\nu)=\underset{x\in D}{\inf}(\|x\|_2+\nu^TAx - b^T\nu) $$
and we choose $x=-t\cdot A^T\nu$ with $t\ge0$ (the parametrization $x=t\cdot A^T\nu$ with $t\le0$ should work similarly) and substitute into $g(\nu)$, giving: $$ \begin{align} g(\nu) &= \underset{t\ge0}{\inf}(t\cdot(\|A^T\nu\|_2 - \|A^T\nu\|_2^2) - b^T\nu) \\ &= \left\{\begin{array}{ll} -b^T\nu & \|A^T\nu\|_2 \le 1 \\ -\infty & \text{otherwise} \end{array}\right. \end{align} $$
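To see the three regimes of the restricted infimum over $t$, the sketch below evaluates $L(-tA^T\nu,\nu)$ on a finite grid of $t \ge 0$ for $\|A^T\nu\|_2 \in \{0.5, 1, 2\}$. It assumes NumPy; the grid, the random data, and the helper `restricted_lagrangian` are hypothetical, and a finite grid can only suggest (not prove) unboundedness. The coefficient of $t$ is $\|A^T\nu\|_2 - \|A^T\nu\|_2^2$, which is positive, zero, and negative in the three cases.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))          # illustrative data
b = rng.standard_normal(3)
base = rng.standard_normal(3)
base /= np.linalg.norm(A.T @ base)       # ||A^T base||_2 = 1
ts = np.linspace(0.0, 100.0, 1001)       # finite grid standing in for t >= 0

def restricted_lagrangian(t, nu):
    """L(-t A^T nu, nu) = t (||A^T nu|| - ||A^T nu||^2) - b^T nu for t >= 0."""
    x = -t * (A.T @ nu)
    return np.linalg.norm(x) + nu @ (A @ x) - b @ nu

results = {}
for scale in (0.5, 1.0, 2.0):            # ||A^T nu||_2 equals this scale
    nu = scale * base
    vals = np.array([restricted_lagrangian(t, nu) for t in ts])
    results[scale] = vals
    # grid minimum, the value -b^T nu, and where on the grid the minimum occurs
    print(scale, vals.min(), -b @ nu, ts[vals.argmin()])
```

For scale 0.5 the minimum sits at $t=0$ with value $-b^T\nu$; for scale 1 the values are constant at $-b^T\nu$; for scale 2 they keep decreasing as $t$ grows.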
Question:
In Solution #2 we get an additional finite value, $-b^T\nu$, for the dual function when $\|A^T\nu\|_2 < 1$, where Solution #1 gives $-\infty$.
My questions are:
- What is the interpretation of the two different results when $\|A^T\nu\|_2<1$?
- Why is it necessary to restrict $t$ to be non-negative in the substitution step of Solution #2? How can we justify this constraint, and does it affect the answer to the first question?
I'm looking for some geometric intuition into this problem that can explain the apparent difference here. I appreciate the help!