
For the following convex minimization problem:

\begin{equation} \begin{array}{rl} \textrm{minimize} & f(x)\\ \textrm{subject to} & Ax=b, \end{array} \end{equation}

where $f$ is differentiable, the optimality conditions are: $$Ax^*=b, \qquad \nabla f(x^*)+A^T\nu^*=0.$$

In Boyd & Vandenberghe's "Convex Optimization" (p. 521), $Ax^*=b$ are called the primal feasibility equations, and $\nabla f(x^*)+A^T\nu^*=0$ are called the dual feasibility equations. The naming of the former makes perfect sense to me, since $Ax=b$ is a set of equations that defines the feasible set of the primal problem (within the domain).

However, I'm not so sure why $\nabla f(x^*)+A^T\nu^*=0$ are called the "dual feasibility equations"? Isn't the dual problem an unconstrained concave maximization:

$$\sup_{\nu} \left[\inf_{x\in \mathcal D} f(x)+\nu^T(Ax-b)\right]?$$

Is it because we view the attainability of the infimum within the square brackets for a given $\nu$ as the feasibility condition for the dual problem? (This just defines the domain of the dual problem, doesn't it?)
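As a concrete sanity check (a toy instance of my own, not from B&V): for a quadratic objective $f(x)=\tfrac12\|x\|^2$ we have $\nabla f(x)=x$, so the two optimality conditions above form a single linear KKT system that can be solved and verified numerically.

```python
import numpy as np

# Hypothetical small instance: minimize f(x) = 0.5*||x||^2  s.t.  Ax = b.
# Here grad f(x) = x, so the optimality conditions read
#   A x* = b            (primal feasibility)
#   x* + A^T nu* = 0    (dual feasibility / stationarity)
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)

n = A.shape[1]
# Stack the two conditions into one linear KKT system:
# [ I   A^T ] [ x  ]   [ 0 ]
# [ A    0  ] [ nu ] = [ b ]
KKT = np.block([[np.eye(n), A.T], [A, np.zeros((2, 2))]])
rhs = np.concatenate([np.zeros(n), b])
sol = np.linalg.solve(KKT, rhs)
x, nu = sol[:n], sol[n:]

assert np.allclose(A @ x, b)         # primal feasibility holds
assert np.allclose(x + A.T @ nu, 0)  # dual feasibility holds
```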

  • Your reasoning seems to be correct. (2017-01-16)
  • It is not about attainability, but about value. If the infimum is $-\infty$, we say the corresponding $\nu$ is infeasible. (2017-01-16)
  • @LinAlg Thanks for pointing out this subtlety! Indeed, $\nabla f(x^*)+A^T\nu^*=0$ does not define the domain or feasible set of the dual problem. It is only a _sufficient_ condition for a $\nu$ to be in the domain (and hence feasible). In light of this, I wonder if there's some better reason why we call $\nabla f(x^*)+A^T\nu^*=0$ the dual feasibility equations. Or is it simply because it's a sufficient condition for feasibility of the dual problem? (2017-01-16)
  • I'm not sure I understand your last statement. If $\nabla f(x)+A^T\nu\neq 0$, then the inner infimum has not been attained, so that's not the minimizing value of $x$. And in practice, it is often the case that the infimum is $-\infty$ for many particular values of $\nu$, which means those values of $\nu$ are not in the domain of the dual function. So that condition *does* define the domain of the dual function, and therefore its feasible set. (2017-01-16)
  • I was thinking about the dual of a problem with $f(x) = 1/x$, $x\geq 1$, where the infimum is not attained. However, in the context of optimality conditions for the primal, I can see why these are called dual feasibility equations. (2017-01-16)
  • @MichaelGrant Let me try... Here's what I thought: suppose the domain of the _primal_ problem is an open set, and $\nabla f(x)+A^T\nu \ne 0$ on that domain for some $\nu$. Then the inner infimum may exist but not be attained. So that $\nu$ is in the domain of the dual problem (hence feasible) even though $\nabla f(x)+A^T\nu \ne 0$. Am I mistaken somewhere? (2017-01-16)
  • If $\nabla f(x) + A^T \nu \neq 0$ you have _no way of knowing_ whether or not $\nu$ is in the domain. (2017-01-16)
  • $\nu$ is feasible _only if_ $\nabla f(x)+A^T\nu=0$ for _some_ $x\in\mathcal{D}$. (2017-01-16)
  • Suppose $\mathcal{D}=\mathbb{R}^n_+$ and $f(x)=f^Tx$ on that domain. Then this condition reduces to $f+A^T\nu=0$, a standard dual feasibility criterion. (2017-01-16)
  • @MichaelGrant Thanks for the clarification. But I guess I still missed something. For simplicity, let $\mathcal D=\mathbb R_{++}^2$ and $f(x)=x_1+x_2$, subject to $x_1=1$. Then the dual function is $g(\nu)=\inf_x x_1+x_2+\nu(x_1-1).$ In this case, $g(1)=-1$ and $\nu=1$ is in the domain of the dual problem. But the infimum is not attained, and $\nabla f(x)+A^T\nu=[2\:1]^T.$ I'd appreciate it if you'd point out where I went wrong! (2017-01-16)
  • OK, I think I see what you're saying here. That one's a little more difficult because we're not enforcing the primal domain constraint with a Lagrange multiplier. But I also think that we're talking about a case that is not being considered by Boyd & Vandenberghe. (2017-01-16)
  • Suppose $f(x)$ wasn't differentiable; then clearly, the conditions laid out by Boyd & Vandenberghe would have to be modified. So implicit in their discussion are some assumptions on $f$. (2017-01-16)
  • I would argue that $f(x)$ as we have defined it in this example is not differentiable in the sense that B&V need it to be for their assumption. Yes, its domain is open, and yes, it is differentiable on that domain, but in fact it's not continuous in an extended-real sense. (2017-01-16)

1 Answer

Based on our discussions, I would say this. Using an extended-real convention, the domain $\mathcal{D}$ is absorbed by $f$ itself, and the Lagrange dual is
$$\begin{array}{ll} \text{maximize} & g(\nu) \triangleq \inf_x L(x,\nu) = \inf_x f(x) + \nu^T ( Ax - b ). \end{array}$$
There is no explicit dual constraint for this problem, because the Lagrange multiplier for an equality constraint is itself unconstrained. In contrast, for an inequality constraint $Ax\leq b$, the dual problem would have an explicit constraint $\nu \geq 0$.
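To make this concrete, here is a worked example of my own (not from B&V) with $f(x)=\tfrac12 x^Tx$, where the inner infimum is attained for every $\nu$ and so no implicit constraint arises:

```latex
% Example (my choice of f): f(x) = \tfrac12 x^T x, so \nabla f(x) = x.
% The Lagrangian L(x,\nu) is minimized where \nabla_x L = x + A^T\nu = 0,
% i.e. at x = -A^T\nu, which gives
\begin{align*}
g(\nu) &= \inf_x \tfrac12 x^T x + \nu^T(Ax-b) \\
       &= -\tfrac12 \nu^T A A^T \nu - b^T\nu,
\end{align*}
% finite and concave for every \nu, so \mathop{\textrm{dom}} g = \mathbb{R}^m.
```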

However, we know that in practice, there are often values of $\nu$ such that $\inf_x L(x,\nu) = -\infty$. These serve as an implicit constraint on $\nu$, reflected in the domain of $g$. It is common practice to identify those implicit constraints and make them explicit. (Indeed, this is necessary if one does not wish to adopt an extended-real convention.) So the dual becomes
$$\begin{array}{ll} \text{maximize} & g(\nu) \triangleq \inf_x f(x) + \nu^T ( Ax - b ) \\ \text{subject to} & -\infty < \inf_x f(x) + \nu^T Ax. \end{array}$$
This is true even if $f$ is not differentiable. If $f$ is differentiable on all of $\mathbb{R}^n$, then this is equivalent to
$$\begin{array}{ll} \text{maximize} & g(\nu) \triangleq \inf_x f(x) + \nu^T ( Ax - b ) \\ \text{subject to} & \exists x:~ \nabla f(x) + A^T \nu = 0. \end{array}$$
[EDIT: as the OP points out, there are cases where it is not truly equivalent; rather, it is a sufficient condition.] This will also hold if $f(x)$ is differentiable in an extended-real sense. That is, if:

  • $\mathop{\textrm{dom}} f$ is open;
  • $f$ is differentiable on its domain;
  • $f$ serves as a barrier for its domain; that is, $f(x)\rightarrow +\infty$ as $x\rightarrow\mathop{\textrm{Bd}} \mathop{\textrm{dom}} f$.

Note that this specifically excludes cases where the domain of $f$ is "artificially" constrained, like the case we considered in the comments.

Anyway, subject to these assumptions, it is reasonable to call $\nabla f(x) +A^T\nu=0$ a "dual feasibility constraint". [EDIT: I still maintain that this is reasonable in practice, despite the exceptions found by the OP.] It may seem a bit restrictive to require this assumption at first. But I would suggest that if artificial domain constraints are replaced with explicit equality and inequality constraints instead, these restrictions are somewhat light.
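The linear case from the comments can be checked numerically. With $f(x)=c^Tx$ on $\mathcal{D}=\mathbb{R}^n_+$ (my own small numbers below), $g(\nu)=\inf_{x\geq 0}(c+A^T\nu)^Tx-\nu^Tb$ equals $-\nu^Tb$ when $c+A^T\nu\geq 0$ and $-\infty$ otherwise, so an LP solver can report directly which $\nu$ are implicitly (in)feasible:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: minimize c^T x  s.t.  Ax = b,  with dom f = {x >= 0}.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def g(nu):
    # g(nu) = inf_{x >= 0} (c + A^T nu)^T x - nu^T b.
    # The bounds x >= 0 encode dom f; no other constraints.
    res = linprog(c + A.T @ nu, bounds=[(0, None)] * len(c))
    if res.status == 3:  # LP unbounded below: this nu is dual-infeasible
        return -np.inf
    return res.fun - nu @ b

# c + A^T nu = [0, 1] >= 0 for nu = -1, so g is finite there
assert np.isclose(g(np.array([-1.0])), 1.0)
# c + A^T nu = [-2, -1] has a negative entry for nu = -3, so g = -inf
assert g(np.array([-3.0])) == -np.inf
```

This mirrors the point in the answer: the explicit constraint $c + A^T\nu \geq 0$ is just the finiteness condition $-\infty < \inf_x L(x,\nu)$ made concrete.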

  • Thanks a lot for writing up an answer for this question. However, I'm still stuck at one point: it appears that the problem isn't just the artificial constraint we impose on the domain of $f$. For example, consider minimizing $f(x)=e^{x_1}+e^{x_2}-x_1$ subject to $x_1=1$. $f(x)$ is strictly convex and differentiable on all of $\mathbb R^2$. Its dual function is $g(\nu)=\inf_{x} e^{x_1}+e^{x_2}-x_1+\nu(x_1-1)$. So again $g(1)=-1$ and $\nu=1$ is in the domain of $g$. But the infimum is not attained, and $\nabla f(x)+A^T\nu=[e^{x_1}+\nu-1, e^{x_2}]^T\ne 0$. (2017-01-17)
  • I'm not sure if I follow you. For $\nu=1$, we have $g(1)=\inf_{x} e^{x_1}+e^{x_2}-1=-1$, don't we? So $\nu=1$ is in the domain of $g$. But for $\nu=1$, $\nabla f(x)+A^T\nu=[e^{x_1}, e^{x_2}]^T$ is never zero on all of $\mathbb R^2$, is it? (2017-01-17)
  • OK. We have $g(\nu) = \inf_x e^{x_1} + e^{x_2} - x_1 + \nu( x_1 - 1)$. Indeed, for $\nu=1$, we have $g(\nu)=\inf_x e^{x_1} + e^{x_2} - 1$ as you indicated, so $g(\nu)=-1$. This is certainly an interesting wrinkle, and yet it still satisfies the condition that $-\infty < \inf_x L(x,\nu)$. (2017-01-17)
  • So it does look like there's a technical detail remaining that I've missed in my answer above. I'll edit to state that it's not a perfect equivalence. (2017-01-17)
  • Remember, the dual function and its effect on the dual feasible set hold _even if $f$ is not differentiable_. And as a result, _we already knew_ that the gradient condition is sufficient, but not necessary. So I think we're spending too much time on exceptions like these. I think it's fair to say that this constitutes an error in Boyd & Vandenberghe, though. (2017-01-17)
  • I appreciate your time and attention on this minute question of mine, Michael. Just one last comment: on second thought, even if $\nabla f(x)+A^T\nu=0$ is only a sufficient condition for the feasibility of the dual problem, it doesn't seem unreasonable to call it the "dual feasibility equations." After all, $Ax=b$ is also just a necessary condition for the feasibility of the primal problem, since the primal feasible set is $\mathcal D \cap \{x\mid Ax=b\}.$ (2017-01-17)
  • You're welcome. I tell you, I need to take a break though! I'm not reading your comments very well. Anyway, be well! (2017-01-17)