
I was reading Boyd & Vandenberghe's "Convex Optimization" (Ch10), where the following equality constrained convex program is considered (p526):

\begin{equation} \begin{array}{rl} \textrm{minimize} & f(x)\\ \textrm{subject to} & Ax=b, \end{array} \end{equation}

and Newton's method is employed to solve the optimality conditions, $$Ax=b, \qquad \nabla f(x)+A^Tw=0.$$

More specifically, given a feasible point $x$ (which satisfies $Ax=b$), we find the next feasible point, $x+\Delta x_{nt}$, by solving

$$\left[\begin{array}{cc}\nabla^2 f(x) & A^T\\A & 0\end{array}\right] \left[\begin{array}{c}\Delta x_{nt} \\ w\end{array}\right] = \left[\begin{array}{c}-\nabla f(x) \\ 0\end{array}\right].$$

This is straightforward if the so-called KKT matrix, $\left[\begin{array}{cc}\nabla^2 f(x) & A^T\\A & 0\end{array}\right]$, is nonsingular (invertible). And the book states that the Newton step, $\Delta x_{nt}$, is defined only at points for which the KKT matrix is nonsingular.
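For concreteness, here is a minimal numpy sketch of one such Newton step (the quadratic toy objective and the helper name `newton_step` are my own illustrative choices, not from the book):

```python
# One Newton step for: minimize f(x) subject to Ax = b, by solving the KKT
# system above. np.linalg.solve raises LinAlgError when the KKT matrix is singular.
import numpy as np

def newton_step(hess, grad, A):
    n, m = hess.shape[0], A.shape[0]
    K = np.block([[hess, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-grad, np.zeros(m)])
    sol = np.linalg.solve(K, rhs)      # fails if K is singular
    return sol[:n], sol[n:]            # (Newton step, multiplier w)

# Toy problem: f(x) = 0.5*||x||^2 with constraint x1 + x2 = 1.
Q = np.eye(2)                          # Hessian of f
A = np.array([[1.0, 1.0]])
x = np.array([1.0, 0.0])               # feasible starting point: Ax = [1]
dx, w = newton_step(Q, Q @ x, A)       # gradient of f at x is Q @ x here
print(x + dx)                          # -> [0.5 0.5]
```

Since the toy objective is quadratic, Newton's method converges in one step, so `x + dx` is already the exact constrained minimizer $(0.5, 0.5)$.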

So what do we do when the KKT matrix happens to be singular during the iterations? Conceivably, if the above system of equations is consistent, we can just pick one solution as $\Delta x_{nt}$ and continue with the iterations, right?

However, if the above system of equations is not even consistent, it's not clear to me what a good strategy would be. Are there better ways than picking another feasible initial point and restarting Newton's method from scratch, hoping to end up with a sequence of nonsingular KKT matrices that leads to an optimal (acceptable) solution?

Does it make sense to use the projection of the negative gradient $-\nabla f(x)$ onto the null space of matrix $A$ as $\Delta x_{nt}$? (This ensures that $x+\Delta x_{nt}$ remains feasible, but it's not clear to me whether it's still a descent direction.)
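In case it helps, here is a numpy sketch of that fallback (the helper name is hypothetical). Note that if $Z$ has orthonormal columns spanning $\mathcal{N}(A)$, then $d = -ZZ^T\nabla f(x)$ is an orthogonal projection, so $\nabla f(x)^T d = -\|d\|^2 \le 0$: it is a descent direction whenever the projection is nonzero.

```python
# Fallback direction: project -grad f(x) onto null(A), so x + t*d stays feasible.
import numpy as np

def nullspace_projected_gradient(grad, A, tol=1e-12):
    _, s, Vt = np.linalg.svd(A)        # full SVD; trailing rows of Vt span null(A)
    rank = int(np.sum(s > tol))
    Z = Vt[rank:].T                    # columns: orthonormal basis of null(A)
    return -Z @ (Z.T @ grad)           # orthogonal projection of -grad onto null(A)

A = np.array([[1.0, 1.0]])
g = np.array([1.0, 0.0])               # gradient at the current feasible point
d = nullspace_projected_gradient(g, A)
print(A @ d)   # ~0: moving along d preserves Ax = b
print(g @ d)   # <= 0: descent direction unless the projection vanishes
```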

I'd appreciate any pointers or comments. Thanks a lot!

  • 1
    I would use [least squares solution](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse#Linear_least-squares) and hope for the best. (2017-01-11)
  • 0
    @zaq Thanks! Yeah, that's worth trying. However, one problem with the LS solution is that it may not be feasible. BTW, the orthogonal projection of the negative gradient $-\nabla f(x)$ onto the null space is also an LS solution. (2017-01-12)
  • 1
    You can force feasibility and do LS on the linearized optimality equations. In other words, you can minimize $\| Bx-c \|$ subject to $Ax=b$. (2017-01-12)
  • 1
    That said, if I found a point where the KKT matrix was singular, I would probably just take a small random perturbation of the point I'm already at. (It should not be *extremely* small; you want the KKT matrix to have a good condition number, not just be nonsingular.) Unless the underlying problem is quite weird, this should fix the issue. (2017-01-12)
  • 0
    @Ian Thanks for the suggestions! Would you maintain feasibility with the perturbation? Or allow it to step out of the feasible set temporarily, and then bring it back in later iterations? (2017-01-12)
  • 1
    Off the top of my head, I would definitely go with the minimum-norm solution to the system, which would involve computing the Moore-Penrose pseudoinverse of the coefficient matrix. (2017-01-12)
  • 0
    @syeh_106 It doesn't much matter. This is a corner case anyway, so it's not much of a programming issue to put in a special case to force you back to the feasible set after the perturbation. Or you can try staying in the feasible set (taking the perturbation to be a small random element of the null space of $A$), though that is a bit less likely to fix the singularity. (That said, have you actually encountered this singularity in a real situation?) (2017-01-12)
  • 0
    @Ian Thanks for the clarifications. And, not really, I am not faced with a singularity problem right now. I was just reading the book, and wondered what I should do if singularity occurred. (2017-01-12)
  • 1
    The same issue can arise with Newton's method for unconstrained optimization. So as a first step, we should understand what to do in that case. (2017-01-12)
  • 0
    @littleO Indeed. But it appears that the problem is somewhat less severe for unconstrained optimization. For example, the book assumes strong convexity in Ch9 for unconstrained optimization, which precludes singularity altogether, since the Hessian is p.d. But even with strong convexity, the KKT matrix here can still be singular. (2017-01-12)
  • 0
    @littleO Nonetheless, for the unconstrained problem, would you suggest something different from the comments here? It appears that the suggestions here can be applied to the unconstrained problem as well. (2017-01-12)
  • 1
    In the unconstrained case, one option is to take a gradient descent step when the Newton step fails. Is there an analogous option in the constrained case? (2017-01-12)
  • 0
    @littleO Yeah, I did think about that, and added it in the last paragraph of my question. However, to stay in the feasible set, I thought I'd project the negative gradient onto the null space of $A$. Would you do something else? (2017-01-12)
  • 1
    The projection of the negative gradient will work just fine---as a gradient step. You won't get the nice convergence properties of a Newton step. (2017-01-12)
  • 0
    @MichaelGrant Thanks, Michael. I was thinking of using the projection of $-\nabla f(x)$ just for the current step, where the KKT system of equations is inconsistent. In subsequent iterations, I would use Newton's method whenever the KKT system is consistent, which I hope would be the usual case. (2017-01-12)
  • 3
    I'd say you're giving up too much by doing that. A better bet is to add a small multiple of the identity matrix to $\nabla^2 f(x)$ to make it positive definite. If $A$ has full row rank this will give you a nonsingular system, and only a small degradation from the Newton system. In fact, this approach will have the nice result of preferring solutions with small norm $\|\Delta x_{nt}\|$. (2017-01-12)
  • 0
    If $A$ doesn't have full row rank, you can fix that before you start your iterations by using a rank-revealing factorization to find an equivalent $\bar{A}x=\bar{b}$. So it's really only $\nabla^2 f(x)$ you need to worry about. (2017-01-12)
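A numpy sketch of the regularization suggested in the last comments (the toy problem, helper name, and `eps` value are my own choices). In this example the plain KKT system is inconsistent, yet the regularized step exists, stays feasible, and is a descent direction; in practice a line search would temper its length:

```python
# Sketch of the suggestion above: replace the Hessian H with H + eps*I so the
# KKT matrix is nonsingular whenever A has full row rank.
import numpy as np

def regularized_newton_step(hess, grad, A, eps=1e-2):
    n, m = hess.shape[0], A.shape[0]
    H = hess + eps * np.eye(n)                       # regularized Hessian
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-grad, np.zeros(m)]))
    return sol[:n]                                   # discard the multiplier w

# Toy case where the plain KKT system is inconsistent: H vanishes on null(A).
H = np.diag([1.0, 0.0])          # singular Hessian
A = np.array([[1.0, 0.0]])       # null(A) = span{(0, 1)}
g = np.array([1.0, 0.5])         # gradient at the current (feasible) point
dx = regularized_newton_step(H, g, A)
print(A @ dx)   # ~0: the step keeps x + dx feasible
print(g @ dx)   # negative: descent direction
```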
