
Notation for Gauss-Newton method

Non-linear least squares problems are often solved by the Levenberg-Marquardt algorithm, which can be viewed as a Gauss–Newton method using a trust region approach.

In a non-linear least squares problem, we want to find the minimum of a function $F(x)$ that is a sum of squares of non-linear functions, $F(x)=\sum_{i=1}^m[f_i(x)]^2$. The Gauss-Newton method computes a sequence of approximate solutions $x_k$ by linearizing the functions $f_i$ around $x_k$ and determining $x_{k+1}$ as the solution of the corresponding linear least squares problem. However, convergence is not guaranteed for general non-linear functions $f_i$.
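For reference, here is a minimal sketch of the generic iteration in Python (NumPy assumed; the function name `gauss_newton` and its parameters are my own, not from any particular library): at each step the linearized problem $\min_d \lVert J_k d + f(x_k)\rVert^2$ is solved and $x_{k+1}=x_k+d$.

```python
import numpy as np

def gauss_newton(f, jac, x0, tol=1e-10, max_iter=50):
    """Minimize sum_i f_i(x)^2 by the Gauss-Newton method.

    f   : residual function, maps R^n to R^m
    jac : Jacobian of f, maps R^n to an (m, n) matrix
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r, J = f(x), jac(x)
        # Linearize f(x + d) ~ f(x) + J d and solve the resulting
        # linear least squares problem min_d ||J d + r||^2.  lstsq is
        # SVD-based, so a rank-deficient J is handled gracefully.
        d, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x
```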

Context

I recently tried to answer a question for which it is possible to formulate the problem in such a way that the non-linear functions are of the form $f_i(x)=\max(0,y_i-b_i^{T}x)$. It's clear to me that this can be reformulated as a convex quadratic programming problem, and hence can be solved exactly by a quadratic programming solver (see the sketch below). However, I asked myself whether the much simpler Gauss-Newton method wouldn't also be able to solve this problem in a "finite number of iterations". I tried to find a counterexample, but haven't succeeded so far.
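For completeness, one way to make the QP reformulation explicit (the slack variables $s_i$ below are my own device for this sketch):

$$\min_{x,\,s}\ \sum_{i=1}^m s_i^2 \quad\text{subject to}\quad s_i \geq y_i - b_i^{T}x,\quad s_i \geq 0,\qquad i=1,\dots,m.$$

At the optimum each $s_i$ is driven down to $\max(0,y_i-b_i^{T}x)$, so the optimal value coincides with $\min_x F(x)$; since the objective is convex quadratic and the constraints are linear, this is a convex QP.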

Question

Is it possible to prove that the Gauss-Newton method finds the solution after a "finite number of iterations" when the non-linear functions are of the form $f_i(x)=\max(0,y_i-b_i^{T}x)$, in the case where there is a unique solution? (I assume here that degenerate linear least squares problems are handled appropriately, for example by using a singular value decomposition.) Is it even possible to give an explicit bound on the number of iterations?

EDIT / Update (I decided to add a bounty. Explaining my current status hopefully increases the chances of getting answers that bring me nearer to a solution.)

Considering the counterexample from the comments, it's a good idea to use $F_k(x)=\sum_{i \;:\; y_i-b_i^{T}x_k \geq 0}[y_i-b_i^{T}x]^2$ as the linear least squares problem to be solved in the $k$-th iteration. (Because in practice we will probably use an existing Levenberg-Marquardt implementation instead of the Gauss-Newton method, having a uniquely defined linearization that is independent of the optimization "history" is extremely desirable.)
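A sketch of this history-free step in Python (NumPy assumed; `active_set_step` and the variable names are hypothetical):

```python
import numpy as np

def active_set_step(B, y, x_k):
    """One step of the proposed iteration: minimize
    F_k(x) = sum_{i : y_i - b_i^T x_k >= 0} (y_i - b_i^T x)^2.

    B : (m, n) array whose rows are b_i^T
    y : (m,) array
    """
    active = (y - B @ x_k) >= 0
    if not active.any():
        return x_k  # F_k is identically zero; keep the current iterate
    # SVD-based solve, so a degenerate (rank-deficient) subproblem
    # yields the minimum-norm solution instead of failing.
    x_next, *_ = np.linalg.lstsq(B[active], y[active], rcond=None)
    return x_next
```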

My current idea for how to "prove" that the algorithm "finds" the unique solution (assuming a unique solution exists) goes as follows: Let $x_*$ be the unique solution and define $F_*(x)=\sum_{i \;:\; y_i-b_i^{T}x_* \geq 0}[y_i-b_i^{T}x]^2.$ For $x_k=x_*$ we have $x_{k+1}=x_*$. My current hope is that for $k>1$ and $x_k\neq x_*$ we have $F_*(x_{k+1}) < F_*(x_k)$. If this is really true, it's hopefully possible to prove it. If it's not true, it's hopefully possible to find an explicit counterexample.
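One way to search for an explicit counterexample numerically; this is only an experiment sketch on random instances (NumPy assumed, all names mine), not a proof, and it assumes the random instances generically have a unique solution:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(B, y, x):
    """History-free Gauss-Newton step over {i : y_i - b_i^T x >= 0}."""
    act = (y - B @ x) >= 0
    if not act.any():
        return x
    x_next, *_ = np.linalg.lstsq(B[act], y[act], rcond=None)
    return x_next

def F_star(B, y, x, x_star):
    """Objective F_* with the active set frozen at x_star."""
    act = (y - B @ x_star) >= 0
    r = y[act] - B[act] @ x
    return float(r @ r)

for trial in range(1000):
    B = rng.standard_normal((12, 3))
    y = rng.standard_normal(12)
    # Run the iteration to a (candidate) fixed point x_*.
    x_star = rng.standard_normal(3)
    for _ in range(200):
        x_new = step(B, y, x_star)
        if np.allclose(x_new, x_star):
            break
        x_star = x_new
    # Restart from a fresh point and test F_*(x_{k+1}) < F_*(x_k).
    x = rng.standard_normal(3)
    for _ in range(50):
        x_next = step(B, y, x)
        if np.allclose(x_next, x):
            break
        if F_star(B, y, x_next, x_star) >= F_star(B, y, x, x_star) - 1e-12:
            print(f"possible counterexample at trial {trial}")
        x = x_next
```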

  • @Dominique The challenge is to rule out the possibility that the method might cycle between a finite number of active sets forever instead of converging to the solution. But if you can rule out this possibility, then you will have proved that the method finds the solution in a finite number of steps. – 2012-05-25

0 Answers