1

I have the following optimization problem: $$\min_{a,b,c} f(a,b,c)$$ where $f(a,b,c)$ is quadratic with respect to each of the variables $a,b,c$ separately. More precisely, these variables are vectors coupled to each other through a product of the form $f(a,b,c)=f(a*K*b*c)$, where $K$ is a matrix.

So I decided to solve it in an alternating fashion: I set up a loop of three separate optimizations, each minimizing the objective with respect to one variable, for example $a$, then $b$, then $c$. I repeat the whole loop until there is no further progress in the cost function.

Since each separate subproblem is quadratic, it is convex, and with the other two variables held fixed each inner optimization converges to its minimum.

The whole alternating loop converges and decreases $f(a,b,c)$. But I compared this to the case $a=1$, which amounts to omitting the effect of $a$, so that $f(a,b,c)$ becomes $f(b,c)$, and solved that in the same fashion as above (now with two separate optimizations in the loop). The restricted problem converges to a lower value than the one that also uses $a$: $$f(b^{*},c^{*}) < f(a^{*},b^{*},c^{*}),$$ with $a^{*}\neq 1$. I do not understand why this should happen, since all the individual optimization problems are convex and quadratic!
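For concreteness, here is a sketch of the alternating scheme I mean. The exact form of $f$ is not spelled out above, so the objective below is only a toy stand-in: $f(a,b,c)=\lVert Y - c\,a\,(Kb)^{T}\rVert_F^2$ with vectors $a,b$, a scalar $c$, and a fixed matrix $K$ (all hypothetical). It is quadratic in each block, so each step of the loop has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 5
Y = rng.standard_normal((m, n))   # data to fit (generic, so the residual stays positive)
K = rng.standard_normal((n, n))   # fixed coupling matrix

def loss(a, b, c):
    return np.sum((Y - c * np.outer(a, K @ b)) ** 2)

a = rng.standard_normal(m)
b = rng.standard_normal(n)
c = 1.0

history = [loss(a, b, c)]
for _ in range(50):
    # a-step: the residual is linear in a, so the subproblem has a closed form
    u = c * (K @ b)
    a = (Y @ u) / (u @ u)
    # b-step: minimize over t = K b in closed form, then recover b = K^{-1} t
    t = (Y.T @ a) / (c * (a @ a))
    b = np.linalg.solve(K, t)     # assumes K is invertible; use lstsq otherwise
    # c-step: scalar least squares
    M = np.outer(a, K @ b)
    c = np.sum(Y * M) / np.sum(M * M)
    history.append(loss(a, b, c))

# every step solves its quadratic subproblem exactly, so the loss never increases
assert all(h2 <= h1 + 1e-8 for h1, h2 in zip(history, history[1:]))
```

Since every step solves its subproblem exactly, the loss is monotonically non-increasing; the question is why the limit can still be a worse point than the restricted problem's.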

  • 0
    Do you have any constraints in place? Is $f(a,b,c) = g_1(a) + g_2(b) + g_3(c)$, or is it different? Also keep in mind that the product of two convex functions need not be convex itself (e.g. when negative values occur). (2017-01-12)
  • 0
    The variables are vectors which are coupled to each other like $f(a,b,c)=f(a*K*b*c)$, and $K$ is a matrix. (2017-01-12)
  • 1
    The type of algorithm you are using is commonly called "block coordinate descent". See https://en.wikipedia.org/wiki/Coordinate_descent, for example, on why it may fail even for convex problems. There are also some nice results from the late Paul Tseng that can be used to check whether coordinate descent reaches a global minimum for convex problems. (2017-01-13)
  • 0
    @user23658 Then what is the difference between "block coordinate descent" and "alternating optimization"? (2017-01-13)
  • 1
    As far as I can tell, coordinate descent need not alternate in the same way that "alternating optimization" does (for example, the variable to update can be chosen at random). See leitang.net/presentation/BCD-convergence.pdf, where they say "in this work, coordinate descent actually refers to alternating optimization (AO). Each step finds the exact minimizer". So AO seems to be a special case of CD. (2017-01-13)
  • 0
    Point being, searching for coordinate descent and reading about the conditions under which it converges to a global minimum would give you your answer. (2017-01-13)
  • 0
    Please accept the answer or indicate why it is not satisfactory. (2017-01-30)

1 Answer

1

Even for convex functions, optimizing one variable at a time need not reach a global optimum. Consider $f(x,y) = \max\{x,y\}$ starting at $(1,1)$: changing either variable individually cannot reduce the objective, yet decreasing both variables together does.
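A quick numerical check of the stall (pure Python; a grid stands in for the exact one-dimensional minimizations):

```python
# Coordinate-wise exact minimization stalls at (1, 1) for f(x, y) = max(x, y):
# along either axis the best achievable value is still 1, yet moving both
# coordinates together (e.g. to (0, 0)) reduces f.
f = lambda x, y: max(x, y)

xs = [i / 100 - 5 for i in range(1001)]          # grid on [-5, 5]
best_over_x = min(f(x, 1.0) for x in xs)         # y fixed at 1
best_over_y = min(f(1.0, y) for y in xs)         # x fixed at 1

assert best_over_x == 1.0 and best_over_y == 1.0   # no single-coordinate progress
assert f(0.0, 0.0) < f(1.0, 1.0)                   # a joint move does make progress
```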

  • 0
    So what is a good strategy for solving problems like this? From the definition of $f$ I know there are values of the triple $(a,b,c)$ that give a smaller value of $f$, but the above optimization strategy cannot find them unless I choose an initial point close to some manually known optimal point! (2017-01-12)
  • 0
    Maybe you can define a domain that contains the optimal solution, and maybe the function is Lipschitz on that domain. Global optimization solvers can solve such problems to proven optimality. (2017-01-12)
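Certified global solvers are one option; a much cheaper heuristic in the same spirit is to restart the alternating scheme from random points inside the bounded domain and keep the best result. A minimal sketch on the $\max\{x,y\}$ example from the answer, using grid-based coordinate steps that move only on strict improvement (the grid and bounds are illustrative choices, not part of the original problem):

```python
import random

def f(x, y):
    return max(x, y)

GRID = [i / 10 - 5 for i in range(101)]   # search grid on the box [-5, 5]

def coord_descent(x, y, sweeps=20):
    """Cyclic coordinate descent; a coordinate moves only on strict improvement."""
    for _ in range(sweeps):
        bx = min(GRID, key=lambda t: f(t, y))
        if f(bx, y) < f(x, y):
            x = bx
        by = min(GRID, key=lambda t: f(x, t))
        if f(x, by) < f(x, y):
            y = by
    return x, y

# started on the diagonal, no single coordinate gives a strict improvement:
assert coord_descent(1.0, 1.0) == (1.0, 1.0)

# multistart from random points escapes the stall and finds the box minimum
random.seed(0)
best = min(f(*coord_descent(random.uniform(-5, 5), random.uniform(-5, 5)))
           for _ in range(10))
assert best == -5.0
```

This does not give the proven optimality the comment mentions, but it illustrates why restarting from several initial points helps when coordinate-wise updates can stall.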