
In optimization, one sometimes encounters functions or difference equations defined in terms of $\text{argmin}$ or $\text{argmax}$, for example

$p: X \to X, \quad x \mapsto \text{argmin}_{y \in X} \left( \phi(y) + \|x-y\|^2 \right)$

I am puzzled by the fact that we need to solve an optimization problem just to evaluate these functions.

For example, consider $p: X \to X, \quad x \mapsto \text{argmin}_{y \in X} \left( \phi(y) + \|x-y\|^2 \right)$.

On the one hand, the interpretation of this function is clear: given a vector $x \in X$, we map it to a vector $y$ that minimizes $\phi(y) + \|x-y\|^2$.

But on the other hand, it is not obvious that this map is even well defined, since it is well known in optimization that an $\text{argmin}$ problem can have one, multiple, or no solutions. So for a sequence $x_k$, $p(x_k)$ could yield something like $y_1, \{y_{21},y_{22}\}, \text{no solution}, y_4, \dots$

Further, it is not at all clear how this minimizer $y$ is found; the algorithm is usually not stated as part of the definition. I would be much happier if the function were defined as "$p: X \to X, x \mapsto \text{argmin}_{y \in X} \phi(y) + \|x-y\|^2$, where we solve for $y$ using ... algorithm, which guarantees a unique minimizer."

How do people in optimization deal with this ambiguity?

  • The prox mapping you defined above is usually considered on the space of functions that are *convex and lower semi-continuous* on a Hilbert space $\mathcal H$. This function space is usually denoted $\Gamma_0(\mathcal H)$. For $f \in \Gamma_0(\mathcal H)$, the prox $p_f$ defines an "unambiguous" function from $\mathcal H$ to itself. It also corresponds to the derivative of the *inf-convolution* of $f$ with the quadratic $\frac{1}{2}\|\cdot\|_2^2$ (called the *Moreau envelope* of $f$). More on proximal calculus can be found in this one-page manuscript: https://people.ok.ubc.ca/bauschke/Research/68.pdf (2017-01-02)
  • If you can read French, a more complete reference is here: http://www.mat.ufpb.br/jmbo/papers/Brezis/BSMF_1965__93__273_0.pdf (2017-01-02)

1 Answer


Regarding your first question: indeed, one has to take care when defining a mapping like this to ensure that the solution to the minimization/maximization is unique.

Fortunately, in this case, we likely have that. The specific mapping you have chosen here is the proximal mapping, and in the convex optimization settings where it is often found, the function $\phi$ is convex and lower semi-continuous (although not necessarily differentiable). If that is the case, then the added squared term $\|x-y\|_2^2$ makes the objective strongly convex, which guarantees that the minimization yields exactly one solution.
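To see the uniqueness concretely, here is a minimal sketch (not from the question; it assumes the hypothetical one-dimensional choice $\phi(y) = |y|$ and a simple ternary-search solver): the objective $|y| + (x-y)^2$ is strongly convex, so the solver lands on the same minimizer no matter which bracketing interval it starts from.

```python
# Sketch: phi(y) = |y| is convex but not differentiable at 0;
# adding (x - y)^2 makes the objective strongly convex, so its
# minimizer is unique.

def objective(y, x=2.0):
    return abs(y) + (x - y) ** 2

def argmin_ternary(f, lo, hi, iters=200):
    """Ternary search: valid for unimodal (e.g. strictly convex) f
    whose minimizer lies inside [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Different brackets, same unique minimizer (y* = 1.5 when x = 2).
print(argmin_ternary(objective, -10.0, 10.0))
print(argmin_ternary(objective, -100.0, 5.0))
```

Without the quadratic term, $\text{argmin}_y |y|$ would still be a singleton, but a merely convex $\phi$ (say, a constant) could have a whole interval of minimizers; the strong convexity from $(x-y)^2$ is what rules that out.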

Regarding your second question: the practice of mathematics does not require that an algorithm for solving an optimization problem be provided before that problem can be studied, or before anything useful can be said about it. That's a good thing, because there is simply no single, universal, practical algorithm for solving optimization problems, even convex ones.

Take the specific example of the proximal map: the solution method depends critically on the choice of $\phi$. For some choices, like $\phi(x)=\|x\|_1$, the solution can be obtained analytically. For others, an iterative numerical approach (like gradient descent or Newton's method) is required. So while there are general principles and patterns that can guide you, the solution method must be determined on a case-by-case basis.
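For the $\ell_1$ case just mentioned, the analytic solution is coordinate-wise soft thresholding. A sketch under the question's scaling $\text{argmin}_{y} \, \lambda\|y\|_1 + \|x-y\|^2$ (note there is no $\frac{1}{2}$ factor on the quadratic here, so the threshold works out to $\lambda/2$ rather than the more common $\lambda$):

```python
import numpy as np

def prox_l1(x, lam=1.0):
    """Soft thresholding: the analytic minimizer of
    lam * ||y||_1 + ||x - y||^2 over y (no 1/2 factor on the
    quadratic, hence the threshold lam / 2)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam / 2.0, 0.0)

x = np.array([2.0, 0.3, -1.5])
# With lam = 1, entries shrink toward 0 by 0.5, and entries with
# |x_i| <= 0.5 are set exactly to 0.
print(prox_l1(x, lam=1.0))
```

Because the per-coordinate objective $\lambda|y_i| + (x_i - y_i)^2$ is strongly convex, this formula is the single, unique solution, which is exactly why $p$ is well defined here despite the $\text{argmin}$ in its definition.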

But again, there is quite a bit we can say about the proximal mapping before we ever get around to solving it.

  • Hi, thanks for your excellent answer! Can you take a look at my other question, specifically regarding optimization algorithms? http://stats.stackexchange.com/questions/254107/why-are-optimization-algorithms-defined-in-terms-of-other-optimization-problems (2017-01-02)