I am currently reading Hazan et al.'s recent paper [1] on graduated nonconvex optimization. Letting $f$ be an arbitrary, possibly nonconvex function, its smoothed version at scale $\delta$ is defined as:
$$ \hat{f}_{\delta}(\mathbf{x}) \triangleq \mathbb{E}_{\mathbf{u} \sim \mathbb{B}} \left[ f(\mathbf{x} + \delta \mathbf{u}) \right] \qquad (1) $$
where $\mathbf{u}$ is drawn uniformly from the unit Euclidean ball $\mathbb{B}$.
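For concreteness, here is how I am currently estimating (1) numerically (my own Monte Carlo sketch, not from the paper; the ball sampling uses the standard trick of normalizing a Gaussian to the sphere and rescaling by $U^{1/d}$):

```python
import numpy as np

def smoothed(f, x, delta, n_samples=10000, seed=None):
    """Monte Carlo estimate of Eq. (1):
    E_{u ~ Uniform(unit ball)}[ f(x + delta * u) ]."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Uniform points in the unit ball: normalize Gaussians onto the
    # sphere, then push inward with a radius distributed as U^(1/d).
    g = rng.standard_normal((n_samples, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    r = rng.random(n_samples) ** (1.0 / d)
    u = g * r[:, None]
    return np.mean([f(x + delta * ui) for ui in u])
```

As a sanity check, for $f(\mathbf{x}) = \|\mathbf{x}\|^2$ the smoothed value at the origin should be $\delta^2 \, \mathbb{E}\|\mathbf{u}\|^2 = \delta^2 \, d/(d+2)$, which the estimator reproduces.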
In Section 3.2, the notion of an $(a, \sigma)$-nice function is introduced, which depends on two conditions:
Centering property: $\forall \delta > 0$, letting $\mathbf{x}_{\delta}^* = \operatorname{argmin}_{\mathbf{x}} \hat{f}_{a \delta}(\mathbf{x})$, there exists $\mathbf{x}_{\delta / 2}^* = \operatorname{argmin}_{\mathbf{x}} \hat{f}_{a \delta / 2}(\mathbf{x})$ such that $\|\mathbf{x}^*_{\delta} - \mathbf{x}_{\delta / 2}^*\| < \delta / 2$.
Local strong convexity: $\forall \delta > 0$, letting $\mathbf{x}_{\delta}^* = \operatorname{argmin}_{\mathbf{x}} \hat{f}_{a \delta}(\mathbf{x})$, the function $\hat{f}_{a \delta}$ is $\sigma$-strongly convex over $\mathbb{B}_{3 \delta}(\mathbf{x}_{\delta}^*)$ (the Euclidean ball of radius $3 \delta$ centered at $\mathbf{x}_{\delta}^*$).
I understand the first property to mean that when we pass from a scale $\delta$ to the finer scale $\delta / 2$, the new minimizer does not shift too far relative to the scale change, while the second property ensures that, after the scale shift, the optimization procedure remains inside a strongly convex neighbourhood.
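The best I have come up with so far is a purely numerical (heuristic, non-rigorous) check of the centering property: fix one sample of ball perturbations as common random numbers, minimize the resulting smoothed objective at scales $a\delta$ and $a\delta/2$, and compare the distance between the two minimizers against $\delta/2$. The function and parameter names here are my own, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize

def check_centering(f, d, a, delta, x0, n_samples=2000, seed=0):
    """Heuristic check of the centering property: numerically estimate
    argmin f_hat_{a*delta} and argmin f_hat_{a*delta/2}, then test whether
    their distance is below delta / 2.  Common random numbers (one fixed
    ball sample) make the Monte Carlo objective deterministic."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    u = g * (rng.random((n_samples, 1)) ** (1.0 / d))  # fixed ball sample

    def smoothed(x, scale):
        return np.mean([f(x + scale * ui) for ui in u])

    x_coarse = minimize(smoothed, x0, args=(a * delta,)).x
    x_fine = minimize(smoothed, x_coarse, args=(a * delta / 2,)).x
    gap = np.linalg.norm(x_coarse - x_fine)
    return gap, gap <= delta / 2
```

Of course this can only disprove the property for particular $(\delta, a)$ values (a large gap is evidence against), never prove it for all $\delta$; is there any way to do better?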
My question concerns how one could verify (or disprove) the validity of the above conditions for some arbitrary function $f$. The authors seem to take them for granted for the parameters of a single-layer neural network trained on a square loss, but in the relevant section (Sec. 7) they do not report the parameters $\sigma, a$ that they used for training.
I'm also confused by the fact that $\hat{f}$ is defined as an expectation of $f$ over a scaled Euclidean ball, which makes it hard for me to see how to check local properties such as convexity for an arbitrary $f$. More specifically, I am dealing with the loss function
$$ J(\mathbf{x}) = \sum_{i=1}^n \|\max_j(x_j + k_{ij}) - d_i\|_p $$
where $(\mathbf{k}_i, d_i)$ are data pairs, and $\|\cdot\|_p$ is some suitable $\ell_p$ norm. Is it possible to write $\hat{J}_{\delta}(\mathbf{x})$ in closed (or easily computable) form? More generally, how would one go about proving or disproving the two conditions above for a function defined as an expectation over a ball?
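In case a closed form is out of reach, this is how I currently evaluate $\hat{J}_{\delta}$ by Monte Carlo (my own sketch; note that since each residual $\max_j(x_j + k_{ij}) - d_i$ is a scalar, I take the $\ell_p$ norm of each term to reduce to an absolute value):

```python
import numpy as np

def J(x, K, d):
    """The loss above: sum_i | max_j (x_j + K[i, j]) - d[i] |.
    Each residual is a scalar, so its l_p norm is just an absolute value."""
    residuals = np.max(x[None, :] + K, axis=1) - d
    return np.sum(np.abs(residuals))

def J_smoothed(x, K, d, delta, n_samples=5000, seed=0):
    """Monte Carlo estimate of the delta-smoothed loss: average J over
    uniform perturbations of x inside the radius-delta ball."""
    rng = np.random.default_rng(seed)
    dim = x.shape[0]
    g = rng.standard_normal((n_samples, dim))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    u = g * (rng.random((n_samples, 1)) ** (1.0 / dim))
    return np.mean([J(x + delta * ui, K, d) for ui in u])
```

Since $\max_j$ is 1-Lipschitz, each residual moves by at most $\delta$ under a perturbation in the ball, so the estimate should stay within $n\delta$ of $J(\mathbf{x})$, which at least gives me a consistency check.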
Thanks to everyone in advance.