
I have the following expression, and I would like to take the sub-gradient of it with respect to $Z$. Would someone help me with this?

$$\|Y-Z\|^2_2$$

where $Y, Z \in \mathbb R^{m \times n}$.

  • 0
    In general, the second one does not make sense.2017-01-04
  • 0
    For the first one, does it make sense that it will be $-2(Y-Z)$?2017-01-04
  • 0
    Do you know the definition for derivative with respect to a matrix?2017-01-04
  • 0
    What matrix norm are you using? The Frobenius norm? The matrix 2-norm?2017-01-04
  • 0
    @BrianBorchers, it's the matrix 2-norm.2017-01-04
  • 0
    Does that mean the norm induced by the Euclidean norm? Why do you think it is differentiable?2017-01-04
  • 0
    I suspect that it's differentiable almost everywhere. Neverthless, it is most definitely not $-2(Y-Z)$. For the Frobenius norm, sure.2017-01-04
  • 0
    How about the sub-differential of it?2017-01-04

1 Answer

1

Let's answer it this way. For a norm defined on any vector space, we have $$\partial \|X\| = \left\{ W \,\middle|\, \|W\|_*\leq 1, ~\langle W,X \rangle = \|X\| \right\}$$ where $\|\cdot\|_*$ is the dual norm (and $W$ is drawn from the dual vector space). For the matrix norm $\|X\|_2=\sigma_\max(X)$, the dual norm is the nuclear norm $\|W\|_*=\sum_i \sigma_i(W)$, so $$\partial \|X\|_2 = \left\{ W \,\middle|\, \sum_i \sigma_i(W) \leq 1, ~\langle W,X \rangle = \sigma_\max(X) \right\}$$ I'm going to skip the proof here, but I believe we have this: $$\partial \|X\|_2 = \mathop{\textrm{Conv}} \left\{ uv^T \,\middle|\, \|u\|_2=\|v\|_2=1, ~ u^TXv = \sigma_\max(X) \right\}$$ where $\mathop{\textrm{Conv}}$ denotes the convex hull.
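If a numerical sanity check helps: a short NumPy sketch (my own illustration, not part of the proof) verifying that $W = u_1v_1^T$, built from the top singular vectors, satisfies the subgradient inequality $\|X'\|_2 \geq \|X\|_2 + \langle W, X'-X\rangle$ at a random $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))

# Top singular triple of X; W = u1 v1^T is an element of the set above.
U, s, Vt = np.linalg.svd(X)
W = np.outer(U[:, 0], Vt[0, :])

# Check the subgradient inequality at a handful of random test points.
for _ in range(100):
    Xp = rng.standard_normal((4, 3))
    lhs = np.linalg.norm(Xp, 2)                      # spectral norm of X'
    rhs = np.linalg.norm(X, 2) + np.sum(W * (Xp - X))
    assert lhs >= rhs - 1e-10
```

Note that $\langle W, X\rangle = u_1^TXv_1 = \sigma_\max(X)$ and $\|W\|_* = 1$ hold exactly for this choice, which is why it lies in the subdifferential.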

The subdifferential of the squared norm follows simply: $$\partial \|X\|^2 =2 \|X\| \cdot \partial \|X\|$$ This is true for any norm, not just the matrix norm.

Now, as for subdifferentiability vs. differentiability. As a convex function defined on all of $\mathbb{R}^{m\times n}$, the subdifferential exists everywhere. But a convex function is differentiable only where that subdifferential is a singleton. For the non-squared norm, I believe that is everywhere that $\sigma_1(X)>\sigma_2(X)$; that is, wherever the maximum singular value has multiplicity one. At those points, there is only one dyad $uv^T$ that attains $u^TXv=\sigma_\max(X)$.

On the other hand, the squared norm is differentiable at one other location: the origin, because the subdifferential is $\{0\}$ there.

Now, you asked about $\|Y-Z\|^2=\|Z-Y\|^2$, not just $\|X\|^2$, but that's simple enough: $$\partial \|Z-Y\|^2 = 2\|Z-Y\|\cdot \partial \|Z-Y\| =2\|Z-Y\|\cdot\left.\partial\|X\|\right|_{X=Z-Y}$$ $$\partial \|Z-Y\|^2 = 2\|Z-Y\|\mathop{\textrm{Conv}}\left\{uv^T\,\middle|\,\|u\|_2=\|v\|_2=1, ~ u^T(Z-Y)v=\|Z-Y\|_2\right\}$$
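In practice, one element of this set is all you need for a subgradient method: take the top singular triple $(\sigma_1, u_1, v_1)$ of $Z-Y$ and use $2\sigma_1 u_1v_1^T$. A minimal NumPy sketch (my illustration; the helper name is mine):

```python
import numpy as np

def subgrad_sq_spectral(Z, Y):
    """One subgradient of ||Z - Y||_2^2 (squared spectral norm) w.r.t. Z:
    2*sigma_1 * u1 v1^T, from the top singular triple of Z - Y."""
    U, s, Vt = np.linalg.svd(Z - Y)
    return 2.0 * s[0] * np.outer(U[:, 0], Vt[0, :])

rng = np.random.default_rng(1)
Y = rng.standard_normal((5, 4))
Z = rng.standard_normal((5, 4))
G = subgrad_sq_spectral(Z, Y)
```

Wherever $\sigma_1(Z-Y) > \sigma_2(Z-Y)$, this is in fact the gradient; at ties it is merely one valid subgradient out of the convex hull above.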

  • 0
    Thanks. Would it be possible to add a simple example to understand the last equation better? I want to implement this.2017-01-04
  • 0
    Also, I didn't get the first part. Why do you assume that $Y$ is drawn from the dual vector space?2017-01-04
  • 0
    If you're going to *implement* a subgradient method, just take the SVD of $Y-Z$ and pick $-2\sigma_1(Y-Z)u_1v_1^T$.2017-01-04
  • 1
    Don't worry about the dual space concept. That's a technical matter for general vector spaces, but in this case, the dual space is the same as the primal, just $\mathbb{R}^{m\times n}$.2017-01-04
  • 1
    I just realized it would be a heck of a lot easier to work with $Z-Y$ instead of $Y-Z$.2017-01-05
  • 0
    I think it's not easy to compute it in each iteration of the algorithm.2017-01-05
  • 0
    Well, I suppose if $Z,Y$ are very large, sure. But it's just an SVD. If you can't afford an SVD then you're going to need to do something more clever than a subgradient method, methinks. I suppose you could use a Lanczos-style method to get just $\sigma_1u_1v_1^T$.2017-01-05