
The hinge loss function (summed over $m$ examples):

$$ l(w)= \sum_{i=1}^{m} \max\{0 ,1-y_i(w^{\top} \cdot x_i)\} $$
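For concreteness, a minimal numeric sketch of this summed loss (the data and the function name here are my own, not part of the question):

```python
import numpy as np

def hinge_loss(w, X, y):
    # sum_i max(0, 1 - y_i * w^T x_i), with rows of X as examples
    margins = y * (X @ w)
    return np.maximum(0.0, 1.0 - margins).sum()

X = np.array([[1.0, 2.0], [0.5, -1.0]])  # two examples, two features
y = np.array([1.0, -1.0])                # labels in {-1, +1}
w = np.array([0.1, 0.3])
loss = hinge_loss(w, X, y)  # margins are 0.7 and 0.25, so loss ≈ 0.3 + 0.75
```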

My calculation of the subgradient for a single component and example is:

$$ l(z) = \max\{0, 1 - yz\} $$
$$ l^{\prime}(z) = \max\{0, -y\} $$
$$ g(w) = w \cdot x $$
$$ g^{\prime}(w) = x $$
$$ \frac{\partial l}{\partial z}\frac{\partial g}{\partial w} = \max\{0 \cdot x, -y \cdot x\} = \max\{0, -yx\} $$

For vectors:

$$ l^{\prime}(w) = \sum_{i=1}^{m} \max\{0 ,-(y_i \cdot x_i)\} $$

But the answer I have been given is:

(image of the given answer, as described below:)

$$ l^{\prime}(w) = \sum_{i=1}^{m} -y_i\, x_i\; \mathbb{I}\{y_i(w^{\top} x_i) < 1\} $$

I don't understand this notation. Have I arrived at the same solution, and can someone explain the notation?

  • This function is not differentiable, so what do you mean by "derivative"? (2017-01-18)

1 Answer

$$\mathbb{I}_A(x)=\begin{cases} 1 & x \in A \\ 0 & x \notin A\end{cases}$$

is the indicator function.

Hence for each $i$, first check whether $y_i(w^{\top}x_i) < 1$. If it is not, the $i$-th term contributes $0$.

If $y_i(w^{\top}x_i) < 1$ is satisfied, $-y_i x_i$ is added to the sum.

We can see that the two quantities are not the same: your result does not take $w$ into consideration at all, whereas the correct subgradient depends on $w$ through the margin check.

Remark: Yes, the function is not differentiable, but it is convex, so a subgradient is used here.
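The indicator form above can be sketched directly in code; this is my own illustration of the formula described, with made-up names and data:

```python
import numpy as np

def hinge_subgradient(w, X, y):
    # One subgradient of sum_i max(0, 1 - y_i * w^T x_i):
    # the indicator I[y_i * w^T x_i < 1] decides, per example,
    # whether -y_i x_i enters the sum.
    active = (y * (X @ w)) < 1.0
    return -(X * (y * active)[:, None]).sum(axis=0)

X = np.array([[1.0, 2.0], [0.5, -1.0]])
y = np.array([1.0, -1.0])
w = np.array([0.1, 0.3])           # both margins (0.7 and 0.25) are < 1
g = hinge_subgradient(w, X, y)     # -> [-0.5, -3.0]
```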

  • Thanks. Can you remark on why my reasoning is incorrect? I have added my derivation of the subgradient in the post. I am not sure where this check for less than 1 comes from. I have seen it in other posts (e.g. https://stats.stackexchange.com/questions/4608/) but it was not the definition given. (2017-01-18)
  • The mistake occurs when you compute $l'(z)$: in general, we cannot bring differentiation inside the maximum function. The indicator function is used to record, for a function of the form $\max(f(x), g(x))$, when $f(x) \geq g(x)$ holds and when it does not. (2017-01-18)
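To see that the indicator simply selects the active branch of the max, here is my own quick check (not from the thread): at a point where the margin is strictly below $1$, the loss is locally the linear function $1 - y\,w^{\top}x$, so the branch value $-yx$ matches a numerical derivative.

```python
import numpy as np

def l(w, x, y):
    # single-example hinge loss max(0, 1 - y * w^T x)
    return max(0.0, 1.0 - y * np.dot(w, x))

def subgrad(w, x, y):
    # the indicator picks the branch: -y x if y w^T x < 1, else 0
    return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(x)

x = np.array([1.0, 2.0]); y = 1.0
w = np.array([0.1, 0.3])            # margin y w^T x = 0.7 < 1
eps = 1e-6
# central finite differences along each coordinate of w
num = np.array([(l(w + eps * e, x, y) - l(w - eps * e, x, y)) / (2 * eps)
                for e in np.eye(2)])
# num ≈ subgrad(w, x, y) = -y x = [-1.0, -2.0]
```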