
Let $f$ denote the function defined by

$$f(x) = w_{pos} \sum_v \left[ \left(\sum_b d_{v,b} x_b - \theta_v \right)_+\right]^2 + w_{neg} \sum_v \left[ \left(\sum_b d_{v,b} x_b - \theta_v \right)_- \right]^2$$

I would like to find the gradient of $f$.

Here, $d$ is a large matrix of dimension $v \times b$, $x$ is a vector of dimension $b \times 1$, and $\theta$ is a vector of dimension $v \times 1$. The first part of the equation penalizes over-achieving the goal ($\theta$) and the second part penalizes under-achieving it. The subscript $+$ indicates that the first sum keeps only the positive part of each term, and the subscript $-$ indicates that the second sum keeps only the negative part.

Could someone differentiate this? I believe it has to be done piecewise. What would the code for the gradient look like?
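For concreteness, here is a minimal sketch of how $f$ itself could be evaluated in NumPy. The names `D`, `theta`, `w_pos`, and `w_neg` are hypothetical placeholders for the quantities above, not code from the question:

```python
import numpy as np

# Hypothetical shapes: D is (v, b), x is (b,), theta is (v,);
# w_pos and w_neg are scalar weights.
def f(x, D, theta, w_pos, w_neg):
    r = D @ x - theta             # residual for each row v
    pos = np.maximum(r, 0.0)      # (r)_+ : over-achievement
    neg = np.minimum(r, 0.0)      # (r)_- : under-achievement
    return w_pos * np.sum(pos**2) + w_neg * np.sum(neg**2)
```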

  • 0
    Enclosing the math expression in dollar signs (\$) will make your TeX render. (2011-08-03)
  • 2
    What do the positive and negative subscripts stand for? (2011-08-03)
  • 0
    @anon: Based on "penalizes over/under achieving", I'd guess it means the first sum includes only the positive results, and the second only the negative results. (2011-08-04)
  • 1
    Yes, that's exactly correct. The first sum penalizes the positive results and the second sum penalizes the negative results. (2011-08-04)
  • 0
    FWIW: if you already have code to compute the monstrosity that is $f$, why not have an [automatic differentiation](http://www.autodiff.org/) routine derive the gradient for you? (2011-08-04)
  • 0
    @J.M.: What kind of automatic differentiation? Do you mean numerical gradients? In the algorithm I'm running, I have to supply the gradient as an output of the main function call; hence my query. (2011-08-04)
  • 0
    No... the routine is supposed to generate code for evaluating the gradient. (2011-08-04)
  • 0
    Are you sure the $\sum_v$ is not inside the square brackets (indicating that the sum should be squared)? If not, it looks like $f = \|dx - \theta\|_2^2$. (2011-08-05)
  • 1
    No, the sum is outside the square brackets. Individual elements are squared, and then the sum is taken over all of them. (2011-08-05)
  • 0
    The only reasonable way I've found to do this is to check, for each $v$, whether the value inside the bracket is positive or negative, and differentiate based on that. (2011-08-05)
  • 0
    So if individual elements are squared and summed, then it doesn't matter if they're positive or negative: it will still be the sum of the squares of the elements of the vector $dx - \theta$; correct? (2011-08-05)
  • 1
    In the greater scheme of things it does. What I didn't include here were the two different scalar weights multiplying the two sums, so "over-achieving" is penalized differently than "under-achieving". (2011-08-07)
  • 0
    Ah, so $f$ is a weighted sum of the first and second sum of squares? If so, do update the question. (2011-08-07)
  • 0
    Updated. Thanks. (2011-08-08)

1 Answer


Consider the function $g$ defined on $\mathbb{R}^k$ by $$ g(x)=\left(\langle d,x\rangle-\theta\right)_+^2,\quad \langle d,x\rangle=\sum_b d_b x_b, $$ for some given $\theta$ in $\mathbb R$ and $d=(d_b)$ in $\mathbb R^k$. Then $g$ can also be written as $$ g(x)=\left(\langle d,x\rangle-\theta\right)^2\mathbf{1}_D(x),\quad D=\left\{x\in\mathbb R^k\mid\langle d,x\rangle>\theta\right\}. $$

For every $x$ in $D$, since $D$ is open, the indicator function is uniformly $1$ in a neighborhood of $x$, hence $g(z)=\left(\langle d,z\rangle-\theta\right)^2$ for every $z$ in a neighborhood of $x$ and $$ \vec\nabla g(x)=2\left(\langle d,x\rangle-\theta\right)\,d. $$

For every $x$ not in the closure of $D$, the indicator function is uniformly $0$ in a neighborhood of $x$, hence $g(z)=0$ for every $z$ in a neighborhood of $x$ and $$ \vec\nabla g(x)=0. $$

For every $x$ on the boundary of $D$, both $g(z)=\left(\langle d,z\rangle-\theta\right)^2$ and $g(z)=0$ occur at some points $z$ in every neighborhood of $x$, but $\langle d,x\rangle-\theta=0$, hence $$ \vec\nabla g(x)=0. $$

Finally, for every $x$, $$ \vec\nabla g(x)=2\left(\langle d,x\rangle-\theta\right)_+\,d. $$

To conclude, one adds the contributions of all such functions $g$ (each weighted by $w_{pos}$ or $w_{neg}$) to obtain $\vec\nabla f(x)$, together with the analogous result for the elementary functions of the form $$ g(x)=\left(\langle d,x\rangle-\theta\right)_-^2. $$ This last step is immediate since such a $g$ can also be written as $g(x)=\left(\langle d',x\rangle-\theta'\right)_+^2$ with $d'=-d$ and $\theta'=-\theta$.
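Putting the pieces together, here is a minimal NumPy sketch of the resulting gradient. The names `D`, `theta`, `w_pos`, and `w_neg` are my own placeholders (matching the notation of the question, not code from it); it simply applies the per-row formula $2\,w\,(\langle d_v,x\rangle-\theta_v)_\pm\, d_v$ derived above and sums over $v$:

```python
import numpy as np

# Hypothetical shapes, as in the question: D is (v, b), x is (b,),
# theta is (v,); w_pos and w_neg are scalar weights.
def grad_f(x, D, theta, w_pos, w_neg):
    r = D @ x - theta             # residual for each row v
    pos = np.maximum(r, 0.0)      # (r)_+
    neg = np.minimum(r, 0.0)      # (r)_-
    # Each row v contributes 2 * w * (r_v)_± * d_v; summing these
    # contributions amounts to multiplying by D^T.
    return 2.0 * D.T @ (w_pos * pos + w_neg * neg)
```

At the kink $\langle d_v,x\rangle=\theta_v$ both the positive and negative parts are zero, so the clipped residuals automatically give the boundary value $0$ obtained in the answer.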