Let $f$ denote the function defined by

$f(x) = w_{pos} \sum_v \left[ \left(\sum_b d_{v,b} x_b - \theta_v \right)_+\right]^2 + w_{neg} \sum_v \left[ \left(\sum_b d_{v,b} x_b - \theta_v \right)_- \right]^2$

I would like to find the gradient of $f$.

Here, $d_{v,b}$ is a large matrix of dimension $v \times b$, $x_b$ is a vector of dimension $b \times 1$, and $\theta$ is a vector of dimension $v \times 1$ (one target $\theta_v$ per row of $d$). The first term penalizes overachieving the goal $\theta$ and the second term penalizes underachieving it. The subscript $_+$ denotes the positive part, so the first sum penalizes only the positive residuals; the subscript $_-$ denotes the negative part, so the second sum penalizes only the negative residuals.

Could someone differentiate this? I believe it has to be done piecewise. What would the code look like?

1 Answer

Consider the function $g$ defined on $\mathbb{R}^k$ by
$$ g(x)=\left(\langle d,x\rangle-\theta\right)_+^2,\quad \langle d,x\rangle=\sum_b d_b x_b, $$
for some given $\theta$ in $\mathbb R$ and $d=(d_b)$ in $\mathbb R^k$. Then $g$ can also be written as
$$ g(x)=\left(\langle d,x\rangle-\theta\right)^2\mathbf{1}_D(x),\quad D=\left\{x\in\mathbb R^k\mid\langle d,x\rangle>\theta\right\}. $$

For every $x$ in $D$, since $D$ is open, the indicator function is identically $1$ in a neighborhood of $x$, hence $g(z)=\left(\langle d,z\rangle-\theta\right)^2$ for every $z$ in a neighborhood of $x$ and
$$ \vec\nabla g(x)=2\left(\langle d,x\rangle-\theta\right)\,d. $$

For every $x$ not in the closure of $D$, the indicator function is identically $0$ in a neighborhood of $x$, hence $g(z)=0$ for every $z$ in a neighborhood of $x$ and
$$ \vec\nabla g(x)=0. $$

For every $x$ on the boundary of $D$, every neighborhood of $x$ contains points $z$ where $g(z)=\left(\langle d,z\rangle-\theta\right)^2$ and points $z$ where $g(z)=0$. But $\langle d,x\rangle-\theta=0$, so in both cases $0\le g(z)\le\langle d,z-x\rangle^2\le\|d\|^2\,\|z-x\|^2$, which is $o(\|z-x\|)$, hence
$$ \vec\nabla g(x)=0. $$

Finally, the three cases combine into a single formula valid for every $x$:
$$ \vec\nabla g(x)=2\left(\langle d,x\rangle-\theta\right)_+\,d. $$

To conclude, one adds the contributions of several such functions $g$ to $\vec\nabla f(x)$ and one uses a similar result for elementary functions of the form
$$ g(x)=\left(\langle d,x\rangle-\theta\right)_-^2. $$
This last step is direct since $g$ is also $g(x)=\left(\langle d',x\rangle-\theta'\right)_+^2$ for $d'=-d$ and $\theta'=-\theta$.
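Since the question also asks what the code would look like, here is a minimal NumPy sketch of the result above. Writing $r = Dx - \theta$ for the vector of residuals $r_v = \sum_b d_{v,b}x_b - \theta_v$, summing the per-row gradients gives $\vec\nabla f(x) = 2\,D^{\top}\left(w_{pos}\max(r,0) + w_{neg}\min(r,0)\right)$, with max and min taken entrywise. This form does not depend on whether $(t)_-$ is taken as $\min(t,0)$ or $\max(-t,0)$, because only its square enters $f$. The names `f_and_grad`, `D`, `theta`, `w_pos` and `w_neg` are illustrative choices, not part of the original post, and `D` is assumed to be a dense array of shape `(v, b)`.

```python
import numpy as np


def f_and_grad(x, D, theta, w_pos, w_neg):
    """Return f(x) and its gradient for the two-sided quadratic penalty.

    Assumed shapes (hypothetical names): D is (v, b), x is (b,), theta is (v,).
    """
    r = D @ x - theta                # residuals r_v = sum_b d_{v,b} x_b - theta_v
    r_pos = np.maximum(r, 0.0)       # positive part: overshoot of the goal
    r_neg = np.minimum(r, 0.0)       # negative part: undershoot (entries <= 0)
    value = w_pos * np.sum(r_pos ** 2) + w_neg * np.sum(r_neg ** 2)
    # Each row v contributes 2 * w * (one-sided residual) * d_v; stack via D^T.
    grad = 2.0 * (D.T @ (w_pos * r_pos + w_neg * r_neg))
    return value, grad


# Tiny hand-checkable case: D is the identity and theta is zero, so
# f(x) = (x_1)_+^2 + 3 (x_2)_-^2 with w_pos = 1 and w_neg = 3.
value, grad = f_and_grad(np.array([1.0, -2.0]), np.eye(2), np.zeros(2), 1.0, 3.0)
# value is 13.0 and grad is [2, -12]
```

No special branch is needed at points where some residual is exactly zero, because both one-sided parts vanish there, in agreement with the boundary case of the argument. A central finite-difference comparison is a reasonable sanity check before passing the gradient to an optimizer.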