
Let $f_i,b_1,\cdots b_k$ be column vectors of length $d$, and let $[c_{i1}, \cdots c_{ik}]$ be a row vector.

Let $O = ||f_i -\sum_{r=1}^{k} c_{ir}(b_r) ||^2_2$. The second term is a linear combination of the $b_r$.

I am interested in calculating the partial derivatives with respect to the entries of $c$, i.e., $[\frac{\partial O}{\partial c_{i1}} \cdots \frac{\partial O}{\partial c_{ik}}]$.

When I use the norm definition and calculate the partial derivative, the expression looks very long for each $c_{ir}$. Is there any way to represent the above partial derivative in a very compact manner?

  • Yes. I have edited the question. (2012-08-18)

1 Answer

To simplify, I will use $c$ to denote the row vector $[c_{i1}, \cdots c_{ik}]$.

A slight notational complexity is introduced by choosing $c$ to be a row vector. Other than that, the total derivative (as opposed to the partials) can be computed in a straightforward manner.

Let $\phi(x) = \sum_{i=1}^d x_i^2$. Then we have $D \phi(x) = 2 x^T$, or equivalently, $D \phi(x)(h) = 2 x^T h$.

Let $L(c) = f_i -\sum_{r=1}^{k} c_{r}(b_r)$. If we let $B = \begin{bmatrix} b_1 \cdots b_k \end{bmatrix}$, this can be written more compactly as $L(c) = f_i - B c^T$ ($c$ is a row vector). By linearity, you have $DL(c)(h) = -Bh^T$. (If $c$ were a column vector, we could just write $DL(c) = -B$.)

You have $O = \phi \circ L$, hence by the chain rule, $D O(c)(h) = D \phi (L(c))(D L(c)(h))$, so we have $D O(c)(h) = 2 L(c)^T(-B h^T) = -2 (f_i -B c^T)^T B h^T$.

Choosing $h = e_j^T$ ($e_j$ is the $j$th unit vector) gives the partials:

$\frac{\partial O(c)}{\partial c_j} = D O(c)(e_j^T) = -2 (f_i -B c^T)^T b_j =-2 (f_i^T -c B^T) b_j = -2 (f_i -\sum_{r=1}^{k} c_{r}(b_r))^T b_j$.
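
Stacking these partials for $j = 1, \dots, k$ gives the row vector $-2 (f_i - B c^T)^T B$. As a sanity check, here is a minimal sketch in plain Python that compares this closed form against central finite differences. The function names and the sample values of $f_i$, $b_r$ and $c$ are assumptions chosen purely for illustration; they are not part of the question.

```python
# Sketch: compare the closed-form gradient -2 (f - B c^T)^T B with finite differences.
# All names and sample data below are illustrative assumptions.

def residual(f, B_cols, c):
    """Return f - sum_r c[r] * b_r, where B_cols is the list of columns b_r."""
    return [
        f[i] - sum(c[r] * B_cols[r][i] for r in range(len(c)))
        for i in range(len(f))
    ]

def objective(f, B_cols, c):
    """O(c) = || f - sum_r c[r] * b_r ||_2^2."""
    return sum(x * x for x in residual(f, B_cols, c))

def gradient(f, B_cols, c):
    """Closed form: dO/dc_j = -2 (f - sum_r c[r] * b_r)^T b_j."""
    res = residual(f, B_cols, c)
    return [-2.0 * sum(res[i] * b[i] for i in range(len(f))) for b in B_cols]

def numeric_gradient(f, B_cols, c, eps=1e-6):
    """Central finite differences, perturbing one entry of c at a time."""
    grad = []
    for j in range(len(c)):
        c_plus = list(c)
        c_plus[j] += eps
        c_minus = list(c)
        c_minus[j] -= eps
        diff = objective(f, B_cols, c_plus) - objective(f, B_cols, c_minus)
        grad.append(diff / (2 * eps))
    return grad

# Sample data with d = 3 and k = 2 (assumed values).
f = [1.0, 2.0, 3.0]
B_cols = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
c = [0.5, -1.0]

analytic = gradient(f, B_cols, c)
numeric = numeric_gradient(f, B_cols, c)
print(analytic, numeric)
```

The two lists should agree up to rounding error, since $O$ is quadratic in $c$ and central differences are exact for quadratics apart from floating-point effects.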