
So I have gotten stumped on something that seems like it should be easy. I am trying to compute the derivative shown below. I have scoured the Wikipedia page on matrix derivatives, and I think my answer is correct, but I want to make sure.

So let us say we have a square matrix $\boldsymbol{A}$ and a vector $\boldsymbol{\theta}$. (I am assuming here that the dimensionality is 2 for ease.) So:

$\boldsymbol{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} &a_{22}\end{bmatrix}, \boldsymbol{\theta}= \begin{bmatrix} \theta_0 \\ \theta_1\end{bmatrix}$

I am trying to derive how we get:

$ \frac{\partial \boldsymbol{A}\boldsymbol{\theta}}{\partial \boldsymbol{\theta}}= \boldsymbol{A} $

So first I tried to 'open up' the matrix-vector product, so I now have the following matrix:

$ \begin{bmatrix} a_{11}\theta_{0} + a_{12}\theta_{1} \\ a_{21}\theta_{0} + a_{22}\theta_{1} \end{bmatrix}_{2\times 1} $

... and this is where I am stuck. How do I show from here that the derivative of the above is indeed equal to $\boldsymbol{A}$? I know that I have to take the partials, but I cannot find a rule governing the order in which those partials fill the rows and columns.
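As a concrete sanity check (the entries of $\boldsymbol{A}$ and the point $\boldsymbol{\theta}$ below are arbitrary placeholders, not from the problem), one can approximate each partial $\partial f_i / \partial \theta_j$ by a finite difference and compare the resulting matrix to $\boldsymbol{A}$:

```python
# Finite-difference sanity check that the Jacobian of theta |-> A theta is A.
# A and theta are arbitrary example values.
A = [[1.0, 2.0],
     [3.0, 4.0]]

def f(theta):
    # the matrix-vector product A theta, written out entrywise
    return [A[0][0] * theta[0] + A[0][1] * theta[1],
            A[1][0] * theta[0] + A[1][1] * theta[1]]

theta = [0.5, -1.5]  # any point works: a linear map has the same Jacobian everywhere
h = 1e-6
jac = [[0.0, 0.0], [0.0, 0.0]]
for j in range(2):           # column j: perturb theta_j
    bumped = theta[:]
    bumped[j] += h
    df = [b - a for b, a in zip(f(bumped), f(theta))]
    for i in range(2):       # row i: response of f_i
        jac[i][j] = df[i] / h

print(jac)  # agrees with A up to floating-point error
```

The indexing makes the ordering rule visible: row $i$ comes from the output component $f_i$, column $j$ from the perturbed input $\theta_j$.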

  • What you're looking for is called the [Jacobian matrix](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant). You should be able to apply the definition given there to get the expected answer. – 2012-05-08

1 Answer


The derivative is the 'best' linear approximation to a function at a given point. More explicitly, if $f$ is differentiable at a point $x$, then there is a linear map $L$ (the derivative) such that $ f(x+h) -f(x) = Lh + o(h).$ You can view $h$ as a perturbation, and $Lh$ is the 'best' linear approximation to the corresponding change in $f$. (See https://en.wikipedia.org/wiki/Big_O_notation#Little-o_notation for a description of $o(h)$; it is roughly a way of expressing a limit without using the limit sign.)
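For the map in question this approximation is not merely 'best' but exact, which is worth writing out once:

$$ f(\theta + h) - f(\theta) = A(\theta + h) - A\theta = Ah + 0, $$

so the candidate $L = A$ satisfies the definition with remainder $o(h)$ identically zero.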

There are two points to this ramble:

The first is that if $f$ is linear, then it is its own derivative. You need do no more work. In your case, the derivative of the function $\theta \mapsto A \theta$ must be $A$.

The second is that this view can help you sort out the whole row/column/order thing. Let $f$ denote your function $\theta \mapsto A \theta$, that is $f(\theta) = A \theta$. If you 'perturb' the first variable ($\theta_0$ in your notation) by an amount $\delta$, then we have $f(\theta + \binom{\delta}{0})-f(\theta) = A \binom{\delta}{0} = \delta \binom{a_{11}}{a_{21}}$. So you can see that the 'linear response' to perturbing $\theta_0$ by $\delta$ is given by the first column of $A$, so we have $\binom{a_{11}}{a_{21}} = \binom{\frac{\partial f_1(\theta)}{\partial \theta_0}}{\frac{\partial f_2(\theta)}{\partial \theta_0}}$.
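The perturbation argument can also be run numerically (a sketch with arbitrary example values, using the answer's notation): bump $\theta_0$ by $\delta$, divide the change in $f$ by $\delta$, and the result is the first column of $A$, exactly as claimed.

```python
# Perturb theta_0 by delta; the linear response is the first column of A.
# A, theta, and delta are arbitrary example values.
A = [[1.0, 2.0],
     [3.0, 4.0]]

def f(theta):
    return [A[0][0] * theta[0] + A[0][1] * theta[1],
            A[1][0] * theta[0] + A[1][1] * theta[1]]

theta = [0.7, 0.3]
delta = 0.01
diff = [b - a for b, a in zip(f([theta[0] + delta, theta[1]]), f(theta))]
response = [d / delta for d in diff]  # should match column [a11, a21] of A
print(response)
```

Because $f$ is linear there is no $o(\delta)$ error here at all: the quotient recovers the column for any size of $\delta$, not just in the limit.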

In general, the $n$th column of the derivative corresponds to perturbing the $n$th variable.

More explicitly, in your case, the derivative is given by $\begin{bmatrix} \frac{\partial f_1(\theta)}{\partial \theta_0} & \frac{\partial f_1(\theta)}{\partial \theta_1} \\ \frac{\partial f_2(\theta)}{\partial \theta_0} &\frac{\partial f_2(\theta)}{\partial \theta_1}\end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} &a_{22}\end{bmatrix} .$

  • They can, but I would urge you to think in terms of what the derivative means in a coordinate-free sense; then the usual coordinate-based approach will follow easily (I think!). – 2012-05-08