
Given a matrix $A$ and a column vector $x$, what is the derivative of $Ax$ with respect to $x^T$, i.e., $\frac{d(Ax)}{d(x^T)}$, where $x^T$ is the transpose of $x$?

Side note: my goal is to derive the known derivative formula $\frac{d(x^TAx)}{dx} = x^T(A^T + A)$ from the above rule and the chain rule.

Thanks,
Asaf

4 Answers


Let $f(x) = x^TAx$ and you want to evaluate $\frac{df(x)}{dx}$. This is nothing but the gradient of $f(x)$.

There are two ways to represent the gradient: as a row vector or as a column vector. From what you have written, you represent the gradient as a row vector.

First, make sure you have the dimensions of all the vectors and matrices in place.

Here $x \in \mathbb{R}^{n \times 1}$, $A \in \mathbb{R}^{n \times n}$ and $f(x) \in \mathbb{R}$.

This will help you to make sure that your arithmetic operations are performed on vectors of appropriate dimensions.

Now let's move on to the differentiation.

All you need to know are the following rules for vector differentiation.

$\frac{d(x^Ta)}{dx} = \frac{d(a^Tx)}{dx} = a^T$ where $x,a \in \mathbb{R}^{n \times 1}$.

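Note that $x^Ta = a^Tx$ since it is a scalar, and the rule above is easy to derive: $a^Tx = \sum_i a_ix_i$, so its partial derivative with respect to $x_i$ is $a_i$, and collecting these partials in a row gives $a^T$.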
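A quick numerical sanity check of this rule, as a minimal sketch in plain Python (no libraries). The helper names `dot` and `numerical_derivative` are my own, not from any package. It estimates each partial derivative by central differences and compares the resulting row with $a^T$:

```python
import random

def dot(u, v):
    """Inner product u^T v of two equal-length vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def numerical_derivative(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for a scalar f, returned as a row (list of partials)."""
    row = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        row.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return row

random.seed(0)
n = 4
a = [random.uniform(-1, 1) for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]

# x^T a and a^T x are the same scalar, so both derivatives should equal a^T.
d_xta = numerical_derivative(lambda v: dot(v, a), x)
d_atx = numerical_derivative(lambda v: dot(a, v), x)

# Largest deviation from a^T; should be at the level of floating-point rounding.
print(max(abs(p - q) for p, q in zip(d_xta, a)))
print(max(abs(p - q) for p, q in zip(d_atx, a)))
```

Because $a^Tx$ is linear in $x$, central differences recover $a^T$ up to rounding error, whichever of the two equal forms is differentiated.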

(Some people follow a different convention, i.e., treating the derivative as a column vector instead of a row vector. Make sure to stick to your convention and you will end up with the same conclusion.)

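Making use of the above results, we get

$\frac{d(x^TAx)}{dx} = x^T A^T + x^T A$

Use the chain rule to get the above result, i.e., first take $Ax$ as constant and then take $x^T A$ as constant: with $a = Ax$ held fixed, $\frac{d(x^Ta)}{dx} = a^T = x^TA^T$; with $a^T = x^TA$ held fixed, $\frac{d(a^Tx)}{dx} = a^T = x^TA$. Adding the two contributions gives the sum above.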

So, $\frac{df(x)}{dx} = x^T(A^T + A)$
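To see the formula hold numerically, here is a minimal sketch in plain Python. It builds $x^TA^T + x^TA$ from the two "held constant" pieces described above and compares it with a central-difference estimate of the derivative of $x^TAx$. The helpers (`matvec`, `row_times_matrix`, `quadratic_form`, `numerical_derivative`) are hypothetical names introduced only for this check, and $A$ is random and deliberately non-symmetric so that the two pieces differ:

```python
import random

def matvec(A, x):
    """Column vector A x, with A stored as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def row_times_matrix(x, A):
    """Row vector x^T A, returned as a list."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def quadratic_form(A, x):
    """Scalar x^T A x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))

def numerical_derivative(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for a scalar f, as a row (list of partials)."""
    row = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        row.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return row

random.seed(1)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]  # not symmetric in general
x = [random.uniform(-1, 1) for _ in range(n)]

term_1 = row_times_matrix(x, transpose(A))  # Ax held constant: (Ax)^T = x^T A^T
term_2 = row_times_matrix(x, A)             # x^T A held constant: x^T A
analytic = [p + q for p, q in zip(term_1, term_2)]  # x^T (A^T + A)

numeric = numerical_derivative(lambda v: quadratic_form(A, v), x)
print(max(abs(p - q) for p, q in zip(analytic, numeric)))
```

Since $x^TAx$ is quadratic, the central difference has no truncation error, so any remaining discrepancy is floating-point rounding.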

  • Is this really called the chain rule? I've always called this the product rule: $\frac{d(u(x)\cdot v(x))}{dx} = \frac{du}{dx}(x)v(x)+\frac{dv}{dx}(x)u(x)$. (The chain rule would be $\frac{d(u(v(x)))}{dx} = \frac{du}{dx}\left(v(x)\right)\cdot \frac{dv}{dx}\left(x\right)$.) (2018-02-23)

I think there is no such thing. $\mbox{d}(x^\mbox{T}Ax)/\mbox{d}x$ is something that, when multiplied by the change $\mbox{d}x$ in $x$, yields the change $\mbox{d}(x^\mbox{T}Ax)$ in $x^\mbox{T}Ax$. Such a thing exists and is given by the formula you quote. $\mbox{d}(Ax)/\mbox{d}(x^\mbox{T})$ would have to be something that, when multiplied by the change $\mbox{d}x^\mbox{T}$ in $x^\mbox{T}$, yields the change $\mbox{d}(Ax)$ in $Ax$. No such thing exists, since $\mbox{d}x^\mbox{T}$ is a $1 \times n$ row vector and $\mbox{d}(Ax)$ is an $n \times 1$ column vector: for $n > 1$, multiplying a row vector by a matrix on either side never produces an $n \times 1$ column.

If your main goal is to derive the derivative formula, here's a derivation:

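$(x^\mbox{T} + \mbox{d}x^\mbox{T})A(x + \mbox{d}x) = x^\mbox{T}Ax + \mbox{d}x^\mbox{T}Ax + x^\mbox{T}A\mbox{d}x + \mbox{d}x^\mbox{T}A\mbox{d}x$

Since $\mbox{d}x^\mbox{T}Ax$ is a scalar, it equals its transpose $x^\mbox{T}A^\mbox{T}\mbox{d}x$, so this is

$= x^\mbox{T}Ax + x^\mbox{T}A^\mbox{T}\mbox{d}x + x^\mbox{T}A\mbox{d}x + O(\lVert \mbox{d}x \rVert^2) = x^\mbox{T}Ax + x^\mbox{T}(A^\mbox{T} + A)\mbox{d}x + O(\lVert \mbox{d}x \rVert^2)$

The coefficient of $\mbox{d}x$ in the first-order term, $x^\mbox{T}(A^\mbox{T} + A)$, is the derivative.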
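The $O(\lVert \mbox{d}x \rVert^2)$ term in this expansion is exactly $\mbox{d}x^\mbox{T}A\mbox{d}x$. A minimal plain-Python sketch (random $A$, $x$ and step direction; helper names are my own) confirms that once the linear term $x^\mbox{T}(A^\mbox{T} + A)\mbox{d}x$ is subtracted from the actual change, what remains matches $\mbox{d}x^\mbox{T}A\mbox{d}x$ and shrinks like $t^2$ when $\mbox{d}x = t \cdot \text{direction}$:

```python
import random

def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    """Column vector A x, with A stored as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def quadratic_form(A, x):
    """Scalar x^T A x."""
    return dot(x, matvec(A, x))

random.seed(2)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]
direction = [random.uniform(-1, 1) for _ in range(n)]

# The row x^T (A^T + A): entry j is sum_i x_i (A[j][i] + A[i][j]).
derivative_row = [sum(x[i] * (A[j][i] + A[i][j]) for i in range(n)) for j in range(n)]

results = []
for t in (1e-1, 1e-2, 1e-3):
    dx = [t * d for d in direction]
    change = quadratic_form(A, [xi + di for xi, di in zip(x, dx)]) - quadratic_form(A, x)
    remainder = change - dot(derivative_row, dx)  # should equal dx^T A dx
    results.append((t, remainder, quadratic_form(A, dx)))

for t, remainder, second_order in results:
    print(f"t={t:g}  remainder={remainder:.3e}  dx^T A dx={second_order:.3e}")
```

Each tenfold reduction of $t$ should shrink the remainder about a hundredfold, which is what makes $x^\mbox{T}(A^\mbox{T} + A)$ the derivative.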

  • I understand the issue you have. So essentially our argument boils down, in some sense, to what $dx/dx$ and $dx/dx^T$ are when $dx$ is a column vector. My definition is $dx/dx = 1$, i.e., $dx = dx$, and $dx/dx^T = I$, i.e., $dx = I (dx^T)^T$. Will that take care of the issues? (2011-02-06)

Mathematicians kill each other over derivatives and gradients, so do not be surprised if students do not understand a word of this subject. The confusion above is partly caused by the Matrix Cookbook, a book that should be blacklisted. Everyone has their own definition: $\dfrac{d(f(x))}{dx}$ means either a derivative or a gradient (scandalous). We could write $D_xf$ for the derivative and $\nabla_xf$ for the gradient. The derivative is a linear map and the gradient is a vector, if we accept the following definition: let $f:E\rightarrow \mathbb{R}$ where $E$ is a Euclidean space. Then, for every $h\in E$, $D_xf(h)=\langle\nabla_x f,h\rangle$. In particular, $x\mapsto x^TAx$ has a gradient but $x\mapsto Ax$ does not! Using these definitions, one has (up to unintentional mistakes):

Let $f:x\mapsto Ax$ where $A\in M_n$; then $D_xf=A$ (no problem). On the other hand, $x\mapsto x^T$ is a bijection (a simple change of variable!), so we can give meaning to the derivative of $Ax$ with respect to $x^T$: consider the function $g:x^T\mapsto A(x^T)^T$; the required derivative is $D_{x^T}g:h^T\mapsto Ah$, where $h$ is a vector. Note that $D_{x^T}g$ is constant (it does not depend on $x$). EDIT: if we choose the bases $e_1^T,\cdots,e_n^T$ and $e_1,\cdots,e_n$ (the second one is the canonical basis), then the matrix associated with $D_{x^T}g$ is $A$ again.

Let $\phi:x\mapsto x^TAx$; then $D_x\phi:h\mapsto h^TAx+x^TAh=x^T(A+A^T)h$ (the scalar $h^TAx$ equals its transpose $x^TA^Th$). Moreover $\langle\nabla_x\phi,h\rangle=x^T(A+A^T)h$, that is, ${(\nabla_x\phi)}^Th=x^T(A+A^T)h$ for every $h$. By identification, $\nabla_x\phi=(A+A^T)x$, a vector (formula (89) in the detestable Matrix Cookbook!); in particular, the solution above, $x^T(A+A^T)$, is not a vector: it is a $1\times n$ row, the matrix of the linear map $D_x\phi$.
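The distinction between the linear map $D_x\phi$ and the gradient $\nabla_x\phi$ can be made concrete by storing column vectors as $n\times 1$ nested lists and rows as $1\times n$ ones. This is a minimal plain-Python sketch with random data; `matmul`, `transpose`, `add` and `D_phi` are hypothetical helper names. It checks $D_x\phi(h)=\langle\nabla_x\phi,h\rangle$ and shows that $(A+A^T)x$ is a column while $x^T(A+A^T)$ is a row:

```python
import random

def matmul(P, Q):
    """Matrix product P Q, matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(P, Q):
    return [[p + q for p, q in zip(row_p, row_q)] for row_p, row_q in zip(P, Q)]

random.seed(3)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [[random.uniform(-1, 1)] for _ in range(n)]  # column vector, shape n x 1

def D_phi(h):
    """Derivative of phi(x) = x^T A x at x, as a linear map: h -> h^T A x + x^T A h."""
    return (matmul(matmul(transpose(h), A), x)[0][0]
            + matmul(matmul(transpose(x), A), h)[0][0])

S = add(A, transpose(A))                  # A + A^T
gradient = matmul(S, x)                   # (A + A^T) x: an n x 1 column, an element of E
derivative_row = matmul(transpose(x), S)  # x^T (A + A^T): a 1 x n row

h = [[random.uniform(-1, 1)] for _ in range(n)]
inner = matmul(transpose(gradient), h)[0][0]  # <grad, h> = grad^T h
print(D_phi(h), inner)
print(len(gradient), len(gradient[0]), len(derivative_row), len(derivative_row[0]))  # 3 1 1 3
```

Both printed scalars agree up to rounding, and the row is just the transpose of the gradient, which is why the two conventions lead to the same numbers in different shapes.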


As Sivaram points out, you must define your convention for row/column derivatives and just be consistent.

For example, you could define the derivative of a column vector with respect to a row vector (assuming the letters represent column vectors) as the matrix:

$\displaystyle \frac{dy}{dx^T} = D$ with $d_{i,j} = \frac{\partial y_i}{\partial x_j}$

And that will work (it will be consistent). For example, you get $\displaystyle \frac{d(Ax)}{dx^T} = A$
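Under this convention, the matrix $D$ is what is usually called the Jacobian. A minimal plain-Python sketch (the `jacobian` helper is my own, and the $2\times 3$ shape of $A$ is an arbitrary choice so that rows and columns cannot be confused) estimates $d_{i,j} = \partial y_i/\partial x_j$ for $y = Ax$ by central differences and recovers $A$:

```python
import random

def matvec(A, x):
    """Column vector A x, with A stored as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def jacobian(f, x, eps=1e-6):
    """Matrix D with D[i][j] approximately d f_i / d x_j, by central differences."""
    m, n = len(f(x)), len(x)
    D = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(m):
            D[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return D

random.seed(4)
m, n = 2, 3  # non-square on purpose
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
x = [random.uniform(-1, 1) for _ in range(n)]

D = jacobian(lambda v: matvec(A, v), x)
print(max(abs(D[i][j] - A[i][j]) for i in range(m) for j in range(n)))
```

$D$ comes out $m\times n$, the same shape as $A$, and does not depend on the point $x$, as expected for a linear map.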

But it's not so simple to apply this, together with the product rule, to deduce your identity, because you get two different derivatives: a row with respect to a row and a column with respect to a row, and you can't (at least without further justification) mix them.

Of course, if the matrix is symmetric, everything is simpler.

  • OK, to make my point of view more precise, I should say: you are both right that if you want to define the notation this way, you can do it; but in the notation that Asaf himself used for $\mbox{d}(x^\mbox{T}Ax)/\mbox{d}x$, and that is used on Wikipedia, it doesn't make sense to write $\mbox{d}(Ax)/\mbox{d}x^\mbox{T}$. Can we agree on that? (2011-02-06)