
I was reading this to review the derivation of the ordinary least squares estimator but I'm having trouble differentiating (4). Can someone please help explain why

$ \dfrac{\partial (\hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}} = 2X'X\hat{\beta} $

I understand all the other steps.

Thanks!

2 Answers


Notice first that you might guess this formula because if $X$ and $\hat{\beta}$ are $1 \times 1$ then it reduces to the formula for the derivative of $x^2$ from calculus.
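Concretely, in the $1 \times 1$ case write $X = x$ and $\hat{\beta} = b$ for scalars; then $\hat{\beta}'X'X\hat{\beta} = x^2 b^2$ and $\dfrac{d}{db}\left(x^2 b^2\right) = 2x^2 b = 2X'X\hat{\beta}$, which is exactly the claimed formula.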

Let's derive a multivariable product rule that will help us here. Suppose $f:\mathbb{R}^n \to \mathbb{R}$ and suppose $f(x) = \langle g(x), h(x) \rangle$ for some functions $g:\mathbb{R}^n \to \mathbb{R}^m$ and $h:\mathbb{R}^n \to \mathbb{R}^m$. Then if $\Delta x \in \mathbb{R}^n$ is small (and $g$ and $h$ are differentiable at $x$), we have
$$\begin{align*} f(x + \Delta x) &\approx \langle g(x) + g'(x) \Delta x, h(x) + h'(x) \Delta x \rangle \\ &= \langle g(x),h(x) \rangle + \langle g(x), h'(x) \Delta x \rangle + \langle g'(x) \Delta x, h(x) \rangle + \langle g'(x) \Delta x, h'(x) \Delta x \rangle \\ &\approx \langle g(x),h(x) \rangle + \langle g(x), h'(x) \Delta x \rangle + \langle g'(x) \Delta x, h(x) \rangle \\ &= \langle g(x),h(x) \rangle + \langle h'(x)^T g(x), \Delta x \rangle + \langle g'(x)^T h(x), \Delta x \rangle \\ &= f(x) + \langle h'(x)^T g(x) + g'(x)^T h(x), \Delta x \rangle. \end{align*}$$
Comparing this result with $f(x + \Delta x) \approx f(x) + \langle \nabla f(x), \Delta x \rangle$, we discover that
$$\nabla f(x) = h'(x)^T g(x) + g'(x)^T h(x).$$
This is our product rule. (I'm using the convention that the gradient is a column vector, which is not completely standard.)
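If you want to convince yourself numerically, here is a minimal sketch (assuming NumPy; the matrices, sizes, and seed are just illustrative) that checks the product rule in the linear case $g(x) = Ax$, $h(x) = Bx$, where the rule predicts $\nabla f(x) = B^T A x + A^T B x$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # g(x) = A x, so g'(x) = A
B = rng.normal(size=(5, 3))   # h(x) = B x, so h'(x) = B
x = rng.normal(size=3)

f = lambda v: (A @ v) @ (B @ v)   # f(x) = <g(x), h(x)>

# central finite-difference gradient of f at x
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

# product rule: grad f(x) = h'(x)^T g(x) + g'(x)^T h(x)
grad_rule = B.T @ (A @ x) + A.T @ (B @ x)

print(np.allclose(grad_fd, grad_rule, atol=1e-5))   # True
```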

Now let $g(x) = Ax$ for some matrix $A$. So $g'(x) = A$. What's the gradient of the function
$$\begin{align*} f(x) &= \langle g(x),g(x) \rangle \\ &= \langle Ax, Ax \rangle \\ &= x^T A^T A x \quad \text{?} \end{align*}$$

By our product rule (applied with $h = g$), the answer is
$$\begin{align*} \nabla f(x) &= 2g'(x)^T g(x) \\ &= 2 A^T A x. \end{align*}$$

Taking $A = X$ and $x = \hat{\beta}$, this is the result that you wanted to derive.
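If you want to double-check this numerically, here is a minimal sanity check (assuming NumPy; the dimensions and seed are arbitrary) comparing a finite-difference gradient of $\hat{\beta}'X'X\hat{\beta}$ with $2X'X\hat{\beta}$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))   # design matrix
b = rng.normal(size=4)         # plays the role of beta-hat

f = lambda v: v @ X.T @ X @ v  # f(b) = b' X'X b

# central finite-difference gradient of f at b
eps = 1e-6
grad_fd = np.array([(f(b + eps * e) - f(b - eps * e)) / (2 * eps)
                    for e in np.eye(4)])

print(np.allclose(grad_fd, 2 * X.T @ X @ b, atol=1e-4))   # True
```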


First, note that $X'X$ is a symmetric matrix, since $(X'X)' = X'(X')' = X'X$.

Now, using basic matrix differentiation, we know that for a symmetric matrix $A$, $\dfrac{\partial (x'Ax)}{\partial x} = 2Ax$ (as a column vector; equivalently $2x'A$ as a row vector), where $x$ is a vector of conforming dimension. Taking $A = X'X$ and $x = \hat{\beta}$, the result follows.

Note: actually we can say more. If $A$ is any $n \times n$ matrix and $x$ is an $n \times 1$ vector, then

$ \dfrac{\partial (x'Ax)}{\partial x} = x'(A'+A)$

Proof: $x'Ax = \sum_{j=1}^n \sum_{i=1}^n a_{ij}x_ix_j$. Differentiating with respect to $x_k$, we get
$$\dfrac{\partial (x'Ax)}{\partial x_k}= \sum_{j=1}^n a_{kj}x_j + \sum_{i=1}^n a_{ik}x_i, \qquad k = 1,\dots,n,$$
and collecting these components gives $\dfrac{\partial (x'Ax)}{\partial x} = x'A'+x'A$.

Hence the result follows.
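As a quick numerical check of the general (not necessarily symmetric) case, here is a minimal sketch assuming NumPy; it compares a finite-difference gradient of $x'Ax$ with $(A'+A)x$, the column-vector form of $x'(A'+A)$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))    # deliberately NOT symmetric
x = rng.normal(size=4)

f = lambda v: v @ A @ v        # f(x) = x' A x

# central finite-difference gradient of f at x
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(4)])

print(np.allclose(grad_fd, (A.T + A) @ x, atol=1e-5))   # True
```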

More details on matrix differentiation can be found at http://en.wikipedia.org/wiki/Matrix_calculus.

Thanks.