Trying to get the derivative of $(X \beta)^t (X \beta)$, where $X$ is an $N \times P$ matrix, $\beta$ is a $P \times 1$ vector, and $t$ denotes transpose.
What is the derivative of $(X \beta)^t (X \beta)$
What is the variable and what is the constant? – 2017-01-14
3 Answers
Using the Matrix Cookbook (PDF):
Derivative with respect to $X$ is $2X\beta \beta^\top$; see (77).
Derivative with respect to $\beta$ is $2X^\top X \beta$; see (81).
By hand:
Derivative with respect to $\beta$: let $A:=X^\top X$, and note $\beta^\top A \beta = \sum_i \sum_j A_{ij} \beta_i \beta_j$. The partial derivative with respect to $\beta_k$ is $2 \sum_{i \ne k} A_{ik} \beta_i + 2A_{kk}\beta_k = 2 \sum_i A_{ik} \beta_i$, so the derivative with respect to $\beta$ is $2A\beta$.
Derivative with respect to $X$: let $B=\beta\beta^\top$ and note $(X\beta)^\top (X\beta) = \operatorname{Tr}(X BX^\top) = \sum_i x_i^\top B x_i$, where $x_i^\top$ is the $i$th row of $X$. By using the result of the previous paragraph and transposing (valid since $B$ is symmetric), we see that the derivative with respect to $x_i^\top$ is $(2B x_i)^\top=2x_i^\top B$. Then the derivative with respect to $X$ is $2XB$.
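Both formulas are easy to sanity-check numerically against central finite differences; here is a short NumPy sketch (the random data and tolerances are arbitrary choices, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 5, 3
X = rng.standard_normal((N, P))
beta = rng.standard_normal(P)

# f(X, β) = (Xβ)ᵀ(Xβ), a scalar
f = lambda X, b: (X @ b) @ (X @ b)

# Analytic gradients from the answer above
grad_beta = 2 * X.T @ X @ beta            # 2Aβ with A = XᵀX
grad_X = 2 * X @ np.outer(beta, beta)     # 2XB with B = ββᵀ

# Central finite differences
eps = 1e-6
fd_beta = np.array([(f(X, beta + eps * np.eye(P)[k]) - f(X, beta - eps * np.eye(P)[k])) / (2 * eps)
                    for k in range(P)])
fd_X = np.zeros_like(X)
for i in range(N):
    for j in range(P):
        E = np.zeros_like(X)
        E[i, j] = eps
        fd_X[i, j] = (f(X + E, beta) - f(X - E, beta)) / (2 * eps)

print(np.allclose(grad_beta, fd_beta, atol=1e-4))
print(np.allclose(grad_X, fd_X, atol=1e-4))
```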
If $X$ is the variable, consider $f(X,Y)=(X\beta)^t(Y\beta)$. It is bilinear, so its derivative at $(X,Y)$ is $df_{(X,Y)}(U,V)=(U\beta)^t(Y\beta)+(X\beta)^t(V\beta)$.
This implies that the derivative of $g(X)=f(X,X)$ is $df_X(U)=(U\beta)^t(X\beta)+(X\beta)^t(U\beta)$; since each term is a scalar equal to its own transpose, this simplifies to $df_X(U)=2(X\beta)^t(U\beta)$.
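The directional-derivative formula can be checked against a difference quotient of $g$; a minimal NumPy sketch, with hypothetical random $X$, $U$, and $\beta$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 4, 2
X = rng.standard_normal((N, P))
U = rng.standard_normal((N, P))   # an arbitrary direction
beta = rng.standard_normal(P)

# g(X) = (Xβ)ᵀ(Xβ)
g = lambda X: (X @ beta) @ (X @ beta)

# Directional derivative from the bilinear formula: (Uβ)ᵀ(Xβ) + (Xβ)ᵀ(Uβ)
df = (U @ beta) @ (X @ beta) + (X @ beta) @ (U @ beta)

# Symmetric difference quotient of g along U
h = 1e-6
fd = (g(X + h * U) - g(X - h * U)) / (2 * h)

print(abs(df - fd))
```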
Assuming that the context is Linear regression / Least squares.
Note that $(X\beta)'(X\beta) = \beta' X'X\beta$; denote $A=X'X$. Hence you have a quadratic form $$ f(\beta) = \beta ' A \beta, $$ so the gradient w.r.t. $\beta$ is $$ \nabla f(\beta)=2A\beta . $$ You can derive the formula by writing $f(\beta) = \sum_{i=1}^p \sum _{j=1}^pa_{ij}\beta_i\beta_j = \sum_{i=1}^pa_{ii}\beta_i^2+2\sum_{i>j}^pa_{ij}\beta_i\beta_j$ and taking the derivative w.r.t. each entry of $\beta$.
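A quick numeric check of this answer: the symmetric expansion agrees with $\beta' A \beta$, and $2A\beta$ matches finite differences (random data below is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
P = 4
X = rng.standard_normal((6, P))
A = X.T @ X                       # A = X'X is symmetric
beta = rng.standard_normal(P)

# Quadratic form f(β) = β'Aβ
f = lambda b: b @ A @ b

# Symmetric expansion: Σ a_ii β_i² + 2 Σ_{i>j} a_ij β_i β_j
expanded = sum(A[i, i] * beta[i]**2 for i in range(P)) \
         + 2 * sum(A[i, j] * beta[i] * beta[j] for i in range(P) for j in range(i))
print(np.isclose(f(beta), expanded))

# Gradient ∇f(β) = 2Aβ versus central finite differences
grad = 2 * A @ beta
eps = 1e-6
fd = np.array([(f(beta + eps * np.eye(P)[k]) - f(beta - eps * np.eye(P)[k])) / (2 * eps)
               for k in range(P)])
print(np.allclose(grad, fd, atol=1e-4))
```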