
Consider $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ and $g:\mathbb{R}^m\rightarrow\mathbb{R}^k$. Then $(g\circ f):\mathbb{R}^n\rightarrow\mathbb{R}^k$ and, if both of them are differentiable, $[D(g\circ f)_p]=(Dg)_{f(p)}\cdot (Df)_p$. If these functions are two times differentiable, then

$(D^2(g\circ f))_p=(D^2g)_{f(p)}\cdot (Df)^2_p + (Dg)_{f(p)}\cdot (D^2 f)_p$.

I'm trying to figure out what $(Df)^2_p$ means. Since $(Df)_p$ is an $m\times n$ matrix, I cannot multiply it by itself. Can someone help me?

  • It's a tensor. To get an idea of what's going on: say we have $f(x,y,z)$ (scalar valued). Then $Df=\nabla f = \langle f_x,f_y,f_z \rangle$ (a vector of functions). $D^2f=H$ (the Hessian matrix), which is a $3 \times 3$ matrix filled with second partials. $D^3f$ is a $3 \times 3 \times 3$ cube filled with third partials, etc. So if your first derivative $Df$ is an $m \times n$ matrix, $D^2f$ is a vector of matrices (or, better, start using tensors). (2012-04-08)
  • Are you sure? Because what I have in my equation is $(Df)_p\cdot (Df)_p$, not $(D^2 f)_p$. I know what $(D^2 f)_p$ is, but this product is what is taking my sleep away at night :) (2012-04-08)
  • Yes, I'm sure. To make sense out of "multiplying" these things using matrices and such, you'll need to pull apart $D(f)$ into a big column vector so that $D^2(f)$ is again a matrix. But in the end this is not a great way to handle such matters. (2012-04-08)
  • @Bill Gustavo is asking about $(Df)_p^2$, not $(D^2f)_p$. (2012-04-08)

1 Answer


Let $p = (x_1,\ldots,x_n)$, $f(p) = (y_1,\ldots,y_m)$, and $(g\circ f)(p)= (z_1,\ldots,z_k)$. Then the chain rule can be written $$ \frac{\partial z_j}{\partial x_i} \;=\; \sum_{\alpha=1}^m \frac{\partial z_j}{\partial y_\alpha} \frac{\partial y_\alpha}{\partial x_i} $$ What you have written as $(Dg)_{f(p)}$ is the matrix of partial derivatives $\dfrac{\partial z_j}{\partial y_\alpha}$, and what you have written as $(Df)_p$ is the matrix of partial derivatives $\dfrac{\partial y_\alpha}{\partial x_i}$.
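As a quick numerical sanity check, here is the matrix form $D(g\circ f)_p = (Dg)_{f(p)}\cdot (Df)_p$ verified in NumPy for a pair of illustrative polynomial maps chosen just for this sketch (they are not from the question itself):

```python
import numpy as np

# Illustrative example with n = m = 2, k = 1:
#   f(x) = (x0^2 * x1, x0 + x1^2),   g(y) = (y0 * y1),
# so (g o f)(x) = x0^3 * x1 + x0^2 * x1^3.

x = np.array([1.0, 2.0])
y = np.array([x[0]**2 * x[1], x[0] + x[1]**2])        # f(x)

# Hand-computed Jacobians.
Df = np.array([[2*x[0]*x[1], x[0]**2],
               [1.0,         2*x[1]]])                # m x n
Dg = np.array([[y[1], y[0]]])                         # k x m

# Jacobian of the composition, differentiated directly.
Dh = np.array([[3*x[0]**2*x[1] + 2*x[0]*x[1]**3,
                x[0]**3 + 3*x[0]**2*x[1]**2]])        # k x n

print(np.allclose(Dh, Dg @ Df))                       # → True
```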

Taking the derivative again yields $$ \frac{\partial^2 z_j}{\partial x_h \partial x_i} \;=\; \sum_{\alpha=1}^m\sum_{\beta=1}^m \frac{\partial^2 z_j}{\partial y_\beta\partial y_\alpha}\frac{\partial y_\beta}{\partial x_h}\frac{\partial y_\alpha}{\partial x_i}\;+\;\sum_{\alpha=1}^m \frac{\partial z_j}{\partial y_\alpha}\frac{\partial^2 y_\alpha}{\partial x_h \partial x_i} $$ What you have written as $(D^2g)_{f(p)}$ is the rank-three $k\times m\times m$ tensor of second partial derivatives $\dfrac{\partial^2 z_j}{\partial y_\beta \partial y_\alpha}$. As you can see, what you have written as $(Df)_p^2$ is the rank-four $m\times n\times m\times n$ tensor whose entries are $\dfrac{\partial y_\beta}{\partial x_h}\dfrac{\partial y_\alpha}{\partial x_i}$. That is, $(Df)_p^2$ is the tensor product (or Kronecker product) of the matrix $(Df)_p$ with itself.
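Writing the contraction with `numpy.einsum` makes the index pattern of this formula explicit. The maps below are hypothetical polynomial examples (not from the question), chosen so that every derivative can be written down by hand:

```python
import numpy as np

# Illustrative maps with n = m = 2, k = 1:
#   f(x) = (x0^2 * x1, x0 + x1^2),   g(y) = (y0 * y1).
x = np.array([1.0, 2.0])
y = np.array([x[0]**2 * x[1], x[0] + x[1]**2])

Df  = np.array([[2*x[0]*x[1], x[0]**2],
                [1.0,         2*x[1]]])               # dy_a / dx_i
D2f = np.array([[[2*x[1], 2*x[0]], [2*x[0], 0.0]],    # d2 y_0 / dx_h dx_i
                [[0.0,    0.0   ], [0.0,    2.0]]])   # d2 y_1 / dx_h dx_i
Dg  = np.array([[y[1], y[0]]])                        # dz_j / dy_a
D2g = np.array([[[0.0, 1.0], [1.0, 0.0]]])            # d2 z_j / dy_b dy_a

# Second derivative of h = g o f, computed directly from
# h(x) = x0^3 * x1 + x0^2 * x1^3.
D2h = np.array([[[6*x[0]*x[1] + 2*x[1]**3,     3*x[0]**2 + 6*x[0]*x[1]**2],
                 [3*x[0]**2 + 6*x[0]*x[1]**2,  6*x[0]**2*x[1]]]])

# The two terms of the second-order chain rule:
term1 = np.einsum('jba,bh,ai->jhi', D2g, Df, Df)   # (D^2 g) . (Df)^2
term2 = np.einsum('ja,ahi->jhi', Dg, D2f)          # (Dg) . (D^2 f)

print(np.allclose(D2h, term1 + term2))             # → True
```

The subscript string `'jba,bh,ai->jhi'` is exactly the double sum over $\beta$ and $\alpha$ in the displayed formula.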

In general, given an $m\times n$ matrix $A$ and a $q\times r$ matrix $B$, their tensor product is the $m\times n\times q \times r$ tensor whose $(i,j,k,\ell)$-th entry is $a_{i,j}b_{k,\ell}$. This operation is analogous to the outer product of two vectors. More generally, it is possible to take the tensor product of any rank $R$ tensor with any rank $S$ tensor to get a tensor of rank $R+S$.
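In NumPy terms (an illustration, not part of the original answer), this rank-four tensor is an outer product of the two matrices; it carries the same entries as `np.kron(A, B)`, only arranged on four axes instead of two:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # m x n
B = rng.standard_normal((4, 5))   # q x r

# Rank-four tensor product: T[i, j, k, l] = A[i, j] * B[k, l].
T = np.einsum('ij,kl->ijkl', A, B)

# The Kronecker product stores the same numbers in an (m*q) x (n*r) matrix:
#   kron(A, B)[i*q + k, j*r + l] = A[i, j] * B[k, l].
K = np.kron(A, B).reshape(2, 4, 3, 5).transpose(0, 2, 1, 3)

print(np.allclose(T, K))   # → True
```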

From a more algebraic point of view, $(Df)_p$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, and the second derivative $(D^2f)_p$ is a linear transformation $$ (D^2f)_p : \mathbb{R}^n\otimes\mathbb{R}^n \to \mathbb{R}^m $$ where $\otimes$ denotes the tensor product of vector spaces. The object $(Df)_p^2$ is the linear transformation $$ (Df)_p^2 : \mathbb{R}^n\otimes\mathbb{R}^n \to \mathbb{R}^m\otimes\mathbb{R}^m $$ defined by $$ (Df)_p^2(v\otimes w) \;=\; (Df)_p(v) \,\otimes\, (Df)_p(w) $$ Since $(D^2g)_{f(p)}$ goes from $\mathbb{R}^m \otimes \mathbb{R}^m$ to $\mathbb{R}^k$, the composition $(D^2g)_{f(p)}\cdot (Df)_p^2$ is defined, and is a linear transformation from $\mathbb{R}^n\otimes\mathbb{R}^n$ to $\mathbb{R}^k$.
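Identifying $\mathbb{R}^n\otimes\mathbb{R}^n$ with $\mathbb{R}^{n^2}$, the map $(Df)_p^2$ is represented by the Kronecker product of the Jacobian matrix with itself, and the defining property above becomes the mixed-product identity $(A\otimes A)(v\otimes w) = Av\otimes Aw$. A quick check with an arbitrary matrix standing in for $(Df)_p$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))   # stands in for the Jacobian (Df)_p, m x n
v = rng.standard_normal(2)
w = rng.standard_normal(2)

# np.kron of two vectors is v ⊗ w flattened into R^(n^2).
lhs = np.kron(A, A) @ np.kron(v, w)   # (Df)^2 applied to v ⊗ w
rhs = np.kron(A @ v, A @ w)           # (Df)(v) ⊗ (Df)(w)

print(np.allclose(lhs, rhs))          # → True
```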

  • The first line seems to be incorrect; I guess you meant $\frac{\partial (g\circ f)}{\partial x_i}$. (2013-11-13)
  • @Theorem I don't see a problem. Can you be more specific? (2013-11-13)