
Consider $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ and $g:\mathbb{R}^m\rightarrow\mathbb{R}^k$. Then $(g\circ f):\mathbb{R}^n\rightarrow\mathbb{R}^k$ and, if both of them are differentiable, $[D(g\circ f)_p]=(Dg)_{f(p)}\cdot (Df)_p$. If these functions are twice differentiable, then

$D(D(g\circ f))_p=(D^2g)_{f(p)}\cdot (Df)^2_p + (Dg)_{f(p)}\cdot (D^2 f)_p$.

I'm trying to figure out what $(Df)^2_p$ means. Since $(Df)_p$ is an $m\times n$ matrix, I cannot multiply it by itself. Can someone help me?

  • @Bill Gustavo is asking about $(Df)_p^2$, not $(D^2f)_p$. (2012-04-08)

1 Answer


Let $p = (x_1,\ldots,x_n)$, $f(p) = (y_1,\ldots,y_m)$, and $(g\circ f)(p)= (z_1,\ldots,z_k)$. Then the chain rule can be written $ \frac{\partial z_j}{\partial x_i} \;=\; \sum_{\alpha=1}^m \frac{\partial z_j}{\partial y_\alpha} \frac{\partial y_\alpha}{\partial x_i}. $ What you have written as $(Dg)_{f(p)}$ is the matrix of partial derivatives $\dfrac{\partial z_j}{\partial y_\alpha}$, and what you have written as $(Df)_p$ is the matrix of partial derivatives $\dfrac{\partial y_\alpha}{\partial x_i}$.
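
As a quick sanity check of this first-order formula, here is a minimal numerical sketch (not part of the original argument; the particular $f$, $g$, and point $p$ are made-up examples with $n=2$, $m=3$, $k=2$) comparing a finite-difference Jacobian of $g\circ f$ with the matrix product $(Dg)_{f(p)}\cdot(Df)_p$:

```python
# Hypothetical example maps: f : R^2 -> R^3 and g : R^3 -> R^2.
import numpy as np

def f(x):
    return np.array([x[0]*x[1], np.sin(x[0]), x[1]**2])

def g(y):
    return np.array([y[0] + y[1]*y[2], np.exp(y[0])])

def jacobian(h, x, eps=1e-6):
    """Central-difference Jacobian of h at x, one column per input variable."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = eps
        cols.append((h(x + e) - h(x - e)) / (2*eps))
    return np.column_stack(cols)

p = np.array([0.7, -1.3])
lhs = jacobian(lambda x: g(f(x)), p)        # D(g∘f)_p
rhs = jacobian(g, f(p)) @ jacobian(f, p)    # (Dg)_{f(p)} · (Df)_p
print(np.allclose(lhs, rhs, atol=1e-5))     # True, up to finite-difference error
```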

Taking the derivative again yields $ \frac{\partial^2 z_j}{\partial x_h \partial x_i} \;=\; \sum_{\alpha=1}^m\sum_{\beta=1}^m \frac{\partial^2 z_j}{\partial y_\beta\partial y_\alpha}\frac{\partial y_\beta}{\partial x_h}\frac{\partial y_\alpha}{\partial x_i}\;+\;\sum_{\alpha=1}^m \frac{\partial z_j}{\partial y_\alpha}\frac{\partial^2 y_\alpha}{\partial x_h \partial x_i}. $ What you have written as $(D^2g)_{f(p)}$ is the rank-three $k\times m\times m$ tensor of second partial derivatives $\dfrac{\partial^2 z_j}{\partial y_\beta \partial y_\alpha}$. As you can see, what you have written as $(Df)_p^2$ is the rank-four $m\times n\times m\times n$ tensor whose entries are $\dfrac{\partial y_\beta}{\partial x_h}\dfrac{\partial y_\alpha}{\partial x_i}$. That is, $(Df)_p^2$ is the tensor product (or Kronecker product) of the matrix $(Df)_p$ with itself.

In general, given an $m\times n$ matrix $A$ and a $q\times r$ matrix $B$, their tensor product is the $m\times n\times q \times r$ tensor whose $(i,j,k,\ell)$-th entry is $a_{i,j}b_{k,\ell}$. This operation is analogous to the outer product of two vectors. More generally, it is possible to take the tensor product of any rank $R$ tensor with any rank $S$ tensor to get a tensor of rank $R+S$.
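
To make the construction concrete, here is a small sketch (NumPy, with a made-up $3\times 2$ matrix $A$) of forming the rank-four tensor $A\otimes A$ with entries $a_{\beta,h}\,a_{\alpha,i}$ via `einsum`:

```python
import numpy as np

A = np.arange(6.0).reshape(3, 2)            # an example 3×2 matrix
T = np.einsum('bh,ai->bhai', A, A)          # rank-4 tensor with T[b,h,a,i] = A[b,h]*A[a,i]
print(T.shape)                              # (3, 2, 3, 2)
print(np.isclose(T[2, 1, 0, 1], A[2, 1]*A[0, 1]))   # entrywise check: True
```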

From a more algebraic point of view, $(Df)_p$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, and the second derivative $(D^2f)_p$ is a linear transformation $ (D^2f)_p : \mathbb{R}^n\otimes\mathbb{R}^n \to \mathbb{R}^m $ where $\otimes$ denotes the tensor product of vector spaces. The object $(Df)_p^2$ is the linear transformation $ (Df)_p^2 : \mathbb{R}^n\otimes\mathbb{R}^n \to \mathbb{R}^m\otimes\mathbb{R}^m $ defined by $ (Df)_p^2(v\otimes w) \;=\; (Df)_p(v) \,\otimes\, (Df)_p(w). $ Since $(D^2g)_{f(p)}$ goes from $\mathbb{R}^m \otimes \mathbb{R}^m$ to $\mathbb{R}^k$, the composition $(D^2g)_{f(p)}\cdot (Df)_p^2$ is defined, and is a linear transformation from $\mathbb{R}^n\otimes\mathbb{R}^n$ to $\mathbb{R}^k$.
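
Putting the pieces together, the following sketch (continuing the example above and reusing the hypothetical `f`, `g`, `p`, and `jacobian`; the step sizes and tolerances are arbitrary) checks the identity $D(D(g\circ f))_p=(D^2g)_{f(p)}\cdot (Df)^2_p + (Dg)_{f(p)}\cdot (D^2 f)_p$ numerically, with the contractions written out in index form exactly as in the coordinate formula:

```python
import numpy as np

def hessian(h, x, eps=1e-4):
    """Second partials ∂²h_j/∂x_a∂x_b as a tensor of shape (outputs, n, n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = np.atleast_1d(h(x)).size
    H = np.zeros((k, n, n))
    for a in range(n):
        for b in range(n):
            ea = np.zeros(n); ea[a] = eps
            eb = np.zeros(n); eb[b] = eps
            H[:, a, b] = (h(x+ea+eb) - h(x+ea-eb)
                          - h(x-ea+eb) + h(x-ea-eb)) / (4*eps**2)
    return H

Df  = jacobian(f, p)          # shape (m, n)
Dg  = jacobian(g, f(p))       # shape (k, m)
D2f = hessian(f, p)           # shape (m, n, n)
D2g = hessian(g, f(p))        # shape (k, m, m)

# (D²g)_{f(p)} · (Df)_p²: contract ∂²z_j/∂y_β∂y_α with ∂y_β/∂x_h ∂y_α/∂x_i
term1 = np.einsum('jba,bh,ai->jhi', D2g, Df, Df)
# (Dg)_{f(p)} · (D²f)_p: contract ∂z_j/∂y_α with ∂²y_α/∂x_h∂x_i
term2 = np.einsum('ja,ahi->jhi', Dg, D2f)

lhs = hessian(lambda x: g(f(x)), p)                  # D²(g∘f)_p, shape (k, n, n)
print(np.allclose(lhs, term1 + term2, atol=1e-3))    # True, up to discretization error
```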
