
Cliff Taubes wrote in his differential geometry book that:

We now calculate the directional derivatives of the map $M\rightarrow M^{-1}$. Let $\alpha\in M(n,\mathbb{R})$ denote any given matrix. Then the directional derivatives of the coordinates of the map $M\rightarrow M^{-1}$ in the direction $\alpha$ are the entries of the matrix $-M^{-1}\alpha M^{-1}$. Consider, for example, the coordinate given by the $(i,j)$th entry, $(M^{-1})_{ij}$. The directional derivative in the direction $\alpha$ of this function on $GL(n,\mathbb{R})$ is $-(M^{-1}\alpha M^{-1})_{ij}$. In particular, the partial derivative of the function $M\rightarrow (M^{-1})_{ij}$ with respect to the coordinate $M_{rs}$ is $-(M^{-1})_{ir}(M^{-1})_{sj}$.

I am wondering why this is true. He did not give any derivation of this formula, and none of the formulas I know for the matrix inverse produces anything resembling his result. So I venture to ask.

  • 0
    I would believe that, since $M^{-1}_{ij}=\frac{1}{\det(M)}(-1)^{i+j}A_{ij}$. But this would lead to something like $A*e_{ij}$, which is not what the formula is. (2012-09-03)

2 Answers

12

I'm not sure if this is the type of answer you want, since I'm giving another argument rather than explaining his. However, this is how I usually think of it.

Let $M$ be a matrix and $\delta M$ an infinitesimal perturbation of it (e.g. $\epsilon$ times the derivative). Now, let $N=M^{-1}$ and let $\delta N$ be the corresponding perturbation of the inverse, so that $N+\delta N=(M+\delta M)^{-1}$. Keeping only first-order perturbations (i.e. ignoring terms with two $\delta$s), this gives
$$\begin{split}
I&=(M+\delta M)(N+\delta N)=MN+M\,\delta N+\delta M\,N\\
&\implies M\,\delta N=-\delta M\,N=-\delta M\,M^{-1}\\
&\implies \delta N=-M^{-1}\,\delta M\,M^{-1}.
\end{split}$$
Written in terms of derivatives, i.e. $M'=dM/ds$ and $N'=dN/ds$ where $M=M(s)$, $N=N(s)$, and $M(s)N(s)=I$, the same computation reads
$$0=I'=(MN)'=M'N+MN'\implies N'=-M^{-1}\,M'\,M^{-1}.$$
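As a quick numerical sanity check (my addition, not part of the argument above), here is a small NumPy sketch comparing a finite-difference derivative of $s\mapsto M(s)^{-1}$ against the closed form $-M^{-1}M'M^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally shifted, so invertible
Mdot = rng.standard_normal((n, n))               # M'(s): an arbitrary direction
eps = 1e-6

# Finite-difference derivative of s -> (M + s*Mdot)^{-1} at s = 0
fd = (np.linalg.inv(M + eps * Mdot) - np.linalg.inv(M)) / eps

# Closed form from the answer: N' = -M^{-1} M' M^{-1}
Minv = np.linalg.inv(M)
exact = -Minv @ Mdot @ Minv

print(np.max(np.abs(fd - exact)))  # on the order of eps: first-order agreement
```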


To address some of the comments, although a bit belatedly:

For example, if you let $M(s)=M+s\,\Delta M$, then the derivative is $M'(s)=\Delta M$ for all $s$. This gives $N(s)=M(s)^{-1}=(M+s\,\Delta M)^{-1}$, and you can differentiate the identity $M(s)\,N(s)=I$ to get the expressions above.

For any partial derivative, e.g. with respect to $M_{rs}$, just set $\Delta M$ to be the matrix $E^{[rs]}$ with $1$ in cell $(r,s)$ and zeros elsewhere. This gives
$$\frac{\partial}{\partial M_{rs}} M^{-1} = -M^{-1}\frac{\partial M}{\partial M_{rs}} M^{-1} = -M^{-1} E^{[rs]} M^{-1},$$
so cell $(i,j)$ of the inverse satisfies
$$\frac{\partial (M^{-1})_{ij}}{\partial M_{rs}} = -(M^{-1})_{ir}(M^{-1})_{sj}.$$
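Continuing the numerical sketch (again my addition), the entrywise formula can be checked directly by perturbing a single entry $M_{rs}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n)) + n * np.eye(n)
Minv = np.linalg.inv(M)
eps = 1e-6
i, j, r, s = 0, 2, 1, 3  # arbitrary choice of indices

# Perturb only entry (r, s): this is M + eps * E^{[rs]}
Mp = M.copy()
Mp[r, s] += eps
fd = (np.linalg.inv(Mp)[i, j] - Minv[i, j]) / eps  # finite-difference partial

exact = -Minv[i, r] * Minv[s, j]  # -(M^{-1})_{ir} (M^{-1})_{sj}
print(fd, exact)  # agree to roughly 6-8 significant digits
```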

  • 0
    @GeorgesElencwajg: This is a very helpful remark, because the reason I was confused by the problem is basically how to write the derivative in coordinates. Your suggestion clarified everything. (2012-09-04)
8

I have the following result. I am assuming you already proved that the inversion map (I will call it $f$) is differentiable. We will look at the total derivative $Df(A)$ at $A\in GL(n,\mathbb{R})$.

Take the identity map $Id:GL(n,\mathbb{R})\to GL(n,\mathbb{R}):A\mapsto A$ and the map $g:GL(n,\mathbb{R})\to GL(n,\mathbb{R}):A\mapsto A\cdot A^{-1}=I_n$. Note that the derivative of $Id$ is $DId(A)(H)=Id(H)=H$ for $A,H\in GL(n,\mathbb{R})$, since $Id$ is a linear map. Furthermore, note that $g=Id\cdot f$ and that, since $g$ is a constant map, its derivative is the zero map. Here I use the following result, which I will prove later on:

Let $h,k:GL(n,\mathbb{R})\to GL(n,\mathbb{R})$ be differentiable at $A\in GL(n,\mathbb{R})$. Then
$$D(h\cdot k)(A)(H)=Dh(A)(H)\,k(A)+h(A)\,Dk(A)(H)\quad\text{for}\; H\in GL(n,\mathbb{R}).$$
From this it follows that
$$Dg(A)(H)=DId(A)(H)\,f(A)+Id(A)\,Df(A)(H)$$
$$0=H\cdot f(A)+A\cdot Df(A)(H)$$
$$-H\cdot A^{-1}=A\cdot Df(A)(H)$$
$$-A^{-1}HA^{-1}=Df(A)(H),$$
which is the desired result. Now we have to show that the product rule I used is true. This is a bit iffy, since I will prove it for functions on $\mathbb{R}^n$; but since there exists an isomorphism of vector spaces between the $n\times m$ matrices and the metric space $\mathbb{R}^{nm}$, I think it also holds for matrices. Input is welcome, but here it goes:
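As a concrete check of the total-derivative claim (my addition, not part of the proof), one can verify numerically that the remainder $f(A+H)-f(A)-Df(A)(H)$ shrinks faster than $\|H\|$ when $Df(A)(H)=-A^{-1}HA^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)  # invertible base point
Ainv = np.linalg.inv(A)
H = rng.standard_normal((n, n))

# The ratio ||remainder|| / ||H|| should go to 0 (roughly linearly in t)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    Ht = t * H
    # remainder = f(A + Ht) - f(A) - Df(A)(Ht), with Df(A)(Ht) = -Ainv Ht Ainv
    remainder = np.linalg.inv(A + Ht) - Ainv + Ainv @ Ht @ Ainv
    print(t, np.linalg.norm(remainder) / np.linalg.norm(Ht))
```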

Suppose we have two functions $f:U\to\mathbb{R}^{n_1n_2}$ and $g:U\to\mathbb{R}^{n_2n_3}$ that are differentiable at $x_0$, with $U\subset\mathbb{R}^m$ an open subset. Define $h:\mathbb{R}^{n_1n_2}\times\mathbb{R}^{n_2n_3}\to\mathbb{R}^{n_1n_3}:(x,y)\mapsto xy$. Note that $h$ is bilinear and thus differentiable, with derivative
$$Dh(x,y)(v,w)=h(v,y)+h(x,w)=vy+xw$$
(a nice exercise to prove this).
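The "nice exercise" can itself be sanity-checked numerically; here is a small sketch (my addition) for the bilinear map $h(x,y)=xy$:

```python
import numpy as np

rng = np.random.default_rng(3)
x, v = rng.standard_normal((2, 3, 4))  # a point and a direction in R^{3x4}
y, w = rng.standard_normal((2, 4, 5))  # a point and a direction in R^{4x5}
eps = 1e-6

# Finite-difference directional derivative of h(x, y) = x @ y
fd = ((x + eps * v) @ (y + eps * w) - x @ y) / eps

exact = v @ y + x @ w  # Dh(x, y)(v, w)
print(np.max(np.abs(fd - exact)))  # equals eps * max|v @ w|: first-order agreement
```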

We define $k:U\to\mathbb{R}^{n_1n_2}\times\mathbb{R}^{n_2n_3}:x\mapsto (f(x),g(x))$. Note that $k$ is differentiable at $x_0$ if and only if its components are. But its components are $f$ and $g$, which are differentiable at $x_0$ by assumption, so $k$ is differentiable at $x_0$. Similarly, the derivative of $k$ is the pair of derivatives of its components: $Dk(x_0)=(Df(x_0),Dg(x_0))$.

By the Chain Rule, $h\circ k$ is differentiable at $x_0$ with derivative
$$D(h\circ k)(x_0)=Dh(k(x_0))\circ Dk(x_0)=Dh(f(x_0),g(x_0))\circ (Df(x_0),Dg(x_0)),$$
and hence
$$D(h\circ k)(x_0)=Df(x_0)\,g(x_0)+f(x_0)\,Dg(x_0).$$
The last step uses the identity for the derivative of bilinear maps given earlier.

Hope this is clear and any additions to the solution are welcome!

  • 0
    Very helpful answer. Thank you. (2015-07-04)