
I am trying to understand Theorem 9.1 of the 1991 edition of Munkres' Analysis on Manifolds. I have stated what I don't understand below, under the heading in bold. This theorem is a precursor to the implicit function theorem, and it appears on page 73 of my copy.

Now on page 72 he states the following definition:

Let $A$ be open in $\Bbb{R}^m$; let $f : A \rightarrow \Bbb{R}^n$ be differentiable. Let $f_1,\ldots,f_n$ be the component functions of $f$. We sometimes use the notation $Df = \frac{\partial(f_1,\ldots,f_n)}{\partial(x_1,\ldots,x_m)}$ for the derivative of $f$. On occasion we shorten this to the notation $Df = \partial f /\partial \Bbb{x}$.

This is all good, so now on to theorem 9.1 (which is where my confusion lies).

Theorem 9.1: Let $A$ be open in $\Bbb{R}^{k+n}$; let $f : A \rightarrow \Bbb{R}^n $ be differentiable. Write $f$ in the form $f(\Bbb{x},\Bbb{y})$, for $\Bbb{x} \in \Bbb{R}^k$ and $\Bbb{y} \in \Bbb{R}^n$; then $Df$ has the form $Df = \Big[ \partial f/\partial \Bbb{x} \hspace{5mm} \partial f / \partial \Bbb{y}\Big].$ Suppose there is a differentiable function $g : B \rightarrow \Bbb{R}^n$ defined on an open set $B$ in $\Bbb{R}^k$, such that $f(\Bbb{x},g(\Bbb{x})) = 0$ for all $\Bbb{x} \in B$. Then for $\Bbb{x} \in B$, $ \frac{\partial f}{\partial \Bbb{x}}(\Bbb{x},g(\Bbb{x})) + \frac{\partial f}{\partial \Bbb{y}}(\Bbb{x},g(\Bbb{x}))\cdot Dg(\Bbb{x}) = 0.$

The dot just before $Dg(\Bbb{x})$ means matrix multiplication.

Now the proof of this goes as follows: given $g$, we can define $h : B \rightarrow \Bbb{R}^{k+n}$ by the equation

$h(\Bbb{x}) = (\Bbb{x},g(\Bbb{x})).$

The hypotheses of the theorem then imply that the composite function $f(h(\Bbb{x})) = f(\Bbb{x},g(\Bbb{x}))$ is defined and equals zero for all $\Bbb{x} \in B$. The chain rule then implies that

$\begin{eqnarray*} 0 &=& Df(h(\Bbb{x}))\cdot Dh(\Bbb{x})\\ &=& \Big[\frac{\partial f}{\partial \Bbb{x}}(h(\Bbb{x})) \hspace{4mm} \frac{\partial f}{\partial \Bbb{y}}(h(\Bbb{x})) \Big] \cdot \left[\begin{array}{c} I_k \\ Dg(\Bbb{x}) \end{array}\right] \end{eqnarray*}.$
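Carrying out the block multiplication then gives

$\frac{\partial f}{\partial \Bbb{x}}(h(\Bbb{x}))\cdot I_k + \frac{\partial f}{\partial \Bbb{y}}(h(\Bbb{x}))\cdot Dg(\Bbb{x}) = 0,$

which is exactly the equation asserted in the theorem.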

What I don't understand: In the last row above, I get the second matrix on the right hand side, the one involving the identity matrix. However, for the first matrix, I can see the notation means that it is formed by concatenating two matrices together, one from $\frac{\partial f}{\partial \Bbb{x}}(h(\Bbb{x}))$ and the other from $\frac{\partial f}{\partial \Bbb{y}}(h(\Bbb{x}))$. My problem now is that I don't even know what these matrices look like.

I have tried several ways to interpret them, but keep getting tied up. Also, the second matrix on the right has dimensions

$(n + k) \times k$

yes? But if this were so, then how can $Df(h(\Bbb{x}))$ be a map from $\Bbb{R}^{k+n}$ to $\Bbb{R}^n$?

Thanks.


1 Answer


The map is $Df$, which takes a variable in $\Bbb{R}^{k+n}$ and returns a value in $\Bbb{R}^n$; that is to say, because $Df$ is linear, it is an $n \times (k+n)$ matrix. The point $h(x)$ is the point at which the partial derivatives are evaluated (when we turn the matrix entries into actual real numbers); it is not "the argument of the linear function".

Note that the composition $f \circ h$ maps $\Bbb{R}^k \to \Bbb{R}^n$, so we would expect $D(f \circ h)$ to be an $n \times k$ matrix, which it is.
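If it helps, here is a quick numerical sanity check of these shapes (a sketch using a made-up $f$ and $g$, not anything from Munkres): with $k = 2$ and $n = 1$, $Df$ should be $1 \times 3$, $Dg$ should be $1 \times 2$, and $D(f \circ h)$ should be $1 \times 2$, i.e. $n \times k$.

```python
import numpy as np

# Hypothetical example (not from the book): k = 2, n = 1, with
# f(x1, x2, y) = x1^2 + x2^2 + y^2 - 1 and g(x1, x2) = sqrt(1 - x1^2 - x2^2),
# so that f(x, g(x)) = 0 identically on the open unit disk.

def f(v):                       # f : R^(k+n) -> R^n
    x1, x2, y = v
    return np.array([x1**2 + x2**2 + y**2 - 1.0])

def g(x):                       # g : R^k -> R^n
    return np.array([np.sqrt(1.0 - x[0]**2 - x[1]**2)])

def jacobian(func, v, eps=1e-6):
    """Numerical Jacobian by central differences: rows = outputs, cols = inputs."""
    v = np.asarray(v, dtype=float)
    cols = []
    for i in range(len(v)):
        dv = np.zeros_like(v)
        dv[i] = eps
        cols.append((func(v + dv) - func(v - dv)) / (2 * eps))
    return np.column_stack(cols)

x = np.array([0.3, 0.4])
hx = np.concatenate([x, g(x)])  # h(x) = (x, g(x)) in R^(k+n)

Df = jacobian(f, hx)            # n x (k+n):  1 x 3
Dg = jacobian(g, x)             # n x k:      1 x 2
Dfh = jacobian(lambda u: f(np.concatenate([u, g(u)])), x)
print(Df.shape, Dg.shape, Dfh.shape)   # (1, 3) (1, 2) (1, 2)
```

Note that `Dfh` also comes out numerically zero, since $f \circ h$ vanishes identically on its domain, which is precisely what the theorem's proof exploits.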

To make an analogy with the 1-dimensional case: when we calculate the slope of a function $f$ at a point $x=a$, we get that the slope is $f'(a)$. A 1-dimensional linear map is a function of the form:

$L(x) = cx$, for some real number $c$; that is, a $1 \times 1$ matrix. "Which number" we put in the matrix depends on the point "$a$" where we are finding the slope, but $a$ is not what we take "$L$ of".
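In code, with the toy example $f(x) = x^2$ (my choice, purely for illustration), the distinction looks like this:

```python
# Hypothetical 1-D illustration: f(x) = x^2, so the slope at a is f'(a) = 2a.
def L(x, a):
    c = 2.0 * a        # the 1x1 "matrix" entry depends on the point a...
    return c * x       # ...but L is applied to x, not to a.

print(L(5.0, 3.0))     # 30.0: the derivative at a = 3, applied to x = 5
```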

EDIT: specifically, the matrix $Df$ at the point $X = (x_1,\dots,x_k,\dots,x_{k+n})$ is given by:

$\begin{bmatrix}\frac{\partial f_1}{\partial x_1}(X)&\dots&\frac{\partial f_1}{\partial x_{k+n}}(X)\\ \vdots&\ddots&\vdots \\\frac{\partial f_n}{\partial x_1}(X)&\dots&\frac{\partial f_n}{\partial x_{k+n}}(X) \end{bmatrix}$

If we write $X = (x,y)$ for $x \in \Bbb{R}^k$, $y \in \Bbb{R}^n$, then at the point $h(x) = (x,g(x))$, $Df$ has the form:

$\begin{bmatrix}\frac{\partial f_1}{\partial x_1}(h(x))&\dots&\frac{\partial f_1}{\partial x_{k+n}}(h(x))\\ \vdots&\ddots&\vdots \\\frac{\partial f_n}{\partial x_1}(h(x))&\dots&\frac{\partial f_n}{\partial x_{k+n}}(h(x)) \end{bmatrix}$

which we can split into two "blocks":

$\begin{bmatrix} \frac{\partial f}{\partial x}(h(x))& \frac{\partial f}{\partial y}(h(x)) \end{bmatrix}$

where the first $k$ columns are the partials with respect to the first $k$ coordinates of $X$, and the last $n$ columns are the partials with respect to the last $n$ coordinates of $X$.
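To see the theorem's identity work with actual numbers, here is a small check (again a made-up example, with $k = n = 1$): take $f(x,y) = x^2 + y^2 - 1$, which vanishes along $y = g(x) = \sqrt{1 - x^2}$, and verify that $\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\cdot Dg(x) = 0$.

```python
import numpy as np

# Hypothetical example with k = 1, n = 1: the unit circle.
# f(x, y) = x^2 + y^2 - 1 vanishes along y = g(x) = sqrt(1 - x^2).

def df_dx(x, y):
    return np.array([[2.0 * x]])                     # first block,  n x k

def df_dy(x, y):
    return np.array([[2.0 * y]])                     # second block, n x n

def Dg(x):
    return np.array([[-x / np.sqrt(1.0 - x**2)]])    # n x k

x = 0.6
y = np.sqrt(1.0 - x**2)                              # g(x) = 0.8

# The theorem's identity: df/dx + (df/dy) . Dg(x) should vanish.
residual = df_dx(x, y) + df_dy(x, y) @ Dg(x)
print(residual)                                      # ~ [[0.]]
```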

  • The issue is confusing because we tend to think of the derivative of a single-valued function as another function, and not as the linear function $L(x) = f'(a)x$. You can still take this approach, but then $Df$ is a matrix-valued function of a vector-valued input, so $Df$ is essentially a "different kind of thing" than $f$. In this view $Df:\Bbb{R}^n \to \Bbb{R}^{mn}$. The trouble with this is that it obscures the fact that we can "multiply certain vectors" using matrix multiplication. 2012-06-15