5

I am having trouble grokking why, assuming that the function is analytic everywhere (along with many other assumptions that I am, no doubt, naively making), the following is true:

$f(x,y)=f(x_0,y_0)+[f'_x(x_0,y_0)(x-x_0)+f'_y(x_0,y_0)(y-y_0)]+\frac{1}{2!}[f''_{xx}(x_0,y_0)(x-x_0)^2+2f''_{yx}(x_0,y_0)(x-x_0)(y-y_0)+f''_{yy}(x_0,y_0)(y-y_0)^2]+\ldots$

I am familiar with the single-variable Taylor series, and intuitively feel why the 'linear' multivariable terms should be as they are.

In short, I ask for a proof of this equality. If possible, it would be nice to have an answer free of unnecessary compaction of notation (such as table of partial derivatives).

As an auxiliary question, I see a direct analogy between the first two terms $f(x,y)=f(x_0,y_0)+[f'_x(x_0,y_0)(x-x_0)+f'_y(x_0,y_0)(y-y_0)]$ and the total differential $f(x,y)-f(x_0,y_0)=\Delta f(x,y)=f'_x(x_0,y_0)\Delta x+f'_y(x_0,y_0)\Delta y$.

When $\Delta x$ and $\Delta y$ are not infinitesimally small, can I use the third term in the multivariable Taylor series to get closer to the real total differential?

5 Answers

11

Let $\phi(\boldsymbol{r})$ be a scalar field. Then $\boldsymbol{a} \cdot \nabla \phi$ gives the directional derivative of $\phi$ in the direction of $\boldsymbol{a}$. That is,

$$\boldsymbol{a} \cdot \nabla \phi(\boldsymbol{r}) = \lim_{t\to 0} \frac{\phi(\boldsymbol{r} + \boldsymbol{a} t) - \phi(\boldsymbol{r})}{t}$$

Now consider $\Phi(t) = \phi(\boldsymbol{r}_0 + \boldsymbol{a}t)$ for some finite $t$, and expand it in powers of $t$. This is a one-dimensional Taylor series.

$$\Phi(t) = \Phi(0) + \Phi'(0)t + \frac{1}{2!} \Phi''(0) t^2 + \ldots$$

To substitute back in $\Phi(t) = \phi(\boldsymbol{r}_0+\boldsymbol{a}t)$, we must compute derivatives of $\Phi$ in terms of $\phi$. Again, we resort to the basic definition of the derivative.

$$\Phi'(0) = \lim_{t\to 0} \frac{\phi(\boldsymbol{r}_0+\boldsymbol{a}t) - \phi(\boldsymbol{r}_0)}{t} = \boldsymbol{a} \cdot \nabla \phi(\boldsymbol{r})\Big|_{\boldsymbol{r}=\boldsymbol{r}_0}$$

And similarly for higher derivatives. This enables us to write,

$$\phi(\boldsymbol{r}_0+\boldsymbol{a}t) = \phi(\boldsymbol{r}_0) + [\boldsymbol{a} \cdot \nabla \phi(\boldsymbol{r})] \Big|_{\boldsymbol{r}=\boldsymbol{r}_0} t + \frac{1}{2!} [\boldsymbol{a} \cdot \nabla][\boldsymbol{a} \cdot \nabla]\phi(\boldsymbol{r}) \Big|_{\boldsymbol{r}=\boldsymbol{r}_0} t^2 + \ldots$$

It is not difficult to show that this form reproduces the form of the original question. Take $t=1$ and let $\boldsymbol{a} = (x-x_0, y-y_0)$ and $\boldsymbol{r}_0 = (x_0, y_0)$. Thus, we have built multivariate Taylor series from the well-established case of a single variable, just by use of the directional derivative.
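To make "similarly for higher derivatives" concrete in two dimensions, here is a sketch of the second-order term. It writes $\boldsymbol{a} = (a_x, a_y)$ (the component names are notation introduced here, not taken from the answer) and assumes the second partial derivatives of $\phi$ are continuous, so that the mixed partials commute. Applying the chain rule once gives

$$\Phi'(t) = a_x\,\phi_x(\boldsymbol{r}_0+\boldsymbol{a}t) + a_y\,\phi_y(\boldsymbol{r}_0+\boldsymbol{a}t),$$

and applying it again to each term and setting $t=0$ gives

$$\Phi''(0) = a_x^2\,\phi_{xx}(\boldsymbol{r}_0) + 2a_xa_y\,\phi_{xy}(\boldsymbol{r}_0) + a_y^2\,\phi_{yy}(\boldsymbol{r}_0).$$

With $a_x = x-x_0$, $a_y = y-y_0$ and $t=1$, the term $\frac{1}{2!}\Phi''(0)$ is the quadratic term $\frac{1}{2!}[f''_{xx}\,(x-x_0)^2 + 2f''_{yx}\,(x-x_0)(y-y_0) + f''_{yy}\,(y-y_0)^2]$, with all derivatives evaluated at $(x_0,y_0)$.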

  • 0
    Concise and understandable; excellent answer, thank you. 2012-10-27
3

I think the easiest way to understand this is from the viewpoint of operators and linear transformations. A Taylor series in one dimension can be understood by exponentiating the derivative operator:

$$ f(x+a) = e^{a\frac{d}{dx}}f(x) = f(x) + af^\prime(x) + \frac{1}{2!}a^2f^{\prime\prime}(x)+... $$

You can see this in one way as follows. The infinitesimal (linear order) transformation $f(x+dx) = f(x) + dx f^\prime(x)$ is known, and we can build up the finite transformation by an infinite succession of infinitesimal transformations:

$$ f(x+a) = \lim_{N\rightarrow\infty} \left(1+\frac{a}{N}\frac{d}{dx}\right)^N f(x) = e^{a\frac{d}{dx}} f(x). $$

It is straightforward to extend this to multiple variables if we know the infinitesimal transformation (sometimes referred to as the generator), which you intuitively know as, $f(x+dx, y+dy) = f(x,y) + dx\frac{\partial}{\partial x}f(x,y) + dy\frac{\partial}{\partial y}f(x,y)$.

The finite transformation is then
$$ \begin{aligned} f(x+a,y+b) &= e^{a\frac{\partial}{\partial x}+b\frac{\partial}{\partial y}} f(x,y)\\ &= \left[1+a\frac{\partial}{\partial x} + b\frac{\partial}{\partial y} + \frac{1}{2!}\left(a^2\frac{\partial^2}{\partial x^2} + 2ab\frac{\partial^2}{\partial x\,\partial y}+ b^2\frac{\partial^2}{\partial y^2}\right) + \ldots\right]f(x,y). \end{aligned} $$
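As a rough numerical illustration of truncating this expansion, the sketch below compares the first-order and second-order approximations of $f(x+a, y+b)$ against the exact value for finite steps. The test function $f(x,y) = e^x \sin y$, the base point, and the helper name `taylor2` are arbitrary choices made for this example, and the partial derivatives are written out by hand.

```python
import math

def f(x, y):
    # Hypothetical test function, chosen only because its partials are easy.
    return math.exp(x) * math.sin(y)

def taylor2(x, y, a, b):
    """Return the first- and second-order approximations of f(x + a, y + b)."""
    ex, s, c = math.exp(x), math.sin(y), math.cos(y)
    f0 = ex * s
    fx, fy = ex * s, ex * c                    # first partial derivatives
    fxx, fxy, fyy = ex * s, ex * c, -ex * s    # second partial derivatives
    first = f0 + a * fx + b * fy
    second = first + 0.5 * (a * a * fxx + 2 * a * b * fxy + b * b * fyy)
    return first, second

x0, y0 = 0.3, 0.7
for step in (0.4, 0.2, 0.1):
    exact = f(x0 + step, y0 + step)
    first, second = taylor2(x0, y0, step, step)
    # Print the step and the absolute error of each truncation.
    print(step, abs(exact - first), abs(exact - second))
```

For this function and these step sizes, the second-order truncation lands closer to the exact change than the linear one, which is the sense in which the quadratic term improves on the total differential when the steps are not infinitesimal.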

2

Let $u \in \mathbb{R}^m, \, h \in \mathbb{R}^m, \, t \in \mathbb{R},$ and $F(t)=f(u+th).$ Suppose that $F$ can be expanded into a Taylor series $$F(t)=\sum\limits_{n=0}^{\infty}{\frac{1}{n!}}F^{(n)}(0)t^n.\tag{*}$$ The Taylor expansion for $f$ can be obtained from $({}^{*})$ by computing the derivatives $F^{(n)}(0)$ in terms of the partial derivatives of $f$ (via the chain rule) and then setting $t=1$.

For the case $m=2$ $$f(u+h)=\sum\limits_{n=0}^{\infty}{{\frac{1}{n!}}d^{n}f(u)},$$ where $u=(x, \, y),\quad h=(dx,\, dy),$ $$d^{n}f(u)=\sum\limits_{k=0}^{n}{\binom{n}{k}}\frac{\partial^n{f}}{\partial{x}^k {}\partial{y}^{n-k}}dx^kdy^{n-k}.$$
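One way to sanity-check the binomial formula for $d^n f$ is to try it on a function whose mixed partials are known in closed form. The sketch below uses $f(x,y) = e^{Px+Qy}$, for which $\partial^n f/\partial x^k \partial y^{n-k} = P^k Q^{n-k} f$; the constants `P`, `Q` and the helper names are hypothetical, chosen only for this illustration.

```python
import math

P, Q = 0.5, -1.2  # arbitrary constants for the hypothetical test function

def f(x, y):
    return math.exp(P * x + Q * y)

def mixed_partial(x, y, k, j):
    """d^(k+j) f / dx^k dy^j for f = exp(P*x + Q*y), known in closed form."""
    return P**k * Q**j * f(x, y)

def d_n(x, y, dx, dy, n):
    """n-th differential of f at (x, y), built from the binomial formula."""
    return sum(
        math.comb(n, k) * mixed_partial(x, y, k, n - k) * dx**k * dy**(n - k)
        for k in range(n + 1)
    )

def taylor_sum(x, y, dx, dy, order):
    """Partial sum of (1/n!) d^n f(u) for n = 0, ..., order."""
    return sum(d_n(x, y, dx, dy, n) / math.factorial(n) for n in range(order + 1))

# Compare a partial sum of the series with the exact shifted value f(u + h).
print(taylor_sum(1.0, 2.0, 0.3, -0.2, 20), f(1.3, 1.8))
```

For this particular function, $d^n f(u)$ collapses to $(P\,dx + Q\,dy)^n f(u)$ by the binomial theorem, so the series sums to $e^{P\,dx + Q\,dy} f(u) = f(u+h)$, which is what the comparison above checks numerically.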

  • 0
    Isn't that just an explanation for the single-variable expansion, or is it a generalisation for the m-variable expansion? If so, why is $F''(t)=f''_{xx}(x,y) (x)^2+f''_{yx}(x,y)(x)(y)+f''_{yy}(x,y)(y)^2$? Most possibly a mundane thing, but this is what I'm having trouble with. I'm happy with your definition as a generalisation, but not how to apply it. 2012-10-26
  • 0
    But *why* is the last equality correct? 2012-10-27
  • 0
    Because the sequence of partial derivatives you take to get an nth-order approximation could be anything you want. If there are two variables, then the number of distinct sequences becomes the binomial coefficient. Another way to think of it is that, in order to isolate the coefficients, we must take partial derivatives of (y-y0)^r and (x-x0)^(n-r), r and n-r times respectively, then divide by the factorials this creates. These factorial coefficients can be nicely factored by matrices. 2017-07-30
0

If you know the definition of gradient vectors, you can actually get a more concise answer. You can check it out: http://www.math.ucdenver.edu/~esulliva/Calculus3/Taylor.pdf.

-2

Intuitively it's quite clear: the multivariable analog of the first derivative is the gradient, which, evaluated at R0, gives exactly the second (linear) term. The second derivative generalizes to the Hessian, which is best represented in matrix form, and that gives your third (quadratic) term... The trick in deriving this is to define s = R - R0, so that g(s) = f(R0 + s), and use the chain rule/mean value theorem as usual.