Every definition of the total differential I see gives it as the sum of the partial derivatives multiplied by the corresponding differentials, but nowhere is there a clear explanation of why this is so.
Why is the total differential the sum of partial derivatives?
-
0Start by proving the multi-variate chain rule. – 2017-02-14
-
4If a variable is dependent on a number of other variables, then the *total change* in the former (given by the differential element of the dependent variable) must be computed by adding up the changes in all of the latter (given by the individual differential elements of the independent variables) *weighted* (by multiplication) by the respective influence of each variable (given by the partial derivatives). That's the intuitive explanation (of course not a formal rigorous one). – 2017-02-14
-
1@Deepak worthy of being an answer – 2017-02-14
-
0@Deepak "must be computed by adding up the changes in all of the latter": why must it? – 2017-02-14
-
0I don't clearly understand, when I imagine a 3D graph, why the sum of the changes along the x and y axes gives the real change along the z axis. – 2017-02-14
-
0[This question](http://math.stackexchange.com/q/1876559/265466) seems like it has a lot of relevant information. – 2017-02-14
1 Answers
The differential of a function $f:\mathbb R^m\to\mathbb R^n$ is a linear map that is the “best” approximation to the change of $f$ near some point $\mathbf p=(p^1,\dots,p^m)$, i.e., $f(\mathbf p+\mathbf h)=f(\mathbf p)+\operatorname{d}f_{\mathbf p}[\mathbf h]+o(\|\mathbf h\|)$. Restricting ourselves to a scalar-valued function $f:\mathbb R^n\to\mathbb R$, it’s fairly straightforward to show that ${\partial f\over\partial x_k}(\mathbf p)=\operatorname{d}f_{\mathbf p}[\mathbf e^k]$, where $\mathbf e^k$ is the basis vector corresponding to the $x^k$ coordinate. Since a linear map is determined by its action on the basis vectors, in this coordinate system we can write $\operatorname{d}f_{\mathbf p}$ as the row vector $\left({\partial f\over\partial x_1}(\mathbf p),\dots,{\partial f\over\partial x_n}(\mathbf p)\right)$ so that $\operatorname{d}f_{\mathbf p}[\mathbf h]$ becomes simple matrix multiplication (or, if you prefer, a dot product).
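The defining property above can be checked numerically. The following is a minimal sketch, assuming an example function $f(x,y)=x^2y+\sin y$ and a base point $\mathbf p=(1,2)$ that are chosen here purely for illustration (neither comes from the question). It compares the actual change $f(\mathbf p+\mathbf h)-f(\mathbf p)$ with the row vector of partial derivatives applied to $\mathbf h$, and records the error divided by $\|\mathbf h\|$ as $\mathbf h$ shrinks.

```python
import math

# Hypothetical example function, chosen only for illustration.
def f(x, y):
    return x**2 * y + math.sin(y)

# Its partial derivatives, computed by hand: (df/dx, df/dy).
def grad_f(x, y):
    return (2 * x * y, x**2 + math.cos(y))

def linearization_error(p, h):
    """|f(p+h) - f(p) - df_p[h]|, with df_p[h] computed as a dot product."""
    df = sum(g * hk for g, hk in zip(grad_f(*p), h))
    actual = f(p[0] + h[0], p[1] + h[1]) - f(*p)
    return abs(actual - df)

p = (1.0, 2.0)
direction = (0.6, -0.8)  # a unit vector, so ||h|| equals t below

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = (t * direction[0], t * direction[1])
    err = linearization_error(p, h)
    ratios.append(err / t)  # error relative to ||h||
    print(f"||h|| = {t:g}  error = {err:.3e}  error/||h|| = {err / t:.3e}")
```

If the row vector of partials really is the differential, the last column should shrink toward zero along with $\|\mathbf h\|$, which is what the $o(\|\mathbf h\|)$ term in the definition asserts.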
Now, the affine coordinate function $x^i$ is just the function that assigns to a point $\mathbf p$ its $i$th coordinate. It is already linear, so its differential $dx^i$ simply picks out the $i$th component of a displacement: $dx^i[\mathbf h]=h^i$. Using the above matrix formulation, this means that $dx^1=(1,0,\dots,0)$, $dx^2=(0,1,0,\dots,0)$, and so on. So we can write $\operatorname{d}f_{\mathbf p}$ as $${\partial f\over\partial x_1}(\mathbf p)(1,0,\dots,0)+\cdots+{\partial f\over\partial x_n}(\mathbf p)(0,0,\dots,1)$$ or $${\partial f\over\partial x_1}dx^1+\cdots+{\partial f\over\partial x_n}dx^n$$ (with the partial derivatives evaluated at $\mathbf p$).
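As a concrete illustration (the function and the numbers are chosen here only as an example and are not taken from the question), take $f(x,y)=x^2y$ and $\mathbf p=(1,2)$. At $\mathbf p$ we have ${\partial f\over\partial x}=2xy=4$ and ${\partial f\over\partial y}=x^2=1$, so $$\operatorname{d}f_{\mathbf p}=4\,(1,0)+1\,(0,1)=(4,1)=4\,dx+dy,\qquad \operatorname{d}f_{\mathbf p}[\mathbf h]=4h^1+h^2.$$ For the displacement $\mathbf h=(0.1,-0.05)$ this gives $\operatorname{d}f_{\mathbf p}[\mathbf h]=0.4-0.05=0.35$, while the actual change is $f(1.1,1.95)-f(1,2)=2.3595-2=0.3595$. The two differ only by a term that is small compared with $\|\mathbf h\|$.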
It might help to look at this geometrically. For a scalar-valued function $f$, this linear approximation amounts to approximating the $n$-dimensional hypersurface (in $\mathbb R^{n+1}$) $y=f(\mathbf x)$ at the point $\mathbf p$ by its tangent hyperplane at that point. Just as the derivative of $f$ gives the slope of the tangent line to the curve $y=f(x)$ in the one-dimensional case $f:\mathbb R\to\mathbb R$, in the multidimensional case each partial derivative ${\partial f\over\partial x_i}$ gives the slope of the tangent hyperplane in the $x^i$ direction. The equation of the tangent hyperplane at $\mathbf p$ is thus $$y=f(\mathbf p)+{\partial f\over\partial x_1}(x^1-p^1)+\cdots+{\partial f\over\partial x_n}(x^n-p^n)=f(\mathbf p)+\left({\partial f\over\partial x_1},\cdots,{\partial f\over\partial x_n}\right)(\mathbf x-\mathbf p),$$ with the partial derivatives evaluated at $\mathbf p$. Setting $\mathbf h=\mathbf x-\mathbf p$ and comparing this to the definition of $\operatorname{d}f_{\mathbf p}$ at the top, we again find that it can be represented as a row vector of partial derivatives, and proceed as before.
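The tangent-plane picture can also be tried out on a surface in three dimensions. This is a minimal sketch, assuming an illustrative function $f(x,y)=xy^2$ and base point $\mathbf p=(2,1)$ that are not from the question. The height gained on the tangent plane is the rise along the $x$ direction plus the rise along the $y$ direction, each being a slope (partial derivative) times a step, and the result stays close to the true surface height near $\mathbf p$.

```python
# Hypothetical example surface z = f(x, y), chosen only for illustration.
def f(x, y):
    return x * y**2

# Slopes of the surface in the x and y directions (partials computed by hand).
def partials(x, y):
    return y**2, 2 * x * y

def tangent_plane(p, x, y):
    """Height of the plane tangent to z = f(x, y) at p, evaluated at (x, y)."""
    fx, fy = partials(*p)
    return f(*p) + fx * (x - p[0]) + fy * (y - p[1])

p = (2.0, 1.0)
x, y = 2.1, 0.9  # a nearby point

fx, fy = partials(*p)
rise_x = fx * (x - p[0])  # slope in x times the step in x
rise_y = fy * (y - p[1])  # slope in y times the step in y

plane_height = tangent_plane(p, x, y)
surface_height = f(x, y)

print("rise along x:", rise_x)
print("rise along y:", rise_y)
print("tangent plane height:", plane_height)
print("surface height:", surface_height)
```

Because a plane is flat, its rises along the two axes simply add. The surface itself is curved, so its height differs from the plane's, but only by an amount that becomes negligible relative to the step size as the point approaches $\mathbf p$.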
-
0It is too difficult. I don't understand your answer from $f(\mathbf p+\mathbf h)=f(\mathbf p)+\operatorname{d}f_{\mathbf p}[\mathbf h]+o(\|\mathbf h\|)$ onward. – 2017-02-14
-
0@DmitryNalyvaiko Do you at least understand the connection between the tangent (hyper)plane and partial derivatives described in the last paragraph? If you get that and how the tangent hyperplane relates to the differential, the rest is just a matter of definitions and algebra. – 2017-02-14
-
0> Do you at least understand the connection between the tangent (hyper)plane and partial derivatives described in the last paragraph? No. Can you give a simpler answer to my question? – 2017-02-14
-
0@DmitryNalyvaiko Have a look at [this question](http://math.stackexchange.com/q/1912660/265466) and [this one](http://math.stackexchange.com/q/1876559/265466) that I referenced earlier. Beyond that, I don’t think I’m going to be able to help you. It’s starting to look like it’ll take more space and time than I have to do so. – 2017-02-15