4

I have a question regarding the differential $d_{\textbf a} f$.

Suppose we have the function $f(x,y)= xy$, and the vectors $\textbf a = (1,1)$ and $\textbf u = (2,1)$. Then, if I understand this correctly, $$d_{\textbf a} f(\textbf u) = \nabla f(\textbf a) \cdot \textbf u = (1,1)\cdot (2,1) = 2+1 = 3,$$ where $\nabla f(\textbf a) = (\partial f/\partial x, \partial f/\partial y)$. But what if my assignment is to calculate $d_{\textbf a} f$? I don't know what it means. Do they want me to calculate $d_{\textbf a} f(x,y) = (1,1)\cdot (x,y) = x+y$, or something else?

Edit: Note that it is not the directional derivative that I'm asking about.

  • 0
    What is $v$ supposed to be in the above equation? Since $d_a$ would normally refer to the directional derivative, please also give more information about the context.2011-05-30
  • 0
    I think the OP means $\mathbf{u}$ rather than $\mathbf{v}$ in the displayed equation.2011-05-30
  • 0
    There's a minor point that I'm curious about: does your book refer to $\mathbf{a} = (1,1)$ as a "point" or a "vector"? The distinction doesn't really matter mathematically, but sometimes it helps psychologically.2011-05-30
  • 0
    What I mean is that we usually think of taking the gradient $\nabla f$ at a _point_ $a = (1,1)$, and then taking the dot product of that with the _vector_ $\mathbf{u} = (2,1)$. Similarly, the differential $d_af$ is regarded as being at the _point_ $a$.2011-05-30
  • 0
    As you mentioned in a previous post, the notation is $$D_\mathbf{v}f(a) = \nabla f(a)\cdot \mathbf{v} = d_af(\mathbf{v}),$$ where $D_\mathbf{v}f(a)$ is the directional derivative in the _direction_ of the _vector_ $\mathbf{v}$ evaluated at the _point_ $a$.2011-05-30
  • 0
    @Alexander: Jesse is right, I meant $\textbf u$. With $d_{\textbf a}$, I mean the differential. I'm not sure that I am able to give more information about the context. Isn't the differential clearly defined?2011-05-30
  • 0
    Eivind: I think the question is clear. I'm writing an answer in a moment.2011-05-30
  • 0
    @Jesse: The book refers to $\textbf a$ as a point, but it is still a vector, or isn't it?2011-05-30
  • 0
    By the way: your calculations are correct and you're almost done, the calculations need only be interpreted and I'll explain it in a moment. Concerning your question to Jesse: Yes, $\mathbf{a}$ is a point in $\mathbb{R}^2$, that is to say a vector.2011-05-30

2 Answers

5

Essentially, you have worked out everything already, but there seems to be a bit of confusion about the definitions, so let me try to set this straight.

The differential of $f$ at the point $\mathbf{a} \in \mathbb{R}^2$ is the row matrix $$ d_{\mathbf{a}}f = \begin{pmatrix} \frac{\partial}{\partial x} f(\mathbf{a}) & \frac{\partial}{\partial y}f (\mathbf{a}) \end{pmatrix}.$$

Now if you write $d_{\mathbf{a}}f (\mathbf{u})$ for $\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \in \mathbb{R}^2$, you mean the matrix product $$d_{\mathbf{a}}f (\mathbf{u}) = \begin{pmatrix} \frac{\partial}{\partial x} f(\mathbf{a}) & \frac{\partial}{\partial y}f (\mathbf{a}) \end{pmatrix} \cdot \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{\partial}{\partial x} f(\mathbf{a}) \cdot u_1 + \frac{\partial}{\partial y}f (\mathbf{a}) \cdot u_2 .$$

On the other hand, $\nabla f (\mathbf{a})$ is the column vector $$ \nabla f (\mathbf{a}) = \begin{pmatrix} \frac{\partial}{\partial x} f(\mathbf{a}) \\ \frac{\partial}{\partial y}f (\mathbf{a}) \end{pmatrix}$$ and when you write $\nabla f (\mathbf{a}) \cdot \mathbf{u}$ you mean the scalar product $$\nabla f( \mathbf{a}) \cdot \mathbf{u} = \begin{pmatrix} \frac{\partial}{\partial x} f(\mathbf{a}) \\ \frac{\partial}{\partial y}f (\mathbf{a}) \end{pmatrix} \cdot \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{\partial}{\partial x} f(\mathbf{a}) \cdot u_1 + \frac{\partial}{\partial y}f (\mathbf{a}) \cdot u_2 . $$

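So we see that for $f(x,y) = xy$ and a general point $\mathbf{a} = (x,y)$ $$d_{\mathbf{a}}f = \begin{pmatrix} y & x \end{pmatrix} \qquad \text{while} \qquad \nabla f (\mathbf{a}) = \begin{pmatrix} y \\ x \end{pmatrix}.$$ At your point $\mathbf{a} = (1,1)$ both entries are equal to $1$.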

Now the confused reaction in the comments was due to the fact that the notation used here for the derivative of $f$ at the point $\mathbf{a}$ is often used for the directional derivative, and as you rightly pointed out in a comment, we have the relations $$ D_{\mathbf{u}} f (\mathbf{a}) := d_{\mathbf{a}} f (\mathbf{u}) = \nabla f(\mathbf{a}) \cdot \mathbf{u},$$ and everything should be fine now, no?

Since you made the computations yourself already, I'll not repeat them here.
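
For readers who want to check the numbers from the question, here is a minimal Python sketch (an illustration added here, not part of the original answer). The helper names `differential_at` and `apply_row` and the step size `eps` are assumptions made for the example, and the partial derivatives are approximated by central differences rather than computed symbolically.

```python
def f(x, y):
    # The function from the question.
    return x * y

def differential_at(func, a, eps=1e-6):
    """Approximate the row matrix (df/dx, df/dy) of func at the point a
    using central differences. Hypothetical helper for illustration."""
    ax, ay = a
    df_dx = (func(ax + eps, ay) - func(ax - eps, ay)) / (2 * eps)
    df_dy = (func(ax, ay + eps) - func(ax, ay - eps)) / (2 * eps)
    return (df_dx, df_dy)

def apply_row(row, u):
    """Row matrix times column vector: the value d_a f(u)."""
    return row[0] * u[0] + row[1] * u[1]

a = (1.0, 1.0)   # the point
u = (2.0, 1.0)   # the vector the differential is applied to

row = differential_at(f, a)   # approximately (1, 1)
value = apply_row(row, u)     # approximately 3
print(row, value)
```

The distinction drawn above shows up in the code as well: `differential_at` returns the object $d_{\mathbf{a}}f$ itself, while `apply_row` evaluates it on a particular $\mathbf{u}$.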

  • 0
    I'm having a little trouble understanding the difference between using matrices and the matrix product vs. vectors and the scalar product. Since the results are the same, I guess it must be because of some underlying theoretical difference between $\nabla f(\textbf a)$ and $d_{\textbf a} f$? Also, I came across this definition: $df=\frac{\partial f}{\partial x}(\textbf a) \cdot dx + \frac{\partial f}{\partial y}(\textbf a) \cdot dy$. How does this fit in with the rest of it?2011-05-30
  • 0
    @Eivind: Okay, I see. As I mentioned in my comment to Jesse's answer, $d_{\mathbf{a}}f$ is a *linear map* $d_{\mathbf{a}}f : \mathbb{R}^2 \to \mathbb{R}$. Now from linear algebra you might know that every linear map $\phi: \mathbb{R}^2 \to \mathbb{R}$ is of the form $\phi(v) = \langle x_\phi, v \rangle$ for a unique vector $x_\phi \in \mathbb{R}^2$. The vector corresponding to $d_{\mathbf{a}}f$ is $\nabla f(\mathbf{a})$, that is $d_{\mathbf{a}}f(\mathbf{u}) = \langle \nabla f (\mathbf{a}), \mathbf{u} \rangle$. Note that the formulae for the matrix product and the scalar product are similar, but they mean rather ...2011-05-30
  • 0
    ... different things!2011-05-30
  • 0
    Let $\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}$. The map $dx$ is a linear form and $dx (\mathbf{u}) = u_1$ and analogously $dy(\mathbf{u}) = u_2$, thus writing $d_{\mathbf{a}}f = \frac{\partial}{\partial x}f(\mathbf{a}) dx + \frac{\partial}{\partial y}f(\mathbf{a}) dy$ means when evaluating at $\mathbf{u}$ that $d_{\mathbf{a}}f (\mathbf{u}) = \frac{\partial}{\partial x}f(\mathbf{a}) dx (\mathbf{u}) + \frac{\partial}{\partial y}f(\mathbf{a}) dy (\mathbf{u}) = \frac{\partial}{\partial x}f(\mathbf{a}) u_1 + \frac{\partial}{\partial y}f(\mathbf{a}) u_2$ again, so it's the same thing.2011-05-30
  • 0
    @Theo I like your explanation here but have a follow-up. Without invoking, say, the Jacobian and its role in characterizing the derivative, is there an even more elementary way to show that the differential is a row vector and the gradient is a column vector? Said differently, from first principles, how can one know that the differential is, in fact, a row vector and the gradient is a column vector?2011-05-30
  • 0
    @3Sphere: If $f: U \subset \mathbb{R}^{n} \to \mathbb{R}$ then $d_{a}f$ is the unique linear map $d_af:\mathbb{R}^n \to \mathbb{R}$ satisfying $f(a + h) - f(a) = (d_af)(h) + o(|h|)$ (if it exists). So, choosing bases we get a $1 \times n$-matrix. Choosing $h = e_{i}$ we see that necessarily $d_{a}f(e_i) = \dfrac{\partial f(a)}{\partial x_i}$. On the other hand, to define the gradient, we need a scalar product: given a scalar product we get $\nabla f(a)$ as the unique *vector* such that $d_a f (u) = \langle \nabla f(a), u \rangle$. Having chosen a basis and the associated SP ...2011-05-31
  • 0
    ...we get that $\nabla f(a)$ is the usual gradient. However, we need not choose that scalar product, we could choose another one, and the gradient would no longer have the familiar form. Does this answer your question? Ah, and thanks by the way! :)2011-05-31
  • 0
    @Theo Makes perfect sense now, especially in view of the manner in which you defined the gradient. I was unaware that the gradient could be defined in this particular way. Thanks for the clarification.2011-05-31
  • 0
    @3Sphere: Ok, great! There were some typos in my last comments: the formula I used for defining $d_a f$ should read "$f(a + h) - f(a) = (d_af)(h) + o(|h|)$ *as $|h| \to 0$*". Then I should choose $h = t e_i$, divide by $t$ and let $t \to 0$ to get the formula for the Jacobian. This definition seems to depend on the choice of a norm, but it doesn't, as all norms on $\mathbb{R}^n$ are equivalent. I reiterate: the differential is *always* defined and involves no choices, while *the gradient only makes sense in presence of a scalar product*. The familiar formula is using the standard SP.2011-05-31
  • 0
    @3Sphere and Theo: Thanks to both of you for making this clear. I think I get the idea now. Theo, let's see if I understand you correctly. If this was an exam, would $d_{\textbf a} f = (y \quad x)$ be the correct answer? Also, I don't think that the book I use says anything about a difference between row/column matrices and vectors, so thank you for explaining that to me.2011-05-31
  • 0
    @Eivind: Yes, that would be the perfectly correct answer. Does the book you're using really write calculations the way you write them in your question, $d_{\mathbf{a}}f(\mathbf{u}) = \cdots = 3$? That must be quite confusing... I'm curious: what book is it?2011-05-31
  • 0
    @Theo: When it comes to the differential, the book uses $df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy$. The formula $d_a f(u) = \nabla f(a) \cdot u$ was given in a lecture. I can't find it in my book (Vector Calculus by Colley). But it introduces the gradient vector as $\nabla f=(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n})$ (in that notation), and this is the notation I am used to when it comes to vectors.2011-05-31
  • 0
    Then it says: "Alternatively, we can use matrix notation and define the derivative of $f$ at $a$, denoted $Df(a) = [f_{x_1}(a)\quad \cdots \quad f_{x_n}(a)]$." It seems to me that $Df(a)$ is the Jacobian matrix. Then $Df(a)$ (row matrix) is multiplied by $\textbf h$, which is suddenly in column matrix form (but it doesn't say why, other than that it's convenient). This makes me a little confused. Of course, it could be that I'm missing something from the text.2011-05-31
  • 0
    @Eivind: I see. So the book looks at the differential as a $1$-form (does it use that word?), that is, a linear map $\mathbb{R}^2 \to \mathbb{R}$ and writes it "invariantly" that is independently of coordinates. Now once you *choose* coordinates (that is, a basis), $1$-forms become identified with row matrices and what you write $Df(a)$ is indeed what is usually called the Jacobian. Probably, the point that the book is trying to make is: If $V = \mathbb{R}^2$, vectors are *abstract entities* that happen to have a concrete incarnation as *column matrices* **once you choose a basis**...2011-05-31
  • 0
    ...similarly, linear maps $g: V \to W$ are maps satisfying $g(v+w) = g(v) + g(w)$ and $g(\lambda v) = \lambda g(v)$ and can only be interpreted as matrices *after* you choose bases of both $V$ and $W$. If you want to *multiply* matrices (= compose linear maps), they have to be in the correct form, that is $AB$ is only defined if the number of columns of $A$ is equal to the number of rows of $B$.2011-05-31
  • 0
    @Theo: Actually, the book uses that word (1-form), but that part is not on the syllabus of the course I'm taking at the moment (Calculus 2), so I have not read it. The same book is also used in Calculus 3. But I think I understand what you mean.2011-05-31
  • 0
    @Eivind: In fact, it took me quite some time to see why these rather subtle distinctions are made and why they are so important. Don't worry too much about them now, they will become much clearer soon. Try to get used to calculating the various forms of derivatives, try to get some feeling for them, and as soon as you've mastered the formal business, the theoretical distinctions will become much easier than they may seem now. That's the best piece of advice I can give you right now (and many people might disagree).2011-05-31
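
The first-principles definition discussed in the comments, $f(a + h) - f(a) = (d_af)(h) + o(|h|)$, can be checked by hand for $f(x,y) = xy$. The following worked computation is an illustration added here; the component names $a = (a_1, a_2)$ and $h = (h_1, h_2)$ are notation assumed for the example.

```latex
\begin{align*}
f(a + h) - f(a)
  &= (a_1 + h_1)(a_2 + h_2) - a_1 a_2 \\
  &= \underbrace{a_2 h_1 + a_1 h_2}_{\text{linear in } h}
     + \underbrace{h_1 h_2}_{\text{remainder}} \\
  &= \begin{pmatrix} a_2 & a_1 \end{pmatrix}
     \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} + h_1 h_2 .
\end{align*}
% The remainder is o(|h|): since |h_1 h_2| \le \tfrac{1}{2}(h_1^2 + h_2^2) = \tfrac{1}{2}|h|^2,
% we get |h_1 h_2| / |h| \le \tfrac{1}{2}|h| \to 0 as |h| \to 0.
```

So the linear part is the row matrix $\begin{pmatrix} a_2 & a_1 \end{pmatrix}$, which at $a = (1,1)$ gives $d_af(h) = h_1 + h_2$, in agreement with the expression $x + y$ found in the question.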
-1

In the case of functions $f\colon \mathbb{R}^n \to \mathbb{R}$, like $f(x,y) = xy$ as you have, the differential $d_af$ is the same thing as the gradient $\nabla f(a)$.

  • 4
    No, it's *not* the same thing. One is a linear map $d_af: \mathbb{R}^n \to \mathbb{R}$ (hence a row matrix once you choose a basis), the other is a vector $\nabla f(a) \in \mathbb{R}^n$. They're related by $(d_af)(v) = \langle \nabla f(a), v \rangle$. **But:** The linear map is invariantly defined (independently of a basis), the gradient is only defined when a scalar product is around (the standard one once you've chosen a basis).2011-05-30
  • 0
    @Theo: Yes, you're right of course, but I was trying to keep it simple.2011-05-30
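
The point made in the comments, that the gradient depends on the scalar product while the differential does not, can be made concrete with a small Python sketch. This is an illustration added here, under stated assumptions: the scalar product is taken to be $\langle v, w \rangle_G = v^T G w$ for a symmetric positive definite matrix $G$, the particular $G$ below is an arbitrary choice, and the helper names `differential_row`, `gradient_wrt`, and `inner` are hypothetical.

```python
def differential_row(a):
    # For f(x, y) = x*y the partial derivatives at a = (a1, a2) are (a2, a1).
    a1, a2 = a
    return (a2, a1)

def inner(G, v, w):
    """Scalar product <v, w>_G = v^T G w for a 2x2 matrix G."""
    Gw = (G[0][0] * w[0] + G[0][1] * w[1],
          G[1][0] * w[0] + G[1][1] * w[1])
    return v[0] * Gw[0] + v[1] * Gw[1]

def gradient_wrt(G, row):
    """The unique vector g with <g, u>_G = row . u for all u,
    i.e. the solution of G g = row^T (Cramer's rule, 2x2 only)."""
    (p, q), (r, s) = G
    det = p * s - q * r
    return ((s * row[0] - q * row[1]) / det,
            (p * row[1] - r * row[0]) / det)

a = (1.0, 1.0)
u = (2.0, 1.0)
row = differential_row(a)                 # the differential: (1, 1)

I = ((1.0, 0.0), (0.0, 1.0))              # standard scalar product
G = ((2.0, 1.0), (1.0, 1.0))              # an arbitrary non-standard one

grad_std = gradient_wrt(I, row)           # the familiar gradient
grad_G = gradient_wrt(G, row)             # a different vector

# Both gradients reproduce the same value d_a f(u).
print(grad_std, inner(I, grad_std, u))
print(grad_G, inner(G, grad_G, u))
```

The row matrix `row` is the same in both cases; only the vector representing it changes when the scalar product changes.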