I don't understand how to compute the gradient w.r.t. $z$ of $z^Ty$, $z^Tz$, $z^TAz$, and just $Az$ using vector notation, where

$z, y \in \Bbb R^n$ and $A \in \Bbb R^{n\times n}$.

  • 0
    For clarification: how are you defining the gradient? If it is the classical introductory definition of $$\nabla f = \left(\frac {\partial f}{\partial z_1}, \frac {\partial f}{\partial z_2}, ..., \frac {\partial f}{\partial z_n}\right)$$ then you do the calculation in coordinates because that is how your gradient is defined. Once you are done, you can express the answer directly in terms of $y, z, A$. As for your last one, the gradient is defined for functions into $\Bbb R$, which $f(z) = Az$ is not. That function maps into $\Bbb R^n$, so the gradient is not defined for it. 2017-02-03

2 Answers

1

We can write the functions more explicitly and then see how it turns out: $$ z^Ty=\sum_{i=1}^nz_iy_i \\ z^Tz=\sum_{i=1}^nz_iz_i=\sum_{i=1}^nz_i^2 \\ z^TAz=\sum_{i,j=1}^nz_ja_{ij}z_i $$ In the last sum, $z_k$ appears both as $z_i$ and as $z_j$, which is why the product rule is needed there.
Now you can differentiate with respect to $z_k$ and see that: $$ \frac{\partial}{\partial z_k}z^Ty=y_k \\ \frac{\partial}{\partial z_k}z^Tz=2z_k $$ For the last one, we apply the product rule: $$ \frac{\partial}{\partial z_k}z^TAz=\frac{\partial}{\partial z_k}\sum_{j=1}^n \sum_{i=1}^nz_ja_{ij}z_i=\sum_{j=1}^n z_ja_{kj} + \sum_{i=1}^n a_{ik}z_i $$ The first sum is the $k$-th entry of $Az$ and the second is the $k$-th entry of $A^Tz$. Overall, you will get:
$$ \nabla z^Ty=y \\ \nabla z^Tz=2z \\ \nabla z^TAz=(A^T+A)z $$ The function $z \to Az$ has no gradient, but its Jacobian matrix equals $A$.
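If you want to sanity-check these three formulas numerically, here is a minimal sketch in plain Python. The helper names (`dot`, `matvec`, `transpose`, `numerical_gradient`) and the sample values of `A`, `y`, `z` are my own choices for illustration, not part of the question. It approximates each partial derivative with a central difference and compares it with the closed form; `A` is deliberately not symmetric, so that $(A^T+A)z$ and $2Az$ differ.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, z):
    """Matrix-vector product A z, with A stored as a list of rows."""
    return [dot(row, z) for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def numerical_gradient(f, z, h=1e-6):
    """Central-difference approximation of (df/dz_1, ..., df/dz_n)."""
    grad = []
    for k in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[k] += h
        z_minus[k] -= h
        grad.append((f(z_plus) - f(z_minus)) / (2 * h))
    return grad


# Arbitrary illustrative data (assumption); A is not symmetric on purpose.
A = [[1.0, 2.0], [0.0, 3.0]]
y = [4.0, -1.0]
z = [0.5, 2.0]

# Finite-difference gradients of z^T y, z^T z and z^T A z.
grad_zy = numerical_gradient(lambda w: dot(w, y), z)
grad_zz = numerical_gradient(lambda w: dot(w, w), z)
grad_zAz = numerical_gradient(lambda w: dot(w, matvec(A, w)), z)

# Closed forms from the derivation above.
exact_zy = y
exact_zz = [2 * zi for zi in z]
exact_zAz = [a + b for a, b in zip(matvec(transpose(A), z), matvec(A, z))]

print(grad_zy, exact_zy)
print(grad_zz, exact_zz)
print(grad_zAz, exact_zAz)  # closed form is [5.0, 13.0]; 2Az would be [9.0, 12.0]
```

The finite-difference values should agree with the closed forms up to rounding error, and the last line shows why the transpose cannot be dropped unless $A$ is symmetric.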

  • 1
    minor notational quibble: you should have $\frac{\partial}{\partial z_k}$ everywhere instead of $\frac{d}{dz_k}$. It may seem unimportant, but the definition of $\frac{d}{dz_k}$ depends only on $z_k$ (other than identifying the point where the derivative is being taken), while the definition of $\frac{\partial}{\partial z_k}$ depends on all of the $z_i$. If you chose another orthonormal basis for $\Bbb R^n$ that happened to include the same $e_k$ vector, the value of $\frac{\partial f}{\partial z_k}$ would change, while $\frac{d f}{d z_k}$ would not. 2017-02-03
  • 0
    Thank you so much, I really appreciate your help @F. Conrad 2017-02-03
  • 0
    That is so true, thank you @PaulSinclair 2017-02-03
  • 0
    Edited the $\partial$ in instead of the "d". It was meant to be more of a proof sketch than a full proof with all the details. Thanks for pointing it out! 2017-02-03
  • 0
    @F.Conrad - I figured as much, but I'm always leery about such notation abuses in places where the less-well-informed could be misled by them. Though I considered it likely you would know why, I gave the explanation anyway for the sake of any readers who might not. 2017-02-03
1

Assuming that you mean the entire calculation is to be done without resorting to coordinates, then we have to start with a definition of the gradient that is not with respect to coordinates. The normal introductory definition of $$\nabla f = \left(\frac{\partial f}{\partial z_1},\frac{\partial f}{\partial z_2}, ..., \frac{\partial f}{\partial z_n}\right)$$ is defined by coordinates, so any calculation with it is necessarily done by coordinates.

The coordinate-free definition of $\nabla f$ requires the definition of the directional derivative first: If $v \in \Bbb R^n$, then the directional derivative of $f$ at $z$ in the direction of $v$ is $$D_vf(z) := \left.\frac{d}{dt}\right|_0f(z + tv)$$ Then the gradient is defined to be the unique vector such that $$v\cdot \nabla f(z) = D_vf(z)$$ for all vectors $v\in \Bbb R^n$.
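To make the definition concrete, here is a rough numerical sketch in plain Python. The function `directional_derivative`, the sample vectors, and the step size are illustrative assumptions of mine. It estimates $D_vf(z)$ by a central difference in $t$ and checks that a candidate gradient $g$ satisfies $v\cdot g = D_vf(z)$ for several directions $v$, using $f(z) = z\cdot y$ with the candidate $g = y$.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def directional_derivative(f, z, v, t=1e-6):
    """Central-difference estimate of d/dt f(z + t v) at t = 0."""
    forward = [zi + t * vi for zi, vi in zip(z, v)]
    backward = [zi - t * vi for zi, vi in zip(z, v)]
    return (f(forward) - f(backward)) / (2 * t)


# Arbitrary illustrative data (assumption).
y = [4.0, -1.0, 2.0]
z = [0.5, 2.0, -1.0]


def f(w):
    """The function f(z) = z . y."""
    return dot(w, y)


candidate_gradient = y  # the claimed gradient of f

# The first two directions are basis vectors; the third is a generic one.
directions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, -2.0, 3.0]]
residuals = [
    abs(directional_derivative(f, z, v) - dot(v, candidate_gradient))
    for v in directions
]
print(residuals)  # each entry should be close to zero
```

Choosing $v$ to be a standard basis vector $e_k$ gives back the partial derivative $\partial f/\partial z_k$, which is how this definition connects to the coordinate one.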

So let's examine your problems:

  • $f(z) = z\cdot y$.

Then $f(z + tv) = z\cdot y + tv\cdot y$, so $D_vf(z) = v\cdot y$. From which we see that $\nabla f = y$.

  • $f(z) = z\cdot z$.

Then $f(z + tv) = z\cdot z + t(z \cdot v) + t(v \cdot z) + t^2(v\cdot v)$, so $D_vf(z) = 2( v\cdot z) = v\cdot (2z)$, since $v\cdot z = z \cdot v$. Therefore $\nabla f = 2z$.

  • $f(z) = z\cdot Az$.

Then $f(z + tv) = z\cdot Az + t(z \cdot Av) + t(v \cdot Az) + t^2(v \cdot Av)$. So $D_vf(z) = (z \cdot Av) + (v \cdot Az) = (A^Tz \cdot v) + (v \cdot Az) = v \cdot (A^Tz + Az)$. Therefore $\nabla f = (A^T + A)z$.
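As a small check of this last expansion, here is a sketch in plain Python with integer data, so the arithmetic is exact. The matrix `A`, the vectors `z` and `v`, and the helper names are made up for illustration. Since $q(t) = f(z + tv)$ is a quadratic polynomial in $t$, its coefficients can be read off from $q(-1)$, $q(0)$ and $q(1)$, and the coefficient of $t$ is exactly $D_vf(z)$.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, z):
    """Matrix-vector product A z, with A stored as a list of rows."""
    return [dot(row, z) for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


# Arbitrary integer data (assumption); A is not symmetric on purpose.
A = [[1, 2], [0, 3]]
z = [1, 2]
v = [3, -1]


def q(t):
    """q(t) = f(z + t v) for f(z) = z . Az."""
    w = [zi + t * vi for zi, vi in zip(z, v)]
    return dot(w, matvec(A, w))


# For a quadratic q(t) = c0 + c1 t + c2 t^2 these formulas are exact.
constant_term = q(0)                          # z . Az
linear_term = (q(1) - q(-1)) // 2             # z . Av + v . Az = D_v f(z)
quadratic_term = (q(1) + q(-1)) // 2 - q(0)   # v . Av

gradient = [a + b for a, b in zip(matvec(transpose(A), z), matvec(A, z))]

print(constant_term, linear_term, quadratic_term)  # 17 4 6
print(dot(v, gradient))                            # 4, matches linear_term
```

The constant and quadratic coefficients come out as $z\cdot Az$ and $v\cdot Av$, and the linear coefficient equals $v\cdot(A^T + A)z$, as in the derivation.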

  • 0
    Thank you very much, Paul, for your edit and the reply. However, what I want is to get separate results for $z^Ty$, $z^Tz$, $z^TAz$ and $Az$ by using the above vector notation. 2017-02-03
  • 0
    @Arvin This is the more mathematical and precise version using the definition for arbitrary coordinates. Depending on your needs, this one might be better. 2017-02-03
  • 0
    Now I understand what you just did, and it seems really precise. Thanks a lot, Paul, I really appreciate it. 2017-02-03