What is the difference between a gradient and a derivative? The text I'm reading keeps mentioning that 'the gradient is the transpose of the derivative'.
Does this mean that $ \nabla f(v) = df(v)^T $ and also that $ \nabla f(v) = \frac{df(v)}{dx} $?
The derivative of a function $f:\ {\mathbb R}^n\to{\mathbb R}^m$ at a point $p\in{\rm dom}(f)$ is a linear map $df(p):\ T_p \to T_q$ where $q:=f(p)$, $T_p\cong{\mathbb R}^n$ is the tangent space at $p$, and $T_q\cong{\mathbb R}^m$ is the tangent space at $q$.
As such $df(p)$ has a matrix with respect to the standard bases in $T_p$ and $T_q$, namely $\bigl[df(p)\bigr]=\left[\matrix{{\partial f_1\over\partial x_1} & \cdots&{\partial f_1\over\partial x_n} \cr \vdots & & \vdots\cr {\partial f_m\over\partial x_1} & \cdots & {\partial f_m\over\partial x_n} \cr}\right]_p\ .$
This makes sense for arbitrary $n\geq1$ and $m\geq1$. In the case at hand we have $m=1$, so the above matrix consists of one row only. Now in this case something special arises: The values of the "abstract" linear map $df(p):\ {\mathbb R}^n\to{\mathbb R}$ can be computed using the scalar product in ${\mathbb R}^n$. There is a "concrete" vector $a\in{\mathbb R}^n$ such that for all vectors $X\in{\mathbb R}^n$ we have
$df(p).X\ =\ a\cdot X\ \qquad(X\in T_p).\qquad(1)$
This vector $a$ is nothing else but the gradient vector of $f$ at $p$, i.e., $a =\nabla f(p):=\bigl(f_{.1}(p),\ldots, f_{.n}(p)\bigr)\ .$ It is easy to see that for this $a$ the identity $(1)$ holds. In terms of matrices we can say the following: The components of $a$ are the entries in the single row of $\bigl[df(p)\bigr]$, and when you write $a$ (as is usual) as a column vector then this column vector, regarded as a matrix, is the transpose of the matrix $\bigl[df(p)\bigr]$.
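To make identity $(1)$ concrete, here is a minimal numerical sketch in plain Python. The function `f`, the point `p`, and the direction `X` are hypothetical choices made only for illustration, and $df(p).X$ is approximated by a central difference rather than computed exactly.

```python
import math

def f(x):
    # Hypothetical example function f: R^3 -> R (not from the original text)
    return x[0] ** 2 * x[1] + math.sin(x[2])

def grad_f(x):
    # Partial derivatives of f, worked out by hand: this is the vector a
    return [2 * x[0] * x[1], x[0] ** 2, math.cos(x[2])]

def directional_derivative(func, p, X, h=1e-6):
    # Central-difference approximation of the linear map df(p) applied to X
    forward = func([pi + h * Xi for pi, Xi in zip(p, X)])
    backward = func([pi - h * Xi for pi, Xi in zip(p, X)])
    return (forward - backward) / (2 * h)

def dot(a, b):
    # Standard scalar product in R^n
    return sum(ai * bi for ai, bi in zip(a, b))

p = [1.0, 2.0, 0.5]    # assumed base point
X = [0.3, -1.0, 2.0]   # assumed tangent vector

lhs = directional_derivative(f, p, X)  # approximates df(p).X
rhs = dot(grad_f(p), X)                # a . X with a = grad f(p)
print(lhs, rhs)
```

The two printed numbers should agree up to the finite-difference error, which is what $(1)$ asserts: applying the linear map $df(p)$ to $X$ gives the same value as the scalar product of $\nabla f(p)$ with $X$.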
Generally, the derivative of a function $f:\mathbb R^n\to \mathbb R$ at a point $x$ is regarded as a linear map $\mathrm{d}f_x:\mathbb R^n\to \mathbb R$, while the gradient $\nabla f(x)$ is regarded as a vector in $\mathbb R^n$. The two kinds of objects (linear maps and vectors) are essentially interchangeable in finitely many dimensions, because the space of linear maps from $\mathbb R^n$ to $\mathbb R$, denoted $BL(\mathbb R^n,\mathbb R)$, is isomorphic to $\mathbb R^n$. The elements of $BL(\mathbb R^n,\mathbb R)$ are usually written as matrices with one row (i.e., row vectors), while the elements of $\mathbb R^n$ are written as column vectors. To move from $BL(\mathbb R^n,\mathbb R)$ to $\mathbb R^n$, we take the transpose. This maps $\mathrm{d}f_x$ to $\nabla f(x)$. Taking the transpose again gets us back to $\mathrm{d}f_x$.
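The transpose round trip can be sketched with explicit matrices in plain Python. The function $f(x,y)=x^2y$ and the point $(1,2)$ are assumptions picked for illustration, and `transpose` and `matmul` are small hypothetical helpers, not part of any library.

```python
def transpose(M):
    # Transpose a matrix stored as a list of rows
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    # Plain matrix product of two matrices stored as lists of rows
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Assumed example: f(x, y) = x**2 * y at the point (1, 2)
x, y = 1.0, 2.0
df = [[2 * x * y, x ** 2]]   # 1 x 2 matrix (one row): the derivative
grad = transpose(df)         # 2 x 1 matrix (one column): the gradient

X = [[0.5], [3.0]]           # a vector in R^2, written as a column

# Applying the linear map is a matrix product [df] X ...
value_via_matrix = matmul(df, X)[0][0]
# ... and gives the same number as the scalar product grad . X
value_via_dot = sum(g[0] * v[0] for g, v in zip(grad, X))

print(grad, transpose(grad) == df, value_via_matrix, value_via_dot)
```

Here `df` plays the role of $\mathrm{d}f_x$ and `grad` the role of $\nabla f(x)$: they hold the same numbers, and only their shape (row versus column) differs.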