The derivative of a function $f:\ {\mathbb R}^n\to{\mathbb R}^m$ at a point $p\in{\rm dom}(f)$ is a linear map $df(p):\ T_p \to T_q$ where $q:=f(p)$, $T_p\cong{\mathbb R}^n$ is the tangent space at $p$, and $T_q\cong{\mathbb R}^m$ is the tangent space at $q$.
As such, $df(p)$ has a matrix with respect to the standard bases of $T_p$ and $T_q$, namely
$$\bigl[df(p)\bigr]=\left[\matrix{{\partial f_1\over\partial x_1} & \cdots&{\partial f_1\over\partial x_n} \cr \vdots & \ddots & \vdots\cr {\partial f_m\over\partial x_1} & \cdots & {\partial f_m\over\partial x_n} \cr}\right]_p\ .$$
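To make the shape of $\bigl[df(p)\bigr]$ concrete, here is a minimal numerical sketch in Python. It approximates each entry ${\partial f_i/\partial x_k}(p)$ by a central difference quotient with step size `h`. The helper `jacobian`, the step size and the sample map $f(x,y)=(xy,\ x+y,\ \sin x)$ are illustrative assumptions, not part of the argument above. Row $i$ corresponds to $f_i$ and column $k$ to $x_k$, so the result is an $m\times n$ array.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def jacobian(f: Callable[[Sequence[float]], Vector],
             p: Sequence[float], h: float = 1e-6) -> List[Vector]:
    """Approximate the m x n matrix [df(p)] by central differences.

    Entry J[i][k] approximates the partial derivative of f_i
    with respect to x_k at the point p.
    """
    n = len(p)
    m = len(f(p))
    J = [[0.0] * n for _ in range(m)]
    for k in range(n):
        # Perturb only the k-th coordinate, in both directions.
        plus = list(p)
        plus[k] += h
        minus = list(p)
        minus[k] -= h
        f_plus, f_minus = f(plus), f(minus)
        for i in range(m):
            J[i][k] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Sample map f: R^2 -> R^3, f(x, y) = (x*y, x + y, sin x).
def f(v: Sequence[float]) -> Vector:
    x, y = v
    return [x * y, x + y, math.sin(x)]


J = jacobian(f, [1.0, 2.0])
# Exact Jacobian at (1, 2): [[2, 1], [1, 1], [cos 1, 0]]
```

Here $m=3$ and $n=2$, so `J` has three rows of two entries each, matching the layout of the matrix above.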
This makes sense for arbitrary $n\geq1$ and $m\geq1$. In the case at hand we have $m=1$, so the above matrix consists of a single row. In this case something special happens: the values of the "abstract" linear map $df(p):\ {\mathbb R}^n\to{\mathbb R}$ can be computed using the scalar product in ${\mathbb R}^n$. There is a "concrete" vector $a\in{\mathbb R}^n$ such that for all vectors $X\in{\mathbb R}^n$ we have
$$df(p).X\ =\ a\cdot X\qquad(X\in T_p)\ .\tag{1}$$
This vector $a$ is none other than the gradient of $f$ at $p$, i.e.,
$$a =\nabla f(p):=\bigl(f_{.1}(p),\ldots, f_{.n}(p)\bigr)\ ,\qquad f_{.k}:={\partial f\over\partial x_k}\ .$$
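Identity $(1)$ can also be checked numerically. The following sketch approximates $df(p).X$ as the derivative of $t\mapsto f(p+tX)$ at $t=0$ and compares it with $\nabla f(p)\cdot X$. The helpers `gradient`, `directional_derivative` and `dot`, the central-difference step `h`, and the sample function $f(x,y,z)=x^2y+e^z$ are all assumptions made for illustration; the two sides agree only up to discretization error.

```python
import math
from typing import Callable, List, Sequence

ScalarField = Callable[[Sequence[float]], float]


def gradient(f: ScalarField, p: Sequence[float], h: float = 1e-6) -> List[float]:
    """Approximate (f_{.1}(p), ..., f_{.n}(p)) by central differences."""
    grad = []
    for k in range(len(p)):
        plus = list(p)
        plus[k] += h
        minus = list(p)
        minus[k] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


def directional_derivative(f: ScalarField, p: Sequence[float],
                           X: Sequence[float], h: float = 1e-6) -> float:
    """Approximate df(p).X as the derivative of t -> f(p + t X) at t = 0."""
    plus = [pk + h * xk for pk, xk in zip(p, X)]
    minus = [pk - h * xk for pk, xk in zip(p, X)]
    return (f(plus) - f(minus)) / (2 * h)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard scalar product in R^n."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Sample scalar function f(x, y, z) = x^2 y + e^z.
def f(v: Sequence[float]) -> float:
    x, y, z = v
    return x ** 2 * y + math.exp(z)


p = [1.0, 2.0, 0.0]
X = [0.5, -1.0, 3.0]
a = gradient(f, p)                     # approximately (4, 1, 1)
lhs = directional_derivative(f, p, X)  # df(p).X
rhs = dot(a, X)                        # a . X
```

For this $f$ one has $\nabla f(p)=(2xy,\ x^2,\ e^z)_p=(4,1,1)$, so with $X=(0.5,-1,3)$ both sides are close to $2-1+3=4$.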
It is easy to see that this $a$ satisfies $(1)$: by the matrix representation above, $df(p).X=\sum_{k=1}^n f_{.k}(p)\,X_k=a\cdot X$. In terms of matrices we can say the following: the components of $a$ are the entries in the single row of $\bigl[df(p)\bigr]$, and when you write $a$ (as is usual) as a column vector, this column vector, regarded as a matrix, is the transpose of the matrix $\bigl[df(p)\bigr]$.
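As a small worked example (the function $f(x,y)=x^2y$ and the point $p=(1,2)$ are chosen only for illustration), the one-row matrix $\bigl[df(p)\bigr]$ and the gradient compare as follows:
$$\bigl[df(p)\bigr]=\left[\matrix{2xy & x^2\cr}\right]_{(1,2)}=\left[\matrix{4 & 1\cr}\right]\ ,\qquad \nabla f(p)=\left[\matrix{4\cr 1\cr}\right]=\bigl[df(p)\bigr]^{\rm T}\ ,$$
$$df(p).X=4X_1+X_2=\nabla f(p)\cdot X\qquad(X\in T_p)\ .$$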