$\def\R{{\bf R}}$It's partly an issue of naming. The gradient is most often defined for scalar fields, but the same idea exists for vector-valued functions, where it's called the Jacobian.
Taking the gradient of a vector-valued function is a perfectly sensible thing to do; you just don't usually call it the gradient.
A neat way to think about the gradient is as a higher-order function (i.e. a function whose arguments or return values are functions). Specifically, the gradient operator takes a (differentiable) function from a vector space $U$ to a vector space $V$, and returns another function which, when evaluated at a point in $U$, gives a linear map from $U$ to $V$.
We can look at an example to get intuition. Consider the scalar field $f:\R^2\to\R$ given by
$f(x,y) = x^2+y^2$
The gradient $g=\nabla f$ is the function on $\R^2$ given by
$g(x,y) = \left(2x, 2y\right)$
We can interpret $(2x,2y)$ as an element of the space of linear maps from $\R^2$ to $\R$. I will denote this space $L(\R^2,\R)$.
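To see why the pair $(2x,2y)$ can be read as a linear map, it may help to expand $f$ around a point $(x,y)$ and separate the part of the change that is linear in the displacement $(h,k)$:

$$
\begin{aligned}
f(x+h,\,y+k) &= (x+h)^2 + (y+k)^2 \\
&= \underbrace{x^2+y^2}_{f(x,y)} + \underbrace{2xh + 2yk}_{\text{linear in }(h,k)} + \underbrace{h^2+k^2}_{=\,\|(h,k)\|^2}
\end{aligned}
$$

The last term vanishes faster than $\|(h,k)\|$, so $g(x,y)$ is the map $(h,k)\mapsto 2xh+2yk$: the best linear approximation to how $f$ changes near $(x,y)$.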
Therefore $g=\nabla f$ is a function that takes an element of $\R^2$ and returns an element of $L(\R^2,\R)$. Schematically,
$g: \R^2 \to L(\R^2 ,\R)$
This means that $\nabla$ should be interpreted as a higher-order function
$\nabla : (\R^2 \to \R) \to (\R^2 \to L(\R^2, \R))$
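Python can express this type fairly directly, since functions are first-class values. The sketch below is only an illustration under assumptions: it approximates the partial derivatives with central finite differences instead of computing them exactly, and the names `nabla`, `Point`, `LinearMap`, `ScalarField` and the step size `eps` are hypothetical choices, not part of any library.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]             # an element of R^2
LinearMap = Callable[[Point], float]    # an element of L(R^2, R)
ScalarField = Callable[[Point], float]  # a function R^2 -> R


def nabla(f: ScalarField, eps: float = 1e-6) -> Callable[[Point], LinearMap]:
    """Hypothetical sketch of nabla : (R^2 -> R) -> (R^2 -> L(R^2, R)).

    Partial derivatives are approximated with central differences.
    """
    def g(p: Point) -> LinearMap:
        x, y = p
        dfdx = (f((x + eps, y)) - f((x - eps, y))) / (2 * eps)
        dfdy = (f((x, y + eps)) - f((x, y - eps))) / (2 * eps)

        # The pair (dfdx, dfdy) acts on a direction (h, k) by a dot product.
        def linear_map(v: Point) -> float:
            h, k = v
            return dfdx * h + dfdy * k

        return linear_map

    return g


def f(p: Point) -> float:
    x, y = p
    return x ** 2 + y ** 2


g = nabla(f)        # g : R^2 -> L(R^2, R)
dg = g((1.0, 3.0))  # a linear map, approximately (h, k) -> 2h + 6k
print(round(dg((1.0, 0.0)), 6), round(dg((0.0, 1.0)), 6))
```

Note that `g((1.0, 3.0))` returns a function, not a number: it only produces a number once you give it a direction $(h,k)$, which is exactly the $L(\R^2,\R)$ in the signature.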
There's nothing special about $\R^2$ and $\R$ here. The construction works for any vector spaces $U$ and $V$, giving
$\nabla : (U\to V) \to (U \to L(U,V))$
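Taking $U=\R^n$ and $V=\R^m$ is where the link to the Jacobian shows up: at each point, the linear map is multiplication by the $m\times n$ Jacobian matrix. Here is a minimal sketch under the same assumptions as before (finite differences instead of exact derivatives), additionally representing vectors as plain Python lists; `nabla`, `F` and `eps` are hypothetical names.

```python
from typing import Callable, List, Sequence

Vector = List[float]                              # element of R^n or R^m
LinearMap = Callable[[Sequence[float]], Vector]   # element of L(R^n, R^m)
Function = Callable[[Sequence[float]], Vector]    # a function R^n -> R^m


def nabla(f: Function, eps: float = 1e-6) -> Callable[[Sequence[float]], LinearMap]:
    """Hypothetical sketch of nabla : (U -> V) -> (U -> L(U, V)), U = R^n, V = R^m.

    At a point p, approximates the m x n Jacobian column by column with
    central differences, then returns it wrapped as the linear map h -> J h.
    """
    def at(p: Sequence[float]) -> LinearMap:
        n = len(p)
        m = len(f(p))
        columns: List[Vector] = []
        for j in range(n):
            plus = list(p)
            minus = list(p)
            plus[j] += eps
            minus[j] -= eps
            f_plus, f_minus = f(plus), f(minus)
            # Column j approximates the partial derivative of f along e_j.
            columns.append([(f_plus[i] - f_minus[i]) / (2 * eps) for i in range(m)])

        def linear_map(h: Sequence[float]) -> Vector:
            # J h is the combination sum_j h[j] * (column j of J).
            return [sum(columns[j][i] * h[j] for j in range(n)) for i in range(m)]

        return linear_map

    return at


def F(p: Sequence[float]) -> Vector:
    """A vector-valued example, F(x, y) = (x y, x + y^2)."""
    x, y = p
    return [x * y, x + y ** 2]


J = nabla(F)([2.0, 3.0])  # the exact Jacobian at (2, 3) is [[3, 2], [1, 6]]
print(J([1.0, 0.0]), J([0.0, 1.0]))
```

For a scalar field ($m=1$) the Jacobian is a single row, which is just the gradient from the $f(x,y)=x^2+y^2$ example written as a $1\times n$ matrix. Automatic-differentiation libraries such as JAX expose the same higher-order shape with exact derivatives: `jax.jacfwd(f)` returns a function that, given a point, returns the Jacobian there.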
A good reference for this way of thinking about the gradient is Spivak's book *Calculus on Manifolds*.