
Does the expression $\frac{d(a^{\intercal}a)}{da}$ evaluate to $2a$ or $2a^{\intercal}$? Here $a$ is a column vector. Derivation steps would be appreciated.

2 Answers


There is no universally accepted convention about whether the gradient is a row vector or a column vector. In optimization literature, $\nabla f(x)$ is a column vector. In Terence Tao's Real Analysis books, $\nabla f(x)$ is a row vector.

In Calculus on Manifolds by Spivak, if $f:\mathbb R^n \to \mathbb R^m$, then $f'(x)$ is an $m \times n$ matrix. In particular, if $f:\mathbb R^n \to \mathbb R$, then $f'(x)$ is a $1 \times n$ matrix, a row vector.
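For concreteness, here is a sketch of the derivation via a first-order expansion. Write $f(a) = a^{\intercal}a$ and let $h$ be a small perturbation:

$$f(a+h) = (a+h)^{\intercal}(a+h) = a^{\intercal}a + a^{\intercal}h + h^{\intercal}a + h^{\intercal}h = f(a) + 2a^{\intercal}h + h^{\intercal}h,$$

where $a^{\intercal}h = h^{\intercal}a$ since both are scalars. The linear part $h \mapsto 2a^{\intercal}h$ is the derivative: under the row (Jacobian) convention this reads $f'(a) = 2a^{\intercal}$, while under the column (gradient) convention $\nabla f(a) = 2a$.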


This confused me too when I was learning. In my signal processing course I learned the derivative of a scalar with respect to a vector as a row vector, whereas in my optimization course I learned it as a column vector. Later I realized it is just a matter of convention for how you denote that derivative. Here $a^{\intercal}a$ is a scalar and $a$ is a vector, so you can write the result as either a column vector or a row vector; the choice should follow the prevalent notation in your field. You can read more about this in the Wikipedia article on matrix calculus, which covers the numerator- and denominator-layout conventions. As for how the constant $2$ arises, see the computation below.
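To see where the $2$ comes from, a quick componentwise check: since $a^{\intercal}a = \sum_{i=1}^{n} a_i^2$,

$$\frac{\partial}{\partial a_j}\sum_{i=1}^{n} a_i^2 = 2a_j \quad \text{for each } j,$$

so collecting the partials gives $2a$ under the column (denominator-layout) convention and $2a^{\intercal}$ under the row (numerator-layout) convention.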