I'm currently watching a Machine Learning lecture from Stanford University. The lecturer defines $f$ to be a mapping from $\mathbb{R}^{m \times n}$ to $\mathbb{R}$. I understand this to be a function that maps a matrix to a scalar value, e.g. the trace or determinant (when $m = n$).
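For concreteness, the trace is one such function (here I'm assuming a square matrix, so $m = n$):
$$f(A) = \operatorname{tr}(A) = \sum_{i=1}^{n} A_{ii}, \qquad f : \mathbb{R}^{n \times n} \to \mathbb{R}.$$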
However, he then goes on to say that the derivative of such a function is defined by the derivative of $f$ with respect to each element, which to me would be $\frac{\partial f}{\partial A_{mn}} : \mathbb{R} \to \mathbb{R}$ for each entry. That doesn't make sense to me: the derivative would then be a different kind of mapping than $f$ itself, which can't be true.
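If I've copied it down correctly, the definition he writes is
$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix} \in \mathbb{R}^{m \times n},$$
i.e. the $(i,j)$ entry of $\nabla_A f$ is the partial derivative of $f$ with respect to $A_{ij}$.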
My explanation isn't fantastic, so refer to this link: http://www.youtube.com/watch?v=5u4G23_OohI#t=3363s (it'll take you straight to the relevant time).