I'm familiar with calculating derivatives with respect to vectors, and with some searching here on StackExchange, I see how to differentiate a product of a scalar and a matrix, or of a vector and a matrix, but I'm a bit puzzled about how to differentiate the product of an expression and its own transpose, such as $(y-Xw)^T(y-Xw)$ with respect to $w$. I know the solution is $X^T(y-Xw)$, but I'm not at all sure how that was obtained.
If I let $U = (y-Xw)^T$ and $V = (y-Xw)$, where $y$ and $w$ are vectors, $X$ is a matrix, and $^T$ denotes the transpose, and then apply the familiar product rule from calculus, I get $-X^T(y-Xw) + (y-Xw)^T(-X)$. In the solution above, it looks like only $U$ was differentiated and $V$ was left alone. What am I missing?
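
To make my attempt concrete, here is a small numerical sanity check, offered as a sketch rather than a derivation. It assumes the function being differentiated is $f(w) = UV = (y-Xw)^T(y-Xw)$, and the values of `X`, `y`, and `w` are made up purely for illustration. It compares a finite-difference gradient against the two product-rule terms once they are written in the same (column) layout. If the check is right, the two terms hold the same entries, one as a column and one as a row, so they add up to $-2X^T(y-Xw)$, which would differ from the quoted solution only by a constant factor.

```python
# Sketch: finite-difference gradient of f(w) = (y - Xw)^T (y - Xw)
# versus the product-rule terms. X, y, w are made-up example values.

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3x2 matrix
y = [1.0, 0.0, 2.0]                        # length-3 vector
w = [0.5, -1.0]                            # length-2 vector


def residual(w):
    # r = y - Xw
    return [yi - sum(xij * wj for xij, wj in zip(row, w))
            for row, yi in zip(X, y)]


def f(w):
    # f(w) = r^T r, a scalar
    return sum(ri * ri for ri in residual(w))


def numeric_grad(w, h=1e-6):
    # Central differences, perturbing one coordinate of w at a time
    grad = []
    for k in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[k] += h
        w_minus[k] -= h
        grad.append((f(w_plus) - f(w_minus)) / (2 * h))
    return grad


def xt_times_residual(w):
    # X^T (y - Xw), returned as a flat list
    r = residual(w)
    return [sum(X[i][k] * r[i] for i in range(len(X)))
            for k in range(len(w))]


# First product-rule term: -X^T (y - Xw), a column vector
first_term = [-v for v in xt_times_residual(w)]
# Second term: (y - Xw)^T (-X) has the same entries, laid out as a row.
# Transposing it to a column and adding gives twice the first term.
analytic = [2 * t for t in first_term]

print("finite differences:", numeric_grad(w))
print("-2 X^T (y - Xw):   ", analytic)
```

The two printed vectors should agree up to rounding error. That suggests the product rule itself is fine, and that the puzzle is about how the row-shaped and column-shaped terms get combined.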