While reading a script, I found the following expression:
$$(X\beta-Y)^T(X\beta-Y),$$ where $X$ is a matrix with $\dim X=n\times p$, $\dim Y=n\times 1$, and $\dim\beta=p\times 1$.
In the next step, the script calculates the gradient of this expression with respect to $\beta$ and gets $$X^T(X\beta-Y).$$ I don't see how this last step was obtained. Why do we differentiate in this way, rather than with the regular product rule $$(fg)'=f'g+fg'?$$
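
For reference, here is a sketch of how the product rule can be applied in this setting. The shorthand $r=X\beta-Y$ is introduced only for this sketch and does not come from the script. Taking the differential of the scalar $r^Tr$ gives
$$\begin{aligned}
d(r^Tr) &= (dr)^Tr + r^T\,dr && \text{(product rule)}\\
&= 2\,r^T\,dr && \text{(}(dr)^Tr\text{ is a scalar, so it equals its transpose } r^T\,dr\text{)}\\
&= 2\,(X\beta-Y)^TX\,d\beta && \text{(since } dr = X\,d\beta\text{)}
\end{aligned}$$
so the gradient, written as a $p\times 1$ column vector, would be
$$\nabla_\beta\,(X\beta-Y)^T(X\beta-Y)=2X^T(X\beta-Y).$$
The same result follows componentwise from $r^Tr=\sum_{i=1}^n r_i^2$ and $\partial r_i/\partial\beta_j=X_{ij}$:
$$\frac{\partial}{\partial\beta_j}\sum_{i=1}^n r_i^2=2\sum_{i=1}^n X_{ij}\,r_i=2\,\bigl(X^T(X\beta-Y)\bigr)_j.$$
Note that this carries a factor of $2$ that the expression quoted from the script does not. Presumably the script drops it because it does not change where the gradient vanishes, or it starts from an objective with a factor $\tfrac12$ in front, but that is an assumption on my part.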