While reading a script (lecture notes), I found the following expression:

$$(X\beta-Y)^T(X\beta-Y),$$ where $X$ is a matrix with $\dim X=n\times p$, $\dim Y=n\times 1$ and $\dim\beta=p\times 1$.

In the next step, we calculate its gradient with respect to $\beta$ and get $$X^T(X\beta-Y).$$ I don't see how this last step works. Why do we differentiate this way, rather than using the regular product rule $$(fg)'=f'g+fg'?$$

1 Answer

Note that $(Ax-b)^{\top}(Ax-b)=(x^{\top}A^{\top}-b^{\top})(Ax-b)=x^{\top}A^{\top}Ax-x^{\top}A^{\top}b-b^{\top}Ax+b^{\top}b$. Also, say we have a function $\alpha=y^{\top}Ax$; then $\alpha=\alpha^{\top}=x^{\top}A^{\top}y$, since a scalar equals its own transpose. Now, which of those two forms is more intuitive when you consider taking the partial derivative with respect to $y$? Can you take it from there?

Furthermore, you might have forgotten a factor of $\frac{1}{2}$ in front of your first expression: without it, the gradient is $2X^T(X\beta-Y)$ rather than $X^T(X\beta-Y)$.
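
For completeness, here is a sketch of how the hint can be carried through in the question's notation. It assumes the column-vector (gradient) convention and uses two standard identities that are not stated above: $\nabla_\beta(\beta^{T}M\beta)=(M+M^{T})\beta$ and $\nabla_\beta(c^{T}\beta)=c$. The name $f$ is introduced here only for illustration.

$$\begin{aligned}
f(\beta) &= (X\beta-Y)^{T}(X\beta-Y) \\
&= \beta^{T}X^{T}X\beta-\beta^{T}X^{T}Y-Y^{T}X\beta+Y^{T}Y \\
&= \beta^{T}X^{T}X\beta-2\,Y^{T}X\beta+Y^{T}Y
&&\text{(since } \beta^{T}X^{T}Y=Y^{T}X\beta \text{ is a scalar)} \\
\nabla_\beta f(\beta) &= 2X^{T}X\beta-2X^{T}Y
&&\text{(since } X^{T}X \text{ is symmetric)} \\
&= 2X^{T}(X\beta-Y).
\end{aligned}$$

The product rule from the question gives the same result when it is applied to the differential: $df=(X\,d\beta)^{T}(X\beta-Y)+(X\beta-Y)^{T}X\,d\beta=2(X\beta-Y)^{T}X\,d\beta$, because the two terms are scalars that are transposes of each other. Reading off the gradient gives $2X^{T}(X\beta-Y)$ again, so a prefactor of $\frac{1}{2}$ on the original expression yields exactly $X^{T}(X\beta-Y)$.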