Minimising the residual error gives us the standard least-squares problem: \begin{equation} \arg\min_x \|Ax-b\|^2 \end{equation}
First we rewrite the above as follows:
\begin{align}
F(x) &= (Ax-b)^T(Ax-b)\\
&= (x^TA^T-b^T)(Ax-b)\\
&= x^TA^TAx - x^TA^Tb - b^TAx + b^Tb\\
&= x^TA^TAx - 2x^TA^Tb + b^Tb
\end{align}
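A quick numerical sanity check of this expansion, as a minimal sketch: the values of $A$, $b$ and $x$ below are arbitrary illustrative data (not from the question), and the helpers `matmul`, `transpose` and `scalar` are hypothetical names written in plain Python. Exact `Fraction` arithmetic is used so the two sides can be compared without rounding error.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Swap rows and columns."""
    return [list(row) for row in zip(*X)]

def scalar(M):
    """Unwrap a 1x1 matrix into a plain number."""
    assert len(M) == 1 and len(M[0]) == 1
    return M[0][0]

# Illustrative data (assumed): A is 3x2, b is 3x1, x is 2x1.
A = [[Fraction(v) for v in row] for row in [[1, 2], [3, 4], [5, 6]]]
b = [[Fraction(v)] for v in [1, 0, 2]]
x = [[Fraction(1, 2)], [Fraction(-1, 3)]]

At, xt, bt = transpose(A), transpose(x), transpose(b)

# Unexpanded form: F(x) = (Ax - b)^T (Ax - b)
r = [[ax[0] - bi[0]] for ax, bi in zip(matmul(A, x), b)]
F_direct = scalar(matmul(transpose(r), r))

# Expanded form: x^T A^T A x - 2 x^T A^T b + b^T b
F_expanded = (scalar(matmul(matmul(xt, matmul(At, A)), x))
              - 2 * scalar(matmul(matmul(xt, At), b))
              + scalar(matmul(bt, b)))

print(F_direct, F_expanded)  # 131/36 131/36
```

Changing $A$, $b$ or $x$ (with compatible shapes) should never make the two printed values differ.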
Question 1: Why does $x^TA^Tb=b^TAx$ hold?
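For reference on shapes: if $A$ is $m\times n$, then $x^TA^Tb$ is $(1\times n)(n\times m)(m\times 1)$ and $b^TAx$ is $(1\times m)(m\times n)(n\times 1)$, so both are $1\times 1$. The sketch below checks numerically that the two cross terms agree on random integer data; the sizes $m=4$, $n=3$, the seed, and the helper names are arbitrary choices for illustration, and a numerical check is evidence rather than a proof.

```python
import random

def matmul(X, Y):
    """Multiply matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Swap rows and columns."""
    return [list(row) for row in zip(*X)]

def random_matrix(rows, cols, rng):
    """Matrix of small random integers (exact arithmetic, no rounding)."""
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
m, n = 4, 3  # A is m x n, so x is n x 1 and b is m x 1

pairs = []
for _ in range(100):
    A = random_matrix(m, n, rng)
    x = random_matrix(n, 1, rng)
    b = random_matrix(m, 1, rng)
    lhs = matmul(matmul(transpose(x), transpose(A)), b)  # x^T A^T b, a 1x1 matrix
    rhs = matmul(matmul(transpose(b), A), x)              # b^T A x, a 1x1 matrix
    pairs.append((lhs, rhs))

print(all(l == r for l, r in pairs))  # True
```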
To derive the solution $x$, I calculated (with help) the gradient of the above: \begin{equation} \nabla F(x)= 2A^TAx-2A^Tb \end{equation}
Setting it to zero gives the normal equations: \begin{equation} \nabla F(x)=0 \Rightarrow A^TAx=A^Tb \end{equation}
Question 2: How can I prove that $x=(A^TA)^{-1}A^Tb$?
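A minimal sketch of the final step, assuming $A$ has full column rank so that $A^TA$ is invertible. It computes $x=(A^TA)^{-1}A^Tb$ for a $3\times 2$ example and confirms that the gradient, written as the column vector $2A^TAx-2A^Tb$, vanishes there. The data and the helpers `inverse_2x2` and `gradient` are illustrative and not taken from the question.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Swap rows and columns."""
    return [list(row) for row in zip(*X)]

def inverse_2x2(M):
    """Closed-form inverse of a 2x2 matrix; raises if it is singular."""
    (m00, m01), (m10, m11) = M
    det = m00 * m11 - m01 * m10
    if det == 0:
        raise ValueError("A^T A is singular: A lacks full column rank")
    return [[m11 / det, -m01 / det], [-m10 / det, m00 / det]]

def gradient(A, x, b):
    """Gradient of F(x) = ||Ax - b||^2, i.e. 2 A^T A x - 2 A^T b."""
    AtA_x = matmul(matmul(transpose(A), A), x)
    At_b = matmul(transpose(A), b)
    return [[2 * p[0] - 2 * q[0]] for p, q in zip(AtA_x, At_b)]

# Illustrative data (assumed): A is 3x2 with independent columns.
A = [[Fraction(v) for v in row] for row in [[1, 2], [3, 4], [5, 6]]]
b = [[Fraction(v)] for v in [1, 0, 2]]

At = transpose(A)
x_hat = matmul(inverse_2x2(matmul(At, A)), matmul(At, b))

print(x_hat)                  # [[Fraction(0, 1)], [Fraction(1, 4)]]
print(gradient(A, x_hat, b))  # [[Fraction(0, 1)], [Fraction(0, 1)]]
```

The closed-form inverse only works because $A$ has two columns here; for larger problems one would normally solve $A^TAx=A^Tb$ by elimination or use a QR factorisation of $A$ rather than forming $(A^TA)^{-1}$ explicitly.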