I am new to linear algebra and I have the following questions:
In weighted least-squares estimation for the system $Ax = b$, we minimize the weighted squared error $e^T W e$, where $e = b - Ax$ and $W$ is a weighting matrix. The best estimate is $\hat{x} = (A^T \Sigma^{-1} A)^{-1} A^T \Sigma^{-1} b$, where $\Sigma = \Sigma_e$ is the covariance matrix of the error $e$. Why is the inverse covariance matrix $\Sigma_e^{-1}$ the best choice for the weighting matrix $W$? Is there a derivation for it? Please point me to a reference; some hints would also help.
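To make the estimator concrete, here is a minimal NumPy sketch of $\hat{x} = (A^T \Sigma^{-1} A)^{-1} A^T \Sigma^{-1} b$. The design matrix, true parameters, and noise levels are hypothetical values chosen only for illustration, and the error covariance is assumed diagonal. For any linear unbiased estimator $\hat{x} = K b$ with $K = (A^T W A)^{-1} A^T W$, the estimation error is $K e$, so $\operatorname{Cov}(\hat{x}) = K \Sigma K^T$. The sketch compares this covariance for $W = \Sigma^{-1}$ and for $W = I$ (ordinary least squares). The Gauss–Markov (Aitken) theorem predicts that the difference is positive semidefinite.

```python
import numpy as np

def wls_estimate(A, b, Sigma):
    # x_hat = (A^T Sigma^{-1} A)^{-1} A^T Sigma^{-1} b, using solves instead of explicit inverses
    Sinv_A = np.linalg.solve(Sigma, A)   # Sigma^{-1} A
    Sinv_b = np.linalg.solve(Sigma, b)   # Sigma^{-1} b
    return np.linalg.solve(A.T @ Sinv_A, A.T @ Sinv_b)

def estimator_covariance(A, Sigma, W):
    """Cov(x_hat) for x_hat = (A^T W A)^{-1} A^T W b when Cov(e) = Sigma."""
    # x_hat - x = K e with K = (A^T W A)^{-1} A^T W, hence Cov(x_hat) = K Sigma K^T
    K = np.linalg.solve(A.T @ W @ A, A.T @ W)
    return K @ Sigma @ K.T

rng = np.random.default_rng(0)
m = 50
A = np.column_stack([np.ones(m), np.linspace(0.0, 1.0, m)])  # hypothetical design matrix
x_true = np.array([1.0, 2.0])                                 # hypothetical true parameters
sigmas = np.linspace(0.1, 2.0, m)                             # assumed per-equation noise std devs
Sigma = np.diag(sigmas**2)                                    # assumed diagonal error covariance

b = A @ x_true + rng.normal(0.0, sigmas)                      # one noisy observation with E(e) = 0
x_hat = wls_estimate(A, b, Sigma)

cov_wls = estimator_covariance(A, Sigma, np.linalg.inv(Sigma))  # W = Sigma^{-1}
cov_ols = estimator_covariance(A, Sigma, np.eye(m))             # W = I
print("x_hat:", x_hat)
print("eigenvalues of cov_ols - cov_wls:", np.linalg.eigvalsh(cov_ols - cov_wls))
```

Swapping in other weight matrices for `np.eye(m)` is an easy way to experiment: the eigenvalues of `cov_other - cov_wls` should stay non-negative (up to rounding).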
For the same linear system, with $e = b - Ax$, is $E(ee^T) = E(bb^T)$, given that the error is unbiased (i.e. $E(e) = 0$)?
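One way to probe this numerically is a Monte Carlo sketch. It rests on assumptions that are not part of the question: $x$ is a fixed, non-random vector, $e$ is Gaussian, and $A$, $x$ and $\Sigma$ take the hypothetical values below. Under those assumptions, $b = Ax + e$ gives $E(bb^T) = Axx^TA^T + Ax\,E(e)^T + E(e)\,x^TA^T + E(ee^T)$, and the cross terms vanish when $E(e) = 0$. The sketch compares the two empirical second moments with that expansion.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000                                    # number of Monte Carlo draws
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # hypothetical fixed matrix
x = np.array([1.0, -0.5])                      # hypothetical fixed (non-random) x
Sigma = np.array([[1.0, 0.3, 0.0],             # assumed error covariance (symmetric positive definite)
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])

# Draw N zero-mean errors e ~ N(0, Sigma) and form b = A x + e for each draw (rows).
E = rng.multivariate_normal(np.zeros(3), Sigma, size=N)   # shape (N, 3)
B = A @ x + E                                              # broadcast A x over all rows

# Empirical second moments: averages of e e^T and b b^T over the draws
ee = E.T @ E / N
bb = B.T @ B / N
print("E(ee^T) ~\n", np.round(ee, 2))
print("E(bb^T) ~\n", np.round(bb, 2))
print("A x x^T A^T =\n", np.round(np.outer(A @ x, A @ x), 2))
```

If $x$ were itself random, the term $Axx^TA^T$ would be replaced by its expectation and the cross terms would need separate treatment, so the sketch only covers the fixed-$x$ case.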