Here is a reformulation of the previous answers and comments which I hope will be somewhat helpful to the OP.
A. The problem you are interested in is the following: given an inner product $\langle \cdot, \cdot \rangle$, find $x$ such that $\langle b - Ax, b - Ax \rangle$ is minimized.
When $\langle \cdot, \cdot \rangle$ is the ordinary inner product, this is the ordinary least squares solution. When $\langle x, y \rangle = x^T W y$, where $W$ is a diagonal matrix with positive entries, this is the weighted case you are interested in.
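If it helps to see the objective concretely, here is a minimal numpy sketch of the weighted case; the helper names `weighted_inner` and `weighted_objective` are just for illustration, not from any library.

```python
import numpy as np

# Hypothetical helpers that transcribe the formulas above.
def weighted_inner(x, y, W):
    """<x, y>_W = x^T W y, with W diagonal and positive."""
    return x @ (W @ y)

def weighted_objective(x, A, b, W):
    """<b - Ax, b - Ax>_W, the quantity to be minimized over x."""
    r = b - A @ x
    return weighted_inner(r, r, W)
```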
B. The solution will satisfy the following optimality criterion: the error must be orthogonal to the column space of $A$.
Formally, let $a_1, \ldots, a_n$ be the columns of $A$. Then the optimal $x^*$ will satisfy $ \langle a_i, b-Ax^* \rangle = 0 $ for all $i$.
Why? Because if the error could be orthogonally decomposed as
$ b- Ax = x_{R(A)} + x_{R(A)^\perp}$, where $x_{R(A)} \neq 0$ is the projection onto the range of $A$ and $x_{R(A)^\perp}$ is the projection onto its orthogonal complement, then we could pick a different $x$ to get a smaller error. Indeed, $ \langle b - Ax, b-Ax \rangle = \langle x_{R(A)}, x_{R(A)} \rangle + \langle x_{R(A)^\perp}, x_{R(A)^\perp} \rangle $ by the Pythagorean theorem. Now if $x_{R(A)} = Ay$ (such a $y$ exists because $x_{R(A)}$ lies in the range of $A$), then $ \langle b-A(x+y), b-A(x+y) \rangle = \langle x_{R(A)^\perp}, x_{R(A)^\perp} \rangle$, which is strictly smaller since $x_{R(A)} \neq 0$.
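Here is a quick numerical sanity check of this argument, written for the ordinary inner product and assuming numpy; the particular `A`, `b`, and `x` are arbitrary choices made just for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x = rng.standard_normal(3)        # an arbitrary (non-optimal) x

# Orthogonal projector onto the range of A (ordinary inner product).
P = A @ np.linalg.pinv(A)
r = b - A @ x
r_range = P @ r                   # component of the error in R(A)
r_perp = r - r_range              # component in R(A)^perp

# Pythagorean theorem: the squared norms of the two components add up.
assert np.isclose(r @ r, r_range @ r_range + r_perp @ r_perp)

# Shifting x by a y with Ay = x_{R(A)} removes the R(A) component
# of the error, leaving the strictly smaller perpendicular part.
y = np.linalg.lstsq(A, r_range, rcond=None)[0]
r_new = b - A @ (x + y)
assert r_new @ r_new <= r @ r + 1e-12
```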
C. For the case of the ordinary inner product, the above optimality principle can be restated as $ A^T (b-Ax^*) = 0$, i.e. the normal equations $A^T A x^* = A^T b$, which immediately give you your least-squares solution; and for the case of the weighted inner product, it can be restated as $ A^T W (b-Ax^*)=0$, i.e. $A^T W A x^* = A^T W b$, which immediately gives you the weighted solution.
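Below is a minimal numpy sketch that solves both systems and checks the orthogonality criterion from B; it assumes $A$ has full column rank, so that $A^T A$ and $A^T W A$ are invertible, and the sample data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
w = rng.uniform(0.5, 2.0, size=8)     # positive weights on the diagonal of W
W = np.diag(w)

# Ordinary least squares: A^T (b - A x) = 0  <=>  A^T A x = A^T b.
x_ols = np.linalg.solve(A.T @ A, A.T @ b)

# Weighted least squares: A^T W (b - A x) = 0  <=>  A^T W A x = A^T W b.
x_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Optimality criterion from B: every column of A is orthogonal to the
# residual in the corresponding inner product.
assert np.allclose(A.T @ (b - A @ x_ols), 0)
assert np.allclose(A.T @ W @ (b - A @ x_wls), 0)

# Cross-check the ordinary solution against numpy's own solver.
assert np.allclose(x_ols, np.linalg.lstsq(A, b, rcond=None)[0])
```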