Here is a reformulation of the previous answers and comments which I hope will be somewhat helpful to the OP.
A. The problem you are interested in is the following: given an inner product $\langle \cdot, \cdot \rangle$, find $x$ such that $$\langle b - Ax, b - Ax \rangle$$ is minimized.
When $\langle \cdot, \cdot \rangle$ is the ordinary inner product, this is the ordinary least-squares solution. When $\langle x, y \rangle = x^T W y$, where $W$ is some positive diagonal matrix, this is the weighted case you are interested in.
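In code this objective is just a weighted sum of squared residuals. Here is a minimal NumPy sketch (the function name `weighted_objective` and the choice to pass $W$ as a vector of weights `w` are mine, purely for illustration):

```python
import numpy as np

def weighted_objective(A, b, x, w):
    """<b - Ax, b - Ax> with <u, v> = u^T diag(w) v, where w holds positive weights."""
    r = b - A @ x
    return r @ (w * r)   # same as r.T @ np.diag(w) @ r, without forming the matrix

# With w = np.ones(len(b)) this reduces to the ordinary sum of squared residuals.
```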
B. The solution will satisfy the following optimality criterion: the error must be orthogonal to the column space of $A$.
Formally, let $a_1, \ldots, a_n$ be the columns of $A$. Then the optimal $x^*$ will satisfy $$ \langle a_i, b-Ax^* \rangle = 0 $$ for all $i$.
Why? Because if the error could be orthogonally decomposed as
$$ b - Ax = x_{R(A)} + x_{R(A)^\perp}$$ where $x_{R(A)} \neq 0$ is the projection onto the range of $A$, and $x_{R(A)^\perp}$ is the projection onto its orthogonal complement, then we could pick a different $x$ to get a smaller error. Indeed, $$ \langle b - Ax, b-Ax \rangle = \langle x_{R(A)}, x_{R(A)} \rangle + \langle x_{R(A)^\perp}, x_{R(A)^\perp} \rangle $$ by the Pythagorean theorem. Now if $x_{R(A)} = Ay$, then $$ \langle b-A(x+y), b-A(x+y) \rangle = \langle x_{R(A)^\perp}, x_{R(A)^\perp} \rangle,$$ which is smaller since $\langle x_{R(A)}, x_{R(A)} \rangle > 0$.
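Here is a small numerical check of that argument for the ordinary inner product (a sketch with random toy data; `lstsq` is used only to compute the projection of the error onto the range of $A$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))     # overdetermined toy system
b = rng.normal(size=6)
x = rng.normal(size=3)          # an arbitrary, non-optimal x

r = b - A @ x                   # the error for this x
y, *_ = np.linalg.lstsq(A, r, rcond=None)
r_range = A @ y                 # component of the error in R(A)
r_perp = r - r_range            # component in R(A)^perp

# Pythagorean split of the squared error ...
print(r @ r, r_range @ r_range + r_perp @ r_perp)   # equal (up to rounding)
# ... and the improvement from replacing x with x + y:
r_new = b - A @ (x + y)
print(r_new @ r_new, r_perp @ r_perp)                # equal, and smaller than r @ r
```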
C. For the case of the ordinary inner product, the above optimality principle can be restated as $$ A^T (b-Ax^*) = 0,$$ i.e. the normal equations $A^T A x^* = A^T b$, which immediately give you your least-squares solution; and for the case of the weighted inner product, it can be restated as
$$ A^T W (b-Ax^*)=0,$$ i.e. $A^T W A x^* = A^T W b$, which immediately gives you the weighted solution.
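To make this concrete, here is a minimal NumPy sketch that solves both normal equations and checks the orthogonality conditions from part B (the data and weights are made up; for real problems you would typically prefer `np.linalg.lstsq` or a QR-based solver over forming $A^T A$ explicitly, for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)
w = rng.uniform(0.5, 2.0, size=6)   # positive weights
W = np.diag(w)

# Ordinary least squares: A^T A x = A^T b.
x_ols = np.linalg.solve(A.T @ A, A.T @ b)

# Weighted least squares: A^T W A x = A^T W b.
x_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Optimality criterion from part B: the error is orthogonal to every
# column of A in the corresponding inner product.
print(A.T @ (b - A @ x_ols))        # ~ 0
print(A.T @ W @ (b - A @ x_wls))    # ~ 0
```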