I am learning about mutual information, and am confused about one of the definitions. Mutual information is defined as $ I(X;Y) = H(X) - H(X | Y) $
where,
$ H(X) = \sum_{x} p(x) \log \frac{1}{p(x)} ,$
and similarly,
$ H(X|Y) = \sum_{x,y} p(x,y) \log \frac{1}{p(x|y)} $
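Just to make sure I am reading these definitions correctly, I checked them numerically on a small made-up joint distribution (the numbers below are only my own example, not from any text):

```python
# Sanity check of the definitions above on a toy joint distribution.
import numpy as np

# Rows index x, columns index y; entries are p(x, y) and sum to 1.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)

# H(X) = sum_x p(x) log(1 / p(x))
H_X = np.sum(p_x * np.log2(1.0 / p_x))

# H(X|Y) = sum_{x,y} p(x, y) log(1 / p(x|y)), with p(x|y) = p(x, y) / p(y)
p_x_given_y = p_xy / p_y            # each column divided by p(y)
H_X_given_Y = np.sum(p_xy * np.log2(1.0 / p_x_given_y))

# I(X;Y) = H(X) - H(X|Y)
print(H_X, H_X_given_Y, H_X - H_X_given_Y)
```

This gives a small positive $I(X;Y)$, so the quantities themselves behave as I expect; my trouble is with the derivation below.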
Where $H(X)$ is concerned, we can then say that
$ \begin{align*} \sum_x p(x) \log \frac{1}{p(x)} &= \sum_x \left( (p(x) \log \frac{1}{p(x)}) \sum_y p(y|x) \right) \\ &= \sum_{x,y} p(x)p(y|x) \log \frac{1}{p(x)} \\ &= \sum_{x,y} p(x,y) \log \frac{1}{p(x)} \end{align*} $
because $ \sum \limits_y p(y|x) = 1 $ for any $x$.
I believe this is how the derivation is supposed to go, and, combined with the expression for $H(X|Y)$, it eventually leads to the canonical equation,
$\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} .$
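Writing out that last combination explicitly (as I understand it), using $p(x|y) = p(x,y)/p(y)$:

$ \begin{align*} I(X;Y) = H(X) - H(X|Y) &= \sum_{x,y} p(x,y) \log \frac{1}{p(x)} - \sum_{x,y} p(x,y) \log \frac{1}{p(x|y)} \\ &= \sum_{x,y} p(x,y) \log \frac{p(x|y)}{p(x)} \\ &= \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} . \end{align*} $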
But it seems to me that it is equally true to say that,
$ \begin{align*} \sum_{x} p(x) \log \frac{1}{p(x)} &= \sum_{x} \left\{ ( p(x) \log \frac{1}{p(x)} ) \sum_{y} p(y) \right\} \\ &= \sum_{x,y} p(x)p(y) \log \frac{1}{p(x)} \end{align*} $
because we also have $ \sum_{y} p(y) = 1 $ by definition.
The problem I'm having is that this latter version implies that,
$\sum_{x,y} p(x,y) \log \frac{1}{p(x)} = \sum_{x,y} p(x)p(y) \log \frac{1}{p(x)} $
which seems to imply that $ p(x,y) = p(x)p(y) $, which in turn would mean that $X$ and $Y$ are independent. I know this last conclusion is false, because it would mean there is never any mutual information and the quantity would be pointless, but I can't figure out where I'm going wrong. It would be great if someone could point out the mistake I am making in this latter derivation.
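For what it's worth, I also checked this numerically with the same made-up joint distribution as before (again, just my own example):

```python
# Both rewritings of H(X) evaluated on the same toy (and clearly
# dependent) joint distribution p(x, y) used above.
import numpy as np

p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])   # rows index x, columns index y

p_x = p_xy.sum(axis=1)            # marginal p(x)
p_y = p_xy.sum(axis=0)            # marginal p(y)

log_inv_px = np.log2(1.0 / p_x)[:, None]        # log(1/p(x)), broadcast over y

sum_with_joint   = np.sum(p_xy * log_inv_px)                 # sum_{x,y} p(x,y)  log(1/p(x))
sum_with_product = np.sum(np.outer(p_x, p_y) * log_inv_px)   # sum_{x,y} p(x)p(y) log(1/p(x))
H_X = np.sum(p_x * np.log2(1.0 / p_x))

print(sum_with_joint, sum_with_product, H_X)      # all three agree
print(np.allclose(p_xy, np.outer(p_x, p_y)))      # False: p(x,y) != p(x)p(y)
```

Both sums do evaluate to $H(X)$ even though $p(x,y) \neq p(x)p(y)$ here, so the equality of the two sums is genuine; it must be the step from that equality to $p(x,y) = p(x)p(y)$ that I am misreading, but I don't see how.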