I'm trying to better understand the link between PCA and Matrix Factorization of the form $X \approx WH$.
I've read somewhere that the PCA solution can also be derived from the following cost function (I'm not even sure whether the squared Euclidean norm is the right one to use):
\begin{align} \operatorname*{arg\,min}_{\mathbf{W},\mathbf{H}} \; \frac{1}{N}\sum_{n=1}^N \| \mathbf{x}_n - \mathbf{W}\mathbf{h}_n \|^2 \qquad \text{subject to } \mathbf{W}^T \mathbf{W} = \mathbf{I}, \end{align}
which seems similar, if not identical, to the minimum-error formulation of PCA.
According to the PCA derivation, $\mathbf{W}$ is the matrix whose columns are the top eigenvectors of the covariance matrix of $\mathbf{X}$ (assuming the data are centered), and $\mathbf{h}_n$ is the projection of $\mathbf{x}_n$, given by $\mathbf{h}_n = \mathbf{W}^T \mathbf{x}_n$.
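Numerically this seems to check out. Here is a minimal numpy sketch I wrote for myself (the toy data, dimensions, and the random-orthonormal baseline are my own setup, not from any reference): the eigenvector $\mathbf{W}$ with $\mathbf{h}_n = \mathbf{W}^T\mathbf{x}_n$ achieves a lower cost than random orthonormal $\mathbf{W}$'s.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N = 500 centered points in d = 5 dimensions, k = 2 components.
N, d, k = 500, 5, 2
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))
X -= X.mean(axis=0)

# W = top-k eigenvectors (columns) of the covariance matrix of X.
C = X.T @ X / N
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :k]            # keep the top-k eigenvectors

def cost(W):
    """(1/N) * sum_n ||x_n - W h_n||^2 with h_n = W^T x_n."""
    H = X @ W
    return np.mean(np.sum((X - H @ W.T) ** 2, axis=1))

pca_cost = cost(W)

# Compare against random orthonormal W's of the same size.
for _ in range(100):
    Q, _ = np.linalg.qr(rng.normal(size=(d, k)))
    assert cost(Q) >= pca_cost - 1e-9
```

None of the 100 random orthonormal bases beat the eigenvector solution, which is consistent with the claim above.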
I have followed the derivation of the minimum-error formulation in Bishop's PRML book, but I find it a bit unintuitive.
What is the shortest way to derive the PCA solution from the above minimization problem? (I'm thinking, for instance, of using Lagrange multipliers.)
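For what it's worth, the inner minimization I can do myself: for a fixed $\mathbf{W}$ with $\mathbf{W}^T\mathbf{W} = \mathbf{I}$, setting the gradient with respect to $\mathbf{h}_n$ to zero gives
\begin{align}
\frac{\partial}{\partial \mathbf{h}_n}\| \mathbf{x}_n - \mathbf{W}\mathbf{h}_n \|^2 = -2\,\mathbf{W}^T(\mathbf{x}_n - \mathbf{W}\mathbf{h}_n) = 0 \quad\Rightarrow\quad \mathbf{h}_n = \mathbf{W}^T\mathbf{x}_n,
\end{align}
so it's really the remaining minimization over $\mathbf{W}$, presumably handling the orthonormality constraint with a Lagrangian, that I'd like to see done cleanly.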