
I'm trying to better understand the link between PCA and Matrix Factorization of the form $X \approx WH$.

I've read somewhere that the PCA solution can also be derived from the following cost function (I'm not even sure whether this is the right norm to use):

\begin{align} \operatorname*{arg\,min}_{\mathbf{W},\mathbf{H}} \frac{1}{N}\sum_{n=1}^N \| \mathbf{x}_n - \mathbf{W}\mathbf{h}_n \|^2 \qquad \text{subject to } \mathbf{W}^{T} \mathbf{W} = \mathbf{I} \end{align}

which seems similar, if not equal, to the idea of the minimum-error formulation of PCA.

According to the PCA derivation, $\mathbf{W}$ is the matrix whose columns are the top eigenvectors of the covariance matrix of $\mathbf{X}$, and $\mathbf{h}_n$ is the projection of $\mathbf{x}_n$ onto those directions, given by $\mathbf{h}_n = \mathbf{W}^T \mathbf{x}_n$.
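To check this numerically, here is a small NumPy sketch (the data, dimensions, and variable names are made up for illustration): it builds $\mathbf{W}$ from the top eigenvectors of the sample covariance, reconstructs with $\mathbf{h}_n = \mathbf{W}^T \mathbf{x}_n$, and compares the reconstruction error against a random orthonormal basis of the same size.

```python
import numpy as np

# Toy data: N = 500 samples of dimension D = 5, with correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)  # center the data

# W: top-k eigenvectors (columns) of the sample covariance matrix.
k = 2
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :k]            # keep the top-k eigenvectors

# h_n = W^T x_n; reconstruction is W h_n; average squared error over samples.
H = Xc @ W
err_pca = np.mean(np.sum((Xc - H @ W.T) ** 2, axis=1))

# Compare with a random orthonormal basis of the same shape.
Q, _ = np.linalg.qr(rng.normal(size=(5, k)))
err_rand = np.mean(np.sum((Xc - (Xc @ Q) @ Q.T) ** 2, axis=1))

print(err_pca <= err_rand)  # the PCA basis should never do worse
```

The comparison is the point: any orthonormal $\mathbf{W}$ is feasible for the constraint, and the eigenvector choice attains the minimum of the cost above.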

I have followed the derivation of the minimum-error formulation in Bishop's PRML book, but I find it a bit unintuitive.

What is the shortest way to show that the PCA solution solves the above minimization problem? (I'm thinking, for instance, of using Lagrange multipliers.)
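For concreteness, here is how far I can sketch the reduction myself (assuming centered data with sample covariance $\mathbf{S} = \frac{1}{N}\sum_n \mathbf{x}_n \mathbf{x}_n^T$). Minimizing over each $\mathbf{h}_n$ first gives $\mathbf{h}_n = \mathbf{W}^T \mathbf{x}_n$, and then, using $\mathbf{W}^T \mathbf{W} = \mathbf{I}$,

\begin{align}
\frac{1}{N}\sum_{n=1}^N \| \mathbf{x}_n - \mathbf{W}\mathbf{W}^T \mathbf{x}_n \|^2
&= \frac{1}{N}\sum_{n=1}^N \left( \mathbf{x}_n^T \mathbf{x}_n - \mathbf{x}_n^T \mathbf{W}\mathbf{W}^T \mathbf{x}_n \right) \\
&= \operatorname{tr}(\mathbf{S}) - \operatorname{tr}(\mathbf{W}^T \mathbf{S} \mathbf{W}),
\end{align}

so the problem is equivalent to maximizing $\operatorname{tr}(\mathbf{W}^T \mathbf{S} \mathbf{W})$ subject to $\mathbf{W}^T \mathbf{W} = \mathbf{I}$, where a Lagrangian for the constraint should yield the eigenvalue equations $\mathbf{S}\mathbf{w}_i = \lambda_i \mathbf{w}_i$.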

  • [Write $X = U S V^T$, the SVD](https://en.wikipedia.org/wiki/Singular_value_decomposition). Since $\|A P\|^2 = \|A\|^2$ whenever $P$ is orthonormal, the problem reduces to minimizing the norm of a diagonal matrix under a rank constraint (trivial). Note that the usual algorithm for obtaining the SVD is $\min_{u,v} \|M - u v^T\|^2$ where $u, v$ are column vectors. So it is really the solution of PCA, by definition. (2017-01-06)
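The comment's two claims can be checked numerically. This NumPy sketch (toy data, illustrative names) verifies the orthogonal invariance $\|AP\| = \|A\|$ for the Frobenius norm, and that the rank-$k$ SVD truncation of the centered data equals the PCA reconstruction $\mathbf{W}\mathbf{W}^T \mathbf{x}_n$ built from the top right singular vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
Xc = X - X.mean(axis=0)

# Orthogonal invariance of the Frobenius norm: ||A P|| = ||A|| for orthogonal P.
P, _ = np.linalg.qr(rng.normal(size=(6, 6)))
print(np.isclose(np.linalg.norm(Xc @ P), np.linalg.norm(Xc)))

# Right singular vectors of the centered data are the principal directions,
# so the rank-k SVD truncation equals the PCA reconstruction.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
W = Vt[:k].T                            # top-k principal directions as columns
svd_recon = U[:, :k] * s[:k] @ Vt[:k]   # best rank-k approximation of Xc
pca_recon = Xc @ W @ W.T                # reconstruction via h_n = W^T x_n

print(np.allclose(svd_recon, pca_recon))
```

This is why the SVD argument settles the question: the Frobenius objective is unchanged by the orthogonal factors, leaving a diagonal minimization whose solution keeps the largest singular values.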

0 Answers