
I'm trying to understand the paper *Fast principal component analysis using fixed-point algorithm* by Alok Sharma and Kuldip K. Paliwal (pp. 1151–1155), and especially what is said about $\Sigma_x$, the covariance matrix of $x$.

But before getting into specifics, let me summarize how I understand the algorithm.

PCA finds a linear transformation matrix $\varphi$ of size $d \times h$ that reduces a set of $n$ $d$-dimensional feature vectors $x \in \mathbb{R}^d$ to a set of $n$ $h$-dimensional feature vectors $y \in \mathbb{R}^h$, with $h < d$.

So given one feature vector $x$, we have $y = \varphi^\mathsf{T} x$ (note the transpose: $\varphi$ itself is $d \times h$, so $\varphi^\mathsf{T}$ is $h \times d$ and maps $\mathbb{R}^d$ to $\mathbb{R}^h$), or:

$\pmatrix{ a_{1,1} & a_{2,1} & \cdots & a_{d,1} \\ a_{1,2} & a_{2,2} & \cdots & a_{d,2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1,h} & a_{2,h} & \cdots & a_{d,h} }\pmatrix{ x_{1} \\ x_{2} \\ \vdots \\ x_{d} } = \pmatrix{ y_{1} \\ y_{2} \\ \vdots \\ y_{h} }$

And we want to define the matrix $\varphi$.
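
To make the shapes concrete, here's a tiny numpy sketch of that mapping as I understand it (the values are random placeholders, not an actual PCA solution):

```python
import numpy as np

d, h = 5, 2                 # original and reduced dimensionality (example values)
phi = np.random.rand(d, h)  # the d x h matrix PCA is supposed to find
x = np.random.rand(d)       # one feature vector

y = phi.T @ x               # the reduced feature vector
print(y.shape)              # (2,) -- i.e. (h,)
```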

Let me quote the algorithm (Table 1); my attempt at turning it into code follows the list:

  1. Choose $h$, the number of principal axes or eigenvectors required to estimate. Compute covariance $\Sigma_x$ and set $p \leftarrow 1$
  2. Initialize eigenvector $\varphi_p$ of size $d \times 1$ e.g. randomly
  3. Update $\varphi_p$ as $\varphi_p \leftarrow \Sigma_x \varphi_p$
  4. Do the Gram-Schmidt orthogonalization process [...]
  5. Normalize $\varphi_p$ by dividing it by its norm: $\varphi_p \leftarrow \varphi_p/||\varphi_p||$
  6. If $\varphi_p$ has not converged, go back to step 3
  7. Increment counter $p \leftarrow p + 1$ and go to step 2 until $p$ equals $h$
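
Here is how I would translate that table into numpy (this is only my reading of the paper, and step 1, the covariance, is exactly the part I'm unsure about):

```python
import numpy as np

def fixed_point_pca(X, h, tol=1e-9, max_iter=1000):
    """My attempt at Table 1. X is an (n, d) array of n feature vectors;
    returns Phi, a (d, h) matrix whose columns are the estimated eigenvectors."""
    n, d = X.shape
    # Step 1: covariance over all n vectors (my interpretation, see my question below)
    mu = X.mean(axis=0)
    Sigma = (X - mu).T @ (X - mu) / n
    Phi = np.zeros((d, h))
    for p in range(h):
        # Step 2: random initialization of the p-th eigenvector
        phi_p = np.random.rand(d)
        phi_p /= np.linalg.norm(phi_p)
        for _ in range(max_iter):
            prev = phi_p
            # Step 3: fixed-point update
            phi_p = Sigma @ phi_p
            # Step 4: Gram-Schmidt against the already-found eigenvectors
            for q in range(p):
                phi_p -= (Phi[:, q] @ phi_p) * Phi[:, q]
            # Step 5: normalize
            phi_p /= np.linalg.norm(phi_p)
            # Step 6: converged when the direction stops changing
            if abs(phi_p @ prev) > 1 - tol:
                break
        Phi[:, p] = phi_p
    return Phi
```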

So basically, the orthogonalization, the normalization, and the convergence check are straightforward to implement. Unfortunately, I'm having a hard time figuring out the first steps:

How am I supposed to compute the covariance $\Sigma_x$, given that $x$ is one of the feature vectors of the input? (By the way, is the described algorithm actually only defining $\Sigma$ for one single feature vector out of the $n$ input vectors?)

I was unable to understand how to apply the definition of the covariance matrix to this case; is it possible to have a simple example of what it takes as input and what comes out?
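
To illustrate, here's the kind of input/output example I'm after; the code below is my guess (covariance taken across all $n$ vectors, with made-up numbers):

```python
import numpy as np

# n = 4 feature vectors of dimension d = 3, stacked as rows
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

mu = X.mean(axis=0)                      # mean feature vector, shape (3,)
Sigma = (X - mu).T @ (X - mu) / len(X)   # covariance matrix, shape (3, 3)
# (I divide by n here; the paper might use n or n - 1, I'm not sure which)
print(Sigma)
```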

Now suppose I am able to compute the covariance $\Sigma_x$: what do steps 2 and 3 mean? $\varphi_p$ is supposed to be one column of the $d \times h$ matrix $\varphi$, so what does the random initialization of $\varphi_p$ mean, and how do I apply the update $\varphi_p \leftarrow \Sigma_x \varphi_p$ using the previously computed covariance?
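
Here is my best guess for steps 2 and 3 in isolation, assuming $\Sigma_x$ is the $d \times d$ matrix from my guess above:

```python
import numpy as np

d = 3
Sigma = np.eye(d)          # placeholder for the computed d x d covariance matrix
phi_p = np.random.rand(d)  # step 2: random initialization of one column of phi
phi_p = Sigma @ phi_p      # step 3: one fixed-point update, result is again a d-vector
# steps 4-5 (Gram-Schmidt and normalization) would follow, repeated until convergence
```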

  • Newer randomized methods based on the concentration-of-measure phenomenon are really good and likely outperform the method you're reading about. See, for example, [Halko, Martinsson & Tropp 2010](http://amath.colorado.edu/faculty/martinss/Pubs/2010_HMT_random_review.pdf). (2012-08-26)

2 Answers