I'm trying to understand the paper *Fast principal component analysis using fixed-point algorithm* by Alok Sharma and Kuldip K. Paliwal (pp. 1151–1155), and especially what is said about $\Sigma_x$, the covariance matrix of $x$.
But before being specific, let me summarize how I understand the algorithm.
PCA finds a linear transformation matrix $\varphi$ of size $d \times h$ that is meant to reduce a set of $n$ $d$-dimensional feature vectors $x$ ($x \in \mathbb{R}^d$) to a set of $n$ $h$-dimensional feature vectors $y$ ($y \in \mathbb{R}^h$), with $h < d$.
So given one feature vector $x$, we have $y = \varphi^\top x$ (note the transpose: $\varphi^\top$ is $h \times d$, so the product is indeed $h \times 1$), or:

$\pmatrix{ a_{1,1} & a_{2,1} & \cdots & a_{d,1} \\ a_{1,2} & a_{2,2} & \cdots & a_{d,2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1,h} & a_{2,h} & \cdots & a_{d,h} }\pmatrix{ x_{1} \\ x_{2} \\ \vdots \\ x_{d} } = \pmatrix{ y_{1} \\ y_{2} \\ \vdots \\ y_{h} }$

where $a_{i,j}$ denotes the entry of $\varphi$ in row $i$, column $j$.
And we want to define the matrix $\varphi$.
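To make sure I'm reading the dimensions correctly, here is a tiny numerical example of that transformation, with made-up numbers, $d = 3$ and $h = 2$:

```python
import numpy as np

# An arbitrary d x h transformation matrix phi (made-up numbers)
phi = np.array([[ 0.5, 0.1],
                [ 0.2, 0.4],
                [-0.3, 0.7]])

x = np.array([1.0, 2.0, 3.0])  # one feature vector in R^3 (d = 3)

y = phi.T @ x                  # reduced vector in R^2 (h = 2)
print(y.shape)                 # prints: (2,)
```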
Let me quote the algorithm (Table 1):
1. Choose $h$, the number of principal axes or eigenvectors required to estimate. Compute the covariance $\Sigma_x$ and set $p \leftarrow 1$
2. Initialize eigenvector $\varphi_p$ of size $d \times 1$, e.g. randomly
3. Update $\varphi_p$ as $\varphi_p \leftarrow \Sigma_x \varphi_p$
4. Do the Gram-Schmidt orthogonalization process [...]
5. Normalize $\varphi_p$ by dividing it by its norm: $\varphi_p \leftarrow \varphi_p/||\varphi_p||$
6. If $\varphi_p$ has not converged, go back to step 3
7. Increment counter $p \leftarrow p + 1$ and go to step 2 until $p$ equals $h$
So basically, the orthogonalization process and the normalization are pretty straightforward and simple to implement, and so is the convergence check; my attempt at steps 2–7 follows. Unfortunately, I'm having a hard time trying to figure out the first steps.
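Here is how I would translate the loop into Python. This is only my reading of Table 1, and it assumes the covariance matrix `cov` is already available as a $d \times d$ NumPy array, which is precisely the part I'm unsure about:

```python
import numpy as np

def fixed_point_pca(cov, h, tol=1e-10, max_iter=10_000):
    """My reading of Table 1: `cov` is the d x d covariance matrix
    Sigma_x (how to build it is what I'm asking below)."""
    d = cov.shape[0]
    phi = np.zeros((d, h))                   # the d x h matrix, column by column
    for p in range(h):
        v = np.random.randn(d)               # step 2: random initialization
        v /= np.linalg.norm(v)
        for _ in range(max_iter):
            v_old = v
            v = cov @ v                      # step 3: update
            for q in range(p):               # step 4: Gram-Schmidt against
                v = v - (phi[:, q] @ v) * phi[:, q]  # previously found vectors
            v /= np.linalg.norm(v)           # step 5: normalize
            if abs(v @ v_old) > 1.0 - tol:   # step 6: converged?
                break
        phi[:, p] = v                        # step 7: move on to the next one
    return phi
```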
How am I supposed to compute the covariance $\Sigma_x$, given that $x$ is one of the feature vectors of the input? (BTW, is the described algorithm actually only defining $\Sigma_x$ for one single feature vector out of the $n$ input vectors?)
I was unable to understand how to apply the definition of the covariance matrix to this case; is it possible to have a simple example of what it takes as input, and what comes out?
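My best guess, and it is only a guess, is that $\Sigma_x$ is estimated from all $n$ vectors at once rather than from a single $x$, along these lines (where `X` is an $n \times d$ array holding the $n$ feature vectors as rows, with made-up numbers):

```python
import numpy as np

# n = 4 feature vectors of dimension d = 3, stacked as rows of X
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])

mu = X.mean(axis=0)           # mean feature vector, shape (3,)
Xc = X - mu                   # center every vector
cov = (Xc.T @ Xc) / len(X)    # d x d covariance matrix, here 3 x 3

# Same thing via NumPy (bias=True divides by n rather than n - 1):
print(np.allclose(cov, np.cov(X, rowvar=False, bias=True)))  # True
```

Is that the intended construction?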
Now, suppose I am able to compute the covariance $\Sigma_x$: what do steps 2 and 3 mean? $\varphi_p$ is supposed to be one column of the $d \times h$ matrix $\varphi$, so what does the random initialization of $\varphi_p$ mean, and how do I apply the update $\varphi_p \leftarrow \Sigma_x\varphi_p$ using the previously computed covariance?
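If my guess above is right, then I would read steps 2 and 3 as nothing more than drawing a random $d$-vector and multiplying it by the covariance, i.e.:

```python
import numpy as np

d = 3
cov = np.eye(d)               # placeholder for the real Sigma_x

phi_p = np.random.randn(d)    # step 2: a random d x 1 column
phi_p = cov @ phi_p           # step 3: a plain matrix-vector product
```

Is that all there is to it, or am I missing something?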