0

In PCA, the data has the highest variance along the first basis vector and the least variance along the last.

So if we are using PCA just for dimension reduction, why can't we compute the variance of each individual feature, sort the features in descending order of variance, and just keep the first n features/dimensions?

  • 1
    What do you call the "features"? 2012-10-03
  • 0
    http://stats.stackexchange.com/ exists... 2012-10-03

2 Answers

2

PCA projects your data matrix onto the eigenvectors of the covariance matrix of your data. There is a one-to-one correspondence between the eigenvectors and the principal components. It is a transformation that provably and optimally maps your data to a space from which you can recover it, after removing some (the highest-indexed) columns/rows, with minimal loss of energy (variance). This means that no other linear transformation keeping the same number of components preserves more energy than PCA.
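To make the comparison in the question concrete, here is a minimal NumPy sketch, not a definitive implementation. The toy data, the function names `pca_project` and `top_variance_features`, and the choice of one retained dimension are all assumptions made for illustration. It projects the data onto the leading eigenvectors of the covariance matrix and compares the retained variance with what you get by simply keeping the highest-variance original features.

```python
import numpy as np

def pca_project(X, k):
    """Project centered X onto the k eigenvectors with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)            # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return Xc @ eigvecs[:, order]

def top_variance_features(X, k):
    """Keep the k original features with the largest sample variance."""
    variances = X.var(axis=0, ddof=1)
    order = np.argsort(variances)[::-1][:k]
    return X[:, order]

# Hypothetical toy data: two strongly correlated features plus one noise feature.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([
    z + 0.1 * rng.normal(size=(500, 1)),
    z + 0.1 * rng.normal(size=(500, 1)),
    0.5 * rng.normal(size=(500, 1)),
])

total_var = X.var(axis=0, ddof=1).sum()
pca_var = pca_project(X, 1).var(axis=0, ddof=1).sum()
sel_var = top_variance_features(X, 1).var(axis=0, ddof=1).sum()

print(f"total variance:            {total_var:.3f}")
print(f"kept by 1 PCA component:   {pca_var:.3f}")
print(f"kept by 1 raw feature:     {sel_var:.3f}")
```

On correlated data like this, the single principal component combines the two correlated features, so it should retain noticeably more variance than any single original feature. When the features are uncorrelated, the two approaches keep the same amount.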

Coming to your question: yes, there are variations inside the principal components. The second principal component can have some values that are greater than some of those of the first component. One can order them and take the N greatest if the eventual interest is classification and the highest values are the primary factors. In general, this is not always the case. The drawback of this idea is that only some of the elements corresponding to a principal component will be evaluated. In terms of data compression this definitely makes no sense, but it might be useful for classification. The biggest problem is this: when you sort the values and take the N most significant ones, what about the second data sample? The indices of its most significant values will be different! As a result, your classifier will suffer from this mismatch.
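The index mismatch can be seen in a small sketch. The two sample vectors and the helper `top_n_indices` are hypothetical, chosen only to illustrate the point: sorting each sample's values separately selects different positions for different samples.

```python
import numpy as np

def top_n_indices(v, n):
    """Indices of the n largest entries of v, largest first."""
    return np.argsort(-v, kind="stable")[:n]

# Two hypothetical samples expressed in the same 4-dimensional basis.
sample_a = np.array([0.2, 3.1, -0.4, 1.5])
sample_b = np.array([2.7, 0.1, 1.9, -0.3])

idx_a = top_n_indices(sample_a, 2)
idx_b = top_n_indices(sample_b, 2)

print("positions kept for sample_a:", idx_a.tolist())
print("positions kept for sample_b:", idx_b.tolist())

# The reduced vectors no longer refer to the same coordinates,
# so comparing them element by element is meaningless.
reduced_a = sample_a[idx_a]
reduced_b = sample_b[idx_b]
```

A classifier fed `reduced_a` and `reduced_b` would be comparing unrelated coordinates. Choosing one fixed set of indices for all samples avoids the mismatch.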

  • 0
    Please use "principal" instead of "principle". 2012-10-03
0

What you are saying is not entirely wrong. It is actually the gist of the PCA algorithm, but PCA also transforms the data (a change of axes) so that it can be better understood. I hope this link will help you a lot with this. PCA has been in use since the early 1900s, and there is no much better replacement for it. Its only drawback is that it is linear in nature. For more information, check this link.

  • 0
    Sorry for the duplicate add... 2012-10-03
  • 0
    Thank you for the link on PCA. 2012-10-04