
I have a set of matrices which should fall into 3 distinct groups (clusters). They are unlabelled. I wish to do unsupervised clustering with PCA, and I am using MATLAB. At the end I would also like to examine the eigenvectors.

MATLAB has a function called "princomp" which I believe can do this task; is this correct?

When I give "princomp" a matrix, how should the output be interpreted?

For example:

    dataTmp=[1 1; 2 2; 1 2; 2 3; 4 6; -1 1; -2 2; -4 3; -5 8]

    dataTmp =
         1     1
         2     2
         1     2
         2     3
         4     6
        -1     1
        -2     2
        -4     3
        -5     8

    princomp(dataTmp)

    ans =
        0.9207    0.3902
       -0.3902    0.9207

Or should I use the function "zscore" to standardise the values first?

    princomp(zscore(dataTmp))

    ans =
        0.7071    0.7071
       -0.7071    0.7071

How do I interpret the answer? The data I made up are simple points in either the first or second quadrant.

1 Answer


I don't quite see what you want to achieve: princomp performs a principal component analysis of the data. That is, it determines an orthonormal basis of the sample space such that the orthogonal projection of the centered data onto the line spanned by the first basis vector has maximal variance. The second basis vector satisfies the same maximality condition, subject to the constraint that it be orthogonal to the first, and so on for the third, fourth, etc. The output you get is a matrix that has these basis vectors as columns.
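To make that description concrete, here is a minimal sketch that rebuilds the princomp output by hand from the covariance matrix of the example data in the question. It assumes the Statistics Toolbox is available (newer MATLAB releases recommend pca in place of princomp); the variable names V, lambda, order and explained are only illustrative.

```matlab
% Example data from the question: rows are observations, columns are variables.
dataTmp = [1 1; 2 2; 1 2; 2 3; 4 6; -1 1; -2 2; -4 3; -5 8];

% princomp centers each column itself before the analysis.
% coeff:  basis vectors (principal directions) as columns
% score:  the centered data expressed in that basis
% latent: variance of the data along each basis vector
[coeff, score, latent] = princomp(dataTmp);

% The same basis comes from the eigenvectors of the sample covariance matrix.
[V, D] = eig(cov(dataTmp));
[lambda, order] = sort(diag(D), 'descend');   % largest variance first
V = V(:, order);

% Eigenvectors are only defined up to sign, so compare absolute values.
disp(abs(V) - abs(coeff));   % approximately zero
disp(lambda - latent);       % approximately zero

% Fraction of the total variance carried by each component.
explained = latent / sum(latent)
```

This also explains the zscore result in the question. After standardising two variables, the covariance matrix becomes the correlation matrix [1 r; r 1], whose eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2) for any nonzero correlation r. With only two variables, princomp(zscore(dataTmp)) will therefore always return entries of magnitude 0.7071, and only the ordering and signs depend on the data.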

The principal components are not related in an obvious way to existing clusters in the data set.

Why don't you use kmeans instead? If the matrices you speak of fall clearly into three classes, as you say, kmeans should work well. The one possible drawback is that kmeans may fail to give an acceptable result if the three clusters are not "convex".
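A rough sketch of how that could look, assuming the matrices all have the same size and are stored in a cell array. The cell array mats and the synthetic data filling it are hypothetical stand-ins for the real matrices, and the 'Replicates' value is an arbitrary choice.

```matlab
% Hypothetical stand-in data: 30 matrices of size 4-by-4 in 3 synthetic groups.
rng(0);                                   % reproducible random numbers
mats = cell(30, 1);
for i = 1:30
    group = ceil(i / 10);                 % groups 1, 2, 3 with 10 matrices each
    mats{i} = 5 * group + randn(4, 4);    % group-specific offset plus noise
end

% Flatten each matrix into one row, giving a 30-by-16 observation matrix.
X = cell2mat(cellfun(@(M) M(:)', mats, 'UniformOutput', false));

% Cluster into 3 groups; several restarts reduce the risk of a poor local optimum.
[idx, C] = kmeans(X, 3, 'Replicates', 5);

% idx(i) is the cluster label of mats{i}; each row of C is a centroid,
% which can be turned back into a matrix for inspection.
centroid1 = reshape(C(1, :), 4, 4);

% PCA is still useful afterwards, e.g. to plot the clusters in two dimensions.
[coeff, score] = princomp(X);
scatter(score(:, 1), score(:, 2), [], idx, 'filled');
xlabel('first principal component');
ylabel('second principal component');
```

Here kmeans does the clustering, while princomp is used only to project the data onto its two directions of largest variance for plotting. The cluster numbering returned by kmeans is arbitrary.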

  • That was great, and clear. Thanks again! (2011-09-14)