
I am fitting a Gaussian Mixture Model to high-dimensional data (40 dimensions).

I have trained the model with EM and learned the parameters. Now I want to know, quantitatively:

Which matters more in capturing the structure of the data: the means or the covariance matrices?

So far I can think of two measures: the Euclidean distance between the component means, and the cosine of the angle between the principal eigenvectors of the different covariance matrices. The second would show whether the direction of greatest variability captured by each covariance matrix is similar to or different from those of the others.
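A minimal NumPy sketch of the two measures described above, assuming the fitted parameters are stacked as a `(K, d)` array of means and a `(K, d, d)` array of full covariance matrices (scikit-learn's `GaussianMixture` with `covariance_type='full'`, for example, exposes them as `means_` and `covariances_`). The function names are hypothetical. Because an eigenvector is only defined up to its sign, the sketch uses the absolute value of the cosine.

```python
import numpy as np

def mean_distances(means):
    """Pairwise Euclidean distances between component means.

    means: array of shape (K, d), one row per mixture component.
    Returns a symmetric (K, K) matrix with zeros on the diagonal.
    """
    diff = means[:, None, :] - means[None, :, :]  # shape (K, K, d)
    return np.linalg.norm(diff, axis=-1)

def principal_direction_similarity(covariances):
    """Absolute cosine between the principal eigenvectors of each pair of covariances.

    covariances: array of shape (K, d, d) of symmetric covariance matrices.
    1 means the same direction of maximal variance, 0 means orthogonal.
    """
    # eigh works on stacked matrices and returns eigenvalues in ascending
    # order, so the last eigenvector column is the direction of largest variance.
    _, vecs = np.linalg.eigh(covariances)
    principal = vecs[:, :, -1]  # shape (K, d), unit-norm rows
    return np.abs(principal @ principal.T)

# Toy example: two 2-D components whose main axes are orthogonal.
means = np.array([[0.0, 0.0], [3.0, 4.0]])
covs = np.array([np.diag([4.0, 1.0]), np.diag([1.0, 4.0])])
print(mean_distances(means))                 # off-diagonal entries are 5.0
print(principal_direction_similarity(covs))  # off-diagonal entries are 0.0
```

The principal eigenvector summarizes only one of the 40 directions, so comparing several leading eigenvectors (or weighting them by their eigenvalues) may give a fuller picture.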

Any ideas?

  • I'm facing a very similar problem. Could you tell me, or point me to a source describing, how you used the Bayesian Information Criterion (BIC) to decide on the number of Gaussians in the mixture? (2012-08-20)
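A sketch of the usual BIC computation for a Gaussian mixture, assuming you already have the maximized log-likelihood from your own EM fit. BIC is taken here as −2 log L + p log n, where p is the number of free parameters, so lower is better; fit the model for several numbers of components and keep the one with the smallest BIC. The helper names are hypothetical. scikit-learn's `GaussianMixture` also provides a `bic(X)` method, and R's mclust reports the negated form 2 log L − p log n, where larger is better.

```python
import math

def gmm_num_params(n_components, n_features, covariance_type="full"):
    """Free parameters of a Gaussian mixture with K components in d dimensions."""
    k, d = n_components, n_features
    if covariance_type == "full":         # one unrestricted d x d covariance per component
        cov_params = k * d * (d + 1) // 2
    elif covariance_type == "tied":       # a single d x d covariance shared by all components
        cov_params = d * (d + 1) // 2
    elif covariance_type == "diag":       # a diagonal covariance per component
        cov_params = k * d
    elif covariance_type == "spherical":  # one variance per component
        cov_params = k
    else:
        raise ValueError(f"unknown covariance_type: {covariance_type!r}")
    mean_params = k * d
    weight_params = k - 1  # mixing weights are constrained to sum to 1
    return cov_params + mean_params + weight_params

def gmm_bic(log_likelihood, n_samples, n_components, n_features, covariance_type="full"):
    """BIC = -2 log L + p log n (lower is better)."""
    p = gmm_num_params(n_components, n_features, covariance_type)
    return -2.0 * log_likelihood + p * math.log(n_samples)

# In 40 dimensions the covariance terms dominate the parameter count.
print(gmm_num_params(3, 40, "full"))  # 2582
print(gmm_num_params(3, 40, "tied"))  # 942
```

Comparing BIC between a full and a tied covariance model with the same number of components is also one rough way to ask whether component-specific covariances are worth their extra parameters.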

1 Answer


Look into model-based clustering research by Adrian Raftery:

http://www.stat.washington.edu/raftery/Research/mbc.html

Raftery's principal concern is devising methods for identifying the component distributions of Gaussian mixtures. He provides many tools useful for the task you are describing, many of which are available as public R packages (for example, mclust).