5
$\begingroup$

I am fitting a Gaussian Mixture Model to high-dimensional data (40 dimensions).

I have trained the model using EM and learned its parameters, and now I want to know, quantitatively:

Which is more important in capturing the structure of the data: the means or the covariance matrices?

So far, I can think of two measures: the Euclidean distance between the means of different components, and the cosine of the angle between the principal eigenvectors of different covariance matrices. The latter would indicate whether the main direction of variability captured by each covariance matrix is similar to or different from the others.
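The two measures described above can be sketched in a few lines of NumPy. This is only an illustration, not a definitive method: the function names `mean_distances` and `principal_direction_cosines` are hypothetical, and the toy `means` and `covariances` arrays stand in for the parameters of a fitted model. The absolute value of the cosine is used because the sign of an eigenvector is arbitrary.

```python
import numpy as np

def mean_distances(means):
    """Pairwise Euclidean distances between component means, shape (K, K)."""
    means = np.asarray(means, dtype=float)
    diff = means[:, None, :] - means[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def principal_direction_cosines(covariances):
    """Absolute cosine between the principal eigenvectors of each pair
    of covariance matrices, shape (K, K)."""
    covariances = np.asarray(covariances, dtype=float)
    # eigh returns eigenvalues in ascending order, so the last column
    # of the eigenvector matrix is the principal (largest-variance) direction.
    leading = np.stack([np.linalg.eigh(c)[1][:, -1] for c in covariances])
    # Eigenvectors from eigh have unit norm, so the dot product is the cosine.
    # The sign of an eigenvector is arbitrary, hence the absolute value.
    return np.abs(leading @ leading.T)

# Toy stand-in for fitted parameters (assumed, 2-D for readability):
# two components with the same mean but orthogonal main directions.
means = np.array([[0.0, 0.0], [0.0, 0.0]])
covariances = np.array([[[4.0, 0.0], [0.0, 1.0]],
                        [[1.0, 0.0], [0.0, 4.0]]])

print(mean_distances(means))
print(principal_direction_cosines(covariances))
```

In this toy case the mean distances are all zero while the off-diagonal cosines are zero too, meaning the components coincide in location but differ completely in orientation, so the covariances carry the structure.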

Any ideas?

  • 1
    Both are important. I don't quite understand what you are asking. 2012-07-26
  • 0
    Have you looked into Principal Component Analysis? 2012-07-27
  • 0
    How did you decide on the number of normal distributions in the mixture? 2012-07-27
  • 0
    I used the Bayesian Information Criterion. As an example of what I mean, imagine a mixture model in which all the components are centered at the same point but are oriented in different directions. In this case the means are less helpful than the covariance matrices in capturing the structure of the data. 2012-07-27
  • 0
    I'm facing a very similar problem. Could you explain, or point me to a source on, how you used the Bayesian Information Criterion to decide on the number of Gaussians in the mixture? 2012-08-20
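For readers wondering how BIC is typically used to pick the number of components, here is a minimal sketch. It assumes full (unconstrained) covariance matrices and the "lower is better" convention, BIC = -2 log L + p ln n, where p is the number of free parameters; some packages use the opposite sign and maximize instead. The function names and the log-likelihood values are hypothetical, and the thread does not say which exact procedure the asker followed.

```python
import math

def gmm_free_parameters(n_components, n_dims):
    """Free parameters of a GMM with full covariance matrices."""
    weights = n_components - 1                        # weights sum to 1
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2  # symmetric matrices
    return weights + means + covariances

def bic(log_likelihood, n_components, n_dims, n_samples):
    """BIC = -2 log L + p ln n (lower is better under this convention)."""
    p = gmm_free_parameters(n_components, n_dims)
    return -2.0 * log_likelihood + p * math.log(n_samples)

def select_by_bic(log_likelihoods, n_dims, n_samples):
    """Return the component count K with the lowest BIC.

    log_likelihoods maps K to the maximized total log-likelihood of the
    K-component model, e.g. as obtained from separate EM fits.
    """
    return min(log_likelihoods,
               key=lambda k: bic(log_likelihoods[k], k, n_dims, n_samples))

# Hypothetical log-likelihoods from EM fits with K = 1, 2, 3 on 2-D data.
fits = {1: -1000.0, 2: -900.0, 3: -899.0}
print(select_by_bic(fits, n_dims=2, n_samples=100))
```

With 40 dimensions the covariance term dominates the penalty: each additional full-covariance component adds 820 covariance parameters, which is why BIC tends to favor few components or constrained covariance structures in high dimensions.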

1 Answer

0

Look into model-based clustering research by Adrian Raftery:

http://www.stat.washington.edu/raftery/Research/mbc.html

Raftery's principal concern is devising methods for identifying the component distributions of Gaussian mixtures. He provides a multitude of tools useful for the task you are describing, many of which are available in public R packages.