I have N sensors which are being sampled M times, so I have an N by M readout matrix. If I want to know the relations and dependencies of these sensors, the simplest thing is to compute Pearson's correlation, which gives me an N by N correlation matrix. Now let's say I have the correlation matrix, but I want to explore the possible readout space that can lead to such a correlation matrix; what can I do? So the question is: given an N by N correlation matrix, how can you get to a matrix that would have such a correlation matrix? Any comment is appreciated.
Going back from a correlation matrix to the original matrix
-
This could be very interesting! What is a "Gaussian with a given covariance matrix"? How can I generate one? Is it the same as drawing samples from a 2D Gaussian pdf and multiplying it with the target covariance matrix? [Hey, I can hear some of you laughing, but don't! This happens when you don't take math after high school] – 2011-01-17
4 Answers
You can do a Cholesky decomposition: if your correlation matrix is $R$, then
$L = cholesky(R)$
gives a lower-triangular $N \times N$ matrix $L$ such that
$L \, L^T = R.$
You could also understand $L$ as a simple rotation of your original readout matrix, in which the $M$ columns are rotated into $N$ ($\le M$) columns of the vector space.
So you may apply any arbitrary rotation to the columns of L, which can be extended to M columns by appending null columns.
You may also "stretch" the rows of that reconstructed "readout" matrix by multiplying each row by some standard deviation, since the rows of L have norm 1.
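Here is a minimal NumPy sketch of this recipe (the variable names are my own, and the random orthogonal matrix Q obtained from a QR decomposition stands in for the arbitrary rotation):
import numpy as np

N, M = 4, 200                                  # sensors, samples

# any valid correlation matrix works; here I build one from random data
R = np.corrcoef(np.random.randn(N, 5 * N))

L = np.linalg.cholesky(R)                      # lower triangular; L @ L.T reproduces R
Lpad = np.hstack([L, np.zeros((N, M - N))])    # append null columns -> N x M
Q, _ = np.linalg.qr(np.random.randn(M, M))     # arbitrary M x M rotation (orthogonal)
X = np.sqrt(M) * Lpad @ Q                      # candidate "readout" matrix, N x M

print(np.allclose(X @ X.T / M, R))             # True: X reproduces R exactly
Note that the rows of X are not centered here, so X reproduces R in the sense X X'/M = R; centering and explicit stddevs are handled in the second update further below.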
(If you want you can download my (free) MatMate-program for matrix-visualization and see with a simple script what I mean here/how this works. Here is the download page .)
[update] Here I give an example. After reading more comments I understand that the goal was to construct a data matrix given a correlation matrix. In the example I constructed a random 4 by 4 correlation matrix. Then I produced data and checked whether the correlation of that data reproduces the correlation matrix. It does, and because the matrix algebra is simple we can also see that this is exact.
;===========================================
;MatMate-Listing vom: 16.01.2011 18:33:13
;============================================
[1] cor = covtocorr(cov)   // this is the correlation-matrix from a given covariance
cor :
  1.00  -0.38   0.22  -0.85
 -0.38   1.00   0.30   0.57
  0.22   0.30   1.00   0.27
 -0.85   0.57   0.27   1.00
[2] lad = cholesky(cor)    // this provides a "loadings"-matrix by cholesky-decomposition of matrix cor
lad :
  1.00   0.00   0.00   0.00
 -0.38   0.93   0.00   0.00
  0.22   0.42   0.88   0.00
 -0.85   0.27   0.39   0.25
Now we want to construct 200 columns of data. As I wrote earlier, the Cholesky matrix L of a given correlation matrix R is just a rotation of the data matrix. One should add that the data must be centered (mean = 0) and normed (stddev = 1). Now I do the inverse: I construct a valid rotation matrix t by rotating a dummy random matrix of 200 columns to the first column (rotation method: "triangular").
[3] n = 200
[4] dummy = randomu(1,n)
[5] t = gettrans(dummy,"tri")
[6] hide t,dummy
So I've got a valid rotation matrix t. Now I provide columns to rotate the cholesky/loadings matrix within and insert the Cholesky matrix in the first 4 columns (here v = 4, the number of variables):
[7] data = (lad || null(v,n-v)) * sqrt(n)
Then I rotate that data matrix with that random rotation:
[8] data = data * t
[9] disp = data[*,1..10]   // show the first 10 columns of the created dataset
disp :
  1.71  -0.11  -0.20  -0.05  -0.06  -0.21  -0.16  -0.14  -0.11  -0.21
  0.23  13.12   0.08   0.02   0.02   0.08   0.06   0.05   0.04   0.08
  2.22   5.77  12.34  -0.01  -0.01  -0.05  -0.03  -0.03  -0.02  -0.05
 -0.46   3.83   5.63   3.52   0.05   0.18   0.13   0.12   0.09   0.17
Check whether that data really give the correlation matrix:
[10] chk_cor = data * data' / n
chk_cor :
  1.00  -0.38   0.22  -0.85
 -0.38   1.00   0.30   0.57
  0.22   0.30   1.00   0.27
 -0.85   0.57   0.27   1.00
We see that the correlation matrix is reproduced.
The "mystery" here is simply, that by the definition of correlation (of normed data, to make the example short)
R = data * data' / n
here data may have arbitrarily many columns, and t*t' is the identity matrix, so
R = data * t * t' * data' / n
R = (data*t) * (data*t)' / n
and the Cholesky factorization is just the operation to get L and L' from a correlation matrix:
L = cholesky(R) ==> L*L' = R
(Thus the result is also exact)
[end update]
[update 2] In response to the remark of mpiktas I'll add a variant which allows setting the standard deviations and the means of the generated data explicitly.
The argument in the first update was to show how the data form an M-dimensional vector space, and how the Cholesky decomposition of the corresponding correlation matrix can simply be understood as a special rotation of the data in that vector space.
However, the means of data generated the way I described above are not guaranteed to be zero. So here is a variant which produces means = 0 and stddevs = 1; the data can then be scaled to other stddevs and translated to predefined means.
Here the idea is to use the L-matrix as a "recipe for composition" of uncorrelated "factors", as understood in factor analysis/PCA. Given a centered and normed data matrix D, it is assumed that the Cholesky matrix L describes a linear combination of uncorrelated unit-variance data ("factors") in a matrix F with the same number of columns as D, such that
L * F = D
Now, a random F can be created by a random generator providing a normal distribution with mean = 0 and stddev = 1 per row. If the rows in F were truly uncorrelated, the composition L * F = D above would already be a solution. After that we could scale the rows of D by standard deviations and also translate them to predefined means.
Unfortunately random generators do not give exactly uncorrelated data, so the correlations in F must be removed first. This can be done with the inverse of the Cholesky matrix of the correlations of F. Here is script pseudocode:
v = 4                        // set number of variables / of rows, make a small example
n = 200                      // set number of observations / of columns
F = randomn(v,n,0,1)         // generate random data in F where rows have mean=0, stddev=1
covF = F * F' / n
cholF = cholesky(covF)
L = cholesky(R)              // R is the targeted correlation-matrix
data = L * inv(cholF) * F
sdevs = diag([1,2,3,4])      // provide four stddevs
means = colvector([20,10,40,30])   // provide four means
simuldata = sdevs*data + means     // here the means-vector must be expanded to
                                   // be added to all columns in simuldata
The simulated data are in the matrix simuldata.
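For readers without MatMate, here is the same recipe as a NumPy sketch (my own variable names; I reuse the 4 by 4 correlation matrix from the listing above, and I center F explicitly, which the pseudocode leaves to the random generator):
import numpy as np

v, n = 4, 200
R = np.array([[ 1.00, -0.38,  0.22, -0.85],     # target correlation matrix
              [-0.38,  1.00,  0.30,  0.57],     # (values from the example above)
              [ 0.22,  0.30,  1.00,  0.27],
              [-0.85,  0.57,  0.27,  1.00]])

F = np.random.randn(v, n)                       # raw "factors"
F -= F.mean(axis=1, keepdims=True)              # center each row exactly
covF = F @ F.T / n
cholF = np.linalg.cholesky(covF)                # used to remove residual correlation in F
L = np.linalg.cholesky(R)
data = L @ np.linalg.solve(cholF, F)            # rows: mean 0, sd 1, correlation R

sdevs = np.array([1.0, 2.0, 3.0, 4.0])          # chosen standard deviations
means = np.array([20.0, 10.0, 40.0, 30.0])      # chosen means
simuldata = sdevs[:, None] * data + means[:, None]

print(np.allclose(np.corrcoef(simuldata), R))   # True: the target correlation is reproduced exactly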
While we can set stddevs and means explicitly, one problem remains: through the left-multiplication by inv(cholF) and L, the normal distribution in simuldata, as originally provided by the random generator in F, may be affected/distorted a bit. I don't have an idea yet how to solve this...
-
@mpiktas: true, I was a bit sloppy there. I'll edit the answer later and supply a similar solution where the sdev and mean are adjustable. – 2011-01-16
What you want to do is called sampling from a multivariate distribution. It's not hard to look up how to do this, although in your case if some of your variables are assumed normal and some Poisson you're going to have to calculate the joint PDF of your model yourself. Rather than trying to explain the whole process here, I'll give you some things to read:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution
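For purely Gaussian sensors this amounts to a one-liner; a minimal sketch (assuming NumPy, with an illustrative equicorrelated target matrix):
import numpy as np

N, M = 4, 1000
R = 0.5 * np.eye(N) + 0.5                       # example target: 1 on the diagonal, 0.5 elsewhere

# M samples of an N-dimensional Gaussian with covariance R (unit variances, so
# covariance and correlation coincide), arranged as an N x M readout matrix
X = np.random.multivariate_normal(np.zeros(N), R, size=M).T

print(np.corrcoef(X).round(2))                  # approximately R for large M
Unlike the Cholesky construction in the other answer, this reproduces R only approximately, with sampling error that shrinks as M grows.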
-
Correct, but more to the point I think sampling is actually what he wants, and talking about the fiber over a given covariance matrix is his first attempt at describing this. – 2011-01-16
You could make your "10%" implementation faster by using gradient descent.
Here's an example of doing it for the covariance matrix, because it's easier.
You have a $k\times k$ covariance matrix $C$ and you want to get a $k \times n$ observation matrix $A$ that produces it.
The task is to find $A$ such that
$A A' = n^2 C$
You could start with a random guess for $A$ and try to minimize the sum of squared errors. Using the Frobenius norm, we can write this objective as follows:
$J=\|A A'-n^2 C\|^2_F$
Let $D$ be our matrix of errors, i.e. $D=A A'-n^2 C$. Checking with the Matrix Cookbook, I get the following for the gradient:
$\frac{\partial J}{\partial a_{iy}} \propto \sum_j D_{ij}\, a_{jy}$
In other words, your update to the data matrix for sensor $i$, observation $y$ should be proportional to the $(i,y)$ entry of $DA$: the $i$th row of the error matrix dotted with the $y$th column of observations.
A gradient descent step would update all entries at once. It might be more robust to update just the worst rows, i.e. calculate the sum of errors for every row, update the entries corresponding to the worst rows, then recalculate the errors.
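A hedged NumPy sketch of the full-matrix version (I use the normalization $A A'/n \approx C$ rather than $A A' = n^2 C$; the learning rate and step count are ad hoc and will need tuning for large sizes):
import numpy as np

def fit_observations(C, n, steps=5000, lr=0.5, seed=0):
    # gradient descent on J = ||A A'/n - C||_F^2 over the observation matrix A
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((C.shape[0], n))
    for _ in range(steps):
        D = A @ A.T / n - C              # error matrix
        A -= lr * (4.0 / n) * (D @ A)    # gradient of J with respect to A
    return A

C = np.array([[1.0, 0.6],
              [0.6, 1.0]])
A = fit_observations(C, n=500)
print(np.round(A @ A.T / 500, 3))        # close to C after enough steps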
-
Yes, and the question asks how to sample from that space. – 2011-01-16
May I tell you guys my silly solution to this problem? Only if you won't laugh at me! I thought I need to explore this space, so I want to randomly sample it (the space of sensor readings). I know what the correlation matrix should look like, and I know that the sensor readings come from a Gaussian distribution. So I generated a random N by M matrix and started tweaking the values in small steps, checking whether each change moves the matrix toward the target or away, and keeping the changes that move toward the target. I choose a random cell in this matrix, increase it by 10%, calculate the correlation matrix, and compare it to the target correlation matrix. If the difference is smaller than it was before the 10% increase, I keep the change, move to the next randomly selected cell, and continue until I get close enough to the target correlation matrix. This method, although it is silly, works well, and I can get different samples of the sensor-reading space. What do you guys think?! In practice I am working on rather large matrices, like N = 8000, M = 1000.
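Roughly, the loop looks like this (a naive NumPy sketch with made-up names; recomputing the full correlation matrix on every step is what makes it slow at N = 8000, M = 1000):
import numpy as np

def hill_climb(R_target, N, M, iters=100000, factor=1.10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, M))                    # random starting readouts
    err = np.linalg.norm(np.corrcoef(X) - R_target)
    for _ in range(iters):
        i, j = rng.integers(N), rng.integers(M)
        old = X[i, j]
        X[i, j] = old * factor                         # bump one cell by 10%
        new_err = np.linalg.norm(np.corrcoef(X) - R_target)
        if new_err < err:                              # keep changes that help,
            err = new_err
        else:                                          # otherwise revert
            X[i, j] = old
    return X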
-
I didn't know that, that's good to know. By the way, I was a bit smarter than just going 10%, but not that smart! – 2011-01-16