3

I am trying to write a simple program that takes an arbitrary data set of [x,y] pairs from a given file, analyzes it, and prints any interesting statistical characteristics.

One of the things I am interested in is printing some statistical description of the data, based on things like statistical correlation. But now my problem is that there is no information given to the program about the probability distribution from which the sample was taken, and thus things such as $Cov(X,Y)$ seem to evade me, since the formula:

$Cov(X,Y)=\langle XY\rangle - \mu_x\mu_y$

requires that I am able to calculate the expectation of $XY$, which in turn requires that I know the probability density function of the source. So what can I do to obtain $Cov(X,Y)$ when I can only calculate $mean(x)$, $mean(y)$, $var(x)$ and $var(y)$?

Eventually, I am interested in saying something about the correlation between $X$ and $Y$.
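For concreteness, here is a minimal sketch of the part that already works; the file name `data.txt`, the helper `load_pairs`, and the whitespace-separated one-pair-per-line format are my own assumptions, not part of the actual program.

```python
import statistics

def load_pairs(path):
    """Read whitespace-separated x y pairs, one per line (assumed file format)."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                xs.append(float(parts[0]))
                ys.append(float(parts[1]))
    return xs, ys

xs, ys = load_pairs("data.txt")           # hypothetical input file
print(statistics.mean(xs), statistics.variance(xs))   # mean(x), var(x)
print(statistics.mean(ys), statistics.variance(ys))   # mean(y), var(y)
```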

4 Answers

2

Assuming zero means and applying Cauchy–Schwarz: $ |Cov(X,Y)|=|E(XY)| \le \sqrt{E(X^2)\, E(Y^2)} = \sqrt{Var(X)\, Var(Y)}$. The same result can be obtained for non-zero means, and this bound is all you can get from marginal (mean and variance) information.
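In more detail, the non-zero-mean case follows by applying the same inequality to the centered variables (writing $\mu_X = E(X)$ and $\mu_Y = E(Y)$):

$$|Cov(X,Y)| = \bigl|E\bigl[(X-\mu_X)(Y-\mu_Y)\bigr]\bigr| \le \sqrt{E\bigl[(X-\mu_X)^2\bigr]\,E\bigl[(Y-\mu_Y)^2\bigr]} = \sqrt{Var(X)\,Var(Y)}.$$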

Then, the extremes (in absolute value) of the covariance are realized for $X,Y$ independent, $Cov(X,Y)=0$, and for $X=Y$, $Cov(X,Y) = Var(X)=Var(Y)$.

  • 0
    Improved statement: The Cauchy–Schwarz inequality gives $-\sigma_X\sigma_Y \leq \operatorname{cov}(X,Y) \leq \sigma_X\sigma_Y$, and if $Y = aX + b$ where $a$ and $b$ are real numbers, then the upper bound (respectively lower bound) is satisfied with equality if $a > 0$ (respectively $a < 0$). (2011-09-21)
7

So what can I do to obtain [the covariance of $X$ and $Y$] when I can only calculate [their means and variances]? Nothing, I am afraid.

For an example, consider a standard normal variable $X$. If $Y=X$, then both means are zero, both variances are $1$, and the covariance of $X$ and $Y$ is $+1$. If $Y=-X$, then both means are zero, both variances are $1$, and the covariance of $X$ and $Y$ is $-1$. This shows you must know something other than the means and variances to get the covariance.
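A quick numerical check of this counterexample, sketched with NumPy (the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # draws from a standard normal X

for y in (x, -x):                  # the two cases Y = X and Y = -X
    # same sample mean and variance in both cases, but opposite sample covariance
    print(np.mean(y), np.var(y, ddof=1), np.cov(x, y)[0, 1])
```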

  • 1
    Well, if what you have is a set of $n$ points $(x_k,y_k)$, you can use the empirical mean of $XY$, that is, replace $E(XY)$ by $\frac1n\sum\limits_{k=1}^n x_k y_k$. (2011-09-21)
4

If you can calculate $\operatorname{mean}(x)$, which I assume is the sample mean $ \operatorname{mean}(x) = \frac{1}{n}\sum_{i=1}^n x_i $ of your data set (as opposed to the expectation $\mu_x$, which requires knowledge of the probability distribution), and similarly the sample variance $ \operatorname{var}(x) = \frac{1}{n-1}\sum_{i=1}^n (x_i - \operatorname{mean}(x))^2 $, then you should be able to calculate a sample covariance for your samples as well, using something like $ \operatorname{cov}(x,y) = \frac{1}{n-1}\sum_{i=1}^n (x_i - \operatorname{mean}(x))(y_i - \operatorname{mean}(y)). $ Sample means, sample variances, and sample covariances are (unbiased) estimators of the means, variances, and covariance of the underlying probability distribution that "generated" the sample pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$, in your data set.
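A direct translation of these formulas into Python might look like the following sketch (the function names are my own, and `x` and `y` are assumed to be equal-length columns of the data set):

```python
def sample_mean(v):
    return sum(v) / len(v)

def sample_var(v):
    # unbiased sample variance: divide by n - 1
    m = sample_mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def sample_cov(x, y):
    # unbiased sample covariance: divide by n - 1
    mx, my = sample_mean(x), sample_mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
```

From these, the sample correlation the question is ultimately after is $\operatorname{cov}(x,y) / \sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}$.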

  • 0
    yep! this is helpful for my situation :D thanks. (2011-09-21)
3

I don't think there is a way if you have just the means and the variances. But if you have the individual observations, then you can estimate the covariance by the sample covariance $\frac{1}{N-1}\sum_{i=1}^N (x_i-\bar x)(y_i-\bar y)$, where $N$ is the number of observations, $(x_i,y_i)$ are the observations, and $\bar x$ and $\bar y$ are the sample means of $X$ and $Y$ respectively. You will find this covered in any elementary statistics book.
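If NumPy is available, the same estimate, along with the sample (Pearson) correlation, can also be read off directly from `np.cov` and `np.corrcoef`; a minimal sketch, assuming `x` and `y` hold the two columns of the data set (the small example values are placeholders):

```python
import numpy as np

# x and y are assumed to be equal-length sequences of paired observations
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.1, 4.2]

cov_xy = np.cov(x, y)[0, 1]        # sample covariance (N - 1 in the denominator)
corr_xy = np.corrcoef(x, y)[0, 1]  # sample (Pearson) correlation coefficient
print(cov_xy, corr_xy)
```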