I am trying to write a simple program that takes an arbitrary data set of [x,y] pairs from a given file, analyzes it, and prints any interesting statistical characteristics.
One of the things I am interested in is printing a statistical description of the data based on things like correlation. My problem is that the program is given no information about the probability distribution from which the sample was taken, so quantities such as $Cov(X,Y)$ seem out of reach, since the formula:
$Cov(X,Y)=\langle XY\rangle - \mu_x\mu_y$
requires that I am able to calculate the expectation of $XY$, which in turn seems to require that I know the probability density function of the source. So what can I do to obtain $Cov(X,Y)$ when I can only calculate $mean(x)$, $mean(y)$, $var(x)$ and $var(y)$?
Ultimately, I am interested in saying something about the correlation between $X$ and $Y$.
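
To make the goal concrete, here is a minimal Python sketch of the kind of computation I have in mind. It assumes the expectation can be replaced by an average over the sample, which gives the usual sample covariance (with an $n-1$ denominator) and the Pearson correlation coefficient. The function name `sample_cov_corr` and the list-of-tuples input are my own illustrative choices, not part of any library.

```python
from math import sqrt


def sample_cov_corr(pairs):
    """Estimate (covariance, correlation) from a list of (x, y) pairs.

    Assumption: the pairs are treated as a sample, and expectations
    are replaced by sample averages.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sums of products of deviations from the sample means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)

    # Sample covariance with the n - 1 denominator
    cov = sxy / (n - 1)

    # Pearson correlation; undefined if either variable is constant
    if sxx == 0 or syy == 0:
        corr = float("nan")
    else:
        corr = sxy / sqrt(sxx * syy)

    return cov, corr


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    print(sample_cov_corr(data))
```

Whether this sample-based estimate is a legitimate stand-in for $\langle XY\rangle - \mu_x\mu_y$ without knowing the underlying distribution is exactly what I am unsure about.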