
With reference to the original thread on Stack Exchange, my question is as follows.

Usually, one enters two series of values and a script or program calculates their correlation. For instance, with $x = 5,3,6,7,4,2,9,5$ and $y = 4,3,4,8,3,2,10,5$, the correlation is $0.93439982209434$.
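For reference, the forward calculation can be sketched in a few lines of Python. This is only an illustration of the standard Pearson formula, not the PHP script from the linked post; the function name `pearson` is an assumption made for this example.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences.

    Hypothetical helper for illustration; not the asker's PHP script.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    # Sums of centred products and squares; the 1/(n-1) factors cancel.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [5, 3, 6, 7, 4, 2, 9, 5]
y = [4, 3, 4, 8, 3, 2, 10, 5]
r = pearson(x, y)
print(round(r, 4))  # 0.9344, matching the value quoted above
```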

For an educational website, I'm trying to find a way to let students:

  • put in a series of values $x$, e.g. $x = 5,3,6,7,4,2,9,5$
  • put in the correlation, e.g. $0.9344$
  • put in upper and lower bounds for the $y$-series, e.g. between $1$ and $10$
  • get back a random series which fits the criteria, e.g. $y = 4,3,4,8,3,2,10,5$

The PHP script I have written to calculate the correlation can be found in the referred-to post on Stack Exchange. However, it was suggested that mine is more of a mathematical question than a programming one, hence this post. Would it be possible to perform this "reverse correlation"?

  • Although I can see how you got that from the OP, I do not restrict to integer values. (2012-05-15)

1 Answer


I have one idea about this procedure that you may find helpful. In the linked post, the answer advises you to draw samples randomly until you reach the desired correlation. However, this might take a long time if you draw the samples independently of the original sequence. Let us use another trick: construct the random sequence from the original one.

Let $X = (X_1,\dots,X_n)$ be a sequence of iid random variables. For a given $\rho \in (0,1]$ you want to construct a sequence $Y = (Y_1,\dots,Y_n)$ of iid random variables which is bounded, $Y_i\in [a,b]$, and satisfies $ \operatorname{cor}(X,Y) \approx \rho. $ The idea is to put $Y_i = X_i+\beta \xi_i$, where $\xi_i$ is some noise sequence you choose, e.g. $\xi_i = \pm1$ with equal probability. The parameter $\beta$ is needed to reach the desired correlation level: $ \operatorname{cor}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma(X)\sigma(Y)} = \rho. $

We have $\mathrm{Cov}(X,Y) = \mathrm{Cov}(X,X+\beta \xi) = \sigma^2(X)$ if we assume $\xi$ to be independent of $X$. Also, $ \sigma^2(Y) = \sigma^2(X)+\beta^2\sigma^2(\xi), $ hence $ \rho = \frac{\sigma(X)}{\sqrt{\sigma^2(X)+\beta^2\sigma^2(\xi)}}. $ Squaring gives $\rho^2\left(\sigma^2(X)+\beta^2\sigma^2(\xi)\right) = \sigma^2(X)$, and solving for $\beta$ you obtain $ \beta = \frac{\sigma(X)}{\sigma(\xi)}\sqrt{\frac{1}{\rho^2} - 1}. $ For a negative target correlation, put $Y_i = -X_i+\beta \xi_i$ and use $|\rho|$ in the formula for $\beta$.
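As a quick sanity check, the relation $\rho = \sigma(X)/\sqrt{\sigma^2(X)+\beta^2\sigma^2(\xi)}$ can be inverted numerically. Squaring and rearranging gives $\beta = \frac{\sigma(X)}{\sigma(\xi)}\sqrt{1/\rho^2 - 1}$, and the sketch below verifies the round trip. The function names `beta_for` and `implied_rho` and the sample values are assumptions made for this illustration.

```python
import math

def beta_for(rho, sigma_x, sigma_xi):
    """Noise scale beta for a target correlation 0 < rho <= 1.

    Hypothetical helper: solves rho = sigma_x / sqrt(sigma_x^2 + beta^2 sigma_xi^2).
    """
    if not 0 < rho <= 1:
        raise ValueError("rho must lie in (0, 1]")
    return sigma_x / sigma_xi * math.sqrt(1.0 / rho**2 - 1.0)

def implied_rho(beta, sigma_x, sigma_xi):
    """Population correlation of X and X + beta*xi, with xi independent of X."""
    return sigma_x / math.sqrt(sigma_x**2 + beta**2 * sigma_xi**2)

# Illustrative values only (not taken from the question).
beta = beta_for(0.9344, sigma_x=2.0, sigma_xi=1.0)
print(beta, implied_rho(beta, 2.0, 1.0))
```

Note that $\rho = 1$ gives $\beta = 0$ (no noise), and smaller $\rho$ gives larger $\beta$, as one would expect.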

The algorithm goes like this:

  1. You're given a sequence $X$ and you estimate from it $\hat{\sigma}(X)$.

  2. You choose the distribution of $\xi$ and draw a sample of it.

  3. You put $\displaystyle{\beta = \frac{\hat\sigma(X)}{\sigma(\xi)}\sqrt{\frac{1}{\rho^2} - 1}} $ and construct the process $Y = X+\beta \xi$; this reaches the desired correlation level approximately, since in a finite sample $\xi$ is not exactly uncorrelated with $X$.

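  4. Using the scaling and shift $y = cY+d$ with $c > 0$, which leaves the correlation unchanged, you reach the desired bounds $[a,b]$; for instance, $c = \frac{b-a}{\max_i Y_i - \min_i Y_i}$ and $d = a - c\min_i Y_i$.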
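The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `reverse_correlation`, the choice of $\xi_i = \pm 1$ with equal probability (so $\sigma(\xi) = 1$), and the min/max rescaling in the last step are all choices made for this example. The noise scale uses $\beta = \hat\sigma(X)\sqrt{1/\rho^2 - 1}$, which is what solving $\rho = \sigma(X)/\sqrt{\sigma^2(X)+\beta^2\sigma^2(\xi)}$ for $\beta$ gives.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation (hypothetical helper for checking the result)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def reverse_correlation(x, rho, lo, hi, rng=None):
    """Return a series in [lo, hi] whose correlation with x is roughly rho."""
    rng = rng or random.Random()
    if not 0 < abs(rho) <= 1:
        raise ValueError("rho must be non-zero and lie in [-1, 1]")
    n = len(x)
    mx = sum(x) / n
    # Step 1: estimate sigma(X) from the given sequence.
    sigma_x = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
    if sigma_x == 0:
        raise ValueError("x must not be constant")
    # Step 2: draw the noise; assumption: xi = +1 or -1, so sigma(xi) = 1.
    xi = [rng.choice((-1.0, 1.0)) for _ in x]
    # Step 3: scale the noise; flip the sign of x for a negative target.
    beta = sigma_x * math.sqrt(1.0 / rho**2 - 1.0)
    sign = 1.0 if rho > 0 else -1.0
    big_y = [sign * v + beta * e for v, e in zip(x, xi)]
    # Step 4: positive affine map onto [lo, hi]; correlation is unchanged.
    y_min, y_max = min(big_y), max(big_y)
    if y_max == y_min:
        raise ValueError("degenerate draw; try again")
    c = (hi - lo) / (y_max - y_min)
    d = lo - c * y_min
    return [c * v + d for v in big_y]

x = [5, 3, 6, 7, 4, 2, 9, 5]
y = reverse_correlation(x, 0.9344, 1, 10, rng=random.Random(0))
print([round(v, 2) for v in y], round(pearson(x, y), 4))
```

With only eight points the realised correlation will differ somewhat from the target from draw to draw, so for the website one could redraw $\xi$ until the result is within a chosen tolerance. The values returned are real numbers, not integers.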