$\begingroup$

Select $n$ numbers from the set $\{1,2,\dots,U\}$. Let $y_i$ be the $i$th number selected and $x_i$ the rank of $y_i$ among the $n$ numbers, where the rank of a number is its position after the $n$ numbers are sorted in ascending order.

This gives $n$ data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, and a best-fit line for these points can be found by linear regression. The correlation coefficient $r_{xy}$ measures the goodness of fit of that line. I want to calculate $E(r_{xy})$ or $E(r_{xy}^2)$ (the expected coefficient of determination).

  • 0
    This seems to be the rank correlation: http://en.wikipedia.org/wiki/Rank_correlation (2011-04-12)
  • 0
    It seems unlikely that there is a nice formula for $E(r_{xy})$. For instance, when $U=6$ and $n=3$ my calculations give ${3\over 10}+{9\sqrt{21}\over 140} +{2\sqrt{39}\over 65} + {\sqrt{7}\over 28}+{\sqrt{57}\over 76}$. It seems that the average correlation is quite large, usually more than $9/10$. (2011-04-13)
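
For small $U$ and $n$ the expectation can be checked by brute force. The sketch below is only an illustration and rests on one assumption not stated in the question: the $n$ numbers are distinct (drawn without replacement), so every $n$-element subset is equally likely. The order of selection does not matter, since it only permutes the pairs $(x_i, y_i)$. The names `pearson_r` and `expected_r` are made up for this example.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def expected_r(U, n, power=1):
    """Average of r_xy**power over all n-subsets of {1, ..., U}.

    Assumes sampling without replacement and n >= 2.
    Use power=2 for E(r_xy^2).
    """
    ranks = range(1, n + 1)  # x_i: ranks 1..n of the sorted sample
    total = 0.0
    count = 0
    # combinations() yields each subset already sorted in ascending order,
    # so the k-th element of the subset has rank k.
    for subset in combinations(range(1, U + 1), n):
        total += pearson_r(ranks, subset) ** power
        count += 1
    return total / count


if __name__ == "__main__":
    print(expected_r(6, 3))           # E(r_xy) for U=6, n=3
    print(expected_r(6, 3, power=2))  # E(r_xy^2) for U=6, n=3
```

Under this assumption, `expected_r(6, 3)` agrees with the closed-form value in the comment above (about $0.98$). The enumeration visits $\binom{U}{n}$ subsets, so it is practical only for small cases.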

1 Answer