4

Assume that all the entries of an $n \times n$ correlation matrix which are not on the main diagonal are equal to $q$. Find upper and lower bounds on the possible values of $q$.

I know that the matrix must be positive semidefinite, but how do I proceed from there to get the upper and lower bounds?

Thanks!

  • 1
    Bottom line: $-1/(n-1) \le q \le 1$. 2011-08-29

3 Answers

1

Since it's a correlation matrix, the diagonal entries are equal to $1$ and the off-diagonal entries are in $[-1,1]$. Now write the matrix as $aP+bQ$, where $P$ is the $n\times n$ matrix in which every entry is $1/n$ (so it is the matrix of the orthogonal projection onto the line where all components of the vector are equal) and $Q = I - P$. Then you can exploit the fact that $P$ and $Q$ are complementary orthogonal projections onto spaces of dimensions $1$ and $n-1$. From that it follows that the matrix $aP+bQ$ can be diagonalized as $ \begin{bmatrix} a \\ & b \\ & & b \\ & & & \ddots \\ & & & & b \end{bmatrix} $ with one diagonal entry equal to $a$ and $n-1$ diagonal entries equal to $b$.

This diagonal matrix must itself be a covariance matrix. To see that, recall that (1) a correlation matrix is a covariance matrix in which the diagonal entries are all $1$, and (2) if $A$ is the matrix of covariances of a random vector $X$, then $MAM^\top$ is the matrix of covariances of $MX$ ($M$ need not generally be a square matrix, but in this case it is).

Since the diagonal matrix above is a covariance matrix, $a$ and $b$ cannot be negative. So what must $q$ be in order that $a$ and $b$ be nonnegative?
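A quick numerical check of this decomposition may help. Writing the matrix as $(1-q)I + qnP$ and using $I = P + Q$ gives $a = 1+(n-1)q$ and $b = 1-q$. The sketch below assumes NumPy is available; the helper name `equicorrelation` and the sample values of `n` and `q` are illustrative choices, not part of the answer.

```python
import numpy as np

def equicorrelation(n, q):
    """Hypothetical helper: n x n matrix with 1 on the diagonal, q elsewhere."""
    return (1.0 - q) * np.eye(n) + q * np.ones((n, n))

n, q = 5, 0.3                      # sample values chosen for illustration
R = equicorrelation(n, q)

P = np.full((n, n), 1.0 / n)       # projection onto the all-equal line
Q = np.eye(n) - P                  # complementary projection

a = 1.0 + (n - 1) * q              # coefficient of P
b = 1.0 - q                        # coefficient of Q

# R should equal aP + bQ exactly (up to rounding).
decomposition_ok = np.allclose(R, a * P + b * Q)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order:
# b with multiplicity n-1, and a once.
eigenvalues = np.linalg.eigvalsh(R)
print(decomposition_ok, eigenvalues)
```

Requiring both eigenvalues to be nonnegative then gives $1+(n-1)q \ge 0$ and $1-q \ge 0$, which is the range stated in the comment on the question.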

  • 0
    @dilip: thank you for the notification! 2013-02-22
1

Consider $n$ unit-variance random variables $X_1, X_2, \ldots, X_n$ with the property that $\operatorname{cov}(X_i,X_j) = q$ for all $i \neq j$. Then the covariance matrix of these random variables is the same as the correlation matrix. Now $\begin{align*} \operatorname{var}(X_1+X_2+\cdots+X_n) &= \sum_{i=1}^n \operatorname{var}(X_i) + 2\sum_{i=1}^n\sum_{j=i+1}^n\operatorname{cov}(X_i,X_j)\tag{1}\\ &= n + n(n-1)q\\ &\geq 0 \end{align*}$ and so it must be that $q \geq -\frac{1}{n-1}$, as Michael Hardy noted in a succinct comment on the question. The upper bound is, of course, $q \leq 1$.

Both bounds are achievable. Obviously, if all the $X_i$ are the same random variable $X$, then $q = 1$. For the lower bound, suppose that the $X_i$ are independent unit-variance random variables, so that they enjoy the desired constant correlation with $q=0$. For each $i$, set $Y_i = X_i-\bar{X}$ where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i.$ Then $\operatorname{var}(Y_i) = \left(\frac{n-1}{n}\right)^2 + (n-1)\left(\frac{1}{n}\right)^2 = \frac{n-1}{n}$, while for $i \neq j$, $\begin{align} \operatorname{cov}(Y_i,Y_j) &= \operatorname{cov}(X_i - \bar{X}, X_j- \bar{X})\\ &= \operatorname{cov}(X_i,X_j) - \operatorname{cov}(X_i,\bar{X}) - \operatorname{cov}(X_j,\bar{X})+ \operatorname{var}(\bar{X})\\ &= 0 - \frac{1}{n} - \frac{1}{n} + \frac{1}{n}\\ &= -\frac{1}{n} \end{align}$ showing that all the correlation coefficients of the $Y_i$ do indeed have the minimum value $ \frac{-1/n}{\sqrt{(n-1)/n}\sqrt{(n-1)/n}} = -\frac{1}{n-1}.$
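The centering construction can be checked with exact linear algebra rather than simulation. This is a minimal sketch assuming NumPy; it uses the fact that $Y = (I - P)X$, where $P$ is the matrix with every entry $1/n$, so the covariance matrix of $Y$ is $(I-P)\,I\,(I-P)^\top$. The value `n = 6` and the variable names are illustrative.

```python
import numpy as np

n = 6                                   # illustrative size
P = np.full((n, n), 1.0 / n)            # averaging matrix: (P x)_i = mean(x)
M = np.eye(n) - P                       # centering matrix, Y = M X

# X has identity covariance (independent, unit variance),
# so cov(Y) = M I M^T.
cov_Y = M @ np.eye(n) @ M.T

# Convert the covariance matrix to a correlation matrix.
std = np.sqrt(np.diag(cov_Y))
corr_Y = cov_Y / np.outer(std, std)

# Collect the off-diagonal correlations.
off_diag = corr_Y[~np.eye(n, dtype=bool)]
print(np.diag(cov_Y))   # each variance should be (n-1)/n
print(off_diag)         # each correlation should be -1/(n-1)
```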


Returning to $(1)$, note that if the correlation coefficients are not required to all have the same value, then applying $(1)$ to the standardized variables shows that the sum of the $n(n-1)$ correlations (over ordered pairs $i \neq j$) must be at least $-n$. Thus, the average of the $n(n-1)$ correlations is at least $-1/(n-1)$, and since at least one correlation must be at least as large as the average, we can assert that

In any collection of $n$ random variables $X_1, X_2, \ldots, X_n$ with finite, nonzero variance, there must be at least one pair of random variables $(X_i,X_j)$ (with $i\neq j$) whose correlation coefficient is at least $-\frac{1}{n-1}$.
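As an informal sanity check of this assertion (not a proof), one can generate arbitrary correlation matrices and confirm that the largest off-diagonal entry never falls below $-1/(n-1)$. The sketch below assumes NumPy; the helpers `random_correlation` and `max_offdiag`, the seed, and the range of sizes are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def random_correlation(n, rng):
    """Hypothetical helper: normalize A A^T to unit diagonal."""
    A = rng.standard_normal((n, n))
    cov = A @ A.T                       # positive semidefinite by construction
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

def max_offdiag(R):
    """Largest correlation among pairs i != j."""
    n = R.shape[0]
    return R[~np.eye(n, dtype=bool)].max()

results = []
for n in range(2, 9):
    R = random_correlation(n, rng)
    results.append((n, max_offdiag(R), -1.0 / (n - 1)))

for n, largest, bound in results:
    print(n, largest, bound, largest >= bound)
```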

0

A general construction follows by generalizing the following example. Let $R$ be a correlation matrix of size $n \times n$, with $n=5$ in the example, and write $R=L \cdot L^T$. Define $L$ in terms of an unknown value $a$ (dots denote zeros): $ L=\begin{bmatrix} a&a&a&a&.&.&.&.&.&. \\ -a&.&.&.&a&a&a&.&.&. \\ .&-a&.&.&-a&.&.&a&a&. \\ .&.&-a&.&.&-a&.&-a&.&a \\ .&.&.&-a&.&.&-a&.&-a&-a \\ \end{bmatrix} $ Then all off-diagonal entries in $R=L \cdot L^T$ are $r_{k,j}=-a^2$ and the diagonal entries are $r_{k,k}=4 a^2$. To have $r_{k,k}=1$ we must have $a=\sqrt{1 \over 4} $ and thus $q = r_{k,j}=-{1 \over 4}$.
The same construction works for any $n$: use one column for each of the $n(n-1)/2$ pairs of rows, with $a$ in one row of the pair and $-a$ in the other, and set $a = 1/\sqrt{n-1}$. This gives $q=-{1 \over n-1}$.
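The generalization can be sketched in code. This assumes NumPy and the standard library's `itertools.combinations`; the function name `incidence_factor` is hypothetical. It builds one column per pair of rows, in the same column order as the $5 \times 10$ example, and then inspects $L L^T$.

```python
import numpy as np
from itertools import combinations

def incidence_factor(n):
    """Hypothetical helper: n x n(n-1)/2 matrix with one column per pair (i, j),
    holding +a in row i and -a in row j, where a = 1/sqrt(n-1)."""
    pairs = list(combinations(range(n), 2))
    a = 1.0 / np.sqrt(n - 1)
    L = np.zeros((n, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        L[i, col] = a
        L[j, col] = -a
    return L

n = 5
L = incidence_factor(n)
R = L @ L.T

# Each row has n-1 nonzero entries, so the diagonal is (n-1) a^2 = 1.
# Two distinct rows share exactly one column, contributing -a^2 = -1/(n-1).
off_diag = R[~np.eye(n, dtype=bool)]
print(L.shape, np.diag(R), off_diag)
```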

Unfortunately, this is only an illustrative example so far. It would be nice to show that this is indeed also the highest possible value for $-q$, but at the moment I do not see how this could be done in a similarly obvious manner ...