4

Assume that all the entries of an $n \times n$ correlation matrix which are not on the main diagonal are equal to $q$. Find upper and lower bounds on the possible values of $q$.

I know that the matrix must be positive semidefinite, but how do I proceed from there to get the upper and lower bounds?

Thanks!

  • 2
    Do you know anything else about correlation matrices, other than that they are positive semidefinite? Anything special about their form, or how they are calculated? 2011-08-26
  • 1
    Bottom line: $-1/(n-1) \le q \le 1$. 2011-08-29

3 Answers

1

Since it's a correlation matrix, the diagonal entries are equal to $1$ and the off-diagonal entries are in $[-1,1]$. Now write the matrix as $aP+bQ$, where $P$ is the $n\times n$ matrix in which every entry is $1/n$ (so it is the matrix of the orthogonal projection onto the line where all components of the vector are equal) and $Q = I - P$. Then you can exploit the fact that $P$ and $Q$ are complementary orthogonal projections onto spaces of dimensions $1$ and $n-1$. From that it follows that the matrix $aP+bQ$ can be diagonalized as

$$ \begin{bmatrix} a \\ & b \\ & & b \\ & & & b \\ & & & & \ddots \end{bmatrix} $$

This should be a covariance matrix. To see that, recall that (1) a correlation matrix is a covariance matrix in which the diagonal entries are all $1$, and (2) if $A$ is the matrix of covariances of a random vector $X$, then $MAM^\top$ is the matrix of covariances of $MX$ ($M$ need not generally be a square matrix, but in this case it is).

Since the diagonal matrix above is a covariance matrix, $a$ and $b$ cannot be negative. So what must $q$ be in order that $a$ and $b$ be nonnegative?
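
For readers who want to check the decomposition numerically, here is a minimal sketch in exact rational arithmetic. It assumes the values $a = 1+(n-1)q$ and $b = 1-q$, which are not stated above but follow from comparing entries of $aP+bQ = bI + (a-b)P$ with the correlation matrix (diagonal: $b + (a-b)/n = 1$; off-diagonal: $(a-b)/n = q$). The helper names `equicorrelation`, `matvec` and `eigen_check` are made up for this illustration.

```python
from fractions import Fraction


def equicorrelation(n, q):
    """n x n matrix with 1 on the diagonal and q everywhere else."""
    q = Fraction(q)
    return [[Fraction(1) if i == j else q for j in range(n)] for i in range(n)]


def matvec(R, v):
    """Plain matrix-vector product for lists of Fractions."""
    return [sum(r * x for r, x in zip(row, v)) for row in R]


def eigen_check(n, q):
    """Verify that a = 1 + (n-1)q and b = 1 - q are eigenvalues; return (a, b)."""
    q = Fraction(q)
    R = equicorrelation(n, q)
    a = 1 + (n - 1) * q  # assumed eigenvalue on the all-ones direction (range of P)
    b = 1 - q            # assumed eigenvalue on the complement (range of Q)

    ones = [Fraction(1)] * n
    assert matvec(R, ones) == [a * x for x in ones]

    # e_0 - e_k, k = 1..n-1, span the (n-1)-dimensional space orthogonal to ones.
    for k in range(1, n):
        v = [Fraction(0)] * n
        v[0], v[k] = Fraction(1), Fraction(-1)
        assert matvec(R, v) == [b * x for x in v]
    return a, b


if __name__ == "__main__":
    n = 5
    for q in (Fraction(-1, 2), Fraction(-1, 4), Fraction(0), Fraction(1)):
        a, b = eigen_check(n, q)
        print(f"q = {q}: a = {a}, b = {b}, both nonnegative: {a >= 0 and b >= 0}")
```

Requiring both `a` and `b` to be nonnegative then reproduces the range $-1/(n-1) \le q \le 1$ quoted in the comments on the question.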

  • 0
    BTW, a simple instance in which one of the two opposite extreme cases is realized is where $(X_1,\dots,X_n) = (0,0,\ldots,0,1,0,\ldots,0,0)$ with a $1$ in the $i$th place, with probability $1/n$ for each value of $i$. Clearly $\operatorname{corr}(X_i,X_j)$ is $1$ if $i=j$ and is negative if $i\neq j$, and close to $0$ if $n$ is large. 2011-08-26
  • 0
    Is the case included in which all off-diagonal entries are $\small -1$? If I approach that value from above, say going to $\small -1+1e-80$, the entries of the Cholesky decomposition begin to grow rapidly. So I suggest checking carefully whether all values can be $\small -1$. 2011-08-26
  • 0
    They cannot be $-1$ except in the simplest non-vacuous special case, that $n=2$. The lower bound is negative, but nowhere near $-1$. It depends on $n$. 2011-08-26
  • 0
    @GottfriedHelms I just posted an answer showing that at least one off-diagonal correlation must have value $-1/(n-1)$ or more, and so all off-diagonal values being $-1$ is not a possibility except for the trivial case $n=2$. 2013-02-22
  • 0
    @dilip: thank you for the notification! 2013-02-22
1

Consider $n$ unit-variance random variables $X_1, X_2, \ldots, X_n$ with the property that $\operatorname{cov}(X_i,X_j) = q$ for all $i \neq j$. Then the covariance matrix of these random variables is the same as the correlation matrix. Now

$$\begin{align*} \operatorname{var}(X_1+X_2+\cdots+X_n) &= \sum_{i=1}^n \operatorname{var}(X_i) + 2\sum_{i=1}^n\sum_{j=i+1}^n\operatorname{cov}(X_i,X_j)\tag{1}\\ &= n + n(n-1)q\\ &\geq 0 \end{align*}$$

and so it must be that $$q \geq -\frac{1}{n-1}$$ as Michael Hardy noted in a succinct comment on the question. The upper bound is, of course, $q \leq 1$.

Both bounds are achievable. Obviously, if all the $X_i$ are the same random variable $X$, then $q = 1$. For the lower bound, suppose that the $X_i$ are independent unit-variance random variables, so that they enjoy the desired constant correlation with $q=0$. For each $i$, set $Y_i = X_i-\bar{X}$, where $$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i.$$ Then $$\operatorname{var}(Y_i) = \left(\frac{n-1}{n}\right)^2 + (n-1)\left(\frac{1}{n}\right)^2 = \frac{n-1}{n}$$ while for $i \neq j$,

$$\begin{align} \operatorname{cov}(Y_i,Y_j) &= \operatorname{cov}(X_i - \bar{X}, X_j- \bar{X})\\ &= \operatorname{cov}(X_i,X_j) - \operatorname{cov}(X_i,\bar{X}) - \operatorname{cov}(X_j,\bar{X})+ \operatorname{var}(\bar{X})\\ &= 0 - \frac{1}{n} - \frac{1}{n} + \frac{1}{n}\\ &= -\frac{1}{n} \end{align}$$

showing that all the correlation coefficients do indeed have the minimum value $$ \frac{-1/n}{\sqrt{(n-1)/n}\sqrt{(n-1)/n}} = -\frac{1}{n-1}.$$
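
The centering construction for the lower bound can be checked exactly with a short sketch. It assumes only what the argument above assumes: the $X_i$ are uncorrelated with unit variance, so their covariance matrix is the identity. It then uses the rule that $MX$ has covariance $M\,\operatorname{cov}(X)\,M^\top$, with $M = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. The function name `centered_correlation` is hypothetical.

```python
from fractions import Fraction


def centered_correlation(n):
    """Exact correlation matrix of Y_i = X_i - mean(X) for n >= 2.

    Assumes the X_i are uncorrelated with unit variance, i.e. cov(X) = I.
    """
    # Y = M X with M = I - (1/n) * (all-ones matrix).
    M = [[(Fraction(1) if i == j else Fraction(0)) - Fraction(1, n)
          for j in range(n)] for i in range(n)]

    # cov(Y) = M I M^T = M M^T.
    cov = [[sum(M[i][k] * M[j][k] for k in range(n)) for j in range(n)]
           for i in range(n)]

    # Every Y_i has the same variance, so sqrt(var_i * var_j) is just that variance.
    var = cov[0][0]
    assert all(cov[i][i] == var for i in range(n))
    return [[cov[i][j] / var for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        corr = centered_correlation(n)
        print(n, corr[0][1], Fraction(-1, n - 1))
```

Using `Fraction` avoids floating-point noise, so the off-diagonal entries can be compared with $-1/(n-1)$ by exact equality rather than a tolerance.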


Returning to $(1)$, note that if the correlation coefficients are not required to all have the same value, then $(1)$, applied to the standardized (unit-variance) variables and with the sum taken over all ordered pairs $i \neq j$, shows that the sum of the $n(n-1)$ correlations must be at least $-n$. Thus, the average of the $n(n-1)$ correlations is at least $-1/(n-1)$, and since at least one correlation must be as large as the average, we can assert the following:

In any collection of $n$ random variables $X_1, X_2, \ldots, X_n$ with finite, nonzero variance, there must be at least one pair of random variables $(X_i,X_j)$ (with $i\neq j$) for which $$\operatorname{corr}(X_i,X_j) \geq -\frac{1}{n-1}.$$

0

A general scheme for the lower bound follows by generalizing the following example. Take a correlation matrix $R$ of size $n \times n$, with $n=5$ in the example, and write $R=L \cdot L^T$. Define $L$ in terms of an unknown value $a$ (dots denote zeros): $$ L=\begin{bmatrix} a&a&a&a&.&.&.&.&.&. \\ -a&.&.&.&a&a&a&.&.&. \\ .&-a&.&.&-a&.&.&a&a&. \\ .&.&-a&.&.&-a&.&-a&.&a \\ .&.&.&-a&.&.&-a&.&-a&-a \\ \end{bmatrix} $$ Then all off-diagonal entries of $R=L \cdot L^T$ are $r_{k,j}=-a^2$ and the diagonal entries are $r_{k,k}=4 a^2$. To have $r_{k,k}=1$ we must have $a=\sqrt{1 \over 4}={1 \over 2}$, and thus $q = r_{k,j}=-{1 \over 4}$.
The same construction works for any $n$: use one column for each of the $n(n-1)/2$ pairs of rows, with $a$ in one row of the pair and $-a$ in the other, which gives $q=-{1 \over n-1}$.
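
As a sanity check on the generalization, here is a minimal sketch that builds the sign pattern of $L$ for arbitrary $n$ and forms $L L^T$ exactly. It assumes the pattern read off from the $n=5$ example (one column per pair $i<j$, with $+a$ in row $i$ and $-a$ in row $j$) and takes $a^2 = 1/(n-1)$ so that the diagonal equals $1$. The function name `pair_factor_gram` is invented for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def pair_factor_gram(n):
    """Return R = L L^T for the pairwise +a / -a factor, with a^2 = 1/(n-1)."""
    pairs = list(combinations(range(n), 2))  # one column per pair i < j

    # Sign pattern of L: +1 in row i and -1 in row j of the column for (i, j).
    S = [[0] * len(pairs) for _ in range(n)]
    for col, (i, j) in enumerate(pairs):
        S[i][col] = 1
        S[j][col] = -1

    # Each row has n-1 nonzero entries, so a^2 = 1/(n-1) makes the diagonal 1.
    a_squared = Fraction(1, n - 1)
    return [[a_squared * sum(S[r][c] * S[s][c] for c in range(len(pairs)))
             for s in range(n)] for r in range(n)]


if __name__ == "__main__":
    for n in (2, 3, 5, 8):
        R = pair_factor_gram(n)
        print(n, R[0][0], R[0][1])
```

Any two distinct rows share exactly one column, where their signs are opposite, which is why every off-diagonal entry comes out as $-a^2$.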

Unfortunately, this is only an illustrative example so far. It would be nice to show that this indeed also gives the largest possible value of $-q$, but at the moment I do not see how this could be done in a similarly obvious manner ...