
I have trouble interpreting the following formula, which is said to be Pearson's correlation coefficient:

$r = \frac{N \left(\sum XY\right) - \left(\sum X\right) \left(\sum Y\right)}{\sqrt{\left[N \left(\sum X^2\right) - \left(\sum X\right)^2\right] \left[N \left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}}$
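As a minimal sketch, here is a direct Python translation of this raw-sum formula. It assumes that every $\sum$ runs over all entries of two equal-length sequences, so that $N$ is simply their length. The function name `pearson_r` is a hypothetical name chosen for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson's r via the raw-sum formula.

    Assumption: each sum runs over every element of x and y,
    so N is the number of elements.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sum_x = sum(x)                                   # sum of X_i
    sum_y = sum(y)                                   # sum of Y_i
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # sum of X_i * Y_i
    sum_x2 = sum(xi * xi for xi in x)                # sum of X_i^2
    sum_y2 = sum(yi * yi for yi in y)                # sum of Y_i^2
    numerator = n * sum_xy - sum_x * sum_y
    # Raises ZeroDivisionError if either input is constant (zero spread).
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator
```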

It is from *Mining a Web Citation Database for author co-citation analysis* (p. 7). I have trouble interpreting it, since the authors say $X$ and $Y$ are vectors of length $N + 1$, and the product of two column vectors is not defined, at least not normally, is it?

I have found a similar notation of this formula in this Wikipedia article. There, the formula does not take vectors as arguments, but a series of $n$ measurements $x_i$ and $y_i$, where $i = 1,2,\dots,n$.

I have trouble reconciling the two formulas and understanding what my calculations should look like when applying them. Maybe an example would help:

Let's take these two vectors:

$X = (0,0.5,0,0)$

$Y = (0.5,0,0,0)$

with $N = 3$, which would come from this matrix:

$\begin{pmatrix} 0 & 0.5 & 0 & 0\\ 0.5 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ \end{pmatrix}$


I am not convinced this expression is correct if these are vectors of length $N+1$ (the implicit means, $\frac{1}{N}\sum X$ and $\frac{1}{N}\sum Y$, would then be wrong), so for the rest of this I will assume they are of length $N$.
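To check this concretely, the Python sketch below compares the raw-sum formula against the textbook mean-centred definition of Pearson's $r$, using the example vectors from the question. The helper names `raw_sum_r` and `centered_r` are hypothetical; the count $N$ is passed in explicitly so that "number of entries" and "number of entries minus one" can be compared:

```python
import math

def raw_sum_r(x, y, n):
    """Raw-sum formula with the count N supplied explicitly."""
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

def centered_r(x, y):
    """Textbook definition: sum of products of deviations from the means,
    divided by the square root of the product of summed squared deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

x = [0, 0.5, 0, 0]
y = [0.5, 0, 0, 0]
print(centered_r(x, y))             # reference value
print(raw_sum_r(x, y, len(x)))      # N = number of entries: matches the reference
print(raw_sum_r(x, y, len(x) - 1))  # N = entries - 1: does not match
```

With $N$ equal to the number of entries, the two definitions agree. With $N$ one less than the length, the implied means are off and the result changes.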

If $\mathbf{X}$ is $(X_1, X_2, \ldots , X_{N})$ then the interpretation of $\sum X$ is clearly $\sum_{i=1}^{N} X_i$, of $\sum X^2$ is $\sum_{i=1}^{N} X_i^2$, and $\sum XY$ is $\sum_{i=1}^{N} X_i Y_i$. You can regard the last of these either as a dot product or a sum over a pointwise product (for matrices this pointwise product is sometimes called a Hadamard product).
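Applying this reading to the example vectors from the question, and taking $N = 4$ (the number of entries) instead of $N = 3$, the arithmetic would work out as follows:

$$\sum X = 0.5,\qquad \sum Y = 0.5,\qquad \sum XY = 0\cdot 0.5 + 0.5\cdot 0 + 0\cdot 0 + 0\cdot 0 = 0,\qquad \sum X^2 = \sum Y^2 = 0.25$$

$$r = \frac{4(0) - (0.5)(0.5)}{\sqrt{\left[4(0.25) - 0.5^2\right]\left[4(0.25) - 0.5^2\right]}} = \frac{-0.25}{\sqrt{0.75 \cdot 0.75}} = \frac{-0.25}{0.75} = -\frac{1}{3}$$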