
Say I have a numeric sequence A and a set of sequences B that vary with time.

I suspect that there is a relationship between one or more of the B sequences and sequence A: specifically, that changes in each B sequence are largely or wholly caused by changes in A. However, there is an unknown time delay between changes in A and their effect on each B sequence (they are each out of phase by varying amounts).

I am looking for a means of finding the most closely correlating B to A regardless of the time delay. What options are available to me?

** EDIT **

The crux of the problem here is that I have millions of B sequences to test, and there are approximately 2 million data points within the lag window that I would like to test over. Working out a correlation for each B for each possible lag scenario is just going to be too computationally expensive (especially as, in reality, there will be a more dynamic relationship than just lag between A and B, so I will be looking to test variations of relationships as well).

So what I am looking for is a means of taking the lag out of calculation.

  • Have you asked on [CrossValidated](http://stats.stackexchange.com/)? (2011-01-01)

3 Answers


It will depend a bit on what kind of sequences you have, but let's assume you are talking about discrete sequences, say $A=(A(t))_{t=-\infty}^{\infty}=(\ldots,A(-2),A(-1),A(0),A(1),A(2),\ldots)$ and similarly for a sequence $B$. If your sequence doesn't run indefinitely, just set past and future values to zero beyond a certain time.

Then you can look at the following quantity:

$ C(A,B;t_0,T,\tau)=\sum_{t=t_0}^{t_0+T} [A(t) - \bar{A}(t_0,T)][B(t-\tau)-\bar{B}(t_0-\tau,T)] $

where

$ \bar{A}(t_0,T) = \frac{1}{T+1} \sum_{t=t_0}^{t_0+T} A(t) $

is a moving average over a time window of $T+1$ samples.

The quantity $C(A,B;t_0,T,\tau)$ then measures correlation between signals $A$ and $B$. You can normalize it by dividing with the square roots of the autocorrelations of $A$ and $B$, that is $C(A,A;t_0,T,0)$ and $C(B,B;t_0-\tau,T,0)$.
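For concreteness, here is a small Python/NumPy sketch of this quantity and its normalized form (the function names `windowed_corr` and `normalized_corr` are my own):

```python
import numpy as np

def windowed_corr(A, B, t0, T, tau):
    """C(A, B; t0, T, tau): lagged, mean-removed correlation over the
    window t = t0 .. t0 + T (T + 1 samples), as defined above.
    Assumes t0 - tau >= 0 so the lagged window stays inside B."""
    a = A[t0 : t0 + T + 1]
    b = B[t0 - tau : t0 - tau + T + 1]
    return float(np.sum((a - a.mean()) * (b - b.mean())))

def normalized_corr(A, B, t0, T, tau):
    """Normalize by the square roots of the two autocorrelations,
    C(A, A; t0, T, 0) and C(B, B; t0 - tau, T, 0)."""
    c = windowed_corr(A, B, t0, T, tau)
    caa = windowed_corr(A, A, t0, T, 0)
    cbb = windowed_corr(B, B, t0 - tau, T, 0)
    return c / np.sqrt(caa * cbb)
```

The normalized value lies in $[-1,1]$, with $1$ meaning a perfect linear match at lag $\tau$ over that window.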

Let me take a periodic sequence as an example:

$A= (\ldots,0,1,2,0,1,2,0,1,2,0,1,2,\ldots)$

So, the pattern $(0,1,2)$ repeats itself indefinitely.

The time average over a window of 6 time units is $\bar{A}=(0+1+2+0+1+2)/6=1$, and this holds regardless of where I choose to start the sum. (I chose 6 on purpose; this independence need not hold in general.)

The autocorrelation of $A$ (taking $t_0=0$ and $T=5$, so the window again covers $T+1=6$ samples) is:

$ C(A,A;0,5,\tau)=\sum_{t=0}^{5} [A(t) - 1][A(t-\tau)-1] $

If $\tau=0$, this sum is $C(A,A;0,5,0)= [0-1]^2 + [1-1]^2 + [2-1]^2 + [0-1]^2 + [1-1]^2 + [2-1]^2 = 4 \; .$ In fact, for any $\tau$ that is a multiple of 3 you get the same result, since the sequence is periodic.

If $\tau=1$, the sum is $C(A,A;0,5,1)= 2\left([0-1][2-1] + [1-1][0-1] + [2-1][1-1]\right) = -2 \; ,$ which is smaller in absolute value than the previous result, meaning the correlation is weaker; moreover, since the sign is negative, it can be interpreted as being closer to anti-correlation.

And that's how you'll be able to read patterns from these correlation functions.
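The worked example can be checked numerically; this small Python sketch extends the pattern a few samples into negative time so that the lagged values it needs exist:

```python
import numpy as np

# Periodic extension of the pattern (0, 1, 2), shifted so that a few
# negative times are available when tau > 0: A[3 + t] == A(t).
pattern = [0, 1, 2]
A = np.array([pattern[t % 3] for t in range(-3, 9)])

def C(tau, t0=0, T=5):
    """Autocorrelation C(A, A; t0, T, tau) from the answer above, with
    the sample mean 1 subtracted (window of T + 1 = 6 samples)."""
    return sum((A[3 + t] - 1) * (A[3 + t - tau] - 1)
               for t in range(t0, t0 + T + 1))
```

Evaluating `C(0)` gives 4, `C(1)` gives -2, and `C(3)` gives 4 again, matching the periodicity argument.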

  • Thanks for the example. Unfortunately, the problem for me is that I have millions of B sequences, and a granularity of data that gives me over a million values of $\tau$ to test for each. I will add clarification to the question. (2010-12-02)

Take a look at dynamic time warping; I think it's just the solution you need. I've used the R package 'dtw', which is documented here: http://cran.r-project.org/web/packages/dtw/dtw.pdf
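For reference, the core of DTW is a short dynamic program; this from-scratch Python sketch computes just the distance (the R/Python 'dtw' packages add windowing constraints, step patterns, and alignment backtracking on top of this):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-time-warping distance between two 1-D sequences,
    using the standard O(len(x) * len(y)) recurrence with absolute-
    difference local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Each step may repeat a sample of x, of y, or advance both.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path can stretch either sequence, locally time-shifted copies of the same shape get distance 0, which is exactly the lag-insensitivity asked about; the quadratic cost per pair, though, is why it may struggle at the question's scale.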


You can find the lag more efficiently by using the Fourier transform. Cross-correlation of A and B in the time domain is equivalent to convolution of A with time-reversed B, which can be computed efficiently in the frequency domain using the convolution theorem. The cross-correlation is given by $F^{-1} \left\{ F\left\{A\right\} \cdot \overline{F\left\{B\right\}} \right\}$, where $F$ denotes the Fourier transform, the bar denotes complex conjugation (which accounts for the time reversal when the signals are real), and $\cdot$ is pointwise multiplication.
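A sketch of this in Python/NumPy (`estimate_delay` is a name of my own choosing; note the FFT correlation is circular, so zero-pad the inputs if wrap-around matters):

```python
import numpy as np

def estimate_delay(a, b, max_lag):
    """Estimate the delay d such that b(t) ~ a(t - d), by locating the
    peak of the circular cross-correlation, computed in O(n log n)
    via the FFT instead of O(n * max_lag) directly."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    # xcorr[k] = sum_m b[m + k] * a[m]  (indices modulo n)
    xcorr = np.fft.ifft(np.fft.fft(b) * np.conj(np.fft.fft(a))).real
    lags = np.arange(-max_lag, max_lag + 1)
    # Negative lags index from the end of xcorr, which is exactly
    # the circular wrap-around we want.
    return int(lags[np.argmax(xcorr[lags])])
```

One FFT of A can be reused against millions of B sequences, which is what makes this attractive at the scale described in the question.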

But, as you say, you're not really interested in the lag per se. Instead, you want to know how much of $B$ is (linearly) caused by $A$. The answer to that problem is Wiener filtering.
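A minimal sketch of the frequency-domain estimate, assuming SciPy is available (`wiener_transfer` is a name of my own choosing; $H(f)=S_{AB}(f)/S_{AA}(f)$ is the least-squares, i.e. Wiener, solution for a linear filter taking $A$ to $B$):

```python
import numpy as np
from scipy import signal

def wiener_transfer(a, b, fs=1.0, nperseg=256):
    """Least-squares estimate of the linear filter H(f) mapping a -> b:
    H(f) = S_ab(f) / S_aa(f), built from Welch cross- and auto-spectra.
    A pure delay shows up as a linear phase ramp in H(f)."""
    f, S_ab = signal.csd(a, b, fs=fs, nperseg=nperseg)
    _, S_aa = signal.welch(a, fs=fs, nperseg=nperseg)
    return f, S_ab / S_aa
```

This subsumes the lag search: any fixed delay, gain, or more general linear relationship between $A$ and a given $B$ is captured by one estimated $H(f)$.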

You may also be interested in computing the transfer function and coherence in the frequency domain. A good book on such methods is Random Data: Analysis and Measurement Procedures by Bendat and Piersol, though this particular book might be overkill for your needs.
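As an illustration of coherence (a sketch with synthetic data; SciPy's `signal.coherence` implements the Welch-based estimator):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
a = rng.standard_normal(4096)
# Hypothetical B: a scaled copy of A plus independent noise.
b = 0.8 * a + 0.5 * rng.standard_normal(4096)

# Magnitude-squared coherence, estimated with Welch's method.
f, Cxy = signal.coherence(a, b, fs=1.0, nperseg=256)
# Cxy lies in [0, 1]; a value near 1 at a frequency means B is well
# explained there by some linear (possibly lagged) filter of A.
```

For this example the theoretical coherence is $0.64/0.89 \approx 0.72$ at every frequency, and, usefully for the question, coherence is insensitive to any pure delay between the two signals.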