
I am faced with the following problem in my research:

Suppose that $X_1, X_2, \ldots, X_n$ are random variables with the same law $\mu$, but are not independent. Is there any algorithm to generate (from the realisations of $X_1, X_2, \ldots, X_n$) i.i.d. random variables $\widetilde{X}_1, \widetilde{X}_2, \ldots, \widetilde{X}_n$ with the same law $\mu$?

If the best we can get is an approximation, I would also like to know the strong error $\mathbb{E} |X_i - \widetilde{X}_i|$.


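Normal Sum and Difference. For jointly normal $X_1$ and $X_2$ (that is, $(X_1, X_2)$ is bivariate normal, not merely each $X_i$ normal), one can find linear transformations $Y_1 = aX_1 + bX_2$ and $Y_2 = cX_1 + dX_2$ that are uncorrelated, hence independent. Because $X_1$ and $X_2$ have the same variance, the sum and difference already work: $\operatorname{Cov}(X_1 + X_2,\, X_1 - X_2) = \operatorname{Var}(X_1) - \operatorname{Var}(X_2) = 0$. Centering and rescaling $Y_1$ and $Y_2$ then gives two independent variables with the original normal law $\mu$, provided the correlation $\rho$ of $X_1$ and $X_2$ is known and $|\rho| < 1$. But for distributions other than the normal, uncorrelated does not imply independent.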
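A minimal R sketch of the sum-and-difference idea, under assumptions that are not in the question: $(X_1, X_2)$ is bivariate normal, $\mu = N(0,1)$, and the correlation `rho` is known (the value 0.7 and the number of replicate pairs are arbitrary illustration choices). Dividing the sum by $\sqrt{2(1+\rho)}$ and the difference by $\sqrt{2(1-\rho)}$ gives two independent $N(0,1)$ variables; the last line is a Monte Carlo estimate of the strong error $\mathbb{E}|X_1 - \widetilde X_1|$ when $\widetilde X_1$ is taken to be `y1`.

```r
set.seed(1)
n.pairs = 10^5;  rho = 0.7          # assumed known correlation
z1 = rnorm(n.pairs);  z2 = rnorm(n.pairs)
x1 = z1                              # X1 ~ N(0,1)
x2 = rho*z1 + sqrt(1 - rho^2)*z2     # X2 ~ N(0,1), cor(X1, X2) = rho

# Var(X1 + X2) = 2(1 + rho) and Var(X1 - X2) = 2(1 - rho). After rescaling,
# y1 and y2 are uncorrelated, jointly normal N(0,1), hence independent.
y1 = (x1 + x2) / sqrt(2*(1 + rho))
y2 = (x1 - x2) / sqrt(2*(1 - rho))

c(cor.x = cor(x1, x2), cor.y = cor(y1, y2), sd.y1 = sd(y1), sd.y2 = sd(y2))
mean(abs(x1 - y1))   # Monte Carlo estimate of the strong error E|X1 - Y1|
```

For $n > 2$ jointly normal variables with mean 0 and a known positive-definite covariance matrix $\Sigma$, the analogous step is $\Sigma^{-1/2}X \sim N(0, I)$. No linear trick of this kind gives independence for a general non-normal $\mu$.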

Thinning. In general, if you know (e.g., from ACF plots and multidimensional plots) that $X_i$ and $X_j$ are nearly independent when $|i-j|$ is large, then you can 'thin' the sequence $X_1, X_2, \dots$ by keeping only every $k$th observation, for suitably large $k$, to get a very nearly independent subsequence. This leaves only about $n/k$ observations rather than $n$. Thinning is done in Markov chain Monte Carlo (MCMC) simulations when one wants an (essentially) iid sequence. (In many Markov chains, the dependence between $X_i$ and $X_{i+k}$ decays as $k$ grows.)
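One way to turn "suitably large $k$" into a rule is sketched below. The helper `thin_by_acf` is hypothetical (not a standard R function): it takes $k$ to be the first lag whose sample autocorrelation lies inside the approximate 95% band $\pm 1.96/\sqrt{m}$, the same band R's `acf` plot draws by default, and keeps every $k$th value. The AR(1) example with `ar = 0.9` is only an assumed test series.

```r
# Hypothetical helper: choose the thinning interval k as the first lag whose
# sample autocorrelation falls inside +/- 1.96/sqrt(m), then keep every k-th value.
thin_by_acf = function(x, lag.max = 200) {
  m = length(x)
  r = acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]   # drop lag 0
  k = which(abs(r) < 1.96 / sqrt(m))[1]                 # first "small" lag
  if (is.na(k)) stop("no lag up to lag.max falls inside the band")
  list(k = k, x.thin = x[seq(1, m, by = k)])
}

# Example: AR(1) series with strong one-step dependence (phi = 0.9)
set.seed(2)
x = as.numeric(arima.sim(list(ar = 0.9), n = 10^4))
res = thin_by_acf(x)
res$k                 # chosen thinning interval
length(res$x.thin)    # roughly 10^4 / k observations remain
```

A single small autocorrelation can occur by chance, so a more conservative variant would require several consecutive lags inside the band before accepting $k$.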

But in general, I know of no way to manipulate a sequence $X_1, X_2, \dots, X_n$ to get an iid sequence $\tilde X_1, \tilde X_2, \dots, \tilde X_n.$

Random walk with thinning. Here is an example of a random walk with standard normal steps that is reset to $0$ whenever it tries to leave $(-10, 10).$ (Without the reset, it would make ever larger excursions and would have no stationary distribution.)

The original process shows significant autocorrelation up to about 50 steps, but if we 'thin' the process by taking every 100th observation, we get an essentially iid process of 100 observations from the original $10^4$. (Autocorrelations between the dashed lines are not significantly different from 0.) In many cases of practical interest, dependence wears off more quickly, and the thinned observations can be closer together.

[Figure: trace plot and ACF of the original process (top row) and of the thinned process (bottom row).]

The R code is shown below, in case it may be of interest.

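```r
m = 10^4;  x = numeric(m);  x[1] = 0      # m steps, starting at 0
for (i in 2:m) {
  y = x[i-1] + rnorm(1)                  # random-walk step with N(0,1) increment
  x[i] = if (abs(y) > 10) 0 else y       # reset to 0 if it leaves (-10, 10)
}

par(mfrow = c(2, 2))                     # four panels per plot
plot(x, type = "l");  acf(x)             # original process: trace and ACF
x.thin = x[seq(1, m, by = 100)]          # every 100th index
plot(x.thin, type = "l");  acf(x.thin)   # thinned process: trace and ACF
par(mfrow = c(1, 1))                     # restore single-panel layout
```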
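As a numerical complement to the ACF plots, a Ljung-Box portmanteau test (`Box.test` in R's `stats` package) can be applied to both series. The sketch below regenerates the reset random walk so that it runs on its own; the seed and the choice of 10 lags are arbitrary assumptions. A small p-value indicates autocorrelation; a large one only means that no linear autocorrelation was detected in the first 10 lags (lags 100 to 1000 of the original for the thinned series), not that the values are independent.

```r
# Regenerate the reset random walk (same construction as above).
set.seed(3)
m = 10^4;  x = numeric(m)
for (i in 2:m) {
  y = x[i-1] + rnorm(1)
  x[i] = if (abs(y) > 10) 0 else y
}
x.thin = x[seq(1, m, by = 100)]

# Ljung-Box test of "no autocorrelation up to lag 10" for each series.
lb.orig = Box.test(x, lag = 10, type = "Ljung-Box")
lb.thin = Box.test(x.thin, lag = 10, type = "Ljung-Box")
c(orig = lb.orig$p.value, thin = lb.thin$p.value)
```

With only 100 thinned observations the test has limited power, so it complements rather than replaces the plots.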