
Suppose $X_1,X_2,\ldots$ are $m$-dependent random variables. Let $F_i$ be the cdf of $X_i$. Let $F_n(x, \omega)$ be the empirical cdf of $X_1,\ldots,X_n$. What will be the variance of $F_n(x, \omega)$?

  • 0
    Do you mean $F_i(x) = \mathbb{P}(X_i \leqslant x)$? Also, for the empirical cdf, do you mean $F_n(x,\omega) = \frac{1}{n} \sum_{k=1}^n I(X_k(\omega) < x)$? It could not hurt to be a little more specific.2012-04-30
  • 0
    Yes and thank you for the clarification.2012-04-30

1 Answer


Let $Y(\omega) = F_n(x, \omega) = \frac{1}{n} \sum_{k=1}^n I(X_k(\omega) \leqslant x)$. Then, using $\mathbb{E}(I(X_k \leqslant x)) = \mathbb{P}(X_k \leqslant x)$, we have $$ \mathbb{E}\left(Y\right) = \frac{1}{n} \sum_{k=1}^n \mathbb{P}(X_k \leqslant x) = \frac{1}{n} \sum_{k=1}^n F_k(x) $$ $$ \begin{eqnarray} \mathbb{E}(Y^2) &=& \frac{1}{n^2} \sum_{k=1}^n \sum_{\ell=1}^n \mathbb{E}\left( I(X_k \leqslant x) I(X_\ell \leqslant x) \right) \\ &=& \frac{1}{n^2} \sum_{k=1}^n \sum_{\ell=1}^n \mathbb{P}(X_k \leqslant x, X_\ell \leqslant x) \end{eqnarray} $$ Therefore: $$\begin{eqnarray} \mathbb{Var}(Y) &=& \frac{1}{n^2} \sum_{k=1}^n \sum_{\ell=1}^n \left( \mathbb{P}(X_k \leqslant x, X_\ell \leqslant x) - F_{X_k}(x) F_{X_\ell}(x) \right) \\ &=& \frac{1}{n^2} \sum_{k=1}^n F_{X_k}(x) \left(1-F_{X_k}(x)\right) + \frac{2}{n^2} \sum_{1 \leqslant k < \ell \leqslant n} \left( F_{X_k,X_\ell}(x,x) - F_{X_k}(x) F_{X_\ell}(x) \right) \end{eqnarray}$$
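The identity above can be checked numerically on a small dependent example. The following is a minimal Python sketch (the joint pmf and the evaluation point $x$ are made up for illustration): it enumerates a positively correlated pair $(X_1, X_2)$ on $\{0,1\}^2$ and compares $\mathbb{Var}(Y)$ computed directly from the distribution of $Y$ with the value given by the formula.

```python
# Hypothetical joint pmf of (X_1, X_2) on {0,1}^2, positively correlated.
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
x = 0.5  # evaluation point of the empirical cdf
n = 2

# Direct computation: Y = (1/n) * sum_k I(X_k <= x).
EY = sum(p * sum(v <= x for v in outcome) / n for outcome, p in pmf.items())
EY2 = sum(p * (sum(v <= x for v in outcome) / n) ** 2 for outcome, p in pmf.items())
var_direct = EY2 - EY**2

# Formula: (1/n^2) sum_k F_k(1-F_k) + (2/n^2) sum_{k<l} (F_{kl}(x,x) - F_k F_l).
F1 = sum(p for (a, b), p in pmf.items() if a <= x)               # F_{X_1}(x)
F2 = sum(p for (a, b), p in pmf.items() if b <= x)               # F_{X_2}(x)
F12 = sum(p for (a, b), p in pmf.items() if a <= x and b <= x)   # F_{X_1,X_2}(x,x)
var_formula = (F1 * (1 - F1) + F2 * (1 - F2)) / n**2 + (2 / n**2) * (F12 - F1 * F2)

print(var_direct, var_formula)  # both ≈ 0.2
```

In this example the independence value would be $\frac{1}{4}(0.25 + 0.25) = 0.125$, while the positive-dependence cross term $F_{12} - F_1 F_2 = 0.15$ pushes the variance up to $0.2$.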

For the case of independent variables in the sample, we get $$ \mathbb{Var}(Y_\text{indep}) = \frac{1}{n^2} \sum_{k=1}^n F_{X_k}(x) \left(1-F_{X_k}(x)\right) $$ For the case of identically distributed: $$ \mathbb{Var}(Y_\text{i.i.d.}) = \frac{1}{n} F_X(x) (1-F_X(x)) $$
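As a sanity check on the i.i.d. case, a short Monte Carlo sketch (hypothetical choices: $\mathrm{Uniform}(0,1)$ samples, $x = 0.3$, $n = 10$) estimates $\mathbb{Var}(F_n(x,\omega))$ across many replications and compares it with $F_X(x)(1-F_X(x))/n = 0.3 \cdot 0.7 / 10 = 0.021$.

```python
import random

random.seed(0)
n, x, reps = 10, 0.3, 100_000

# Each replication draws an i.i.d. Uniform(0,1) sample of size n and
# evaluates the empirical cdf at x; for Uniform(0,1), F_X(x) = x.
ys = [sum(random.random() <= x for _ in range(n)) / n for _ in range(reps)]

mean_y = sum(ys) / reps
var_mc = sum((y - mean_y) ** 2 for y in ys) / (reps - 1)
var_theory = x * (1 - x) / n  # F_X(x)(1 - F_X(x)) / n = 0.021

print(var_mc, var_theory)
```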

  • 0
    Thank you. How about the case of $m$-dependence, where $X_i$ and $X_j$ are independent once they are $m$ apart?2012-04-30
  • 0
    @user12847 In that case, the formula for $\mathbb{Var}(Y)$ simplifies, in that $F_{X_k,X_\ell}(x,x) = F_{X_k}(x) F_{X_\ell}(x)$ for $|k-\ell| \geqslant m$.2012-04-30
  • 0
    Then the variance increases when there is positive correlation among the random variables. Why is that?2012-04-30
  • 0
    Sasha, could you please be more specific in the second line of $\mathbb{V}(Y)$? I don't get how you transform the joint probability of $X_{k}$ and $X_{l}$ and how the sums are transformed. Thanks.2017-05-24
  • 1
    @David The first line is the definition of the variance of $Y$. In the second line the double sum is split into $k<\ell$, $k=\ell$ and $k>\ell$. The first term on the second line comes from $k=\ell$, because $\mathbb{P}(X_k \leqslant x, X_k \leqslant x) = \mathbb{P}(X_k \leqslant x)$. The other two sums are equal to each other by symmetry, hence the factor of 2.2017-05-24
  • 0
    Thanks, pretty clear explanation.2017-05-24
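To illustrate the simplification for dependent sequences discussed in the comments, here is a hedged Python sketch using a made-up 1-dependent sequence $X_k = \max(U_k, U_{k+1})$ built from i.i.d. $\mathrm{Uniform}(0,1)$ variables: adjacent terms are dependent, while terms two or more apart are independent, so their cross terms drop out of the variance formula. For this construction $F_{X_k}(x) = x^2$ and, for adjacent indices, $F_{X_k,X_{k+1}}(x,x) = x^3$.

```python
import random

random.seed(1)
n, x, reps = 5, 0.5, 100_000

# Only the adjacent (|k - l| = 1) cross terms survive in the variance formula.
# For X_k = max(U_k, U_{k+1}): F(x) = x^2, adjacent joint cdf F(x, x) = x^3.
Fx, Fxx = x**2, x**3
var_theory = (n * Fx * (1 - Fx) + 2 * (n - 1) * (Fxx - Fx * Fx)) / n**2

# Monte Carlo: simulate the 1-dependent sequence and the empirical cdf at x.
ys = []
for _ in range(reps):
    u = [random.random() for _ in range(n + 1)]
    xs = [max(u[k], u[k + 1]) for k in range(n)]
    ys.append(sum(v <= x for v in xs) / n)

mean_y = sum(ys) / reps
var_mc = sum((y - mean_y) ** 2 for y in ys) / (reps - 1)

print(var_mc, var_theory)  # var_theory = 0.0575 here
```

Since $F_{X_k,X_{k+1}}(x,x) - F_{X_k}(x)F_{X_{k+1}}(x) = x^3 - x^4 > 0$, the positive dependence again inflates the variance relative to the independent case.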