
Let's assume that we have two independent Bernoulli populations, $Ber(\theta_1)$ and $Ber(\theta_2)$.

How do we prove that $\frac{(\bar X_1-\bar X_2)-(\theta_1-\theta_2)}{\sqrt{\frac{\theta_1(1-\theta_1)}{n_1}+\frac{\theta_2(1-\theta_2)}{n_2}}}\rightarrow^d N(0,1)$?

Assume that $n_1\neq n_2$

Any help would be appreciated.

P.S.: I've also posted this on CrossValidated, but since it got no answer, I've decided to post it here as well.

  • This is a direct application of the central limit theorem. (2017-02-18)
  • @nicomezi I've also thought about that, but I don't think the conclusion follows, even though the conditions hold... (2017-02-18)

2 Answers


Assume first that we have $n$ random variables from each distribution, then define $$ \bar{X}_1-\bar{X}_2 =\frac{1}{n}\sum_{i=1}^n(X_{1i}-X_{2i})=\frac{1}{n}\sum_{i=1}^nY_i, $$ where $Y_1,\dots,Y_n$ are iid random variables with $\mathbb{E}Y_i = \theta_1-\theta_2$ and, by independence, $$ \operatorname{Var}(Y_i)=\theta_1(1-\theta_1)+\theta_2(1-\theta_2), $$ so that $\operatorname{Var}(\bar{X}_1-\bar{X}_2)=\frac{1}{n}\left(\theta_1(1-\theta_1)+\theta_2(1-\theta_2)\right)$. Now you can easily apply the CLT.
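As a quick numerical sanity check of the equal-sample-size case (not part of the proof; the parameter values below are arbitrary choices), one can simulate the standardized difference and verify it looks standard normal. The simulation uses the fact that a $\mathrm{Binomial}(n,\theta)$ count divided by $n$ has the same law as the sample mean $\bar X$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta1, theta2, n = 0.3, 0.6, 2000   # arbitrary parameters, n1 = n2 = n
reps = 20000                         # number of Monte Carlo replications

# A Binomial(n, theta) count divided by n is distributed like the sample mean
xbar1 = rng.binomial(n, theta1, size=reps) / n
xbar2 = rng.binomial(n, theta2, size=reps) / n

# Standardize the difference exactly as in the question's statement
se = np.sqrt(theta1 * (1 - theta1) / n + theta2 * (1 - theta2) / n)
z = (xbar1 - xbar2 - (theta1 - theta2)) / se

# If the CLT applies, z should have mean ~0 and standard deviation ~1
print(round(z.mean(), 3), round(z.std(), 3))
```

The empirical mean and standard deviation come out close to $0$ and $1$, as the CLT predicts.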

If $n_1 \neq n_2$, then for large enough $n_i$ $$ \bar{X}_i \sim^{approx.} N\!\left(\theta_i, \frac{\theta_i(1 - \theta_i)}{n_i}\right), $$ so you can use the fact that the difference of two independent normal random variables is normal with the desired parameters. For a more rigorous treatment you have to define $\bar{X}_1 - \bar{X}_2$ for every pair $(n_1, n_2)$ and then let both $n_1$ and $n_2$ tend to $\infty$.


For $n_1 \neq n_2$ note that directly from the CLT (or, in particular, the de Moivre–Laplace theorem) for $n_i \to \infty$ you get that $$ \sqrt{n_i}(\bar{X}_{n_i} - \theta_i) \xrightarrow{D} \sqrt{\theta_i(1-\theta_i)}\,Z_i, \quad i\in\{1,2\}, $$ where $Z_1, Z_2$ are independent standard normals. Hence, by independence, $$ \sqrt{n_1}(\bar{X}_{n_1} - \theta_1) - \sqrt{n_2}(\bar{X}_{n_2} - \theta_2) \xrightarrow{D} \sqrt{\theta_1(1-\theta_1)}\,Z_1 - \sqrt{\theta_2(1-\theta_2)}\,Z_2 \sim N\bigl(0,\ \theta_1(1-\theta_1) + \theta_2(1-\theta_2)\bigr). $$ Now, note that the weak convergence holds as $n_1 \to \infty$ and $n_2\to \infty$; thus, starting at some $N \in \mathbb{N}$, you can treat $n_1 \approx n_2 = n$, and you get $$ \frac{\sqrt{n}\,\bigl( (\bar{X}_{n_1} - \bar{X}_{n_2}) - (\theta_1 - \theta_2) \bigr)}{\sqrt{\theta_1(1-\theta_1) + \theta_2(1-\theta_2)}} \xrightarrow{D} N(0,1). $$ For any finite $n_i$ the distribution is only approximately normal, and for $n_i$ that are not "large enough", or for an imbalanced design, the approximation is not that good. Analyzing the quality of this approximation for small $n_i$ requires a much more careful analysis than the asymptotic arguments I've used.

EDIT:

To be precise about the requirements on $n_1$ and $n_2$, you should ensure that they grow at the same rate; in other words, $$ \frac{n_1}{n_2} \xrightarrow{} c, \quad c \in (0, \infty). $$
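The imbalanced case can also be checked numerically. In the sketch below (a sanity check, not a proof; the parameters and the ratio $n_1/n_2 = 3$ are arbitrary choices), the coverage of the nominal 95% interval for the standardized statistic is close to 0.95:

```python
import numpy as np

rng = np.random.default_rng(1)
theta1, theta2 = 0.2, 0.7        # arbitrary parameters
n1, n2 = 3000, 1000              # imbalanced design with n1/n2 = c = 3
reps = 20000                     # number of Monte Carlo replications

# Sample means simulated via Binomial counts divided by the sample size
xbar1 = rng.binomial(n1, theta1, size=reps) / n1
xbar2 = rng.binomial(n2, theta2, size=reps) / n2

# Standardize with each population's own sample size, as in the question
se = np.sqrt(theta1 * (1 - theta1) / n1 + theta2 * (1 - theta2) / n2)
z = (xbar1 - xbar2 - (theta1 - theta2)) / se

# Fraction of |z| <= 1.96 should be near 0.95 if z is approximately N(0,1)
coverage = np.mean(np.abs(z) <= 1.96)
print(round(coverage, 3))
```

Changing $n_1, n_2$ while keeping the ratio fixed leaves the coverage near 0.95, consistent with the rate condition above.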

  • VV thanks for your answer. However, the case I'm more interested in is precisely that of $n_1\neq n_2$, and it's also where you do the most 'hand-waving'. Could you please elaborate in more detail how you prove it? (2017-02-21)
  • I've edited the answer. (2017-05-01)
  • VV, on the left side, we don't seem to get the expression $\bar X_1 - \bar X_2$, but a function of $\bar X_1 + \bar X_2$. Also, the $n_i$ will not conform either, from what I can see... (2017-05-01)
  • I've added some clarifications. (2017-05-01)

The assumption $n_1\neq n_2$ implies that the limit in the CLT (if it applies) must be a double limit ($\lim_{n_1\to \infty ,n_2\to \infty } \cdots$). (Recall that this is not the same as the iterated limits $\lim_{n_1\to \infty}\lim_{n_2\to \infty } \cdots$ and $\lim_{n_2\to \infty}\lim_{n_1\to \infty } \cdots$.)

Asserting the validity of the CLT for the double limit is not trivial.

The paper "Necessary and Sufficient Condition for Asymptotic Standard Normality of the Two Sample Pivot" (Majumdar, Majumdar - 2010) mentions a result from the book "Mukhopadhyay, N. (2000) Probability and Statistical Inference", which states that, for any two sequences of iid random variables with finite variances, and independent of each other,

$$ \frac{\overline X_1 -\overline X_2 - (\mu_1 - \mu_2) }{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \tag{1}$$

converges in distribution to $\mathcal N(0,1)$ along any line $n_1/n_2 = \delta \in (0,\infty)$ as $n_1,n_2 \to \infty$. Notice that this still leaves open the convergence of the double limit.
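The quoted result is not restricted to Bernoulli populations. As an illustration (a numerical check, not a proof; the distributions, sizes, and the line $n_1/n_2 = 2$ are arbitrary choices), the pivot (1) can be simulated for two different non-Bernoulli populations. To keep the simulation cheap, the sums of iid draws are generated directly from their known laws:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 4000, 2000                  # along the line n1/n2 = delta = 2
reps = 20000                         # number of Monte Carlo replications

# Population 1: iid Exponential(scale=2) -> mean 2, variance 4
# Population 2: iid Poisson(3)           -> mean 3, variance 3
mu1, var1 = 2.0, 4.0
mu2, var2 = 3.0, 3.0

# Sum of n1 iid Exponential(scale=2) draws is Gamma(n1, scale=2);
# sum of n2 iid Poisson(3) draws is Poisson(3*n2)
xbar1 = rng.gamma(shape=n1, scale=2.0, size=reps) / n1
xbar2 = rng.poisson(lam=3.0 * n2, size=reps) / n2

# The two-sample pivot (1) with known means and variances
pivot = (xbar1 - xbar2 - (mu1 - mu2)) / np.sqrt(var1 / n1 + var2 / n2)
print(round(pivot.mean(), 3), round(pivot.std(), 3))
```

Along this line the pivot's empirical mean and standard deviation are close to $0$ and $1$, in line with the quoted convergence result.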

  • Thanks for the info leon ;) (2017-07-08)