
Let's say I have two independent random samples $X_1, X_2, \dots, X_n$ and $Y_1, Y_2, \dots, Y_n$ from normal distributions with real, unknown means $\mu_x$ and $\mu_y$ and known standard deviations $\sigma_x$ and $\sigma_y$.

How would I go about deriving a $100(1 - \alpha)$% confidence interval for $\mu_x - \mu_y$? This is straightforward (in my mind) assuming the standard deviations are equal, but what if they are unequal?

  • Do you know the term 'pivotal quantity'? $\dfrac{\bar X - \bar Y - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}}$ is one, and you can get a confidence interval from that. – 2012-08-16
  • @mike: What you suggest is workable only when the two population variances are known. The bounds of the confidence interval will depend on them. – 2012-08-16
  • I see: he did say they're known. – 2012-08-16

2 Answers


Alright, you say known variances. So it's an exercise on a point of theory, not a realistic problem.

And you actually assume the two sample sizes are equal.

Start by recalling something from the one-sample problem: $$ \bar{X} = \frac{X_1+\cdots+X_n}{n} \sim N\left(\mu_x,\frac{\sigma^2_x}{n}\right) $$ $$ \bar{Y} = \frac{Y_1+\cdots+Y_n}{n} \sim N\left(\mu_y,\frac{\sigma^2_y}{n}\right) $$ You don't explicitly state that the two samples are independent. If they are, then we have $$ \bar X - \bar Y \sim N\left(\mu_x-\mu_y,\frac{\sigma^2_x+\sigma^2_y}{n}\right) $$ (If we had unequal sample sizes $n$ and $m$, then the variance would be $\dfrac{\sigma^2_x}{n}+\dfrac{\sigma^2_y}{m}$.)
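As a quick sanity check of the claim that $\bar X - \bar Y$ is normal with mean $\mu_x-\mu_y$ and variance $\sigma^2_x/n+\sigma^2_y/m$, here is a small Monte Carlo sketch. The helper name `simulate_mean_difference` and all parameter values are made up for illustration; it uses only the Python standard library and draws the two samples independently, which is exactly the assumption the formula needs.

```python
import random
import statistics

def simulate_mean_difference(mu_x, mu_y, sigma_x, sigma_y, n, m, reps, seed=0):
    """Hypothetical helper: draw `reps` independent pairs of normal samples
    (sizes n and m) and return the list of X-bar minus Y-bar values."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu_x, sigma_x) for _ in range(n))
        ybar = statistics.fmean(rng.gauss(mu_y, sigma_y) for _ in range(m))
        diffs.append(xbar - ybar)
    return diffs

diffs = simulate_mean_difference(mu_x=5.0, mu_y=3.0, sigma_x=2.0, sigma_y=1.0,
                                 n=10, m=20, reps=20000)
print(statistics.fmean(diffs))     # should be close to mu_x - mu_y = 2.0
print(statistics.variance(diffs))  # should be close to 2**2/10 + 1**2/20 = 0.45
```

With $n = m$ the second number would approach $(\sigma^2_x+\sigma^2_y)/n$, as in the display above.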

So $$ \frac{((\bar X-\mu_x) - (\bar Y-\mu_y))\sqrt{n}}{\sqrt{\sigma^2_x+\sigma^2_y}} \sim N(0,1). $$ So the probability that $$ -A < \frac{(\bar X-\mu_x) - (\bar Y-\mu_y)}{\sqrt{ \frac{\sigma^2_x+\sigma^2_y}{n} }} < A \tag{1} $$ is $1-\alpha$ if $A$ is suitably chosen, namely $A = z_{\alpha/2}$, the number with $P(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0,1)$. Now do a bit of algebra to rearrange the inequalities $(1)$: $$ \bar X - \bar Y - A\sqrt{\frac{\sigma^2_x+\sigma^2_y}{n}} < \mu_x-\mu_y < \bar X - \bar Y + A\sqrt{\frac{\sigma^2_x+\sigma^2_y}{n}} $$ That's the confidence interval.
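A minimal Python sketch of this interval, written for general sample sizes $n$ and $m$ (it reduces to the formula above when $m = n$). The function names `z_interval_diff` and `coverage` and the example numbers are hypothetical. $A$ is taken as the $1-\alpha/2$ quantile of $N(0,1)$ via the standard library's `statistics.NormalDist().inv_cdf`, and the coverage check simply simulates many independent sample pairs and counts how often the interval contains $\mu_x-\mu_y$.

```python
import math
import random
from statistics import NormalDist

def z_interval_diff(xbar, ybar, sigma_x, sigma_y, n, m, alpha=0.05):
    """100(1-alpha)% CI for mu_x - mu_y when sigma_x, sigma_y are KNOWN."""
    A = NormalDist().inv_cdf(1 - alpha / 2)          # P(Z > A) = alpha/2
    se = math.sqrt(sigma_x**2 / n + sigma_y**2 / m)  # sd of X-bar - Y-bar
    d = xbar - ybar
    return d - A * se, d + A * se

def coverage(mu_x, mu_y, sigma_x, sigma_y, n, m, alpha=0.05, reps=10000, seed=1):
    """Fraction of simulated intervals that contain the true mu_x - mu_y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu_x, sigma_x) for _ in range(n)) / n
        ybar = sum(rng.gauss(mu_y, sigma_y) for _ in range(m)) / m
        lo, hi = z_interval_diff(xbar, ybar, sigma_x, sigma_y, n, m, alpha)
        hits += lo < mu_x - mu_y < hi
    return hits / reps

# Standard error is 1 here (9/25 + 16/25 = 1), so the interval is 2 +/- 1.96.
lo, hi = z_interval_diff(xbar=10.0, ybar=8.0, sigma_x=3.0, sigma_y=4.0, n=25, m=25)
print(lo, hi)  # roughly 0.040 and 3.960
cov = coverage(mu_x=5.0, mu_y=3.0, sigma_x=2.0, sigma_y=1.0, n=10, m=15)
print(cov)     # should be close to 0.95
```

Because the variances are known, no estimation enters the standard error, which is why the quantile comes from $N(0,1)$ rather than a t distribution.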

  • This was here for a few minutes without the factor of $A$ in two places in the last line. Now I hope it's correct. – 2012-08-16

When the standard deviations are unequal, the inference problem of comparing two means is often called the Behrens–Fisher problem. The pivotal quantity for testing or constructing confidence intervals is a "t-like" statistic obtained by taking the difference of the two sample means and dividing by the sample estimate of the standard error of the mean difference. The standard error is a function of the two unknown standard deviations and the sample sizes used. The estimate replaces the unknown variances with their sample estimates. The distribution of the test statistic under the null hypothesis that the means are equal is sometimes called Welch's distribution. It can be approximated by a t distribution whose degrees of freedom are fractional (not necessarily an integer). This approximation is called the Satterthwaite approximation. This Wikipedia link provides the detailed information:

Welch-Satterthwaite Approximation.
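For the unknown-variance (Behrens–Fisher) case described in this answer, here is a minimal Python sketch of the Welch statistic together with the Welch–Satterthwaite degrees of freedom $$ \nu \approx \frac{\left(s_x^2/n + s_y^2/m\right)^2}{\dfrac{(s_x^2/n)^2}{n-1} + \dfrac{(s_y^2/m)^2}{m-1}}. $$ The function name `welch_statistic` and the toy data are illustrative only. The Python standard library has no t quantile function, so turning this into a p-value or interval would need something like SciPy's `scipy.stats.t.ppf(1 - alpha/2, df)`, which is outside this sketch.

```python
import math
import statistics

def welch_statistic(xs, ys, delta0=0.0):
    """Return (t, df): Welch's t-like statistic for H0: mu_x - mu_y = delta0
    and the Welch-Satterthwaite approximate degrees of freedom."""
    n, m = len(xs), len(ys)
    vx = statistics.variance(xs) / n  # estimated variance of X-bar
    vy = statistics.variance(ys) / m  # estimated variance of Y-bar
    se = math.sqrt(vx + vy)           # estimated standard error of X-bar - Y-bar
    t = (statistics.fmean(xs) - statistics.fmean(ys) - delta0) / se
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, df

t, df = welch_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, df)  # about -1.897 and 5.882 (df need not be an integer)
```

Here $s_x^2 = 2.5$ and $s_y^2 = 10$ with $n = m = 5$, so the estimated variances of the means are $0.5$ and $2$, giving $\nu = 6.25/1.0625 \approx 5.88$.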

If the variances are unequal and known, then the pivotal quantity to use is the one Mike gives in the comments, $\dfrac{\bar X - \bar Y - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}}$, and it has a standard normal distribution. In practice the variances are not known, unless you know that they equal variances previously estimated from very large samples.

  • Note that his $\sigma$s are *known*. – 2012-08-16
  • I see that the OP states that, but if the variances are known, how can you not know whether or not they are equal? In the last sentence the OP seems to be asking what to do if you do not know that the variances are equal. – 2012-08-16
  • Sorry guys, poorly worded on my part. Edited. – 2012-08-16