$\begingroup$

I am on the unit covering hypothesis testing for two populations, and I need an intuitive explanation of why this formula is used. My statistics professor put it up on the board but didn't explain why it's true.

For the t distribution, the formula for the degrees of freedom is:

$$ \large \mathrm{df} = \frac{ \left(\frac{s_{1}^{2}}{n_1} + \frac{s_{2}^{2}}{n_2} \right)^{2} }{ \frac{\left(\frac{s_{1}^{2}}{n_1}\right)^2}{n_{1} - 1} + \frac{\left(\frac{s_{2}^{2}}{n_2}\right)^2}{n_{2}-1}}$$

Here $s_1, s_2$ are the sample standard deviations and $n_1,n_2$ are the sample sizes.
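To make the formula concrete, here is a minimal Python sketch that evaluates it directly. The function name `welch_df` and the input check are assumptions made for illustration; they are not part of the question.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom from the formula above.

    s1, s2: sample standard deviations (not both zero)
    n1, n2: sample sizes (each at least 2)
    `welch_df` is a hypothetical helper name used only in this sketch.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    numerator = (v1 + v2) ** 2
    denominator = v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)
    return numerator / denominator


# Equal spreads and equal sample sizes
print(welch_df(1.0, 10, 1.0, 10))
# Very different spreads and sample sizes
print(welch_df(2.0, 5, 10.0, 50))
```

In the first call the result is 18 up to floating-point rounding, which equals $n_1 + n_2 - 2$. In general the value lies between $\min(n_1 - 1, n_2 - 1)$ and $n_1 + n_2 - 2$ and is usually not an integer.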

  • 1
    This is something of a big issue, actually. There isn't really a good test for the difference of two means with unequal variances (http://en.wikipedia.org/wiki/Behrens%E2%80%93Fisher_problem). In fact, the test statistic you are using is only approximately distributed as a t distribution. (2012-04-25)
  • 0
    This doesn't really give any intuition as to why that formula is used to calculate the degrees of freedom, but perhaps it indicates why your professor would not want to spend time on it. (2012-04-25)
  • 8
    This is based on matching moments of a linear combination of independent chi-squared random variables to a gamma distribution and then using plug-in estimators. It is covered, for example, in B. L. Welch (1947), The generalization of Student's problem when several different population variances are involved, *Biometrika*, vol. 34, no. 1/2, pp. 28-35. See pages 31 and 32 in particular. (2012-04-25)
  • 0
    There is no reason to use `\Large` in the math display equations. I've left it as `\large` only because it makes it a little easier to see the denominator terms. (2012-04-25)
  • 0
    Maybe http://stattrek.com/estimation/difference-in-means.aspx?tutorial=stat can help; see the subsection titled "If you use a t score, you will need to compute degrees of freedom (DF)." (2012-05-04)
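
The moment-matching idea mentioned in the comments can be sketched as follows. This is an outline under stated assumptions, not a full proof: both samples are taken to be independent draws from normal populations with variances $\sigma_1^2, \sigma_2^2$, and the symbols $\sigma_i$, $V$ and $\nu$ are introduced here for illustration and do not appear in the question.

Under normality, $(n_i - 1)s_i^2/\sigma_i^2$ has a chi-squared distribution with $n_i - 1$ degrees of freedom, so

$$ E\left[s_i^2\right] = \sigma_i^2, \qquad \operatorname{Var}\left(s_i^2\right) = \frac{2\sigma_i^4}{n_i - 1}. $$

For the estimated variance of the difference in means, $V = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$, this gives

$$ E[V] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \qquad \operatorname{Var}(V) = 2\left[ \frac{\left(\frac{\sigma_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{\sigma_2^2}{n_2}\right)^2}{n_2 - 1} \right]. $$

Now pretend that $\nu V / E[V]$ is chi-squared with $\nu$ degrees of freedom. Such a variable has mean $\nu$ and variance $2\nu$, so the mean matches automatically and matching the variance requires $\operatorname{Var}(V) = 2E[V]^2/\nu$, that is,

$$ \nu = \frac{2\,E[V]^2}{\operatorname{Var}(V)} = \frac{ \left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^2 }{ \frac{\left(\frac{\sigma_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{\sigma_2^2}{n_2}\right)^2}{n_2 - 1} }. $$

Replacing the unknown $\sigma_i^2$ with the plug-in estimates $s_i^2$ yields the formula for $\mathrm{df}$ in the question.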

1 Answer