
The international comparative school performance study PIRLS surveyed the reading competences of fourth graders in more than $40$ countries, including Germany, in 2011.

We now want to investigate how the cultural capital of the parents affects the reading competencies of children in fourth grade. We operationalize cultural capital as the number of books in the parents' house.

We want to find out whether there is a significant difference between children whose parents have more than $100$ books and children whose parents have at most $100$ books.

A table of statistics is given (as an image), with the German headers Mittelwert = mean, Standardabweichung = standard deviation, Standardfehler des Mittelwertes = standard error of the mean.

We have to perform a t-test to find out whether we can confirm our hypothesis, i.e. whether there is a statistically significant relation. First we have to check with an F-test whether we should apply a double t-test or a Welch test. For both the F-test and the t-test we use a significance level of $5\%$.


How can we check which t-test we have to apply?

Is the null hypothesis that children whose parents have more than $100$ books read better than the others, or not?

Can we conclude from that that we don't need a double t-test?

Also, using the F-test, how can we check whether there is homogeneity or heterogeneity of variance? Do we have to use this formula?

1 Answer


Welch t test. Unless you have good reason from prior experience with such data, you should not assume that the population variances for Strong and Weak are equal. In your case, I think you should begin with a Welch ('separate variances') two-sample t test. I assume the formula is in your book (including a somewhat complicated additional formula for finding the degrees of freedom).
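
The Welch statistic and its approximate (Welch–Satterthwaite) degrees of freedom can be computed directly from the summary statistics. Here is a minimal Python sketch, using the rounded means and SDs; the group labels (1 = more books, 2 = fewer books) are my assumption:

```python
from math import sqrt

# Rounded summary statistics (assumed: group 1 = over 100 books, group 2 = at most 100)
n1, m1, s1 = 1541, 555.3, 64.9
n2, m2, s2 = 1598, 512.9, 70.8

a = s1**2 / n1                     # estimated variance of mean 1
b = s2**2 / n2                     # estimated variance of mean 2
t = (m1 - m2) / sqrt(a + b)        # Welch t statistic
df = (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))  # Welch-Satterthwaite DF

print(round(t, 2), round(df))      # about 17.5 and 3129
```

The rounding of the means and SDs shifts the t value slightly from the 17.49 computed on the full data, but the degrees of freedom come out at essentially the same 3129.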

Here is output from Minitab 17 statistical software. I have rounded the means and SDs, but that should not make a consequential difference in the results.

 Two-Sample T-Test and CI 

 Sample     N   Mean  StDev  SE Mean
 1       1541  555.3   64.9      1.7
 2       1598  512.9   70.8      1.8

 Difference = μ (1) - μ (2)
 Estimate for difference:  42.37
 95% CI for difference:  (37.62, 47.12)
 T-Test of difference = 0 (vs ≠): 
     T-Value = 17.49  P-Value = 0.000  DF = 3129

The T-value 17.49 is quite large. If $|T| > 1.96,$ then you would reject $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 \ne \mu_2$ at the 5% level of significance.

The P-value is the probability that a value as far from $0$ as $17.49$, or farther, would occur due to sampling error if $H_0$ were true. Here it is $P(|T| \ge 17.49) < 0.0005,$ computed using $T \sim \mathsf{T}(\nu = 3129).$
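
With $\nu = 3129$ the t distribution is essentially standard normal, so the two-sided P-value can be approximated in plain Python with the complementary error function (a sketch under the normal approximation, not Minitab's exact t computation):

```python
from math import erfc, sqrt

t = 17.49
# Standard normal approximation for huge DF:
# P(|Z| >= t) = erfc(t / sqrt(2))
p = erfc(t / sqrt(2))
print(p)   # astronomically small, far below 0.0005
```

This is why the software simply reports the P-value as 0.000.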

One-sided alternative. One might assume that people from homes with many books would score generally higher on PIRLS. In that case, one might want to do a one-sided test of $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 > \mu_2.$ Then the P-value would be half as large as for a two-sided test (but still essentially $0$, reported as 0.000 in the software).

I think by 'double t test' you mean a two-sided t test. If the researchers anticipated, before seeing the data, that the scores would be higher among students from homes with more books, then they should use a one-sided test.

F-test for equal variances. Because you are explicitly asked to do an F-test to determine whether the data are consistent with equal variances in the two populations, you should do that. The test statistic F is the ratio of the two sample variances. For convenience using tables, I would put the larger sample variance in the numerator: $F \approx 70.75^2/64.93^2 = 1.187304.$
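
That ratio is simple arithmetic; a quick check in Python using the unrounded SDs quoted above:

```python
# F statistic: larger sample variance over smaller, for use with F tables
s_large, s_small = 70.75, 64.93
F = s_large**2 / s_small**2
print(round(F, 4))   # 1.1873
```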

The critical value for an F-test with such large degrees of freedom will not be shown in most printed tables, but one might guess from looking at the largest available numerator and denominator DF that the critical value is around 1.10. So (even though the SDs seem close together) you have enough data to detect a difference in population variances. [The reason for putting the larger sample variance in the numerator is that F-tables do not usually give information for F-values smaller than 1.]

Here is Minitab output: I entered sample SDs (smaller one first).

 Test and CI for Two Variances 

 Method

 Null hypothesis         σ(First) / σ(Second) = 1
 Alternative hypothesis  σ(First) / σ(Second) ≠ 1
 Significance level      α = 0.05

 F method was used. This method is accurate for normal data only.

 Ratio of standard deviations = 0.918
 Ratio of variances = 0.842

 Test

                          Test
 Method   DF1   DF2  Statistic  P-Value
 F       1540  1597       0.84    0.001

The test statistic given here is the reciprocal of the one I gave above: $1/1.187 \approx 0.84.$ The P-value 0.001 says you can reject $H_0$ at the 0.1% level, or at any greater level such as 5%.

Note: I believe it is now established statistical practice to use the Welch t test (instead of the 'pooled' test, which assumes equal population variances) unless there is excellent prior evidence, not just based on the data at hand, that the populations have equal variances.

There are several reasons for not letting an F-test "decide" which t test to do. (1) The F-test has very low power. That is, it frequently does not detect that population variances are unequal even when they are. (2) Doing a pooled t test when population variances are unequal can lead to making the wrong decision extraordinarily often. (3) If you do the F-test at the 5% level and then do one of the t tests at the 5% level, the significance level of the 'hybrid' combination of tests is not clear.

Many simulations have been done under various circumstances comparing the Welch test by itself against doing an F-test to decide between the Welch and pooled t tests. Doing the Welch test straightaway is better, sometimes much better. You can google for papers on this. I have repeated some of these simulations myself to verify that I get the same results, because nowadays we have the computer power to do more accurate simulations.
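
As an illustration of point (2), here is a small simulation sketch with arbitrary parameters (not taken from any particular paper): two normal populations with equal means but unequal variances, where the smaller sample comes from the more variable population. The pooled test then rejects a true $H_0$ far more often than the nominal 5%, while the Welch test stays close to it. (For simplicity both tests use the normal critical value 1.96, which is only approximate for the Welch DF here.)

```python
import random
from math import sqrt

random.seed(2017)
n1, sd1 = 100, 1.0   # large sample, small variance
n2, sd2 = 25, 3.0    # small sample, large variance; means equal, so H0 is true
reps, crit = 4000, 1.96

rej_pooled = rej_welch = 0
for _ in range(reps):
    x = [random.gauss(0, sd1) for _ in range(n1)]
    y = [random.gauss(0, sd2) for _ in range(n2)]
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((xi - m1)**2 for xi in x) / (n1 - 1)
    v2 = sum((yi - m2)**2 for yi in y) / (n2 - 1)
    # Pooled ('equal variances') t statistic
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    # Welch ('separate variances') t statistic
    t_welch = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
    rej_pooled += abs(t_pooled) > crit
    rej_welch += abs(t_welch) > crit

# Estimated type I error rates: pooled well above 0.05, Welch near 0.05
print(rej_pooled / reps, rej_welch / reps)
```

The pooled test fails here because the pooled variance is dominated by the large, low-variance sample, so it badly understates the true standard error of the difference in means.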

  • 1
    You link to Levene's test for equal variances. Because you are doing t tests to compare means there is some presumption that data must be nearly normal. That is why I chose to use the F-test (variance-ratio test). Levene's test does not assume normality, and is not as powerful as even the under-powered F-test for normal data. To make a wise choice one would do normality tests on the original data, but you have only summary data (means, SDs). Also, I think it would be difficult (maybe even impossible: not sure) to compute Levene's test without having the original observations. (2017-02-14)
  • 0
    I understand!! I have also calculated the test statistic, and now I want to calculate the p-value. Which formula do we have to use? (2017-02-15)
  • 1
    P-value for exactly what? t or F? One- or two-sided? Exact P-value requires software. No printed tables for that. Do you have favorite software? (2017-02-15)
  • 0
    For t. Ah, so we cannot calculate it by hand? What software do you suggest? (2017-02-15)
  • 1
    Simple example. Suppose $T = 2.8$ with 15 DF. My table shows P(T > 2.602) = .01 and P(T > 2.947) = .005, so the P-value of a right-tailed test is bracketed between .01 and .005. But with R statistical software I can use `1 - pt(2.8, 15)` to get the exact P-value 0.00673, rounded to 5 places. (In R, `pt` denotes the CDF of the t distribution.) (2017-02-15)
  • 0
    For the example of this post I got T=17.52 and DF=3137. Are there tables that show approximately the result for such big numbers? In R I get $0$. (2017-02-15)
  • 1
    Most printed t tables go to about DF=100 and then stop, because t with huge DF is so close to standard normal. (Of course the "0" you get in R just means underflow, not exactly 0; maybe $< 10^{-100}$.) (2017-02-15)
  • 0
    I see!! Thank you so much for your help!! :-) (2017-02-15)