Welch t test. Unless you have good reason from prior experience with such data, you should
not assume that the population variances for Strong and Weak are equal.
In your case, I think you should begin with a Welch ('separate variances')
two-sample t test. I assume the formula is in your book (including a
somewhat complicated additional formula for finding the degrees of freedom).
Here is output from Minitab 17 statistical software. I have rounded means
and SDs, but that should not make a consequential difference in the results.
Two-Sample T-Test and CI
Sample     N   Mean  StDev  SE Mean
1       1541  555.3   64.9      1.7
2       1598  512.9   70.8      1.8
Difference = μ (1) - μ (2)
Estimate for difference: 42.37
95% CI for difference: (37.62, 47.12)
T-Test of difference = 0 (vs ≠):
T-Value = 17.49 P-Value = 0.000 DF = 3129
The T-value 17.49 is quite large. If $|T| > 1.96,$ then you would
reject $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 \ne \mu_2$ at the 5%
level of significance.
The P-value is the probability that a T-value as far from 0 as 17.49,
or farther, would occur due to sampling error if $H_0$ were true.
Here it is $P(|T| \ge 17.49) < 0.0005,$
computed using $T \sim \mathsf{T}(\nu = 3129).$
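If you want to check the Welch computation without Minitab, here is a minimal Python sketch working from the rounded summary statistics in the output above. The variable names are my own, and I am assuming SciPy is available. Because the inputs are rounded, expect the results to agree with Minitab only approximately.

```python
import math
from scipy import stats

# Rounded summary statistics from the Minitab output above.
n1, mean1, sd1 = 1541, 555.3, 64.9
n2, mean2, sd2 = 1598, 512.9, 70.8

# Welch standard error: the two variances are NOT pooled.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se = math.sqrt(v1 + v2)
t_stat = (mean1 - mean2) / se

# Welch-Satterthwaite approximation for the degrees of freedom.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Two-sided P-value and 95% confidence interval for mu1 - mu2.
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)
half_width = stats.t.ppf(0.975, df) * se
ci = (mean1 - mean2 - half_width, mean1 - mean2 + half_width)

# Cross-check against SciPy's built-in test from summary statistics.
res = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2,
                                 equal_var=False)

print(f"T = {t_stat:.2f}, DF = {df:.1f}, P = {p_two_sided:.3g}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"SciPy T = {res.statistic:.2f}")
```

Minitab reports the degrees of freedom as a whole number, while the formula generally gives a fractional value.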
One-sided alternative. One might assume that people from homes with many books would
score generally higher on PIRLS. In that case, one might want
to do a one-sided test of $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 > \mu_2.$
Then the P-value would be half that of the two-sided test (but
still essentially $0$, reported as 0.000 in the software).
I think by 'double t test' you mean two-sided t test. If the researchers
anticipated, before seeing the data, that the scores would be higher among
students from homes with more books, then they should use a one-sided test.
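To see the halving concretely, here is a small Python sketch that takes the T-value and DF reported by Minitab above and computes both P-values from the t distribution. The variable names are mine, and SciPy is assumed.

```python
from scipy import stats

# T-value and degrees of freedom as reported in the Minitab output.
t_value, df = 17.49, 3129

# One-sided (H_a: mu1 > mu2): area in the upper tail only.
p_one_sided = stats.t.sf(t_value, df)

# Two-sided (H_a: mu1 != mu2): area in both tails.
p_two_sided = 2 * stats.t.sf(abs(t_value), df)

print(f"one-sided P = {p_one_sided:.3g}")
print(f"two-sided P = {p_two_sided:.3g}")
```

The halving applies only when the observed difference is in the direction named by the alternative, as it is here.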
F-test for equal variances. Because you are explicitly asked to do an F-test to determine whether
the data are consistent with equal variances in the two populations, you
should do that. The test statistic F is the ratio of the two sample variances.
For convenience using tables, I would put the larger sample variance in the numerator:
$F \approx 70.75^2/64.93^2 = 1.187304.$
The critical value for an F-test with such large degrees of freedom
will not be shown in most printed tables, but you might guess from looking
at the largest available numerator and denominator DF that the critical
value is around 1.10. So (even though the SDs seem close together)
you have enough data to detect a difference in population variances.
[The reason for putting the larger sample variance in the numerator
is that F-tables do not usually give information for F-values smaller
than 1.]
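Software can supply the critical value that printed tables lack. Here is a minimal Python sketch, assuming SciPy, and assuming a two-sided test at the 5% level, so that the cutoff is the upper 2.5% point of the F distribution when the larger variance is in the numerator. The variable names are mine.

```python
from scipy import stats

# Sample SDs and sizes; the larger SD belongs to the sample with n = 1598.
sd_large, n_large = 70.75, 1598
sd_small, n_small = 64.93, 1541

# Larger sample variance in the numerator, so F >= 1.
f_stat = sd_large**2 / sd_small**2
df_num, df_den = n_large - 1, n_small - 1

# Upper 2.5% point (assumed two-sided test at the 5% level).
crit = stats.f.ppf(0.975, df_num, df_den)
reject = f_stat > crit

print(f"F = {f_stat:.4f}, critical value = {crit:.4f}, reject: {reject}")
```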
Here is Minitab output: I entered sample SDs (smaller one first).
Test and CI for Two Variances
Method
Null hypothesis σ(First) / σ(Second) = 1
Alternative hypothesis σ(First) / σ(Second) ≠ 1
Significance level α = 0.05
F method was used. This method is accurate for normal data only.
Ratio of standard deviations = 0.918
Ratio of variances = 0.842
Test
                        Test
Method   DF1   DF2  Statistic  P-Value
F       1540  1597       0.84    0.001
The test statistic given here is the reciprocal of the one I gave above:
$1/1.187 \approx 0.84.$ The P-value 0.001 says you can reject $H_0$ at
the 0.1% level--or any greater level such as 5%.
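The P-value can also be approximated directly. The sketch below (Python, assuming SciPy) uses the common convention of doubling the smaller tail area for a two-sided F-test. I am assuming that convention here and cannot confirm it is exactly what Minitab does internally, so treat agreement with the 0.001 above as approximate.

```python
from scipy import stats

# Sample SDs entered smaller one first, as in the Minitab run above.
sd_first, n_first = 64.93, 1541
sd_second, n_second = 70.75, 1598

f_stat = sd_first**2 / sd_second**2
df1, df2 = n_first - 1, n_second - 1

# Two-sided P-value: double the smaller of the two tail areas
# (an assumed convention, see the note above).
lower_tail = stats.f.cdf(f_stat, df1, df2)
upper_tail = stats.f.sf(f_stat, df1, df2)
p_value = 2 * min(lower_tail, upper_tail)

print(f"F = {f_stat:.3f}, DF = ({df1}, {df2}), P = {p_value:.4f}")
```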
Note: I believe it is now established statistical practice to use the
Welch t test (instead of the 'pooled' test, which assumes equal population
variances) unless there is excellent prior evidence, not just based on
the data at hand, that the populations have equal variances.
There are several reasons for not letting an F-test "decide" which t test to do. (1) The F-test has very low power. That is,
it frequently does not detect population variances are unequal even when
they are. (2) Doing a pooled t test when population variances are
unequal can lead to making the wrong decision extraordinarily often. (3)
If you do the F-test at the 5% level and then do one of the t tests at
the 5% level, the significance level of the 'hybrid' combination of
tests is not clear.
Many simulations have been done under various circumstances comparing the
Welch test by itself against doing an F-test to decide between Welch and
pooled t tests. Doing the Welch test straightaway is better, sometimes
much better. You can search online for papers on this. I have
repeated some of these simulations for myself to verify that I get the
same results, because nowadays we have the computer power to do more accurate simulations.
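A simulation of this kind is easy to sketch in Python with NumPy and SciPy. This is an illustration under assumptions I chose myself (normal data, equal population means, sample sizes 10 and 40, population SDs 2 and 1, 5000 replications, a fixed seed), not a reproduction of any published study. It estimates how often each procedure rejects a true $H_0$ at the nominal 5% level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)   # fixed seed, chosen arbitrarily
reps, alpha = 5000, 0.05

# Illustrative settings: H0 is true (equal means), variances are unequal,
# and the smaller sample has the larger variance.
n1, sd1 = 10, 2.0
n2, sd2 = 40, 1.0

x = rng.normal(0.0, sd1, size=(reps, n1))
y = rng.normal(0.0, sd2, size=(reps, n2))

# One test per simulated pair of samples (one row each).
p_pooled = stats.ttest_ind(x, y, axis=1, equal_var=True).pvalue
p_welch = stats.ttest_ind(x, y, axis=1, equal_var=False).pvalue

# 'Hybrid': a two-sided F-test picks Welch if it rejects, pooled otherwise.
f_stat = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)
tail = stats.f.cdf(f_stat, n1 - 1, n2 - 1)
p_f = 2 * np.minimum(tail, 1 - tail)
p_hybrid = np.where(p_f < alpha, p_welch, p_pooled)

# Proportion of false rejections; ideally each would be near 0.05.
rate_pooled = np.mean(p_pooled < alpha)
rate_welch = np.mean(p_welch < alpha)
rate_hybrid = np.mean(p_hybrid < alpha)

print(f"pooled: {rate_pooled:.3f}")
print(f"Welch:  {rate_welch:.3f}")
print(f"hybrid: {rate_hybrid:.3f}")
```

Changing the sample sizes, the SD ratio, or the number of replications lets you explore other circumstances. Results from a single configuration like this one should not be over-generalized.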