
Are the Yankees more likely to win at home than on the road? Use data from the 2016 season.

Home record: 48-33. Away record: 36-45.

I need to know how to calculate the population mean and the sample mean, which sample size I should use, the standard deviation, and the level of significance.

A step by step solution would be greatly appreciated!

  • Well, what have *you* tried? What does your book say? (2017-02-13)

1 Answer


You have two binomial proportions from data: $\hat p_h = 48/(48+33) = 0.5926$ and $\hat p_a = 36/(36+45) = 0.4444.$

You want to test the null hypothesis $H_0: p_h = p_a$ against the alternative hypothesis $H_a: p_h > p_a.$

Clearly, $\hat p_h > \hat p_a,$ but the question is whether the observed at-home proportion is enough larger than the observed away proportion to be called 'significantly' larger in a statistical sense.

You have large enough numbers of home and away games that it is feasible to use a formula based on normal approximations to the binomial distributions.

The test statistic is

$$ Z = \frac{\hat p_h - \hat p_a}{\text{SE}}, \text{ where } \text{SE}=\sqrt{\frac{\hat p_h(1 - \hat p_h)}{n_h}+ \frac{\hat p_a(1 - \hat p_a)}{n_a}}.$$

In your case $n_h = 48+33=81$ and $n_a = 36+45=81.$ The rationale for the denominator is that we add the variances of $\hat p_h$ and $\hat p_a$ to find the variance of the difference $\hat p_h -\hat p_a,$ and then take the square root to get the standard deviation (here also called 'standard error') of $\hat p_h -\hat p_a.$
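The formula above can be checked numerically. Here is a minimal Python sketch using only the standard library; the function name `two_proportion_z` is an illustrative choice, not part of any package.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using separate (unpooled) variance estimates."""
    p1, p2 = x1 / n1, x2 / n2
    # Add the estimated variances of the two sample proportions, then take
    # the square root to get the standard error of the difference.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# 2016 Yankees: 48 wins in 81 home games, 36 wins in 81 away games.
z = two_proportion_z(48, 81, 36, 81)
print(f"Z = {z:.2f}")
```

Rounded to two decimal places, this gives $Z = 1.91.$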

This is a standard test and Minitab statistical software has a procedure for it. Here is the computation from Minitab, which you can check on a calculator or in your favorite software package. You should be able to find the formula displayed above in most elementary statistics texts. (I have edited out some output that is not directly relevant here.)

Test for Two Proportions 

Sample   X   N  Sample p
1       48  81  0.592593
2       36  81  0.444444

Difference = p (1) - p (2)
Estimate for difference:  0.148148
Test for difference = 0 (vs > 0):  Z = 1.91  P-Value = 0.028

One rejects $H_0$ in this one-sided test at the 5% level of significance if $Z \ge 1.645,$ which is the case here. The P-value, computed as $P(Z > 1.91) = 0.028,$ is below $0.05,$ which also indicates rejection at the 5% level.
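The critical value and the P-value can be reproduced with `statistics.NormalDist` from Python's standard library. This is only a sketch; it takes the rounded statistic $Z = 1.91$ from the output as given, and the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

z_obs = 1.91   # rounded test statistic taken from the Minitab output
alpha = 0.05   # significance level for the one-sided test

# Critical value c such that P(Z > c) = alpha.
z_crit = std_normal.inv_cdf(1 - alpha)

# One-sided P-value: P(Z > z_obs).
p_value = 1 - std_normal.cdf(z_obs)

print(f"critical value = {z_crit:.3f}")
print(f"P-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```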

Notes: (1) An alternative test finds a 'pooled' estimate $\hat p = \frac{48+36}{162}$ and bases the standard error on that. Simulations have shown that it is a little better to estimate the standard deviations separately, as shown in the displayed equation above. Another alternative test is Fisher's 'exact' test based on a hypergeometric distribution, but the computation of the P-value for the one-sided version of that test is open to discussions that I don't want to get into here. (Both of these also show rejection at the 5% level.) You may want to learn about these alternative methods on your own.

(2) The estimates and tests assume the games are independent of one another. Sports commentators make a living speculating on winning streaks, losing streaks, team morale, and other factors that might indicate interdependence of games. However, several analyses of distributions of winning and losing streaks for MLB games have indicated that data are consistent with independence. I don't suppose chatter about these statistical analyses between innings or in sports columns would help commentators' popularity.
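For anyone who wants to try the two alternatives in Note (1), here is a rough standard-library Python sketch. For Fisher's test it assumes the plain upper-tail convention $P(X \ge 48)$ for the one-sided P-value, which is only one of the conventions alluded to in the note; all variable names are illustrative.

```python
import math
from statistics import NormalDist

x_h, n_h = 48, 81   # home wins, home games
x_a, n_a = 36, 81   # away wins, away games

# Pooled test: a single common estimate of p under H0.
p_pool = (x_h + x_a) / (n_h + n_a)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_h + 1 / n_a))
z_pool = (x_h / n_h - x_a / n_a) / se_pool
p_pooled_test = 1 - NormalDist().cdf(z_pool)

# Fisher's exact test (one-sided): with 84 total wins fixed among 162 games,
# the number of home wins X is hypergeometric under H0.
# Assumption: the one-sided P-value is taken as the upper tail P(X >= 48).
wins, games = x_h + x_a, n_h + n_a
losses = games - wins
k_max = min(n_h, wins)  # largest possible number of home wins
p_fisher = sum(
    math.comb(wins, k) * math.comb(losses, n_h - k)
    for k in range(x_h, k_max + 1)
) / math.comb(games, n_h)

print(f"pooled: Z = {z_pool:.3f}, P-value = {p_pooled_test:.4f}")
print(f"Fisher (upper tail): P-value = {p_fisher:.4f}")
```

Under these assumptions, both P-values should come out below 0.05, consistent with the remark in Note (1) that both tests reject at the 5% level.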