97

Can someone explain why taking an average of an average usually results in a wrong answer? Is there ever a case where the average of the average can be used correctly?

As an example, let's say that an assessment is given to three schools and I want to find out the average score for all three schools combined and the average score per school. When I add the three schools' average scores and divide by three, I get a number that is very close (within 1 percent) to the actual overall average.

  • 0
    This distortion of averages is a common trick of journalists and politicians to tilt statistical evidence to support their positions on various issues. 2018-06-29

3 Answers

82

If there are $n_1$, $n_2$, and $n_3$ students in the three schools, and the average test scores for the schools are $a_1$, $a_2$, and $a_3$, respectively, the correct average is a "weighted average":

$\frac{n_1}{n_1+n_2+n_3}a_1+\frac{n_2}{n_1+n_2+n_3}a_2+\frac{n_3}{n_1+n_2+n_3}a_3$

The average of the averages is:

$\frac{1}{3}a_1 + \frac{1}{3}a_2 + \frac{1}{3}a_3$

These two values will be exactly the same if each school has exactly the same number of students, and will tend to be "close" if the schools are relatively close in size and/or the scores for the three schools are close.

If a school system put all the smart students at a single school, they could bump up the second value - the "average of averages" - but they couldn't do that if they take the correct weighted average.
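
As a minimal sketch of the two formulas above, the following Python compares the weighted average with the plain average of averages. The school sizes and scores are made-up numbers for illustration only, and the function names are hypothetical.

```python
def weighted_average(sizes, averages):
    """Average over all students: each school's average weighted by its size."""
    total = sum(sizes)
    return sum(n * a for n, a in zip(sizes, averages)) / total


def average_of_averages(averages):
    """Plain mean of the per-school averages (every school counts equally)."""
    return sum(averages) / len(averages)


# Hypothetical data: one small high-scoring school and two larger ones.
sizes = [50, 400, 550]
averages = [90.0, 70.0, 60.0]

print(weighted_average(sizes, averages))   # average over all 1000 students
print(average_of_averages(averages))       # each school counted once
```

With these assumed numbers the small high-scoring school pulls the average of averages well above the weighted average. If all three sizes are made equal, the two functions return the same value.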

  • 0
    In the answer, $n_i$ is the number of students in school $i$, not the sum of score for the students in school $i$, but I can see how the text would cause confusion. Will edit for clarity. 2017-07-27

46

For example: the average of $\{2,2,2,2,2,2,2,2,2,2,2,2,2\}$ is $2$ ($N=13$), and the average of $\{4\}$ is $4$ ($N=1$). The average of the averages is $3$. But the average of all numbers is $30/14 \approx 2.14$.

I hope this is enough to explain what goes wrong (you're giving equal weights to the "first averages" when you take their average, which isn't the correct thing to do if you want the average of all the numbers).
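
The same numbers can be checked in a few lines of Python. This is only a sketch using the standard-library `statistics.mean`; the variable names are chosen here for illustration.

```python
from statistics import mean

group_a = [2] * 13   # thirteen 2s, average 2
group_b = [4]        # a single 4, average 4

# Equal weight for each group average, regardless of group size.
avg_of_avgs = mean([mean(group_a), mean(group_b)])

# Average over all fourteen numbers pooled together.
overall = mean(group_a + group_b)

# Weighting each group average by its group size recovers the pooled average.
weighted = (len(group_a) * mean(group_a) + len(group_b) * mean(group_b)) / (
    len(group_a) + len(group_b)
)

print(avg_of_avgs, overall, weighted)
```

Here `avg_of_avgs` differs from `overall`, while `weighted` agrees with it, which is the point of the example.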

18

Thomas Andrews already answered the question, but I'd like to present a more analytical solution to the problem.

For two groups, the average of averages equals the average of all values in exactly two cases:

  1. if both groups have the same number of elements; or
  2. the trivial case when the group averages are equal (for example, when both are zero)

Here's why this is so.

Consider two sets $X = \{x_1, x_2, ..., x_n\}$ and $Y = \{y_1, y_2, ..., y_m\}$ and their averages:

$ \bar{x} = \frac{\sum_{i=1}^{n}{x_i}}{n} \,,\, \bar{y} = \frac{\sum_{i=1}^{m}{y_i}}{m} $

The average of the averages is:

$ average(\bar{x}, \bar{y}) = \frac{\frac{\sum_{i=1}^{n}{x_i}}{n} + \frac{\sum_{i=1}^{m}{y_i}}{m}}{2} = \frac{\sum_{i=1}^{n}{x_i}}{2n} + \frac{\sum_{i=1}^{m}{y_i}}{2m} $

Now consider the whole group $Z = \{x_1, x_2, ..., x_n, y_1, y_2, ..., y_m\}$ and its average:

$ \bar{z} = \frac{\sum_{i=1}^{n}{x_i} + \sum_{i=1}^{m}{y_i}}{n + m}$

For the general case, we can see that these averages are different:

$ \frac{\sum_{i=1}^{n}{x_i}}{2n} + \frac{\sum_{i=1}^{m}{y_i}}{2m} \ne \frac{\sum_{i=1}^{n}{x_i} + \sum_{i=1}^{m}{y_i}}{n + m} $

This answers the first OP question, as to why the average of averages usually gives the wrong answer.

However, if we make $n = m$, we have:

$ \frac{\sum_{i=1}^{n}{x_i}}{2n} + \frac{\sum_{i=1}^{n}{y_i}}{2n} = \frac{\sum_{i=1}^{n}{x_i} + \sum_{i=1}^{n}{y_i}}{2n} $

This is why the average of averages is equal to the average of the whole group when the groups have the same size.

The second case is trivial: if $\bar{x} = \bar{y}$, then $average(\bar{x}, \bar{y}) = \bar{z} = \bar{x}$. The case where both averages are zero is one instance of this.
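
As a supplementary sketch (not part of the original argument), the gap between the two quantities can be written out explicitly. Using $\sum_{i=1}^{n}{x_i} = n\bar{x}$ and $\sum_{i=1}^{m}{y_i} = m\bar{y}$:

$ \frac{\bar{x} + \bar{y}}{2} - \bar{z} = \frac{\bar{x} + \bar{y}}{2} - \frac{n\bar{x} + m\bar{y}}{n + m} = \frac{(n + m)(\bar{x} + \bar{y}) - 2n\bar{x} - 2m\bar{y}}{2(n + m)} = \frac{(m - n)(\bar{x} - \bar{y})}{2(n + m)} $

For two groups, this difference is zero exactly when $n = m$ or $\bar{x} = \bar{y}$. As a check, with $n = 13$, $\bar{x} = 2$, $m = 1$, $\bar{y} = 4$, it gives $\frac{(-12)(-2)}{28} = \frac{6}{7} \approx 0.86$, which matches $3 - 30/14$.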

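Note that the sufficiency part of this reasoning extends to any number of groups: if all groups have the same size, or all group averages are equal, the average of averages equals the average of all values.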

  • 1
    This is a great answer and both the summary of the two cases and the proof are elegant. I am not sure that it answers _why_ the average of averages usually gives the wrong answer, it only shows how the two calculations are different. The application and reasoning is more interesting. What is it exactly that the calculation is measuring, how will it influence behaviour? If it is the average of all students, then why is that the choice? Is it to compare all students with each other, and is that fair? If it is the average between schools then why? What are the dangers of misinterpretation? 2018-09-21