
We are taught that the geometric mean (GM) should only be applied to a dataset of non-negative numbers, and some insist that it should only be applied to strictly positive numbers.

However, I have seen people discussing how to calculate the GM of a dataset that contains $0$s, and there seem to be at least two ways to deal with such a situation:

  1. If there exists a $0$ in the dataset, then the GM is also $0$

  2. Substitute the $0$s with some other number (e.g. $1$), then work out the GM as usual
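For concreteness, here is a minimal Python sketch of the two methods, assuming the data is a plain list of non-negative numbers. The function names and the sample data are made up for illustration, and the replacement value 1 is the one suggested in method 2.

```python
import math

def gm_zero_collapses(data):
    """Method 1: n-th root of the product, so any 0 makes the result 0."""
    return math.prod(data) ** (1 / len(data))

def gm_substitute(data, replacement=1.0):
    """Method 2: replace each exact 0 by `replacement`, then take the usual GM."""
    cleaned = [replacement if x == 0 else x for x in data]
    return math.prod(cleaned) ** (1 / len(cleaned))

data = [2, 8, 0, 4]
print(gm_zero_collapses(data))  # 0.0
print(gm_substitute(data))      # GM of [2, 8, 1, 4], i.e. the 4th root of 64
```

Method 2 still divides by the full count $n$, including the replaced zeros, so it is not the same as simply dropping them.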

Can someone please share their opinion on when we should apply method 1) instead of method 2), and why (i.e. the justification), and vice versa? In addition, intuitively, why would we want to do 1), since it essentially throws away all the other non-zero values in the dataset? Thanks.

  • @ballib, what does that give you that the ordinary arithmetic mean doesn't? (2011-12-14)

2 Answers


Well, the geometric mean of a set of points $\{p_1,\dots,p_n\}$ is given by $\left(\prod_{i=1}^np_i\right)^{1/n}$, so if any of the $p_i$ is zero, the whole product, and hence the whole expression, will be zero.
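A small worked example of the formula, using made-up numbers:

$$\{2, 8, 4\}:\quad (2\cdot 8\cdot 4)^{1/3} = 64^{1/3} = 4, \qquad \{2, 8, 0, 4\}:\quad (2\cdot 8\cdot 0\cdot 4)^{1/4} = 0^{1/4} = 0.$$

Adding a single zero wipes out the contribution of 2, 8 and 4 entirely, which is the "throwing away" of the non-zero values that the question asks about.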

The substitution method is used when you want to work with the logarithm of the geometric mean, which converts the product into a sum; since the logarithm of zero is undefined, the zeros have to be replaced (or excluded) before that sum can be computed.

See this article for more detail.
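A minimal Python sketch of the log-based route, assuming non-negative data and, as in the question, 1 as the replacement value (the function name is hypothetical). Since $\log 1 = 0$, a replaced zero adds nothing to the sum of logarithms but is still counted in the denominator.

```python
import math

def gm_via_logs(data, replacement=1.0):
    """Geometric mean as exp(mean of logs), with zeros swapped for `replacement` first."""
    # math.log(0) raises ValueError, so zeros must be handled before taking logs.
    logs = [math.log(replacement if x == 0 else x) for x in data]
    return math.exp(sum(logs) / len(logs))

try:
    math.log(0)
except ValueError as err:
    print("math.log(0) raised ValueError:", err)

print(gm_via_logs([2, 8, 0, 4]))  # matches substituting 1 in the product form, up to rounding
```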


Systematically, option 1 is the right one: the geometric mean is the $n$th root of the product of the data points, and if one of the points is zero, the entire product collapses to 0.

Intuitively, you can also look at this by considering that the geometric mean is the antilogarithm of the arithmetic mean of the logarithms of the data points, and the logarithm of 0 is (remember, we're speaking intuitively here) $-\infty$, so it drags the mean of the logarithms clean down to $-\infty$ too.
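The intuition can be made concrete in Python, assuming non-negative data. IEEE floating point has a genuine $-\infty$, so a hypothetical helper that maps $\log 0$ to `float("-inf")` instead of raising shows the mean of the logs being dragged to $-\infty$ and its antilogarithm landing on exactly 0.

```python
import math

def log_or_neg_inf(x):
    """Natural log that returns -inf for 0 instead of raising (hypothetical helper)."""
    if x < 0:
        raise ValueError("geometric mean needs non-negative data")
    return math.log(x) if x > 0 else float("-inf")

def gm_log_domain(data):
    logs = [log_or_neg_inf(x) for x in data]
    mean_log = sum(logs) / len(logs)  # a single -inf makes the whole mean -inf
    return math.exp(mean_log)         # math.exp(-inf) returns 0.0

print(gm_log_domain([2, 8, 0, 4]))  # 0.0
print(gm_log_domain([2, 8, 4]))     # about 4, up to rounding
```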

You can also argue by continuity: if you have a set of data points and begin to shrink one of them towards zero while keeping the others unchanged, the geometric mean of all of them will also drop towards zero. If you single out a true $0$ for different processing from a value that is merely very small, the output will be discontinuous as a function of the inputs: at the moment your shrinking value hits $0$ exactly, the mean would jump upwards, which is counterintuitive too.
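The continuity argument can be checked numerically with a short Python sketch (made-up data and hypothetical function names): one value `eps` shrinks towards 0 while the others stay fixed, and the plain geometric mean is compared with the substitute-1 variant.

```python
import math

def gm(data):
    """Plain geometric mean: n-th root of the product."""
    return math.prod(data) ** (1 / len(data))

def gm_substitute(data, replacement=1.0):
    """Same, after replacing exact zeros by `replacement`."""
    return gm([replacement if x == 0 else x for x in data])

others = [2.0, 8.0, 4.0]
for eps in [1.0, 1e-2, 1e-4, 1e-8, 0.0]:
    data = others + [eps]
    print(f"eps={eps:g}  plain={gm(data):.6f}  substituted={gm_substitute(data):.6f}")
```

The plain mean falls like $(64\,\mathrm{eps})^{1/4}$ and reaches 0 continuously, whereas the substituted version follows it down and then, at exactly `eps = 0`, jumps back to the value it had at `eps = 1`.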

Substituting another number is more of a desperate hack that tries to make the other values mean something in this case too. In many practical contexts, it might be more principled simply to leave the zero values out of the mean calculation.
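One way to implement that option, sketched in Python under the assumption of non-negative data (the function name and the choice to also return the number of dropped zeros are illustrative, not from the answer): compute the geometric mean over the strictly positive values only, so $n$ shrinks along with the data.

```python
import math

def gm_nonzero(data):
    """GM over strictly positive values; zeros are excluded from both product and count."""
    if any(x < 0 for x in data):
        raise ValueError("geometric mean needs non-negative data")
    positives = [x for x in data if x > 0]
    if not positives:
        raise ValueError("no positive values to average")
    dropped = len(data) - len(positives)
    return math.prod(positives) ** (1 / len(positives)), dropped

print(gm_nonzero([2, 8, 0, 4]))
```

For `[2, 8, 0, 4]` this returns the geometric mean of `[2, 8, 4]`, i.e. 4 up to rounding, together with the count 1. Returning the count keeps the exclusion visible, since a mean that silently ignores zeros can hide how many there were.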

  • "In many practical contexts, it might be more principled simply to leave the zero values out of the mean calculation." Can you back up this claim with some examples? Many thanks. (2011-12-14)