2

I'm trying to write a rough piece of statistical software, and I'm confused about the following:

  1. What is the mode if all numbers are unique?
  2. What is the mode if 2 numbers have the same (highest) frequency?

What I feel:

  1. Should output nothing.
  2. Should output the average of the 2.

Just as reference:

    a = [1 2 3 4 5]
    mode(a) % Prints 1
    a = [2 2 3 3]
    mode(a) % Prints 2

in Octave

But I'm not sure how it's supposed to be done.

  • 0
    For 2.: then you have two modes (i.e. your data is *bimodal*). 2012-07-07
  • 0
    @J.M., in that case, if all elements are unique, should I have $n$ modes? 2012-07-07

1 Answer

3

In general, a data set can have multiple modes. If you must return a single value, pick any one of them. (Octave seems to pick the smallest one.) But don't pick something that isn't a mode. So don't use your second idea: when asked for the mode of $[1,1,2,5,5]$, the average of the two modes $1$ and $5$, namely $3$, is not a sensible answer, since $3$ does not even occur in the data.
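As a rough illustration of the advice above, here is a minimal Python sketch (the question does not name an implementation language, so Python is an assumption). The function names `all_modes` and `smallest_mode` are hypothetical. The first returns every value that ties for the highest frequency; the second picks the smallest of them, which matches the two Octave outputs quoted in the question.

```python
from collections import Counter


def all_modes(data):
    """Return every value that attains the highest frequency, in first-seen order."""
    counts = Counter(data)  # Counter keeps insertion order, like dict
    if not counts:
        return []  # assumption of this sketch: empty input has no mode
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


def smallest_mode(data):
    """Return the smallest mode; raises ValueError on empty input."""
    return min(all_modes(data))
```

With this sketch, `all_modes([1, 1, 2, 5, 5])` gives `[1, 5]`, and a list of all-unique values returns every element, since each one ties at frequency 1. Either function only ever returns values that actually occur in the data.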

  • 0
    So I assume there isn't an international standard for which mode to choose? I think I'd rather output all the possibilities and warn the user about what I'm doing. If there are $n$ modes, isn't it a good idea to use the _median_ of these $n$ modes as the true mode? 2012-07-07
  • 0
    I don't know if having multiple modes in a data set is such an exceptional case that you should emit a warning every time it happens. Also, the mode still makes sense for nominal data, e.g. *apple* and *orange* are both modes of the data set [*apple, apple, banana, orange, orange*], but what is the median of *apple* and *orange*? I would be wary of putting in something that doesn't generalize as far as the original operation does. 2012-07-07
  • 0
    Makes sense. So, I think the choice is between random and first/last occurrence. 2012-07-07
  • 2
    If you document your software with something to the effect of "If there are multiple modes, an arbitrary one is returned", you don't even have to worry about what choice you make. :) 2012-07-07
  • 1
    Generally speaking, the mode is not defined to be unique. If two or more values tie as the most frequently occurring, they are all modes. It is unconventional to pick one of the modes and call it "the mode". When the distribution is absolutely continuous and has a density, the mode is defined as the highest peak, but the other peaks are local modes, and a density with two peaks, even if they are not of equal height, is referred to as bimodal. 2012-07-07
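
The nominal-data objection raised in the comments above can be checked with a short Python sketch (Python is an assumption, as the thread names no implementation language). It uses `statistics.multimode` and `statistics.median` from the standard library (Python 3.8 or later); the helper `median_of_modes_or_none` is a hypothetical name introduced only for illustration.

```python
from statistics import median, multimode

fruits = ["apple", "apple", "banana", "orange", "orange"]
modes = multimode(fruits)  # every most-frequent value, in first-seen order


def median_of_modes_or_none(values):
    """Try the 'median of the modes' idea; return None when it is undefined."""
    try:
        return median(multimode(values))
    except TypeError:
        # Averaging the two middle values needs arithmetic,
        # which nominal data such as strings does not support.
        return None
```

Here `modes` is `["apple", "orange"]`, and `median_of_modes_or_none(fruits)` returns `None`, whereas for `[1, 1, 2, 5, 5]` it returns `3.0`, a value that is not in the data. Returning the list of modes works for both kinds of input.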