1

I just want to make sure that I understand the meaning of an outlier.

Question: Can you have an outlier of categorical data?

I think that to have an outlier you must first have some sort of measurement. My reasoning is that outliers are typically identified by a numeric rule, for example flagging any data point that lies more than 3*IQR (interquartile range) beyond the quartiles.
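To make the IQR rule concrete, here is a minimal sketch. It assumes the common reading of the rule, in which a point is flagged when it lies more than k*IQR below the first quartile or above the third quartile. The function name `iqr_outliers`, the default `k=3.0`, and the sample data are illustrative choices, not part of the question.

```python
import statistics

def iqr_outliers(values, k=3.0):
    """Return the values lying more than k*IQR below Q1 or above Q3."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical numeric measurements with one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(data))
```

Note that the rule depends on subtraction and ordering, which is why it needs numeric (or at least ordered) data.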

However, there is no measurement with categorical data, as I understand.

2 Answers

2

Suppose you have 1000 people choose between apples and oranges. If 999 choose oranges and only one person chooses apples, I would say that person is an outlier.

We use measurement as a way to detect anomalies. With categorical data, you have to explain why choosing an apple is considered an anomaly (here, that data point does not behave like the other 99.9% of the population).
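One way to turn this idea into a rule is to flag categories whose relative frequency falls below a chosen threshold. The sketch below is only an illustration: the function name `rare_categories` and the 1% cutoff `max_share` are assumptions, and the right threshold depends on the application.

```python
from collections import Counter

def rare_categories(observations, max_share=0.01):
    """Return {category: share} for categories at or below max_share of all observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {
        category: count / total
        for category, count in counts.items()
        if count / total <= max_share
    }

# The example from this answer: 999 people choose oranges, one chooses apples.
choices = ["orange"] * 999 + ["apple"]
print(rare_categories(choices))
```

Here the "measurement" is the frequency of each category rather than any property of the category itself.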

There are also papers that talk about outliers in categorical data, for example http://www.cs.umn.edu/tech_reports_upload/tr2008/08-008.pdf.

  • 1
    Please could you provide an updated link? This one is dead. 2017-10-20
1

I don't know what the standard treatment of this problem is, but I have a remark about the question. For the concept of an outlier to have any meaning, you need to be able to define a distance between the values, which in this case may not be trivial: is an apple closer to an orange or to a pear?
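As a small illustration of why this is not trivial, consider the simplest distance one can put on unordered categories: 0 if two values are equal and 1 otherwise. This choice (named `discrete_distance` here purely for illustration) is an assumption, not a standard prescribed for this problem.

```python
def discrete_distance(a, b):
    """Distance between two category labels: 0 if identical, 1 otherwise."""
    return 0 if a == b else 1

fruits = ["apple", "orange", "pear"]
for other in fruits[1:]:
    # Every distinct pair of categories ends up equally far apart.
    print("apple", other, discrete_distance("apple", other))
```

Under this distance the apple is exactly as far from the orange as from the pear, so no category can stand out by distance alone. Anything more informative requires domain knowledge to define which categories are close.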

  • 0
    Genetically and morphologically, an apple is closer to a pear than either is to an orange. 2012-10-23