I understand that this is the formula used to calculate the lower limit:

Lower Limit = Q1 - 1.5(Q3-Q1)

and for the Upper Limit

Upper Limit = Q3 + 1.5(Q3-Q1)

Now my question is, where did the 1.5 come from?

Getting the median index makes sense. All you need to do is multiply the number of values (N) by 1/2. Median meaning middle, duh.

To put it briefly, I can't seem to wrap my head around the quartile thing. I'm not very good at remembering things if I don't know their source.

Update: I guess what I mean to say is, why is the number 1.5? Why can't it be 1/4, 2/4, 3/4, or 4/4?

  • 0
    The ends of the boxes are at Q1 and Q3. In a boxplot that does not show outliers, the 'whiskers' go down to the minimum from Q1 and up to the maximum from Q3. The lower and upper 'limits' you have computed are not plotted. They are invisible 'fences'. Data values outside the fences are 'boxplot outliers' and are plotted individually, usually as dots. Then the whiskers extend to the most extreme values inside the fences. 2017-02-12
  • 0
    Maybe [this question](http://math.stackexchange.com/questions/966331/why-john-tukey-set-1-5-iqr-to-detect-outliers-instead-of-1-or-2) helps. 2017-02-12
  • 1
    In answer to your edit: Someone (probably John Tukey, credited with inventing the boxplot) decided that 1.5(IQR) above and below the box is a good criterion for determining 'outliers'. Some people use a 3(IQR) rule for 'extreme outliers.' There is some simulation evidence to argue that a better choice would have been 2.25(IQR). _So it was an arbitrary decision._ IQR = 'interquartile range'. 2017-02-12
  • 0
    @BruceET Thank you! That's exactly what I was looking for. 2017-02-12

1 Answer

Here is a sketch to illustrate my comment. The data are 100 observations taken at random from an exponential distribution with rate 1. The sample has $Q_1 = 0.231$ and $Q_3 = 1.672$, and thus an upper fence at 3.81, which is added to the sketch as a dotted line. The median is at 0.558, and the lower fence is outside the plotting window.

[Figure: boxplot of the 100 observations, with the upper fence added as a dotted line]

The observation that determines the upper end of the upper whisker is at 3.51. There are three 'outliers' at 6.62, 5.06, and 4.13. Outliers are especially common in data from a highly skewed distribution such as the exponential.
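
For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch of the fence rule described above. The helper names `fences_from_quartiles` and `boxplot_parts` are made up for illustration, the seed is arbitrary, and the quartiles use the 'inclusive' method of `statistics.quantiles`, which may not be the convention used for the plots in this answer. A fresh sample will therefore not reproduce the exact numbers quoted here.

```python
import random
import statistics


def fences_from_quartiles(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_parts(values, k=1.5):
    """Return fences, whisker ends, and outliers for a sample.

    Quartiles use the 'inclusive' method of statistics.quantiles; other
    conventions (for example Tukey's hinges) can give slightly different values.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = fences_from_quartiles(q1, q3, k)
    # Whiskers stop at the most extreme observations that are inside the fences.
    inside = [v for v in values if lower <= v <= upper]
    # Anything outside the fences would be plotted individually as an outlier.
    outliers = sorted(v for v in values if v < lower or v > upper)
    return {
        "fences": (lower, upper),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Fences computed from the rounded quartiles quoted in this answer.
print(fences_from_quartiles(0.231, 1.672))

# A fresh sample of 100 exponential (rate 1) values; the seed is arbitrary,
# so these results will differ from the numbers shown in the plots.
rng = random.Random(2017)
sample = [rng.expovariate(1.0) for _ in range(100)]
print(boxplot_parts(sample))
```

With the rounded quartiles, the upper fence comes out at about 3.83 rather than the 3.81 quoted above. I assume the small gap comes from rounding or from a different quartile convention in the plotting software, but that is a guess. The lower fence is about -1.93, which is why it falls outside the plotting window for nonnegative data.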

Below is a histogram of the data. The 'rug' beneath the histogram shows the locations of the 100 individual observations. The vertical green lines are at the values of the 'five-number summary'. The dotted red line is at the upper fence, the same value as in the boxplot.

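[Figure: histogram of the data with a rug of the 100 observations, green lines at the five-number summary, and a dotted red line at the upper fence]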
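
The 'five-number summary' marked by the green lines is the minimum, $Q_1$, the median, $Q_3$, and the maximum. A small sketch of how it could be computed, again assuming the 'inclusive' quartile method of Python's `statistics.quantiles` (the function name `five_number_summary` and the toy data are only for illustration):

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using 'inclusive' quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Toy data, not the sample from the plots.
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Note that the maximum (100 here) is part of the summary even when it lies beyond the upper fence. The fences only decide which points are drawn as outliers.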