
I'm trying to define the different elements of a boxplot and how they depend on a sample of $N$ numbers. More specifically, I am curious about the role of the sample itself: so far I have tried to explain the plot in terms of the five-number summary, but I need something stronger. Please advise.

1 Answer


Your question is a little vague, but touches on important issues.

First, I think it is inappropriate to make boxplots when the sample size $n$ is less than about a dozen. The 'idea' of the five-number summary is to find numbers that, roughly speaking, cut the sorted sample into four 'chunks', each containing about a quarter of the observations. For very small $n$ you can get some strange-looking boxplots: such things as whiskers going into the box, or no whiskers at all.
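As a rough illustration of the five-number summary, here is a minimal Python sketch. The function name `five_number_summary` and both samples are made up for this example, and the quartiles use the `"inclusive"` convention of `statistics.quantiles`, which is only one of several rules in use.

```python
import statistics

def five_number_summary(sample):
    """Return (min, Q1, median, Q3, max) for a sample of at least two numbers.

    Hypothetical helper for illustration; quartile conventions differ
    between software packages.
    """
    data = sorted(sample)
    # "inclusive" treats the sample minimum and maximum as the
    # 0th and 100th percentiles and interpolates in between.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]

# A sample of moderate size: the three middle numbers cut the sorted
# data into four chunks of roughly equal size.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = five_number_summary(sample)
print(summary)  # (2, 4.0, 7.0, 10.0, 15)

# A very small sample: minimum, Q1 and median all coincide, so the
# lower whisker has zero length and the median bar sits on the box edge.
tiny = [5, 5, 5, 9]
tiny_summary = five_number_summary(tiny)
print(tiny_summary)  # (5, 5.0, 5.0, 6.0, 9)
```

The second sample shows one way a tiny $n$ can produce a degenerate-looking plot: there is simply not enough data for the four chunks to be distinct.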

All boxplots should be accompanied by a note that reports the sample size. There is no way from the boxplot alone to get an idea of sample size.

Boxplot outliers are often misunderstood. Even for data known to come from a normal population, it is common to see boxplot outliers, and more so as the sample size increases. The Empirical Rule says "almost all" of a normal sample is within three standard deviations of the mean, and that is sometimes a useful guide. But normal tails actually extend to $\pm \infty,$ so it is inevitable to get outliers, and boxplots are maybe a little too aggressive in indicating them.
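To put a number on "a little too aggressive", the sketch below assumes the common convention that a point is drawn as an outlier when it lies more than 1.5 times the interquartile range beyond a quartile. That rule is an assumption here; the rule actually used by any given software may differ. The calculation uses population quartiles of the standard normal distribution, so it is only a large-sample approximation; in small samples the fences themselves vary from sample to sample.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Population quartiles and interquartile range.
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q3 - q1

# Assumed outlier fences: 1.5 * IQR beyond each quartile.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Fraction of a normal population outside the fences,
# compared with the fraction beyond three standard deviations.
p_flag = z.cdf(lower) + (1 - z.cdf(upper))
p_3sd = 2 * z.cdf(-3)

print(f"upper fence at about {upper:.2f} SD above the mean")
print(f"flagged by the 1.5*IQR rule: {p_flag:.4%}")
print(f"beyond 3 SD of the mean:     {p_3sd:.4%}")

# Rough expected number of flagged points, treating the fences as fixed.
for n in (20, 100, 1000):
    print(n, round(n * p_flag, 2))
```

Under these assumptions the fences sit at roughly 2.7 standard deviations, so about 0.7% of a normal population is flagged, compared with about 0.27% beyond three standard deviations. The expected count of flagged points grows in proportion to $n$, which is consistent with outliers becoming more common in larger samples.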

Below are boxplots of 20 samples, each with $n=20$ observations from a normal distribution (with mean 100 and SD 15, somewhat like IQ scores). You can see that several of them show outliers (heavy dots); a couple of samples show two outliers. This is not a contrived example. It represents typical behavior of real data.

[Figure: boxplots of 20 samples, each of size $n=20$, from a normal distribution with mean 100 and SD 15]
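A simulation in the spirit of this example can be sketched in Python. This is not the code that produced the original figure: the seed, the function names, the 1.5 × IQR outlier rule and the `"inclusive"` quartile convention are all assumptions made for illustration, so the counts will not match the figure exactly.

```python
import random
import statistics

def count_boxplot_outliers(sample):
    """Count points more than 1.5 * IQR beyond the sample quartiles.

    Hypothetical helper; the fence rule and quartile method are assumptions.
    """
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for x in sample if x < lo or x > hi)

def simulate(num_samples=20, n=20, mean=100.0, sd=15.0, seed=2017):
    """Outlier counts for num_samples normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so reruns give the same counts
    return [
        count_boxplot_outliers([rng.gauss(mean, sd) for _ in range(n)])
        for _ in range(num_samples)
    ]

counts = simulate()
print("outliers per sample:", counts)
print("samples with at least one outlier:", sum(c > 0 for c in counts))
```

Changing `seed` gives a different batch of 20 samples; rerunning with several seeds gives a feel for how often outliers appear in perfectly ordinary normal data.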

Here are some basic facts about boxplots. Roughly speaking, half of the observations are within the box, and half outside the box. Also, the cross-bar within the box is at the median, so about half of the observations are above that cross-bar and about half are below it. The middle three numbers in the five-number summary are essentially the lower quartile, the median, and the upper quartile. So the cross-bar and the two ends of the box divide the sample into four chunks of about the same size. (I say 'essentially' because some software uses slightly different rules for these features of the boxplot, but usually these differences don't matter, especially not in large samples.)
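To see how the choice of quartile rule can matter in a small sample, the sketch below compares the two conventions offered by Python's `statistics.quantiles`. The data and the helper name `fraction_in_box` are invented for this example, and other packages may use rules that match neither convention.

```python
import statistics

def fraction_in_box(sample, method):
    """Quartiles under the given rule, and the share of points inside [Q1, Q3]."""
    q1, med, q3 = statistics.quantiles(sample, n=4, method=method)
    inside = sum(1 for x in sample if q1 <= x <= q3)
    return (q1, med, q3), inside / len(sample)

sample = [1, 3, 4, 6, 7, 8, 10, 13, 14, 20]

excl_quartiles, excl_share = fraction_in_box(sample, "exclusive")
incl_quartiles, incl_share = fraction_in_box(sample, "inclusive")

print("exclusive:", excl_quartiles, excl_share)  # (3.75, 7.5, 13.25) 0.6
print("inclusive:", incl_quartiles, incl_share)  # (4.5, 7.5, 12.25) 0.4
```

With only ten observations the two rules agree on the median but place the ends of the box differently, so the box holds 60% of the points under one rule and 40% under the other. Both are "about half", and the gap shrinks as the sample grows.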

Now, if you can say specifically what you were wondering about that I did not mention above, please leave a Comment, and I will check later to try to respond.

  • 0
    I was wondering about the STATISTICAL properties of a sample of numbers within the boxplot. Could you please elaborate on this? Thanks. 2017-02-17
  • 0
    Not clear to me what you want. Please give specific examples, and I'll try to answer. Or someone else may. 2017-02-17