
When would you use a Box Plot, a Histogram, or a QQPlot to graphically summarize a SAMPLE of numbers?

Interpretation:

A sample of numbers: a dataset that contains only numerical values, not categorical values.

OR:

It asks whether a change in the number of observations in the dataset would have some effect on the box plot, QQ plot, and histogram.

2 Answers

0

The second is true. Since you have a sample, and the box plot, histogram, and QQ plot can only display the data in your sample (the population is unknown), they will change for each sample.

Whilst it is inappropriate to draw a QQ plot if you have discrete values (because the QQ plot is testing for normality, and discrete values are obviously not normal because the normal distribution is continuous), you can use histograms and box plots to display discrete data.

If I'm understanding the question correctly...

  • 0
    QQ plots are not used only for testing for normality. For example, if you plot the observed order statistics against the expected order statistics of a sample from an exponential distribution, you've got a QQ plot that you'd use for testing whether the population is exponentially distributed. (2017-02-12)
  • 0
    There is nothing wrong with using any of these three kinds of graphical displays for discrete data, for example observations from Binom(n=30, p=.5). (2017-02-12)
0

It's unclear whether the question means (a) when you would use any of the three methods, or (b) how to decide which of the three is best.

For (a), you'd want numerical data rather than categorical data. Some data (binomial, Poisson) are discrete in the sense that they can have only integer values. However, in real life, all data wind up being rounded to some number of decimal places, and so are actually discrete. But when one can imagine that the population values are continuous, it is often convenient to treat rounded data as if they are continuous--insofar as feasible.

For (b), there are some guidelines about sample size that make sense.

It isn't a good idea to use boxplots when the sample size is very small (say, below a dozen) because boxplots depend on quartiles. How can you meaningfully divide 9 sorted values into four essentially equal 'chunks'?
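As a rough illustration of the quartile problem, the sketch below computes quartiles for nine values under two conventions offered by Python's standard `statistics.quantiles`. The values 1 through 9 are made up for illustration and do not come from the answer.

```python
from statistics import quantiles

# Nine sorted values (illustrative only): too few to split into four equal groups.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Two quartile conventions supported by the standard library.
q_exclusive = quantiles(values, n=4, method="exclusive")
q_inclusive = quantiles(values, n=4, method="inclusive")

print("exclusive:", q_exclusive)  # [2.5, 5.0, 7.5]
print("inclusive:", q_inclusive)  # [3.0, 5.0, 7.0]
```

The two conventions disagree about where the box edges fall, so a boxplot of so few values depends noticeably on which rule the software happens to use.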

Histograms often work best for large samples. With a large sample one can use more bins (intervals, bars) and still get a smooth appearance that suggests the shape of the population PDF.

Q-Q plots are often used as a quick way to judge the nature of the population distribution, perhaps most often to judge whether the population is normal, in which case a normal Q-Q plot will tend to be a straight line. Q-Q plots also give an instant impression of how many observations there are, one for each dot. (Neither boxplots nor histograms give an indication of sample size without extra embellishments or annotations.)
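For readers who want to see what a normal Q-Q plot actually plots, here is a minimal sketch using only the Python standard library. The function name `normal_qq_points`, the plotting positions $(i + 0.5)/n$, and the seven data values are all assumptions made for illustration; statistical software may use slightly different plotting positions.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    observed = sorted(data)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n for i = 0..n-1: one common convention,
    # assumed here; other software may use a different formula.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

# Illustrative data, not taken from the answer.
sample = [4.1, 5.3, 2.8, 6.0, 4.9, 5.5, 3.7]
for t, x in normal_qq_points(sample):
    print(f"{t:6.3f}  {x:4.1f}")
```

Each pair becomes one dot, which is why the number of observations is visible at a glance. If the dots fall near a straight line, the sample is consistent with a normal population.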

Below, I will post each kind of plot for a small sample, and then for a large sample, in hopes of illustrating the points made above.

Small sample ($n = 15$): Exponential population. Too few observations for a histogram that closely matches the population density (blue curve). A normal Q-Q plot happens to show points in a distinctly non-linear configuration, so it seems unlikely the data are normal. (I say 'happens to show' because not all Q-Q plots of 15 exponential observations would be so clearly non-normal.)

[Figure: boxplot, histogram, and normal Q-Q plot for the small exponential sample]

Large sample ($n=1500$): Normal population. The boxplot is symmetrical and shows a few outliers; both features are typical of a normal sample. The histogram gives a very good idea of the shape of the population PDF. The normal Q-Q plot is very nearly linear (except for a few straggling points at either extreme, which are to be expected).

[Figure: boxplot, histogram, and normal Q-Q plot for the large normal sample]
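The small-sample and large-sample comparisons can be approximated with the following Python sketch. It is not the code used for the original figures. The seed, the exponential rate of 1, the standard normal parameters, and the helper names `make_samples` and `three_panel` are assumptions, and the plotting step requires matplotlib to be installed.

```python
import random
from statistics import NormalDist

def make_samples(seed=2017):
    """Draw a small exponential sample and a large normal sample."""
    rng = random.Random(seed)
    small = [rng.expovariate(1.0) for _ in range(15)]    # rate 1 is an assumption
    large = [rng.gauss(0.0, 1.0) for _ in range(1500)]   # standard normal is an assumption
    return small, large

def three_panel(data, title, path=None):
    """Draw a boxplot, a histogram, and a normal Q-Q plot side by side."""
    # Imported here so the sampling code above runs without matplotlib.
    import matplotlib.pyplot as plt

    n = len(data)
    # Same plotting-position convention (i + 0.5) / n as an assumption.
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].boxplot(data)
    axes[0].set_title("Boxplot")
    axes[1].hist(data, bins="auto", density=True)
    axes[1].set_title("Histogram")
    axes[2].scatter(theoretical, sorted(data), s=10)
    axes[2].set_title("Normal Q-Q plot")
    fig.suptitle(title)
    if path is None:
        plt.show()
    else:
        fig.savefig(path)

small, large = make_samples()
```

Calling `three_panel(small, "Exponential, n = 15")` and `three_panel(large, "Normal, n = 1500")` draws the two sets of panels. Changing the seed shows how much the small-sample plots vary from one sample to the next.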

Finally, an example to show how these plots summarize discrete data.

Discrete Data ($n=700$): Observations from Binom(25, .5). The histogram is plotted with 26 bins, matching the possible binomial values. The blue dots near the tops of histogram bars show exact binomial probabilities. Because this binomial distribution is reasonably well approximated by a normal distribution, the normal Q-Q plot is mainly linear (except for horizontal clusters showing tied values).

[Figure: boxplot, histogram with exact binomial probabilities, and normal Q-Q plot for the binomial sample]
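The comparison between histogram bar heights and exact binomial probabilities can be checked numerically. The sketch below simulates 700 observations from Binom(25, .5) with the Python standard library and prints observed relative frequencies next to the exact probabilities. The seed and the helper name `binom_pmf` are assumptions, and the simulated values will not match the original figure.

```python
import random
from collections import Counter
from math import comb

N_TRIALS, P, N_OBS = 25, 0.5, 700

def binom_pmf(k, n=N_TRIALS, p=P):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

rng = random.Random(1)  # arbitrary seed, chosen only for reproducibility
# Each observation is the number of successes in 25 Bernoulli(.5) trials.
draws = [sum(rng.random() < P for _ in range(N_TRIALS)) for _ in range(N_OBS)]
counts = Counter(draws)

# One row per possible value 0..25, matching the 26 histogram bins.
for k in range(N_TRIALS + 1):
    print(f"{k:2d}  observed {counts[k] / N_OBS:.3f}  exact {binom_pmf(k):.3f}")
```

Many observations share the same integer value, and those ties are what produce the horizontal clusters in the normal Q-Q plot.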