4
$\begingroup$

I posted the following question over on StackOverflow, and I think I am having a hard time understanding exactly how a histogram works.

I have the following set of numbers between 0 and 9:

   [1] 1 0 1 4 0 0 7 3 5 3 8 9 1 3 3 1 2 0 7 5 8 6 2 0 2 3 6 9 9 7 8 9 4 9 2 1 3   [38] 1 1 4 9 1 4 4 2 6 3 7 7 4 7 5 1 9 0 2 2 3 9 1 1 1 5 0 6 3 4 8 1 0 3 9 6 2   [75] 6 4 7 1 4 1 5 4 8 9 2 9 9 8 9 6 3 6 4 6 2 9 1 2 0 5 9 2 7 7 2 8 8 5 0 6 0  [112] 0 2 9 0 4 7 7 1 5 7 9 4 6 1 5 7 6 5 0 4 8 7 6 1 8 7 3 7 3 1 0 3 4 5 4 0 5  [149] 4 0 3 5 1 0 8 3 7 0 9 6 6 9 5 4 6 9 3 5 4 2 4 8 7 7 5 8 8 8 2 6 9 3 1 0 4  [186] 1 5 9 0 6 2 1 3 0 6 0 0 8 3 2 0 0 6 0 0 4 7 2 7 1 9 9 3 9 8 4 6 6 5 3 8 1  [223] 8 7 1 3 7 6 3 6 3 6 3 2 3 2 2 7 9 2 3 2 7 5 5 8 8 2 0 1 4 0 6 3 7 1 1 1 4  [260] 7 0 2 9 2 0 5 6 0 8 9 6 2 0 0 7 2 0 4 2 0 9 1 6 9 3 0 0 2 0 6 8 4 0 7 2 1  [297] 9 5 2 4 8 5 2 9 7 9 2 9 7 4 9 3 2 7 3 6 3 6 8 8 3 7 0 9 2 7 9 0 5 4 5 8 4  [334] 3 3 1 7 8 9 7 6 2 1 7 0 5 6 5 2 9 5 4 6 2 2 2 9 0 7 7 2 2 6 3 4 2 0 5 9 6  [371] 2 1 9 0 6 0 4 8 4 3 1 5 4 2 9 5 7 3 1 5 4 5 3 7 3 8 6 2 4 6 1 1 4 0 0 5 8  [408] 6 7 4 2 8 0 2 5 4 8 3 0 6 4 8 6 4 1 8 1 5 4 9 4 3 2 0 5 0 7 9 2 9 8 9 6 5  [445] 2 4 4 6 4 8 4 1 7 5 8 9 5 9 3 2 5 8 2 2 7 2 8 4 1 9 3 6 0 2 2 9 1 2 7 2 1  [482] 3 4 9 1 8 0 2 2 3 4 1 3 7 4 1 4 1 5 9 6 9 0 5 7 6 8 2 0 7 3 5 8 2 8 2 4 8  [519] 5 8 9 7 1 2 4 5 5 1 8 1 4 4 6 5 8 9 2 3 0 5 1 4 0 5 1 2 9 2 4 1 6 8 0 4 9  [556] 0 0 5 9 2 3 5 9 4 4 3 9 2 3 5 6 5 2 7 2 4 2 4 7 2 5 3 7 6 1 0 7 5 4 5 1 6  [593] 9 7 1 6 3 3 1 2 2 0 5 0 6 8 3 6 7 7 3 8 1 7 9 3 9 2 8 3 7 4 1 2 3 6 5 0 1  [630] 8 6 9 2 1 6 0 2 8 0 8 8 9 1 2 2 1 4 8 1 4 4 5 1 8 7 7 9 7 0 6 9 4 5 6 2 5  [667] 7 4 7 2 3 0 8 4 8 0 0 9 7 7 9 8 2 1 6 5 5 1 1 9 7 7 8 6 4 7 5 3 1 6 4 5 7  [704] 4 1 8 3 5 1 7 1 1 8 6 4 3 8 3 1 2 8 9 0 9 1 2 3 3 0 3 0 2 0 3 3 8 3 5 7 0  [741] 5 9 0 5 9 1 5 1 1 2 6 5 5 4 5 1 6 0 2 2 8 0 7 1 0 8 5 6 3 2 9 4 3 6 0 3 4  [778] 1 5 9 3 0 5 0 6 2 7 6 6 6 9 6 7 8 2 0 6 0 8 9 5 3 6 7 4 3 9 7 2 0 4 7 2 2  [815] 8 2 7 0 4 0 5 2 8 7 7 9 1 4 0 1 1 2 3 6 2 0 6 6 1 9 4 5 2 7 7 8 9 5 8 3 8  [852] 5 6 2 0 9 7 1 8 2 6 9 8 4 9 4 1 3 8 4 0 7 7 3 7 6 6 8 8 2 7 0 4 3 7 7 0 8  [889] 4 7 4 0 6 9 8 6 0 1 6 4 5 2 7 3 6 2 2 9 2 7 4 8 7 2 9 5 3 4 8 0 4 4 6 5 6  [926] 1 2 2 8 4 5 7 8 0 6 8 9 1 7 7 2 6 3 9 9 1 0 4 2 5 4 4 9 2 6 7 2 8 3 3 2 7  [963] 0 4 7 0 7 7 8 1 7 3 7 8 0 1 0 2 9 7 6 2 2 6 9 0 6 8 8 9 6 3 5 0 2 2 5 9 6 [1000] 4 

Which produces the following histogram:

enter image description here

From what I have heard this is expected. I had assumed that a histogram would display the count of each occurrence of each number in my set. But this does not seem to be right, because 10 numbers occur in my set and there are only 9 columns in my histogram. Can someone explain what each of those 9 columns represents?

  • 0
    What does the number in [] bracket mean and where do they show on the output? Thanks.2012-08-29

2 Answers 2

7

Here are the counts of your data: (0, 107), (1, 96), (2, 124), (3, 90), (4, 102), (5, 89), (6, 97), (7, 105), (8, 93), (9, 97), where (0, 107) means that your data has 107 0's in it. It appears that whatever program you used to generate the histogram combined the counts of 0 and 1; notice that your leftmost bar has a height of just over 200 and that 107 + 96 = 203. The second bar corresponds to 2, the third bar to 3, etc.

3

A histogram is for continuous data. You are using it for discrete data and then wanting to see the discrete structure. Use a barplot instead.

Because a histogram is for continuous data, it divides the x-axis into equal sized intervals and counts up the number of observations in each interval, with intervals open on the left and closed on the right. So the intervals are $[0,1+\epsilon)$, $[1+\epsilon,2+2\epsilon)$, etc. for some small $\epsilon$.

  • 0
    If the discrete data is numerical it is perfectly valid to use a histogram although the use of $b$in intervals might not really be necessary. If the discrete data is categorical then a bar chart is used as it is the analog to the histogram for categorical data. Pie chart is another possibility, although often discouraged by statisticians.2012-08-29