
I am currently plotting density plots in R to view the distribution of my data.

An example density plot is here

I understand that N is the number of values in the data set and that the bandwidth controls the amount of smoothing. But what is the density on the y-axis, and how is it calculated?

1 Answer


The quantity being plotted is an estimate of the probability density function of the population from which your data are drawn. If your data points are $(x_1, x_2, \ldots, x_n)$, then the estimate at a point $x$ is

$ y = {1 \over n} \sum_{i=1}^n {1 \over \sigma} f\left({x-x_i \over \sigma}\right) $

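where $f$, called the kernel, is some nonnegative function with $\int_{-\infty}^\infty f(x) \: dx = 1$, and $\sigma$ is a scale constant tied to what R calls the bandwidth (in `density()`, the kernels are scaled so that the bandwidth is the standard deviation of the kernel). Essentially, this puts a bump of width approximately $\sigma$ at each data point and then averages those bumps. Because $y$ is a density rather than a probability, the area under the curve is 1, and the height of the curve can exceed 1 when the data are tightly concentrated.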
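To make the formula concrete, here is a minimal sketch in R that evaluates the sum by hand and compares it with the output of `density()`. It assumes a Gaussian kernel (so $f$ is `dnorm`) and uses simulated data; the names `x_data` and `kde_by_hand` are made up for illustration and are not from the question.

```r
# Hand-computed Gaussian kernel density estimate, compared with density().
set.seed(1)              # simulated data, purely illustrative
x_data <- rnorm(50)

fit   <- density(x_data, kernel = "gaussian")
sigma <- fit$bw          # in density(), bw is the standard deviation of the kernel

# y(x) = (1/n) * sum_i (1/sigma) * f((x - x_i) / sigma), with f = dnorm
# (kde_by_hand is a hypothetical helper name)
kde_by_hand <- function(x, data, sigma) {
  sapply(x, function(x0) mean(dnorm((x0 - data) / sigma) / sigma))
}

# Evaluate the formula on the same grid that density() used
y_hand <- kde_by_hand(fit$x, x_data, sigma)

# density() uses a binned approximation internally, so the two curves
# should agree closely but not exactly
max_diff <- max(abs(y_hand - fit$y))
print(max_diff)

# Riemann-sum check that the area under the curve is roughly 1
area <- sum(fit$y) * diff(fit$x[1:2])
print(area)
```

The area comes out slightly under 1 because `density()` only evaluates the curve a few bandwidths beyond the range of the data, so a sliver of the tails is left out of the sum.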