
Consider this PDF.

[figure: a probability density function, with $y$-axis values ranging from $0$ to about $0.4$]

I understand why $P(X = 0)$ is really close to $0$. That is the case for any $P(X = x)$.

Then why does the $y$-axis show values ranging from $0$ to $0.4$, if for any $x$, $P(X = x)$ is essentially $0$? Can someone explain this to me in an intuitive way?

  • Because $P(x_1 \le X \le x_2)$ need not be close to $0$. – 2017-01-12
  • It is very close to $0$. But why are all the values not close to $0$ on the graph, then? – 2017-01-12
  • The probability that $X$ lies in an interval is not always close to $0$. – 2017-01-12
  • $P(x_1 < X < x_2)$ is the area under the curve strictly between $x=x_1$ and $x=x_2$, while $P(x_1 \le X \le x_2)$ is much the same but in theory not "strictly"; this makes no difference to the area or the probability. This means $P(X=x) = P(x \le X \le x) = 0$, because the area under a single point is zero. – 2017-01-12
  • So, if I understand correctly, the individual values on the $y$-axis don't represent anything on their own. It's just that the curve is drawn this way so that the area under it between $x_1$ and $x_2$ represents the probability of $X$ being between $x_1$ and $x_2$? – 2017-01-12

3 Answers


The misunderstanding here is that PDFs do not, in fact, show you the probability of events but their probability density.

Intuitively, consider that, if there are $10$ numbers with a uniform distribution, the probability of drawing any one of them is $1/10$. If there are $100$ numbers, the probability is $1/100$. For $n$ numbers, it is $1/n$. However, there are infinitely many real numbers between any two real numbers, so the probability of observing one specific real number would be $0$ ($\lim_{n\rightarrow \infty} \frac{1}{n} = 0$). In fact, $P(X=0)$ is not merely close to zero, as you state; it is exactly zero.

However, just because the probability of drawing exactly $0$ is $0$, that doesn't mean we can't somehow describe the likelihood of a draw being less than $0$, greater than $0$, or close to $0$. That's what the PDF allows us to do. For any PDF $f$, the area under the curve must be $1$ (the probability of drawing some number from the function's support is always $1$). The integral $\int_{x_1}^{x_2} f(x)\,dx = y$ must always obey $0 \le y \le 1$, and $y$ gives the probability $P(x_1 < X < x_2)$.
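
This area interpretation can be checked numerically. Below is a sketch in Python (used here instead of a statistics package; `norm_pdf` and `norm_cdf` are hand-rolled helpers, not library calls): a midpoint Riemann sum of the standard normal density over $(-1, 1)$ reproduces $\Phi(1)-\Phi(-1) \approx 0.6827$.

```python
import math

def norm_pdf(x, mu=0.0, sigma=1.0):
    # density of Normal(mu, sigma) at x
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def norm_cdf(x, mu=0.0, sigma=1.0):
    # P(X <= x), via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# P(x1 < X < x2) as the area under the curve, via a midpoint Riemann sum
x1, x2, n = -1.0, 1.0, 100_000
dx = (x2 - x1) / n
area = sum(norm_pdf(x1 + (i + 0.5) * dx) * dx for i in range(n))

print(round(area, 4))                          # 0.6827
print(round(norm_cdf(x2) - norm_cdf(x1), 4))   # 0.6827
```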

Confusingly, this means that knowing the probability density at $0$ is $0.4$ doesn't actually tell you very much in isolation. The probability density may be greater than $1$ (e.g., a normal distribution with $\sigma=1/100$ has a probability density of almost $40$ at $0$), or it may be very small everywhere (one with $\sigma=100$ has its greatest density, at $0$, of $\sim 0.004$). Instead, the curve tells you something about the probability of different numbers relative to each other.
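
Those density values are easy to verify from the normal density formula alone (a Python sketch; `norm_pdf` is just $\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/2\sigma^2}$):

```python
import math

def norm_pdf(x, mu=0.0, sigma=1.0):
    # density of Normal(mu, sigma) at x
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(norm_pdf(0, sigma=0.01))  # ~39.894: density far above 1
print(norm_pdf(0, sigma=100))   # ~0.0039894: density tiny everywhere
print(norm_pdf(0, sigma=1))     # ~0.39894: the familiar ~0.4 at the peak
```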

Finally, there is a reason it is called a probability density function: consider that, with physical objects, the integral of an object's density over its volume yields its mass, and the integral of its density over a small part of its volume yields the mass of just that part. (The density of the part without information about its volume tells you relatively little, and the density at a single infinitesimal point tells you basically nothing about the object's mass.) It is similar with probabilities: the integral of the probability density over a range yields the probability of events within that range.

  • Great answer. Thank you! – 2017-01-12

The individual values don't represent probabilities, but they aren't meaningless. It is absolutely correct to think of them as a means of getting probabilities of falling within a set of values (by integration/area under the curve).

Additionally, the numbers give relative probabilities. That is, say for some random variable $X$ with density $f$ that $f(1)=0.4$ and $f(2)=0.1$. Then, on any particular realization, or sample, of $X$, it is $4$ times as likely to be $1$ as it is to be $2$. This type of reasoning is very useful in trying to determine a distribution when you have a bunch of samples (e.g., maximum likelihood estimation).
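
One precise sense of this relative-likelihood claim (debated in the comments below) is as a limit of interval-probability ratios: as the intervals shrink, $\frac{P(x_1-\epsilon < X < x_1+\epsilon)}{P(x_2-\epsilon < X < x_2+\epsilon)} \rightarrow \frac{f(x_1)}{f(x_2)}$. A Python sketch with a standard normal (any continuous density would do; here $f(1)/f(2) = e^{3/2} \approx 4.48$):

```python
import math

def norm_pdf(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

x1, x2 = 1.0, 2.0
for eps in (0.5, 0.05, 0.005):
    p1 = norm_cdf(x1 + eps) - norm_cdf(x1 - eps)
    p2 = norm_cdf(x2 + eps) - norm_cdf(x2 - eps)
    print(eps, round(p1 / p2, 4))  # ratios approach f(x1)/f(x2)

print(round(norm_pdf(x1) / norm_pdf(x2), 4))  # 4.4817
```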

  • I don't think your comment about the relative likelihoods is correct. The probability of observing either a $1$ or a $2$ is exactly $0$; neither is more likely than the other. Probably what you mean is that values close to $1$ are $4$ times as likely as values close to $2$? Formally, $\lim_{\epsilon \rightarrow 0} \frac{P(x_1 - \epsilon < X < x_1 + \epsilon)}{P(x_2 - \epsilon < X < x_2 + \epsilon)} = \frac{f(x_1)}{f(x_2)}$? – 2017-01-13
  • What you've written is incorrect. Firstly, for an absolutely continuous R.V. (i.e., one with a Radon–Nikodym derivative, say $f(t)$) we have $\mathbf{P}(X \in A) = \int_A f(t)\,dt$, therefore $0 = \mathbf{P}(X=x_1) = \int_{x_1}^{x_1}f(t)\,dt = \lim_{\epsilon \rightarrow 0} \int_{x_1 - \epsilon}^{x_1 + \epsilon}f(t)\,dt = \lim_{\epsilon \rightarrow 0} \mathbf{P}(x_1 - \epsilon < X < x_1 + \epsilon)$. So, as you've written your limit, you've simply given the ratio of probabilities and shown that we have an indeterminate form... (continued in next comment) – 2017-01-13
  • Now, instead we can define the ratio of probabilities $\frac{\mathbf{P}(X = x_1)}{\mathbf{P}(X = x_2)} = \lim_{\epsilon \rightarrow 0}\frac{F(x_1) - F(x_1-\epsilon)}{F(x_2) - F(x_2-\epsilon)} = \lim_{\epsilon\rightarrow 0}\frac{f(x_1-\epsilon)}{f(x_2-\epsilon)} = \frac{f(x_1)}{f(x_2)}$, where the second-to-last equality comes from L'Hôpital's rule (differentiating, of course, w.r.t. $\epsilon$). – 2017-01-13
  • $\frac{P(X=x_1)}{P(X=x_2)} = \frac{\lim_{\epsilon\rightarrow 0}(F(x_1) - F(x_1 - \epsilon))}{\lim_{\epsilon\rightarrow 0}(F(x_2) - F(x_2 - \epsilon))}$; I don't see how this must be equal to $\lim_{\epsilon\rightarrow 0}\frac{F(x_1) - F(x_1 - \epsilon)}{F(x_2) - F(x_2 - \epsilon)}$. This is not generally true ($\frac{\lim_{x\rightarrow 0} x^2}{\lim_{x\rightarrow 0} x} \ne \lim_{x\rightarrow 0}\frac{x^2}{x}$). If you are defining the "ratio of probabilities" to be this for convenience, then I don't think it's correct to state that a sample is "4 times as likely to be 1 as it is to be 2"? – 2017-01-13

Discrete Distributions. For a discrete distribution, such as the binomial, a PDF (probability distribution function) or PMF (point mass function) is essentially a list of probabilities, one for each possible value of the random variable. So, if $X \sim Binom(n=3, p=.4)$, we can give a table such as the one below, computed with R statistical software. In particular, $P(X = 2) = 0.288.$

 x = 0:3;  pdf = dbinom(x, 3, .4)
 cbind(x, pdf)
 ##  x   pdf
 ##  0 0.216
 ##  1 0.432
 ##  2 0.288
 ##  3 0.064
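
The same table can be reproduced directly from the binomial formula $P(X = k) = \binom{3}{k}(0.4)^k(0.6)^{3-k}$ (a Python sketch using only the standard library, in place of R's `dbinom`):

```python
import math

def binom_pmf(k, n=3, p=0.4):
    # P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

for k in range(4):
    print(k, round(binom_pmf(k), 3))
# prints 0.216, 0.432, 0.288, 0.064 -- matching the table above
```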

For a discrete distribution that takes countably many values, such as a Poisson distribution $Y \sim Pois(2),$ one can make only a partial list, including the most important values. In the table below, probabilities not shown are $0$ to three places. One might also give a formula such as $P(Y = k) = e^{-2}\frac{2^k}{k!},$ for $k = 0, 1, 2, \dots .$

y = 0:10;  pdf = round(dpois(y, 2),3)
cbind(y, pdf)
 ##  y   pdf
 ##  0 0.135
 ##  1 0.271
 ##  2 0.271
 ##  3 0.180
 ##  4 0.090
 ##  5 0.036
 ##  6 0.012
 ##  7 0.003
 ##  8 0.001
 ##  9 0.000
 ## 10 0.000
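
The Poisson table likewise follows directly from the formula $e^{-2}\frac{2^k}{k!}$ (again a Python sketch in place of R's `dpois`):

```python
import math

def pois_pmf(k, lam=2.0):
    # P(Y = k) = e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

for k in range(11):
    print(k, round(pois_pmf(k), 3))
# k = 0 and k = 1,2 give 0.135 and 0.271, matching the table above
```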

One can also make 'bar charts' of such distributions to show which discrete values have how much probability.

[figure: bar chart of a discrete distribution, bar heights showing each value's probability]

Continuous Distributions. Continuous distributions have probabilities defined for intervals, not discrete points. If $f_W(x) \ge 0$ is the PDF (probability density function) of a continuous random variable $W,$ then we define $$P(a < W \le b) = \int_a^b f_W(x)\,dx.$$ So that the total probability is 1, it is understood that $\int_{-\infty}^\infty f_W(x)\,dx = 1.$

One way to view the density function $f(x)$ is to consider it as a 'smoothed histogram' of observations $W$ that might result when a random sample is taken from a population.

Specifically, suppose that $W \sim Norm(\mu=100, \sigma=15).$ Perhaps these are scores $W$ on a standardized college admissions test. Here is a histogram (tan bars) of a sample of size $n = 1000$ from this distribution. Superimposed on the histogram is the PDF (blue curve) of $Norm(100,15).$

[figure: density histogram of 1000 simulated scores with the $Norm(100,15)$ density curve superimposed]

Even with a sample as large as a thousand, the fit of the histogram to the density curve is not perfect, but it is good enough for an illustration. Notice that this is a special kind of histogram called a density histogram: it is scaled so that the total area of all the bars is $1$, matching the total area under the PDF.

Area under the curve in $(100,110].$ According to the PDF curve, the probability $P(100 < W \le 110)$ of a score between 100 and 110 is the area under the curve between the two vertical red lines. It is 0.2475.

diff(pnorm(c(100,110),100,15))
## 0.2475075

Nearly matching area of the histogram bar for $(100,110].$ Out of the 1000 test scores, the number in this interval was 260. The height of the corresponding histogram bar is 0.026 on the density scale; this bar has an area (width times height) of $10(0.026) = 0.260.$ This roughly matches the corresponding area 0.2475 under the density curve.

The match is not exact because a sample of size 1000 does not perfectly represent the population. The histogram of a larger sample would tend to have better matches. (Even in the current histogram, I could have gotten a better match using the interval $(90,100].$)
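
The whole bar-versus-area comparison can be replayed in simulation. The sketch below is Python rather than R; the seed is an arbitrary choice, and a fresh sample would give a slightly different bar height, just as a fresh sample of test scores would.

```python
import math
import random

random.seed(1)  # arbitrary seed, for reproducibility
mu, sigma, n = 100, 15, 1000
scores = [random.gauss(mu, sigma) for _ in range(n)]

# sample proportion in (100, 110]: this plays the role of the
# histogram bar's area (bar width 10 times density-scale height)
prop = sum(100 < w <= 110 for w in scores) / n

def norm_cdf(x, mu=0.0, sigma=1.0):
    # P(W <= x) for Normal(mu, sigma), via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# exact probability P(100 < W <= 110), as from pnorm in R
exact = norm_cdf(110, mu, sigma) - norm_cdf(100, mu, sigma)
print(round(prop, 3), round(exact, 4))  # the exact part is 0.2475
```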