
What's the difference between a probability density function and a probability distribution function?

  • The density (when it exists) is the derivative of the distribution function. (2012-07-27)
  • You mean "Difference between probability density function and cumulative distribution function"? (2014-02-05)

3 Answers


These terms are used somewhat loosely, and usage varies. Below is a common usage.

In the continuous case (density):

(continuous) probability distribution function = probability density function = density function
(continuous) probability distribution = density

In the discrete case (mass/distribution):

(discrete) probability distribution function = probability mass function
(discrete) probability distribution = distribution

Oddly enough, you may never see a probability mass function called a mass function or a distribution function, nor a discrete probability distribution called a mass. There is presumably some historical reason for this. As the German saying goes, it has always been this way and always will be.


Distribution Function

  1. The term probability distribution function (or probability function) has no single agreed definition. It may refer to:
    • Probability density function (PDF)
    • Cumulative distribution function (CDF)
    • or probability mass function (PMF) (according to Wikipedia)
  2. What is unambiguous is:
    • Discrete case: Probability Mass Function (PMF)
    • Continuous case: Probability Density Function (PDF)
    • Both cases: Cumulative distribution function (CDF)
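  3. The probability at a specific value, $P(X = x)$, can be read directly from:
    • the PMF in the discrete case.
    • In the continuous case there is no such value to read: $P(X = x) = 0$ for every $x$. The PDF value $f(x)$ is a density (probability per unit of $x$), so only $P(x < X \le x + \Delta x) \approx f(x)\,\Delta x$ for small $\Delta x$.
  4. The probability of values up to $x$, $P(X \le x)$, or of values in a range from $a$ to $b$, $P(a < X \le b) = F(b) - F(a)$, can be read directly from:
    • the CDF $F$, in both the discrete and continuous cases.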
  5. The term distribution function usually refers to the CDF, also called the cumulative frequency function.
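The distinctions above can be checked on a fair six-sided die. This is a minimal sketch, assuming a fair die; the names `pmf` and `cdf` are illustrative. It also shows that for a discrete variable $P(X < x)$ and $P(X \le x)$ can differ, which is why the CDF is defined with $\le$.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so the PMF gives 1/6 to each face.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """Discrete CDF: F(x) = P(X <= x), the sum of the PMF over all faces <= x."""
    return sum((p for face, p in pmf.items() if face <= x), Fraction(0))

p_equal_3 = pmf[3]            # P(X = 3), read directly from the PMF
p_at_most_4 = cdf(4)          # P(X <= 4), read directly from the CDF
p_less_than_4 = cdf(3)        # P(X < 4) = F(3) for an integer-valued X
p_between = cdf(5) - cdf(2)   # P(2 < X <= 5) = F(5) - F(2)

print(p_equal_3, p_at_most_4, p_less_than_4, p_between)  # 1/6 2/3 1/2 1/2
```

Using `Fraction` keeps the probabilities exact, so the results can be compared with hand calculations such as $F(4) = 4/6 = 2/3$.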

In Terms of Data Acquisition and Plot Generation

  1. Collected data are discrete when:
    • The measured quantity is naturally discrete, such as the number shown on a rolled die or a count of people.
    • The measurement is digitized machine data, which has no intermediate values between quantization levels because of the sampling process.
    • In the latter case, the higher the resolution, the closer the measurement comes to an analog (continuous) signal or variable.
  2. How to generate a PMF from discrete data:
    • Plot a histogram of the data over all the $x$ values; the $y$-axis is the frequency (count) at each $x$.
    • Scale the $y$-axis by dividing by the total number of data points collected (the data size) $\longrightarrow$ the result is the PMF.
  3. How to generate a PDF from discrete or continuous data:
    • Choose a continuous equation that models the collected data, for example the normal distribution.
    • Calculate the parameters the equation requires from the collected data. For the normal distribution these are the mean and the standard deviation.
    • Using those parameters, plot the equation over continuous $x$ values $\longrightarrow$ that is the PDF.
  4. How to generate a CDF:
    • In the discrete case, the CDF at each $x$ is the sum of the PMF values at all discrete points less than or equal to $x$. Repeating this for every $x$ gives a non-decreasing step function that reaches $1$ at the largest $x$ $\longrightarrow$ this is the discrete CDF.
    • In the continuous case, integrate the PDF from $-\infty$ up to $x$; the result is the continuous CDF.
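The generation steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the `rolls` and `heights` lists are hypothetical, made-up data; the continuous data are assumed to follow a normal model; and `normal_pdf`, `normal_cdf_numeric` and `normal_cdf_exact` are illustrative names, not functions from any library. It uses the sample standard deviation (dividing by $n - 1$), which is one common choice.

```python
import math
from collections import Counter

# Hypothetical data, made up purely for illustration.
rolls = [1, 3, 3, 6, 2, 5, 3, 4, 6, 1, 2, 3]          # discrete: die outcomes
heights = [162.0, 171.5, 168.2, 175.0, 159.8, 166.4]  # continuous: heights in cm

# Step 2: PMF = histogram counts divided by the data size.
counts = Counter(rolls)
n = len(rolls)
pmf = {x: counts[x] / n for x in sorted(counts)}

# Step 3: PDF, assuming a normal model whose parameters are estimated
# from the data (sample mean and sample standard deviation).
mu = sum(heights) / len(heights)
sigma = math.sqrt(sum((h - mu) ** 2 for h in heights) / (len(heights) - 1))

def normal_pdf(x, mu=mu, sigma=sigma):
    """Normal density with the estimated parameters."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Step 4a: discrete CDF = running sum of the PMF up to and including each x.
cdf_discrete = {}
running = 0.0
for x, p in pmf.items():
    running += p
    cdf_discrete[x] = running

# Step 4b: continuous CDF = integral of the PDF from -infinity to x,
# approximated with the trapezoidal rule starting 8 sigma below the mean
# (the mass below that point is negligible).
def normal_cdf_numeric(x, steps=10_000):
    lo = mu - 8 * sigma
    if x <= lo:
        return 0.0
    h = (x - lo) / steps
    total = 0.5 * (normal_pdf(lo) + normal_pdf(x))
    total += sum(normal_pdf(lo + i * h) for i in range(1, steps))
    return total * h

# Closed form for comparison: F(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2.
def normal_cdf_exact(x):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(pmf)
print(cdf_discrete)
print(normal_cdf_numeric(170.0), normal_cdf_exact(170.0))
```

The numerical integral and the closed form should agree closely, which illustrates that integrating the PDF is exactly what produces the continuous CDF.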

Why PMF, PDF and CDF?

  1. PMF is preferred when
    • The probability at each $x$ value is the object of study. This makes sense for discrete data, for example when we are interested in the probability of getting a certain number from a die roll.
  2. PDF is preferred when
    • We wish to model the collected data with a continuous function, using a few parameters such as the mean to estimate the population distribution.
  3. CDF is preferred when
    • The cumulative probability over a range is the point of interest.
    • Especially for continuous data, the CDF makes more sense than the PDF: e.g., the probability that a student's height is less than $170$ cm (CDF) is much more informative than the density at exactly $170$ cm (PDF), since the probability of any exact height is $0$.
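The height example can be made concrete with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch assuming a hypothetical model in which heights are normal with mean $165$ cm and standard deviation $8$ cm; these parameters are made up for illustration. It shows that the CDF answers the range question directly, while the PDF value only yields a probability once it is multiplied by a width, and that a density can even exceed $1$.

```python
from statistics import NormalDist

# Hypothetical model: student heights ~ Normal(mean 165 cm, sd 8 cm).
heights = NormalDist(mu=165, sigma=8)

p_below_170 = heights.cdf(170)                        # P(H < 170) = F(170)
p_near_170 = heights.cdf(170.5) - heights.cdf(169.5)  # P(169.5 < H <= 170.5)
density_170 = heights.pdf(170)                        # f(170), a density in 1/cm

# For a 1 cm window, the probability is approximately density * width.
approx_near_170 = density_170 * 1.0

# A density is not a probability: with a very small spread it exceeds 1.
narrow = NormalDist(mu=165, sigma=0.1)

print(p_below_170, p_near_170, approx_near_170, narrow.pdf(165))
```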

The relation between the probability density function $f$ (called the probability mass function in the discrete case) and the cumulative distribution function $F$ is $$ F(k) = \sum_{i \le k} f(i) $$ if $f$ is discrete and $$ F(x) = \int_{y \le x} f(y)\,dy $$ if $f$ is continuous.
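As a worked example of both formulas, take a fair die, where $f(i) = 1/6$ for $i \in \{1, \dots, 6\}$:
$$ F(4) = \sum_{i \le 4} f(i) = 4 \cdot \tfrac{1}{6} = \tfrac{2}{3}. $$
For the continuous case, assume (purely as an illustration) an exponential density with rate $\lambda > 0$, $f(y) = \lambda e^{-\lambda y}$ for $y \ge 0$ and $f(y) = 0$ otherwise. Then for $x \ge 0$
$$ F(x) = \int_{y \le x} f(y)\,dy = \int_0^x \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda x}, $$
and differentiating gives back $F'(x) = \lambda e^{-\lambda x} = f(x)$, consistent with the density being the derivative of the distribution function.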

  • What is meant by discrete and continuous? (2012-07-27)
  • @maximus It refers to whether the variable ranges over a discrete or a continuous set of values. So if you're rolling a die, you have $\{1,2,3,4,5,6\}$, which is discrete. If you're picking a random point on a line, then your set is, say, the interval $[0,L]$, which is continuous. (2012-07-27)
  • @maximus For example, when flipping a coin or rolling a die the outcome is discrete, whereas the time until the bus arrives at a bus stop is continuous. (2012-07-27)
  • So discrete is when you can count it, and continuous is when there is much more probability in it? Is this description right or wrong? Please correct me! (2012-07-27)
  • @maximus That's correct, though you may have to count forever. Check out the concept of a countable set for an exact definition. (2012-07-27)