The image included below is about Bayesian statistics. In the lecture, the lecturer expressed the prior probability distribution as a uniform distribution. I have difficulty interpreting the X-axis and Y-axis. Can somebody explain what the image tells?
How can I interpret the image for the prior probability?
2 Answers
The figure drawn is the pdf (probability density function) of a random variable $\theta$ with the uniform distribution on $[0,1]$: the X-axis shows the possible values of $\theta$, and the Y-axis shows the density $f(\theta)$. It is given by $$f(\theta)=\begin{cases}1&\text{ if }\theta\in[0,1]\\0&\text{ otherwise.}\end{cases}$$
Loosely speaking, the uniform distribution on an interval $[a,b]$, written $\mathcal{U}[a,b]$, is the distribution of a random variable for which every value in $[a,b]$ is equally likely.
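As a quick illustration (a Python sketch I am adding, not part of the answer), the density is constant at $1/(b-a)$ inside the interval and zero outside, so on $[0,1]$ the Y-axis value is simply 1:

```python
def uniform_pdf(theta, a=0.0, b=1.0):
    """Density of the uniform distribution on [a, b]:
    constant 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= theta <= b else 0.0

print(uniform_pdf(0.5))  # inside [0, 1] -> 1.0
print(uniform_pdf(1.5))  # outside the support -> 0.0
```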
- Right to the point (+1). My longer answer is in case the idea of 'prior' is also in question. – 2017-02-02
- Helps me picture the idea better. Thank you. – 2017-02-03
Bayesian framework. In doing Bayesian inference on the success probability $\theta \in (0,1)$ of $n$ Bernoulli trials, one has the likelihood function $p(x | \theta) \propto \theta^x (1 - \theta)^{n-x},$ where $x$ is the number of observed successes in $n$ trials. For observed $x$, the likelihood function is considered as a function of $\theta.$ (The proportionality symbol $\propto$ is used instead of $=$ because it is often unnecessary to carry along the constant ${n \choose x}.$)
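As a small Python sketch of this (with illustrative numbers of my own choosing), the likelihood viewed as a function of $\theta$ peaks at $\hat\theta = x/n$:

```python
from math import comb

def likelihood(theta, x, n):
    """Binomial likelihood p(x | theta), keeping the constant C(n, x)."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# With x = 7 successes in n = 10 trials, the likelihood is largest
# near theta = x/n = 0.7.
print(likelihood(0.7, 7, 10) > likelihood(0.5, 7, 10) > likelihood(0.3, 7, 10))  # -> True
```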
Prior distribution. A Bayesian inferential framework requires a prior distribution. For inference about Binomial $\theta$ it is common to select a member of the Beta family of distributions because they have support $(0, 1).$
In practice, sometimes one has no useful prior knowledge about $\theta.$ In that case, one chooses a 'flat' or 'non-informative' prior distribution. A common choice is $\mathsf{Beta}(\alpha_0=1,\beta_0=1) \equiv \mathsf{Unif}(0,1).$ This must be the choice made in the lecture you mention.
Posterior distribution. Then, according to the continuous version of Bayes' Rule, often written as $$\mathrm{POSTERIOR} \propto \mathrm{PRIOR} \times \mathrm{LIKELIHOOD},$$
one has, as the posterior distribution of $\theta,$ $$p(\theta | x) \propto p(\theta)\times p(x|\theta) \propto \theta^{\alpha_0 - 1}(1-\theta)^{\beta_0 - 1} \times \theta^x(1-\theta)^{n-x}\\ \propto \theta^{\alpha_0+x-1}(1-\theta)^{\beta_0 + n - x - 1} = \theta^{\alpha_n - 1}(1 - \theta)^{\beta_n - 1},$$ where $\alpha_n = \alpha_0 + x$ and $\beta_n = \beta_0 + n - x.$ In the final expression in the display one sees the kernel of $\mathsf{Beta}(\alpha_n,\beta_n).$
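The conjugate update above amounts to simple counting, which can be sketched in Python as:

```python
def beta_binomial_update(alpha0, beta0, x, n):
    """Beta(alpha0, beta0) prior plus x successes in n trials
    gives the Beta(alpha0 + x, beta0 + n - x) posterior."""
    return alpha0 + x, beta0 + n - x

# Flat prior Beta(1, 1) with 735 successes in 1000 trials:
print(beta_binomial_update(1, 1, 735, 1000))  # -> (736, 266)
```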
Illustration. For example, if a trustworthy new poll shows $x = 735$ in favor of a Candidate out of $n = 1000$ subjects interviewed, and if we use the flat prior distribution $\mathsf{Beta}(1,1),$ then the posterior distribution is $\mathsf{Beta}(\alpha_n=736, \beta_n = 266).$ Cutting 2.5% of the probability from each tail of the posterior distribution, one would have a 95% Bayesian probability interval estimate $(0.707, 0.761)$ for the population proportion in favor of the Candidate (computed in R statistical software below).
qbeta(c(.025,.975),736,266)
## 0.7067679 0.7614072
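A Python equivalent of that R call, assuming SciPy is available, would be:

```python
from scipy.stats import beta

# 2.5% and 97.5% quantiles of the Beta(736, 266) posterior,
# mirroring qbeta(c(.025, .975), 736, 266) in R.
lo, hi = beta.ppf([0.025, 0.975], 736, 266)
print(round(lo, 4), round(hi, 4))  # -> 0.7068 0.7614
```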
Note: By comparison, with these polling data a traditional frequentist 95% confidence interval of the form $\hat \theta \pm 1.96\sqrt{\hat \theta(1 - \hat\theta)/n},$ where $\hat \theta = x/n = 0.735,$ would be $(0.708,0.762).$
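The frequentist interval is elementary arithmetic; a sketch:

```python
from math import sqrt

x, n = 735, 1000
theta_hat = x / n
se = sqrt(theta_hat * (1 - theta_hat) / n)  # standard error of the proportion
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(round(ci[0], 3), round(ci[1], 3))  # -> 0.708 0.762
```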
Informative prior. By contrast, previous polls and prior experience might lead someone to choose the prior $\mathsf{Beta}(800,200).$ That would be roughly equivalent to believing (in advance of seeing the new poll) that the proportion in favor is very likely to be in the interval $0.80 \pm 0.03.$ Then, with the same new polling data as above, the posterior distribution is $\mathsf{Beta}(800+735,\,200+265) = \mathsf{Beta}(1535, 465),$ and a 95% posterior probability interval would be $(0.748, 0.785).$ This posterior distribution melds prior opinion and new polling data to give a probability interval that is somewhat higher (centered near 0.77 instead of 0.74) and narrower (width about 0.037 instead of 0.054).
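The same counting update, combined with a normal approximation to the Beta posterior (an approximation I am adding for illustration; the exact quantiles would come from qbeta as above), roughly reproduces the informative-prior interval:

```python
from math import sqrt

# Informative prior Beta(800, 200) plus 735 successes in 1000 trials.
alpha_n = 800 + 735   # 1535
beta_n = 200 + 265    # 465
mean = alpha_n / (alpha_n + beta_n)
sd = sqrt(mean * (1 - mean) / (alpha_n + beta_n + 1))
approx_ci = (mean - 1.96 * sd, mean + 1.96 * sd)
# Roughly (0.749, 0.786), close to the exact interval (0.748, 0.785).
```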
- Thank you for the kind explanations, it also helps me understand better. – 2017-02-03
