In *Pattern Recognition and Machine Learning* by Christopher Bishop (p. 73, eq. 2.19), for an i.i.d. data set $D$ of realizations of a random variable $x$ whose probability distribution is parameterized by a random variable $\mu$, we have \begin{equation} p(x=1|D) = \int_0^1 p(x=1|\mu)\,p(\mu|D)\,\textrm{d}\mu = \int_0^1 \mu\, p(\mu|D)\,\textrm{d}\mu \end{equation} I have been struggling to derive the two equalities above. Could someone concisely show the intermediate steps together with their assumptions?
Conditional Probability Distributions
1
probability
probability-theory
probability-distributions
machine-learning
- Needs more information on exactly *how* the density function is parameterised with $\mu$. – 2017-02-28
1 Answer
0
Taking a quick look at the book, it seems to be because $x\sim \text{Bern}(\mu)$ and $\mu\sim \text{Beta}(a,b)$.
Then: \begin{align*} P(x=1|D) &= \int\limits_0^1 P(x=1|D,\mu) \, P(\mu|D)\,d\mu\\ &= \int\limits_0^1 P(x=1|\mu)\, P(\mu|D)\,d\mu\\ &= \int\limits_0^1 \mu P(\mu|D)\,d\mu\\[1mm] &= \mathbb{E}[\mu|D] \end{align*} where
- The first step uses the law of total probability (applied to a conditional probability)
- The second step is because $x$ only depends on $D$ through $\mu$ (by assumption)
- The third is because $x\sim \text{Bern}(\mu)$, so $P(x=1|\mu)=\mu$ by the definition of the parameter $\mu$
- The last step is by the definition of conditional expectation
Hopefully I read it correctly!
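As a sanity check, the identity $P(x=1|D)=\mathbb{E}[\mu|D]$ can be verified numerically. The sketch below (a minimal stdlib-only illustration; the function names and the particular values of $a$, $b$, $m$, $N$ are my own choices, not from the book) uses the standard conjugacy result that observing $m$ ones in $N$ Bernoulli trials updates a $\text{Beta}(a,b)$ prior to a $\text{Beta}(a+m,\,b+N-m)$ posterior, and compares a midpoint-rule approximation of $\int_0^1 \mu\, p(\mu|D)\,d\mu$ against the closed-form posterior mean $(a+m)/(a+b+N)$.

```python
import math

def beta_pdf(mu, a, b):
    """Density of Beta(a, b) at mu, normalized via the gamma function."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * mu ** (a - 1) * (1 - mu) ** (b - 1)

def predictive_by_integration(a, b, m, N, steps=100_000):
    """Midpoint-rule approximation of int_0^1 mu * p(mu|D) d mu,
    where p(mu|D) is the Beta(a+m, b+N-m) posterior density."""
    a_post, b_post = a + m, b + N - m
    h = 1.0 / steps
    return sum(
        (k + 0.5) * h * beta_pdf((k + 0.5) * h, a_post, b_post) * h
        for k in range(steps)
    )

# Hypothetical example: prior Beta(2, 2), data D with m = 7 ones in N = 10 trials.
a, b, m, N = 2.0, 2.0, 7, 10
numeric = predictive_by_integration(a, b, m, N)
closed_form = (a + m) / (a + b + N)  # posterior mean of Beta(a+m, b+N-m)
print(numeric, closed_form)
```

Both numbers agree to several decimal places, confirming that the predictive probability of $x=1$ is exactly the posterior mean of $\mu$.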