
In Christopher Bishop's *Pattern Recognition and Machine Learning*, page 73, eq. (2.19), we have that for a given i.i.d. data set $D$ of realizations of a random variable $X$ whose distribution is parameterized by a random variable $\mu$, \begin{equation} p(x=1|D) = \int_0^1 p(x=1|\mu)\,p(\mu|D)\,\textrm{d}\mu = \int_0^1 \mu\, p(\mu|D)\,\textrm{d}\mu. \end{equation} I have been struggling to show the steps behind these two equalities. Could someone please concisely show the intermediate steps and the assumptions they rely on?

  • Needs more information on exactly *how* the density function is parameterised with $\mu$. – 2017-02-28

1 Answer


Taking a quick look at the book, it seems to be because $x\sim \text{Bern}(\mu)$ and $\mu\sim \text{Beta}(a,b)$.

Then: \begin{align*} P(x=1|D) &= \int\limits_0^1 P(x=1|D,\mu) \, P(\mu|D)\,d\mu\\ &= \int\limits_0^1 P(x=1|\mu)\, P(\mu|D)\,d\mu\\ &= \int\limits_0^1 \mu P(\mu|D)\,d\mu\\[1mm] &= \mathbb{E}[\mu|D] \end{align*} where

  • The first step uses the law of total probability (applied under conditioning on $D$)
  • The second step holds because, given $\mu$, $x$ is independent of $D$: the data influence $x$ only through $\mu$ (by assumption)
  • The third follows because $x\sim \text{Bern}(\mu)$, so $P(x=1|\mu)=\mu$ by the definition of the parameter
  • The last step is the definition of conditional expectation
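The result can be sanity-checked numerically. This is just a sketch with made-up hyperparameters and counts (the $a$, $b$, $m$, $l$ values below are illustrative, not from the book): with a $\text{Beta}(a,b)$ prior and data $D$ containing $m$ ones and $l$ zeros, the posterior is $\text{Beta}(a+m,\,b+l)$, so $p(x=1|D) = \mathbb{E}[\mu|D] = (a+m)/(a+m+b+l)$, which a Monte Carlo average over posterior samples should reproduce:

```python
import random

random.seed(0)

a, b = 2.0, 2.0   # hypothetical Beta prior hyperparameters
m, l = 7, 3       # hypothetical counts of x=1 and x=0 observed in D

# Closed-form predictive: the mean of the Beta(a+m, b+l) posterior.
analytic = (a + m) / (a + m + b + l)

# Monte Carlo estimate of E[mu|D]: draw mu from the posterior and average,
# i.e. approximate the integral of mu * p(mu|D) d(mu).
n = 200_000
mc = sum(random.betavariate(a + m, b + l) for _ in range(n)) / n

print(f"analytic p(x=1|D)    = {analytic:.4f}")
print(f"Monte Carlo estimate = {mc:.4f}")
```

The two numbers should agree to a couple of decimal places, which is exactly the statement that the predictive probability equals the posterior mean of $\mu$.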

Hopefully I read it correctly!