There is a lot of literature discussing the posterior distribution of a binomial random variable. I understand the concept, but here is what confuses me:
In general the posterior probability is given by
P(theta | x) = P(x | theta) * P(theta) / P(x) ... (1)
where P(theta) is the prior distribution of theta. If you look at the Pattern Recognition and Machine Learning book, page 71, where the author discusses the posterior distribution of a binomial random variable, you will see that he uses notation like Beta(mu | a,b) to denote P(theta).
Now, the posterior for the binomial distribution is proportional to the product of the binomial likelihood and its conjugate prior, the Beta distribution. I understand the intuition, but what confuses me is the notation. According to equation (1), the prior is not a conditional probability, yet in that book it is written as Beta(mu | a,b). So, following the book, equation (1) becomes
P(mu | m,l,a,b) = P(m | N,mu) * P(mu | a,b) / X ... (2)
where N = m + l is the number of samples (m successes and l failures), and a and b are the fictitious (pseudo-) counts of the prior. X is a normalizing constant that does not depend on mu, so we can ignore it when we optimize this expression with respect to mu.
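To make the setup concrete, here is a minimal sketch of the conjugate update described above. The values of a, b, m, and l are assumed for illustration; the key point is that the posterior is again a Beta distribution with the observed counts added to the prior counts, so X never needs to be computed.

```python
import math

def beta_pdf(mu, a, b):
    """Density of Beta(mu | a, b), computed from the Gamma function."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * mu ** (a - 1) * (1 - mu) ** (b - 1)

# Prior Beta(mu | a, b) with fictitious counts (assumed values for illustration).
a, b = 2, 2
# Observed data: m successes and l failures, N = m + l samples.
m, l = 7, 3

# Conjugacy: posterior is Beta(mu | m + a, l + b); the normalizer X cancels.
post_a, post_b = m + a, l + b

# Posterior mode: the mu that maximizes equation (2), available in closed form.
mode = (post_a - 1) / (post_a + post_b - 2)
print(post_a, post_b, mode)
```

Running this prints the posterior parameters (9, 5) and the mode 2/3, which is exactly what maximizing equation (2) with respect to mu gives, since X drops out.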
How can I relate equation (2) to the general formula in equation (1)?