
This paper gives a somewhat gentle introduction to Bayesian inference: http://www.miketipping.com/papers/met-mlbayes.pdf

I got to section 2.3 without much trouble but got stuck from that section onwards. It starts by presenting a probabilistic regression framework in which the likelihood of all the data is given as:

$$ p(t|x,w,\sigma^2) = \prod_{n=1}^{N} p\left(t_n|x_n,w,\sigma^2\right), $$

where $t_n = y(x_n;w) + \epsilon_n$ is the $n$th 'target' value. Next, given the parameter vector $w$ and a hyperparameter $\alpha$, the prior is given as

$$ p(w|\alpha) = \prod_{m=1}^{M} \left(\frac{\alpha}{2\pi}\right)^{1/2} \exp\left(-\frac{\alpha}{2} w_m^2\right). $$
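To make this concrete for myself, here is how I would try to evaluate these two expressions numerically in R for a toy model of my own (a straight line $y(x;w) = w_1 + w_2 x$ with made-up numbers, not the paper's basis-function model); I am not sure whether this is the intended reading:

```r
# Toy check (my own made-up numbers, not the paper's example): evaluate the
# likelihood p(t | x, w, sigma^2) and the prior p(w | alpha) for one candidate w.
set.seed(1)

# Two synthetic measurements of a straight line with Gaussian noise
x     <- c(0.3, 0.7)
sigma <- 0.2
t     <- 0.5 + 2.0 * x + rnorm(length(x), sd = sigma)

# A candidate parameter vector w = (w1, w2)
w <- c(0.4, 1.8)
y <- w[1] + w[2] * x                  # model predictions y(x_n; w)

# The product over the N pairs: one Gaussian density per (x_n, t_n) pair,
# multiplied together, giving a single number
likelihood <- prod(dnorm(t, mean = y, sd = sigma))

# The prior: independent zero-mean Gaussians with precision alpha
alpha <- 1.0
prior <- prod(dnorm(w, mean = 0, sd = sqrt(1 / alpha)))

c(likelihood = likelihood, prior = prior)
```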

I can then compute the posterior $p\left(w|t,\alpha,\sigma^2\right)$. What I don't understand is the following:

  • In the first equation above, how should I interpret the product over the $N$ pairs of data $(t_n,x_n)$? Let's say I take two initial measurements from the real world: is $p\left(t|x,w,\sigma^2\right)$ supposed to give me a single real-valued probability? And how do I account for $w$, since it is not known yet?
  • As far as I understand, $w$ is supposed to be a vector of size $M$ where $w_i$ contains the $i$th estimated value. Now, how can a prior for $w$ refer to its own vector elements if I don't know them yet? Shouldn't a prior be an independent distribution such as a Gaussian or a Beta? Also, shouldn't a prior be independent of hyperparameters?
  • Figure 4 on page 8 of the article shows plots of samples from the prior and from the posterior for an example using the $y=\sin(x)$ function with added Gaussian noise of variance 0.2. How could I plot something similar in, say, Octave/Matlab or R?

I don't have a strong background in statistics, so forgive me if this is too basic. Any help is appreciated.

Thanks in advance!

  • Yes; as explained in the [wikipedia article on likelihood functions](http://en.wikipedia.org/wiki/Likelihood_function), it is merely a matter of perspective. I think any time you see $p(\cdot)$ you should try to visualize a plot with probability on the y-axis and the parameter on the x-axis. You can either evaluate that function at a parameter value and get back a probability, or you can view it as a function of the variables, i.e. the whole plot. (2012-12-04)

1 Answer


First question:

The product is the joint probability of the sample, often also called the likelihood (see the footnote on page 5). Yes, it gives you a single probability: it is simply the individual probabilities multiplied together, since the observations are assumed to be independent. This equation is an intermediate step: from there on, they drop $x$ from the notation and end up with equation (11), where this first equation is combined with a prior and a normalizing constant. This is the essence of Bayesian inference: we don't know the parameter $w$, but we know that the data depend on it, so by Bayes' theorem a prior distribution over $w$ combined with the likelihood yields a posterior distribution.
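For instance, here is a rough R sketch of that combination for a one-parameter toy model of my own choosing (not the paper's basis-function model), evaluating likelihood $\times$ prior on a grid of candidate $w$ values:

```r
# One-parameter toy model t_n = w * x_n + eps_n (a simplification, not the
# paper's basis-function model). The posterior over w on a grid is proportional
# to likelihood(w) * prior(w); the division at the end is the normalization.
set.seed(2)
x     <- runif(10)
sigma <- 0.2
alpha <- 1.0
t     <- 1.5 * x + rnorm(10, sd = sigma)     # data generated with true w = 1.5

w_grid    <- seq(-3, 3, length.out = 601)
log_lik   <- sapply(w_grid, function(w) sum(dnorm(t, mean = w * x, sd = sigma, log = TRUE)))
log_prior <- dnorm(w_grid, mean = 0, sd = sqrt(1 / alpha), log = TRUE)

posterior <- exp(log_lik + log_prior)
posterior <- posterior / sum(posterior * (w_grid[2] - w_grid[1]))  # normalize on the grid

w_grid[which.max(posterior)]   # posterior mode, close to the true value 1.5
```

The final rescaling step plays the role of the normalizing constant in equation (11): it only makes the grid values integrate to one and does not change where the posterior puts its mass.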

Second question:

The vector $\mathbf{w}=(w_1, w_2, \dots, w_M)$ does not contain estimates. It contains the random variables $w_1, w_2, \dots, w_M$, i.e. the parameters. I'm not sure what you mean by the prior referencing its own elements: $p(\mathbf{w}|\alpha)$ is simply a function of $\mathbf{w}$, so you can plug in any candidate value for the vector and it returns the prior density of that value; it does not need to "know" $\mathbf{w}$ in advance.
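To see what it means for $\mathbf{w}$ to be a random vector with prior $p(\mathbf{w}|\alpha)$, and to get something like the prior panel of Figure 4 that you asked about, you can draw a few samples of $\mathbf{w}$ and plot the functions $y(x;\mathbf{w})$ they define. A minimal R sketch, assuming a Gaussian basis purely for illustration (the paper's figure may use a different basis and settings):

```r
# Draw a few weight vectors w ~ p(w | alpha) and plot the functions y(x; w)
# they define, on top of the noise-free sin(x) target.
set.seed(3)
alpha   <- 1.0
M       <- 9                                 # number of basis functions / weights
xs      <- seq(-pi, pi, length.out = 200)
centers <- seq(-pi, pi, length.out = M)

# Design matrix with Gaussian basis functions phi_m(x) = exp(-(x - c_m)^2 / 2)
Phi <- sapply(centers, function(cm) exp(-(xs - cm)^2 / 2))

plot(xs, sin(xs), type = "l", lwd = 2, ylim = c(-3, 3),
     xlab = "x", ylab = "y", main = "Functions drawn from the prior")
for (i in 1:5) {
  w <- rnorm(M, mean = 0, sd = sqrt(1 / alpha))   # one sample from p(w | alpha)
  lines(xs, Phi %*% w, col = "grey60")
}
```

For the posterior panels you would sample $\mathbf{w}$ from the posterior $p(\mathbf{w}|\mathbf{t},\alpha,\sigma^2)$ instead, i.e. after conditioning on the noisy observations.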