
Suppose I have a $d$-dimensional vector $x$ whose components $x_i$ are independent Bernoulli random variables, each with its own parameter $\theta_i$. $\theta$ is then the vector of parameters and $$ p(x|\theta) = \prod_{i=1}^d\theta_i^{x_i}(1-\theta_i)^{(1 - x_i)} $$ I have a dataset of i.i.d. vectors like $x$ - basically each vector is a black-and-white image whose pixels are labelled 0 or 1. I want to estimate $\theta$ with the maximum a posteriori (MAP) method, using a $Beta(2, 2)$ prior for each $\theta_i$.

So far what I have is this, ignoring normalization constants:

$$ p(\theta|x) \propto p(x|\theta)p(\theta) $$

$$ p(\theta|x) \propto \prod_{i=1}^d\theta_i^{x_i}(1-\theta_i)^{(1 - x_i)} \theta_i(1-\theta_i) $$

$$ p(\theta|x) \propto \prod_{i=1}^d\theta_i^{x_i +1}(1-\theta_i)^{(2 - x_i)} $$

The expression inside the product is another Beta distribution in disguise so $$ p(\theta|x) \propto \prod_{i=1}^d Beta(x_i + 2, 3 - x_i) $$

At this point I take the mode of each individual Beta distribution, which is

$$\frac{x_i + 1}{3},$$ and so the MAP estimate is the vector

$$\hat\theta = \left(\frac{x_1 + 1}{3},\ldots,\frac{x_d + 1}{3}\right)$$
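As a quick numerical sanity check (a sketch using NumPy, with a grid search standing in for the closed-form mode): for a single observation, the unnormalized log-posterior $(x_i+1)\log\theta_i + (2-x_i)\log(1-\theta_i)$ should peak at $(x_i+1)/3$.

```python
import numpy as np

# Grid of candidate theta values, avoiding the endpoints where log blows up.
theta = np.linspace(1e-6, 1 - 1e-6, 100001)

for x_i in (0, 1):
    # Unnormalized log-posterior of theta_i given a single pixel x_i,
    # i.e. the log of theta^(x_i + 1) * (1 - theta)^(2 - x_i).
    log_post = (x_i + 1) * np.log(theta) + (2 - x_i) * np.log(1 - theta)
    theta_hat = theta[np.argmax(log_post)]
    print(x_i, theta_hat)  # peaks near (x_i + 1) / 3
```

For $x_i = 0$ the grid maximum lands at $1/3$, and for $x_i = 1$ at $2/3$, matching the formula.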

That's only using one vector from my dataset, though. How do I extend this to all the data? I don't know how to get the mode of a product of products.

1 Answer


Let the dataset consist of $n$ i.i.d. $d$-dimensional vectors $x^{(1)},\ldots,x^{(n)}$. The $Beta(2,2)$ prior enters once per coordinate, not once per observation, so the posterior p.d.f. of $\theta$ given the data is proportional to $$p(\theta|x^{(1)},\ldots,x^{(n)}) \propto \prod_{i=1}^d\theta_i(1-\theta_i)\prod_{k=1}^n\theta_i^{x^{(k)}_i}(1-\theta_i)^{(1 - x^{(k)}_i)}. $$ Collecting the powers of $\theta_i$ and $1-\theta_i$, one gets $$p(\theta|x^{(1)},\ldots,x^{(n)}) \propto \prod_{i=1}^d\theta_i^{\left(\sum_{k=1}^n x^{(k)}_i +1\right)}(1-\theta_i)^{\left(n+1 - \sum_{k=1}^n x^{(k)}_i\right)}.$$ The expression inside the product is again a Beta distribution: $$p(\theta|x) \propto \prod_{i=1}^d Beta\left(\sum_{k=1}^n x^{(k)}_i + 2,\; n+2 - \sum_{k=1}^n x^{(k)}_i\right).$$ Again take the mode of each individual Beta distribution, which is $$ \dfrac{\sum_{k=1}^n x^{(k)}_i +1}{n+2}, $$ and the MAP estimate is a vector of estimates: $$ \hat\theta_{\text{MAP}} =\left(\dfrac{\sum_{k=1}^n x^{(k)}_1 +1}{n+2},\ldots,\dfrac{\sum_{k=1}^n x^{(k)}_d +1}{n+2}\right). $$
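In code the whole estimate is one line of vectorized arithmetic. A minimal sketch with NumPy, where the dataset `X` (rows = images, columns = pixels) and its size are made up for illustration:

```python
import numpy as np

# Hypothetical dataset: n binary images with d pixels each, one image per row.
rng = np.random.default_rng(0)
n, d = 100, 8
X = rng.integers(0, 2, size=(n, d))

# MAP estimate under independent Beta(2, 2) priors on each theta_i:
# theta_hat_i = (sum_k x_i^(k) + 1) / (n + 2)
theta_map = (X.sum(axis=0) + 1) / (n + 2)

# Compared with the MLE (the raw pixel-wise mean), the prior pulls
# every estimate slightly toward 1/2.
theta_mle = X.mean(axis=0)
```

The shrinkage toward $1/2$ is exactly the effect of the $Beta(2,2)$ prior: it acts like one extra pseudo-observation of each outcome per pixel.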

In your experiment the coordinates of each data vector $x^{(k)}$ are independent, and the $\theta_i$ are independent a priori too, so the dataset is really just $d$ one-dimensional samples:

$x^{(1)}_1,\ldots,\,x^{(n)}_1$ - first coordinate of vectors in dataset,

$x^{(1)}_2,\ldots,\,x^{(n)}_2$ - 2nd coordinate of vectors in dataset,

and so on. You can therefore estimate each coordinate of $\theta$ separately: technically, this means dropping the product over $i=1,\ldots,d$ in all the formulas above and finding each ${\hat\theta_{\text{MAP}}}_i$ on its own.
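This per-coordinate view is easy to verify directly (a sketch on synthetic data): estimating each pixel's $\theta_i$ from its own one-dimensional sample gives exactly the same numbers as the vectorized formula applied to the whole dataset.

```python
import numpy as np

# Synthetic dataset: n binary vectors of dimension d.
rng = np.random.default_rng(1)
n, d = 50, 4
X = rng.integers(0, 2, size=(n, d))

# d separate one-dimensional MAP estimates, one per pixel column ...
per_pixel = np.array([(X[:, i].sum() + 1) / (n + 2) for i in range(d)])

# ... agree with the joint, vectorized computation.
joint = (X.sum(axis=0) + 1) / (n + 2)
print(np.allclose(per_pixel, joint))  # True
```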