
I am having trouble understanding the following simple maximum (log-)likelihood example. Let $X$ be a discrete variable with domain $\{1,\dots,K\}$, and let the discrete distribution be parametrized as

$P(X=k;\pi) = \pi_k$

with parameters $\pi = (\pi_1,\dots,\pi_K)$ that are constrained to satisfy $\sum_k \pi_k = 1$, and suppose there is some data $D = \{x_i\}_{i = 1}^n$.

What is the log likelihood $\mathcal{L}(\pi)$ of the data under the model?

I have applied the definition which gives me:

$\mathcal{L}(\pi) = \log P(x_{1:n};\pi) = \sum_{i = 1}^n \log P(x_i;\pi)$

At first I thought that it must sum to 1, making the log-likelihood 0, but this does not make sense: the sum is over the data set, which can contain different numbers of occurrences of the different $X=k$ values, and the $\log$ is applied to each term. The only thing I can conclude is that $\mathcal{L}(\pi) \leq 0$, since each $P(x_i;\pi) \leq 1$ and hence each log term is $\leq 0$.

  • Not quite, leonbloy. The likelihood is the joint probability of the $n$ i.i.d. random variables evaluated at their observed values, given the parameter values. The MLE maximizes the likelihood and hence also the log-likelihood, so you would normally take partial derivatives with respect to the parameters. The difference here is that there is one linear constraint, and hence Lagrange multipliers are needed. (2012-06-01)

1 Answer


Suppose $K=3$ and your data consists of 5 samples, ${\mathbb X} = \{1, 3, 1, 1, 2\}$. The likelihood of this realization would be $P(X=1)P(X=3)P(X=1)P(X=1)P(X=2) = \pi_1 \pi_3\pi_1\pi_1\pi_2 = \pi_1^3 \pi_2\pi_3$. Calling $n_j$ the number of samples with value $j$, this can be written in general as $\pi_1^{n_1} \pi_2^{n_2} \pi_3^{n_3}$.

In general, then, $\mathcal{L}(\pi) = \sum_{i = 1}^n \log \pi_{x_i} = \sum_{j = 1}^K \log \pi_{j}^{n_j} = \sum_{j = 1}^K {n_j} \log \pi_{j}$.
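As a quick numerical sanity check (not part of the original answer), here is a minimal Python sketch that computes the log-likelihood both per sample and via the counts $n_j$, using the 5-sample example above and an assumed, arbitrary parameter vector $\pi$:

```python
import math
from collections import Counter

# Example data from above: K = 3, five samples.
data = [1, 3, 1, 1, 2]

# A hypothetical parameter vector pi, chosen only for illustration;
# it must satisfy sum(pi.values()) == 1 and pi[j] > 0.
pi = {1: 0.5, 2: 0.3, 3: 0.2}

# Per-sample form: sum_i log pi_{x_i}
ll_per_sample = sum(math.log(pi[x]) for x in data)

# Count form: sum_j n_j log pi_j, with n_j the number of samples equal to j.
counts = Counter(data)  # gives n_1 = 3, n_2 = 1, n_3 = 1
ll_counts = sum(n * math.log(pi[j]) for j, n in counts.items())

# The two forms agree, and both are <= 0 since every pi[j] <= 1.
print(ll_per_sample, ll_counts)
```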

Now, you must consider this as a function of $\pi=\{\pi_j\}$ (the $n_j$ are given by the realization) and find the value of $\pi$ that maximizes it, subject to the constraints $\sum_j \pi_j=1$ and $\pi_j \ge 0$.

This is now a typical problem of multivariate calculus (maximize a differentiable function of several variables subject to a constraint given by another function); Lagrange multipliers is the standard method, as sketched below. Can you go on from here?
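For completeness, here is a sketch of the standard Lagrange-multiplier computation (not spelled out in the original answer); it assumes all $n_j > 0$, so the nonnegativity constraints $\pi_j \ge 0$ are inactive at the maximum:

$$\Lambda(\pi,\lambda) = \sum_{j=1}^K n_j \log \pi_j + \lambda\Big(1 - \sum_{j=1}^K \pi_j\Big), \qquad \frac{\partial \Lambda}{\partial \pi_j} = \frac{n_j}{\pi_j} - \lambda = 0 \;\Rightarrow\; \pi_j = \frac{n_j}{\lambda}.$$

Substituting into the constraint $\sum_j \pi_j = 1$ gives $\lambda = \sum_j n_j = n$, hence $\hat\pi_j = n_j/n$: the maximum-likelihood estimate is the empirical frequency of each value.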

  • Yes, that's the definition of maximum likelihood (the (log-)likelihood is a function that has the parameter as its variable; the ML estimator is the value of the variable that maximizes that function). BTW, your result is correct, and it's good to see that it is intuitively satisfactory. (2012-06-01)