
I have trouble understanding the following simple maximum (log) likelihood example. Let $X$ be a discrete variable with domain $\{1,\dots,K\}$, and let its discrete distribution be parametrized by

$$P(X=k;\pi) = \pi_k$$

with parameters $\pi = (\pi_1,\dots,\pi_K)$ that are constrained to fulfill $\sum_k \pi_k = 1$, and there is some data $D = \{x_i\}_{i = 1}^n$.

What is the log likelihood $\mathcal{L}(\pi)$ of the data under the model?

I have applied the definition, which gives me:

$$\mathcal{L}(\pi) = \log P(x_{1:n};\pi) = \sum_{i = 1}^n \log P(x_i;\pi)$$

At first I thought that it must sum to 1 and the log likelihood would be 0, but this does not make sense, since the sum is over the data set, which can contain different numbers of occurrences of the different $X=k$ values, and the $\log$ is applied to each term. The only thing I can think of is that $$\mathcal{L}(\pi) \leq 0,$$ since none of the probabilities exceeds $1$.
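A minimal numerical sketch (Python; the data set and the $\pi$ values below are made up purely for illustration) of why the log likelihood comes out negative even though the $\pi_k$ sum to $1$:

```python
import math

# made-up example: K = 3 categories, n = 5 observations
data = [1, 3, 1, 1, 2]
# made-up parameter vector pi, constrained to sum to 1
pi = {1: 0.5, 2: 0.25, 3: 0.25}

print(sum(pi.values()))                 # 1.0 -- the constraint on pi holds

# log likelihood: sum_i log P(x_i; pi)
log_lik = sum(math.log(pi[x]) for x in data)
print(log_lik)                          # negative, since each log pi_k <= 0
```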

  • Your likelihood is correct to begin with, but you must now write it explicitly. Hint: when you are at a loss, try a concrete example. Say, take $K=2$ and $n=5$, and write down the likelihood. (2012-05-31)
  • @leonbloy Does this involve making use of a Lagrange multiplier? (2012-05-31)
  • Solving some concrete examples just leads me to a likelihood with a negative result, but I lack the idea to generalize this. (2012-05-31)
  • It is not the _likelihood_ that is negative, it is the **log likelihood** that is working out to be negative. (2012-05-31)
  • It does not matter at all that the log-likelihood is negative; the log-likelihood is not a probability density (not even the likelihood is). You just want to consider it as a function with the parameter(s) as variables and find its maximum. (2012-05-31)
  • Not quite, leonbloy. The likelihood is the joint probability density of the $n$ iid random variables evaluated at their observed values given the parameter values. The MLE maximizes the likelihood and hence also the log-likelihood. So you would normally take partial derivatives with respect to the parameters. The difference here is that there is one linear constraint, and hence Lagrange multipliers are needed. (2012-06-01)

1 Answer


Suppose $K=3$ and your data consists of 5 samples, ${\mathbb X} = \{ 1, 3, 1, 1, 2 \}$. The likelihood of this realization would be $P(X=1)P(X=3)P(X=1)P(X=1)P(X=2) = \pi_1 \pi_3\pi_1\pi_1\pi_2 = \pi_1^3 \pi_2\pi_3$. Calling $n_j$ the number of samples with value $j$, this can be written in general as $\pi_1^{n_1} \pi_2^{n_2} \pi_3^{n_3}$.

In general, then $$\mathcal{L}(\pi) = \sum_{i = 1}^n \log \pi_{x_i} = \sum_{j = 1}^K \log \pi_{j}^{n_j} = \sum_{j = 1}^K {n_j} \log \pi_{j} $$
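A quick numerical check of that identity (a sketch only; the data is the $K=3$ example above and the $\pi$ values are made up):

```python
from collections import Counter
import math

data = [1, 3, 1, 1, 2]            # the K = 3, n = 5 example above
pi = {1: 0.5, 2: 0.25, 3: 0.25}   # made-up parameters summing to 1

# per-sample form: sum_i log pi_{x_i}
per_sample = sum(math.log(pi[x]) for x in data)

# count form: sum_j n_j log pi_j, with n_j the number of samples equal to j
counts = Counter(data)            # {1: 3, 3: 1, 2: 1}
per_count = sum(n * math.log(pi[j]) for j, n in counts.items())

print(per_sample, per_count)      # equal (up to floating point)
```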

Now, you must consider this as a function of $\pi=\{\pi_j\}$ (the $n_j$ are given by the realization) and find the value of $\pi$ that maximizes it, subject to the restrictions $\sum_j \pi_j=1$ and $\pi_j \ge 0$.

This is now a typical problem of multivariate calculus (maximize a differentiable function of several variables subject to a constraint given as another function); Lagrange multipliers are the standard method. Can you go on from here?
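For completeness, here is a sketch of where that leads (assuming every $n_j > 0$). With a multiplier $\lambda$ for the constraint $\sum_j \pi_j = 1$, form the Lagrangian

$$\Lambda(\pi,\lambda) = \sum_{j=1}^K n_j \log \pi_j + \lambda\Big(1 - \sum_{j=1}^K \pi_j\Big).$$

Setting $\partial \Lambda / \partial \pi_j = n_j/\pi_j - \lambda = 0$ gives $\pi_j = n_j/\lambda$, and the constraint forces $\lambda = \sum_j n_j = n$, so

$$\pi_j^{ML} = \frac{n_j}{n}.$$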

  • Thanks a lot, these were many hints, but they helped me understand the problem, at least in terms of the notation. I applied a Lagrange multiplier and my result is $\pi_{k}^{ML} = \frac{m_k}{N}$, where $N$ is the size of the given data. (2012-06-01)
  • Anyway, can it be said that $\mathcal{L}(\pi)$ is the same as $\pi_k^{ML}$? (2012-06-01)
  • I meant, is $\pi_k^{ML}$ the result of maximizing $\mathcal{L}(\pi)$? I forgot to say that my $m_k$ is your $n_j$. (2012-06-01)
  • Yes, that's the definition of maximum likelihood (the (log-)likelihood is a function that has the parameter as its variable; the ML estimator is the value of the variable that maximizes that function). BTW, your result is correct, and it's good to understand that it is intuitively satisfactory. (2012-06-01)