The pLSA paper mentions that perplexity refers to the log-averaged inverse probability on unseen data. Can anyone give me the exact formula for calculating perplexity?
What is "log-average"?
Also at http://stats.stackexchange.com/questions/10302/what-is-perplexity – 2011-05-05
1 Answer
You have looked at the Wikipedia article on perplexity. It gives the perplexity of a discrete distribution as
$2^{-\sum_x p(x)\log_2 p(x)}$
which could also be written as
$\exp\left({\sum_x p(x)\log_e \frac{1}{p(x)}}\right)$
i.e. as a weighted geometric average of the inverses of the probabilities, where each inverse $\frac{1}{p(x)}$ is weighted by $p(x)$ itself. For a continuous distribution, the sum would turn into an integral.
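As a quick sanity check of the distribution formula, here is a minimal Python sketch. The function name `perplexity` and the two example distributions are my own illustrations, not anything from the pLSA paper or the Wikipedia article. A uniform distribution over four outcomes should come out with perplexity 4, and a more concentrated one with something smaller.

```python
import math

def perplexity(p):
    """Perplexity 2^H(p) of a discrete distribution given as a list of probabilities."""
    # Outcomes with p(x) = 0 are skipped: p*log(p) tends to 0 as p -> 0.
    entropy_bits = -sum(px * math.log2(px) for px in p if px > 0)
    return 2 ** entropy_bits

uniform = [0.25, 0.25, 0.25, 0.25]   # hypothetical example distribution
skewed = [0.7, 0.1, 0.1, 0.1]        # hypothetical example distribution

print(perplexity(uniform))  # 4.0: as uncertain as a fair four-sided die
print(perplexity(skewed))   # smaller than 4, since one outcome dominates
```

So perplexity can be read as the "effective number of equally likely outcomes".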
The article also gives a way of estimating perplexity for a model using $N$ pieces of test data
$2^{-\sum_{i=1}^N \frac{1}{N} \log_2 q(x_i)}$
which could also be written
$\exp\left(\frac{{\sum_{i=1}^N \log_e \left(\dfrac{1}{q(x_i)}\right)}}{N}\right) \text{ or } \sqrt[N]{\prod_{i=1}^N \frac{1}{q(x_i)}}$
or in a variety of other ways, and this should make it even clearer where "log-average inverse probability" comes from.
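To see that the forms of the test-data estimate agree numerically, here is a small Python sketch. The function names and the list `q_values` are made up for illustration; in practice each entry would be the probability $q(x_i)$ that your model assigns to the $i$-th held-out item. It needs Python 3.8+ for `math.prod`.

```python
import math

def perplexity_log_average(q_values):
    """exp of the average of log(1/q(x_i)): the 'log-averaged inverse probability'."""
    n = len(q_values)
    return math.exp(sum(math.log(1.0 / q) for q in q_values) / n)

def perplexity_base2(q_values):
    """2 raised to minus the average of log2 q(x_i)."""
    n = len(q_values)
    return 2 ** (-sum(math.log2(q) for q in q_values) / n)

def perplexity_geometric_mean(q_values):
    """N-th root of the product of the inverse probabilities."""
    n = len(q_values)
    return math.prod(1.0 / q for q in q_values) ** (1.0 / n)

# Hypothetical model probabilities for four held-out items (assumed values).
q_values = [0.5, 0.25, 0.125, 0.125]

print(perplexity_log_average(q_values))
print(perplexity_base2(q_values))
print(perplexity_geometric_mean(q_values))
```

All three print the same value up to floating-point rounding. For long test sets the log-sum versions are the safer choice, since the raw product of inverse probabilities can overflow.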