2
$\begingroup$

How can I calculate the inverse CDF for Zipf distribution?

This distribution has

$ \mathrm{pdf}(x) = \frac{1}{x^s \times H_{N,s}} $ $ \mathrm{cdf}(x) = \frac{H_{x,s}}{H_{N,s}} $

where $s \gt 0$ is the scale parameter, $H$ is the Harmonic number $H_{N,s} = \sum_{i=1}^N \frac{1}{i^s}$

  • 0
    So this is a restricted Zipf distribution supported on the set $\{1,\dots,N\}$.2011-07-25

1 Answers 1

4

The cdf is given by the formula you wrote only when $x \in \{1,\ldots,N\}$. If $x$ is, for example, $3.6$, then $\Pr(X \le x) = \Pr(X \le 3.6) = \Pr (X \le 3)$, so you have to put the greatest integer less than or equal to $x$ in place of $x$ in your formula.

Since this cdf has jump discontinuities, it doesn't have an inverse function in the usual sense. Maybe what you want is a function $G$ such that if $U$ is a uniformly distributed random variable on the interval $[0,1]$, then $G(U)$ is a random variable with the restricted Zipf distribution you're writing about. If the cdf $F$ is continuous, then $G = F^{-1}$ will serve in that role. But this one is not continuous, so instead you can do something like this:

If $0\le U < \frac{H_{1,s}}{H_{N,s}}$, then $G(U) = 1$

If $\frac{H_{1,s}}{H_{N,s}}\le U < \frac{H_{1,s}+H_{2,s}}{H_{N,s}}$ then $G(U)=2$

If $\frac{H_{1,s}+H_{2,s}}{H_{N,s}}\le U < \frac{H_{1,s}+H_{2,s}+H_{3,s}}{H_{N,s}}$ then $G(U)=3$

If $\frac{H_{1,s}+H_{2,s}+H_{3,s}}{H_{N,s}}\le U < \frac{H_{1,s}+H_{2,s}+H_{3,s}+H_{4,s}}{H_{N,s}}$ then $G(U)=4$

and so on (where $U$ is a uniformly distributed random varibale as above).

  • 0
    it works! this solves a big problem in my thesis, i love you!2011-07-26