Sampling from a discrete pdf whose values $x_1,x_2,\ldots,x_n$ have probabilities $p_1,p_2,\ldots,p_n$ can be done with this simple Matlab function, which is the discrete version of the Inverse Transform method for generating random numbers from a given distribution.
```matlab
function val = DiscrInvTrans1(x,p)
% Draw one value from the discrete pdf p over the values x.
cdf = p(1);          % cdf is a scalar, built up only as far as needed
u = rand;
k = 1;
while u > cdf
    k = k + 1;
    cdf = cdf + p(k);
end
val = x(k);
```
This builds the scalar cdf only as far as it needs it. However, it generates just one value. If it is to be used to draw a large sample of size $n_s$, then the following function is better, where the cdf is calculated once by Matlab's cumsum function:
```matlab
function S = DiscrInvTransNs(x,p,ns)
% Draw a sample of size ns from the discrete pdf p over the values x.
cdf = cumsum(p);     % cdf is a vector, computed once
S = zeros(1,ns);     % preallocate the output
for i = 1:ns
    u = rand;
    k = 1;
    while u > cdf(k)
        k = k + 1;
    end
    S(i) = x(k);
end
```
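For readers without Matlab, here is a Python sketch of the same vectorized-cdf sampler (the function name `discr_inv_trans_ns` is mine, and `random.random` stands in for Matlab's `rand`):

```python
import random

def discr_inv_trans_ns(x, p, ns, rng=random):
    """Draw ns values from the discrete pdf p over the values x
    by the inverse transform method (Python sketch of the Matlab code)."""
    # Build the cdf once (what Matlab's cumsum does).
    cdf = []
    total = 0.0
    for pk in p:
        total += pk
        cdf.append(total)
    cdf[-1] = 1.0  # guard against floating-point shortfall in the last entry
    samples = []
    for _ in range(ns):
        u = rng.random()
        k = 0
        while u > cdf[k]:   # linear search, exactly as in the Matlab loop
            k += 1
        samples.append(x[k])
    return samples
```

The only change from the Matlab version is the explicit guard setting the last cdf entry to 1, so accumulated rounding error cannot push the search past the end of the list.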
The number of times the while condition is tested is a random variable $N_w$ whose expectation is $E(N_w) = \sum_{k=1}^n kp_k$; this is the same for both functions. If the pdf is uniform then $p_k = 1/n$ for all $k$, so $E(N_w) = \sum_{k=1}^n k/n = \frac{n(n+1)}{2n} = \frac{n+1}{2}$. For a general pdf there is no nice closed-form expression for $E(N_w)$, but this assertion seems to be true:
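To illustrate, here is a small Python check (helper names are mine) that counts the while-condition evaluations in the same linear cdf search as the Matlab functions, and compares the empirical average against $\sum_{k=1}^n kp_k$, which for the uniform pdf is $(n+1)/2$:

```python
import random

def expected_checks(p):
    """Exact E(N_w) = sum_{k=1}^n k * p_k (k is 1-based)."""
    return sum((k + 1) * pk for k, pk in enumerate(p))

def simulate_checks(p, trials, rng):
    """Average number of while-condition evaluations per draw,
    using the same linear cdf search as the Matlab functions."""
    cdf = []
    total = 0.0
    for pk in p:
        total += pk
        cdf.append(total)
    cdf[-1] = 1.0  # guard against floating-point shortfall
    checks = 0
    for _ in range(trials):
        u = rng.random()
        k = 0
        while u > cdf[k]:
            k += 1
        checks += k + 1  # condition tested k+1 times when index k (0-based) is chosen
    return checks / trials

n = 6
uniform = [1.0 / n] * n
print(expected_checks(uniform))                            # mathematically (n + 1) / 2 = 3.5
print(simulate_checks(uniform, 100000, random.Random(1)))  # should be close to 3.5
```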
The value of $E(N_w) = \sum_{k=1}^n kp_k$, with $\sum_{k=1}^n p_k = 1$, is minimized if $p_1 \ge p_2 \ge \cdots \ge p_n$, and maximized if $p_1 \le p_2 \le \cdots \le p_n$.
I'm not a mathematician and have not been able to prove this assertion except for $n=2$. I've tested it in Matlab on random pdfs and it has always been true.
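As a sketch of that kind of experiment in Python (function names are mine): for random pdfs of small size, compare $E(N_w)$ over every permutation of the probabilities against the descending and ascending orderings:

```python
import itertools
import random

def e_nw(p):
    """E(N_w) = sum_{k=1}^n k * p_k (k is 1-based)."""
    return sum((k + 1) * pk for k, pk in enumerate(p))

def check_assertion(n, trials, rng):
    """For random pdfs of size n, verify over all n! orderings that the
    descending order minimizes and the ascending order maximizes E(N_w)."""
    for _ in range(trials):
        raw = [rng.random() for _ in range(n)]
        s = sum(raw)
        p = [r / s for r in raw]          # normalize to a pdf
        values = [e_nw(perm) for perm in itertools.permutations(p)]
        desc = e_nw(sorted(p, reverse=True))
        asc = e_nw(sorted(p))
        # small tolerance for floating-point reordering error
        if min(values) < desc - 1e-12 or max(values) > asc + 1e-12:
            return False
    return True
```

This is only a brute-force check (feasible for small $n$, since it enumerates all $n!$ orderings), not a proof.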
Can anyone give a proof, or disprove it with a counter-example?
I have asked this question elsewhere on this site (without the algorithmic preamble) so I hope this is not considered a double post.
Update -- The assertion is true. The proof is here: Minimize Expected Value