1
$\begingroup$

N urns are assigned m balls in a stochastic process based on a Pareto distribution. The process is as follows:

X is a Pareto random variable (xminimum = 1, alpha is a parameter) if X > N, throw the ball out Otherwise, put the ball in urn number floor(X)

Do this until m balls have been put in urns. Then rank the urns by number of balls.

What should the final distribution look like, as N and m approach infinity? Does it resemble any standard prob. distribution? Doing Monte Carlo simulations, it seems to be heavy tailed, but not a Pareto distribution. But I'm new to this - just getting my feet wet with stochastic processes.

Edit: Primary question is: What is prob. mass. function for balls in k-th urn (after ordering them). But I'm open to answers along the lines of: Look at it differently. Basic goal is - given a selection of choices, with balls assigned to them, but some choices a priori more popular, what is relationship between stochastic process and pmf of results? (And, yes, if you can't tell, I'm coming from an applied perspective... this question is a simplification of a model encountered assigning awards to favorites).

  • 0
    Maybe you want to throw out balls when X>N+1, otherwise, the $N$th urn will almost surely be empty.2011-02-28

2 Answers 2

2

Ignoring the reranking (as jorki says, it seems to makes things too complicated), I think you can say something like:

The probability that the any particular ball which goes into an urn turns out to go into the $k$th urn [$1 \le k < N $] is $P_k = \frac{1/k^{\alpha} - 1/(k+1)^{\alpha}}{1-1/N^{\alpha}}$

With $m$ balls in total in the urns, the probability that there are $m_k$ in the $k$th urn is $\Pr (M_k=m_k) = \binom{m}{m_k} P_k^{m_k} (1-P_k)^{m-m_k}$ i.e. a binomial distribution $B(m,P_k)$

Taking the distribution over the urns together, rather than individual urns, this is a multinomial distribution, with the parameters $m$ and $P_1, P_2, \ldots$

  • 0
    @Rashkolnikov - the rank order came up from the application. But I think I solved it! Can you please check my answer/2011-03-01
0

Got it! Unbelievable. The process of accumulating and then ranking is identical to simply computing the CDF, then reflecting over the y=x axis (taking the inverse)!!! P(Urn X has x balls) = M*Pareto(X). Let f(z) = $CDF^{-1}(M*Pareto(z))$. That means that (using frequentist terminology) n*z urns have less than f(z) balls.

So the final answer - to the original question without simplifying - is $n * CDF^{-1}(M*Pareto(x))$. (Or, to preserve the order I mentioned in the question, reflecting that over the y-axis).

I'm still a novice here, so can I ask you experts to verify or disprove this?

  • 0
    I see at least one mistake - P(Urn X has x balls) equals $M*Pareto^{-1}(x)$ (truncated to n).2011-03-01