1
$\begingroup$

I know that there is a way to "redistribute of the frequencies of a variable" as stated here:

Dorian Pyle book chapter 7 section 2 paragraph 3 (7.2.3):

The easiest way to adjust distribution density is simply to displace the high-density points into the low-density areas until all points are at the mean density for the variable. Such a process ends up with a rectangular distribution.

Illustrating this passage are figures like these:

Figures 7.7 and 7.8

In both the references the problem is clear and solved but does not provide sufficient information for writing an algorithm and in particular does not say how to calculate the displacement (the way to move the frequencies on the left or right of the center of the distribution).

Please can you help me to understand it ??

  • 0
    however @did tha$n$k you very much for your time and help2012-09-14

1 Answers 1

1

Assume that the random variable $X$ may take $n$ values, say $P(X=k)=p_k$ for every integer $1\leqslant k\leqslant n$, with each $p_k\gt0$ and $\sum\limits_{k=1}^np_k=1$. Choose $q\gt0$ such that $p_k\geqslant q$ for every $k$ and consider the following algorithm.

Draw $X$, say the value you obtain is $k$. Draw $U$ uniform on $[0,1]$, say the value you obtain is $u$.

  • If $p_ku\leqslant q$, declare that $Y=k$ and that you won.
  • Otherwise, throw away $k$ and $u$ and start over with some new values of $X$ and $U$.

Continue until you are able to declare that you won.

The probability that a given step yields $Y$ is $nq$ hence the number of steps necessary to produce a value $Y$ is almost surely finite and geometrically distributed with mean $1/(nq)$. The result of this algorithm is a random value $Y$ such that $\mathrm P(Y=k)=1/n$ for every integer $1\leqslant k\leqslant n$.


Edit Let me add a reason why a randomized procedure is necessary here. Imagine to simplify things that $X$ takes two values, with $\mathrm P(X=1)=\frac13$ and $\mathrm P(X=2)=\frac23$. Thus, one starts from the histogram with two columns, one twice higher than the other, and one wants to reach the histogram with two columns of equal heights.

There is no way to decide that if $X=1$, then $Y=$ something and if $X=2$ then $Y=$ something else, which can produce equal probabilities. Roughly speaking, one must add a random choice to split the column $2$ of the histogram of $X$ into two parts with adequate heights. This is what the rejection algorithm explained above does.

In the present case, one would draw several copies of $X$ and keep all the copies equal to $1$ and roughly half of the copies equal to $2$. Thus, starting from $3n$ copies of the random variable $X$ which has uneven weights, one produces roughly $2n$ copies of the random variable $Y$ which has even weights.

  • 1
    @frank just one re$m$ark. If you ask the same question in 2 different places, please make a cross reference so that the others will also be aware of this before preparing an answer for you.2012-09-14