1
$\begingroup$

I know that there is a way to "redistribute of the frequencies of a variable" as stated here:

Dorian Pyle book chapter 7 section 2 paragraph 3 (7.2.3):

The easiest way to adjust distribution density is simply to displace the high-density points into the low-density areas until all points are at the mean density for the variable. Such a process ends up with a rectangular distribution.

Illustrating this passage are figures like these:

Figures 7.7 and 7.8

In both the references the problem is clear and solved but does not provide sufficient information for writing an algorithm and in particular does not say how to calculate the displacement (the way to move the frequencies on the left or right of the center of the distribution).

Please can you help me to understand it ??

  • 0
    http://stats.stackexchange.com/questions/36001/variable-frequency-redistribution here another answer which same to be reasonable ..2012-09-14
  • 0
    however @did thank you very much for your time and help2012-09-14

1 Answers 1

1

Assume that the random variable $X$ may take $n$ values, say $P(X=k)=p_k$ for every integer $1\leqslant k\leqslant n$, with each $p_k\gt0$ and $\sum\limits_{k=1}^np_k=1$. Choose $q\gt0$ such that $p_k\geqslant q$ for every $k$ and consider the following algorithm.

Draw $X$, say the value you obtain is $k$. Draw $U$ uniform on $[0,1]$, say the value you obtain is $u$.

  • If $p_ku\leqslant q$, declare that $Y=k$ and that you won.
  • Otherwise, throw away $k$ and $u$ and start over with some new values of $X$ and $U$.

Continue until you are able to declare that you won.

The probability that a given step yields $Y$ is $nq$ hence the number of steps necessary to produce a value $Y$ is almost surely finite and geometrically distributed with mean $1/(nq)$. The result of this algorithm is a random value $Y$ such that $\mathrm P(Y=k)=1/n$ for every integer $1\leqslant k\leqslant n$.


Edit Let me add a reason why a randomized procedure is necessary here. Imagine to simplify things that $X$ takes two values, with $\mathrm P(X=1)=\frac13$ and $\mathrm P(X=2)=\frac23$. Thus, one starts from the histogram with two columns, one twice higher than the other, and one wants to reach the histogram with two columns of equal heights.

There is no way to decide that if $X=1$, then $Y=$ something and if $X=2$ then $Y=$ something else, which can produce equal probabilities. Roughly speaking, one must add a random choice to split the column $2$ of the histogram of $X$ into two parts with adequate heights. This is what the rejection algorithm explained above does.

In the present case, one would draw several copies of $X$ and keep all the copies equal to $1$ and roughly half of the copies equal to $2$. Thus, starting from $3n$ copies of the random variable $X$ which has uneven weights, one produces roughly $2n$ copies of the random variable $Y$ which has even weights.

  • 0
    I doubt that it is easily understandable for him after seeing his first link2012-09-13
  • 0
    @SeyhmusGüngören You mean, the algorithm itself, or the reason why the algorithm does what it does?2012-09-13
  • 0
    please can you tell me how do i calculate the values of the Displacement graph ? thank you2012-09-13
  • 0
    @did sorry for my noobnes in math.. but please can you just tell me when you write X at witch variabile you refer and p and so on ?2012-09-13
  • 0
    please just before accept your question just help me to clear it me in mind.. (i would like close it )2012-09-13
  • 0
    @did I think the algorithm side is okay. What it does? why it does? etc.. what is the motivation behind? such sort of things with less formulas and more explanations. I mean depends on the persons background of course. For frank perhaps it is easier in this way.2012-09-13
  • 0
    frank: Let us leave aside these considerations of *noobness* and the like, shall we? // There is no displacement graph in this answer, but a way to simulate random variables Y from (several) random variables X by accepting or rejecting these, according to the test p_kuq. // The X have the (uneven) distribution you start with. This is what P(X=k)=p_k means. The p_k are the heights of the columns in the histogram. // See the Edit in the answer. // Hope this helps.2012-09-13
  • 0
    here someone say a good way to get it ... however thank you ... http://stats.stackexchange.com/questions/36001/variable-frequency-redistribution2012-09-14
  • 1
    @frank just one remark. If you ask the same question in 2 different places, please make a cross reference so that the others will also be aware of this before preparing an answer for you.2012-09-14