4
$\begingroup$

I found statement in an article "Good Practice in ( Pseudo ) Random Number Generation for Bioinformatics Applications" that you should not use too many random variables in a single simulation. Authors says that it maximum number of random values taken from PRNG should be $\frac{p}{1000}$ or even better $\frac{1}{200}\sqrt{p}$. $p$ is the period.

But I cannot see any references in other articles.

Do you know any reasons why not to use more values ?

  • 0
    An excellent reference for pseudo random number generators is the first Chapter of Volume 2 of "The Art of Computer Programming" by Donald Knuth. I would take a peek at it to see if if answers your question, but my copy is in my office and I am...in my pyjamas...2012-05-02
  • 3
    (Roughly speaking, say $x_1$ is the first random value you got given, then the way a PRNG works means that you will not get (or at least are very unlikely to get, depending on the generator you are using) given $x_1$ again. However, that isn't very random, is it? If it was truly random then $x_1$ would have the same likelihood of appearing at the $i^{th}$ iteration as every other number!)2012-05-02
  • 0
    @user1729: that's conceptually a good start. But the output of a PRNG can be much smaller than its internal state -which is related to the period, so it can return consecutive repeated values (think of a PRNG that gives you one bit in each try).2012-05-02

1 Answers 1

4

Assuming $p$ is the period of the PRNG, this is good advice, because after $p$ values are taken the PRNG will repeat.

To avoid the issue, just use a PRNG with a very large period. It will barely take $O(\log p)$ time to extract each pseudorandom bit, so you can make $p$ much larger than the number of values you will ever need to extract.

  • 0
    Yes by why to limit number of values taken to only fraction of period.2012-05-02
  • 0
    see user1729 second comment to the question: a number can't repeat twice in the period, so you can predict future value (or you can predict what value can not appear again).2012-05-02
  • 0
    I don't know why $\frac{1}{1000}$ of the period is suggested; also I don't know why it is better to limit to $O(\sqrt{p})$, but I guess that many values may give enough information to bust the hash and predict future values.2012-05-02
  • 0
    @Darqer: Assuming you are using a very simple PRNG (say, you're using multiplication mod $p$, $p$ prime) then for $x$ such that $0$i^{th}$ iteration either $x$ will have been picked already and so have zero probability of being picked or $x$ will have a one-in-($p-i$) chance of being picked. For this to be random every number should have a on-in-($p-1$) chance of being picked. So, as the number of iteration increase the randomness of the PRNG decreases. So, you should pick a point to stop. – 2012-05-02
  • 0
    Truly random numbers _do not have a period_. PRNGs do, so to ensure this doesn't negatively affect your simulations, you want to avoid too many repeats - or, ideally, avoid _any_ repeats! Most common PRNGs are very poor (e.g., the C `rand()` function). But there are lots of good quality generators out there, if you specifically look for one.2012-05-02
  • 0
    I'm using Mersenne Twister, I cannot understand why number cannot repeat on the output of generator for example MT generator has array of internal values which determines its state and base on these values consecutive values are generated, therefore there is a possibility of repetition of the value on the output.2012-05-02
  • 0
    With MT19937 you have nothing to worry about, since $p = 2^{19937-1}$ and you can't possibly be iterating it anything near $\sqrt{p}$ times. (If you want to be _really_ safe, use [SDLH](http://www.senderek.com/SDLH/).)2012-05-02