3
$\begingroup$

I tried testing random integers for compliance with Benford's law, which they are apparently supposed to do. However, when I try doing this with Python,

map(lambda x:str(x)[0], [random.randint(0, 10000) for a in range(100000)]).count('1')

I get approximately equal frequencies for all leading digits. Why is this the case? Might it have something to do with how the pseudorandom number generator, the Mersenne twister, works?

  • 2
    Benford's law certainly does not assume that one chooses randomly the integers in $[1,10^n]$.2012-10-21
  • 0
    Therefore, this is the distribution expected if the mantissae of the logarithms of the numbers (but not the numbers themselves) are uniformly and randomly distributed. (Wikipedia)2012-10-21

4 Answers 4

1

Thank you for the answers, but I think I found a more explicit source of the error by reading the paper more carefully.

Benford's law does apply to random integers, but only as the upper and lower bounds go to infinity. The limit $\lim_{N\to \infty} P_N^1(1)$ (the proportion of integers from 1 to $N$ which have a leading digit of 1, as $N$ goes to infinity) diverges, and equals 1/9 at every $10^n, n\geq 2$, which explains my result. If I set the upper bound on the random integers to 12000, for example, I get different results.

  • 1
    Sure, and my answer mentioned that property of the range you were picking from, but you don't get Benford's law just by picking your finite bound carefully and continuing to draw uniformly.2012-10-21
  • 0
    What do you mean? Wouldn't it approach Bentford's law?2012-10-21
  • 0
    As you said yourself, the sequence $P^1_N$ diverges. In particular it oscillates forever, with the $\lim\sup$ too high for Benford and the $\lim\inf$ too low. That's why the author had to spend the next couple of pages constructing a sequence of limits of partial averages $P^m_N$ to finally derive the law.2012-10-21
  • 0
    Yes, I meant as the lower bound goes to infinity. Thank you!2012-10-21
6

The linked paper is titled unfortunately, at least as regards the current conception of the word "random." The whole point of Benford's law is precisely that it doesn't hold when integers are drawn uniformly from a range that, like yours, ends at a power of $10$: a well-designed pseudorandom number generator should give numbers with asymptotically exactly a $\frac{1}{10}$ chance of each leading digit $0,1,...,9$ in decimal notation.

Benford's law applies not to properly random sources of data, but to data produced by certain real-life random processes. One simple characterization of data that obey Benford's law is as those produced by processes that exhibit more-or-less exponential growth, such as populations. Such processes produce uniformly distributed logarithms of data points, but this uniform distribution gets splayed out into the characteristic Benford's law upon exponentiation.

  • 0
    I have a cryptographic (?) random number generator that i thought was obeying Benford's law because I did see a leading 1 more often than anything else. But checking with multiple sets of 100k, I see 1 @ ~52% and all other digits ~6%. Any ideas what kind of distribution would do that or if that's expected?2016-08-30
  • 1
    I think I have answered my own question. I was generating unsigned 64bit random integers, so my upper bound base10 is ~1.8x10^19. So ~44% of the numbers in this space are going to be greater than 1x10^19 and thus start with 1. That with the other numbers with fewer digits that start with 1 skewed my results. The quickest way to verify this was to switch to uint32 to see the distribution pile around 1,2,3 and drop off from there. If I cap the random numbers at some value of repeating 9's, I see the equivalent distribution you mentioned. This makes me feel more comfortable with my CRNG.2016-08-30
  • 0
    Yep, that sounds right.2016-08-30
4

Benford's law applies only to distributions that are scale-invariant and thus applies approximately to many real-life data sources, especially when we measure with arbitrary units: If the leading-digit distribution of a sample is essentially the same whether we measure in inches or centimeters, this is only possible if the logarithm is equidistributed (or approximately so over a range wide enough to cover several orders of magnitudes).

2

The simplest solution to approach the logarithmical distribution using mersenne twister is by using two dimensions of randomness. First the bandwidth for the amount of digits, say 15, secondly the bandwidth, say 10-100000. With more than 10,000 iterations the results shows an almost exact benford distribution.

Hope this helps.

Regards,

Dennes Spek