
I'm asking a specific question related to a programming project at work. A user can create a file that contains a large number of fields. Four of these fields hold random numbers; they must be randomly generated. These four fields can also be repeated any number of times within a file. For all intents and purposes, it looks like this:

--File--

1. Randomly Generated Field 1 (9 digits, repeated 90,000 times)
2. Randomly Generated Field 2 (9 digits, repeated 90,000 times)
3. Randomly Generated Field 3 (9 digits, repeated 90,000 times)
4. Randomly Generated Field 4 (9 digits, repeated 90,000 times)

For this project it's critical that none of the values in the fields above are the same, so what we are essentially looking at is:

360k values that must ALL be different, each a random 9-digit number. Unfortunately, the code I have doesn't generate numbers with leading zeros (so no 000000001, for example), so I believe that we have

360k / 999,999,999 odds of repeating the same number? I am very bad with math. I would really appreciate some feedback on getting an actual figure for how likely it is that the same number appears twice, so I can decide how to proceed.

  • This feels like it might be treated as a [generalisation of the birthday problem](https://en.wikipedia.org/wiki/Birthday_problem#Other_birthday_problems). (2017-02-06)
  • @alexwlchan Maybe; if $9$-digit blocks are meant, this is exactly what is needed. (2017-02-06)
  • I believe it's 9-digit blocks. And yes, I'm basically asking: if I generate a 9-digit block 360,000 times, what are the odds that the same number repeats? I want to determine whether I need to implement more advanced logic for the random numbers. (2017-02-06)
  • If, as you say, distinct numbers are a "critical" requirement, then you may have to code for that even if the probability of a repeat is low. (2017-02-06)
  • The random generator simply generates a number BETWEEN 1 and 999,999,999. (2017-02-06)
  • Well, this functionality (that number of repeats) will only be used on rare occasions. Typically the fields repeat only a few times, 500 at most. Only very rarely are counts such as 90k required, so if the odds are 1 in 2000 that would be fine; if there is a repeat, the process can just be started over. But if it's 1 in 20 or 1 in 100, I will have to change the code. (2017-02-06)

1 Answer


If we have $N$ objects and pick one uniformly at random $M$ times (with replacement), the probability that no object is picked more than once is

$$\frac{N!}{(N-M)!\cdot N^M}$$
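The formula can also be evaluated directly, without Stirling's approximation, by working in log space: Python's `math.lgamma(k + 1)` returns $\ln k!$, so the huge factorials never have to be computed. A minimal sketch, assuming uniform and independent draws; `no_collision_probability` is a hypothetical helper name, and the inputs are this question's values ($M = 360{,}000$ draws, $N = 9\cdot 10^8$ or $10^9$):

```python
import math

def no_collision_probability(n: int, m: int) -> float:
    """Probability that m uniform draws (with replacement) from n values are
    all distinct, i.e. n! / ((n - m)! * n**m), evaluated in log space."""
    if m > n:
        return 0.0  # pigeonhole: a repeat is guaranteed
    # lgamma(k + 1) == ln(k!), so this is ln n! - ln (n - m)! - m ln n.
    log_p = math.lgamma(n + 1) - math.lgamma(n - m + 1) - m * math.log(n)
    return math.exp(log_p)

M = 4 * 90_000
print(no_collision_probability(9 * 10**8, M))  # no leading zero: roughly 5e-32
print(no_collision_probability(10**9, M))      # leading zeros allowed: roughly 7e-29
```

As a sanity check, `no_collision_probability(365, 23)` gives about 0.493, the classic birthday-problem value. Under the same assumptions, the typical case from the comments (four fields of at most 500 values, so $M = 2000$) gives about 0.998, i.e. roughly a 1-in-450 chance of a repeat.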

In your case, $N$ and $M$ are very large, so we can use Stirling's approximation. We have $N=9\cdot 10^8$ (nine-digit numbers without a leading zero) and $M=3.6\cdot 10^5$.

The probability that no collision occurs is about $5\cdot 10^{-32}$, so it is virtually impossible to avoid a collision. Even if we allow a block to start with a zero ($N=10^9$), the probability is still only about $7\cdot 10^{-29}$. So randomly chosen blocks won't do the job.
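A quick simulation is consistent with this. The sketch below assumes Python's `random` module as a stand-in for the project's actual generator and draws uniformly from 100,000,000 to 999,999,999; `count_repeats` is a hypothetical helper name. In a single simulated file of 360,000 values, expect a repeat count in the dozens (on average about $M^2/(2N) \approx 72$), not zero.

```python
import random

def count_repeats(num_values: int, low: int, high: int, seed: int = 0) -> int:
    """Draw num_values integers uniformly from [low, high] (inclusive) and
    count how many draws hit a value that was already drawn."""
    rng = random.Random(seed)
    seen = set()
    repeats = 0
    for _ in range(num_values):
        value = rng.randint(low, high)
        if value in seen:
            repeats += 1
        else:
            seen.add(value)
    return repeats

# One simulated file: 4 fields x 90,000 nine-digit values without a leading zero.
repeats = count_repeats(4 * 90_000, 100_000_000, 999_999_999, seed=1)
print(f"repeated values in one simulated file: {repeats}")
```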

If you only require that the FIRST block is not repeated, then you have much better chances, but if I understand the question correctly, this is not the case.
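If the values really must be distinct, the usual remedy is to draw without replacement instead of hoping no collision occurs. A minimal Python sketch, assuming nine-digit values without a leading zero and one shared pool for all four fields (the helper name `distinct_nine_digit_values` and the constants are hypothetical):

```python
import random
from typing import List, Optional

def distinct_nine_digit_values(count: int, seed: Optional[int] = None) -> List[int]:
    """Return `count` distinct random integers between 100,000,000 and 999,999,999."""
    rng = random.Random(seed)
    # sample() draws WITHOUT replacement, so no value can appear twice.
    # Passing a range object avoids building a list of 900 million integers.
    return rng.sample(range(100_000_000, 1_000_000_000), count)

FIELD_COUNT = 4
FIELD_SIZE = 90_000

# Draw one pool for all four fields so values are unique across fields too,
# then split it into consecutive slices of FIELD_SIZE values each.
values = distinct_nine_digit_values(FIELD_COUNT * FIELD_SIZE, seed=1)
fields = [values[i * FIELD_SIZE:(i + 1) * FIELD_SIZE] for i in range(FIELD_COUNT)]
```

Restarting whenever a repeat appears, as suggested in the comments, would essentially never finish at this size, since each attempt succeeds with probability around $5\cdot 10^{-32}$. If the numbers must also be unpredictable, `random.SystemRandom` provides the same `sample` method backed by the operating system's randomness source.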

  • Thanks, that is what I was afraid of. (2017-02-06)