11
$\begingroup$

A GUID (globally unique identifier) is a 32 character hexadecimal string:

http://en.wikipedia.org/wiki/Globally_Unique_Identifier

If you randomly generate 2, the chance of them being the same is incredibly small.

But what if you generate 1,000,000, what are the chances there is 1 or more duplicates in those 1,000,000?

What about 10,000,000, or 100,000,000 or even 1 billion? Each new GUID has a chance to match all those previously inserted into the set.

Graphs!

Thanks to Rawlings answer we have the following graphs:

enter image description here

enter image description here

enter image description here

1 Answers 1

11

Take a look at the Wikipedia article on the Birthday Problem.

In summary, if you have $n$ possible values (here, $2^{128}$) and you take $k$ values at random, there is probability

$$ \frac{k!{n \choose k}}{n^k} $$

of NOT having a collision.

(These are very large numbers to deal with, but that article has a section on approximations that might be useful.)

Here is an example of a graph of the probability of a GUID collision occurring against number of GUIDs generated, plotted using Wolfram Alpha and the second approximation suggested by Didier Plau below.

  • 1
    Your notation for binomial coefficients is potentially confusing.2011-04-18
  • 0
    Nice thanks! Is there some good graph tool I can use to plug that formula in, it'd be interesting to graph it like that wikipedia article but I'm a bit of a maths newbie2011-04-18
  • 0
    @Didier: yeah, let me just look up how to do that the other way around2011-04-18
  • 1
    {n\choose k} $ $2011-04-18
  • 1
    Try `${n \choose k}$` for ${n \choose k}$.2011-04-18
  • 0
    Hey, I like that! I've always done that with matrices and brackets.2011-04-18
  • 2
    In the range you are interested in, $k^2/n$ is very small hence good approximations are $1-k^2/(2n)$ or $\exp(-k^2/(2n))$.2011-04-18
  • 0
    I've put a link to a Wolfram Alpha plot of the second approximation, but could only get it to plot properly with k going up to 10^18, which I'm not sure is small enough for the approximation to still hold.2011-04-18
  • 0
    Thanks for that! Am I reading the graph correctly by saying if you have 1*10^18 keys generated (a lot!) there is a 0.0014% chance you will have a single collision?2011-04-18
  • 0
    It's not 0.0014% but 0.0014 out of 1, i.e. 0.14% chance. If I've got the underlying calculation correct, that's how to read it.2011-04-18
  • 2
    NB although a GUID is 128-bits long, not all of those are necessarily random. Microsoft's v4 GUIDs, for example, have 4 bits fixed. If I'm correctly interpreting Wikipedia, v1 GUIDs generated on the same computer would have only 76 bits of entropy.2011-04-18
  • 0
    @Peter Hm, that is a problem. Hopefully with an understanding of how the GUIDs are generated, one could derive an alternative value for n and continue to use the same formulae.2011-04-18
  • 0
    Suppose we want to estimate roughly how many GUIDs are needed to have a probability $p=\frac{1}{2}$ for a collision. With the notation of the Wikipedia article mentioned, we have $m=2^{128}$, and in the section _Square approximation_ they suggest the approximate formula $\sqrt{2mp} = \sqrt{m} = 2^{64} \approx 1.8\cdot 10^{19}$. So if you wait for code like `for (Int64 i = Int64.MinValue; i < Int64.MaxValue; ++i) { Guid.NewGuid(); }` to complete, there's a fair chance you will at one point have created a GUID that was created before. It will still take extremely long time to run that code!2013-09-26