3
$\begingroup$

I was reading about hashing.
The oldest/standard approach is to use a prime number to produce the hash.

At first I couldn't get why use a prime when I came to this Why hash functions use primes:

Primes are unique numbers. They are unique in that, the product of a prime with any other number has the best chance of being unique (not as unique as the prime itself of-course) due to the fact that a prime is used to compose it. This property is used in hashing functions.

I am not sure I get completely the concept here.

Does it mean that multiplying a*b and c*d the posibility to produce the same result is smaller if one of these numbers e.g. a is a prime?

I was wondering how is this proven in an intuitive manner?
I mean I can understand abstractly the idea but can not formulate something intuitive to base the whole hashing upon this possibility as a property of primes.

Any help?

UPDATE:
Just to be clear: My main problem was why the prime numbers have always been the standard approach for use in a hash function.
An ideal hash function rarely has colisions due to the fact that it produces "unique" hash values of its input.
The blog I mention in the post seemed to give an explanation in my problem.
But if the blog is not correct then I still end up in my original problem.
What is the property of prime numbers (I am not refering to 31 because other primes have been used for hashing) that makes them a must for a hash function.
Also this is I guess also related to the fact that the table of the hashtable must be of prime size?

  • 0
    Also, it is simply wrong that "the table of the hashtable **must** be of prime size". You will not find any explanation of that "fact", because it is not true.2012-02-04

1 Answers 1

4

I think the blog post you link to is just confused. This talk about numbers being more or less "unique" does not appear to make mathematical sense.

And it is flat out wrong when the article claims that any prime -- explicitly including $2$ -- would work as well as $31$. If Java's String hash function multiplied successively by $2$ instead of by $31$, every character that was more than $32$ from the end of the string would be ignored, because $2^{32}=0$ in Java's int arithmetic.

The reason why $31$ is a workable multiplier is:

  1. It is is relatively prime to $2^{32}$ (i.e., it is odd), which makes it invertible modulo $2^{32}$.
  2. Its order in the multiplicative group modulo $2^{32}$ is "large enough".
  3. It doesn't have any small powers that happen to be small integers modulo $2^{32}$.
  4. Since $31=2^5-2^0$, a constant multiplication by $31$ can be implemented efficiently even on hardware that does not have a quick general multiplier available.

Simply being prime is unlikely to be decisive -- every odd number will equal some prime modulo $2^{32}$, thanks to Dirichlet's theorem on arithmetic progressions.

  • 0
    I update OP on this2012-02-04