1
$\begingroup$

If I map 1024 bits onto a 16 bit hash (just random numbers for conversation sake), I will have an average of 64 collisions per hash (assuming a good algorithm).

I wonder if anyone knows any hash algorithms around today in which I could take a hash and extrapolate the ~64 1024 bit original values?

I suppose that's not easy, but I don't know for sure.

My thought process is to see if I could tack an index number onto the end of the hash to enable me to extrapolate the original (perhaps at a significant cpu cost) later.

  • 0
    As Henning points out, this is significantly more difficult than your question makes it out to be. And I have to say.. this is a very good thing! If it wasn't, there would be very little point of cryptographic hashing! Now granted, cryptographic hashing is going to have much more than 16 bits, but even so, if the math in this post were correct, it would be devastating for the security community. Of course, if it were correct, and they'd gone the last 50 years without realizing it, I'd feel a little safer :-)2011-09-19

2 Answers 2

5

If your hash is only 16 bits, then on average there will be $2^{1008}$ different possible preimages for any hash value. It may be that you're only using 64 of those in practice, but you can't identify those from the hash alone, without using (very strong) additional information about the distribution of the actually occurring values.

  • 0
    I see the error in my logic now, thanks for pointing that out!2011-09-19
1

Theoretically, there is a set of values that get hashed to a particular hash value, so there is such a function taking hash values back to the unhashed $1024$ bit numbers. However, if the hash is a good one, it should be as hard as generating the hash values for all the $1024$ bit originals and recording the mapping. That is, it is possible, but should be impractical.

  • 0
    @Henning: granted, $16$ bits is not a strong hash, but putting that aside, if it is a good $16$ bit hash, generating the list of possible sources should be as hard as generating all $1024$ bit patterns and correlating all the $16$ bit hashes with their source.2011-09-20