-1
$\begingroup$

CRC32 has hash collisions for the inputs "plumless" and "buckeroo". What's the smallest data length for collisions in MD4, MD5, SHA-1, and the recently accepted SHA-3 (Keccak)? We know that lim_len n->infty p(collision) = 1.0, but I'm curious how large your data has to be before a particular hash algorithm actually experiences a collisions.

Can MD4 collide on blocks less than 100 byes? 1000? 10^10? 10^20? ...

1 Answers 1

1

The range of MD4 is 128 bits or 16 bytes. That means that it necessarily collides on inputs of length 17 bytes or less. Heuristically, we expect it to collide on inputs of length 9 bytes or less (assuming it approximates a random function).

  • 0
    Thanks! I was curious because a lot of people are saying that rsync is *sufficiently* accurate, but I'm starting to think that hash-based syncing is quite risky.2012-10-03
  • 0
    The probability of collision is rather small. If you generate, say, $2^{50}$ random hashes, then it is very unlikely that there will be any collisions.2012-10-04
  • 0
    True, p() is small. But for a given colliding piece of data, hash-based sync software will be unable to sync it.2012-10-04
  • 0
    Hash-based sync software is an engineering solution - it works and it's efficient. The only other solution is to compare a (losslessly) compressed version of the complete files, or a complete overhaul of the system.2012-10-13