CRYPTO-GRAM, March 15, 2005
Bruce Schneier, <http://www.schneier.com>


                  SHA-1 Broken

SHA-1 has been broken.  Not a reduced-round version. Not a simplified 
version.  The real thing.

The research team of Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu 
(mostly from Shandong University in China) has been quietly 
circulating a paper describing their results:

	collisions in the full SHA-1 in 2^69 hash operations, far fewer than
	the 2^80 operations of a brute-force attack based on the hash length.

	collisions in SHA-0 in 2^39 operations.

	collisions in 58-round SHA-1 in 2^33 operations.

This attack builds on previous attacks on SHA-0 and SHA-1, and is a 
major, major cryptanalytic result: the first attack faster than 
brute-force against SHA-1.

I wrote about SHA, and the need to replace it, last September.  Aside 
from the details of the new attack, everything I said then still 
stands.  I'll quote from that article, adding new material where 
appropriate.

"One-way hash functions are a cryptographic construct used in many 
applications.  They are used in conjunction with public-key algorithms 
for both encryption and digital signatures.  They are used in integrity 
checking. They are used in authentication.  They have all sorts of 
applications in a great many different protocols.  Much more than 
encryption algorithms, one-way hash functions are the workhorses of 
modern cryptography.

"In 1990, Ron Rivest invented the hash function MD4.  In 1992, he 
improved on MD4 and developed another hash function: MD5.  In 1993, the 
National Security Agency published a hash function very similar to MD5, 
called SHA (Secure Hash Algorithm).  Then, in 1995, citing a newly 
discovered weakness that it refused to elaborate on, the NSA made a 
change to SHA.  The new algorithm was called SHA-1.  Today, the most 
popular hash function is SHA-1, with MD5 still being used in older 
applications.

"One-way hash functions are supposed to have two properties.  One, 
they're one way.  This means that it is easy to take a message and 
compute the hash value, but it's impossible to take a hash value and 
recreate the original message.  (By 'impossible' I mean 'can't be done 
in any reasonable amount of time.')  Two, they're collision free.  This 
means that it is impossible to find two messages that hash to the same 
hash value.  The cryptographic reasoning behind these two properties is 
subtle, and I invite curious readers to learn more in my book Applied 
Cryptography.

"Breaking a hash function means showing that either -- or both -- of 
those properties are not true."

Last month, three Chinese cryptographers showed that SHA-1 is not 
collision-free.  That is, they developed an algorithm for finding 
collisions faster than brute force.

SHA-1 produces a 160-bit hash.  That is, every message hashes down to a 
160-bit number. Given that there are an infinite number of messages 
that hash to each possible value, there are an infinite number of 
possible collisions.  But because the number of possible hashes is so 
large, the odds of finding one by chance are negligibly small (one in 
2^80, to be exact).  If you hashed 2^80 random messages, you'd expect 
to find one pair that hashed to the same value.  That's the "brute 
force" way of 
finding collisions, and it depends solely on the length of the hash 
value. "Breaking" the hash function means being able to find collisions 
faster than that. And that's what the Chinese did.
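
To make the brute-force bound concrete, here is a minimal Python sketch 
that runs the same kind of search against SHA-1 truncated to 32 bits, 
where a collision is expected after roughly 2^16 hashes instead of 
2^80.  The truncation, the counter-based messages, and the function 
name find_truncated_collision are assumptions made for illustration; 
this is the generic brute-force search, not the attack in the paper.

```python
import hashlib


def find_truncated_collision(bits=32):
    """Brute-force (birthday) search on SHA-1 truncated to `bits` bits.

    Returns (msg_a, msg_b, tries): two distinct messages whose truncated
    digests match, and the number of messages hashed to find them.
    Illustrative only; assumes `bits` is a multiple of 8.
    """
    nbytes = bits // 8
    seen = {}                     # truncated digest -> message that produced it
    counter = 0
    while True:
        msg = str(counter).encode()            # distinct message per counter
        tag = hashlib.sha1(msg).digest()[:nbytes]
        if tag in seen:                        # same tag, different message
            return seen[tag], msg, counter + 1
        seen[tag] = msg
        counter += 1


if __name__ == "__main__":
    a, b, tries = find_truncated_collision(32)
    print(a, b, tries)
```

Each extra bit of hash length multiplies the expected work by about the 
square root of two, which is why the full 160-bit hash lands at 2^80.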

They can find collisions in SHA-1 in 2^69 calculations, about 2,000 
times faster than brute force.  Right now, that is just on the far edge 
of feasibility with current technology.  Two comparable massive 
computations illustrate that point.

In 1999, a group of cryptographers built a DES cracker.  It was able to 
perform 2^56 DES operations in 56 hours.  The machine cost $250K to 
build, although duplicates could be made in the $50K-$75K 
range.  Extrapolating that machine using Moore's Law, a similar machine 
built today could perform 2^60 calculations in 56 hours, and 2^69 
calculations in three and a quarter years.  Or, a machine that cost 
$25M-$38M could do 2^69 calculations in the same 56 hours.
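
The arithmetic behind those figures can be checked with a short Python 
sketch.  It takes the essay's own numbers as inputs (2^60 operations in 
56 hours for a machine built today, $50K-$75K per duplicate) and 
assumes, hypothetically, that cost scales linearly with the number of 
machines.  The constant and function names are illustrative.

```python
HOURS_PER_RUN = 56               # one run: 2^60 operations on today's machine
TODAY_EXP = 60                   # the essay's Moore's Law estimate
TARGET_EXP = 69                  # cost of the SHA-1 collision attack
COST_RANGE = (50_000, 75_000)    # dollars per duplicate machine


def single_machine_years():
    """Years for one machine to do 2^69 operations, run after run."""
    runs = 2 ** (TARGET_EXP - TODAY_EXP)         # 512 runs of 56 hours each
    return runs * HOURS_PER_RUN / (24 * 365.25)


def parallel_cost_range():
    """Dollar cost of enough machines to finish in one 56-hour run.

    Assumes cost grows linearly with machine count (an assumption).
    """
    machines = 2 ** (TARGET_EXP - TODAY_EXP)     # 512 machines
    return tuple(machines * cost for cost in COST_RANGE)


if __name__ == "__main__":
    print(round(single_machine_years(), 2))      # about three and a quarter
    print(parallel_cost_range())
```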

On the software side, the main comparable is a 2^64 keysearch done by 
distributed.net that finished in 2002.  One article put it this way: 
"Over the course of the competition, some 331,252 users participated by 
allowing their unused processor cycles to be used for key discovery. 
After 1,757 days (4.81 years), a participant in Japan discovered the 
winning key."  Moore's Law means that today the calculation would have 
taken one quarter the time -- or have required one quarter the number 
of computers -- so today a 2^69 computation would take eight times as 
long, or require eight times the computers.
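
The factor of eight falls out of two ratios, sketched here in Python. 
The four-fold Moore's Law speedup is the essay's estimate, and the 
function name is an assumption for illustration.

```python
def scaled_effort(base_exp=64, target_exp=69, moore_speedup=4):
    """How much more time (or how many more computers) 2^69 needs today,
    relative to the 2^64 keysearch as it was actually run."""
    work_ratio = 2 ** (target_exp - base_exp)    # 32 times the operations
    return work_ratio // moore_speedup           # hardware is 4 times faster


if __name__ == "__main__":
    factor = scaled_effort()
    print(factor)                # multiplier on time or on computers
    print(factor * 1757)         # days, if the number of computers is fixed
```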

"The magnitude of these results depends on who you are.  If you're a 
cryptographer, this is a huge deal.  While not revolutionary, these 
results are substantial advances in the field.  The techniques 
described by the researchers are likely to have other applications, and 
we'll be better able to design secure systems as a result.  This is how 
the science of cryptography advances: we learn how to design new 
algorithms by breaking other algorithms.  Additionally, algorithms from 
the NSA are considered a sort of alien technology: they come from a 
superior race with no explanations.  Any successful cryptanalysis 
against an NSA algorithm is an interesting data point in the eternal 
question of how good they really are in there."

For the average Internet user, this news is not a cause for panic.  No 
one is going to be breaking digital signatures or reading encrypted 
messages anytime soon.  The electronic world is no less secure after 
these announcements than it was before.

But there's an old saying inside the NSA: "Attacks always get better; 
they never get worse."  Just as this week's attack builds on other 
papers describing attacks against simplified versions of SHA-1, SHA-0, 
MD4, and MD5, other researchers will build on this result.  The attack 
against SHA-1 will continue to improve, as others read about it and 
develop faster tricks, optimizations, etc.  And Moore's Law will 
continue to march forward, making even the existing attack faster and 
more affordable.

Jon Callas, PGP's CTO, put it best: "It's time to walk, but not run, to 
the fire exits.  You don't see smoke, but the fire alarms have gone 
off."  That's basically what I said last August.

"It's time for us all to migrate away from SHA-1.

"Luckily, there are alternatives. The National Institute of Standards 
and Technology already has standards for longer -- and harder to break 
-- hash functions: SHA-224, SHA-256, SHA-384, and SHA-512.  They're 
already government standards, and can already be used.  This is a good 
stopgap, but I'd like to see more.

"I'd like to see NIST orchestrate a worldwide competition for a new 
hash function, like they did for the new encryption algorithm, AES, to 
replace DES.  NIST should issue a call for algorithms, and conduct a 
series of analysis rounds, where the community analyzes the various 
proposals with the intent of establishing a new standard.

"Most of the hash functions we have, and all the ones in widespread 
use, are based on the general principles of MD4.  Clearly we've learned 
a lot about hash functions in the past decade, and I think we can start 
applying that knowledge to create something even more secure."
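
As a rough sketch of what the stopgap looks like in practice, Python's 
standard hashlib module provides SHA-1 alongside the four longer 
functions named above.  The helper name and the placeholder message are 
assumptions for illustration.  The brute-force collision cost is about 
two to the power of half the digest length.

```python
import hashlib


def digest_bits(name, message=b"example message"):
    """Length in bits of the digest produced by the named hash function."""
    return hashlib.new(name, message).digest_size * 8


if __name__ == "__main__":
    for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
        bits = digest_bits(name)
        # Generic collision search costs roughly 2^(bits/2) hash operations.
        print(f"{name}: {bits}-bit digest, brute-force collisions ~2^{bits // 2}")
```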

Hash functions are the least-well-understood cryptographic primitive, 
and hashing techniques are much less developed than encryption 
techniques.  Regularly there are surprising cryptographic results in 
hashing.  I have a paper, written with John Kelsey, that describes an 
algorithm to find second preimages with SHA-1 -- a technique that 
generalizes to almost all other hash functions -- in 2^106 
calculations: much less than the 2^160 calculations for brute 
force.  This attack is completely theoretical and not even remotely 
practical, but it demonstrates that we still have a lot to learn about 
hashing.

It is clear from rereading what I wrote last September that I expected 
this to happen, but not nearly this quickly and not nearly this 
impressively.  The Chinese cryptographers deserve a lot of credit for 
their work, and we need to get to work replacing SHA.

Summary of the paper (the full paper isn't generally available yet):
<http://theory.csail.mit.edu/~yiqun/shanote.pdf>

My original essay:
<http://www.schneier.com/essay-074.html>

NIST standard for SHA-224, SHA-256, SHA-384, and SHA-512:
<http://csrc.nist.gov/CryptoToolkit/tkhash.html>

My second-preimages paper:
<http://eprint.iacr.org/2004/304>

More hash function news:
Two X.509 certificates with identical MD5 hashes:
<http://www.win.tue.nl/~bdeweger/CollidingCertificates/>
Faster MD5 collisions (eight hours on a 1.6 GHz computer):
<http://cryptography.hyperlink.cz/md5/MD5_collisions.pdf>