COMPUTER SCIENCE: Cryptologists Cook Up Some Hash for New 'Bake-Off'

doi:10.1126/science.319.5869.1480

Account Information

UNC Chapel Hill (TRLN) Alerts | Access Rights | My Account | Sign In

Site Navigation

Article Views

Summary
Full Text (HTML)
Full Text (PDF)

Article Tools

My Science

Science 14 March 2008:
Vol. 319. no. 5869, pp. 1480 - 1481
DOI: 10.1126/science.319.5869.1480

Prev | Table of Contents | Next

News Focus

COMPUTER SCIENCE:
Cryptologists Cook Up Some Hash for New 'Bake-Off'

Dana Mackenzie*

A worldwide competition aims to keep the algorithms for authenticating electronic documents a jump ahead of forgers' ability to defeat them

In November, three Dutch cryptologists published on their Web site a "digital fingerprint" of their prediction of the winner of the United States's 2008 presidential election. According to them, the next commander in chief will be 3D515DEAD7AA16560ABA3E9DF05 CBC80.

After the election, these modern-day Nostradamuses will use a simple mathematical procedure called a hash function, which is available on nearly all computers, to show that a PDF document created before the election has that hexadecimal number as its digital fingerprint. This will prove that they knew the winner all along, because the PDF document could not have been altered after the election. Any such change would make its fingerprint, known as a hash value, no longer match the one that has been published.

But it's all a trick. The "Nostradamus attack" by Marc Stevens, Arjen Lenstra, and Benne de Weger is designed to highlight a serious problem in cryptology: The so-called hash functions that many of the world's computers use for authenticating documents are dangerously out of date. The cryptologists prepared 12 separate documents, one saying that John McCain will win, one naming Hillary Clinton, and one even predicting Paris Hilton. By carefully tweaking the contents of each PDF document, in a way not obviously noticeable, they made it so each one generates the same digital fingerprint, computed by a widely used hash function called MD5. As the Dutch effort shows, the ability to produce multiple documents with the same fingerprint renders MD5 useless for authentication.

Without the ability to authenticate files, such as passwords and online transactions, Internet commerce would be seriously threatened. Therefore, the U.S. National Institute of Standards and Technology (NIST) in November announced a worldwide competition to select a new standard for hash functions, which is expected to conclude by 2012. The winner will be certified for U.S. government use, and if past history is any guide, that will make it a de facto standard for the rest of the world.

NIST held a similar "bake-off " (as some cryptologists called it) from 1997 to 2000 to select a new standard cipher for government use, called the Advanced Encryption Standard (AES). "The AES competition was the most fun I've ever had in cryptography," says Bruce Schneier of BT Counterpane in Santa Clara, California, who designed one of the five AES finalists and who plans to enter the hash-function competition as well. "Think of it as a giant cryptographic demolition derby: A bunch of us put our best work into the ring, and then we beat on each other until there was only one standing. … I personally learned an enormous amount about [cipher design] from the AES competition, and we as a community benefited immeasurably."

Like the previous competition, the new bake-off offers no financial reward, and the submissions must be unpatented so that they can be incorporated into any software. The real payoff to the winner will be prestige. The AES competition drew 15 entries, and the hash-function competition is expected to attract as many as 50.

Although the contests will be similar, the products could hardly be more different. A cipher, like AES, encrypts data in such a way that it can be recovered but only by someone with a key. Hash functions, on the other hand, are not meant to be reversible, and they have no secret key. They merely show that a document is what it claims to be. They tend to be much simpler than ciphers and have a much broader range of applications--so many that they are often called the "duct tape" or "Swiss army knife" of cryptology.

Hash functions work by converting any string or message of 1s and 0s to a new string, usually much shorter. For example, the function might take a gigabyte MPEG movie file and boil it down to anywhere from 128 to 512 bits. In the current government standard, called SHA-1, the output, known as the hash, hash value, or hash sum, is 160 bits, whereas for MD5 it is 128 bits. The hash should look more or less random so that no one can guess what the original message said. On the other hand, it should be generated in a completely deterministic manner so that anybody who knows the hash function used can have a computer verify that the hash value matches the hashed file's contents. In addition, it should be staggeringly unlikely that any other document hashes to the same value. When that happens, cryptographers call it a "collision."

Compressing gigabyte-sized files to 160 bits inevitably results in many collisions. But the key point is that it is virtually impossible to find one by random search. That is because there are so many possible hash values--2¹⁶⁰ of them for a 160-bit hash sum. For such a hash value, the number of movies you would have to hash before generating the same hash twice would be roughly 2⁸⁰, or a trillion a year for a billion years. Not even Bollywood is that prolific.

However, a hash function can be defeated if it is possible to deliberately create two colliding documents. If this takes more than 2⁸⁰ attempts, then the hash function is considered secure, because such an attack is no better than random guessing. However, if a collision can be found in less than 2⁶⁴ tries, then a supercomputer, or a very large network of personal computers, might be able to do it in a year. If it takes less than 2³² tries, one can do it on a PlayStation in a few minutes--as the Dutch "Nostradamuses" did.

The current standard, SHA-1, borrows its basic architecture from MD5, which was invented in 1992. You feed your message into a device that compresses and randomizes 512 bits at a time. You add another piece of your file to the first piece and feed it through again, and you repeat this procedure until you run out of message.

Iterative algorithms, like MD5 and SHA-1, are very easy to program and quick to run. But recently, they have come under heavy attack from cryptologists. In 2004 and 2005, cryptologist Xiaoyun Wang of Shandong University in China showed that MD5 could be cracked in fewer than 2⁴⁰ steps, and SHA-1 in fewer than 2⁶⁴. No one has produced an actual collision in SHA-1, but a search, using the spare time of many personal computers, is under way at Graz University of Technology in Austria.

"If you find a [SHA-1] collision, people in industry will be forced to upgrade their products," says Bart Preneel, a cryptologist at Katholieke Universiteit Leuven in Belgium. Anticipating such a breakdown, Microsoft in 2005 banned both SHA-1 and MD5 from new products and has removed MD5 from all its current products, says Kristin Lauter, head of the Cryptography Group at Microsoft Research in Redmond, Washington. Fortunately, a good backup is already available. In 2004, NIST issued several new standards, collectively called SHA-2, which are more secure than SHA-1 because they produce longer hashes (up to 512 bits instead of 160).

But NIST worries that SHA-2 could eventually fall, too. "Everything that has been attacked is in the same family," says William Burr of NIST's Security Technology Group. "It may turn out that they aren't broken or can't be broken, but we didn't want to get caught out on the wrong side."

After extensive debate, including two international workshops in 2005 and 2006, NIST decided that a new competition could turn up completely new approaches to hash functions. "We'll be reluctant to pick something that looks just like SHA-2," says Burr. "We want some biodiversity."

Although no designs have been formally submitted yet--the deadline is in October--experts predict that most entrants will continue to be iterative algorithms subtly retooled to defeat the new kinds of attacks. For instance, Preneel's RIPEMD--one of the few first-generation hash functions still standing--performs two parallel iterations, making it difficult for an attacker to figure out which one to attack.

A second approach, called "provably secure" hash functions, derives its presumptive security from math problems that are considered to be hard to crack (see sidebar, above). This type of algorithm typically does not require multiple iterations, but it does require cryptologists to put their faith in a mathematical "black box." Also, such algorithms tend to be slower than iterative algorithms because they require a more elaborate calculation--even though it is performed only once. Speed is at a premium for hash functions, as they are typically used to tag a document in the split-second it's electronically transmitted.

Not surprisingly, mathematicians love provably secure systems, whereas cryptologists have little use for them. "They are typically only provable with respect to one property but are weak with respect to other properties," says Joan Daemen of STMicroelectronics, co-winner of the AES competition. For instance, a "provably secure" hash developed by Lenstra and his colleagues, called Very Smooth Hash (VSH), was compromised last year when Markku-Juhani Saarinen at a Spanish company called Kinamik showed that it was easy to find "near-collisions" in VSH. In practice, engineers often truncate a long hash value to a shorter one, assuming that the truncated hash will inherit the long one's security. Saarinen's result means that they can't count on that with VSH.

In the final analysis, what makes it so hard to come up with good hash functions-- and prove they work--is that they are expected to do so many things. "You expect them to do everything and blame them when they don't work," says Preneel. Perhaps a 4-year bake-off will be just what the chef ordered to make some new hash that will satisfy everybody's tastes.

Dana Mackenzie is a freelance writer in Santa Cruz, California.

The editors suggest the following Related Resources on Science sites:

In Science Magazine

NEWS FOCUS COMPUTER SCIENCE: Hash of the Future?: Dana Mackenzie (14 March 2008)
Science 319 (5869), 1481. [DOI: 10.1126/science.319.5869.1481]
| Summary » | Full Text » | PDF »

To Advertise | Find Products

Featured Jobs

Science. ISSN 0036-8075 (print), 1095-9203 (online)

You have reached the bottom of the page. Back to top

Site Tools

Site Search

Account Information