Narrow pipe hash function designs have recently come under fire, particularly in reference to some SHA-3 candidates. Is this criticism valid? Can it be explained more simply than this paper does?
How valid is the concern over narrow pipe cryptographic hash function designs?
-
0@Pete L. Clark: I think the mathematical question I'm asking here is clear. I will answer my own question in a way that I hope makes it clear how it is a mathematical question. – 2011-05-23
2 Answers
You're asking whether an ongoing research-level discussion has merits or not. This is difficult to know since experts are actively working on this stuff and you never know what can happen: criticism can fizzle out (as did Courtois's algebraic attacks on AES, after causing a major scare 10 years ago) or pay off (as did Xiaoyun Wang's MD5 attacks).
You want my opinion? I'll give it: existential proofs for hash function security (which is what the cited paper in your question offers) do not hold much sway. Unless you can exhibit an actual attack, I don't think the alarm bells should be ringing. Here's an appropriate example:
Consider some SHA-3 hash function $H$. (For the uninitiated, a cryptographic hash function is a map from a string of any length to one of some specified fixed length; it's a public function; there is no key.) Since $H$ has an infinite domain and a finite range, so by the pigeon-hole principle, there exists a collsion. Therefore $H$ is not collision-resistant and thus it's insecure.
Bogus right? Existential proofs don't mean much in a computational setting. Of course there are collisions, but the hope is that we cannot find them.
That said, there are other papers giving unconvincing attacks against narrow-pipe constructions, so even in the absence of concrete practical attacks, sentiment will grow against the approach even if there will never be any reason to reject it. As sad as it is, cryptography is still partly religious when it comes to designing primitives (meaning we rely a lot on intuition and instinct rather than on proofs, because proofs of security don't exist for hash functions and blockciphers and integer factorization, etc).
Edited to Add: I prefer to remain anonymous here. My PhD is in cryptography, I have 5-6 papers related to cryptographic hashing, and I'm a professor at a "pretty good" place in the U.S. You can read the above opinion with this in mind, or you can discard it as "suspect" and "unsubstantiated." I'll leave that to you.
-
0@Pete L. clark: I think your criticism is valid. This is an opinion answer, and the person realizes this and refuses to back the opinion with the credentials that would give it (rightly or wrongly) more weight, but claims they exist. That said, I still think it's useful and respect the desire of @Fixee to remain anonymous. I did get an answer elsewhere that talked about the mathematics behind the attack, and I will give an answer here that adresses this and simplifies the mathematics to a level most could understand. – 2011-05-23
This is a purely theoretical attack that is quite minor and not of any great value in carrying out an actual attack on the security properties of any given hash function.
The explanation involves a bit of math, that while obvious, if explained pedantically using formal mathematics is quite obscure. But, since such pedantry is what is desired on this site, I will give the obscure and strictly correct formal math answer, and then explain it so someone who isn't actually particularly familiar with formal mathematical notation might hope to make sense of it. The original paper, of course, uses formal mathematical reasoning and never really bothers to explain it, which is why it is inaccessible and why I asked this question in the first place.
Personally, I feel that papers that rely on this kind of thing and don't bother to explain it in a way that makes sense to people who don't want to wade through a maze of specialized symbolic notation should be rejected for publication. Then maybe the people who write them would learn to communicate.
So, here's the semi-useless math-out section:
There is a function such that:
$f(x) \to y$ ${ X \equiv \{ a \mid a \in \mathbb Z, 0 \leq a < 2^n \} }, { x \in X }$ ${ Y \equiv \{ a \mid a \in \mathbb Z, 0 \leq a < 2^m \} }, { y \in Y }$
In an ideal hash function that functions as a random oracle, $f(x)$ maps each individual member of set $X$ onto a completely random member of set $Y$. We'll define $O$ as the range of $f$ like so:
$O \equiv \{ {y \in Y} \mid {y = f(x)}, x \in X \}$ $O \subseteq Y$
if $n < m$ it is clearly the case that $O \neq Y$. Also, if $n = m$ it will also be true that, in the most probable case, $O \neq Y$ (even though $O \subset Y$). Though as $n$ grows larger than $m$ it becomes increasingly probable that $O = Y$.
It turns out that, on average, $\frac{n(O)}{n(Y)} = {1 - \frac{1}{e}}$ when $n = m$, as is the case in narrow pipe hash functions. ${1 - \frac{1}{e} } \approx 0.632$, so this means that only a little more than half the members of $Y$ will ever be mapped to from $X$.
$\frac{2^n}{2} = 2^{n-1}$ $\frac{2^n}{\frac{1}{1 - \frac{1}{e}}} = {2^{n-\log_2 {1 - \frac{1}{e}}}} \approx {2^{n-0.66}}$
This means that $\log_2 n(O)$ is approximately 0.66 less than $\log_2 n(Y)$ if $n = m$. If $n$ is much larger than $m$, as is the case in wide-pipe designs, then it's most likely that $O = Y$ so $n(O) = n(Y)$.
Less formally, a narrow pipe design maps an $n$ bit value to another $n$ bit value. If this mapping is a completely random mapping, that means some values will not appear in the output since there will be some output values for which there are multiple input values that map to it.
It turns out that the number of output values is (in the average case) $2^n \cdot (1 - \frac{1}{e})$. Since ${1 - \frac{1}{e}} \approx 0.632$ that means nearly half of the possible output values won't actually happen. This sounds kind of scary until you think about it. If you lose a full half of the output range, that means you've only effectively lost one bit, and since you're losing less than half of the output range you've effectively lost less than one bit. This is very minor.
So, it's a valid attack, yes. But this does not mean it is at all something to worry about.
-
0@Brandon Carter: I consider the fact the number of pre-images is not$1$to be common sense obvious. The fact that it's $\frac{1}{1-\frac{1}{e}}$ is not obvious. I explained it to my girlfriend this morning in terms of an infinite sized scrabble bag with a fixed set of numbers in it and drawing from it to fill in a fixed set of slots. It's obvious that the probability of every number in the scrabble bag appearing in a slot is low if the number of slots is the same as the number of different numbers in the bag. – 2011-05-23