3
$\begingroup$

Narrow pipe hash function designs have recently come under fire, particularly in reference to some SHA-3 candidates. Is this criticism valid? Can it be explained more simply than this paper does?

  • 0
    Not that it's ready yet, but for future reference: http://area51.stackexchange.com/proposals/15811/cryptography2011-05-22
  • 1
    @Qiaochu Yuan: This fragmentation really distresses me. I answer a lot of questions on StackOverflow, and this fragmentation is making it a huge pain for me to figure out where to get a question answered. And half the time, StackOverflow is a better place to ask it anyway. *sigh*2011-05-22
  • 0
    @Omnifarious: I mention this mostly because of your use of the word "criticism." Not being an expert in this field I don't know whether that indicates a mathematical question or a matter of opinion.2011-05-22
  • 1
    @Qiaochu: It is an appropriate question. He is asking whether a certain flaw in a specific class of hash functions causes them to have sufficient vulnerability to be classified as broken.2011-05-22
  • 0
    @Brandon: thank you. I am mildly worried that nobody on math.SE has the expertise to answer this question, but we'll see what happens.2011-05-22
  • 0
    @Omnifarious: in light of the answer below and my comments to it, could you clarify what **mathematical** question you are asking here? The point is that in cryptographic papers the mathematics is usually valid as far as it goes. What matters is the practicality of the assumptions made and even how the community feels about them. But it seems clear to me that such criticisms -- while extremely important -- are outside the scope of this site (and thus it is a good thing that a cryptography site is on the horizon).2011-05-23
  • 0
    @Pete L. Clark: I think the mathematical question I'm asking here is clear. I will answer my own question in a way that I hope makes it clear how it is a mathematical question.2011-05-23

2 Answers 2

3

You're asking whether an ongoing research-level discussion has merits or not. This is difficult to know since experts are actively working on this stuff and you never know what can happen: criticism can fizzle out (as did Courtois's algebraic attacks on AES, after causing a major scare 10 years ago) or pay off (as did Xiaoyun Wang's MD5 attacks).

You want my opinion? I'll give it: existential proofs for hash function security (which is what the cited paper in your question offers) do not hold much sway. Unless you can exhibit an actual attack, I don't think the alarm bells should be ringing. Here's an appropriate example:

Consider some SHA-3 hash function $H$. (For the uninitiated, a cryptographic hash function is a map from a string of any length to one of some specified fixed length; it's a public function; there is no key.) Since $H$ has an infinite domain and a finite range, so by the pigeon-hole principle, there exists a collsion. Therefore $H$ is not collision-resistant and thus it's insecure.

Bogus right? Existential proofs don't mean much in a computational setting. Of course there are collisions, but the hope is that we cannot find them.

That said, there are other papers giving unconvincing attacks against narrow-pipe constructions, so even in the absence of concrete practical attacks, sentiment will grow against the approach even if there will never be any reason to reject it. As sad as it is, cryptography is still partly religious when it comes to designing primitives (meaning we rely a lot on intuition and instinct rather than on proofs, because proofs of security don't exist for hash functions and blockciphers and integer factorization, etc).

Edited to Add: I prefer to remain anonymous here. My PhD is in cryptography, I have 5-6 papers related to cryptographic hashing, and I'm a professor at a "pretty good" place in the U.S. You can read the above opinion with this in mind, or you can discard it as "suspect" and "unsubstantiated." I'll leave that to you.

  • 0
    your opinion would be much more valuable if it came with your real name, so that any interested party could check out your experience with cryptographic issues. I don't see much merit in a pseudonymous opinion.2011-05-23
  • 0
    @Pete: Point well-taken. However, I prefer remaining anonymous on these forums, so I'll have to live with the loss-of-value engendered as a result.2011-05-23
  • 0
    I'm afraid that I *do* find the "I claim that I'm an expert -- you can trust me if you like" form of answer that you've given problematic, but I won't discuss that further here. I can't even figure out whether this should count as an answer to the question. Worse, I can't even figure out whether the question is on-topic for the site. Nothing in this answer **mathematically** addresses the validity of the cited paper, which makes me think that, as Fixee essentially says, its validity is not really a matter of mathematical proof/disproof. If so, this question may be off-topic here..2011-05-23
  • 0
    A better phrasing of my above comment is: "Nothing in this answer addresses the **mathematical** validity of the cited paper". The answer has some mathematics in it, but it doesn't make any claim about the mathematical soundness of the cited paper.2011-05-23
  • 0
    @Pete L. clark: I think your criticism is valid. This is an opinion answer, and the person realizes this and refuses to back the opinion with the credentials that would give it (rightly or wrongly) more weight, but claims they exist. That said, I still think it's useful and respect the desire of @Fixee to remain anonymous. I did get an answer elsewhere that talked about the mathematics behind the attack, and I will give an answer here that adresses this and simplifies the mathematics to a level most could understand.2011-05-23
0

This is a purely theoretical attack that is quite minor and not of any great value in carrying out an actual attack on the security properties of any given hash function.

The explanation involves a bit of math, that while obvious, if explained pedantically using formal mathematics is quite obscure. But, since such pedantry is what is desired on this site, I will give the obscure and strictly correct formal math answer, and then explain it so someone who isn't actually particularly familiar with formal mathematical notation might hope to make sense of it. The original paper, of course, uses formal mathematical reasoning and never really bothers to explain it, which is why it is inaccessible and why I asked this question in the first place.

Personally, I feel that papers that rely on this kind of thing and don't bother to explain it in a way that makes sense to people who don't want to wade through a maze of specialized symbolic notation should be rejected for publication. Then maybe the people who write them would learn to communicate.

So, here's the semi-useless math-out section:


There is a function such that:

$$f(x) \to y$$ $${ X \equiv \{ a \mid a \in \mathbb Z, 0 \leq a < 2^n \} }, { x \in X }$$ $${ Y \equiv \{ a \mid a \in \mathbb Z, 0 \leq a < 2^m \} }, { y \in Y }$$

In an ideal hash function that functions as a random oracle, $f(x)$ maps each individual member of set $X$ onto a completely random member of set $Y$. We'll define $O$ as the range of $f$ like so:

$$O \equiv \{ {y \in Y} \mid {y = f(x)}, x \in X \}$$ $$O \subseteq Y$$

if $n < m$ it is clearly the case that $O \neq Y$. Also, if $n = m$ it will also be true that, in the most probable case, $O \neq Y$ (even though $O \subset Y$). Though as $n$ grows larger than $m$ it becomes increasingly probable that $O = Y$.

It turns out that, on average, $\frac{n(O)}{n(Y)} = {1 - \frac{1}{e}}$ when $n = m$, as is the case in narrow pipe hash functions. ${1 - \frac{1}{e} } \approx 0.632$, so this means that only a little more than half the members of $Y$ will ever be mapped to from $X$.

$$\frac{2^n}{2} = 2^{n-1}$$ $$\frac{2^n}{\frac{1}{1 - \frac{1}{e}}} = {2^{n-\log_2 {1 - \frac{1}{e}}}} \approx {2^{n-0.66}}$$

This means that $\log_2 n(O)$ is approximately 0.66 less than $\log_2 n(Y)$ if $n = m$. If $n$ is much larger than $m$, as is the case in wide-pipe designs, then it's most likely that $O = Y$ so $n(O) = n(Y)$.


Less formally, a narrow pipe design maps an $n$ bit value to another $n$ bit value. If this mapping is a completely random mapping, that means some values will not appear in the output since there will be some output values for which there are multiple input values that map to it.

It turns out that the number of output values is (in the average case) $2^n \cdot (1 - \frac{1}{e})$. Since ${1 - \frac{1}{e}} \approx 0.632$ that means nearly half of the possible output values won't actually happen. This sounds kind of scary until you think about it. If you lose a full half of the output range, that means you've only effectively lost one bit, and since you're losing less than half of the output range you've effectively lost less than one bit. This is very minor.

So, it's a valid attack, yes. But this does not mean it is at all something to worry about.

  • 0
    Unless someone can give a better and clearer explanation, this is the answer I will accept as soon as I am able. (It won't let me yet.)2011-05-23
  • 0
    I don't understand your "fairly obvious math": if by the average number of $x$-valued that map to a given $y$-value you mean the arithmetic mean of the number of preimages of a point $y$ in the codomain, then if the domain $X$ and the codomain $Y$ are finite sets, this average value is $\# X / \# Y$. Under the assumption that $X$ and $Y$ have the same size -- this called **finite narrow domains** in the linked to paper (it considers both this case and the case in which $\# X \gg \# Y$), this means the average number of preimages is $1$, not $\frac{1}{1-\frac{1}{e}}$.2011-05-23
  • 0
    Let me say that the above remark -- in that it focuses on the mathematics rather than the cryptographic issues and assumptions made -- is rather shallow and nit-picky. But as a mathematician, it is the only kind of remark I am qualified to make here...2011-05-23
  • 0
    @Pete L. Clark: I hate when people vote me down when I'm not actually wrong. If you do not like questions that involve cryptography in any way, regardless of whether math is involved or not, please leave this question alone and go elsewhere.2011-05-23
  • 0
    @Omnifarious: I don't mean to be confrontational, but...I didn't say you were not actually wrong. My first comment pointed out what I understand to be a mathematical mistake. It is confined to the sentence "In fact..." What you say after that is correct as far as I can see. If you fix this sentence -- or, of course, if it turns out that I am mistaken -- I will gladly remove the downvote. It's not meant to cause you any distress. (I see that this is your first answer on this site. Maybe things work differently on other SE sites; apologies if this was not what you were expecting.)2011-05-23
  • 0
    @Pete L. Clark: I'm not sure I understood your original criticism. I re-wrote my answer, and I hope that you think it's correct. I am not a mathematician, and I am not at all comfortable with formal mathematical notation, so please keep that in mind if you point out other flaws. I will try my best to understand what you say, but the original paper was basically proof by math-out, and I have a strongly negative reaction to that kind of thing.2011-05-23
  • 0
    @Omnifarious: I have removed my downvote. My comments seem to have prompted you to say some less than kind things about this site, mathematics and cryptography. I'm very sorry for that, and I will stay out of your way in the future.2011-05-23
  • 0
    @Omnifarious: Please refrain from being confrontational. Dr. Clark was only attempting to clarify your claim that certain math was "obvious," when that is not the case. The reason why the average number of preimages is $\frac{1}{1-\frac{1}{e}}$ instead of $1$ is because it is a uniform distribution of **ideal random functions**, which operate on a different model. This is not explicitly stated in the paper, and definitely not obvious.2011-05-23
  • 0
    On a second (somewhat unrelated) point, much of computer security (including cryptography) relies heavily on mathematics for its stability, and if you plan to continue in the field I highly recommend that you become more open-minded to its uses. This paper did not use math-out at all; rather a semi-rigorous argument showing precisely what the average decrease in security is.2011-05-23
  • 0
    @Brandon Carter: I was a little confrontational. I'm frustrated, but shouldn't have taken it out on @Pete L. Clark. I do not want to be a cryptographer. I simply want to understand the results of cryptographic analysis to make better implementation decisions. I think the role of rigorous mathematical proofs in papers is very important, I think the role of less formal and more accessible explanations is even more important. But this is a complex discussion to have in the comment section of a stack exchange question.2011-05-23
  • 0
    @Brandon Carter: I consider the fact the number of pre-images is not 1 to be common sense obvious. The fact that it's $\frac{1}{1-\frac{1}{e}}$ is not obvious. I explained it to my girlfriend this morning in terms of an infinite sized scrabble bag with a fixed set of numbers in it and drawing from it to fill in a fixed set of slots. It's obvious that the probability of every number in the scrabble bag appearing in a slot is low if the number of slots is the same as the number of different numbers in the bag.2011-05-23