There are several papers giving graph-theoretical proofs that the smallest grammar problem cannot be solved with a polynomial time algorithm. Could someone give a short intuitive proof? I haven't studied the complexity classes and that area of computer science, but I would like an intuitive understanding of what makes it a hard problem. I'm not convinced myself.
Why is the smallest grammar problem hard?
-
0One can only wonder what the "AH" could mean in such a respectfully crafted comment. – 2012-04-28
1 Answers
The hardness of the smallest grammar problem follows from a reduction to minimum vertex cover; that is, if you could solve the smallest grammar problem in polynomial time, then you could could also solve the minimum vertex cover problem in polynomial time. And vertex cover is a classic NP-hard problem; that is, not solvable in polynomial time unless P = NP.
The "minimum vertex cover" problem involves finding the smallest number of vertices in a graph such that every edge is incident to some vertex in the set.
The reduction works by encoding the graph as a string so that adding a nonterminal to the grammar corresponds to selecting a vertex in the graph. Furthermore, adding the ideal set of nonterminals to minimize the grammar size corresponds to selecting a minimum vertex cover.
So if one could find smallest grammars in polynomial time, then one could find minimum vertex covers in polynomial time, which would imply P = NP.
Alternatively, consider problems of the following variety: compute x^257, x^483, x^1828, and x^3828 using the smallest number of multiplications. This may not seem so hard. But people have studied this problem pretty extensively, but not found even a decent approximation algorithm. And this surprisingly-hard problem can be recast pretty directly in terms of smallest grammers, e.g. what's the smallest grammar for the string:
257 X's A 483 X's B 1828 X's C 3828 X's
On the other hand, as far as I know, some straightforward algorithms for the smallest grammar problem may actually give constant approximations; that is, no family of strings is known for which those algorithms perform badly.
(Sorry in advance if this information is dated or simply wrong.)