3
$\begingroup$

I am looking for software/algorithm for the smallest context free grammar describing a finite set of words (and no other words).

For a single word I found sequitur

Related to this seems: given a CFG what is the shortest functionally equivalent CFG?

I would prefer the size to be measured by the CNF (for a precise estimate of the size).

To clarify I know the set can be described by a regular expression, this question is about the smallest CFG.

  • 4
    Since you want to de$s$cribe *precisely* a finite set, you can never use recursion. Therefore, all you can do is look for (longest) common subsequences. In terms on non-terminals, surely the trivial (non-CNF) grammar just deriving all words from the starting symbol is minimal.2011-02-05

3 Answers 3

2

The GAP package Automata can do the following (quoting its homepage):

  • compute a rational expression for the language recognized by a finite automaton;

  • compute an automaton for the language given by a rational expression;

  • minimalize a finite automaton;

  • visualize automata, using the external program GraphViz;

Hope it may be helpful here.

1

Any finite set of words is recognized by a finite state automaton (i.e., a regular expression) and therefore does not need the expressive power of a CFG. You can build a deterministic or non-deterministic automaton and minimize it. To do this programmatically, use a Lex tool from the Lex/Yacc tool family. The size of the automaton (typically the number of states is used) is a measure for the complexity of the set.

  • 0
    For the string $a^{2^k}$ CFG give exponential advantage over regular expressions via doubling B->A A.2011-02-06
0

For a finite set of words you can just concatenate them with differente separator symbols and use approximations for the Smallest Grammar Problem.

You may find some here:

http://www.lix.polytechnique.fr/~ponty/index.php?lang=en&css=bioinfo&page=grammarapproximations

I developed some others which perform better for my PhD. Ping me if you are interested in the code/papers.