2
$\begingroup$

I have a non-ambiguous context-free grammar. Is there some standard algorithm to create list of all the words in the language the CFG defines?

This can be done with an abvious brute-force search by listing all strings in lexicographic order and testing which belong to the language, but I would prefer some more tractable method.

  • 0
    Oh...I overlooked your explanation of brute-force as essentially generate $\Sigma^{*}$ and test. Then dsg's answer sounds simplest, doing what the grammar says on sets of words (which you can restrict to length. That algorithm is conceptually the right thing, but depending on the size of the sets, can also be slow for small length (too much set manipulation), depending on the number of terminals.2011-05-02

1 Answers 1

2

Yes, this can be done using a recursive algorithm called expand as follows.

expand will take in a node in the grammar and return a set of trees that can be generated from that node.

In the base case, expand is given a terminal and returns a singleton set containing a leaf (one node tree) for that terminal.

In the recursive case, expand is given a node, say $X$, that can be rewritten as $Y_1 Y_2 \ldots Y_n$ via CFG rule $X \to Y_1 Y_2 \ldots Y_n$. expand combines the trees, $z_i$, returned from the recursive calls on each of the $Y_i$ using a Cartesian product. That is, expand(X) will return the following set of trees: $\left\{ X \to z_1 z_2 \ldots z_n \mid z_i \in \text{expand}(Y_i) \right\}$. Here, each tree in the set is rooted at $X$ and contains the trees from the recursive calls as its children.

EDIT

For efficiency, you can use a different collection instead of sets (e.g. array / list). As long as there is no spurious ambiguity, you will compute the same answer. If the grammar turns out to be infinite, replace the sets in the algorithm with lazy streams, and it will continue to work.

  • 0
    @rank -- It's easier to implement for CFGs in Chomsky Normal Form (production rules of arity $\leq 2$) because the Cartesian product is no harder than a simple list comprehension (`cartProd xs ys = [(x,y) | x <- xs, y <- ys]`). StackOverflow has some nice pointers for larger products.2011-05-03