11
$\begingroup$

The Myhill-Nerode Theorem gives an exact characterization of the regular languages. Given any language, one can check whether it meets the criteria of the Myhill-Nerode theorem to decide whether or not it is regular. Note that this is stronger than the pumping lemma for regular languages, which gives a necessary (but not sufficient) condition for a language to be regular.

Is there an analogous result for context-free languages? That is, is there a theorem that gives a necessary and sufficient condition for a language to be context-free?

Thanks!

  • 0
    @QiaochuYuan- That's true, but by the same token the pumping lemma isn't all that useful either, since if the language is described as a TM you can't use the pumping lemma on it. I was figuring that the languages were presented in some form of set-builder notation, if that's reasonable.2012-03-09

4 Answers 4

3

I posted this as an answer to this question on CS, but I think it is also relevant for this one.

There is a very nice characterization of context-free languages (credited to Wechler) in the paper by Berstel and Boasson, Towards an algebraic theory of context-free languages. Let me introduce a few definitions in order to state this result (Theorem 3.1 in the paper).

The polynomial closure $Pol(\mathcal{L})$ of a class $\mathcal{L}$ of languages of $A^*$ is the set of all languages that are finite unions of products of the form $L_0a_1L_1 \dotsm a_nL_n$, where $L_0, ..., L_n \in \mathcal{L}$ and $a_1, ..., a_n \in A$.

An algebra is a class $\mathcal{A}$ of languages containing the language $\{1\}$ and such that $\mathcal{A} = Pol(\mathcal{A})$. It is finitely generated if $\mathcal{A} = Pol(\mathcal{F})$ for some finite set $\mathcal{F}$ of languages. It is stable if, for each $L \in \mathcal{A}$ and $u \in A^*$, the language $u^{-1}L = \{v \mid uv \in L\}$ also belongs to $\mathcal{A}$. Note that it suffices to have $a^{-1}L \in \mathcal{A}$ for all $a \in A$.

Theorem. A language of $A^*$ is context-free if and only if it belongs to a finitely generated stable algebra.

See the article for illuminating examples and many nice consequences.

6

This is probably not really what you are looking for, but it's the best I know. It's a strengthening of the fairly well-known Parikh's Theorem.

There is a characterisation of the bounded context-free languages. A language $L$ is bounded if $L\subseteq w_1^* w_2^* \ldots w_n^*$ for some fixed words $w_1,\ldots,w_n$, in which case we can define a corresponding subset of $\mathbb{N}_0^n$:

$\Phi(L) = \{(m_1,m_2,\ldots,m_n) \mid w_1^{m_1} w_2^{m_2} \ldots w_n^{m_n}\in L\}.$

By a theorem of Ginsburg and Spanier (Thm 5.4.2 in Ginsburg's book 'The Mathematical Theory of Context-free Languages'), a bounded language $L$ is context-free if and only if $\Phi(L)$ can be expressed as a finite union of linear sets, each with a stratified set of periods. For the definitions of the terms in the last sentence, see my MO question.

This characterisation can be very useful for proving (not necessarily bounded) languages not to be context-free. If we can find words $w_1,\ldots,w_n$ such that $L\cap w_1^*\ldots w_n^*$ is not context-free, then $L$ is not context-free either, since $w_1^*\ldots w_n^*$ is a regular language, and the intersection of a context-free language with a regular language is context-free.

5

Myhill-Nerode theorem is very closely related to the following theorem: language $L$ is regular if and only if its syntactic monoid is finite. This also gives you precise characterization of regular languages. Moreover, majority of definitions regarding regular languages can be nicely translated to group theory.

To answer you question, the mentioned theorem can be generalized a bit, i.e. a group is context-free if and only if it contains a finitely generated free subgroup of finite index (compare with this article). Unfortunately this doesn't make anything easier, still it is a characterization of context-free languages that are word problem for some group (to be clear those are not the all context-free languages).

Hope this helps ;-)

Edit: fixed broken link.

Edit 2: made clear that those are not the only context-free languages. Loony me, it must have been a bad digestion (and the phase of the moon too, naturally), that made me write the previous version, sorry for that!

  • 0
    @TaraB You are definitely right!2012-03-13
3

If you interpret the Myhill-Nerode theorem as being about the algebraic structure of the corresponding family of automata and the monoid associated with that family, then there is an interesting necessary and sufficient generalization to context-free languages. One version of this is here -- http://arxiv.org/abs/math.GR/0601061. It is, in effect, that a language $L$ is context free iff it is accepted by a polycyclic monoid automaton iff it is accepted by a free group automaton.

For both these kinds of automata, the monoid or group acts as secondary storage attached to a non-deterministic finite-state control mechanism. Transitions in secondary storage are accomplished by right multiplication in the monoid or group, and the automaton starts with the identity 1 in the store and accepts only if it ends in an accepting state with 1 again in the store. The finite-state control cannot directly "read" the monoid/group store, but it essentially blocks based on what happens in the store, and that, combined with the non-determinism in the finite-state control, confers a lot of computational power on the combination.

The complete state set for the resulting automaton is thus $Q \times M$ where $Q$ is the state set of the finite state control and $M$ is the monoid (or group). For regular languages, the state set is just $Q$, which is finite, or $Q \times M$ where $M$ is just $\{1\}$ or finite. For CFLs $M$ has the structure stated above. So the generalization is about the relationship between the family of languages and the structure of $M$.

To finish the story, a polycyclic monoid is strings over a finite set of symbols X^' = \{L_x, R_x: x \in X \} for some finite set of symbols $X$. The size of $X$ is the rank of the polycyclic monoid. Multiplication in this monoid is given by: $L_x R_x = 1$, $x \neq y \Rightarrow L_x R_y = 0 $, anything else just freely concatenates the right-hand symbol. Think of $L_x$ a pushing x onto a stack (right hand end of a string), $R_x$ as popping x. Trying to pop a symbol that isn't on top blocks the computation. So a polycyclic monoid is mimicking a pushdown store with k symbols, and it's not hard to see that automata using it as a store define the context-free languages.

That makes it seem that we've just trivially algebraicized the intuitive notion of a pushdown store, and indeed we have. But a polycyclic monoid is definable in a number of other ways, for example as the free group on k symbols, with $R_x$ acting as $L_x^{-1}$, and then things get more interesting -- Context-free-ness and group-ness are deeply related, as dtldarek points out, and the finiteness of the whole automaton for regular languages finds an analogy in the finite index of the free group.

  • 0
    @DavidLevis Yeah, I have checked it out. I happily upvoted it, but as for now that's the only thing I can do, besides, that's probably a lost case (not the reason why he didn't do that though).2012-03-12