1
$\begingroup$

The following is an exercise in a book I am reading:

Let $\Sigma=\{a,b,c\}$, define $L$ to be the language of all words over $\Sigma$ that do not contain $ab$ as a sub-word. Find a regular expression for $L$.

I am unable to solve this problem, I would like to know how to tackle this type of question, what is the solution and what is the thought process to get to it

  • 0
    @GerryMyerson, yes. I saw the keywords. Sorry, but I only know the algebra.2012-08-21

1 Answers 1

4

One easy if tedious approach is to design a finite-state automaton that recognizes $L$ and then apply the algorithm that converts such an automaton to a corresponding regular expression.

Going about it directly, I observe that if $w\in L$, every $a$ in $w$ must be immediately followed by another $a$ or a $c$ or be at the end of the word; otherwise there are no restrictions. Similarly, every $b$ must be immediately preceded by another $b$ or a $c$ or be at the beginning of the word. Assume for the moment that $w$ does not begin with $b$ or end with $a$. If $w$ begins with $a$, it must begin $aa^*c(b\mid c)^*$. This pattern may be repeated any number of times: $\big(aa^*c(b\mid c)^*\big)^*$. To allow it to end with $a$, just tack on $a^*$: $\big(aa^*c(b\mid c)^*\big)^*a^*$. To allow it to begin with $b$, prefix $b^*$: $b^*\big(aa^*c(b\mid c)^*\big)^*a^*$. In fact, a little thought reveals that we can instead prefix $(b\mid c)^*$ to cover all cases:

$(b\mid c)^*\big(aa^*c(b\mid c)^*\big)^*a^*\tag{1}$

Added: Gerry Myerson’s approach in the comments is more elegant, though I’d carry it out a little differently. $\Sigma^*=(a\mid b\mid c)^*$, and we want to keep all of it that doesn’t have an $a$ followed immediately by a $b$. Thus, except at the end of a word we want to replace the selection $a$ by one of $ac,aac,aaac$, etc. If we allowed infinite disjunctions, this would give us $(b\mid c\mid ac\mid aac\mid aaac\mid\dots)^*a^*\tag{2}$ for $L$, where the last $a^*$ is to allow the word to end with $a$. We don’t allow such expressions, but $ac\mid aac\mid aaac\mid\dots$ can be written legitimately as $aa^*c$, and $(2)$ can then be written legitimately as $(aa^*c\mid b\mid c)^*a^*\;.\tag{3}$

It’s not hard to see that the languages described by $(1)$ and $(3)$ are subsets of $L$: neither regular expression permits $ab$. To see that these languages include all of $L$, note first $\lambda$, the empty word, is in both. Now let $w=x_1^{n_1}\dots x_m^{n_m}\in L$ be non-empty, where $x_k\in\Sigma$ and $n_k\in\Bbb Z^+$ for $k=1,\dots,m$, and $x_k\ne x_{k+1}$ for $k=1,\dots,m-1$. Assume further that $w$ is minimal in length among all words of $L$ not matching one of the regular expressions $(1)$ and $(3)$.

If $a$ does not occur in $w$, $w$ clearly matches both $(1)$ and $(3)$, so let $i$ be the first index such that $x_i=a$, and let $u=a^{n_i}x_{i+1}^{n_{i+1}}\dots x_m^{n_m}$. To show that $w$ matches $(1)$, it suffices to show that $u$ matches $\big(aa^*c(b\mid c)^*\big)^*a^*$; to show that $w$ matches $(3)$, it suffices to show that $u$ matches $(3)$. Both of these are clear if $u=a^{n_i}$, since in that case $u$ matches the final $a^*$ of each regular expression. Otherwise, $x_{i+1}=c$, and $a^{n_i}c$ matches $aa^*c$ in both regular expressions. Now let $v$ be what’s left of $u$ after the initial $a^{n_i}c$ is removed. Then $w$ matches $(1)$ iff $v$ matches $(1)$, and $w$ matches $(3)$ iff $v$ matches $(3)$. But $v\in L$, and $|v|<|w|$, so by hypothesis $v$ matches both $(1)$ and $(3)$, and hence so does $w$. Thus, each of these regular expressions represents $L$.

  • 0
    @Gerry: Not that I can see, but I was deliberately sticking as close to $(2)$ as I legally could in order to simplify the explanation.2012-08-22