1
$\begingroup$

Problem $L = \{ w \in \{a, b\}^* : n(a) \neq 2 \cdot n(b)\}$

This can be done easily with NPDA, but I couldn't find a way to make it work with CFG. My idea was to break it into 2 cases: $n(a) > 2 \cdot n(b)$ or $n(a) < 2 \cdot n(b)$. I first try to generate a language which makes them equal, then add more $a$ or more $b$ but it couldn't handle many special cases. Here is what I got: $$ S \rightarrow U | V $$ $$ U \rightarrow TaU | TaT$$ $$ V \rightarrow TbV | TbT$$ $$ T \rightarrow TaTaTbT | TaTbTaT | TbTaTaT$$

$U$ generates $n(a) > 2 \cdot n(b)$, $V$ generates $n(a) < 2 \cdot n(b)$, and $T$ takes care of the equal case. Any idea would be greatly appreciated.

Update Following Hint by Brian
enter image description here

2 Answers 2

2

Here’s a less general but more direct approach than mercio’s.

Your basic idea is fine, but as it stands, your grammar does not generate any terminal words: at the very least you need to expand that last production to

$$T\to TaTaTbT \mid TaTbTaT \mid TbTaTaT\mid\varepsilon\;.$$

Let $L'$ be the set of strings with exactly twice as many $a$’s as $b$’s. Think about the kind of string that should be generated by $U$, the ones with too many $a$’s. Every $b$ in such a word can be incorporated into a member of $L'$. Adjacent members of $L'$ can be concatenated into longer members of $L'$. If there’s more than one of these maximal members of $L'$, they must be separated by non-empty strings of $a$’s. There might also be strings of $a$’s at the beginning and end. Thus, the words generated by $U$ must have the form $a^*(L'a^+)^+L'a^*$. It would be convenient, therefore, to have a non-terminal $A$ that generates $a^*$, and another, $U_1$, that generates $(L'a^+)^+$. We might try something like this:

$$\begin{align*} &A\to Aa\mid \varepsilon\\ &U\to AU_1TA\\ &U_1\to U_1TaA\mid TaA \end{align*}$$

Certainly $A$ generates $a^*$, and $U_1$ generates $(L'a^+)^+$, so $U$ generates $a^*(L'a^+)^+L'a^*$. This may not be the most efficient solution, but it appears to work, and the same idea can be used to handle $V$.

  • 0
    I wonder how could you generate the string with less $a$? Because if you take $T$, it always generates at least 2 $a's$ each time. For example, string $bab$ should be ok. By the way, I forgot an empty string in my $T$, thanks for pointing that out.2011-12-11
  • 0
    @Chan: All strings with an odd number of $a$’s are in $L$, and you can easily generate them separately. Then you can use the method above to generate the strings that have an even number of $a$’s but too many $b$’s.2011-12-11
  • 0
    I understand the idea, but the order of stacking more $b$ is really annoying. In fact, I was able to figure out the case $n(a) > 2n(b)$. But the other condition is a bit tricky, and again I ran into the situation that I have to break it up into many subcases.2011-12-11
  • 0
    @Chan: $O\to BaBaBO\mid aB$ and $B\to Bb\mid b$ will get all strings with an odd number of $a$’s: you get $(b^*ab^*ab^*)^*ab^*$.2011-12-11
  • 0
    So is $O$ another special case besides $U$ and $V$?2011-12-12
  • 0
    @Chan: Yes; it’s the case of an odd number of $a$’s. It overlaps the $U$ and $V$ cases, but that’s harmless.2011-12-12
  • 0
    I posted my solution based on your hint, I kinda see it but I wasn't still able to fit the odd case in. Like you said, they're overlap, so should these "odd" cases already included in $V$? By the way, thanks a lot for your help.2011-12-12
2

First, the following grammar produces the language $L_0$ containg words where $n(b) \neq n(a)$ $$S \rightarrow U \;|\; V $$ $$U \rightarrow TbT \;|\; TbU$$ $$V \rightarrow TaT \;|\; TaV$$ $$T \rightarrow bTaT \;|\; aTbT \;|\; \epsilon $$ It follows your idea of making two cases, and $T$ is the one giving words with equality and is the only non-obvious (but classic) one.

We have a map from $L$ to $L_0$ by transforming every $b$ into an $bb$. Let's try to pull this grammar back through this map. In order to do this, we need a grammar for the image of this map :

We introduce new symbols $U_b, V_b, T_b$ where $b U_b$ is intended to decompose into the words that $U$ can decompose into but beginning with an $b$. We do the same and have new symbols $U^b, V^b, T^b$ where $U^b b$ decomposes into the words $U$ can decompose into but ending with an $b$.

We get (you have to be careful when a $T$ appears because it can be empty) : $$ S \rightarrow U \;|\; V $$ $$ U \rightarrow (T^b b)bT \;|\; Tb(bT_b) \;|\; (T^b b)bU \;|\; Tb(bU_b) $$ $$ U_b \rightarrow (T_b^b b)bT \;|\; T_b b(bT_b) \;|\; T \;|\; (T_b^b b)bU \;|\; T_bb(bU_b) \;|\; U $$ $$ V \rightarrow TaT \;|\; TaV $$ $$ T \rightarrow b(bT_b)aT \;|\; a(T^bb)bT \;|\; aTb(bT_b) \;|\; \epsilon$$ $$ T_b \rightarrow TaT$$ $$ T^b \rightarrow b(bT_b)aT^b \;|\; a(T^bb)bT^b \;|\; aTb(bT_b^b) \;|\; aT$$ $$ T_b^b \rightarrow TaT^b$$ which describes the words of $L_0$ where every $b$ appears in a pair $bb$.

Now we can pull back and simplify the two symbols having only one rule and get this :

$$ S \rightarrow U \;|\; V $$ $$ U \rightarrow T^b bT \;|\; TbTaT \;|\; T^b bU \;|\; TbU_b $$ $$ U_b \rightarrow TaT^b bT \;|\; TaTbTaT \;|\; T \;|\; TaT^b bU \;|\; TaTbU_b \;|\; U $$ $$ V \rightarrow TaT \;|\; TaV $$ $$ T \rightarrow bTaTaT \;|\; aT^bbT \;|\; aTbTaT \;|\; \epsilon$$ $$ T^b \rightarrow bTaTaT^b \;|\; aT^bbT^b \;|\; aTbTaT^b \;|\; aT$$

Which should work, if I didn't mess up. Note that this method can be generalized to pulling back any grammar through any such map.

  • 0
    Your CFG accepts $aab$, but rejects $bab$.2011-12-11
  • 0
    @Chan that's right. I think I should switch $a$ with $b$ everywhere. I removed words with 2n(a)=n(b) instead of the other way around.2011-12-12