1
$\begingroup$

Suppose I have a simple regular expression describing a language like $(a+b)^* a?b (a+b)^*$ (a language in $\Sigma = \{a,b\}$ consisting of all words with substring $a?b$). I haven't found a general way to negate any regular expression, and it seems that no such way exists. There is similar question: A regular expression for the words that don't contain the sequence $ab$ over $\{a,b,c\}$, but the way described there quickly becomes very complicated when regular expression length increases. How to deal with this?

  • 0
    Isn't $a?b=\{ab,b\}$? Then you want all words not containing $b$, i.e. $a^*$.2012-09-09
  • 0
    No, $a?b = a \{a,b\} b$ ($?$ is any symbol).2012-09-09
  • 0
    D'oh, if you've encountered DOS wildcards and PERL r.e. in real life, you are doomed to mix things sooner or later. :)2012-09-09
  • 1
    However @Hagen's meaning of `?` is the usual meaning for `?` in "regular expressions", and the "any character" meaning is only commonly used in simple patterns that have no repetition constructs and so are _not_ called "regular expressions".2012-09-09
  • 0
    @HenningMakholm, I have some books about theory of formal languages, and the meaning of $?$ is "any character" in many of them. But in programming-related fields yes, it's as Hagen wrote.2012-09-09

1 Answers 1

2

Convert the regular expression to a finite automaton accepting $L$. Then interchange accepting and non-accepting states, which produces an automaton accepting $\Sigma^*\setminus L$. Finally, convert the automaton to a regular expression.

  • 0
    It's not so easy to build a DFA for $a?b$, and for NFA it's not easy to build a regex...2012-09-09
  • 2
    @chersanya: The regex-to-DFA is easy enough (and in most practical cases ends with a nice smallish automaton), but DFA-to-regex is very complicated and tends to produce huge resulting regexes. Unfortunately it's the best avaliable _general_ procedure for negating regexes.2012-09-09