Say I want to find the largest regular language containing one set of strings but not another set.
So, I've got a programming problem and I've posed it in a good form above. Now what? How do I find a paper that talks about this or find out if it exists?
What I have access to:
- Directory of Open Access Journals
- arXiv
- "Mathematics on the Web" index
For searching:
-Google Scholar - Scirus - CiteSeerX
And to my recent surprise some of the articles in "Annals of Math" are available freely in full text.
And if the paper is interesting enough I would pay a fee. Now, why is it so hard for me to find whether or not someone has answered this question? It shouldn't be so tedious. [End Rant]
So, the question. Given two sets $S$ and $T$ of $n$ text strings each that are on average $10^4$ characters long. I need an algorithm that finds the largest regular language containing all of $S$ and none of $T$.
This is interesting because I've already got a use for it. Given a set of example web pages that satisfy a certain property (like a set of Annals of Math pages on which there is a link for Full Text), and a set that don't, it should construct a regular expression that I could use to crawl a site and find all pages that satisfy the property (and list all free Annals papers on a web page). Now I could just construct a simple regex to do that, but then I'd have to do that each time and the regex required may not be as simple as *Full Article*
Oh yeah, the algorithm needs to construct a regular expression to represent the language.