2
$\begingroup$

My question is how to prove that $\Sigma$-formulas are uniquely readable (in our course this was wasn't really proved - in the proof it said just "proof by induction", but I'm confused what was meant by that, this we aren't dealing with numbers on which I can induct; was an induction by the length of the formula meant ? A colleague said,, that probably induction on the construction of the formula was meant, but I don't know what this menas either). I would very much like to see a proof, so that I can get the idea and use this type of proof later on my own, since a lot of "proofs by induction" which involve formulas follow...

Sadly, for this I have to introduce some definitions since they seem to vary from book to book and I haven't found a book, which uses similar definitions to the ones in your course (and since my questions are very much about the details, I can't use a different seet of defintions). So after these apologies, here comes the definition:

We have defined the set $\Sigma_{Term}$ of $\Sigma$-formulas as the smallest set of strings over the alphabet $\Sigma\cup\mathbb{V}$ (where $\Sigma$ is the signature and $\mathbb{V}=\left\{v_0,v_1,v_2,\ldots \right\}$ is a countable set of variables) such that:

$ \blacktriangleright$ every variable is a term

$ \blacktriangleright \ \ \ \mathtt{ft_1\ldots t_n}\in \Sigma_{Term}$ if $\mathtt{t_1,\ldots ,t_n}$ were $\Sigma$-terms and $\mathtt{f}$ was a symbol for a function of arity $n$ in the signature $\Sigma$

  • 3
    Proving unique readability is different than most proofs about formulas. Most of those use structural induction, but you can't use that alone for proving unique readability, because the whole point of unique readability is to prove that there is only one way to recover structure from a string. In other words structural induction is generally used for things that have already been parsed. Proofs of unique readability often involve an induction on the length of the string, as in my proof below.2011-06-24

1 Answers 1

6

Your definition in the question just defines terms, not formulas. Here is how to prove unique readability of terms.

Say that a string $s$ encodes a sequence of terms if there is a sequence $t_1, \ldots, t_n$ such that $s$ is the concatenation of the string for $t_1$ followed by the string for $t_2$ and so on up to the string for $t_n$. Here I use "term" for the semantic object and "string" for the syntactic one being parsed.

Next, suppose for a contradiction that $s$ is a nonempty concatenation of nonempty strings representing terms which is of minimal length with respect to the property that $s$ can be parsed two ways, i.e., there are two different sequences of terms the concatenation of whose string makes $s$. In other words there is more than one way to parse $s$ into a sequence of terms, but no shorter string which encodes a sequence of terms has this property. If there is any string with that non-unique-readability property there is one of minimal length.

$s$ cannot be empty, there is only one way to parse that.

If $s$ starts with a variable then that variable must be the first term in any sequence of terms that makes up $s$, so the rest of $s$ must be able to be parsed in two ways, and the rest is a shorter string.

Thus we can assume $s$ starts with a function symbol $f$ of some arity $n$. Now look immediately after $f$. Because of the definition of terms, $f$ must be followed by a concatenation of strings for $n$ terms. So if we remove $f$ from the front of $s$ to get s', then s' is still a concatenation of terms ($n-1$ more terms than were in $s$, in a sense). Because $s$ was minimal, s' can be parsed into a sequence of terms in only one way. But then $s$ also can be parsed into a sequence in only one way: the first $n$ terms of s' go with $f$.

So the parsing algorithm is simply to repeat the following as long as the input is nonempty:

  • If you see a variable, take it from the input
  • If you see a function symbol, find its arity $n$, and then call this algorithm recursively to take $n$ terms from the input

When the input runs out, if you were in the middle of filling some function symbol's arguments, the string is invalid. Otherwise it is valid.

This sort of parsing algorithm is common for systems that use Polish notation, and its simplicity is one of their key advantages. Parsing for infix operators is much more difficult, because it requires parentheses to get unique readability.