3
$\begingroup$

Let $A$ be a finite alphabet with $|A| = n$.   Is there an elementary formula for counting the number of strings over $A$ that don't contain any repeated substrings of length greater than $1$? More formally, let $t$ be a substring of a string $s$ over $A$ with $|t| > 1$.   $t$ is said to repeat in $s$ if there is at least two disjoint copies of $t$ in $s$. For example $s=abaaab$ is not counted since $ab$ is a repeat, however $s=abaaa$ contains no repeats since $aa$ has no two disjoint copies.

I've tried your basic discrete math, balls-in-bins, type solutions, but each time end up over counting. I know that for fixed $n$, there is a finite number of such strings, and a bound on their length.

  • 0
    @joriki: No, I don't see a reason for it, and I guess I wouldn't conjecture from only four terms that it is definitely always the case.2012-04-18

1 Answers 1

1

Not an exact number but is possible to give a somewhat decent lower bound for the values this can take. First of all, note that we are fine looking only at substrings of length 2, as a repeated substring of greater length would imply a repeated substring of length 2 also. Moreover, disjointedness for these is only an issue for substrings of the form 'aa', not of those of the form 'ab'. Consider a 'reduced' string which has no substrings of the form 'aa'. From this, valid strings for our language can be built up by multiplying at most one occurence of each letter into 2 or 3 occurences, eg. multiply a into 3 in abc gives aaabc. Note that if a 'reduced' string has two or more occurences of say 'a', then we can pump only one of them like this else 'aa' will get repeated. Similarly for other letters of the alphabet. In this way, if a 'reduced' word has $k_1$ of the first letter, $k_2$ of the second one, ... $k_n$ of the nth one, then it can be turned into $(1+2k_1)(1+2k_2)...(1+2k_n)$ words. Moreover, each word corresponds to a unique reduced word.

Note that the length of a reduced word can be of length at most $n(n-1) + 1$, as number of substrings of length 2 not of the form 'aa' is $n(n-1)$. We can show that this can actually be attained, by induction:

Induction hypothesis is that this string exists for n letters, and begins and ends with the nth letter.

For n = 2, just use say $a_2a_1a_2$.

Suppose we have a string like this say $a_nsa_n$.of length $n(n-1)$. Consider the string $a_{n+1}a_1a_{n+1}a_2a_{n+1}...a_{n-1}a_{n+1}a_nsa_na_{n+1}$.

We can easily see that this satisfies the condition. Its length = $n(n-1) + 2(n-1) + 2 = n(n+1)$.

In this, if the elements $a_1, a_2, ... a_n$ are replaced by some $\sigma(a_1), \sigma(a_2), ... \sigma(a_n)$ where $\sigma$ is some permutation of $(1 ... n)$, then too the condition will hold. This immediately gives us $n!$ such words, so a polynomial bound is out of the question. Moreover, in each of these, the first letter comes n times while the rest come $n-1$ times, so each can be turned into $(1+2n)(1+2(n-1))^{n-1}) = (2n+1)(2n-1)^{n-1}$ more valid strings. So we get a lower bound of $(2n+1)(2n-1)^{n-1}n!$ for number of valid strings.

  • 0
    The important thing is that the length of such no-repeat strings is bounded.2012-04-17