3
$\begingroup$

Let $A$ be a finite alphabet with $|A| = n$.   Is there an elementary formula for counting the number of strings over $A$ that don't contain any repeated substrings of length greater than $1$? More formally, let $t$ be a substring of a string $s$ over $A$ with $|t| > 1$.   $t$ is said to repeat in $s$ if there is at least two disjoint copies of $t$ in $s$. For example $s=abaaab$ is not counted since $ab$ is a repeat, however $s=abaaa$ contains no repeats since $aa$ has no two disjoint copies.

I've tried your basic discrete math, balls-in-bins, type solutions, but each time end up over counting. I know that for fixed $n$, there is a finite number of such strings, and a bound on their length.

  • 0
    It might be helpful to find the minimal $m$ for which a string of length $m$ _must_ have a repeat. Then $n^m$ is an upper bound.2012-04-17
  • 0
    I know, but it would be great to have a formula for what I'm doing, and even greater if it were a polynomial in |A|. I drew a graph with $n$ nodes labeled with letters in $A$, and drew a loop from each node two itself and labeled it '2', and a node to every other node labeled it '1'. So the count would be equivalent to the number of paths on in the graph such that '2' edges are only traversed twice and '1' edges once.2012-04-17
  • 0
    @Dan: The '2' edges could only be traversed twice in immediate succession, not twice in general.2012-04-17
  • 0
    I'm trying to figure out a formula for the first few k, and see if there's pattern. Will post what I've got tomorrow.2012-04-17
  • 0
    A thought: work out by hand or computer the first few values for $|A|=1,2,3,\ldots$ and then check if there is any information on [OEIS](http://oeis.org/).2012-04-17
  • 0
    Here's a graph reformulation, just because it's a fun exercise even if it does not help. How many nodes ("nonrepeat" words) are are in a maximal tree having (1) a single distinguished root (the empty word) (2) with $|A|$ colorless edges emanating from the root (3) with all other edges colored using $|A|^2$ colors (words of length 2) (4) such that no no node has a repeated color emanating from it (in a direction away from the root) and (5) no root-to-leaf branch has a repeated color unless that color is one of the first $|A|$ colors and the repeated colorings are adjacent?2012-04-17
  • 0
    The first four terms are $4,55,8254,32544157$. [Here's the code](https://gist.github.com/2408136) to produce those counts. This isn't in OEIS.2012-04-17
  • 0
    @joriki: I thought it was possible that not including the empty word might give something in OEIS, but it doesn't, even when divided by 3.2012-04-18
  • 0
    @Tara: I'd tried without the empty word, too. I hadn't tried dividing by $3$; I hadn't realized the numbers without the empty word are all divisible by $3$. Do you see a reason for that?2012-04-18
  • 0
    @joriki: No, I don't see a reason for it, and I guess I wouldn't conjecture from only four terms that it is definitely always the case.2012-04-18

1 Answers 1

1

Not an exact number but is possible to give a somewhat decent lower bound for the values this can take. First of all, note that we are fine looking only at substrings of length 2, as a repeated substring of greater length would imply a repeated substring of length 2 also. Moreover, disjointedness for these is only an issue for substrings of the form 'aa', not of those of the form 'ab'. Consider a 'reduced' string which has no substrings of the form 'aa'. From this, valid strings for our language can be built up by multiplying at most one occurence of each letter into 2 or 3 occurences, eg. multiply a into 3 in abc gives aaabc. Note that if a 'reduced' string has two or more occurences of say 'a', then we can pump only one of them like this else 'aa' will get repeated. Similarly for other letters of the alphabet. In this way, if a 'reduced' word has $k_1$ of the first letter, $k_2$ of the second one, ... $k_n$ of the nth one, then it can be turned into $(1+2k_1)(1+2k_2)...(1+2k_n)$ words. Moreover, each word corresponds to a unique reduced word.

Note that the length of a reduced word can be of length at most $n(n-1) + 1$, as number of substrings of length 2 not of the form 'aa' is $n(n-1)$. We can show that this can actually be attained, by induction:

Induction hypothesis is that this string exists for n letters, and begins and ends with the nth letter.

For n = 2, just use say $a_2a_1a_2$.

Suppose we have a string like this say $a_nsa_n$.of length $n(n-1)$. Consider the string $a_{n+1}a_1a_{n+1}a_2a_{n+1}...a_{n-1}a_{n+1}a_nsa_na_{n+1}$.

We can easily see that this satisfies the condition. Its length = $n(n-1) + 2(n-1) + 2 = n(n+1)$.

In this, if the elements $a_1, a_2, ... a_n$ are replaced by some $\sigma(a_1), \sigma(a_2), ... \sigma(a_n)$ where $\sigma$ is some permutation of $(1 ... n)$, then too the condition will hold. This immediately gives us $n!$ such words, so a polynomial bound is out of the question. Moreover, in each of these, the first letter comes n times while the rest come $n-1$ times, so each can be turned into $(1+2n)(1+2(n-1))^{n-1}) = (2n+1)(2n-1)^{n-1}$ more valid strings. So we get a lower bound of $(2n+1)(2n-1)^{n-1}n!$ for number of valid strings.

  • 0
    That makes sense - at least $n!$ permutations.2012-04-17
  • 0
    The important thing is that the length of such no-repeat strings is bounded.2012-04-17