2
$\begingroup$

Problem: Given an arbitrary, finite alphabet $\Sigma$ with $|\Sigma| > 1$, define the language

$\qquad L = \{w \in \Sigma^* \mid w \text{ has no subword of the form } aaa, a \in \Sigma\}$.

Let $L_n = L \cap \Sigma^n$. Determine $|L_n|$; either a closed form or efficient computation scheme is desired.

Example:

Let $\Sigma = \{a, b\}$. Then, $aabaa, ababa, baaba \in L_5$ but $aaabb, baaab,aabbb \notin L_5$.

Note:

This isn't homework or anything... I am a developer at Oklahoma University and this question came up in a discussion with some fellow developer friends of mine. I have some C# code that uses recursion but I believe it to be far from efficient. I can post it if anybody is interested in looking at it as well.

  • 0
    ...oh. An$d$ what those words are.2011-12-08

3 Answers 3

2

Assume there are $\ell+1\geqslant2$ letters. Call $A_n$ the set of words of length $n$ terminating with twice the same letter, $B_n=L_n\setminus A_n$, $a_n=|A_n|$ and $b_n=|B_n|$, hence $|L_n|=a_n+b_n$.

Each word in $A_{n+1}$ corresponds to a unique word in $B_n$ to which its last letter is appended, hence $a_{n+1}=b_{n}$. Each word in $B_{n+1}$ corresponds to a unique word in $L_n$ to which a letter other than its last letter is appended, hence $b_{n+1}=\ell(a_n+b_{n})$.

Thus, $b_{n+2}=\ell b_{n+1}+\ell b_n$ for every $n\geqslant0$, with $b_0=0$ and $b_1=\ell+1$. The rest is routine. Using $k=\sqrt{\ell^2+4\ell}$, one gets for every $n\geqslant1$, $ |L_n|=\frac{\ell+1}{k}\left(\left(\frac{\ell+k}2\right)^n-\left(\frac{\ell-k}2\right)^n+\left(\frac{\ell+k}2\right)^{n-1}-\left(\frac{\ell-k}2\right)^{n-1}\right). $ One can check that this is indeed an integer by expanding each of the four binomial terms and noting that the odd powers of $k$ disappear, leaving only the even powers of $k$, which are in fact polynomials in $k^2=\ell^4+4\ell$.

The fastest way to compute $|L_n|$ might be to note that $|L_n|=\ell_n\pm\vartheta^{n-1}(1-\vartheta)$ with $ \vartheta=\dfrac{k-\ell}2=\dfrac{2\ell}{\ell+k},\qquad \ell_n=\frac{(\ell+1)(\ell+k+2)}{k(\ell+k)}\left(\frac{\ell+k}2\right)^{n}. $ Since $\vartheta\in(0,1)$, for every $n\geqslant2$, the error term is at most $\pm\frac14$. One knows that $|L_n|$ is an integer, hence $|L_n|$ is the unique integer in the interval of length $\frac12$ centered at $\ell_n$.

  • 1
    @uli You might wish to read again: there is no "formula above" for $|L_0|$.2012-01-08
1

Let $a_n$ the number of words in $L_n$ that end in $ba$ for $a,b \in \Sigma, a\neq b$ and $b_n$ similarly the number of words in $L_n$ that end in $baa$. Obviously, $|L_n| = a_n + b_n$.

The following recursions hold (with $s = |\Sigma|$):

$\begin{align*}\qquad a_1 &= s \\ \qquad a_n &= (a_{n-1} + b_{n-1})\cdot(s-1) \qquad \text{ for } n > 1\end{align*}$

and

$\begin{align*}\qquad b_1 &= 0 \\ \qquad b_2 &= s \\ \qquad b_n &= (a_{n-2} + b_{n-2})\cdot(s-1) \qquad \text{ for } n > 2\end{align*}$

If you have trouble seeing that, imagine taking valid strings of length $n-1$ and $n-2$ and appending symbols up to length $n$ while maintaining validity. Now, with $c_n = a_n + b_n$ we get

$\begin{align*}\qquad c_1 &= s \\ c_2 &= s^2 \\ c_n &= (c_{n-2} + c_{n-1}) \cdot (s-1) \qquad \text{ for } n>2\end{align*}$

With the ansatz $C(z) = \sum_{n>0}c_nz^n$ you get that

$\displaystyle\qquad C(z) = \frac{sz(z + 1)}{(1-s)z^2 + (1-s)z + 1}$

Now $|L_n| = c_n = [z^n]C(z)$; unfortunately, the latter expression seems to be hard to find and/or ugly.

Fortunately, recursions of this kind can always be computed by using memoisation in time $\mathcal{O}(n)$ and space $\mathcal{O}(1)$:

calc(n : int, s : int ) {   c = new int[] {s, s*s}    if ( n <= 0 ) return 0   else if ( n == 1 ) return c[0]   else {     for ( i=3; i<=n; i++ ) {       ci = (c[0] + c[1]) * (s-1)       c[0] = c[1]; c[1] = ci     }     return c[1]   } } 

Note that this scheme also works if you can not find a nice recursion for $c_n$; just compute $a_n$ and $b_n$ simultaneously.

If you expect many calls to calc with different parameters, it might be worthwhile to store all values ever computed. Depending on your queries, this might result in amortised constant evaluation time for the price of as much memory as your largest query.

  • 0
    Hope it's correct now. The recursion for $c_n$ seems obvious now, even without the detour via $a_n, b_n$; I leave those in the answer for illustration, as this is how I proceeded to find the solution.2012-01-08
0

A string in $L_n$ contains $r$ doublets and $n-2r$ singlets, for some $r$ with $0 \le 2r \le n$. Let $S = |\Sigma|$. Then such a string has $S$ choices for its first character, and $S-1$ choices for each of the subsequent $n-r-1$ changes of character. Also, the number of possible positions of the doublets is $\binom{n-r}{r}$.

So the number of strings containing $r$ doublets is $\binom{n-r}{r}S(S-1)^{n-r-1}$. To get the total number of strings in $L_n$, just sum this over all valid $r$.

Edited to add: As uli points out in the comments, $n=0$ is a special case: $|L_0|$ is equal to $1$, not $S/(S-1)$.

  • 0
    @uli: You are absolutely right! I have edited my answer accordingly.2012-01-08