2
$\begingroup$

Whilst solving a question, I have come across a problem regarding the maximal number of possible distinct $k$-length binary sub-strings in an $n$-length binary string.

My thought process was that if you take some $n$-length binary string, then the number of possible sub-strings could be found as follows:

$\sum_{k=1}^{n}{n-k+1}=n^{2}-\frac{n(n+1)}{2}+n=\frac{n^{2}+n}{2}$

But this doesn't take into account the maximum possible number of distinct sub-strings, which will be $2^{k}$, when we're looking for sub-strings of length $k$.

Does anyone have any suggestions as to how I could find the maximal number of distinct sub-strings (of length $k\le n$) contained within some binary string of length $n$?

Thanks in advance!

  • 0
    Calculate it for some small values of $n$, then look it up in the Online Encyclopedia of Integer Sequences.2012-06-16

2 Answers 2

1

I took my own advice: it's "Maximal number of distinct nonempty substrings of any binary string of length n." It seems that there is a question about this sequence that is very simple to state but is still open.

  • 1
    That's my interpretation of what it says at the OEIS.2012-06-16
1

"On the maximum number of distinct factors of a binary string" (also here), by Jeffrey Shallit, proves that the answer (the attained maximum number of distinct factors of a length-$n$ binary string) is $\binom{n-k+1}{2}+2^{k+1}-1,$ where $k$ is the unique integer such that $2^k + k - 1\ \le\ n\ \lt\ 2^{k+1}+(k+1)-1.$ (This counts the empty string as a factor of every string.)


NB: A "closed form" for the answer can be found by noting that $k$ is just $\lfloor x \rfloor$, where $x$ is the real number such that $2^x + x - 1=n.$ Now, the substitutions $\begin{align} X &:= \log(2)\ (n-x+1)\\ Y &:= \log(2)\ 2^{n+1} \end{align} $ transform this equation into $X\ e^X = Y$ whose solution is $X = W(Y),$ where $W$ is the (principal branch of) the Lambert $W$ function.

Therefore, $\log(2)\ (n-x+1) = W(\log(2)\ 2^{n+1})$ giving
$k = \left\lfloor n+1-\frac{W(\log(2)\ 2^{n+1})}{\log(2)}\right\rfloor.$


NB: Some comments at the OEIS link refer to the above results as "conjectures", in spite of the proof given in the cited paper.