1
$\begingroup$

We have the following balanced brackets permutations of length $4\cdot 2$ in lexicographical order:

1.  (((()))) 2.  ((()())) 3.  ((())()) 4.  ((()))() 5.  (()(())) 6.  (()()()) 7.  (()())() 8.  (())(()) 9.  (())()() 10. ()((())) 11. ()(()()) 12. ()(())() 13. ()()(()) 14. ()()()() 

I tried an algorithm by Frank Ruskey found by user MJD in Knuth's "The art of computer programming, volume 4A". However, it prints the $n$-th string from the end so I'd like to ask for a hint: how can I reverse the algorithm?

I know that I could just replace the argument $N$ with $C(n)-N$ but as $n<5000$ I'd have to use bignums to write the program and I believe it's unnecessary. Moreover, in that case I'd waste time for calculating $C(n)$ and Mr Ruskey invented the algorithm so that it wouldn't be necessary to calculate $C(x)$ every time as it's calculated at the beginning and modified in progress.

Here's an explanation of this algorithm. Note that there are $C(n)$ such strings (where $C(n)$ is the $n$-th Catalan number).

  • 0
    I don't know if that's few but I have to calculate up to $10^{18}$th string. So bignums are not necessary, but I don't know how to calculate the number of () at the end or the beginning.2012-09-01

1 Answers 1

3

After some study of the material from Ruskey’s thesis, I was able to come up with the following algorithm more or less in the style of Knuth’s version. I don’t guarantee its correctness; minor programming errors are certainly possible. I have, however, included an example on which it works correctly. I haven’t the facilities to code the algorithm and test it more extensively, so I’d appreciate feedback if you try it.

  1. Set $p\leftarrow 0,c\leftarrow 1$. While $p, set $p\leftarrow p+1,c\leftarrow\dfrac{2(2p-1)}{p+1}c$.

  2. Set $q\leftarrow n,c\leftarrow\dfrac{6(p+1)(2p+1)}{(p+2)(p+3)}c$.

  3. Set $a_1=\text{'('},m\leftarrow 2$.

  4. If $m>2n$, stop.

  5. If $p<0$, set $a_m\leftarrow\text{')'},m\leftarrow m+1$ and return to step (4).

  6. If $N\le c$, set $a_m\leftarrow \text{'('},m\leftarrow m+1,p\leftarrow p-1,c\leftarrow\dfrac{(p+1)(q-p+1)}{(q+p+1)(q-p)}c$, and return to step (4).

  7. Set $a_m\leftarrow \text{')'},m\leftarrow m+1,N\leftarrow N-c,q\leftarrow q-1,c\leftarrow \dfrac{(q-p+1)(q+2)}{(q+p+1)(q-p+2)}c$. If $q-p<2$, set $a_m\leftarrow\text{'('},m\leftarrow m+1,p\leftarrow p-1,c\leftarrow\dfrac{(p+1)(q-p+1)}{(q+p+1)(q-p)}c$, and return to step (4).

Example: Take $n=4,N=6$.

$\begin{array}{c} N&m&a&p&q&c\\ \hline 6&&&0&&1\\ 6&&&1&&1\\ 6&&&2&&2\\ 6&&&2&4&9\\ 6&2&\text{(}&2&4&9\\ 6&3&\text{(}&1&4&4\\ 2&4&\text{)}&1&3&3\\ 2&5&\text{(}&0&3&1\\ 1&6&\text{)}&0&2&1\\ 1&7&\text{(}&-1&2&0\\ 1&8&\text{)}&-1&2&0\\ 1&9&\text{)}&-1&2&0 \end{array}$

The output string is $\text{(()()())}$, exactly as it should be.

Added: Here’s a brief explanation of the algorithm. Let $C(q,p)$ be the number of Dyck paths (allowing only steps down or to the left and not crossing the diagonal) from $\langle q,p\rangle$ to the origin; $C(n,n)$ is simply $C_n$, the $n$-th Catalan number. A balanced string of $2n$ parentheses corresponds to a Dyck path from $\langle n,n\rangle$ to the origin: each '(' correponds to a down-step and each ')' to a left-step. When such a path reaches $\langle q,p\rangle$, there are still $C(q,p)$ ways to complete the string. Here’s a table of $C(q,p)$ for $0\le p\le q\le 4$, since I’ll use the case $n=4$ for illustration.

$\begin{array}{c|rr} p\backslash q&0&1&2&3&4\\ \hline 4&&&&&14\\ 3&&&&5&14\\ 2&&&2&5&9\\ 1&&1&2&3&4\\ 0&1&1&1&1&1 \end{array}$

We start at $\langle 4,4\rangle$, with $14$ possible strings. The first symbol must be '(', so the first step must be a downstep, to $\langle 4,3\rangle$; of course any of the $14$ strings is still possible, and indeed $C(4,3)=14$. Now, though, there are two possibilities. If the second symbol is also '(', there are still $C(4,2)=9$ ways to complete the string, but if it’s ')', there are only $C(3,2)=5$. There’s an important difference, though. The $5$ strings that are eliminated when the second symbol is '(' are all later in the lexicographic order than any string that is still possible. The $9$ strings that are eliminated when the second symbol is ')', however, precede all of the strings that remain possible. Thus, if we’re trying to construct one of the first $9$ strings, we cannot have ')' as the second symbol and cannot take a left-step to $\langle 3,2\rangle$.

What we’d really like is a table showing how many earlier possibilities are eliminated at each left-step, i.e., when we generate a ')'. For the case $n=4$ we can simply build the table from the one above:

$\begin{array}{c|rr} p\backslash q&0&1&2&3&4\\ \hline 4&&&&&\cdot\\ 3&&&&9&\cdot\\ 2&&&3&4&\cdot\\ 1&&1&1&1&\cdot\\ 0&0&0&0&0&\cdot \end{array}$

It’s immediately apparent that this is the lower righthand corner of the previous table, with an extra row of zeroes at the bottom. If I denote these numbers by $e(q,p)=C(q+1,p)-C(q,p)$, it appears that $e(q,p)=C(q+1,p-1)$.

This is not hard to prove. It’s clear from the definition that $C(q,p)=C(q,p-1)+C(q-1,p)$ whenever $q,p>0$, so in that case we have

$e(q,p)=C(q+1,p)-C(q,p)=\Big(C(q+1,p-1)+C(q,p)\Big)-C(q,p)=C(q+1,p-1)\;.$

If you want one of the first $9$ strings, you can’t afford to eliminate the first $9$, so you can’t take the left-step from $\langle 4,3\rangle$ to $\langle 3,3\rangle$. If you want the $11$-th string, you must get past the first $9$, so you must take that left-step. At that point you’re on the diagonal, so you can’t take a left-step; you have to take a down-step. Now you’re at $\langle 3,2\rangle$, and the $e$-table indicates that a left-step will eliminate another $3$ strings. That’s too many $-$ you want the $11$-string, not the $13$-th or $14$-th $-$ so you take another down-step, to $\langle 3,1\rangle$. Now a left-step eliminates only one string, for a total of $9+1=10$ strings eliminated, so you take it and land at $\langle 2,1\rangle$. You have now eliminated (or gone past) $10$ strings, so you’re at the $11$-th string and must not eliminate any more. That forces you to take the down-step to $\langle 2,0\rangle$, and at that point you just run out the strong with closing parentheses until you have $2n$ symbols.

I’ve used $q$ and $p$ here exactly as they are used in my algorithm. At step (2) $c$ is initialized to $C(n,n-2)=e(n-1,n-1)$. Step (3) generates the leading '('. Step (5) tests the current partial string to see whether it’s balanced; if it is, the next symbol must be '('. Step (6) handles the case in which a left-step would eliminate too many strings, so I must take a down-step by making the next symbol a '('; $p$ is decreased by $1$ to take the down-step, and $C(q,p)$ is updated accordingly. Step (7) handles the other possibility, that I need to take a left-step in order to reach my desired string; it generates the associated ')', decreases $q$ by one to take the left-step, updates $C(q,p)$, and subtracts the number of strings eliminated from my target number $N$, so that I know how many remain to be eliminated.

The formulas for updating $C(q,p)$ are essentially equations $(9)$ and $(10)$ from page 19 of Ruskey’s thesis.

  • 0
    @user39042: Okay, I’ve added some explanation. You’ll probably have to play with it a bit, but I think that I’ve said enough to make it possible to see how the algorithm works. Let me know if you have any questions.2012-09-05