Question: Does anyone have a precise mathematical proof of Durstenfeld's Algorithm?
The first $O(n)$ shuffle or random permutation generator was published by Richard Durstenfeld in 1964. This algorithm came to the notice of programmers because it was included in Knuth 1969, page 125, 2, as Algorithm P. A succinct statement of the algorithm is: \begin{equation} \text{for }\ k \leftarrow n,n-1,\ldots,2\ \text{ do } \begin{cases} \text{Choose at random a number}\ \ r \in [1,k] \\\ \text{Interchange }\pi[r] \text{ and }\pi[k],\\ \end{cases} \end{equation} where $\pi[1 \ldots n]$ is the array to be shuffled or randomly permuted.
This is a truly elegant algorithm, computationally and mathematically. Computationally it handles set operations by dividing the array $\pi$ into an unshuffled set $\pi[1,\ldots, k]$, and a shuffled set $\pi[k+1,\ldots, n]$. At each iteration $k$, it chooses a random element $\pi[r]$ from the unshuffled set $\pi[1,\ldots, k]$, and interchanges it with $\pi[k]$. This moves $\pi[r]$ into the new shuffled set $\pi[k,\ldots, n]$, where it remains, unmoved, for all subsequent iterations. This is done in $O(1)$ time and space. Thus, the time and space complexity of Durstenfeld's algorithm is $O(n)$, which is optimal.
Mathematically, the algorithm is elegant because it implicitly uses the well-known lemma that every permutation $\pi$ is a product of transpositions. At each iteration $k$, the algorithm chooses a random number $r_k \in [1,\ldots,k]$ and performs a transposition $(\pi[k], \pi[r_k])$. Thus the random permutation produced can be written as
$ \pi = (\pi[n],\pi[r_n])(\pi[n-1],\pi[r_{n-1}]) \cdots (\pi[k],\pi[r_k])\cdots (\pi[2],\pi[r_2]), $
where $(\pi[k],\pi[r_k])$ is the transposition or interchange of $\pi[k]$ and $\pi[r_k]$.
Matlab Implementation of Durstenfeld's Algorithm
function p = GRPdurG(p) % ------------------------------------------------------------- % Randomly permute the elements of p(1:n) using Durstenfeld's % Random Permutation Algorithm, CACM, Vol 7, No. 7, 1964. % See Knuth, Section 3.4.2, TAOCP, Vol 2, 3rd Ed. % Derek O'Connor, 8 Dec 2010. derekroconnor@eircom.net % % USE: n=10^7; p = 1:n; p = GRPdur(p); % ------------------------------------------------------------- n = length(p); for k = n:-1:2 r = 1+floor(rand*k); % random integer between 1 and k t = p(k); p(k) = p(r); % Swap(p(r),p(k)). p(r) = t; end % GRPdurG
A Combinatorial Conundrum
A random permutation of the integers $\{1,2, \ldots,10^7\}$ can be generated in a few seconds using the Durstenfeld Shuffle algorithm on a standard PC. The combinatorial space $\mathcal{C}$ in this case is set of all permutations of size $10^7$. The size of this sample space is \begin{equation*} |\mathcal{C}| = n! = (10^7)! \approx 10^{65,657,059} \end{equation*} This is a gigantic number and very far beyond limits of any known random number generator. That is, $|\mathcal{C}| \ggg |\mathcal{G}|$, the state space of the random number generator. For example, the period (size of state space) of the well-known Mersenne Twister RNG (MT) is about $10^{6000}$. Hence random permutation generators using MT must miss nearly all permutations in this space.
None-the-less, the Matlab function above produces random permutations of length $10^7$ in the correct statistical proportions, e.g, $1/e$ are derangements, $1/n$ are cycles, etc., yet it must miss most of the possible permutations. Hence the conundrum. The most important question to me is:
Do random permutation generators sample uniformly from the space of permutations?
assuming they use the best random number generator available. I suspect that this is a difficult mathematical problem.
See this Matlab discussion: What Does RANDPERM Miss?
Historical Note
I believe that Durstenfeld's Algorithm was incorrectly attributed to Fisher and Yates by Knuth, in (2) below. See here O'Connor Scribd Note on Fisher-Yates. The original Durstenfeld Algorithm is here: CACM Algorithm
References
- Durstenfeld, R. 1964: ACM Algorithm 235: Random Permutation, Communications of the ACM, Vol 7, No 7.
- Knuth, D. 1969, 1998: Seminumerical Algorithms 1st & 3rd Eds. The Art of Computer Programming Series, Volume 2.