6
$\begingroup$

Let $F: D \rightarrow D$ be a random function on finite domain $D$ of size $n$. It is well-known that, from any $x \in D$, iterating $F$ on $x$ traces out a sequence of values $x, F(x), F(F(x)), \ldots$ that must eventually repeat (since $D$ is finite) and resembles a Greek "$\rho$". The expected number of total elements in this $\rho$ is $\sqrt{n\pi/2}$ with half being on the tail and half on the head. I've seen this stated in several places, but cannot find a proof.

Illustration: In order to clarify what I'm asking (since the comments make it clear that this might be useful), fix an integer $n$. Uniformly sample $a_0 \in [1,n]$ and make a vertex with label $a_0$. Do this again, obtaining a vertex labeled $a_1$ and make a directed edge from $a_0$ to $a_1$ (this does not preclude $a_0 = a_1$). The pigeon-hole principle tells us that eventually we will create a cycle in this digraph; we terminate the process when this occurs. Clearly the graph will look like the letter $\rho$: the "tail" is the part of the graph before we enter the cycle (the tail might have length zero, of course), and the "head" is the is part of the graph comprising the cycle.

Image taken from wikipedia

My question is this: what is the expected length of the tail and head, in terms of $n$. Asymptotically, the answer is known to be $\sqrt{n\pi/8}$, but I cannot find a derivation.

  • 0
    @Adrian Oh, I see you modified the link. Perfect.2011-03-28

2 Answers 2

4

Well, I've spent some time on this and I have (sort of) got it figured out.

This question has roots in the "birthday problem" where we ask how many people we must have in a group before the probability that at least two share a birthday exceeds $1/2$. The answer is 23. (This assuming that birthdays are uniform, which they're not. The non-uniformity only lowers the number of people required.)

With a little thought, we can see that the length of the $\rho$ as asked above is equivalent to a birthday-like question: how many random points do we need before we repeat ourselves. We can ask this in a few different forms: what $t \in [1,n]$ do we need before we exceed probability $p$, what is the expected number of points needed to see a collision, etc.

Let's focus on the question asked: expectation. The probability that there is no collision after $t$ trials is clearly $\frac{n}{n} \frac{n-1}{n} \frac{n-2}{n} \cdots \frac{n-t+1}{n} $

Now let $X$ be the R.V. for the number of trials until we repeat for the first time. Then the probability above is $\Pr[X > t]$ so $E[X]$ is $ \sum_{t \geq 0} \frac{n}{n} \frac{n-1}{n} \frac{n-2}{n} \cdots \frac{n-t+1}{n} $ (I'm using an alternative formulation of expectation here).

This is all well-and-good, except one doesn't get a very good sense of the answer from the sum. As mentioned above, this is claimed to be about $\sqrt{n\pi/2}$, but how?

It turns out that a sum similar to the above was studied by Ramanujan nearly 100 years ago; he called it the Q-Function.

$ Q(n) = \sum_{1 \leq t \leq n} \frac{n!}{(n-t)!n^t} $

Since the Q-Function starts at 1 and our desired sum starts at 0, our sum will be $1 + Q(n)$. Asymptotic analysis shows that

$ Q(n) \sim \sqrt{\frac{n \pi}{2}} + \frac{2}{3} $

The derivation I found for this is in "Analysis of Algorithms", R. Sedgewick and P. Flajolet, Addison-Wesley, 1996. It's quite involved. Interestingly, the $\sqrt{\frac{n \pi}{2}}$ term comes from evaluating the famous integral over $e^{-x^2/2}$.

To answer the final question: what is the length of the tail and head? When a repetition occurs, it is uniformly likely to collide with any point already established. (Imagine the waving of hands now) therefore on average it will land in the middle, putting half of the $\sqrt{\frac{n \pi}{2}}$ elements on the tail and the other half on the head. Therefore the expected length of the head and tail is $\sqrt{\frac{n \pi}{8}}.$

2

I'm also very interested in the answer to your question, so I wrote up a quick simulation in Python to "experimentally" arrive at some of these values. The script runs 100,000 simulations for $n=|D|\in\{1,\ldots,100\}$ and computes the average tail length, loop length, and total length, for each $n$. Here is the code:

from math import pi, sqrt from texttable import Texttable from random import randint  NUM_TRIALS = 100000 MAX_N = 100  def rand_map(n):   return [randint(0, n - 1) for x in range(n)]  def iterate_map(m):   k = m[0]   seen = []   while k not in seen:     seen.append(k)     k = m[k]   tail = seen[0:seen.index(k)]   loop = seen[seen.index(k):]   return [tail, loop]  table = Texttable() table.add_row(["n", "tail(n)", "loop(n)", "rho(n)", "sqrt(pi*n/2)"]) for n in range(1, MAX_N + 1):   avg_tail = 0   avg_loop = 0   avg_rho  = 0   for it in range(NUM_TRIALS):     rho = iterate_map(rand_map(n))     avg_tail = avg_tail + 1 + len(rho[0])     avg_loop = avg_loop + len(rho[1])     avg_rho  = avg_rho  + 1 + len(rho[0]) + len(rho[1])   table.add_row([n,       avg_tail/float(NUM_TRIALS),       avg_loop/float(NUM_TRIALS),       avg_rho/float(NUM_TRIALS),       sqrt(pi * n / 2.0)])   print "Done row %d" % n  print table.draw() 

And here is the output (I truncated a bunch of rows in the middle for the sake of brevity; just run it yourself if you want to say everything):

+-----+---------+---------+----------+---------------+ | n   | tail(n) | loop(n) | rho(n)   | sqrt(pi*n/2)  | +-----+---------+---------+----------+---------------+ | 1   | 1.0     | 1.0     | 2.0      | 1.25331413732 | +-----+---------+---------+----------+---------------+ | 2   | 1.0     | 1.24856 | 2.24856  | 1.77245385091 | +-----+---------+---------+----------+---------------+ | 3   | 1.0739  | 1.44635 | 2.52025  | 2.17080376367 | +-----+---------+---------+----------+---------------+ | 4   | 1.16473 | 1.61136 | 2.77609  | 2.50662827463 | +-----+---------+---------+----------+---------------+ | 5   | 1.25775 | 1.75732 | 3.01507  | 2.8024956082  | +-----+---------+---------+----------+---------------+ | 6   | 1.34817 | 1.87995 | 3.22812  | 3.06998012384 | +-----+---------+---------+----------+---------------+ | 7   | 1.44218 | 2.00315 | 3.44533  | 3.31595752198 | +-----+---------+---------+----------+---------------+ | 8   | 1.52348 | 2.13125 | 3.65473  | 3.54490770181 | +-----+---------+---------+----------+---------------+ | 9   | 1.61659 | 2.23206 | 3.84865  | 3.75994241195 | +-----+---------+---------+----------+---------------+ | 10  | 1.70321 | 2.32671 | 4.02992  | 3.96332729761 | +-----+---------+---------+----------+---------------+ |  .       .         .          .            .       | |  .       .         .          .            .       | |  .       .         .          .            .       | +-----+---------+---------+----------+---------------+ | 90  | 5.40747 | 6.28115 | 11.68862 | 11.8899818928 | +-----+---------+---------+----------+---------------+ | 91  | 5.41534 | 6.32827 | 11.74361 | 11.9558548728 | +-----+---------+---------+----------+---------------+ | 92  | 5.46854 | 6.32997 | 11.79851 | 12.0213668967 | +-----+---------+---------+----------+---------------+ | 93  | 5.5118  | 6.36461 | 11.87641 | 12.0865238341 | +-----+---------+---------+----------+---------------+ | 94  | 5.52473 | 6.41426 | 11.93899 | 12.151331397  | +-----+---------+---------+----------+---------------+ | 95  | 5.55828 | 6.46288 | 12.02116 | 12.2157951459 | +-----+---------+---------+----------+---------------+ | 96  | 5.58111 | 6.49214 | 12.07325 | 12.2799204954 | +-----+---------+---------+----------+---------------+ | 97  | 5.66136 | 6.50754 | 12.1689  | 12.3437127194 | +-----+---------+---------+----------+---------------+ | 98  | 5.66465 | 6.54559 | 12.21024 | 12.4071769563 | +-----+---------+---------+----------+---------------+ | 99  | 5.6971  | 6.58051 | 12.27761 | 12.4703182138 | +-----+---------+---------+----------+---------------+ | 100 | 5.734   | 6.5971  | 12.3311  | 12.5331413732 | +-----+---------+---------+----------+---------------+ 

We can see that the $\sqrt{\frac{\pi n}{2}}$ approximation is quite good. I don't see any obvious relation between the size of the tail and the loop; at the point $n=100$ they differ almost exactly by 1, but looking at their progression, this seems to be just a coincidence; the tail grows slightly slower than the loop all the time.

These numbers are unlikely to be a satisfying answer to your question, but they might be a useful starting point for someone who knows more math than I do to get some intuition about the problem.