4
$\begingroup$

Consider the following algorithm:

function Rand():
    return a uniformly random real between 0.0 and 1.0

function Sieve(n):

    assert(n >= 2)

    for i = 2 to n
        X[i] = true

    for i = 2 to n
        if (X[i])
            for j = i+1 to n
                if (Rand() < 1/i)
                    X[j] = false

    return X[n]

What is the probability that Sieve(k) returns true as a function of k ? What is the limit of this probability as k goes to infinity (if it has one) ?

  • 1
    Do you have any guesses? Have you tried to run the algorithm and empirically determine the answers?2012-07-15
  • 1
    http://stackoverflow.com/q/11492811/11314672012-07-15
  • 0
    user: This does not answer @Yuval's query, does it?2012-07-15
  • 0
    There is an empirical estimate in the answer on the link, and also a DP solution. I'm after a closed form and limit.2012-07-16

1 Answers 1

5

This is difficult to do exactly because the probabilities are all correlated: a small number of small "primes" will lead to more big "primes", and vice versa. The answer at stackoverflow ignores these correlations, yet gets quite close to the value for $k=1000$ ($0.1227$ instead of $0.1199\pm0.0003$ as determined by computer simulation), so it seems it's a reasonable approximation to model the probabilities as independent.

Doing so actually yields a much simpler recursion than the one used in the stackoverflow answer: $k+1$ has exactly the same chances of getting marked "composite" as $k$, and if $k$ is "prime" it has an additional chance during the $k$-th iteration, with probability $1/k$. Thus the probabilities $p_k$ of $k$ being "prime" satisfy the recursion relation

$$p_{k+1}=p_k\left(1-\frac{p_k}k\right)\;.$$

We can rewrite this as

$$p_{k+1}-p_k=-\frac{p_k^2}k$$

and approximate it by a differential equation:

$$p'=-\frac{p^2}k\;,$$

whose general solution is

$$p=\frac1{c + \log k}$$

with an arbitrary constant $c$, which is influenced by the approximation errors for small $k$. From $p_{1000}\approx0.12$ we get $c\approx1.43$, so a good approximation is given by

$$p_k\approx\frac1{1.43+\log k}\;.$$

Here's a plot of the values up to $p_{1000}$, which shows rather satisfactory agreement (red: model, green: simulations):

plot showing agreeement between model and simulations

Thus, the probabilistic sieve asymptotically behaves like the actual prime sieve.