
We are given positive integers $n$ and $M$, with $n \ll M$, and a sequence $(a_i)_{i=1}^n$ of distinct integers satisfying $0 \le a_i < M$. Let $k$ be the smallest positive integer such that all the $a_i$ are distinct $\mod k$.

Questions: With $n$ and $M$ fixed, what is the expected value of $k$? And what is its maximum possible value?

Motivation: The $a_i$ are breakpoint addresses in a computer program. The number of breakpoints $n$ will typically be less than 50, and the maximum address $M$ will typically be less than half a million. Given an address, I need to be able to find the breakpoint at that address very quickly. One solution is simply a table of pointers with $M$ entries, but that is very wasteful of space, since most of the entries would be zero. So I would like to use a table of size $k$, if I can be sure of finding a suitable $k$ reasonably quickly.
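The table-of-size-$k$ idea can be sketched in Python (an illustrative sketch, not code from the post; the function names are mine):

```python
from itertools import count

def smallest_modulus(addresses):
    """Smallest k such that all addresses have distinct residues mod k."""
    n = len(addresses)
    for k in count(n):  # no k < n can work, by pigeonhole
        if len({a % k for a in addresses}) == n:
            return k

def build_table(addresses):
    """Build a size-k table mapping a % k to the breakpoint address a."""
    k = smallest_modulus(addresses)
    table = [None] * k
    for a in addresses:
        table[a % k] = a
    return k, table

def lookup(table, k, address):
    """Constant-time lookup: return `address` if a breakpoint is set there."""
    entry = table[address % k]
    return entry if entry == address else None
```

A lookup then costs one modulo operation and one comparison, at the price of a table with $k$ entries instead of $M$.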

Once $M$ is large enough, its actual order of magnitude won't affect the expected value very much. In other words, as $M \rightarrow \infty$ with $n$ fixed, the expected value of $k$ tends to a limit. But it will certainly affect the worst case. For instance, given any $m$, we can set $a_i = i \times m!$ to ensure that $k$ is greater than $m$.
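A quick numerical check of this worst-case construction (a brute-force sketch, with small $m$ and $n$ chosen for illustration):

```python
from math import factorial

def smallest_modulus(addresses):
    """Smallest k such that all addresses have distinct residues mod k."""
    n, k = len(addresses), len(addresses)
    while len({a % k for a in addresses}) < n:
        k += 1
    return k

# With a_i = i * m!, every k <= m divides m!, so all a_i are 0 mod k,
# forcing the least usable modulus beyond m.
m, n = 6, 5
addresses = [i * factorial(m) for i in range(1, n + 1)]
assert smallest_modulus(addresses) > m
```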

I have run some simulations. With $n=50$, and $M=2^{30}$, a run of $100000$ trials gives a mean of about $307$, with a worst case of $k=557$. With $n=100$, the mean is about $972$ and the worst case is $1570$.
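These figures can be reproduced with a short Monte Carlo sketch (exact means will vary from run to run; the reported mean for $n=50$ is near $307$):

```python
import random

def smallest_modulus(addresses):
    """Smallest k such that all addresses have distinct residues mod k."""
    n, k = len(addresses), len(addresses)
    while len({a % k for a in addresses}) < n:
        k += 1
    return k

def simulate(n, M, trials, seed=0):
    """Mean and worst case of k over random sets of n distinct addresses < M."""
    rng = random.Random(seed)
    ks = [smallest_modulus(rng.sample(range(M), n)) for _ in range(trials)]
    return sum(ks) / trials, max(ks)
```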


3 Answers


Assuming that the $a_i$ are uniformly distributed in $\{0,\dots,k-1\}$, and that for two different $k$ these distributions are independent, the problem of finding the probability that all the $a_i$ are different is the birthday problem. I implemented in Maxima the probability

f(n,k):=combination(k,n)*n!/k^n 

that all $n$ numbers randomly chosen from $\{0,\dots,k-1\}$ are different (this needs load(functs); before executing this definition), and also an approximation

p(n,k):=exp((-n^2)/(2*k)) 

of $f$.

I estimate the mean by

ll:1; m:0; n:50; L:800;   /* ll = probability that no k' < k has worked */
for k:n thru L do (
  q:float(f(n,k)),        /* probability that all residues are distinct mod k */
  m:m+ll*q*k,
  ll:ll*(1-q)
);
m;

and get $305.2774133346098$. Using the approximation $p$ instead of $f$ I get the smaller value $293.0324415803322$. For $n=100$ I evaluate the for-loop up to L:2000 and get $970.2315452158832$ (using $f$) and $944.4295352864864$ (using $p$) for the mean.
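For readers without Maxima, the same computation translates directly to Python (a translation of the code above, under the same independence assumption; `mean_k` plays the role of the for-loop):

```python
from math import comb, exp, factorial

def f(n, k):
    """Exact birthday probability: n values uniform on {0,...,k-1}, all distinct."""
    return comb(k, n) * factorial(n) / k**n

def p(n, k):
    """Approximation exp(-n^2 / (2k)) of f."""
    return exp(-n * n / (2 * k))

def mean_k(n, L, prob=f):
    """Estimate E[k]; ll tracks the probability that no k' < k has worked."""
    ll, m = 1.0, 0.0
    for k in range(n, L + 1):
        q = prob(n, k)
        m += ll * q * k
        ll *= 1.0 - q
    return m
```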

For $n=1000$ I only evaluate the approximation, up to L:86000, and get $55595.16420043662$.

To estimate the maximum I calculated

ll:1; m:0; n:50; L:675;
for k:n thru L do (
  q:float(f(n,k)),
  m:m+ll*q,     /* probability that k is the least modulus that works */
  ll:ll*(1-q)
);
m;

This is the probability that $k$ is between $50$ and $675$. The result is $0.99999999999998$, so the probability that $k$ is larger than $675$ is very small. For $n=100$ and $n=1000$ the corresponding bounds are about L:1900 and L:86000; these intervals contain almost all of the probability mass for $k$.


Some thoughts:

Your example $a_i = im!$ gives a $k$ whose size is about $(\log(M/n))/\log\log(M/n)$, since that's how large one can choose $m$ before $nm!$ exceeds $M$. One can get a worse $k$ as well, by choosing $a_1=0$, $a_2$ to be the product of all primes up to about $\log M$, $a_3$ to be the product of all primes between $\log M$ and $2\log M$, and so on: then $k$ is at least $(n-1)\log M$ asymptotically. (This is a slight lie, as one needs to put the prime powers in there somewhere too. The asymptotic should still be correct.) Of course this doesn't even take advantage of primes dividing $a_3-a_2$ and so on.

When $M$ is much larger than a prime $p$, the probability that the $a_i$ are distinct modulo $p$ is essentially $(p/p)((p-1)/p)\cdots((p-n+1)/p) = p!/((p-n)!p^n)$; if $p=\lambda n$, then by Stirling's formula this is about $c(\lambda)^n$ where $c(\lambda) = (\lambda/(\lambda-1))^{\lambda-1}/e < 1$. This equals $1/2$ when $\lambda$ is about $n/(2\log 2)$, or $p\approx n^2/(2\log 2)$. But the question of finding even the median (never mind the expectation) of the least $p$ for which they are distinct modulo $p$ seems more daunting. (At least if $M$ is sufficiently large, one should be able to consider these modulo-$p$ events essentially independent.)
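The claimed crossover at $p \approx n^2/(2\log 2)$ is easy to check numerically; a small sketch (assuming $M$ large, so residues are uniform and the exact probability $p!/((p-n)!\,p^n)$ applies):

```python
from math import log, prod

def distinct_prob(n, p):
    """P(n uniform residues mod p are all distinct) = p!/((p-n)! p^n)."""
    return prod((p - i) / p for i in range(n))

n = 50
p = round(n * n / (2 * log(2)))  # ~1803, the predicted 1/2-crossover
assert 0.45 < distinct_prob(n, p) < 0.55
```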

[EDITED TO ADD:] I was thinking so much about the asymptotics of the smallest such $k$ that I didn't realize I could at least write down an expression for the expectation of that $k$. Let $D(p) = p!/((p-n)!p^n)$ be the probability that the $a_i$ are distinct modulo $p$; we define $D(p) = 0$ if $p<n$. These events aren't exactly independent: if they're not distinct modulo $p$, then they're slightly less likely to be distinct modulo $p^2$, and so on. If we define $E(p) = D(p)$ and, recursively, $E(p^k) = (D(p^k)-D(p^{k-1}))/(1-D(p^{k-1}))$, then $E(p^k)$ is the probability that they're distinct modulo $p^k$ conditional on the event that they're not distinct modulo $p^{k-1}$. Oi, I just realized that the formula for $E(n)$ for general composite $n$ is going to depend upon $E$ at its prime factors in a complicated way.... Then the expected value of the least $k$ modulo which the $a_i$ are distinct is given by the infinite sum $ \displaystyle \sum_{q} \bigg( q\,E(q) \prod_{r<q} \big(1-E(r)\big) \bigg)$, where $q$ and $r$ run over prime powers.


We choose $n$ integers independently and uniformly at random from $\{1,2,\dots,M\}$, where $M$ is very large. (The original problem specified that they must be distinct integers, but that probably won't affect the final answer, so let's ignore it.) Let ${\mathbf C}(k)$ denote the event that the $n$ integers are not distinct modulo $k$. Define
$$ F(k) = \Pr\big( {\mathbf C}(k) \ \big|\ {\mathbf C}(1) \wedge \cdots \wedge {\mathbf C}(k-1) \big) $$
and note that $1-F(k)$ is the probability that $k$ is the least integer modulo which the $n$ integers are distinct, conditional on knowing that they are not distinct modulo any of $1,2,\dots,k-1$. Note also that
$$ F(1) F(2) \cdots F(k-1) = \Pr \big( {\mathbf C}(1) \wedge \cdots \wedge {\mathbf C}(k-1) \big), $$
as can be shown by expanding the definition of conditional probability and telescoping the resulting product. Therefore the expectation of the least $k$ for which the $n$ integers are distinct is given by
$$ \mu= \sum_{k=1}^\infty k \Pr \big( {\mathbf C}(1) \wedge \cdots \wedge {\mathbf C}(k-1) \big) \big( 1 - \Pr\big( {\mathbf C}(k) \ \big|\ {\mathbf C}(1) \wedge \cdots \wedge {\mathbf C}(k-1) \big) \big), $$
or equivalently
$$ \mu=\sum_{k=1}^\infty \bigg( k \big(1-F(k)\big) \prod_{1\le j\le k-1} F(j) \bigg). $$
Therefore it suffices to be able to calculate $F(k)$. All this has been probability so far; now we need to input some number theory.

Define $c(k) = \Pr({\mathbf C}(k))$. It is not hard to show that
$$ c(k) = 1 - \frac kk \cdot \frac{k-1}k \cdots \frac{k-n+1}k = 1-\frac{k!}{k^n(k-n)!} $$
(where the last expression must be interpreted appropriately when $k < n$). (Really we should say that $c(k)$ approaches this expression as $M$ goes to infinity.) Now define a function $f(k)$ as follows: $f(1)=1$; if $k=p^a$ is a power of a prime, then $f(p^a) = c(p^a)/c(p^{a-1})$ (this includes $f(p)=c(p)$); otherwise, write $k=p_1^{a_1}\cdots p_r^{a_r}$ with $r\ge2$, in which case $f(k) = c(k)/\big( c(p_1^{a_1}) \cdots c(p_r^{a_r}) \big)$. I claim that $F(k) = f(k)$ (as $M$ goes to infinity).

Proving this statement would be the most technical part of the derivation, I think. [Intuitively, the reason this should be true is: when $k$ and $l$ are relatively prime, then the events ${\mathbf C}(k)$ and ${\mathbf C}(l)$ will turn out to be independent (up to a vanishingly small error as $M$ tends to infinity). So the only events ${\mathbf C}(l)$ that directly influence ${\mathbf C}(k)$ are when $l$ divides $k$. Furthermore, the event ${\mathbf C}(k)$ is a subset of ${\mathbf C}(l)$ whenever $l$ divides $k$. Therefore we only need to consider the maximal prime-power divisors of $k$, and these events are all independent.]

The definition of $f(k)$ can be coded easily, and when substituted into the second equation for $\mu$ above (replacing $F$ by $f$), it leads to the values I mentioned in my comments on the original post.
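A sketch of that computation (my Python translation, not the answerer's code; the truncation bound `L` and helper names are mine, and everything rests on the same large-$M$ heuristic):

```python
from math import prod

def c(k, n):
    """Limiting collision probability 1 - k!/((k-n)! k^n); equals 1 for k < n."""
    return 1.0 - prod((k - i) / k for i in range(n))

def factorize(k):
    """Prime factorization of k as a list of (p, a) pairs."""
    out, d = [], 2
    while d * d <= k:
        if k % d == 0:
            a = 0
            while k % d == 0:
                a, k = a + 1, k // d
            out.append((d, a))
        d += 1
    if k > 1:
        out.append((k, 1))
    return out

def f(k, n):
    """The heuristic F(k): conditional collision probability modulo k."""
    if k == 1:
        return 1.0
    fac = factorize(k)
    if len(fac) == 1:                        # k = p^a is a prime power
        pp, a = fac[0]
        return c(k, n) / c(pp ** (a - 1), n)
    return c(k, n) / prod(c(pp ** a, n) for pp, a in fac)

def mu(n, L):
    """Truncated sum: mu = sum_k k (1 - f(k)) prod_{j<k} f(j)."""
    prod_f, total = 1.0, 0.0
    for k in range(1, L + 1):
        fk = f(k, n)
        total += k * (1.0 - fk) * prod_f
        prod_f *= fk
    return total
```

For $n=50$, `mu(50, 2000)` should land near the simulated mean of roughly $307$ from the question; truncating at $2000$ is safe, since the remaining tail probability is negligible by then.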