3
$\begingroup$

Ideally, the distribution over the acceptable pairs would be close to uniform.

x and the pair are all positive integers

(This is for code, so I need a constructive solution)

Thanks!

  • 0
    Do you mean to draw uniformly from ordered pairs, or unordered pairs?2012-12-23

3 Answers 3

5

Dirichlet showed that the number of pairs of positive integers $(a,b)$ with $ab \le x$ is $x \log x + (2 \gamma - 1)x + O(\sqrt{x})$ where $\gamma$ is Euler's constant (see e.g. http://www.math.uiuc.edu/~hildebr/ant/main.pdf Theorem 2.20). Of those, $x + O(\sqrt{x})$ have $a,b \le \sqrt{x}$, while there are $\lfloor x/a \rfloor$ with a given value of $a$. So we can do the following.

  1. With probability $\dfrac{1}{\log x + (2 \gamma - 1)}$ choose $a$ and $b$ independently and uniformly from $\{1,2,\ldots,\lfloor \sqrt{x}\rfloor\}$.
  2. Otherwise, with probability $1/2$ we'll take $a < \sqrt{x}$ and $b > \sqrt{x}$ and with probability $1/2$ the reverse. These cases are symmetric, so WLOG suppose we want the first of these two cases.
  3. Now for each positive integer $a < \sqrt{x}$ there are $\lfloor{x/a}\rfloor - \lceil\sqrt{x}\rceil + 1$ possible $b$. The total for $a \le m$ is approximately $\sum_{a=1}^m \left( \dfrac{x}{a} - \sqrt{x} \right) = x \log(m) + x \gamma - m \sqrt{x}$ So we take a uniform random variate $u \in (0,1)$, solve (numerically) $x \log(t) + x \gamma - t \sqrt{x} = u (x \log(\sqrt{x}) + x \gamma - x)$ for $t \in (0,\sqrt{x})$, round the result to the nearest integer $a$, and then take $b$ to be a random integer from $\lceil \sqrt{x} \rceil$ to $\lfloor x/a \rfloor$.
1

You said close to uniform, so I proffer the following. Given $x$, the acceptable pairs $(a,b)$ are the lattice points in the first quadrant subject to the constraint $ab. I shall approximate their number by comparing areas under the curve $b=x/a$. Obviously the range of possible values of $a$ is from $1$ to $x$. Thus the total number of acceptable pairs $N(x)$ can be approximated by the sum $ N(x)\approx x+\frac{x}2+\frac{x}3+\cdots+\frac{x}{x}=x\sum_{a=1}^x\frac1a\approx x(\ln x-\gamma), $ where $\gamma\approx0.577$ is the Euler-Mascheroni constant. Similarly the number of acceptable pairs with $a\le A$, where $A$ is a parameter in the range $1\le A\le x$, is approximately $N(x,A)\approx x(\ln A-\gamma)$.

This suggests the following heuristic approach. The cdf of the random variable $a$ is approximately $ F(A)=Pr(a\le A)=\frac{\ln A-\gamma}{\ln x-\gamma}. $

The usual trick would be then to

  1. Generate a uniformly distributed random variable $t$ in the range $0\le t\le1$.
  2. Find the value of $a$ such that $F(a)\approx t$, i.e. let $a\approx e^{\gamma+t(\ln x-\gamma)}.$
  3. Generate a uniformly chosen random integer $b$ in the range $1\le b\le\lfloor x/a\rfloor$.

At this time this is a very crude description. We need to do a lot of critical thinking and testing to verify that the approximation errors inherent in this approach won't disturb the algorithm too much. When $x$ is relatively small, the Euler-Mascheroni approximation has an error that may disturb things. Also I'm not sure how to best do the `rounding' of the value of $a$ in step 2. Simple rounding or truncation may not be best.

1

Let's assume for simplicity that the size of $x$ is "feasible", i.e. we can afford to store an array of size $\approx \sqrt x$. For $i=1,\ldots \lfloor\sqrt x\rfloor$ let $a_i=\lfloor \frac xi\rfloor$, $b_i=2a_i-2i+1$, $s_i=\sum_{j\le i} b_j$. Then $s_{\lfloor\sqrt x\rfloor}$ is the number of valid points.

  1. Generate a uniform random integer $r$ in $[0,s_{\lfloor\sqrt x\rfloor}-1]$.
  2. Let $i=1$
  3. while $r>b_i$, let $r=r-b_i$, $i=i+1$.
  4. If $r+i\le a_i$, return $(i,a_i-r)$, else return $(2i+r-a_i,i)$

If the above is not feasible (e.g. $x$ is too big), the following rejection-method should be fine:

  1. generate a uniform random real $z\in[0,x(1+\ln x))$.
  2. If $z\le x$, return $(1,\lceil z\rceil)$.
  3. Otherwise let $r_1=\lceil e^{\frac zx-1}\rceil$, $r_2=\lceil z-x(1+\ln(r_1-1))\rceil$.
  4. If $r_1r_2\le x$ return $(r_1,r_2)$. otherwise go back to step 1.

This essentially generates a point in the area bounded by $0\le r_1\le x$, $0\le r_2\le x$ and $r_1r_2\le x$ and rejects points not in the corresponding unit squares. This could be improved (less rejecting) by treating the cases $r_1=2$ and $r_2=2$ specially (not just $r_1=1$, $r_2=1$ as the above does).