
Let $A_k$ be a random variable which represents the number of distinct integers seen after sampling $k$ independently and uniformly at random from the range $1, \dots, n$. Let $B_k$ be a random variable which represents the number of distinct integers seen after sampling $k$ independently and uniformly at random from the range $1, \dots, n^2$.

We know that $E(A_k) = n(1-(1-1/n)^k)$ and $E(B_k) = n^2(1-(1-1/n^2)^k)$. We also know that for $k \ll n$ both grow almost linearly, and hence almost identically in $k$, but for $k \geq n$ they behave completely differently.

What is the asymptotic growth of the expected ratio of $B_k$ to $A_k$? That is, I am looking for an asymptotic expression for $E(B_k/A_k)$, assuming $n$ is large, and in particular a formula that captures the difference between the small-$k$ and large-$k$ phenomena.

My intuition is that $E(B_k/A_k)$ behaves like $E(k/A_k)$ when $k < n\log{n}$ and like $E(B_k/n)$ otherwise. For large $n$ it also seems that $E(A_k) \sim n(1-e^{-k/n})$ and $E(B_k) \sim n^2(1-e^{-k/n^2})$.
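For a quick sanity check of the closed forms above, one can enumerate all $n^k$ equally likely draws exactly for tiny $n$ and $k$ (a sketch; the function names are my own):

```python
from fractions import Fraction
from itertools import product

def expected_distinct(n, k):
    """Exact E[A_k]: average number of distinct values over all n^k draws."""
    total = sum(len(set(draw)) for draw in product(range(n), repeat=k))
    return Fraction(total, n**k)

def closed_form(n, k):
    """n * (1 - (1 - 1/n)^k), in exact rational arithmetic."""
    return n * (1 - Fraction(n - 1, n)**k)

for n, k in [(3, 2), (4, 3), (5, 4)]:
    assert expected_distinct(n, k) == closed_form(n, k)
print(expected_distinct(4, 3))  # 37/16
```

The agreement is exact, since by linearity each of the $n$ values is present with probability $1-(1-1/n)^k$.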


1 Answer


Here are some computations that ended in a rather messy result; I am writing them down anyway. We will have to see whether I (or someone else) can bring them to a conclusion.

Let $\cal{D}(n,k)$ denote the distribution of the number of distinct values when $k$ values are drawn uniformly at random from $\{1,\ldots,n\}$: i.e., $A_k\sim \cal{D}(n,k)$, while $B_k\sim\cal{D}(n^2,k)$.

As has already been pointed out in the comments, if $A_k$ and $B_k$ are assumed to be drawn independently, then $\text{E}[B_k/A_k]=\text{E}[B_k]\cdot\text{E}[1/A_k]$. This may give different results than a dependent coupling: e.g., one where $B_k$ is the number of non-empty cells of an $n\times n$ board, while $A_k$ is the number of non-empty columns of the same board.

I will assume that $A_k$ and $B_k$ are independent. Thus, I will compute $\text{E}[X]$ and $\text{E}[1/X]$ for all $X\sim\cal{D}(n,k)$, and then plug the result into $\text{E}[B_k/A_k]=\text{E}[B_k]\cdot\text{E}[1/A_k]$.

Considering $X_{n,k}\sim\cal{D}(n,k)$, the probability that the $k$ draws contain $k_i$ copies of value $i$, for $i=1,\ldots,n$, is $\binom{k}{k_1,\ldots,k_n}/n^k$. We can encode this distribution for all $k$ at once in a generating function $ F_n(x,t)=\sum_{k=0}^\infty \text{E}\left[x^{X_{n,k}}\right]\cdot\frac{t^k}{k!} =f(x,t/n)^n $ where $f(x,t)$ is the expression for $n=1$: $ f(x,t) =\sum_{k=0}^\infty \text{E}\left[x^{X_{1,k}}\right]\cdot\frac{t^k}{k!} =1+x\cdot(e^t-1). $ Here $f(x,t)$ uses $t^k/k!$ to encode that $k$ values are drawn, and the factor $x$ to indicate that at least one distinct value has been seen ($k>0$). The reason for the $t/n$ when expressing $F_n(x,t)=f(x,t/n)^n$ is to capture the probability $1/n^k$ of each combination; otherwise we would be counting combinations instead of weighting them.
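One way to check the identity $F_n(x,t)=f(x,t/n)^n$ numerically: expand the $n$-th power with the binomial theorem to extract the coefficient of $t^k/k!$ exactly, and compare it with a direct enumeration of $\text{E}[x^{X_{n,k}}]$ (a sketch; the function names are my own):

```python
from fractions import Fraction
from itertools import product
from math import comb

def gf_coeff(n, k, x):
    """Coefficient of t^k/k! in F_n(x,t) = (1 + x*(e^{t/n}-1))^n,
    obtained by expanding the n-th power with the binomial theorem."""
    total = Fraction(0)
    for j in range(n + 1):
        # (e^{t/n}-1)^j = sum_i C(j,i) (-1)^{j-i} e^{it/n};
        # the coefficient of t^k/k! in e^{it/n} is (i/n)^k
        inner = sum(comb(j, i) * (-1)**(j - i) * Fraction(i, n)**k
                    for i in range(j + 1))
        total += comb(n, j) * x**j * inner
    return total

def direct(n, k, x):
    """E[x^{X_{n,k}}] by enumerating all n^k equally likely draws."""
    return sum(x**len(set(d)) for d in product(range(n), repeat=k)) / Fraction(n**k)

x0 = Fraction(1, 2)
assert gf_coeff(3, 2, x0) == direct(3, 2, x0)
assert gf_coeff(4, 3, x0) == direct(4, 3, x0)
```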

If we plug in $x=1$, we get $F_n(1,t)=e^t=\sum_k 1\cdot t^k/k!$, where each coefficient $1$ is the total probability: it is a nice sanity check that this holds. We obtain the expected value by differentiating in $x$: $ \sum_{k=0}^\infty\text{E}[X_{n,k}]\cdot\frac{t^k}{k!} =\frac{\partial F_n}{\partial x}(1,t) =n\cdot\left(e^t-e^{(1-1/n)t}\right) =\sum_{k=0}^\infty n\left(1-\left(1-\frac{1}{n}\right)^k\right) \cdot\frac{t^k}{k!} $ as was already noted by the poster.

Next, we wish to compute $\text{E}[1/X_{n,k}]$ in a similar way. To do this, we first exclude the case $k=0$ and use $F_n(x,t)-1$. Converting $x^m$ to $1/m$ can be done by first dividing by $x$ to get $x^{m-1}$, and then integrating from $0$ to $1$: $ \sum_{k=1}^\infty\text{E}\left[\frac{1}{X_{n,k}}\right]\cdot\frac{t^k}{k!} =\int_0^1 \frac{F_n(x,t)-1}{x}\,dx. $ Maple did not give a closed form for this integral. However, after differentiating in $t$ first, it gave $ \sum_{k=1}^\infty\text{E}\left[\frac{1}{X_{n,k}}\right]\cdot\frac{t^{k-1}}{(k-1)!} =\int_0^1 \frac{\partial}{\partial t}\frac{F_n(x,t)-1}{x}\,dx =\frac{e^{t/n}}{n}\cdot\frac{e^t-1}{e^{t/n}-1} =\frac{e^{t/n}+\cdots+e^{nt/n}}{n} $ which makes $ \text{E}\left[\frac{1}{X_{n,k}}\right] =\frac{1}{n}\sum_{i=1}^n\left(\frac{i}{n}\right)^{k-1} =\frac{1^{k-1}+\cdots+n^{k-1}}{n^k}. $
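The closed form for $\text{E}[1/X_{n,k}]$ can likewise be verified by exhaustive enumeration for small $n$ and $k$ (a sketch; the function names are my own):

```python
from fractions import Fraction
from itertools import product

def reciprocal_distinct(n, k):
    """Exact E[1/X_{n,k}]: average of 1/#distinct over all n^k draws (k >= 1)."""
    total = sum(Fraction(1, len(set(d))) for d in product(range(n), repeat=k))
    return total / n**k

def closed_form(n, k):
    """(1^{k-1} + ... + n^{k-1}) / n^k."""
    return Fraction(sum(i**(k - 1) for i in range(1, n + 1)), n**k)

for n, k in [(3, 2), (4, 3), (5, 4)]:
    assert reciprocal_distinct(n, k) == closed_form(n, k)
print(reciprocal_distinct(4, 3))  # 15/32
```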

At first glance, it seems the approximations $ \text{E}[X_{n,k}]\approx n\cdot\left(1-e^{-k/n}\right) $ and $ \text{E}\left[\frac{1}{X_{n,k}}\right] \approx\frac{1}{n\cdot\left(1-e^{-(k-1)/n}\right)} $ should do a fairly good job, but I'll have to check that more closely. This is close to what the poster suspected.
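To get a feel for how good that approximation for $\text{E}[1/X_{n,k}]$ is, one can compare it numerically against the exact sum $\frac1n\sum_{i=1}^n (i/n)^{k-1}$ (a rough sketch; the parameter values are chosen arbitrarily):

```python
import math

def exact(n, k):
    # E[1/X_{n,k}] = (1/n) * sum_{i=1}^n (i/n)^(k-1), from the derivation above
    return sum((i / n)**(k - 1) for i in range(1, n + 1)) / n

def approx(n, k):
    # Proposed approximation: 1 / (n * (1 - e^{-(k-1)/n}))
    return 1.0 / (n * (1.0 - math.exp(-(k - 1) / n)))

for n, k in [(100, 10), (100, 100), (100, 1000)]:
    print(n, k, exact(n, k), approx(n, k))
```

For $k$ comparable to or larger than $n$ the two agree closely; for $k\ll n$ the approximation appears to overshoot somewhat (roughly $1/(k-1)$ versus $1/k$), which is worth keeping in mind when checking it more carefully.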