30

Let $H_n$ denote the $n$th harmonic number; i.e., $H_n = \sum\limits_{i=1}^n \frac{1}{i}$. I've got a couple of proofs of the following limiting expression, which I don't think is that well-known: $\lim_{n \to \infty} \left(H_n - \frac{1}{2^n} \sum_{k=1}^n \binom{n}{k} H_k \right) = \log 2.$ I'm curious about other ways to prove this expression, and so I thought I would ask here to see if anybody knows any or can think of any. I would particularly like to see a combinatorial proof, but that might be difficult given that we're taking a limit and we have a transcendental number on one side. I'd like to see any proofs, though. I'll hold off from posting my own for a day or two to give others a chance to respond first.

(The probability tag is included because the expression whose limit is being taken can also be interpreted probabilistically.)
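(For anyone who wants to see the convergence numerically before reading the proofs, here is a quick sanity check. It is only a sketch of mine; the helper names `harmonic` and `a` are just labels I picked.)

```python
from math import comb, log

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(1.0 / i for i in range(1, n + 1))

def a(n):
    """The expression under the limit: H_n - 2^{-n} * sum_{k=1}^n C(n,k) H_k."""
    s = sum(comb(n, k) * harmonic(k) for k in range(1, n + 1))
    return harmonic(n) - s / 2 ** n

for n in (5, 20, 50, 200):
    print(n, a(n), abs(a(n) - log(2)))  # the error shrinks very quickly
```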


(Added: I've accepted Srivatsan's first answer, and I've posted my two proofs for those who are interested in seeing them.

Also, the sort of inverse question may be of interest. Suppose we have a function $f(n)$ such that $\lim_{n \to \infty} \left(f(n) - \frac{1}{2^n} \sum_{k=0}^n \binom{n}{k} f(k) \right) = L,$ where $L$ is finite and nonzero. What can we say about $f(n)$? This question was asked and answered a while back; it turns out that $f(n)$ must be $\Theta (\log n)$. More specifically, we must have $\frac{f(n)}{\log_2 n} \to L$ as $n \to \infty$.)
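(As a hypothetical illustration of this converse, not a proof, one can take $f(n) = \log_2 n$ with the convention $f(0) = 0$; the expression should then tend to $L = 1$. A quick numerical sketch of mine:)

```python
from math import comb, log2

def f(n):
    return log2(n) if n >= 1 else 0.0  # convention f(0) = 0

def expr(n):
    s = sum(comb(n, k) * f(k) for k in range(0, n + 1))
    return f(n) - s / 2 ** n

for n in (10, 100, 1000):
    print(n, expr(n))  # should approach L = 1
```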

  • 0
    Thanks to all for the answers! I got what I wanted by asking this question; namely, a deeper understanding of why this limiting result holds. I'm accepting Srivatsan's first answer because I like how it formalizes the intuition that the limit should be $\log 2$, as the summation should be close to $\log (n/2)$.2011-09-14

6 Answers

17

I made a quick estimate in my comment. The basic idea is that the binomial distribution $2^{-n} \binom{n}{k}$ is concentrated around $k = \frac{n}{2}$. Simply plugging this value into the limit expression, we get $H_n - H_{n/2} \sim \ln 2$ for large $n$. Fortunately, formalizing the intuition isn't that hard.
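(A quick numerical check of this heuristic, just a sketch of mine, shows $H_n - H_{\lfloor n/2\rfloor}$ settling near $\ln 2$:)

```python
from math import log

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

for n in (10, 100, 10000):
    print(n, harmonic(n) - harmonic(n // 2), log(2))  # difference approaches ln 2
```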

Call the giant sum $S$. Notice that $S$ can be written as $\newcommand{\E}{\mathbf{E}}$ $ \sum_{k=0}^{\infty} \frac{1}{2^{n}} \binom{n}{k} (H(n) - H(k)) = \sum_{k=0}^{\infty} \Pr[X = k](H(n) - H(k)) = \E \left[ H(n) - H(X) \right], $ where $X$ is distributed according to the binomial distribution $\mathrm{Bin}(n, \frac12)$. We need the following two facts about $X$:

  • With probability $1$, $0 \leqslant H(n) - H(X) \leqslant H(n) = O(\ln n)$.
  • From the Bernstein inequality, for any $\varepsilon \gt 0$, we know that $X$ lies in the range $\frac{1}{2}n (1\pm \varepsilon)$, except with probability at most $e^{- \Omega(n \varepsilon^2) }$.

Since the function $x \mapsto H(n) - H(x)$ is monotone decreasing, we have $ S \leqslant \color{Red}{H(n)} \color{Blue}{-H\left( \frac{n(1-\varepsilon)}{2} \right)} + \color{Green}{\exp (-\Omega(n \varepsilon^2)) \cdot O(\ln n)}. $ Plugging in the standard estimate $H(n) = \ln n + \gamma + O\Big(\frac1n \Big)$ for the harmonic sum, we get: $ \begin{align*} S &\leqslant \color{Red}{\ln n + \gamma + O \Big(\frac1n \Big)} \color{Blue}{- \ln \left(\frac{n(1-\varepsilon)}{2} \right) - \gamma + O \Big(\frac1n \Big)} +\color{Green}{\exp (-\Omega(n \varepsilon^2)) \cdot O(\ln n)} \\ &\leqslant \ln 2 - \ln (1- \varepsilon) + o_{n \to \infty}(1) \leqslant \ln 2 + O(\varepsilon) + o_{n \to \infty}(1). \tag{1} \end{align*} $

An analogous argument gets the lower bound $ S \geqslant \ln 2 - \ln (1+\varepsilon) - o_{n \to \infty}(1) \geqslant \ln 2 - O(\varepsilon) - o_{n \to \infty}(1). \tag{2} $ Since the estimates $(1)$ and $(2)$ hold for all $\varepsilon > 0$, it follows that $S \to \ln 2$ as $n \to \infty$.

  • 0
    Nice. Thanks for the analysis-style argument.2011-09-11
16

Here's a different proof. We will simplify the second term as follows: $ \begin{eqnarray*} \frac{1}{2^n} \sum\limits_{k=0}^n \left[ \binom{n}{k} \sum\limits_{t=1}^{k} \frac{1}{t} \right] &=& \frac{1}{2^n} \sum\limits_{k=0}^n \left[ \binom{n}{k} \sum\limits_{t=1}^{k} \int_{0}^1 x^{t-1} dx \right] \\ &=& \frac{1}{2^n} \int_{0}^1 \sum\limits_{k=0}^n \left[ \binom{n}{k} \sum\limits_{t=1}^{k} x^{t-1} \right] dx \\ &=& \frac{1}{2^n} \int_{0}^1 \sum\limits_{k=0}^n \left[ \binom{n}{k} \cdot \frac{x^k-1}{x-1} \right] dx \\ &=& \frac{1}{2^n} \int_{0}^1 \frac{\sum\limits_{k=0}^n \binom{n}{k} x^k- \sum\limits_{k=0}^n \binom{n}{k}}{x-1} dx \\ &=& \frac{1}{2^n} \int_{0}^1 \frac{(x+1)^n- 2^n}{x-1} dx. \end{eqnarray*} $

Make the substitution $y = \frac{x+1}{2}$, so the new limits are $1/2$ and $1$. The integral then becomes: $ \begin{eqnarray*} \int_{1/2}^1 \frac{y^n- 1}{y-1} dy &=& \int_{1/2}^1 (1+y+y^2+\ldots+y^{n-1}) dy \\ &=& \left. y + \frac{y^2}{2} + \frac{y^3}{3} + \ldots + \frac{y^n}{n} \right|_{1/2}^1 \\ &=& H_n - \sum_{i=1}^n \frac{1}{i} \left(\frac{1}{2} \right)^i. \end{eqnarray*} $ Notice that, conveniently, $H_n$ appears as the first term. Rearranging, the expression under the limit equals: $ \sum_{i=1}^n \frac{1}{i} \left(\frac{1}{2} \right)^i. $ The final step is to note that this is just the $n$th partial sum of the Taylor series of $f(y) = -\ln(1-y)$ about $y=0$, evaluated at $y=1/2$. Therefore, as $n \to \infty$, this sequence approaches the value $-\ln \left(1-\frac{1}{2} \right) = \ln 2.$
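(Here is an exact-arithmetic check of the identity just derived, namely $H_n - 2^{-n}\sum_{k=0}^n \binom{n}{k} H_k = \sum_{i=1}^n \frac{1}{i\,2^i}$; this is only a verification sketch of mine.)

```python
from fractions import Fraction
from math import comb

def H(n):
    """Exact harmonic number H_n as a Fraction (H_0 = 0)."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

for n in range(1, 15):
    lhs = H(n) - sum(comb(n, k) * H(k) for k in range(n + 1)) / 2 ** n
    rhs = sum(Fraction(1, i * 2 ** i) for i in range(1, n + 1))
    assert lhs == rhs
print("identity verified exactly for n = 1..14")
```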

ADDED: As Didier's comments hint, this proof also shows that the given sequence, call it $u_n$, is monotonically increasing and hence always smaller than $\ln 2$. Moreover, comparing the tail $\ln 2 - u_n = \sum_{i > n} \frac{1}{i\,2^i}$ with its first term on one side and with a geometric series on the other gives a tight error estimate: $ \frac{1}{(n+1)2^{n+1}} < \ln 2 - u_n < \frac{1}{(n+1)2^{n}}, \ \ \ \ (n \geq 1). $
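(A quick numerical check of these bounds, again just a sketch of mine:)

```python
from math import log

for n in range(1, 30):
    u = sum(1.0 / (i * 2 ** i) for i in range(1, n + 1))  # u_n
    err = log(2) - u
    assert 1.0 / ((n + 1) * 2 ** (n + 1)) < err < 1.0 / ((n + 1) * 2 ** n)
print("error bounds hold for n = 1..29")
```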

  • 0
    @Srivatsan Interesting!2012-03-28
7

Here are my two proofs, in case anyone is interested.


Proof 1: The $n$th term of the sequence (i.e., the expression under the limit) is exactly $\sum_{k=1}^n \frac{1}{k 2^k}$.

Let $\Delta f(k) = f(k+1) - f(k)$, i.e., the finite difference of $f(k)$.

Let $B(n) = \sum_{k=0}^n \binom{n}{k} f(k)$, $A(n) = \sum_{k=0}^n \binom{n}{k} \Delta f(k)$.

A few years ago I proved$^1$ the following relationship between $B(n)$ and $A(n)$:

$B(n) = 2^n \left(f(0) + \sum_{k=1}^n \frac{A(k-1)}{2^k}\right).$
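(This relationship can be spot-checked exactly; the following is a verification sketch of mine, with `check` and the test functions chosen arbitrarily.)

```python
from fractions import Fraction
from math import comb

def check(f, nmax=12):
    # A(m) = sum_k C(m,k) * (f(k+1) - f(k)), the binomial transform of the finite difference
    A = lambda m: sum(comb(m, j) * (f(j + 1) - f(j)) for j in range(m + 1))
    for n in range(nmax + 1):
        B = sum(comb(n, k) * f(k) for k in range(n + 1))
        rhs = 2 ** n * (f(0) + sum(A(k - 1) / 2 ** k for k in range(1, n + 1)))
        assert B == rhs

check(lambda k: Fraction(k * k))                               # f(k) = k^2
check(lambda k: sum(Fraction(1, i) for i in range(1, k + 1)))  # f(k) = H_k
print("relation B(n) = 2^n (f(0) + sum A(k-1)/2^k) verified")
```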

Now, $\Delta H_n = \frac{1}{n+1}$. And we know that

$\sum_{k=0}^n \binom{n}{k} \frac{1}{k+1} = \frac{1}{n+1} \sum_{k=0}^n \binom{n+1}{k+1} = \frac{2^{n+1}-1}{n+1},$ using the fact that $\binom{n}{k} \frac{n+1}{k+1} = \binom{n+1}{k+1}$.

Thus the formula above for $B(n)$ yields

$\sum_{k=0}^n \binom{n}{k} H_k = 2^n \sum_{k=1}^n \frac{2^k-1}{k2^k} = 2^n\left(H_n - \sum_{k=1}^n \frac{1}{k2^k}\right).$

Rearranging, we get $H_n - \frac{1}{2^n} \sum_{k=0}^n \binom{n}{k} H_k = \sum_{k=1}^n \frac{1}{k 2^k}.$

Thus $\lim_{n \to \infty} \left(H_n - \frac{1}{2^n} \sum_{k=0}^n \binom{n}{k} H_k\right) = \sum_{k=1}^{\infty} \frac{1}{k 2^k} = \log 2,$ where the $\log 2$ is obtained, as in Srivatsan's second answer, by substituting $-1/2$ into the Maclaurin series for $\log(1+x)$.

(This is how I first came across the limit result; I was trying to find an expression for $\sum_{k=0}^n \binom{n}{k} H_k$.)

$^1$"Combinatorial Sums and Finite Differences," Discrete Mathematics, 307 (24): 3130-3146, 2007.


Proof 2: The probabilistic argument.

Let $X_1, X_2, \ldots, X_n$ be $n$ independent and identically distributed exponential random variables with rate parameter $\lambda$. The minimum of these $n$ random variables is known to be exponential with rate $n \lambda$. Let $X_{(k)}$ denote the $k$th smallest, with the convention $X_{(0)} = 0$. Because of the memoryless property of the exponential distribution, $X_{(k+1)} - X_{(k)}$ is exponential with rate $(n-k) \lambda$. So $E[X_{(k+1)} - X_{(k)}] = \frac{1}{(n-k)\lambda}$. Let $Y_n = X_{(n)}$ (i.e., the largest). Thus $E[Y_n] = \sum_{k=0}^{n-1} E[X_{(k+1)} - X_{(k)}] = \sum_{k=1}^n \frac{1}{\lambda k} = \frac{H_n}{\lambda}.$
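(A small Monte Carlo check of $E[Y_n] = H_n/\lambda$, just a sketch of mine; the sample size is arbitrary.)

```python
import random
from math import log

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

random.seed(0)
lam, n, trials = log(2), 10, 200_000
# average of max of n iid exponentials with rate lam
avg = sum(max(random.expovariate(lam) for _ in range(n)) for _ in range(trials)) / trials
print(avg, harmonic(n) / lam)  # the two values should be close
```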

Now, consider $E[Y_n | Y_n > 1]$. Since $P(X_i < 1) = 1-e^{-\lambda}$, conditioning on the number of the $X_i$ with values less than $1$ yields, up to an exponentially small error coming from the normalization by $P(Y_n > 1)$, $E[Y_n | Y_n > 1] = 1 + \sum_{k=0}^{n-1} \binom{n}{k} \left(1 - e^{-\lambda}\right)^k \left(e^{-\lambda}\right)^{n-k} \frac{H_{n-k}}{\lambda}$ $= 1 + \sum_{k=1}^n \binom{n}{k} \left(1 - e^{-\lambda}\right)^{n-k} \left(e^{-\lambda}\right)^k \frac{H_k}{\lambda}.$

But as $n \to \infty$, $E[Y_n] - E[Y_n|Y_n > 1] \to 0$, since $P(Y_n \leq 1) = \left(1 - e^{-\lambda}\right)^n$ decays exponentially while $E[Y_n] = O(\log n)$. Therefore, $\lim_{n \to \infty} \left(\frac{H_n}{\lambda} - \sum_{k=1}^n \binom{n}{k} \left(1 - e^{-\lambda}\right)^{n-k} \left(e^{-\lambda}\right)^k \frac{H_k}{\lambda}\right) = 1.$

Letting $\lambda = \log 2$ yields the result.
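(A numerical check of the displayed limit for a generic rate, here $\lambda = 1$; this is just a sketch of mine.)

```python
from math import comb, exp

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

lam = 1.0
p = exp(-lam)  # P(X_i >= 1)
for n in (10, 100, 500):
    s = sum(comb(n, k) * (1 - p) ** (n - k) * p ** k * harmonic(k) / lam
            for k in range(1, n + 1))
    print(n, harmonic(n) / lam - s)  # should approach 1
```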

5

Looking at $2^{-n}\binom{n}{k}$ as a probability distribution with mean $\frac{n}{2}$ and standard deviation $\frac{\sqrt{n}}{2}$ it is pretty easy to see that, as a Riemann sum $ \begin{align} 2^{-n}\sum_{k=1}^n\binom{n}{k}H_k&\sim\sqrt{\frac{2}{\pi n}}\int_{-\infty}^\infty (\log(x)+\gamma)\;e^{-2(x-n/2)^2/n}\;\mathrm{d}x\\ &=\log(n/2)+\gamma+O\left(\frac{1}{\sqrt{n}}\right) \end{align} $ Since $H_n\sim\log(n)+\gamma$, we get the difference to be $\log(2)$.
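(Numerically, the weighted sum does track $\log(n/2)+\gamma$; the following sketch of mine uses a hard-coded value of $\gamma$.)

```python
from math import comb, log

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

for n in (10, 100, 1000):
    H = [0.0]
    for i in range(1, n + 1):
        H.append(H[-1] + 1.0 / i)  # H[k] = H_k
    s = sum(comb(n, k) * H[k] for k in range(1, n + 1)) / 2 ** n
    print(n, s, log(n / 2) + GAMMA)  # the two values should agree up to O(1/sqrt(n))
```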

Let me attack this in a different way (though the former still works).

Using the asymptotic expansion for $H_n$, we get for $k\le n$, $ H_n-H_k=-\log(k/n)+O\left(\frac{1}{k}\right)\tag{1} $ Since we will be dealing with $O\left(\frac{1}{k}\right)$, notice that $ \begin{align} 2^{-n}\sum_{k=1}^n\binom{n}{k}\frac{1}{k+1} &=2^{-n}\sum_{k=1}^n\binom{n+1}{k+1}\frac{1}{n+1}\\ &\le 2^{-n}\;2^{n+1}\frac{1}{n+1}\\ &=\frac{2}{n+1} \end{align} $ Therefore, $ 2^{-n}\sum_{k=1}^n\binom{n}{k}O\left(\frac{1}{k}\right)=O\left(\frac{1}{n}\right)\tag{2} $ The Central Limit Theorem says that for any continuous $f$ on $[0,1]$, $ \lim_{n\to\infty}\;2^{-n}\sum_{k=1}^n\binom{n}{k}f\left(\frac{k}{n}\right) = f\left(\frac{1}{2}\right)\tag{3} $ To justify $(3)$, pick $\epsilon>0$ and find $\delta>0$ so that $\left|x-\frac{1}{2}\right|<\delta\Rightarrow\left|f(x)-f\left(\frac{1}{2}\right)\right|<\epsilon$. Because the mean of $2^{-n}\binom{n}{k}$ is $\frac{n}{2}$ and the variance is $\frac{n}{4}$, the Central Limit Theorem says that we can pick a $\sigma$ big enough so that $ \underset{|k-n/2|<\sigma\sqrt{n}/2}{2^{-n}\sum\binom{n}{k}}>1-\epsilon $ for all $n$. Now choose $n$ large enough so that $\sigma\sqrt{n}/2<\delta n$; then $|k-n/2|<\sigma\sqrt{n}/2$ implies $\left|\frac{k}{n}-\frac{1}{2}\right|<\delta$, and hence $\left|f\left(\frac{k}{n}\right)-f\left(\frac{1}{2}\right)\right|<\epsilon$. Splitting the sum over the two ranges of $k$, $ \begin{align} 2^{-n}\sum_{k=1}^n\binom{n}{k}\left|f\left(\frac{k}{n}\right)-f\left(\frac{1}{2}\right)\right| &\le \epsilon\;\max_{x\in[0,1]}\left|f(x)-f\left(\frac{1}{2}\right)\right| + \max_{|k-n/2|<\sigma\sqrt{n}/2}\left|f\left(\frac{k}{n}\right)-f\left(\frac{1}{2}\right)\right|\\ &\le \epsilon\;\max_{x\in[0,1]}\left|f(x)-f\left(\frac{1}{2}\right)\right| + \epsilon \end{align} $ which can be made as small as we wish. This proves $(3)$.

Finally, $ \begin{align} H_n-2^{-n}\sum_{k=1}^n\binom{n}{k}H_k &=2^{-n}\sum_{k=1}^n\binom{n}{k}(H_n-H_k)\\ &=2^{-n-1}\sum_{k=1}^n\binom{n+1}{k+1}\;2\frac{k+1}{n+1}(H_n-H_k)\\ &=2^{-n-1}\sum_{k=1}^n\binom{n+1}{k+1}\left(-2\frac{k+1}{n+1}\log\left(\frac{k+1}{n+1}\right)+O\left(\frac{1}{k}\right)\right)\\ &=2^{-n-1}\sum_{k=1}^n\binom{n+1}{k+1}\left(-2\frac{k+1}{n+1}\log\left(\frac{k+1}{n+1}\right)\right)+O\left(\frac{1}{n}\right)\\ &\to-2\;\frac{1}{2}\log\left(\frac{1}{2}\right)\\ &=\log(2) \end{align} $ where the limit in the penultimate line is $(3)$ applied to $f(x)=-2x\log(x)$, extended continuously by $f(0)=0$.

  • 0
    @Didier: I proved a much stronger version of the lemma, where only $(x(1-x))^Nf$ needs to be in $L^1$ for some $N$, but the proof was a bit too long for posting here. However, I was able to port a part of the proof into the application of the lemma I did prove here. $x\log(x)\in C[0,1]$, so now the proof is correct.2011-09-17
4

$H_{n+1} - \frac{1}{2^n} \sum_{k=1}^n \binom{n}{k} H_k \quad $ is the expected value of $\sum_{i=j}^{n+1} 1/i$, where $j$ is the position, after $n$ steps, of a random walk started at $j=1$ in which each step is $+0$ or $+1$ with probability $1/2$ each.

The limit is the same as for the original expression with $H_n$ and is equivalent to the statement that $j$ is (with probability approaching 1 as $n$ increases) concentrated in an interval of length $o(n)$ around $n/2$. This is much weaker than the Central Limit Theorem and would hold for fairly general random walks with average speed +1/2.
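(A simulation sketch of this interpretation, with sample sizes chosen arbitrarily by me:)

```python
import random
from math import log

random.seed(1)
n, trials = 200, 100_000
total = 0.0
for _ in range(trials):
    j = 1 + sum(random.getrandbits(1) for _ in range(n))  # position after n steps of +0/+1
    total += sum(1.0 / i for i in range(j, n + 2))         # sum_{i=j}^{n+1} 1/i
print(total / trials, log(2))  # should be close for large n
```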

  • 0
    @Didier, do you mean why $o(n)$ concentration is sufficient, or why $o(n)$ concentration is true for the binomial distribution, or something else? (by the way, I am notified of comments under this answer automatically without the @ user ping, but I think you will not be notified without an @ message).2011-09-12
3

Just an idea. $H_n$ is the average number of left-to-right maxima (records) in a permutation from $S_n$. The weighted sum is the average number of left-to-right maxima in a permutation from $S_n$ after each entry has been deleted independently with probability $1/2$.

One might estimate the weighted sum this way, but this seems difficult, and anyhow the trick should be to compare the two quantities somehow. However, deletion of entries can have unexpected consequences (consider $1,n,2,3,4,\ldots,n-1$). Maybe looking at the Robinson-Schensted correspondence would help.
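(A small simulation sketch of this interpretation with left-to-right maxima; the parameters are arbitrary, and for large $n$ the two averages should differ by roughly $\log 2$.)

```python
import random
from math import log

def records(seq):
    """Count left-to-right maxima of seq."""
    count, best = 0, float("-inf")
    for x in seq:
        if x > best:
            count, best = count + 1, x
    return count

random.seed(2)
n, trials = 100, 50_000
diff = 0.0
for _ in range(trials):
    perm = random.sample(range(n), n)                        # uniform random permutation
    kept = [x for x in perm if random.random() < 0.5]        # delete each entry with prob 1/2
    diff += records(perm) - records(kept)
print(diff / trials, log(2))  # the average drop should be close to log 2
```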