
I am reviewing some probability for a class I am about to take, and I came up with this example to try to help my intuition.

Suppose I take the space of all sequences of the two symbols $H,T$. Let $X_i$ be the function on this space that is $1$ if the $i$th entry is $H$ and $0$ otherwise. Then I think these can be considered independent, identically distributed random variables, so that the Strong Law of Large Numbers should say that $$ \lim_{n\to\infty} \frac1n\sum_{i=1}^n X_i = \frac 12 \quad\text{a.e.} $$

This implies that the set of sequences of $H,T$ that are not balanced in the limit has measure $0$. The measure here is the one for which the set of all sequences starting with $H$ has measure $\frac12$, the set of all sequences starting with $HH$ has measure $\frac14$, etc.
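As a quick sanity check (a simulation, not a proof), here is a short Python sketch that samples a finite prefix under this measure and prints the running frequency of $H$; the seed and sample sizes are arbitrary choices of mine:

```python
# Simulate fair coin flips (i.e., sample a finite prefix from the fair-coin
# measure) and watch the running frequency of H drift toward 1/2.
import random

random.seed(0)  # arbitrary seed, for reproducibility

total = 0
for i in range(1, 100_001):
    total += random.randint(0, 1)  # 1 plays the role of H, 0 of T
    if i in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {i:>6}: frequency of H = {total / i:.4f}")
```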

So my question is: Is this correct? Is there such a measure on this space of sequences? If so, is there a nice way to see the conclusion at the beginning of the paragraph above without quoting the strong law of large numbers?

2 Answers


The space you're talking about, the space of all one-way infinite binary sequences, is known as Cantor Space (especially within logic). I think of it as the set of all paths through the full infinite binary branching tree.

As Qiaochu said, the 'fair coin measure' can be thought of as a product measure. But I prefer to just recall that, by Carathéodory's extension theorem, it suffices to define a set function $\mu$ on the algebra generated by the basic clopen sets (that is, sets of the form $$[\sigma]:=\{A \in \{0,1\}^{\mathbb{N}} : A \text{ extends } \sigma \},$$ where $\sigma$ is some finite binary string) such that $\mu[\sigma] = \mu[\sigma 1] + \mu[\sigma 0]$ and $\mu[\varnothing] = 1$. ("$\varnothing$" denotes the empty finite binary string; all elements of $\{0,1\}^{\mathbb{N}}$ extend it.) Countable additivity on this algebra comes for free: each $[\sigma]$ is compact and the cylinders are open, so any countable partition of a cylinder into cylinders is in fact finite.

So to get the measure you want, you just define your $\mu$ so that $\mu[\sigma 0] = \mu[\sigma 1] = \mu[\sigma]/2$. Notice that in terms of the tree picture, you're just saying that no matter where you are in the tree, it costs the same amount of measure to go to the left as to go to the right (so my tree is growing upwards).
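Here is a minimal sketch of that definition in Python (my own illustration; the function name `mu` and the test strings are not from the answer), checking the splitting condition on a few cylinders:

```python
# Fair-coin premeasure on basic clopen sets: mu[sigma] = 2^(-|sigma|).
def mu(sigma: str) -> float:
    """Measure of the cylinder [sigma]: all infinite 0-1 sequences extending sigma."""
    return 2.0 ** (-len(sigma))

assert mu("") == 1.0    # the whole space
assert mu("1") == 0.5   # e.g. all sequences starting with H
assert mu("11") == 0.25

# Splitting a node of the tree costs nothing: mu[sigma] = mu[sigma 0] + mu[sigma 1].
for sigma in ["", "0", "1", "01", "110"]:
    assert mu(sigma) == mu(sigma + "0") + mu(sigma + "1")
```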

Notice that this measure (often called the Lebesgue measure, because of the identification @Michael mentions in the comments) is just the special case $p=\frac12$ of a Bernoulli measure, which corresponds to a coin with probability $p$ of heads and $1-p$ of tails (or, if you prefer, probability $p$ of going left in the tree and $1-p$ of going right).
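Continuing the sketch above (again my own illustration; `mu_p` is just a name I chose), the Bernoulli($p$) measure assigns a cylinder the mass $p^{\#\text{1's}}(1-p)^{\#\text{0's}}$, and the same splitting condition holds:

```python
# Bernoulli(p) premeasure: a cylinder's mass is p^(#1's) * (1-p)^(#0's).
def mu_p(sigma: str, p: float) -> float:
    ones = sigma.count("1")
    return p ** ones * (1 - p) ** (len(sigma) - ones)

p = 0.3  # arbitrary biased coin
for sigma in ["", "0", "1", "10", "011"]:
    split = mu_p(sigma + "0", p) + mu_p(sigma + "1", p)
    assert abs(mu_p(sigma, p) - split) < 1e-12
```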

There are other ways to see that almost every sequence must have a 'balanced' frequency of $0$'s and $1$'s, but none of them are much nicer than using the law of large numbers. For example, one could apply, as @Qiaochu suggested, the Borel-Cantelli lemma. To do so, you want to exploit something that happens infinitely often (because of the $\limsup$). If some $A \in \{0,1\}^{\mathbb{N}}$ does not have a balanced frequency of $1$'s and $0$'s, then for some $n$ the ratio of the frequency of $0$'s to the frequency of $1$'s falls outside the interval $[1- 1/n, 1+1/n]$ infinitely often. I haven't checked whether the resulting sets have measures whose total sum is finite (as Borel-Cantelli requires), but something like this should work.
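For what it's worth, here is a numerical look at that summability question (my own check, using the simpler event $|S_m/m - \frac12| \geq \varepsilon$ for a fixed $\varepsilon$ rather than the ratio formulation above). The exact binomial tails do appear to sum to something finite, and Hoeffding's inequality $P(|S_m/m - \frac12| \geq \varepsilon) \leq 2e^{-2\varepsilon^2 m}$ confirms it, since that bound is geometrically summable:

```python
# Check numerically that sum_m P(|S_m/m - 1/2| >= eps) looks finite,
# which is what the easy direction of Borel-Cantelli needs.
from math import comb, exp

def tail(m: int, eps: float) -> float:
    """Exact P(|S_m/m - 1/2| >= eps) for S_m ~ Binomial(m, 1/2)."""
    return sum(comb(m, k) for k in range(m + 1)
               if abs(k / m - 0.5) >= eps) / 2 ** m

eps = 0.1  # my arbitrary choice of epsilon
partial = sum(tail(m, eps) for m in range(1, 501))
print(f"partial sum up to m = 500: {partial:.6f}")

# Hoeffding bounds the remainder of the series by a geometric tail:
r = exp(-2 * eps ** 2)
print("bound on the rest of the series:", 2 * r ** 501 / (1 - r))
```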

One could also apply Birkhoff's Ergodic Theorem (which is just a generalization of the law of large numbers anyway) or the theory of Algorithmic Randomness to conclude that almost every sequence has a balanced frequency of $0$'s and $1$'s. In fact, you can conclude much more: almost every real number is normal in every base.
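As an illustration only (no simulation proves normality, and ordinary floating point runs out of digits quickly, so the sketch below uses exact integer arithmetic on a long random dyadic rational, with bit counts chosen arbitrarily by me), one can tally leading digit frequencies in a few bases:

```python
# Tally leading digit frequencies of a "typical" real in bases 2, 3, 10.
# For a normal number, each digit in base b has limiting frequency 1/b.
import random
from collections import Counter

random.seed(1)
BITS = 40_000
num, den = random.getrandbits(BITS), 1 << BITS  # x = num/den in [0, 1)

for base in (2, 3, 10):
    counts, a, n_digits = Counter(), num, 5_000
    for _ in range(n_digits):
        d, a = divmod(a * base, den)  # next base-`base` digit of x
        counts[d] += 1
    print(f"base {base}:",
          {d: round(c / n_digits, 3) for d, c in sorted(counts.items())})
```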

  • To show the strength of this last result you mention (almost every real number is normal in every base), one could add that (unless I am mistaken) no explicit example of a number normal in, say, both base 2 and base 10 is known.
  • @did What about [Chaitin's omega](http://en.wikipedia.org/wiki/Chaitin's_constant)? It's known to be normal in all bases.
  • @QuinnCulver: It's pretty hard to call a non-computable number "explicit".
  • Quinn: As @Nate said.

Yes, yes, and it depends on what you mean by "nice."

The measure you want is product measure on the product space $\{ 0, 1 \}^{\mathbb{N}}$, where $\{ 0, 1 \}$ is given the obvious probability measure. An alternate description of this measure is as Haar measure on the group $(\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$. From either of these descriptions, it exists and has the properties you want it to.

You can get the conclusion you want without the SLLN by using the central limit theorem instead; alternately, you can compute the characteristic function of $\frac{1}{n} \sum_{i=1}^n X_i$ by hand, verify that it converges to the characteristic function of the constant r.v. with value $\frac{1}{2}$, then use Lévy continuity... (This only gets you convergence in distribution; see did's comments below.) I don't know a way to get this conclusion which doesn't require some level of technology.
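For concreteness, here is a numerical sketch of the characteristic-function computation mentioned above (my own code; the test point $t$ is arbitrary). By independence, $E\left[e^{itS_n/n}\right] = \left(\frac{1+e^{it/n}}{2}\right)^n$, which should approach $e^{it/2}$, the characteristic function of the constant $\frac12$:

```python
# Characteristic function of (1/n) * S_n for fair Bernoulli summands,
# compared with that of the constant random variable 1/2.
import cmath

def phi_n(t: float, n: int) -> complex:
    return ((1 + cmath.exp(1j * t / n)) / 2) ** n

t = 3.7  # arbitrary test point
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>5}: |phi_n(t) - e^(it/2)| = "
          f"{abs(phi_n(t, n) - cmath.exp(1j * t / 2)):.2e}")
```

Convergence in distribution to a constant does upgrade to convergence in probability, but not, by itself, to the almost sure statement; see did's comments below.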

Edit: Wikipedia claims that this result ought to follow from Borel-Cantelli, but I don't see it at the moment.

  • Yet another description is that it is Lebesgue measure on the Borel $\sigma$-algebra of $[0,1]$ with reals represented in binary form.
  • @Michael: this is subtle. The two spaces are not homeomorphic with their usual topologies, and I do not think it is completely trivial to write down a bijection between them which is an isomorphism of $\sigma$-algebras and measure-preserving, or at least I have never attempted to do this.
  • $HTHTT\ldots\mapsto 0.10100\ldots$ is measure preserving and almost a bijection; there remains a countable set of eventually constant sequences that one can deal with pretty much arbitrarily. This is actually how Lebesgue measure on $[0,1]$ is constructed in an introductory book by Adams and Guillemin.
  • @QiaochuYuan Recall that the isomorphism need only be "almost bijective" and "almost measure preserving".
  • @QiaochuYuan That it follows from Borel-Cantelli should be the same reason that every Martin-Löf random (MLR) is Solovay random, if that helps. The point is that one could translate that result, and the fact that all MLRs are normal, into probabilistic language to see why it follows from B-C.
  • @QiaochuYuan I've added to my answer a strategy for using B-C.
  • @Quinn: yes, that was my idea too, but it seems messier than applying the central limit theorem.
  • *You can get the conclusion you want without the SLLN by using the central limit theorem instead*... Various forms of this statement appear regularly on the site. Each time, I wonder what it means: how can CLT (convergence in distribution) yield an almost sure convergence (SLLN)?
  • Borel-Cantelli is indeed the way to go (and not the least messy) since for every $x\gt\frac12\gt y$, the series $\sum\limits_n P(X_1+\cdots+X_n\gt xn)$ and $\sum\limits_n P(X_1+\cdots+X_n\lt yn)$ both converge (by the simplest exponential inequality one could think about).
  • @did: I haven't fleshed out the idea, but CLT concerns the convergence in distribution of $\frac{1}{\sqrt{n}}\,\text{stuff}$ whereas here we are concerned with the convergence a.e. of $\frac{1}{n}\,\text{stuff}$, and it seems plausible to me that CLT can give the stronger conclusion in light of dividing by the larger term. Can you elaborate on the Borel-Cantelli solution? What's the simplest exponential inequality one could think about?
  • CLT/SLLN: To see that a general argument *CLT hence SLLN* cannot work, assume that $(Z_n)$ is i.i.d. standard normal and that $Y_n=Z_n$ with probability $1-u_n$ and $Y_n=2^n$ with probability $u_n$, with every possible independence. Then $Y_n$ converges in distribution to a standard normal iff $u_n\to0$, but $Y_n/\sqrt{n}\to0$ almost surely iff $\sum u_n$ is finite (otherwise, $Y_n/\sqrt{n}$ diverges almost surely).
  • @did: aha. I admit I am not as comfortable with these different modes of convergence as I ought to be.
  • SLLN by exponential inequality for Bernoullis: for every $t\geqslant0$, $P(S_n\gt xn)\leqslant\mathrm e^{-txn}E(\mathrm e^{tX})^n=\Lambda(t)^n$. When $t\to0^+$, $\Lambda(t)=1+tE(X)-tx+o(t)$ and $x\gt1/2=E(X)$, hence $\Lambda(t)\lt1$ for some $t\gt0$. By symmetry, $P(S_n\lt(1-x)n)=P(S_n\gt xn)$, hence both series $\sum\limits_n P(S_n\lt(1-x)n)$ and $\sum\limits_n P(S_n\gt xn)$ converge. By the easy part of the Borel-Cantelli lemma, $\limsup\limits_n S_n/n\leqslant x$ almost surely and $\liminf\limits_n S_n/n\geqslant 1-x$ almost surely, for each $x\gt1/2$; hence $\lim\limits_n S_n/n=1/2$ almost surely. (A numerical sketch of this bound appears after these comments.)
  • @did: I am not familiar with the very first inequality you cite. Is this a standard application of a general result or is it specific to sums of Bernoulli random variables? (Would you mind elaborating on this in a separate answer?)
  • In general: (1.) For every $t\gt0$, $[S_n\gt xn]=[Z\gt1]$ with $Z=\mathrm e^{-txn}\mathrm e^{tS_n}$. (2.) $P(Z\gt1)\leqslant E(Z)$ by the Markov inequality, since $Z\gt0$ almost surely (recall that one simply integrates the almost sure inequality $\mathbf 1_{Z\gt1}\leqslant Z$). // In the case at hand, furthermore: $E(\mathrm e^{tS_n})=(E(\mathrm e^{tX}))^n$ by independence of $(X_k)_k$.
  • I liked the Wikipedia link... very interesting.
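To complement did's derivation above, here is a hedged numerical check of the exponential inequality (my own sketch; the value $x = 0.6$ and the grid over $t$ are arbitrary choices). With $X$ a fair Bernoulli variable, $\Lambda(t) = \mathrm e^{-tx}\,E(\mathrm e^{tX}) = \mathrm e^{-tx}(1+\mathrm e^t)/2$, and for $x > \frac12$ one finds $t$ with $\Lambda(t) < 1$, making $\sum_n \Lambda(t)^n$ a convergent geometric series:

```python
# Verify numerically that Lambda(t) = e^{-t x} (1 + e^t)/2 dips below 1
# for some t > 0 when x > 1/2, so sum_n P(S_n > x n) <= sum_n Lambda(t)^n < oo.
from math import exp

def Lam(t: float, x: float) -> float:
    return exp(-t * x) * (1 + exp(t)) / 2

x = 0.6  # any x > 1/2 works
best_t = min((k / 100 for k in range(1, 500)), key=lambda t: Lam(t, x))
L = Lam(best_t, x)
print(f"t = {best_t:.2f}, Lambda(t) = {L:.4f}")    # strictly less than 1
print(f"geometric bound on the series: {L / (1 - L):.2f}")
```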