
I'm struggling with the concept of conditional expectation. First of all, if you have a link to any explanation that goes beyond showing that it is a generalization of elementary intuitive concepts, please let me know.

Let me get more specific. Let $\left(\Omega,\mathcal{A},P\right)$ be a probability space and $X$ an integrable real random variable defined on $(\Omega,\mathcal{A},P)$. Let $\mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Then $E[X|\mathcal{F}]$ is the a.s. unique random variable $Y$ such that $Y$ is $\mathcal{F}$-measurable and for any $A\in\mathcal{F}$, $E\left[X1_A\right]=E\left[Y1_A\right]$.

The common interpretation seems to be: "$E[X|\mathcal{F}]$ is the expectation of $X$ given the information of $\mathcal{F}$." I'm finding it hard to get any meaning from this sentence.

  1. In elementary probability theory, expectation is a real number. So the sentence above makes me think of a real number instead of a random variable. This is reinforced by $E[X|\mathcal{F}]$ sometimes being called "conditional expected value". Is there some canonical way of getting real numbers out of $E[X|\mathcal{F}]$ that can be interpreted as elementary expected values of something?

  2. In what way does $\mathcal{F}$ provide information? Knowing that some event occurred is something I would call information, and I have a clear picture of conditional expectation in that case. To me, $\mathcal{F}$ is not a piece of information, but rather a "complete" set of pieces of information one could possibly acquire in some way.

Maybe you say there is no real intuition behind this, $E[X|\mathcal{F}]$ is just what the definition says it is. But then, how does one see that a martingale is a model of a fair game? Surely, there must be some intuition behind that!

I hope you have got some impression of my misconceptions and can rectify them.

5 Answers


Maybe this simple example will help. I use it when I teach conditional expectation.

(1) The first step is to think of ${\mathbb E}(X)$ in a new way: as the best estimate for the value of a random variable $X$ in the absence of any information. To minimize the squared error ${\mathbb E}[(X-e)^2]={\mathbb E}[X^2-2eX+e^2]={\mathbb E}(X^2)-2e\,{\mathbb E}(X)+e^2,$ we differentiate with respect to $e$ to obtain $2e-2{\mathbb E}(X)$, which is zero at $e={\mathbb E}(X)$.

For example, if I throw a fair die and you have to estimate its value $X$, according to the analysis above, your best bet is to guess ${\mathbb E}(X)=3.5$. On specific rolls of the die, this will be an over-estimate or an under-estimate, but in the long run it minimizes the mean square error.
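
If it helps to see this numerically, here is a small Python check (a sketch only; the grid of candidate guesses is just for illustration) that $e=3.5$ minimizes the mean squared error for a fair die:

```python
import numpy as np

# Fair six-sided die: values 1..6, each with probability 1/6.
values = np.arange(1, 7)

def mse(e):
    # E[(X - e)^2], computed exactly over the six equally likely outcomes
    return np.mean((values - e) ** 2)

# Scan a grid of guesses; the minimizer is (up to the grid) e = 3.5 = E(X).
guesses = np.linspace(1, 6, 501)
best = guesses[np.argmin([mse(e) for e in guesses])]
print(best)        # ~3.5, i.e. E(X)
print(mse(3.5))    # 35/12 ~ 2.9167, the minimal mean squared error
```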

(2) What happens if you do have additional information? Suppose that I tell you that $X$ is an even number. How should you modify your estimate to take this new information into account?

The mental process may go something like this: "Hmmm, the possible values were $\lbrace 1,2,3,4,5,6\rbrace$ but we have eliminated $1,3$ and $5$, so the remaining possibilities are $\lbrace 2,4,6\rbrace$. Since I have no other information, they should be considered equally likely and hence the revised expectation is $(2+4+6)/3=4$".

Similarly, if I were to tell you that $X$ is odd, your revised (conditional) expectation is 3.

(3) Now imagine that I will roll the die and I will tell you the parity of $X$; that is, I will tell you whether the die comes up odd or even. You should now see that a single numerical response cannot cover both cases. You would respond "3" if I tell you "$X$ is odd", while you would respond "4" if I tell you "$X$ is even". A single numerical response is not enough because the particular piece of information that I will give you is itself random. In fact, your response is necessarily a function of this particular piece of information. Mathematically, this is reflected in the requirement that ${\mathbb E}(X\ |\ {\cal F})$ must be $\cal F$ measurable.
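
If you like to compute, here is a minimal sketch (Python, assuming the fair-die setup above) of ${\mathbb E}(X\,|\,{\cal F})$ as a function on $\Omega$, where ${\cal F}$ is generated by the parity of the outcome:

```python
import numpy as np

# Fair die; the sigma-algebra F is generated by the parity of the outcome.
outcomes = np.arange(1, 7)

def cond_exp_given_parity(omega):
    # E(X | F)(omega): average X over the atom of F containing omega,
    # i.e. over all outcomes with the same parity as omega.
    atom = outcomes[outcomes % 2 == omega % 2]
    return atom.mean()

print([cond_exp_given_parity(w) for w in outcomes])
# [3.0, 4.0, 3.0, 4.0, 3.0, 4.0]: a random variable, constant on each atom of F
```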

I think this covers point 1 in your question, and tells you why a single real number is not sufficient. Concerning point 2, you are also correct: $\cal F$ in ${\mathbb E}(X\ |\ {\cal F})$ does not represent a single piece of information, but rather describes which specific pieces of (random) information may occur.

  • This goes well with @shai-covo's answer below. (2018-12-02)

I think a good way to answer question 2 is as follows.

I am performing an experiment whose outcome can be described by an element $\omega$ of some set $\Omega$. I am not going to tell you the outcome, but I will allow you to ask certain yes/no questions about it. (This is like "20 questions", but infinite sequences of questions will be allowed, so it's really "$\aleph_0$ questions".) We can associate a yes/no question with the set $A \subset \Omega$ of outcomes for which the answer is "yes".

Now, one way to describe some collection of "information" is to consider all the questions which could be answered with that information. (For example, the 2010 Encyclopedia Britannica is a collection of information; it can answer the questions "Is the dodo extinct?" and "Is the elephant extinct?" but not the question "Did Justin Bieber win a 2011 Grammy?") This, then, would be a set $\mathcal{F} \subset 2^\Omega$.

If I know the answer to a question $A$, then I also know the answer to its negation, which corresponds to the set $A^c$ (e.g. "Is the dodo not-extinct?"). So any information that is enough to answer question $A$ is also enough to answer question $A^c$. Thus $\mathcal{F}$ should be closed under taking complements. Likewise, if I know the answer to questions $A,B$, I also know the answer to their disjunction $A \cup B$ ("Are either the dodo or the elephant extinct?"), so $\mathcal{F}$ must also be closed under (finite) unions. Countable unions require more of a stretch, but imagine asking an infinite sequence of questions "converging" on a final question. ("Can elephants live to be 90? Can they live to be 99? Can they live to be 99.9?" In the end, I know whether elephants can live to be 100.)

I think this gives some insight into why a $\sigma$-field can be thought of as a collection of information.
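
If a computational picture helps, here is a rough sketch (Python; the six-element $\Omega$ and the two "question" sets below are hypothetical choices, made only for illustration) of closing a family of questions under complements and unions. On a finite $\Omega$ this closure is exactly the generated $\sigma$-field:

```python
from itertools import combinations

# Hypothetical mini-example: Omega = {1,...,6}, and two "questions":
# "is the outcome even?" and "is the outcome at least 5?".
Omega = frozenset(range(1, 7))
questions = {frozenset({2, 4, 6}), frozenset({5, 6})}

# Close the family under complements and pairwise unions. On a finite Omega
# this closure is exactly the sigma-field generated by the questions.
sigma = set(questions) | {Omega, frozenset()}
changed = True
while changed:
    changed = False
    for A in list(sigma):
        if Omega - A not in sigma:
            sigma.add(Omega - A)
            changed = True
    for A, B in combinations(list(sigma), 2):
        if A | B not in sigma:
            sigma.add(A | B)
            changed = True

print(len(sigma))   # 16 = 2^4: one set per union of the atoms {1,3}, {2,4}, {5}, {6}
```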

  • @LaconianThinker: Oh, now I see your point. Right. The issue is that you can't describe an algorithm to take the sequence of "yes" or "no" answers and reach a conclusion as to whether $\omega \in V$. (I guess "algorithm" here really means "Borel subset of $2^{\mathbb{N}}$", so perhaps I am being circular.) (2018-09-25)

An example. Suppose that $X \sim {\rm binomial}(m,p)$ and $Y \sim {\rm binomial}(n,p)$ are independent ($0 < p < 1$). For any integer $0 \leq s \leq m+n$, it holds $ {\rm E}[X|X + Y = s] = \frac{{m }}{{m + n }}s. $ This means that $ {\rm E}[X|X + Y] = \frac{{m }}{{m + n }}(X+Y). $ Note that ${\rm E}[X|X + Y]$ is a random variable which is a function of $X+Y$.
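
If you want to see this numerically, here is a rough Monte Carlo check (Python/NumPy; the particular values $m=5$, $n=7$, $p=0.3$ below are just example choices, nothing special):

```python
import numpy as np

# Rough Monte Carlo check of E[X | X+Y = s] = s*m/(m+n); the parameters below
# are example choices only.
rng = np.random.default_rng(0)
m, n, p, N = 5, 7, 0.3, 1_000_000
X = rng.binomial(m, p, size=N)
Y = rng.binomial(n, p, size=N)
S = X + Y

for s in range(m + n + 1):
    mask = (S == s)
    if mask.sum() > 1000:  # only report values of s with enough samples
        print(s, X[mask].mean(), m / (m + n) * s)
```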

Note that, in general, the conditional expectation of $X$ given $Z$, denoted ${\rm E}[X|Z]$, is defined as ${\rm E}[X|\sigma(Z)]$, where $\sigma(Z)$ is the $\sigma$-algebra generated by $Z$.

EDIT. In response to the OP's request, I note that the binomial distribution (which is discrete) plays no special role in the above example. For completely analogous results for the normal and gamma distributions (both are continuous) see this and this, respectively; for a substantial generalization, see this.


You can think of the conditional expectation as the orthogonal projection onto the closed subspace of $\mathcal F$-measurable random variables in the Hilbert space of square integrable random variables.

This is a detailed and elementary discussion of this viewpoint.
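
For a concrete finite sketch (reusing the fair-die/parity example from the first answer, which is only an assumed illustration), the projection can be computed by ordinary least squares onto the indicators of the atoms of $\mathcal F$, and it reproduces the conditional expectation:

```python
import numpy as np

# Fair die, F generated by parity. The F-measurable random variables are the
# linear combinations of the indicators of the atoms {1,3,5} and {2,4,6}.
outcomes = np.arange(1, 7)
X = outcomes.astype(float)                   # X(omega) = omega
odd = (outcomes % 2 == 1).astype(float)      # indicator of {1,3,5}
even = (outcomes % 2 == 0).astype(float)     # indicator of {2,4,6}
B = np.column_stack([odd, even])             # basis of the F-measurable subspace

# With the uniform measure, the L^2(P) projection is an ordinary least-squares fit,
# since the constant weight 1/6 cancels from the normal equations.
coef, *_ = np.linalg.lstsq(B, X, rcond=None)
print(B @ coef)   # [3. 4. 3. 4. 3. 4.], the same values as E(X | parity)
```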


I happened to read the Wikipedia article on conditional expectation today. It clarified a lot of my questions. Hope it helps!

  1. For your first question, the linked article defines the conditional expectation of a r.v. $X: \Omega \rightarrow \mathbb{R}$ given a sub-$\sigma$-algebra $\mathcal{F}$ of the $\sigma$-algebra $\mathcal{A}$ on the domain $\Omega$. It is an $\mathcal{F}$-measurable function $\Omega \rightarrow \mathbb{R}$, denoted $E(X \vert \mathcal{F})$. If you evaluate this conditional expectation at a point $\omega \in \Omega$, you get a value $E(X \vert \mathcal{F})(\omega)$, which is called the conditional expectation of $X$ given $\mathcal{F}$ at $\omega$ (a small numerical sketch of this evaluation appears after this list).

    When the r.v. $X$ is the indicator function of some measurable subset, say $A \in \mathcal{A}$, its conditional expectation given the sub-$\sigma$-algebra is called the conditional probability of $A$ given $\mathcal{F}$, denoted $P(A \vert \mathcal{F})$. It is again a mapping $\Omega \rightarrow \mathbb{R}$.

    If we let $A$ vary over $\mathcal{A}$, the conditional probability $P(\cdot \vert \mathcal{F})$ becomes a mapping $\mathcal{A} \times \Omega \rightarrow \mathbb{R}$. In some cases, for every $\omega \in \Omega$, the map $A \mapsto P(A \vert \mathcal{F})(\omega)$ is a probability measure on $(\Omega, \mathcal{A})$, in which case $P(\cdot \vert \mathcal{F})$ is called a regular conditional probability.

    When $\mathcal{F}$ is generated by another r.v. $Y$, the conditional expectation and conditional probability are simply called the conditional expectation and conditional probability given the r.v. $Y$.

  2. For your second question, I am still wondering what kind of information a $\sigma$-algebra (of a r.v.) can provide, and how it is provided.
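
As a small numerical sketch of the evaluation described in point 1 (a hypothetical fair-die example with $\mathcal{F}$ generated by parity; it is not taken from the article):

```python
# Hypothetical fair-die illustration: F is generated by the parity of the outcome.
# P(A|F)(omega) is the proportion of the atom of F containing omega that lies in A.
outcomes = range(1, 7)

def cond_prob(A, omega):
    atom = [w for w in outcomes if w % 2 == omega % 2]   # atom of F containing omega
    return sum(w in A for w in atom) / len(atom)

print(cond_prob({5, 6}, 1), cond_prob({5, 6}, 2))   # 1/3 on the odd atom, 1/3 on the even atom
print(cond_prob(set(outcomes), 3), cond_prob(set(), 3))
# 1.0 and 0.0: for fixed omega, A -> P(A|F)(omega) behaves like a probability measure,
# which is the "regular conditional probability" situation mentioned above.
```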