
Let $\{y(n),\; n = 1, 2, \ldots, N\}$ be a real random sequence that represents $N$ observations of an unknown real random variable $x$.

Starting from the mean-square error $\sigma^2=E[(x-\hat{x})^2]$, how can one show that the optimal estimator is $\hat{x}=E(x\mid y)=\int_{-\infty}^{\infty}\alpha\, p_{x|y}(\alpha)\,d\alpha$?
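As a purely illustrative sketch of what this formula computes (nothing here is part of the problem statement: the bivariate Gaussian density, the correlation value `rho`, and the integration grid are all assumptions made only for the example), one can evaluate $\hat{x}(y)=\int_{-\infty}^{\infty}\alpha\, p_{x|y}(\alpha)\,d\alpha$ numerically on a grid:

```python
import numpy as np

rho = 0.8                                 # assumed correlation between x and y (illustrative)
alpha = np.linspace(-10.0, 10.0, 4001)    # integration grid for alpha
d_alpha = alpha[1] - alpha[0]

def joint_pdf(a, y):
    """Bivariate Gaussian density p_{x,y}(a, y) with unit variances and correlation rho."""
    z = (a**2 - 2.0 * rho * a * y + y**2) / (1.0 - rho**2)
    return np.exp(-z / 2.0) / (2.0 * np.pi * np.sqrt(1.0 - rho**2))

def conditional_mean(y):
    """x_hat(y) = integral of alpha * p_{x|y}(alpha), with p_{x|y} = p_{x,y} / p_y on the grid."""
    p_joint = joint_pdf(alpha, y)
    p_y = np.sum(p_joint) * d_alpha             # marginal density of y at this point
    p_cond = p_joint / p_y                      # conditional density p_{x|y}(alpha)
    return np.sum(alpha * p_cond) * d_alpha     # conditional mean

# For this jointly Gaussian pair, E[x | y] = rho * y, so the two columns should agree.
for y_obs in (-1.0, 0.5, 2.0):
    print(y_obs, conditional_mean(y_obs), rho * y_obs)
```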

  • $\hat{x}$ is an estimator and $y$ is a sequence that represents $N$ observations of an unknown real random variable $x$. (2012-11-16)

1 Answer


If I got the question right, the idea is to show that $\hat{x}:=E[X|Y]$ is the minimum mean-square estimator of $X$. Before trying to give a rough answer, I think it can be useful to consider the geometrical interpretation of the conditional expectation as the orthogonal projection in $\mathcal L^2$. Intuitively, it corresponds to the projection of the random variable $X$ onto the "information" carried by $Y$ (more precisely, its natural sigma-algebra $\sigma(Y)$).

Just as a means of comparison, take the deterministic setup in $\mathbb R^2$ and the question of finding the best "estimator" $(\hat{x},\hat{y})$ on the line $y=ax+b$ of a certain point $(x^*,y^*)$ in the mean-square sense. It amounts to solving $ \hat x \quad\in\quad \arg\min_{x\in\mathbb R}\ (y^*-ax-b)^2+(x^*-x)^2 $ and the solution is obviously given by the orthogonal projection of the point onto the line (for the sake of completeness, $\hat{x}=(ay^*-ab+x^*)/(a^2+1)$ and $\hat y=a\hat x+b$, if I didn't mess it all up); a quick numerical check of that closed form is sketched below.
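Just a sketch, with arbitrary numbers $a$, $b$, $(x^*,y^*)$ picked for illustration, comparing the closed form above against a brute-force minimization of the squared distance:

```python
import numpy as np

# arbitrary illustrative values for the line y = a*x + b and the point (x_star, y_star)
a, b = 2.0, -1.0
x_star, y_star = 3.0, 5.0

def sq_dist(x):
    # squared distance from (x_star, y_star) to the point (x, a*x + b) on the line
    return (y_star - a * x - b) ** 2 + (x_star - x) ** 2

xs = np.linspace(-10.0, 10.0, 200001)
x_brute = xs[np.argmin(sq_dist(xs))]
x_closed = (a * y_star - a * b + x_star) / (a ** 2 + 1)

print(x_brute, x_closed)   # both should be about 3.0 here: (2*5 - 2*(-1) + 3) / 5
```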

Right, that being said, the mean-square risk of an estimator $\hat{X}$ based on the outcome of a random variable $Y$ is your $\sigma^2=E[(X-\hat X)^2]$. Now note that $E[X^2-2X\hat X + \hat{X}\,^2] \quad = \quad E[X^2-2E[X|Y]\hat{X}+\hat X\,^2]$ using the tower property ( * ), here in the form $E[X\hat X]=E[E[X\hat X|Y]]$, and the fact that $\hat X$ is $\sigma(Y)$-measurable, i.e. in this case that $E[X\hat{X}|Y]=\hat{X}E[X|Y]$ a.s. ( ** )

Let's rewrite this as follows: $ E[(X-E[X|Y])^2] + E[(E[X|Y]-\hat X)^2] $ (expanding both squares and using $E[X\,E[X|Y]]=E[E[X|Y]^2]$, which again follows from ( * ) and ( ** ), shows this sum equals the expression above). Now, both terms are non-negative and we can't act on the first one, but we can definitely act on the second (in that we can still choose our estimator). Clearly, $\hat{X}=E[X|Y]$ is a minimizer since it makes the second term exactly zero.
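As a rough Monte Carlo illustration of this conclusion (just a sketch, not part of the argument: I assume a jointly Gaussian pair with correlation $\rho=0.8$, for which $E[X|Y]=\rho Y$ in closed form, and compare it against a few arbitrary competing $\sigma(Y)$-measurable estimators):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.8, 1_000_000

y = rng.standard_normal(n)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # Corr(x, y) = rho

def mse(estimate):
    # empirical mean-square error E[(X - estimate)^2]
    return np.mean((x - estimate) ** 2)

print("conditional mean rho*Y :", mse(rho * y))      # about 1 - rho**2 = 0.36, the smallest
print("naive guess Y          :", mse(y))            # about 0.40
print("shrunk guess 0.5*Y     :", mse(0.5 * y))      # about 0.45
print("constant guess 0       :", mse(np.zeros(n)))  # about Var(X) = 1, the worst here
```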

Note also the underlying "philosophy": the best estimate you can make of the future value of a random variable, based on past observations, is the expected value of that random variable given those observations.

( * ) If you don't know this property or are unfamiliar with measure-theoretic probability, an intuitive way of looking at it is as follows: take the expected value of something (here $X$) given some partial information (here $Y$, or more precisely $\sigma(Y)$), and then take the expected value again with no conditioning at all. The inner conditioning is then "washed out", since right after it you average the same thing over everything anyway.

( ** ) Again, this is just saying that, since you built $\hat{X}$ out of the information carried by $Y$, the expected value of $\hat{X}$ given $Y$ is precisely $\hat{X}$. Put differently, $\hat{X}$ behaves like a constant (it is no longer random) once $Y$ is known. (By the way, this is known as the "taking out what is known" property...)
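If you want to see ( * ) and ( ** ) numerically, here is a quick empirical sanity check (a sketch only; the discrete distribution of $Y$ and the choice $h(Y)=Y^2$ are arbitrary, picked so that $E[X|Y]$ can be estimated by simple group averages):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

y = rng.integers(0, 3, size=n)            # Y takes values 0, 1, 2
x = y + rng.standard_normal(n)            # X depends on Y plus independent noise

# estimate E[X | Y = k] by averaging X over each group, then map back to the samples
cond_mean = np.array([x[y == k].mean() for k in range(3)])
e_x_given_y = cond_mean[y]

h_y = y ** 2                              # any sigma(Y)-measurable quantity

print(e_x_given_y.mean(), x.mean())                   # ( * ): both about E[X] = 1
print((x * h_y).mean(), (e_x_given_y * h_y).mean())   # ( ** ): both about E[X * Y^2] = 3
```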

  • Thank you very much, this answer clarified many doubts. (2012-11-17)