2
$\begingroup$

I want to find a function $f$ that given $x$ will predict $y$. The expected prediction error of $f$ is $e = E[(Y-f(X))^2]=\int \int [y-f(x)]^2 p(x,y) dx dy$ the expectation of $(Y-f(X))^2$ with respect to the joint distribution $p(X,Y)$. $X\in\mathbb R$, $Y\in\mathbb R$.

Others told me the optimal function that minimises $e$ is $f(x) = E[Y|X=x]$ the conditional expectation of $Y$ given $X=x$. Is that a right suggestion? This may be a simple proof to many people, but I don't know how to show/proof it, can someone help me?

1 Answers 1

3

Let $f(x) = \mathbb{E}[Y|X=x]$. Let $g(x)$ be another function. We shall now prove that $\mathbb{E}[\left(Y-g(X)\right)^2] \geq \mathbb{E}[\left(Y-f(X)\right)^2]$

$\mathbb{E}[\left(Y-g(X)\right)^2] = \mathbb{E}[\left(Y-f(X) + f(X) - g(X)\right)^2]$

$\mathbb{E}[\left(Y-g(X)\right)^2] = \mathbb{E}[\left(Y-f(X) \right)^2] + 2 \mathbb{E}[\left(Y-f(X) \right)\left(f(X) - g(X)\right)] + \mathbb{E}[\left(f(X) - g(X)\right)^2]$

Now note that $\mathbb{E}[\left(Y-f(X) \right)h(X)] = 0$ $\forall h(x) \in L^2$

The above can be obtained from the tower property or Law of total expectation as follows $\mathbb{E}[\left(Y-f(X) \right)h(X)] = \mathbb{E}[\mathbb{E}[\left(Y-f(X) \right)h(X) |X]] = \mathbb{E}[\mathbb{E}[\left(Y-f(X)|X] \right)h(X) ] = 0$

Hence, we get $\mathbb{E}[\left(Y-g(X)\right)^2] = \mathbb{E}[\left(Y-f(X) \right)^2] + \mathbb{E}[\left(f(X) - g(X)\right)^2]$

Hence, $\mathbb{E}[\left(Y-g(X)\right)^2] \geq \mathbb{E}[\left(Y-f(X) \right)^2]$

The above should suggest some sort of generalized Pythagoras theorem and this is true from the following generalized statement.

A generalized version of the above is, If we have a closed convex subset $A$ of a Hilbert space $\mathscr{H}$, then $\forall h \in \mathscr{H}$, $!\exists a \in A$ such that $||h-a|| < ||h-y||$, $\forall y \in A \backslash \{a\}$. Further if $A$ happens to be a closed subspace, then the element $a$ is the "projection" of $h$ onto the space $A$.

The proof for this version is identical to the proof written above.

  • 0
    @Dean: You do not need to use $p(x,y) = p(y|x) p(x)$ explicitly. All you need for this proof is to use the tower property or the law of total expectation http://en.wikipedia.org/wiki/Law_of_total_expectation2011-06-05