2
$\begingroup$

I want to find a function $f$ that given $x$ will predict $y$. The expected prediction error of $f$ is $$e = E[(Y-f(X))^2]=\int \int [y-f(x)]^2 p(x,y) dx dy$$ the expectation of $(Y-f(X))^2$ with respect to the joint distribution $p(X,Y)$. $X\in\mathbb R$, $Y\in\mathbb R$.

Others told me the optimal function that minimises $e$ is $$f(x) = E[Y|X=x]$$ the conditional expectation of $Y$ given $X=x$. Is that a right suggestion? This may be a simple proof to many people, but I don't know how to show/proof it, can someone help me?

1 Answers 1

3

Let $f(x) = \mathbb{E}[Y|X=x]$. Let $g(x)$ be another function. We shall now prove that $$\mathbb{E}[\left(Y-g(X)\right)^2] \geq \mathbb{E}[\left(Y-f(X)\right)^2]$$

$$\mathbb{E}[\left(Y-g(X)\right)^2] = \mathbb{E}[\left(Y-f(X) + f(X) - g(X)\right)^2]$$

$$\mathbb{E}[\left(Y-g(X)\right)^2] = \mathbb{E}[\left(Y-f(X) \right)^2] + 2 \mathbb{E}[\left(Y-f(X) \right)\left(f(X) - g(X)\right)] + \mathbb{E}[\left(f(X) - g(X)\right)^2]$$

Now note that $$\mathbb{E}[\left(Y-f(X) \right)h(X)] = 0$$ $\forall h(x) \in L^2$

The above can be obtained from the tower property or Law of total expectation as follows $$\mathbb{E}[\left(Y-f(X) \right)h(X)] = \mathbb{E}[\mathbb{E}[\left(Y-f(X) \right)h(X) |X]] = \mathbb{E}[\mathbb{E}[\left(Y-f(X)|X] \right)h(X) ] = 0$$

Hence, we get $$\mathbb{E}[\left(Y-g(X)\right)^2] = \mathbb{E}[\left(Y-f(X) \right)^2] + \mathbb{E}[\left(f(X) - g(X)\right)^2]$$

Hence, $$\mathbb{E}[\left(Y-g(X)\right)^2] \geq \mathbb{E}[\left(Y-f(X) \right)^2]$$

The above should suggest some sort of generalized Pythagoras theorem and this is true from the following generalized statement.

A generalized version of the above is, If we have a closed convex subset $A$ of a Hilbert space $\mathscr{H}$, then $\forall h \in \mathscr{H}$, $!\exists a \in A$ such that $||h-a|| < ||h-y||$, $\forall y \in A \backslash \{a\}$. Further if $A$ happens to be a closed subspace, then the element $a$ is the "projection" of $h$ onto the space $A$.

The proof for this version is identical to the proof written above.

  • 0
    Do we need to use the $p(x,y)$ in the proof? I know the fact that $p(x,y)=p(y|x)p(x)$, and can this be used?2011-06-04
  • 0
    @Dean: You do not need to use $p(x,y) = p(y|x) p(x)$ explicitly. All you need for this proof is to use the tower property or the law of total expectation http://en.wikipedia.org/wiki/Law_of_total_expectation2011-06-05