Many books, when introducing the regression problem, start with the assertion that any random variable $Y$ can be decomposed into two orthogonal terms $$ Y= E[Y|X]+\epsilon. $$ In classical statistics, $E[Y|X]$ is shorthand for $E[Y|X=x]$, where $X$ is some "controlled" (non-random) variable. In econometric research, however, $X$ is a random variable, so I guess that $E[Y|X]$ is shorthand for $E[Y|\sigma(X)]$, where $\sigma(X)$ is the sigma-algebra generated by $X$.
- Is this the right interpretation?
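To make the decomposition $Y = E[Y|X] + \epsilon$ concrete, here is a small simulation sketch. The data-generating process ($X$ uniform on $\{0,1,2,3\}$, $Y = X^2 + $ standard normal noise) and the helper names are hypothetical choices for illustration only. With discrete $X$, the sample analogue of $E[Y|X]$ is simply the average of $Y$ within each value of $X$, and the resulting residual has zero sample inner product with any function of $X$, up to floating-point rounding.

```python
import random
from collections import defaultdict

def conditional_mean_decomposition(xs, ys):
    """Split each y into m(x) + eps, where m(x) is the sample mean of y
    among the observations that share the same (discrete) value x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    m = {x: sums[x] / counts[x] for x in sums}
    fitted = [m[x] for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    return fitted, resid

def inner(u, v):
    """Sample analogue of the inner product <U, V> = E[UV]."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

random.seed(0)
n = 10_000
xs = [random.randint(0, 3) for _ in range(n)]      # discrete regressor X
ys = [x**2 + random.gauss(0.0, 1.0) for x in xs]   # hypothetical DGP with E[Y|X] = X^2

fitted, resid = conditional_mean_decomposition(xs, ys)

print(inner(resid, fitted))                # epsilon vs. m(X): zero up to rounding
print(inner(resid, [x**3 for x in xs]))    # epsilon vs. another g(X): also zero up to rounding
```

Orthogonality here holds against every function of $X$, not only linear ones, which is what distinguishes the conditional-mean residual from an OLS residual (the latter is only guaranteed orthogonal to the columns of the design matrix).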
Another assertion is that $E[Y|X]$ is an orthogonal projection.
- Onto what space is $Y$ projected? Onto $\sigma(X)$?
I understand this fairly well from the algebraic point of view, where $$ y = \hat{y} + e $$ and $\hat{y} = Hy = X(X'X)^{-1}X'y$. In this case the orthogonality of $e$ to $\hat{y}$ has a clear geometric interpretation: $H$ is the orthogonal projection onto $C(X)$, and $e \in C(X)^{\perp}$. However, this is a post hoc view: we have already observed the data points $\{y_i, x_{1i},...,x_{pi}\}_{i=1}^n$, whereas I'm interested in the stochastic process that generates them.
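The algebraic facts about $H$ can be checked exactly on a toy design. The sketch below uses plain Python with `fractions.Fraction` so there is no rounding error; the data (an intercept plus one regressor, $n = 4$) and the helper names `matmul`, `transpose` and `inv2` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Closed-form inverse of a 2x2 matrix (enough for intercept + one regressor)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: design matrix columns are [intercept, x]
X = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)], [F(1), F(4)]]
y = [[F(1)], [F(3)], [F(2)], [F(7)]]

Xt = transpose(X)
H = matmul(matmul(X, inv2(matmul(Xt, X))), Xt)   # H = X (X'X)^{-1} X'
y_hat = matmul(H, y)
e = [[yi[0] - yh[0]] for yi, yh in zip(y, y_hat)]

assert H == transpose(H)              # H is symmetric
assert matmul(H, H) == H              # H is idempotent, i.e. a projection
assert matmul(Xt, e) == [[0], [0]]    # e is orthogonal to every column of X, so e is in C(X)^perp
```

Symmetry plus idempotence is exactly what makes $H$ an *orthogonal* (rather than oblique) projection.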
To sum up, my questions are:
1. If $X$ is a random variable defined on the same probability space as $Y$, why does an orthogonal decomposition of the form $$ Y = E[Y|\sigma(X)]+\epsilon=h(X)+\epsilon $$ exist? How can I prove its existence (and uniqueness)? (I know it requires square integrability of $Y$, but I have no intuitive explanation of why that suffices for the decomposition to exist.)
2. Do $E[Y|\sigma(X)]$ or $E[Y|X=x]$ project $Y$ onto $\sigma(X)$? If so, does this have an intuitive meaning (like its linear-algebra analogue)?
3. If $\epsilon$ is defined on the same probability space, what does it mean for it to be orthogonal to $E[Y|\sigma(X)]$?
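On a finite probability space these questions can be checked by hand, which may help build intuition. In the sketch below the outcomes, probabilities and values of $X$ and $Y$ are hypothetical. Here "orthogonal" is taken in the usual $L^2$ sense, $\langle U, V\rangle = E[UV]$, and $E[Y|\sigma(X)]$ is computed as the probability-weighted average of $Y$ over each level set $\{X = x\}$. The code checks that $\epsilon$ is orthogonal to every indicator $1\{X = x\}$ (and hence to every $g(X)$, since on a finite space each $g(X)$ is a linear combination of these indicators), and the Pythagorean identity $E[(Y-g(X))^2] = E[\epsilon^2] + E[(h(X)-g(X))^2]$, which shows that $h$ minimizes mean squared error over all functions of $X$.

```python
from fractions import Fraction as F

# Hypothetical finite probability space: outcome -> (probability, X, Y)
omega = {
    "w1": (F(1, 8), 0, F(2)),
    "w2": (F(1, 8), 0, F(4)),
    "w3": (F(1, 4), 1, F(1)),
    "w4": (F(1, 4), 1, F(3)),
    "w5": (F(1, 4), 2, F(6)),
}
X = {w: x for w, (_, x, _) in omega.items()}
Y = {w: y for w, (_, _, y) in omega.items()}

def expect(Z):
    """E[Z] for a random variable Z given as a dict outcome -> value."""
    return sum(p * Z[w] for w, (p, _, _) in omega.items())

def cond_mean_by_level(Y, X):
    """The function h with E[Y | sigma(X)] = h(X): weighted mean of Y on each {X = x}."""
    num, den = {}, {}
    for w, (p, _, _) in omega.items():
        num[X[w]] = num.get(X[w], 0) + p * Y[w]
        den[X[w]] = den.get(X[w], 0) + p
    return {x: num[x] / den[x] for x in num}

m = cond_mean_by_level(Y, X)
eps = {w: Y[w] - m[X[w]] for w in omega}

# Orthogonality of eps to each indicator 1{X = x}, hence to every g(X)
for x in m:
    assert expect({w: eps[w] * (1 if X[w] == x else 0) for w in omega}) == 0

def mse(g):
    """E[(Y - g(X))^2] for g given as a dict x -> value."""
    return expect({w: (Y[w] - g[X[w]]) ** 2 for w in omega})

g = {0: F(0), 1: F(1), 2: F(10)}   # an arbitrary competing function of X
gap = expect({w: (m[X[w]] - g[X[w]]) ** 2 for w in omega})
assert mse(g) == mse(m) + gap      # Pythagoras: Y - g(X) = eps + (h(X) - g(X))
```

The last identity also hints at uniqueness: any $g$ achieving the same mean squared error as $h$ must have $E[(h(X)-g(X))^2] = 0$, i.e. $g(X) = h(X)$ almost surely.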
I would appreciate any help.