We know that the expectation operator for a (continuous) random variable $x$ is defined as:
$$ \mathbb{E} \left\{x\right\} = \int_{-\infty}^{\infty} x \: p_x(x) \; \mathrm{d}x $$
where $p_x(x)$ is the PDF of the random variable $x$.
If there is an arbitrary(?) function $f$ acting on the random variable $x$, then the expected value of this function can also be written as:
$$ \mathbb{E}\left\{f(x) \right\} = \int_{-\infty}^{\infty} f(x) \: p_x(x) \: \mathrm{d}x $$
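As a concrete instance of this integral, here is a minimal sketch that evaluates it numerically. The choices are assumptions made purely for illustration: $x$ is standard normal, $f(x) = x^2$ (so the exact answer is 1), the infinite limits are truncated to $[-10, 10]$, and the helper names `normal_pdf` and `expectation_by_integration` are hypothetical.

```python
import math

def normal_pdf(x):
    """PDF of a standard normal random variable (assumed distribution)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expectation_by_integration(f, pdf, lo=-10.0, hi=10.0, steps=20000):
    """Approximate the integral of f(x) * pdf(x) over [lo, hi] with the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx  # midpoint of the k-th sub-interval
        total += f(x) * pdf(x)
    return total * dx

# Assumed example: f(x) = x^2, whose exact expectation under N(0, 1) is 1.
value = expectation_by_integration(lambda x: x * x, normal_pdf)
print(value)
```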
In many of the algorithms I study (statistical in nature), one often ends up taking the expected value of some quantity that is a function of the random variable $x$. Conversely, one can also manipulate the probability density function of $x$ and then 'take it back' into an expression that uses the expectation operator.
When it comes to actually evaluating the expected value of $x$, however ($\mathbb{E}\left\{x\right\}$), I often come across this estimation formula:
$$ \mathbb{E}\left\{x\right\} \approx \frac{1}{N}\sum_{n=1}^{N} x[n] $$
and similarly,
$$ \mathbb{E}\left\{f(x)\right\} \approx \frac{1}{N}\sum_{n=1}^{N} f(x[n]) $$
where each $x[n]$ is an individual realization of the random variable $x$.
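To make the estimation formula concrete, here is a minimal sketch of what it computes. The setup is an assumption for illustration only: $x$ is standard normal and $f(x) = x^2$, so the exact value of $\mathbb{E}\left\{f(x)\right\}$ is 1. The name `sample_mean_estimate` is hypothetical.

```python
import random

def sample_mean_estimate(f, draw, n):
    """Average f over n independent realizations x[1..n] produced by draw()."""
    return sum(f(draw()) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw():
    # Assumed distribution: one realization of a standard normal x.
    return rng.gauss(0.0, 1.0)

def f(x):
    # Assumed function; the exact E{f(x)} is 1 for a standard normal x.
    return x * x

# The estimate for increasing N, to compare against the exact value of 1.
estimates = {n: sample_mean_estimate(f, draw, n) for n in (10, 1000, 100000)}
for n, est in estimates.items():
    print(n, est)
```

In runs like this the average appears to settle near the exact value as $N$ grows, which is the behaviour I would like to see justified.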
My question is, why is this formula true, and how did it come about? Every book I read seems to just include it as if it fell from the sky one day and no explanation is given as to why it is true.
Could someone please give an intuitive and mathematical explanation for why - and more importantly, how this happens to be true? What is the history/rationale behind it?
Many thanks.
