
I am reading about prediction error estimation and found the following:

Suppose we have $\mathbf{Y}=\mathbf{x}_0+ \boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_w^2 I)$, and we form an estimate of $\mathbf{x}_0$ by applying a function $f$ to the random vector $\mathbf{Y}$, i.e. $\hat{\mathbf{X}}=f(\mathbf{Y})$.

Now, suppose we have the data $\{\mathbf{y}=[y_1,\cdots,y_N],\ \hat{\mathbf{x}}=[\hat{x}_1,\hat{x}_2,\cdots, \hat{x}_N]\}$, where $\mathbf{y}$ is a realization of $\mathbf{Y}$ and $\hat{\mathbf{x}}=f(\mathbf{y})$.

As I understood from different references, we can write:

$\mathbb{E}\big(\|\mathbf{Y}'-\mathbf{\hat{X}}\|_2^2\big)=\mathbb{E}\big(\sum_{i=1}^N (Y_i-\hat{X}_i)^2\big)+2 \sum_{i=1}^N \operatorname{cov}(Y_i,\hat{X}_i)$, where $\mathbf{Y}'$ is an independent copy of $\mathbf{Y}$. In other words, the training error $\sum_{i=1}^N(y_i-\hat{x}_i)^2$ is a biased estimate of $\mathbb{E}(\|\mathbf{Y}'-\mathbf{\hat{X}}\|_2^2)$, and it needs the correction term $2\sum_{i=1}^N \operatorname{cov}(Y_i,\hat{X}_i)$ to become unbiased. (By Stein's lemma, $\operatorname{cov}(Y_i,\hat{X}_i)=\sigma_w^2\,\mathbb{E}[\partial \hat{X}_i/\partial Y_i]$, which is where $\sigma_w^2$ enters.)
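For concreteness, here is a minimal Monte Carlo sketch of this identity for a fixed linear smoother $\hat{\mathbf{x}}=S\mathbf{y}$ (a toy example of my own; the signal, basis matrix, and constants are arbitrary choices, not from any reference). For linear $S$ we have $\operatorname{cov}(Y_i,\hat{X}_i)=\sigma_w^2 S_{ii}$, so the correction term is $2\sigma_w^2\operatorname{tr}(S)$:

```python
# Monte Carlo check of the identity for a fixed linear smoother x_hat = S y.
# Toy setup (my own choices): arbitrary true signal x0 and ridge-type S.
import numpy as np

rng = np.random.default_rng(0)
N, sigma_w = 50, 0.5
x0 = np.sin(np.linspace(0, 3, N))                     # arbitrary true signal
A = rng.standard_normal((N, 10))                      # arbitrary basis
S = A @ np.linalg.solve(A.T @ A + np.eye(10), A.T)    # linear smoother

trials = 20000
train_err = np.empty(trials)
fresh_err = np.empty(trials)
for t in range(trials):
    y = x0 + sigma_w * rng.standard_normal(N)       # observed Y
    y_new = x0 + sigma_w * rng.standard_normal(N)   # independent copy Y'
    x_hat = S @ y
    train_err[t] = np.sum((y - x_hat) ** 2)
    fresh_err[t] = np.sum((y_new - x_hat) ** 2)

# For linear S: 2 * sum_i cov(Y_i, X_hat_i) = 2 * sigma_w^2 * trace(S)
correction = 2 * sigma_w**2 * np.trace(S)
print("E||Y' - X_hat||^2          :", fresh_err.mean())
print("training error + correction:", train_err.mean() + correction)
```

The two printed values agree up to Monte Carlo error, which is exactly the bias/correction statement above.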

Now, I am wondering whether we can get a similar result when estimating $\mathbb{E}(\|\mathbf{\hat{X}}-\mathbf{x}_0\|_2^2)$, i.e. whether we can conclude that $\sum_{i=1}^N(\hat{x}_i-x_{0i})^2$ is a biased estimate of $\mathbb{E}(\|\mathbf{\hat{X}}-\mathbf{x}_0\|_2^2)$ that admits an analogous correction term, where $\mathbf{\hat{x}}=f(\mathbf{y})$ and $\mathbf{y}$ denotes the observed realization of $\mathbf{Y}$.
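To make the question concrete, here is a toy numerical probe (again entirely my own construction, in the same linear-smoother setting as above) of one candidate analogue, $\mathbb{E}\|\mathbf{\hat{X}}-\mathbf{x}_0\|_2^2 \overset{?}{=} \mathbb{E}\|\mathbf{Y}-\mathbf{\hat{X}}\|_2^2 - N\sigma_w^2 + 2\sum_{i=1}^N \operatorname{cov}(Y_i,\hat{X}_i)$, which I guessed by expanding $\|\mathbf{\hat{X}}-\mathbf{x}_0\|_2^2=\|(\mathbf{\hat{X}}-\mathbf{Y})+\boldsymbol{\epsilon}\|_2^2$; I do not know whether it holds beyond this toy case:

```python
# Toy numerical probe (my own construction) of the candidate analogue
#   E||X_hat - x0||^2 ?= E||Y - X_hat||^2 - N*sigma_w^2 + 2*sum_i cov(Y_i, X_hat_i)
# using the same linear smoother as above, where the covariance term is
# again 2 * sigma_w^2 * trace(S).
import numpy as np

rng = np.random.default_rng(1)
N, sigma_w = 50, 0.5
x0 = np.sin(np.linspace(0, 3, N))
A = rng.standard_normal((N, 10))
S = A @ np.linalg.solve(A.T @ A + np.eye(10), A.T)

trials = 20000
est_err = np.empty(trials)    # ||X_hat - x0||^2 (uses the unknown x0)
train_err = np.empty(trials)  # ||Y - X_hat||^2 (computable from data)
for t in range(trials):
    y = x0 + sigma_w * rng.standard_normal(N)
    x_hat = S @ y
    est_err[t] = np.sum((x_hat - x0) ** 2)
    train_err[t] = np.sum((y - x_hat) ** 2)

lhs = est_err.mean()
rhs = train_err.mean() - N * sigma_w**2 + 2 * sigma_w**2 * np.trace(S)
print("E||X_hat - x0||^2 :", lhs)
print("candidate formula :", rhs)
```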

  • Simple things are not simple until you learn that they are! This looks like a nice first question. Welcome! – 2017-01-09

0 Answers