
I have a small problem. With my limited stats background I am not sure I am getting this one right. After fitting an ordinary linear regression model I get $\hat{\underline{Y}}=X\hat{\underline{\beta}}$. Now the problem is to calculate a confidence interval for an unobserved $Y_{\alpha}$. Is it incredibly stupid to just calculate a confidence interval for each $\beta_j$, say $\hat{\beta_j}\pm\epsilon_j$, and then look at $a:=\sum{\epsilon_j} x_{ij}+\hat{\sigma} z$ (with $z=1.96$ usually), where $\sigma^2$ is the variance of the error term in the model $\underline{Y}=X\underline{\beta}+\underline{\epsilon}$, $\underline{\epsilon} \sim N_n(0,\sigma^2I_n)$? It seems intuitive to claim that $\left(\sum{\hat{\beta_j} x_{ij}}\pm a\right)$ is the confidence interval for $Y_{\alpha}$.

  • The link is somewhat helpful, but not much. I know that $Y_{\alpha}=x_1\beta_1+x_2\beta_2+\dots+x_p\beta_p+\epsilon$, so $Y_{\alpha} \sim N(\sum{x_i\beta_i},\sigma^2)$; I know the standard errors of the $\hat{\beta_i}$'s, and I know $\hat{\sigma}$. It is somewhat challenging to find a prediction interval for $Y_{\alpha}$, though. (2011-11-13)

2 Answers


We have $X\in\mathbb{R}^{n\times p}$, $\beta\in\mathbb{R}^{p\times1}$, and $Y\sim \mathcal{N}_n(X\beta, \sigma^2 I_n)$, where $\mathcal{N}_n$ is the $n$-dimensional normal distribution and $I_n$ is the $n\times n$ identity matrix. The least-squares estimator of the $p\times1$ vector of coefficients is
$$\hat{\beta} = (X^TX)^{-1}X^TY \sim\mathcal{N}_p(\beta, \sigma^2 (X^TX)^{-1}).$$
The vector of predicted values is
$$\hat{Y} = X\hat{\beta} \sim \mathcal{N}_n(X\beta,\sigma^2 H) = \mathcal{N}_n(X\beta,\sigma^2 X(X^TX)^{-1}X^T).$$
(Reminder to "pure" mathematicians: some matrices are not square. The "hat matrix" $H=X(X^TX)^{-1}X^T$ is not an identity matrix unless $X$ is square, and if $X$ were square, this whole discussion would be silly. In fact, $X(X^TX)^{-1}X^T$ is an $n\times n$ matrix of rank $p$.)

The vector of residuals (not to be confused with the vector $\varepsilon$ of errors) is
$$\hat{\varepsilon} = Y - \hat{Y} = (I-H)Y \sim \mathcal{N}_n(0,\sigma^2 (I-H)),$$
with $H$ as above. That the residuals $\hat{\varepsilon}$ are actually independent of the predicted values $\hat{Y}$ can be seen from joint normality and uncorrelatedness:
$$\operatorname{cov}(\hat{\varepsilon},\hat{Y}) = \operatorname{cov}((I-H)Y, HY) = (I-H)\operatorname{cov}(Y, Y)H^T = (I-H)\sigma^2 H = \sigma^2(H-H) =0.$$
Since $I-H$ is the matrix of an orthogonal projection onto a space of dimension $n-p$, we have
$$\frac{\|\hat{\varepsilon}\|^2}{\sigma^2} \sim \chi^2_{n-p}.$$
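To make these objects concrete, here is a minimal numpy sketch. The synthetic design matrix, the sizes $n=20$, $p=3$, and the noise level are illustrative assumptions of mine, not anything from the question:

```python
import numpy as np

# Synthetic data purely for illustration (n, p, beta, sigma are assumptions).
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # n x p design matrix with intercept
beta = np.array([1.0, 2.0, -0.5])
sigma = 0.7
Y = X @ beta + rng.normal(scale=sigma, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y            # least-squares estimator, shape (p,)
H = X @ XtX_inv @ X.T                   # hat matrix, n x n, rank p (not the identity)

resid = (np.eye(n) - H) @ Y             # residual vector (I - H) Y
rss = resid @ resid                     # ||resid||^2; rss / sigma^2 ~ chi^2 with n - p d.f.
sigma_hat = np.sqrt(rss / (n - p))      # usual estimate of sigma

print(np.linalg.matrix_rank(H))         # p, not n
print(np.allclose(H @ resid, 0))        # residuals orthogonal to the fitted values
```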

Summary of the foregoing:

  • $\hat{Y}$ and $\hat{\beta}$ have the normal distributions noted above.
  • The sum of squares of residuals $\|\hat{\varepsilon}\|^2/\sigma^2$ has the chi-square distribution noted above.
  • The sum of squares of residuals is independent of $\hat{Y}$ and of $\hat{\beta}$.

What is the probability distribution of one predicted value? Take $\hat{Y}_1 = [1,0,0,0,\dots,0]\hat{Y} = [1,0,0,0,\dots,0]X\hat{\beta} \sim \mathcal{N}_1(\bullet,\bullet)$. I'll let you fill in the two blanks: for the expected value, multiply the mean vector $X\beta$ on the left by $[1,0,0,0,\dots,0]$; for the variance, multiply the covariance matrix $\sigma^2 H$ on the left by that same row vector and on the right by its transpose. Notice that this is independent of the sum of squares of residuals.

So we get a Student's t-distribution:
$$\frac{(\hat{Y}_1 -\mathbb{E}(\hat{Y}_1))/\sqrt{\operatorname{var}(\hat{Y}_1)}}{\sqrt{\|\hat{\varepsilon}\|^2/(\sigma^2(n-p))}} \sim t_{n-p},$$
where the unknown $\sigma$ cancels between numerator and denominator. From this we get a confidence interval for the average $Y$-value given $x$-values equal to those of the first observation, i.e. the first row of the design matrix $X$.

What do we do for a set of $x$-values other than the observed ones? Instead of $[1,0,0,0,\ldots,0]X$, just use that row vector of unobserved $x$-values. This gives us a confidence interval for the corresponding expected $Y$-value.

But now we want a prediction interval for the next observed $Y$-value, given a yet-unobserved set of $x$-values, say the row vector $\mathbf{x}$. The new $Y$-value is independent of all of the foregoing, so
$$Y_{\text{new}} - \mathbf{x}\hat{\beta} \sim\mathcal{N}_1\!\left(0,\;\sigma^2+\sigma^2\,\mathbf{x}(X^TX)^{-1}\mathbf{x}^T\right),$$
the first term being the variance of the new error and the second the variance of the prediction $\mathbf{x}\hat{\beta}$. This is all still independent of the residual sum of squares, which has a chi-square distribution. So we get another Student's t-distribution, and base a prediction interval on that.

The above is a hasty sketch of how you derive the prediction interval, but I haven't given the bottom line. Lots of books will give you the bottom line without the derivation.
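For what it's worth, here is a hedged numerical sketch of that bottom line. The synthetic data and the unobserved row `x_new` are illustrative assumptions, not part of the answer:

```python
import numpy as np
from scipy import stats

# Synthetic fit, purely illustrative (all sizes and values are assumptions).
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.7, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p))     # estimate of sigma

x_new = np.array([1.0, 0.3, -1.2])               # hypothetical unobserved x-row
y_hat = x_new @ beta_hat                         # point prediction x * beta_hat
tau = np.sqrt(x_new @ XtX_inv @ x_new)           # sqrt of x (X'X)^{-1} x'
t = stats.t.ppf(0.975, n - p)                    # two-sided 95% t quantile, n - p d.f.

# Confidence interval for E[Y] at x_new, and prediction interval for a new Y there.
ci = (y_hat - t * sigma_hat * tau,
      y_hat + t * sigma_hat * tau)
pi = (y_hat - t * sigma_hat * np.sqrt(tau**2 + 1.0),
      y_hat + t * sigma_hat * np.sqrt(tau**2 + 1.0))
print(ci, pi)
```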

  • Once upon a time a highly respected "pure" mathematician told me that $X(X^T X)^{-1}X^T$ is obviously the identity matrix. I've seen a few other instances that I don't remember right now. I don't really think of myself as "pure" or "applied", but I wonder if there's a reason why Paul Halmos' book _Finite-Dimensional Vector Spaces_ doesn't include singular value decompositions. (2011-11-14)

This is brilliant! Filling in the gaps:

$\hat{Y}_1 \sim \mathcal{N}_1 \left(\sum\limits_{j=1}^p {X_{1j}\, \beta_j},\; \sigma^2H_{11}\right)$

For an unobserved row of covariates, let $\mathbf{x}=(x_1,x_2,\ldots,x_p)$.

$$\hat{Y}_\mathbf{x} \equiv \mathbf{x}\cdot\hat\beta \sim \mathcal{N}_1 \left(\mathbf{x}\cdot\beta,\; \sigma^2\mathbf{x}(X^TX)^{-1}\mathbf{x}^T\right)$$

Define
$$\tau^2 \equiv \mathbf{x}(X^TX)^{-1}\mathbf{x}^T, \qquad {\hat\sigma}^2 \equiv \|\hat\varepsilon\|^2/(n-p).$$

Then
$$\frac{(\hat{Y}_\mathbf{x}-\mathbb{E}[\hat{Y}_\mathbf{x}])/\sigma\tau}{\hat\sigma/\sigma} \sim t_{n-p},$$
so $\mathbb{E}[\hat{Y}_\mathbf{x}]$ lies in the confidence interval $\left(\hat{Y}_\mathbf{x} \pm \hat\sigma\tau \cdot t_{n-p,\,\alpha/2}\right)$.

For the prediction interval:
$$\hat{Y}_\mathbf{x}-Y_\mathbf{x}=\mathbf{x}(\hat\beta-\beta)-\varepsilon \sim \mathcal{N}(0,\sigma^2(\tau^2+1))$$
(by independence of $\varepsilon$), hence
$$\frac{(\hat{Y}_\mathbf{x}-Y_\mathbf{x})/\sigma\sqrt{\tau^2+1}}{\hat\sigma/\sigma} \sim t_{n-p}.$$
Hence the prediction interval is $\left(\hat{Y}_\mathbf{x} \pm \hat\sigma\sqrt{\tau^2+1} \cdot t_{n-p,\,\alpha/2}\right)$.
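As a sanity check, these closed-form intervals can be compared against a standard library. Assuming statsmodels is available, and using the same kind of synthetic data as in the sketches above (the data and `x_new` are illustrative assumptions), a cross-check might look like:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.7, size=n)
x_new = np.array([[1.0, 0.3, -1.2]])     # hypothetical unobserved x-row (2-d for statsmodels)

fit = sm.OLS(Y, X).fit()
pred = fit.get_prediction(x_new)
# mean_ci_* columns: confidence interval for E[Y | x];
# obs_ci_* columns: prediction interval for a new observation.
print(pred.summary_frame(alpha=0.05))
```

Here `mean_ci_*` corresponds to $\left(\hat{Y}_\mathbf{x} \pm \hat\sigma\tau \cdot t_{n-p,\,\alpha/2}\right)$ and `obs_ci_*` to $\left(\hat{Y}_\mathbf{x} \pm \hat\sigma\sqrt{\tau^2+1} \cdot t_{n-p,\,\alpha/2}\right)$.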