
In the linear regression model $Y = X\beta + e$, where $Y, e$ are $n \times 1$ random vectors, $X$ is an $n \times k$ random matrix and $\beta$ a $k \times 1$ vector of parameters, it is sometimes said that under the assumption $e \mid X \sim N(0, \sigma^2 I_n)$ one has $\hat{\beta} - \beta \mid X \sim N\big(0, \sigma^2(X^TX)^{-1}\big)$, where $\hat{\beta} := (X^TX)^{-1}X^TY$ is the OLS estimate for $\beta$.

Usually this is justified by applying the proposition "if $Z\sim N(0, \Sigma)$ and $W:= AZ$ with $A\Sigma A^T > 0$, then $W \sim N(0, A\Sigma A^T)$", together with the standard OLS identity (under suitable assumptions):

$\hat{\beta} - \beta = (X^TX)^{-1}X^{T}e$
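As a quick numerical sanity check of this identity (a minimal numpy sketch; the dimensions, seed, and parameter values are arbitrary choices of mine, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
beta = np.array([1.0, -2.0, 0.5])

X = rng.normal(size=(n, k))
e = rng.normal(scale=1.5, size=n)
Y = X @ beta + e

# OLS estimate beta_hat = (X'X)^{-1} X'Y, computed via a linear solve
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Right-hand side of the identity: (X'X)^{-1} X'e
rhs = np.linalg.solve(X.T @ X, X.T @ e)
print(np.allclose(beta_hat - beta, rhs))  # True
```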

I am not quite satisfied with the above justification. When I see $e\mid X \sim N(0, \sigma^2I_n)$ I interpret it just as shorthand notation for:

$f_{e \mid X}(u \mid x) = \frac{1}{(2 \pi)^{n/2} \sigma^{n}}\exp\Big(-\frac{u^Tu}{2\sigma^2}\Big)$
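To see concretely what this shorthand says, the expression is just the product of $n$ independent $N(0,\sigma^2)$ marginal densities, i.e. $(2\pi)^{-n/2}\sigma^{-n}\exp(-u^Tu/(2\sigma^2))$. A minimal numpy check (the values of $u$, $\sigma$, $n$ are arbitrary):

```python
import numpy as np

sigma, n = 1.3, 4
u = np.array([0.2, -1.0, 0.7, 0.1])

# Product of n independent univariate N(0, sigma^2) densities
per_coord = np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
joint = per_coord.prod()

# Closed-form multivariate expression (2*pi)^{-n/2} * sigma^{-n} * exp(-u'u / (2 sigma^2))
closed = (2 * np.pi) ** (-n / 2) * sigma ** (-n) * np.exp(-(u @ u) / (2 * sigma**2))
print(np.allclose(joint, closed))  # True
```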

I guess my problem is that I am not used to treating $e\mid X$ as a random variable.

What I had in mind is something along these lines: begin with the conditional distribution $F_{\hat{\beta} - \beta \mid X}(u \mid x)$, arrive (somehow) at a known distribution, and take derivatives. When I try to do it I get:

$F_{\hat{\beta} - \beta \mid X}(u \mid x) = \Pr(\hat{\beta} - \beta \leq u \mid X = x) = \Pr\big((X^TX)^{-1}X^Te \leq u \mid X = x\big)$

but I don't see how to proceed from there. So, my questions are:

  1. In what sense is $e \mid X$ a random variable? [Yes, $f_{e\mid X}(u \mid x)$ is a density function, but I am used to thinking of random variables as functions from one measurable space to another.]

  2. Is there a way to make the above attempt at a proof work?

Thanks

1 Answer

  1. Take a r.v. $Y$ with $E|Y|^2 < \infty$ and another random variable $X$ (with $E|X|^2 < \infty$) on the same probability space. Then $Y$ can be orthogonally decomposed as $$ Y = E[Y\mid X]+ e, $$ and from the decomposition it follows that $$ E[e\mid X]=0. $$ However, any further parametric specification rests on one's assumptions; in particular, one can assume that $e\mid X\sim N(0, \sigma^2 I_n)$. This is the stochastic model that generates your observed data points $\{(y_i, x_i)\}$, so you essentially face an inverse problem whose goal is to reconstruct $E[Y\mid X]$. If $(Y,X)$ is jointly normal, the conditional mean is exactly linear and OLS estimates it consistently; otherwise OLS only recovers the best linear approximation.

  2. Every entry of your vector inequality is an integral of a normal density over a half-space. Generally there is no closed form for it, but you can still differentiate with respect to $u$ and recover the multivariate normal density.
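The conditional statement $\hat{\beta} - \beta \mid X \sim N\big(0, \sigma^2(X^TX)^{-1}\big)$ can also be made concrete by simulation: hold a single realisation of $X$ fixed (that is what conditioning on $X = x$ means operationally) and redraw only the errors. A minimal numpy sketch (dimensions, seed, $\sigma$, and the number of replications are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 30, 2, 2.0

# Condition on X by holding one draw of the design matrix fixed
X = rng.normal(size=(n, k))
target_cov = sigma**2 * np.linalg.inv(X.T @ X)

# Redraw only the errors e ~ N(0, sigma^2 I_n); each row of `draws`
# is one realisation of beta_hat - beta = (X'X)^{-1} X'e for this fixed X
reps = 20000
E = rng.normal(scale=sigma, size=(reps, n))
draws = E @ X @ np.linalg.inv(X.T @ X)

# Empirical covariance across replications should match sigma^2 (X'X)^{-1}
emp_cov = np.cov(draws, rowvar=False)
print(np.abs(emp_cov - target_cov).max())
```

The maximal entrywise deviation shrinks toward zero as `reps` grows, which is exactly the conditional covariance claim.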