In the linear regression model $Y = X\beta + e$, where $Y, e$ are $n \times 1$ random vectors, $X$ is an $n \times k$ random matrix and $\beta$ a $k \times 1$ vector of parameters, it is sometimes said that under the assumption $e \mid X \sim N(0, \sigma^2 I_n)$ one has $\hat{\beta} - \beta \mid X \sim N\big(0, \sigma^2(X^TX)^{-1}\big)$, where $\hat{\beta} := (X^TX)^{-1}X^TY$ is the OLS estimate for $\beta$.
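(I have no issue with the claim itself; for instance, a quick simulation with one fixed realization of $X$ reproduces the stated conditional covariance. The sketch below is mine, with made-up values of $n$, $k$ and $\sigma$:)

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma = 50, 3, 2.0            # made-up dimensions and error scale
x = rng.normal(size=(n, k))         # one fixed realization of the design matrix X
reps = 20_000

# Draw many samples of e | X = x ~ N(0, sigma^2 I_n); each row is one draw of e.
errs = sigma * rng.normal(size=(reps, n))

# beta_hat - beta = (X'X)^{-1} X' e; computed row-wise via transposes,
# using the symmetry of (X'X)^{-1}.
xtx_inv = np.linalg.inv(x.T @ x)
devs = errs @ x @ xtx_inv           # shape (reps, k)

# Empirical covariance of beta_hat - beta given X = x vs. sigma^2 (x'x)^{-1}.
print(np.cov(devs, rowvar=False))
print(sigma**2 * xtx_inv)
```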
Usually this is justified by applying the proposition "if $Z\sim N(0, \Sigma)$ and $W:= AZ$ with $A\Sigma A^T > 0$, then $W \sim N(0, A\Sigma A^T)$", since a standard result in OLS regression (under suitable assumptions) is:
$\hat{\beta} - \beta = (X^TX)^{-1}X^{T}e$
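(For completeness, this identity follows by substituting $Y = X\beta + e$ into the definition of $\hat{\beta}$, assuming $X^TX$ is invertible:
$\hat{\beta} = (X^TX)^{-1}X^T(X\beta + e) = \beta + (X^TX)^{-1}X^Te$.)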
I am not quite satisfied with the above justification. When I see $e\mid X \sim N(0, \sigma^2I_n)$, I interpret it just as shorthand notation for:
$f_{e \mid X}(u \mid x) = \frac{1}{(2\pi)^{n/2} \sigma^{n}}\exp\Big(-\frac{u^Tu}{2\sigma^2}\Big)$
I guess my problem is that I am not used to treating $e\mid X$ as a random variable.
What I had in mind is something along these lines: begin with the conditional distribution $F_{\hat{\beta} - \beta \mid X}(u \mid x)$, arrive (somehow) at a known distribution, and take derivatives. When I try to do it I get:
$F_{\hat{\beta} - \beta \mid X}(u \mid x) = \Pr(\hat{\beta} - \beta \leq u \mid X = x) = \Pr\big((X^TX)^{-1}X^Te \leq u \mid X = x\big)$
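Presumably the next step is to replace the random $X$ by its fixed value $x$ inside the conditional probability (reading $\leq$ componentwise):
$\Pr\big((X^TX)^{-1}X^Te \leq u \mid X = x\big) = \Pr\big((x^Tx)^{-1}x^Te \leq u \mid X = x\big),$
though I cannot justify this substitution rigorously either,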
but I don't see how to proceed from there. So, my questions are:
1. In what sense is $e \mid X$ a random variable? [Yes, $f_{e\mid X}(u \mid x)$ is a density function, but I am used to thinking of random variables as functions from one measurable space to another.]
2. Is there a way to make the above attempt at a proof work?
Thanks