Let $Y = X \beta + \varepsilon$ with $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $\operatorname{rank}(X) = p$, $\beta \in \mathbb{R}^p$, $\varepsilon \sim \mathcal{N}_n(0, \sigma^2 I_n)$.
What is the distribution of $\hat{\beta}$? $$\hat{\beta} = {(X^TX)}^{-1} X^T Y$$
My thoughts
Apart from $\varepsilon$, every quantity is deterministic, so $\hat{\beta}$ is an affine function of $\varepsilon$. Hence $\hat{\beta}$ is also multivariate normally distributed.
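Substituting the model into the estimator makes the affine map explicit. This is a short worked step using only the definitions above; the labels $A$ and $b$ are introduced here purely for readability:

\begin{align} \hat{\beta} &= {(X^TX)}^{-1} X^T (X \beta + \varepsilon)\\ &= \underbrace{\beta}_{b} + \underbrace{{(X^TX)}^{-1} X^T}_{A}\, \varepsilon \end{align}

Here $A \in \mathbb{R}^{p \times n}$ and $b \in \mathbb{R}^p$ are non-random, so $\hat{\beta} = A\varepsilon + b$ with $\varepsilon \sim \mathcal{N}_n(0, \sigma^2 I_n)$.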
\begin{align} \mathbb{E}(\hat{\beta}) &= \mathbb{E}({(X^TX)}^{-1} X^T Y)\\ &= {(X^TX)}^{-1} X^T \mathbb{E}(Y)\\ &= {(X^TX)}^{-1} X^T \mathbb{E}(X \beta + \varepsilon)\\ &= {(X^TX)}^{-1} X^T \left( X \beta + \mathbb{E}(\varepsilon) \right)\\ &= {(X^TX)}^{-1} X^T X \beta \end{align}
First sub-question: Is ${(X^TX)}^{-1} = X^{-1} (X^T)^{-1}$, even if $X$ is not square? (And which inverse would that be? I've just read that there are multiple candidates.)
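Since $X$ is $n \times p$ and generally not square, $X^{-1}$ does not exist as an ordinary inverse, so the factorization ${(X^TX)}^{-1} = X^{-1} (X^T)^{-1}$ is not available. The "multiple candidates" are generalized inverses; for a full-column-rank $X$, the matrix ${(X^TX)}^{-1} X^T$ is a left inverse of $X$ and coincides with the Moore-Penrose pseudo-inverse. A minimal NumPy sketch checking this on a small hypothetical design matrix (the entries of `X` are made up for illustration):

```python
import numpy as np

# Hypothetical design: n = 5 observations, p = 2 columns (intercept + one covariate),
# chosen so that rank(X) = p.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])

# A non-square matrix has no ordinary inverse: NumPy refuses to invert it.
try:
    np.linalg.inv(X)
    has_inverse = True
except np.linalg.LinAlgError:
    has_inverse = False

# With full column rank, (X^T X)^{-1} X^T is a left inverse of X
# and coincides with the Moore-Penrose pseudo-inverse.
left_inv = np.linalg.inv(X.T @ X) @ X.T
pinv_X = np.linalg.pinv(X)

print(has_inverse)                              # False
print(np.allclose(left_inv @ X, np.eye(2)))     # True: left inverse
print(np.allclose(left_inv, pinv_X))            # True: equals the pseudo-inverse
print(np.allclose(X @ left_inv, np.eye(5)))     # False: not a right inverse
```

`X @ left_inv` is the $n \times n$ hat matrix, the orthogonal projection onto the column space of $X$, which is why it is not the identity when $n > p$.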
Edit: I've just noticed that ${(X^TX)}^{-1} X^T X \beta = {(X^TX)}^{-1} (X^T X) \beta$, so I don't need this to continue:
\begin{align} \mathbb{E}(\hat{\beta}) &= {(X^TX)}^{-1} (X^T X) \beta\\ &= I_p \beta\\ &= \beta \end{align}
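A quick Monte Carlo check of $\mathbb{E}(\hat{\beta}) = \beta$, offered as a sketch: the design `X`, the true `beta`, `sigma`, the sample size and the seed are all hypothetical choices, and $\hat{\beta}$ is computed from the normal equations with `np.linalg.solve` instead of forming ${(X^TX)}^{-1}$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup for illustration: n = 50, p = 3.
n, p, sigma = 50, 3, 2.0
X = rng.normal(size=(n, p))          # fixed design, full column rank with probability 1
beta = np.array([1.0, -2.0, 0.5])

# Draw many independent error vectors; each row of Y is one sample of Y = X beta + eps.
reps = 20000
eps = rng.normal(scale=sigma, size=(reps, n))
Y = X @ beta + eps                    # shape (reps, n)

# beta_hat = (X^T X)^{-1} X^T Y for every replication, via the normal equations.
beta_hats = np.linalg.solve(X.T @ X, X.T @ Y.T).T   # shape (reps, p)

print(beta_hats.mean(axis=0))  # should be close to beta
```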
\begin{align} C(\hat{\beta}) &= C({(X^TX)}^{-1} X^T Y)\\ &={(X^TX)}^{-1} X^T C(Y) { \left ({(X^TX)}^{-1} X^T \right)}^T\\ &={(X^TX)}^{-1} X^T \sigma^2 I_n { \left ({(X^TX)}^{-1} X^T \right)}^T\\ &=\sigma^2 {(X^TX)}^{-1} X^T { \left ({(X^TX)}^{-1} X^T \right)}^T\\ &=\sigma^2 {(X^TX)}^{-1} X^T X {(X^T X)^{-1}}^T \\ &=\sigma^2 {(X^T X)^{-1}}^T \\ \end{align}
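A Monte Carlo sketch of the covariance: assuming a hypothetical fixed design `X`, true `beta` and `sigma` (all made up for illustration), it simulates many draws of $\hat{\beta}$ and compares their empirical covariance with $\sigma^2 {(X^TX)}^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup for illustration.
n, p, sigma = 50, 3, 2.0
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])

# Simulate many replications of Y and the corresponding beta_hat.
reps = 20000
Y = X @ beta + rng.normal(scale=sigma, size=(reps, n))
beta_hats = np.linalg.solve(X.T @ X, X.T @ Y.T).T   # shape (reps, p)

# Empirical covariance across replications (each row is one draw of beta_hat).
emp_cov = np.cov(beta_hats, rowvar=False)
# Theoretical covariance sigma^2 (X^T X)^{-1}.
theory_cov = sigma**2 * np.linalg.inv(X.T @ X)

print(np.round(emp_cov, 3))
print(np.round(theory_cov, 3))
```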
Sub-question 2: According to my notes, $\hat{\beta} \sim \mathcal{N}_p \left(\beta, \sigma^2 (X^T X)^{-1} \right)$. Is that the same as my result?
I have read that $X^T X$ is always symmetric, but is ${(X^T X)}^{-1}$ also symmetric?
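Symmetry of the inverse follows from $\left(A^{-1}\right)^T = \left(A^T\right)^{-1}$ for any invertible $A$; with $A = X^TX = A^T$ this gives ${\left((X^TX)^{-1}\right)}^T = (X^TX)^{-1}$, so $\sigma^2 {\left((X^T X)^{-1}\right)}^T = \sigma^2 (X^T X)^{-1}$. A small numerical sanity check (not a proof) on a hypothetical random design:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))     # hypothetical full-column-rank design
G = X.T @ X                      # Gram matrix X^T X, symmetric by construction
G_inv = np.linalg.inv(G)

print(np.allclose(G, G.T))           # True
print(np.allclose(G_inv, G_inv.T))   # True: the inverse is symmetric too
```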