You have $ Y = X\beta+\varepsilon $ where $X$ is an observable $n\times4$ matrix, $\beta$ is $4\times1$ and not observable, $\varepsilon\sim N_n(0_{n\times 1} , \sigma^2 I_{n\times n})$, and $Y$ is $n\times 1$ and observable.
So $Y\sim N_n(X\beta,\sigma^2 I_{n\times n})$. The density function is $ f(y) = \frac{\text{constant}}{\sigma^n} \exp\left( -\frac{1}{2}\cdot \frac{(y-X\beta)^T (y-X\beta)}{\sigma^2} \right). $ The value of $\beta$ that maximizes this is the one that minimizes the residual sum of squares $ (y-X\beta)^T (y-X\beta). $ If $\hat y$ is the orthogonal projection of $y$ onto the column space of $X$, then this is $ \|(y-\hat y)+(\hat y - X\beta)\|^2 = \|y-\hat y\|^2 + \underbrace{2(y-\hat y)^T (\hat y - X\beta)} + \|\hat y-X\beta\|^2. $ The term over the $\underbrace{\text{underbrace}}$ is $0$ because the two vectors are orthogonal to each other. The first term does not depend on $\beta$. Therefore, we seek the value of $\beta$ that minimizes the third term. Since $\hat y$ is in the column space of $X$, there is some $\hat\beta$ such that $\hat y = X\hat\beta$, and that value makes the third term zero, so that's the one we want.
Hence the MLE for $\beta$ is the vector of coefficients of the orthogonal projection of $y$ onto the column space of $X$, expressed as a linear combination of the columns of $X$.
That orthogonal projection is $Hy=X(X^T X)^{-1} X^T y$. To see this, suppose first that $u$ is in the column space of $X$. Then $u=X\alpha$ for some $\alpha$. So $Hu = HX\alpha$ $=\Big(X(X^T X)^{-1} X^T\Big) X\alpha$ $=X\alpha = u$. Now suppose $u$ is orthogonal to the column space of $X$. Then $Hu = X(X^T X)^{-1} X^T u$, and this is $0$ since $X^T u=0$ (each column of $X$ is orthogonal to $u$). So $H$ leaves fixed each vector in the column space of $X$, and kills each vector orthogonal to that space.
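These two properties of $H$ are easy to check numerically. A minimal sketch with NumPy (the random matrix $X$ and the seed are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
X = rng.standard_normal((n, 4))          # observable n x 4 design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix H = X (X^T X)^{-1} X^T

# H leaves fixed any vector in the column space of X...
u = X @ rng.standard_normal(4)
assert np.allclose(H @ u, u)

# ...and kills any vector orthogonal to that space
# (the residual y - Hy is one such vector).
y = rng.standard_normal(n)
assert np.allclose(H @ (y - H @ y), 0)

# As an orthogonal projection, H is also idempotent and symmetric.
assert np.allclose(H @ H, H)
assert np.allclose(H, H.T)
```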
So what is $\hat\beta$ if $\hat y=X\hat\beta$? We'd like to multiply both sides on the left by the inverse of $X$, but $X$ is not a square matrix. However, $X$ does have a left-inverse, namely $(X^T X)^{-1}X^T$ (which exists as long as $X$ has full column rank). We have $ Hy = X\hat\beta. $ So $ \hat\beta = (X^T X)^{-1} X^T H y = (X^T X)^{-1}X^T y, $ where the last step uses $X^T H = X^T$.
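As a sanity check, the closed-form $\hat\beta = (X^T X)^{-1}X^T y$ agrees with NumPy's least-squares solver. The particular $X$, true $\beta$, and noise level below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.standard_normal((n, 4))
beta = np.array([2.0, -1.0, 0.5, 3.0])       # "true" unobservable beta
y = X @ beta + 0.1 * rng.standard_normal(n)  # Y = X beta + noise

# MLE / least-squares estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)

# NumPy's least-squares routine minimizes the same residual sum of squares.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

(In practice one would solve the normal equations with `np.linalg.solve` or use `lstsq` directly rather than forming the explicit inverse, but the formula above mirrors the derivation.)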
That's the MLE for $\beta$.
Since that doesn't depend on $\sigma$, we can plug that into the density in place of $\beta$ and then find the MLE for $\sigma$.
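Carrying out that last step (a sketch of the standard computation): writing $\mathrm{RSS} = \|y - X\hat\beta\|^2$, the log-likelihood as a function of $\sigma$ alone is $ \ell(\sigma) = \text{constant} - n\log\sigma - \frac{\mathrm{RSS}}{2\sigma^2}, $ and setting $ \frac{d\ell}{d\sigma} = -\frac{n}{\sigma} + \frac{\mathrm{RSS}}{\sigma^3} = 0 $ gives $ \hat\sigma^2 = \frac{\mathrm{RSS}}{n} = \frac{1}{n}\|y - X\hat\beta\|^2. $ (Note this divides by $n$, not by the degrees of freedom $n-4$, so the MLE is a biased estimator of $\sigma^2$.)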