As you may know, the Bayesian Information Criterion (BIC) can be used for model selection in linear regression: the model with the minimum BIC is selected as the best model. The BIC formula is given by (https://en.wikipedia.org/wiki/Bayesian_information_criterion):
$$BIC(M)=k\log(n)-2\log(\bar{L})$$
or for linear regression:
$$BIC(M)=k\log(n)+n\log(RSS/n)$$
where $\bar{L}$ is the maximized value of the likelihood function of the model, i.e. $\bar{L}=p(x\mid\hat{\theta},M)$ with $\hat{\theta}$ the maximum-likelihood estimate of the parameters, $k$ is the number of parameters in the regression (the coefficients of the independent variables), and $n$ is the number of data points.
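To make the question concrete, here is a small numerical check I put together (my own sketch; the data and variable names are made up) confirming that the two formulas agree up to an additive constant $n(1+\log 2\pi)$, which is the same for every model on the same data and therefore does not affect model selection:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# OLS fit with an intercept
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)

k = X.shape[1]  # number of regression coefficients (intercept + slope)

# Gaussian log-likelihood maximized over beta and sigma^2,
# using the MLE sigma_hat^2 = RSS/n
loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)

bic_full = k * np.log(n) - 2 * loglik        # first formula
bic_short = k * np.log(n) + n * np.log(rss / n)  # second formula

# The difference is the model-independent constant n * (1 + log 2*pi)
print(bic_full - bic_short, n * (1 + np.log(2 * np.pi)))
```

So the second formula seems to come from plugging the maximized Gaussian likelihood into the first and dropping terms that do not depend on the model, but I would like to see the full derivation.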
I am looking for a derivation of this formula. I googled but could not find a document that derives BIC for linear regression, and when I try to derive it myself I get confused about the setup: what exactly is the model $M$, what likelihood am I trying to maximize, and what is $\theta$?
Can you please point me to (or provide) a derivation of BIC for linear regression? Thanks.