
As you may know, the Bayesian Information Criterion (BIC) can be used for model selection in linear regression: the model with the minimum BIC is selected as the best model for the regression. The BIC formula is given by (https://en.wikipedia.org/wiki/Bayesian_information_criterion):

$$BIC(M)=k\log(n)-2\log(\bar{L})$$

or for linear regression:

$$BIC(M)=k\log(n)+n\log(RSS/n)$$

where $\bar{L}$ is the maximized value of the likelihood function of the model, i.e. $\bar{L}=p(x|M,\hat{\theta})$ with $\hat{\theta}$ the maximum-likelihood estimate, $k$ is the number of parameters estimated by the model (for linear regression, the number of regression coefficients), and $n$ is the number of data points.
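For what it's worth, the two formulas above agree up to an additive constant that is the same for every model, so they rank models identically. A minimal numpy sketch (the data, coefficients, and model size below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and OLS fit: n data points, k fitted coefficients.
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)

# Full-likelihood BIC: k*log(n) - 2*log(L_hat), with Gaussian errors
# and the MLE sigma^2 = RSS/n plugged in.
sigma2 = rss / n
loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2) + 1)
bic_full = k * np.log(n) - 2 * loglik

# RSS-based BIC, with model-independent constants dropped.
bic_rss = k * np.log(n) + n * np.log(rss / n)

# The difference is exactly the constant n*(1 + log(2*pi)),
# so both formulas select the same model.
print(bic_full - bic_rss)
```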

I am looking for the derivation of it. I googled but could not find a document explaining the derivation of BIC for linear regression. I tried to derive the formula myself but I get confused about the model: what is my model, what am I trying to maximize, what is $\theta$?

Can you please provide any information regarding the derivation of BIC for linear regression? Thanks.

1 Answer


In case somebody is looking for the derivation of the BIC formula for linear regression, here it is.

Assuming that $Y$ depends linearly on the predictors $X_i$, the relationship can be formulated as:

$$Y=\beta_0+\beta_1X_1+\beta_2X_2+\dots+\beta_pX_p+\epsilon=f(X)+\epsilon$$

where $\epsilon$ is a normal variable with zero mean and variance $\sigma^2$. We are trying to estimate the $\beta$ coefficients, and there may be multiple candidate regression models. In that case, BIC can be used for model selection.

From the regression equation, $\epsilon=Y-f(X)$; since the errors are assumed to be Gaussian and i.i.d. with zero mean and variance $\sigma^2$, the likelihood can be written as:

$$ L=\prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(Y_i-f(X_i))^2}{2\sigma^2}\right) $$

Carrying out the product and dropping the constant factor $(2\pi)^{-n/2}$, which does not depend on the model, we obtain:

$$ L=\frac{1}{\sigma^n} \exp\left(-\frac{\sum_i (Y_i-f(X_i))^2}{2\sigma^2}\right)=\frac{1}{\sigma^n} \exp\left(\frac{-RSS}{2\sigma^2}\right) $$

Taking the derivative of $L$ with respect to $\sigma$ and setting it to zero gives $\sigma^2=\frac{RSS}{n}$. Substituting this value into $L$ to obtain its maximum value, i.e. $\bar{L}$, we get

$$ \bar{L}=L|_{\sigma^2=\frac{RSS}{n}}=\left(\frac{RSS}{n}\right)^{-n/2}\exp(-n/2) $$
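The maximizing value $\sigma^2=RSS/n$ is also easy to check numerically. A small sketch with hypothetical values of $n$ and $RSS$, minimizing $-2\log L$ (equivalently, maximizing $L$) over a grid of $\sigma^2$ values:

```python
import numpy as np

# Hypothetical sample size and residual sum of squares.
n, rss = 50, 12.5

def neg2_loglik(sigma2):
    # -2 * log-likelihood of the Gaussian model, up to the 2*pi constant:
    # n*log(sigma^2) + RSS/sigma^2
    return n * np.log(sigma2) + rss / sigma2

# Grid search over sigma^2; the minimum should land at RSS/n = 0.25.
grid = np.linspace(0.05, 2.0, 20001)
best = grid[np.argmin(neg2_loglik(grid))]
print(best, rss / n)
```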

and the log of $\bar{L}$ is

$$ \log(\bar{L})=-\frac{n}{2}\log(RSS/n)-\frac{n}{2} $$

and $-2\log(\bar{L})$ is

$$ -2\log(\bar{L})=n\log(RSS/n)+n $$

which is the second part of the BIC formula for regression. The constant term $n$ is dropped because it is the same for every model and therefore does not affect the comparison.

The first part of BIC for linear regression directly comes from the BIC definition.
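Putting the pieces together, here is a hedged end-to-end sketch of BIC-based model selection (all data and predictors below are invented for illustration): fit each candidate model by least squares, compute $k\log(n)+n\log(RSS/n)$, and keep the model with the smaller value.

```python
import numpy as np

def bic(y, X):
    # BIC(M) = k*log(n) + n*log(RSS/n) for a linear model fit by least squares,
    # with k taken as the number of fitted coefficients.
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return k * np.log(n) + n * np.log(rss / n)

# Deterministic toy data: the true relationship uses only x1;
# x2 is an irrelevant extra predictor, and the sine term stands in for noise.
n = 200
t = np.arange(n)
x1 = np.linspace(-1.0, 1.0, n)
x2 = np.cos(t)
y = 2.0 + 1.5 * x1 + 0.5 * np.sin(t)

X_small = np.column_stack([np.ones(n), x1])       # intercept + x1
X_big = np.column_stack([np.ones(n), x1, x2])     # adds the irrelevant x2

print(bic(y, X_small), bic(y, X_big))
```

Here the extra predictor barely reduces the RSS, so the $k\log(n)$ penalty dominates and the smaller model gets the lower BIC.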