
Can anybody help me derive the estimator for the equation:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2}+\cdots+\beta_4X_{i4}+\varepsilon_i$$

using the method of maximum likelihood, where the $\varepsilon_i$ are independent random variables, each with normal distribution $N(0,\sigma^2)$?

  • Do you know how to multiply matrices? This sort of thing is more efficiently expressed that way. (2012-12-13)
  • Yes, I know how to multiply matrices. (2012-12-13)

2 Answers


This is given by least squares estimation. To see this, write $$ L(\beta, \sigma^2 \mid Y) = \prod_{i=1}^n (2\pi\sigma^2)^{-1/2} \exp\left(\frac {-1} {2\sigma^2} \Big(Y_i - \beta_0 - \sum_{j=1}^4 \beta_j X_{ij}\Big)^2\right) = (2\pi\sigma^2)^{-n/2} \exp\left(\frac {-1} {2\sigma^2} \sum_{i=1}^n\Big(Y_i - \beta_0 - \sum_{j=1}^4 \beta_j X_{ij}\Big)^2\right). $$
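Taking logarithms makes the link to least squares explicit. As a sketch, with $n$ the number of observations, the log-likelihood is

$$ \ell(\beta, \sigma^2 \mid Y) = \log L(\beta, \sigma^2 \mid Y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n\Big(Y_i - \beta_0 - \sum_{j=1}^4 \beta_j X_{ij}\Big)^2. $$

For any fixed $\sigma^2$ the first term does not involve $\beta$, and the second is a negative multiple of the residual sum of squares, so maximizing $\ell$ over $\beta$ is the same as minimizing that sum.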

For any fixed $\sigma^2$, maximizing this over $\beta$ is equivalent to minimizing $\sum_i (Y_i - \beta_0 - \sum_j \beta_j X_{ij})^2$, which is the definition of least squares estimation. Let $\mathbf X$ be the $n \times 5$ matrix with columns indexed $j = 0, \dots, 4$, where $\langle \mathbf X\rangle_{i0} = 1$ (for the intercept) and $\langle \mathbf X\rangle_{ij} = X_{ij}$ for $j \ge 1$; let $\mathbf Y$ be the vector with $\langle\mathbf Y\rangle_{i} = Y_{i}$; and let $\beta = (\beta_0, \dots, \beta_4)^T$. Then we can rewrite the problem as the minimization of $Q(\beta) = \|\mathbf Y - \mathbf X\beta\|^2$ with respect to $\beta$, and there are several ways to see that $\hat \beta = (\mathbf X^T \mathbf X)^{-1} \mathbf X^T \mathbf Y$ is the minimizer. The most expedient is probably to compute the gradient $\partial Q/ \partial \beta$ and set it equal to $0$, which gives $$ -2 \mathbf X^T \mathbf Y + 2 (\mathbf X^T \mathbf X) \beta = 0, $$ and then to note that $Q$ is a strictly convex function of $\beta$ (its Hessian $2\mathbf X^T\mathbf X$ is positive definite), which ensures that the solution of this equation is the minimum of $Q$.
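A minimal NumPy sketch of the closed-form estimate on simulated data. The sample size, the "true" coefficients `beta_true`, the noise level and the seed are arbitrary choices for illustration only. It builds the design matrix with a leading column of ones for $\beta_0$, solves the normal equations $\mathbf X^T\mathbf X\beta = \mathbf X^T\mathbf Y$ with `np.linalg.solve` (rather than forming the inverse explicitly), and cross-checks against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
beta_true = np.array([1.0, 2.0, -0.5, 0.3, 4.0])  # made-up beta_0, ..., beta_4
sigma = 0.7

# Design matrix: a column of ones for the intercept, then X_{i1}, ..., X_{i4}
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
Y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Solve the normal equations X^T X beta = X^T Y without forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
```

Solving the linear system is generally preferred numerically over computing $(\mathbf X^T\mathbf X)^{-1}$ and multiplying, although both express the same estimator.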

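Note, however, that we are relying on $\mathbf X^T \mathbf X$ being invertible, i.e. on $\mathbf X$ having linearly independent columns. If this fails, $Q$ is convex but not strictly convex, so the minimizer is not unique; any generalized inverse $(\mathbf X^T\mathbf X)^-$ can then be used in place of $(\mathbf X^T \mathbf X)^{-1}$ to obtain one of the minimizers, and all of them give the same fitted values $\mathbf X\hat\beta$.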
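A sketch of the rank-deficient case, using made-up data in which one column of $\mathbf X$ is an exact copy of another, so $\mathbf X^T\mathbf X$ is singular. It uses the Moore-Penrose pseudoinverse `np.linalg.pinv` as one particular generalized inverse (the cutoff `rcond=1e-10` is an illustrative choice so that the numerically zero singular value is discarded), then builds a second minimizer by shifting weight between the duplicated columns and checks that the fitted values agree.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# The last column duplicates x2, so the columns of X are linearly dependent
X = np.column_stack([np.ones(n), x1, x2, x2])
Y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

XtX = X.T @ X
print(np.linalg.matrix_rank(X))  # rank 3 with 4 columns, so X^T X is singular

# One minimizer: the Moore-Penrose inverse used as the generalized inverse
beta_a = np.linalg.pinv(XtX, rcond=1e-10) @ X.T @ Y
# Another minimizer: move weight between the two identical columns
beta_b = beta_a + np.array([0.0, 0.0, 1.0, -1.0])

# Same fitted values, hence the same residual sum of squares
rss_a = np.sum((Y - X @ beta_a) ** 2)
rss_b = np.sum((Y - X @ beta_b) ** 2)
print(np.allclose(X @ beta_a, X @ beta_b), np.isclose(rss_a, rss_b))
```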

  • There's also the question of the MLE for $\sigma$. (2012-12-13)
  • @Michael it is clear that optimization with respect to $\beta$ can be done independently of $\sigma$. (2012-12-14)

You have $$ Y = X\beta+\varepsilon $$ where $X$ is an observable $n\times5$ matrix (a column of ones for the intercept $\beta_0$, followed by the four predictor columns), $\beta$ is $5\times1$ and not observable, $\varepsilon\sim N_n(0_{n\times 1} , \sigma^2 I_{n\times n})$, and $Y$ is $n\times 1$ and observable.

So $Y\sim N_n(X\beta,\sigma^2 I_{n\times n})$. The density function is $$ f(y) = \frac{\text{constant}}{\sigma^n} \exp\left( -\frac12\, \frac{(y-X\beta)^T (y-X\beta)}{\sigma^2} \right). $$ For any fixed $\sigma$, the value of $\beta$ that maximizes this is the one that minimizes the residual sum of squares $$ (y-X\beta)^T (y-X\beta). $$ If $\hat y$ is the orthogonal projection of $y$ onto the column space of $X$, then this is $$ \|(y-\hat y)+(\hat y - X\beta)\|^2 = \|y-\hat y\|^2 + 2\underbrace{(y-\hat y)^T (\hat y - X\beta)} + \|\hat y-X\beta\|^2. $$ The term over the $\underbrace{\text{underbrace}}$ is $0$ because $y-\hat y$ is orthogonal to the column space of $X$ while $\hat y - X\beta$ lies in it. The first term does not depend on $\beta$. Therefore, we seek the value of $\beta$ that minimizes the third term. Since $\hat y$ is in the column space of $X$, there is some $\hat\beta$ such that $\hat y = X\hat \beta$, and that value makes the third term zero, so that's the one we want.
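A quick numerical check of this decomposition, as a sketch with random data (the sizes, seed and the competing coefficient vector `beta` are arbitrary). It computes $\hat y$ via `np.linalg.lstsq`, confirms that the cross term vanishes, and that the residual sum of squares for an arbitrary $\beta$ equals $\|y-\hat y\|^2 + \|\hat y - X\beta\|^2$, so it can never be smaller than $\|y-\hat y\|^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Orthogonal projection of y onto the column space of X, via least squares
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

beta = rng.normal(size=p)  # an arbitrary competing coefficient vector
a = y - y_hat              # residual: orthogonal to the column space of X
b = y_hat - X @ beta       # lies in the column space of X

print(a @ b)               # cross term, zero up to rounding error
lhs = np.sum((y - X @ beta) ** 2)
rhs = np.sum(a ** 2) + np.sum(b ** 2)
print(np.isclose(lhs, rhs))
```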

Hence the MLE for $\beta$ is the vector of coefficients of the orthogonal projection of $y$ onto the column space of $X$, expressed as a linear combination of the columns of $X$.

That orthogonal projection is $Hy=X(X^T X)^{-1} X^T y$. To see this, suppose first that $u$ is in the column space of $X$. Then $u=X\alpha$ for some $\alpha$, so $Hu = HX\alpha = \Big(X(X^T X)^{-1} X^T\Big) X\alpha = X\alpha = u$. Now suppose $u$ is orthogonal to the column space of $X$. Then $Hu = X(X^T X)^{-1} X^T u$, and this is $0$ since $X^T u=0$. So $H$ leaves fixed each vector in the column space of $X$ and kills each vector orthogonal to that space; since every vector splits uniquely into a part in that space and a part orthogonal to it, $H$ is the orthogonal projection.
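The two properties of $H$ can be checked numerically. This is a sketch with a random full-column-rank $X$ (sizes and seed are arbitrary). It forms $H$ with `np.linalg.solve`, takes one vector in the column space and one orthogonal to it (built from the trailing columns of a complete QR factorization, which span the orthogonal complement), and also confirms that $H$ is symmetric and idempotent, as an orthogonal projection must be.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# H = X (X^T X)^{-1} X^T, computed with solve() instead of an explicit inverse
H = X @ np.linalg.solve(X.T @ X, X.T)

# A vector in the column space of X
u_in = X @ rng.normal(size=p)
# A vector orthogonal to the column space: a combination of the trailing
# columns of a complete QR factorization, which span the orthogonal complement
Q, _ = np.linalg.qr(X, mode="complete")
u_perp = Q[:, p:] @ rng.normal(size=n - p)

print(np.allclose(H @ u_in, u_in))                 # H leaves u_in fixed
print(np.allclose(X.T @ u_perp, 0))                # u_perp is orthogonal to col(X)
print(np.allclose(H @ u_perp, 0))                  # H kills u_perp
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # symmetric and idempotent
```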

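So what is $\hat\beta$ if $\hat y=X\hat\beta$? We'd like to multiply both sides on the left by the inverse of $X$, but $X$ is not a square matrix. However, as long as $X$ has linearly independent columns (so that $X^T X$ is invertible), $X$ does have a left-inverse, namely $(X^T X)^{-1}X^T$. We have $$ Hy = X\hat\beta. $$ Multiplying both sides on the left by $(X^T X)^{-1}X^T$ gives $$ \hat\beta = (X^T X)^{-1} X^T H y = (X^T X)^{-1}X^T y, $$ where the last step uses $X^T H = X^T X (X^T X)^{-1} X^T = X^T$.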
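A small numerical sketch of the left-inverse argument, with a random full-column-rank $X$ and response `y` (arbitrary sizes and seed). It checks that $(X^TX)^{-1}X^T$ is a left-inverse but not a right-inverse of $X$, and that applying it to $Hy$ gives the same $\hat\beta$ as applying it to $y$ directly.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Left inverse of X, assuming full column rank: (X^T X)^{-1} X^T
left_inv = np.linalg.solve(X.T @ X, X.T)
print(np.allclose(left_inv @ X, np.eye(p)))  # left_inv X is the p x p identity
print(np.allclose(X @ left_inv, np.eye(n)))  # but X left_inv is not the n x n identity

H = X @ left_inv                       # hat matrix X (X^T X)^{-1} X^T
beta_from_Hy = left_inv @ (H @ y)      # (X^T X)^{-1} X^T H y
beta_direct = left_inv @ y             # (X^T X)^{-1} X^T y
print(np.allclose(beta_from_Hy, beta_direct))
```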

That's the MLE for $\beta$.

Since that doesn't depend on $\sigma$, we can plug that into the density in place of $\beta$ and then find the MLE for $\sigma$.
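Carrying out that last step, as a sketch under the same model: write $\mathrm{RSS} = (y-X\hat\beta)^T(y-X\hat\beta)$, substitute $\hat\beta$ into the log of the density above, and maximize over $\sigma^2$ (equivalent to maximizing over $\sigma > 0$):

$$ \log f = \log(\text{constant}) - \frac{n}{2}\log\sigma^2 - \frac{\mathrm{RSS}}{2\sigma^2}, \qquad \frac{\partial \log f}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\mathrm{RSS}}{2\sigma^4} = 0 \;\Longrightarrow\; \hat\sigma^2 = \frac{\mathrm{RSS}}{n} = \frac{1}{n}\|y - X\hat\beta\|^2. $$

Assuming $\mathrm{RSS} > 0$, this critical point is the maximum, because the log-density tends to $-\infty$ both as $\sigma^2 \to 0$ and as $\sigma^2 \to \infty$.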