# DERIVING THE MLEs FOR LINEAR REGRESSION

Let’s now go ahead and derive the MLEs for some of the parameters in the simple linear regression model. As usual, we first take the log of the likelihood. We have

$$\log L = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - b_0 - b_1 X_i\right)^2$$

We next take the derivatives with respect to each of the parameters and set these derivatives to zero. For example,

$$\frac{\partial \log L}{\partial b_0} = 0 - \frac{1}{2\sigma^2}\sum_{i=1}^{n} 2\left(Y_i - b_0 - b_1 X_i\right)\left(-1\right) = 0$$

where the first term is zero because it does not depend on $b_0$, and the derivative of the squared term is just $-1$. Notice that since $\sigma^2$ doesn’t depend on $i$, we can take it out of the sum and multiply both sides of the equation by $\sigma^2$:

$$\sum_{i=1}^{n}\left(Y_i - b_0 - b_1 X_i\right) = 0$$

Finally, the sum of $b_0$ from $i = 1$ to $n$ is simply $n$ times $b_0$:

$$\sum_{i=1}^{n} Y_i - n b_0 - b_1 \sum_{i=1}^{n} X_i = 0$$

We can now solve for $b_0$ to get

$$b_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i - b_1 \frac{1}{n}\sum_{i=1}^{n} X_i$$

where I’ve done a bit of rearranging for clarity. Consider this equation in the context of the “null hypothesis” for linear regression, namely that $b_1 = 0$. This equation says that under the null hypothesis, $b_0$ is just the average of $Y$, which, as we have seen in Chapter 4, turns out to be the MLE for the $\mu$ parameter of the Gaussian distribution. This makes sense based on what I already said about the regression model.
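To see this numerically, here is a minimal sketch (using NumPy, with simulated data of my own for illustration): with $b_1$ fixed at 0, the intercept that minimizes the sum of squared residuals, and hence maximizes the likelihood, lands on the average of $Y$.

```python
import numpy as np

# Simulated observations of Y (illustrative data, not from the text)
rng = np.random.default_rng(0)
Y = rng.normal(loc=3.0, scale=1.5, size=1000)

# With b1 fixed at 0, the residual sum of squares is sum((Y - b0)^2);
# scan a grid of candidate intercepts and pick the minimizer
b0_grid = np.linspace(Y.mean() - 1.0, Y.mean() + 1.0, 2001)
rss = np.array([np.sum((Y - b0) ** 2) for b0 in b0_grid])
b0_hat = b0_grid[np.argmin(rss)]

# The minimizing intercept agrees with the sample mean of Y
print(b0_hat, Y.mean())
```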

Notice that, as often happens, the MLE for $b_0$ depends on $b_1$, so that to maximize the likelihood, we will have to simultaneously solve the equation

$$\frac{\partial \log L}{\partial b_1} = 0 - \frac{1}{2\sigma^2}\sum_{i=1}^{n} 2\left(Y_i - b_0 - b_1 X_i\right)\left(-X_i\right) = 0$$

where the first term is zero because it does not depend on $b_1$, and the derivative of the squared term is $-X_i$. Once again, we can take $\sigma^2$ out of the sum and multiply both sides of the equation by $\sigma^2$, leaving

$$\sum_{i=1}^{n} X_i\left(Y_i - b_0 - b_1 X_i\right) = 0$$

We can solve for $b_1$:

$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - b_0 \sum_{i=1}^{n} X_i}{\sum_{i=1}^{n} X_i^2}$$

Luckily, the equation for $b_1$ only depends on $b_0$, so that we have two equations and two unknowns, and we can solve for both $b_0$ and $b_1$. To do so, let’s plug the equation for $b_0$ into the equation for $b_1$.

After a bit of algebra, I have

$$b_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i - \left(\frac{1}{n}\sum_{i=1}^{n} Y_i - b_1 \frac{1}{n}\sum_{i=1}^{n} X_i\right)\sum_{i=1}^{n} X_i$$

which can be solved to give

$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\, m_X m_Y}{\sum_{i=1}^{n} X_i^2 - n\, m_X^2} = \frac{m_{XY} - m_X m_Y}{m_{XX} - m_X^2}$$

where, to avoid writing out all the sums, it’s customary to divide the top and bottom by $n$, and then put in $m_X$, $m_Y$, $m_{XY}$, and $m_{XX}$ to represent the various types of averages. This also works out to

$$b_1 = r\,\frac{s_Y}{s_X}$$

where

- $s_X$ and $s_Y$ are the standard deviations of each list of observations
- $r$ is the “Pearson’s correlation” $r(X, Y) = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - m_X\right)\left(Y_i - m_Y\right)\big/\left(s_X s_Y\right)$

This equation for $b_1$ is interesting because it shows that if there is no correlation between $X$ and $Y$, the slope ($b_1$) must be zero. It also shows that if the standard deviations of $X$ and $Y$ are the same ($s_Y/s_X = 1$), then the slope is simply equal to the correlation. We can plug this back in to the equation for $b_0$ to get the second MLE:

$$b_0 = m_Y - r\,\frac{s_Y}{s_X}\, m_X$$