# Estimation of population parameters

We have specified an economic model, and the corresponding population regression equation. It is now time to estimate the value of the population parameters. For that purpose we need a **sample regression equation**, expressed as:

$$Y_i = b_0 + b_1 X_{1i} + e_i \qquad (3.3)$$

The important difference between the population regression equation and the sample regression equation concerns the parameters and the error term. In the population regression equation the parameters are fixed constants. They do not change. In the sample regression equation the parameters are random variables with a distribution. Their mean values represent estimates of the population parameters, and their standard errors are used when performing statistical tests. The error term is also an estimate and corresponds to the population error term.

Sometimes it is convenient to have a separate name for the estimated error term in order to make the distinction. In this text we will call the estimated error term the **residual term.**

## The method of ordinary least squares

There exist many methods to estimate the parameters of the population regression equation. The most common ones are the method of maximum likelihood, the method of moments, and the method of Ordinary Least Squares (OLS). The last method is by far the most popular in the literature and is therefore the basis for this text.

**Figure 3.1 Fitted regression line using OLS**

The OLS relies on the idea of selecting a line that represents an average relationship of the observed data, similarly to the way the economic model is expressed. In Figure 3.1 we have a random sample of 10 observations. The OLS regression line is placed in such a way that the sum of the squared distances between the dots and the regression line becomes as small as possible. In order to be more general we assume a sample size of $n$ observations. The objective is to minimize the Residual Sum of Squares (RSS), formed using equation (3.3) and expressed in (3.4), with respect to $b_0$ and $b_1$:

$$RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(Y_i - b_0 - b_1 X_{1i}\right)^2 \qquad (3.4)$$

Hence, this is a standard optimization problem with two unknown variables, solved by taking the partial derivatives with respect to $b_0$ and $b_1$, setting them equal to zero, and then solving the resulting linear equation system with respect to those two variables. We have:

$$\frac{\partial RSS}{\partial b_0} = -2\sum_{i=1}^{n}\left(Y_i - b_0 - b_1 X_{1i}\right) = 0$$

$$\frac{\partial RSS}{\partial b_1} = -2\sum_{i=1}^{n}X_{1i}\left(Y_i - b_0 - b_1 X_{1i}\right) = 0$$

By rearranging these two equations we obtain the equation system in normal form:

$$\sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_{1i}$$

$$\sum_{i=1}^{n} X_{1i} Y_i = b_0 \sum_{i=1}^{n} X_{1i} + b_1 \sum_{i=1}^{n} X_{1i}^2$$

Solving for $b_0$ and $b_1$ gives us:

$$b_1 = \frac{\sum_{i=1}^{n}\left(X_{1i}-\bar{X}_1\right)\left(Y_i-\bar{Y}\right)}{\sum_{i=1}^{n}\left(X_{1i}-\bar{X}_1\right)^2}$$

$$b_0 = \bar{Y} - b_1\bar{X}_1$$

The slope coefficient $b_1$ is simply a covariance standardized by the variation in $X_1$. The interpretation of this ratio is simple: when $X_1$ increases by 1 unit, $Y$ will change by $b_1$ units. Remember that $b_0$ and $b_1$ are random variables, and hence it is important to know what their expected values and variances look like. Below we will derive the expected value and variance for both the intercept and the slope coefficient.
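As an illustration, the closed-form solution described above can be computed directly. The following is a minimal sketch in Python; the sample data are made up for the example, not taken from the text:

```python
# Hypothetical sample of five observations (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: the sample covariance of X and Y standardized by the variation in X.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
# Intercept: forces the fitted line through the point of means.
b0 = y_bar - b1 * x_bar

print(b0, b1)  # roughly 0.14 and 1.96
```

For this made-up sample, a one-unit increase in $X_1$ is associated with a change of about 1.96 units in $Y$.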

The variance of the intercept is slightly more involved, but since textbooks in general avoid showing how it could be done we will do it here, even though the slope coefficient is the estimate of primary interest.

In order to find the expected value and the variance it is convenient to rewrite the expression for the estimators in such a way that they appear to be functions of the sample values of the dependent variable $Y$. Since the intercept is expressed as a function of the slope coefficient we will start with the slope estimator:

$$b_1 = \frac{\sum_{i=1}^{n}\left(X_{1i}-\bar{X}_1\right)Y_i}{\sum_{i=1}^{n}\left(X_{1i}-\bar{X}_1\right)^2} = \sum_{i=1}^{n} W_i Y_i, \qquad W_i = \frac{X_{1i}-\bar{X}_1}{\sum_{j=1}^{n}\left(X_{1j}-\bar{X}_1\right)^2}$$

For the intercept we do the following:

$$b_0 = \bar{Y} - b_1\bar{X}_1 = \frac{1}{n}\sum_{i=1}^{n} Y_i - \bar{X}_1\sum_{i=1}^{n} W_i Y_i$$

Hence

$$b_0 = \sum_{i=1}^{n}\left(\frac{1}{n} - \bar{X}_1 W_i\right)Y_i$$

Hence, the OLS estimators are weighted averages of the dependent variable, keeping in mind that $W_i$ is to be treated as a constant. Having the OLS estimators in this form we can easily find the expected value and variance:
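The weighted-average form can be verified numerically. A small sketch with made-up data, showing that $\sum W_i Y_i$ reproduces the covariance-ratio formula, and that the weights satisfy $\sum W_i = 0$ and $\sum W_i X_{1i} = 1$:

```python
# Hypothetical sample (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# The weights Wi = (Xi - X_bar) / sum((Xj - X_bar)^2), treated as constants.
w = [(xi - x_bar) / sxx for xi in x]

# Slope as a weighted average of the dependent variable.
b1_weighted = sum(wi * yi for wi, yi in zip(w, y))

# Slope from the covariance-ratio formula; the two must agree.
b1_direct = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

# Properties of the weights that are used in the expectation derivation.
w_sum = sum(w)                                  # equals 0
wx_sum = sum(wi * xi for wi, xi in zip(w, x))   # equals 1
```

These two weight properties are exactly what make the expected value of the slope estimator collapse to the population parameter in the derivation that follows.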

**The expected value of the OLS estimators**

Substituting the population regression equation $Y_i = B_0 + B_1 X_{1i} + U_i$ into the weighted averages, and using the facts that $\sum W_i = 0$ and $\sum W_i X_{1i} = 1$, we receive:

$$E[b_1] = E\left[\sum_{i=1}^{n} W_i Y_i\right] = E\left[B_1 + \sum_{i=1}^{n} W_i U_i\right] = B_1$$

$$E[b_0] = E\left[B_0 + \sum_{i=1}^{n}\left(\frac{1}{n} - \bar{X}_1 W_i\right)U_i\right] = B_0$$

Hence, the mean values of the sample estimators equal the population parameters. You should confirm these steps yourself. The result in the second step comes from the regression assumptions. Also remember that a population parameter is a constant, and that the expected value of a constant is the constant itself. The derivation of the variance will start with the expression established in the second step above.

**The variance of the OLS estimators**

When deriving the variance for the intercept, we utilize the definition of the variance that is expressed in terms of expectations. We have the expected value of the squared difference

$$V(b_0) = E\left[\left(b_0 - B_0\right)^2\right]$$

and thereafter substitute the weighted-average expression for $b_0$ established above. Square the expression and take the expectation and we end up with

$$V(b_0) = E\left[\left(\sum_{i=1}^{n}\left(\frac{1}{n} - \bar{X}_1 W_i\right)U_i\right)^2\right]$$

Try to work out the expressions and remember that $E\left[U_i^2\right] = \sigma^2$ and that $E\left[U_i U_j\right] = 0$ for $i \neq j$, and therefore

$$V(b_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{X}_1^2}{\sum\left(X_{1i}-\bar{X}_1\right)^2}\right) = \frac{\sigma^2 \sum X_{1i}^2}{n\sum\left(X_{1i}-\bar{X}_1\right)^2} \qquad (3.9)$$

$$V(b_1) = \sigma^2 \sum W_i^2 = \frac{\sigma^2}{\sum\left(X_{1i}-\bar{X}_1\right)^2} \qquad (3.10)$$

The covariance between the two OLS estimators can be derived using the covariance operator together with expressions (3.9) and (3.10). Try it out. The covariance is given by the following expression:

$$Cov(b_0, b_1) = -\bar{X}_1\frac{\sigma^2}{\sum\left(X_{1i}-\bar{X}_1\right)^2}$$

In order to understand all the steps made above you have to make sure you remember how the variance operator works. Go back to chapter 1 and repeat if necessary. Also remember that the variance of the population error term is constant and the same over observations. If that assumption is violated we will end up with something else.

Observe that the variance of the OLS estimators is a function of the variance of the error term of the model. The larger the variance of the error term, the larger the variance of the OLS estimators. This is true for the variance of the intercept, the variance of the slope coefficient, and the covariance between slope and intercept. Remember that the variance of the error term and the variance of the dependent variable coincide. Also note that the larger the variation in $X_1$ is, the smaller the variance of the slope coefficient becomes. Think about that. Increased variation in $Y$ has of course the opposite effect, since the variance in $Y$ is the same as the variance of the error term.
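The variance formula for the slope, $V(b_1) = \sigma^2 / \sum(X_{1i}-\bar{X}_1)^2$, can be checked with a small Monte Carlo sketch. Everything below (parameter values, sample design, number of replications) is a made-up illustration, not taken from the text:

```python
import random

random.seed(42)

# Hypothetical population: Y = B0 + B1*X + U with U ~ N(0, sigma^2).
B0, B1, sigma = 1.0, 2.0, 1.0
x = [float(i) for i in range(1, 21)]  # fixed regressor values
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# The theoretical variance of the slope estimator.
theoretical_var = sigma**2 / sxx

# Draw many samples from the population and estimate the slope in each one.
slopes = []
for _ in range(20000):
    y = [B0 + B1 * xi + random.gauss(0.0, sigma) for xi in x]
    y_bar = sum(y) / n
    slopes.append(sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx)

mean_b1 = sum(slopes) / len(slopes)
var_b1 = sum((b - mean_b1) ** 2 for b in slopes) / len(slopes)
# mean_b1 is close to B1 (unbiasedness); var_b1 is close to theoretical_var.
```

Doubling `sigma` would quadruple the simulated variance of the slope, while spreading out the `x` values would shrink it, in line with the discussion above.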

The variance of the population error term $\sigma^2$ is usually unknown. We therefore need to replace it by an estimate, using sample information. Since the population error term is unobservable, one can use the estimated residual to find an estimate. We start by forming the residual term

$$e_i = Y_i - b_0 - b_1 X_{1i}$$

We observe that it takes two estimates to calculate its value, which implies a loss of two degrees of freedom. With this information we may use the formula for the sample variance. That is:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2}$$

Observe that we have to divide by $n-2$, which refers to the degrees of freedom: the number of observations reduced by the number of estimated parameters used in order to create the residual. It turns out that this is an unbiased estimator of the population variance, and it becomes more precise as the number of observations increases.
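A short sketch of this estimator with made-up data. Note that the residuals satisfy the two normal equations exactly, which is precisely the two restrictions that cost the degrees of freedom:

```python
# Hypothetical sample (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates from the closed-form formulas.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Residuals e_i = Yi - b0 - b1*Xi.
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# The normal equations force sum(e_i) = 0 and sum(e_i * x_i) = 0,
# which is why two degrees of freedom are lost.
rss = sum(e**2 for e in residuals)
sigma2_hat = rss / (n - 2)  # divide by n - 2, not n
```

Dividing by $n$ instead of $n-2$ would understate the error variance, especially in small samples.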

## Properties of the least squares estimator

The OLS estimator has a number of desirable properties that are connected to the assumptions made on the regression model, and which are stated in a very important theorem: the **Gauss-Markov theorem.**

**The Gauss Markov Theorem**

When the first 5 assumptions of the simple regression model are satisfied, the parameter estimates are unbiased and have the smallest variance among all linear unbiased estimators. The OLS estimators are therefore called **BLUE**, for Best Linear Unbiased Estimators.

The OLS estimators will have the following properties when the assumptions of the regression function are fulfilled:

1) **The estimators are unbiased**

That the estimators are unbiased means that the expected value of the parameter equals the true population value. That means that if we take a number of samples and estimate the population parameters with these samples, the mean value of those estimates will equal the population value when the number of samples goes to infinity. Hence, on average we would be correct but it is not very likely that we will be exactly right for a given sample and a given set of parameters.

**Unbiasedness of the estimators implies that**

$$E[b_0] = B_0 \qquad \text{and} \qquad E[b_1] = B_1$$

2) **Minimum variance: Efficiency of unbiased estimators**

When an estimator is best, it means that it is efficient: no other linear unbiased estimator has better precision (smaller variance). This requires that the error variance is homoscedastic and that the errors are not autocorrelated over time. Both of these issues will be discussed in chapters 9 and 10.

3) **Consistency**

Consistency is another important property of the OLS estimator. It means that when the sample size increases and goes to infinity, the variance of the estimator converges to zero and the parameter estimates converge to the population parameters. An estimator can be biased and still consistent; note, however, that unbiasedness alone does not guarantee consistency, since consistency also requires the variance to shrink as the sample grows.
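Consistency can be illustrated by simulation: the spread of the slope estimates shrinks as the sample size grows. The setup below is a made-up sketch, not from the text:

```python
import random

random.seed(0)

# Hypothetical population: Y = B0 + B1*X + U.
B0, B1, sigma = 1.0, 2.0, 1.0

def slope_spread(n, reps=2000):
    """Sample standard deviation of the OLS slope over `reps` samples of size n."""
    slopes = []
    for _ in range(reps):
        x = [random.uniform(0.0, 10.0) for _ in range(n)]
        y = [B0 + B1 * xi + random.gauss(0.0, sigma) for xi in x]
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        slopes.append(
            sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        )
    m = sum(slopes) / reps
    return (sum((b - m) ** 2 for b in slopes) / reps) ** 0.5

# The slope estimates cluster more tightly around B1 as n grows.
spread_small = slope_spread(10)
spread_large = slope_spread(100)
```

With a tenfold larger sample, the spread of the estimates drops by roughly a factor of three, consistent with the variance formula derived earlier.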

4) **Normally distributed parameters**

Since the parameters are weighted averages of the dependent variable they can be treated as means. According to the central limit theorem, the distribution of means is normally distributed. Hence, the OLS estimators are approximately normally distributed in sufficiently large samples.
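This large-sample normality can be sketched by simulation: even with decidedly non-normal (uniform) errors, the distribution of the OLS slope looks normal. Below we check one fingerprint of normality, namely that about 68% of the estimates fall within one standard deviation of their mean. The setup is a made-up illustration:

```python
import random

random.seed(7)

# Hypothetical population with uniform (non-normal) errors on [-1, 1].
B0, B1 = 1.0, 2.0
x = [float(i) for i in range(1, 41)]  # fixed regressor values
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

slopes = []
for _ in range(10000):
    y = [B0 + B1 * xi + random.uniform(-1.0, 1.0) for xi in x]
    y_bar = sum(y) / n
    slopes.append(sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx)

m = sum(slopes) / len(slopes)
sd = (sum((b - m) ** 2 for b in slopes) / len(slopes)) ** 0.5

# For a normal distribution about 68% of draws lie within one sd of the mean.
within_one_sd = sum(1 for b in slopes if abs(b - m) <= sd) / len(slopes)
```

Even though each individual error is uniform, the slope is a weighted sum of many of them, so the central limit theorem pushes its distribution toward the normal.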