The multiple regression model
From now on the discussion will concern multiple regression analysis. In this setting, the model is assumed to include all relevant variables that explain the variation in the dependent variable, which almost always means several explanatory variables. This assumption has consequences for the interpretation of the estimated parameters, and the consequences of violating it will be discussed in chapter 7. This chapter focuses on the differences between the simple and the multiple regression model and extends the concepts from the previous chapters.
Partial marginal effects
For notational simplicity we will use two explanatory variables to represent the multiple regression model. The population regression function can then be expressed in the following way:

$$Y = B_0 + B_1 X_1 + B_2 X_2 + U \tag{6.1}$$
By including another variable in the model, we control for the variation in Y that is attributable to that variable. Hence the coefficient B1 represents the unique effect of X1 when controlling for X2, which means that any variation that X1 shares with X2 is excluded. B1 is therefore called a partial regression coefficient.
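The claim that a partial coefficient uses only the variation in X1 that is not shared with X2 can be checked numerically. The Python sketch below (using NumPy) simulates hypothetical data with made-up coefficients and compares B1 from the multiple regression with the slope obtained after first removing from both X1 and Y everything explained by a constant and X2 (the Frisch-Waugh-Lovell result). It also runs the simple regression of Y on X1 alone for contrast.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(42)
n = 1_000

# Hypothetical data in which X1 and X2 share common variation
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)  # made-up B1 = 2.0, B2 = -1.5

const = np.ones(n)

# Multiple regression of Y on a constant, X1 and X2
b_multiple = ols(np.column_stack([const, x1, x2]), y)

# Partialling out: keep only the parts of X1 and Y not explained by a constant and X2
Z = np.column_stack([const, x2])
x1_tilde = x1 - Z @ ols(Z, x1)
y_tilde = y - Z @ ols(Z, y)
b1_partial = ols(x1_tilde[:, None], y_tilde)[0]

# Simple regression of Y on a constant and X1 only (X2 left out)
b_simple = ols(np.column_stack([const, x1]), y)[1]

print(f"B1, multiple regression: {b_multiple[1]:.4f}")
print(f"B1, partialled out:      {b1_partial:.4f}")
print(f"Slope, X1 alone:         {b_simple:.4f}")
```

The first two numbers agree up to rounding error. The simple slope should instead land near 2 - 1.5 × 0.8/1.64 ≈ 1.27 in this made-up setup, because without X2 in the model, X1 also picks up part of the effect of X2.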
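Assume that you would like to predict the value (sales price) of a Volvo S40 T4, and that you have access to a data set including the following variables: the sales price (P), the age of the car (A) and the number of kilometers the car has been driven (K). We set up the following regression model:

$$P = B_0 + B_1 A + B_2 K + U \tag{6.2}$$

The model offers the following two marginal effects:

$$\frac{\partial E[P \mid A, K]}{\partial A} = B_1 \tag{6.3}$$

$$\frac{\partial E[P \mid A, K]}{\partial K} = B_2 \tag{6.4}$$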
The first marginal effect (6.3) represents the effect of a unit change in the age of the car on the conditional expected value of the sales price. When the age of the car increases by one year, the mean sales price changes by B1 Euros, controlling for the number of kilometers. It is reasonable to believe that the age of the car is correlated with the number of kilometers it has been driven. That means that part of the variation in the two variables is common in explaining the variation in the sales price. That common variation is excluded from the estimated coefficients, so the partial effect we seek is the unique effect that comes from the aging of the car.
Accordingly, the second marginal effect (6.4) represents the unique effect that each kilometer has on the sales price of the car, controlling for the age of the car. The way the model is specified here implies that the unique effect of each kilometer on the sales price is the same whether the car is new or 10 years old; in other words, the marginal effects are independent of the levels of A and K. If this is implausible, the specification can be adjusted.
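The constant marginal effects (6.3) and (6.4), and the role of the correlation between age and kilometers, can be illustrated by simulation. The Python sketch below generates a hypothetical Volvo data set from made-up parameters (B1 = -1,500 Euros per year, B2 = -0.05 Euros per kilometer, and roughly 15,000 km driven per year), estimates (6.2) by least squares, and compares B1 with the slope from a regression of P on A alone. None of these numbers come from real car data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process; every number here is made up.
age = rng.uniform(0, 10, n)                                       # A, years
km = 15_000 * age + rng.uniform(0, 40_000, n)                     # K, km (correlated with A)
price = 30_000 - 1_500 * age - 0.05 * km + rng.normal(0, 1_000, n)  # P, Euros

const = np.ones(n)

# Multiple regression (6.2): P on a constant, A and K
_, b1, b2 = np.linalg.lstsq(np.column_stack([const, age, km]), price, rcond=None)[0]

# Simple regression: P on a constant and A only (K omitted)
b_simple = np.linalg.lstsq(np.column_stack([const, age]), price, rcond=None)[0][1]

print(f"B1 (per year, holding K fixed): {b1:10.1f}")
print(f"B2 (per km, holding A fixed):   {b2:10.4f}")
print(f"Slope of A without K:           {b_simple:10.1f}")
```

With these made-up parameters the simple slope should be close to B1 + 15,000 × B2 ≈ -2,250, because it mixes the aging effect with the extra kilometers an older car typically has. The multiple regression separates the two, and in this linear specification each estimated marginal effect is a single number, whatever the values of A and K.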
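One way to extend the model and control for additional variation is to include squared terms as well as cross products. The extended model would then be:

$$P = B_0 + B_1 A + B_2 A^2 + B_3 K + B_4 K^2 + B_5 AK + U \tag{6.5}$$

Extending the model in this way results in the following two marginal effects:

$$\frac{\partial E[P \mid A, K]}{\partial A} = B_1 + 2B_2 A + B_5 K \tag{6.6}$$

$$\frac{\partial E[P \mid A, K]}{\partial K} = B_3 + 2B_4 K + B_5 A \tag{6.7}$$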
Equation (6.6) is the marginal effect on the sales price of a unit increase in age. It is a function of how old the car is and how many kilometers it has been driven. To obtain a specific value for the marginal effect we need to specify values for A and K. Most often the sample means of A and K are used, unless other specific values are of particular interest. The marginal effects given by (6.6) and (6.7) each consist of three parameters, each of which can be interpreted individually.
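Because (6.6) and (6.7) depend on A and K, they have to be evaluated at chosen values. A minimal Python sketch of that evaluation follows. The class name `ExtendedModel` and all coefficient values are hypothetical, chosen only to keep the arithmetic easy to follow, and the evaluation point (A = 5 years, K = 75,000 km) merely stands in for whatever sample means a real data set would give.

```python
from dataclasses import dataclass


@dataclass
class ExtendedModel:
    """Hypothetical coefficients of P = B0 + B1*A + B2*A^2 + B3*K + B4*K^2 + B5*A*K + U."""
    b0: float
    b1: float
    b2: float
    b3: float
    b4: float
    b5: float

    def me_age(self, age: float, km: float) -> float:
        """Marginal effect of one more year of age, as in (6.6)."""
        return self.b1 + 2 * self.b2 * age + self.b5 * km

    def me_km(self, age: float, km: float) -> float:
        """Marginal effect of one more kilometer, as in (6.7)."""
        return self.b3 + 2 * self.b4 * km + self.b5 * age


# Made-up coefficients and a made-up evaluation point
model = ExtendedModel(b0=30_000, b1=-3_000, b2=100, b3=-0.08, b4=1e-7, b5=0.002)
me_a = model.me_age(age=5, km=75_000)   # -3000 + 2*100*5 + 0.002*75000, about -1850
me_k = model.me_km(age=5, km=75_000)    # -0.08 + 2*1e-7*75000 + 0.002*5, about -0.055
print(f"Effect of one more year:      {me_a:.1f} Euros")
print(f"Effect of one more kilometer: {me_k:.3f} Euros")
```

Evaluating `model.me_age(0, 0)` returns B1 itself, which is the "intercept" reading of the first parameter in (6.6).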
Focusing on (6.6), the first parameter is B1. It should be regarded as an intercept of the marginal effect, and as such it is of limited interest. Strictly speaking it represents the marginal effect when A and K are both zero, that is, when the car is new.
The second parameter, B2, accounts for any non-linear relation between A and P. Including a squared term is therefore a way to test whether the relation is non-linear. If the estimated coefficient is significantly different from zero, we should conclude that non-linearity is present and that controlling for it is necessary. Failing to control for it would lead to a biased marginal effect, since the effect would be assumed to be constant when it in fact varies with the level of A.
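Testing for non-linearity amounts to an ordinary t-test of H0: B2 = 0 in (6.5). The Python sketch below simulates hypothetical data in which depreciation slows down as the car ages (the made-up true value is B2 = 200), estimates (6.5) by least squares and computes conventional standard errors that assume homoskedastic errors. Kilometers are measured in thousands to keep the squared and cross-product columns on a manageable numerical scale; all parameter values are assumptions for illustration.

```python
import numpy as np


def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)      # unbiased estimate of the error variance
    # (X'X)^-1 = R^-1 (R^-1)' from the QR decomposition X = QR (numerically stabler)
    _, r = np.linalg.qr(X)
    r_inv = np.linalg.inv(r)
    cov = sigma2 * r_inv @ r_inv.T
    return coef, np.sqrt(np.diag(cov))


rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data; all numbers are made up
age = rng.uniform(0, 10, n)                       # A, years
km = 15 * age + rng.uniform(0, 60, n)             # K, thousands of km
price = (30_000 - 3_000 * age + 200 * age**2      # P, Euros; true B2 = 200
         - 50 * km + rng.normal(0, 1_000, n))     # true B4 = B5 = 0

# Columns of (6.5): 1, A, A^2, K, K^2, A*K
X = np.column_stack([np.ones(n), age, age**2, km, km**2, age * km])
coef, se = ols_with_se(X, price)

t_b2 = coef[2] / se[2]    # t-ratio for H0: B2 = 0
print(f"B2 = {coef[2]:.1f}, se = {se[2]:.1f}, t = {t_b2:.1f}")
```

In large samples, an absolute t-ratio above about 1.96 rejects linearity in age at the 5 percent level. In this simulated case the squared term should be clearly significant; dropping it would force a single, constant age effect on every car.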
The third parameter, B5, captures any synergy (interaction) effect that could exist between the two explanatory variables. It is not obvious that such an effect exists in the Volvo S40 example, but in other areas of economics it is more common. For instance, in the US wage-equation literature, being black and being a woman are two factors that usually have negative effects on the wage rate. Furthermore, being a black woman is a combined effect that reduces the wage rate beyond the sum of the two separate effects. This would be an example of a negative synergy effect.
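The synergy idea can be made precise with a cross product of two dummy variables. The derivation below uses a hypothetical wage equation with wage W and dummies D_B (1 if black, 0 otherwise) and D_F (1 if a woman, 0 otherwise); these symbols are introduced here for illustration only:

$$W = B_0 + B_1 D_B + B_2 D_F + B_3 D_B D_F + U$$

$$
\begin{aligned}
E[W \mid D_B = 0, D_F = 0] &= B_0 \\
E[W \mid D_B = 1, D_F = 0] &= B_0 + B_1 \\
E[W \mid D_B = 0, D_F = 1] &= B_0 + B_2 \\
E[W \mid D_B = 1, D_F = 1] &= B_0 + B_1 + B_2 + B_3
\end{aligned}
$$

Relative to the base group, the combined effect for a black woman is B1 + B2 + B3, whereas the sum of the two separate effects is B1 + B2. The difference, B3, is the synergy effect, and B3 < 0 corresponds to the negative synergy described above. The cross-product coefficient B5 on AK in (6.5) plays the same role for two continuous variables.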