Multiple Linear Regression
Multiple linear regression is used to estimate the relationship between two or more predictor variables and a single response variable. Suppose that you want to include the effect of a certain promotion in estimating ice cream sales. During the period in which the ice cream sales data were collected, the store manager would sometimes include a flyer in the daily paper promoting the store. The manager feels that demand is higher on days when she uses the flyer promotion but is not sure how much additional demand is driven by this promotion. The relation between sales and the now two explanatory variables can be formalized as:
Sales_t = β0 + β1 DAT_t + β2 FLY_t + ε_t
where DAT_t continues to represent the daily average temperature and the new variable, FLY, indicates whether a flyer was included in the paper that day. The FLY variable is an indicator variable, meaning that it takes only the value one or zero. If a flyer promotion was used on a particular day, then FLY = 1; otherwise, FLY = 0. The data used for this regression are shown in Figure 3.11 and the regression results are shown in Figure 3.12.
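To make the role of the indicator variable concrete, the following is a minimal sketch of a least-squares fit with one continuous predictor and one 0/1 predictor. The data values are hypothetical and are not those of Figure 3.11, the coefficient names b0, b1, and b2 are assumed notation, and the small solve helper is written out only to keep the example self-contained. In practice a statistics package or spreadsheet regression tool would do this step.

```python
# Hypothetical data for illustration only; NOT the values from Figure 3.11.
dat = [62, 65, 70, 72, 75, 78, 80, 83, 85, 88, 90, 93]    # daily average temperature
fly = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]                # 1 = flyer ran that day
sales = [548, 590, 612, 630, 668, 676, 702, 715, 733, 771, 768, 805]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Design matrix rows: [1, DAT_t, FLY_t]; the leading 1 carries the intercept.
X = [[1.0, d, f] for d, f in zip(dat, fly)]

# Normal equations for ordinary least squares: (X'X) b = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * y for row, y in zip(X, sales)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)


def predict(temp, flyer):
    """Predicted sales for a given temperature and flyer indicator (0 or 1)."""
    return b0 + b1 * temp + b2 * flyer


residuals = [y - predict(d, f) for d, f, y in zip(dat, fly, sales)]
print(f"intercept={b0:.2f}, DAT coefficient={b1:.2f}, FLY coefficient={b2:.2f}")
```

Because FLY takes only the values zero and one, its coefficient is the estimated difference in sales between a flyer day and a non-flyer day at the same temperature: predict(80, 1) - predict(80, 0) equals b2.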
Observing the new regression results, notice that the R-squared value increased slightly compared to the single-variable regression results in Figure 3.8. This is expected: adding a predictor variable to a least-squares regression can never decrease R-squared, and in practice the value almost always rises at least slightly. The adjusted R-squared value, however, is smaller than in the original regression (.9338 versus .9368) because it includes a penalty for each additional predictor variable. When a new variable is added and the adjusted R-squared goes down, it is an indication that the new variable adds little or no predictive power to the model. This is consistent with the P-value of the Flyer variable (.715188), which is much larger than the 0.05 threshold required for significance at the 5% level (95% confidence). Thus, it does not appear from these results that the flyer promotion provides any additional sales lift, at least on the day that it appears in the daily paper.
Care must be taken, however, in making broader interpretations of regression results. For example, you could conclude from the lack of significance in the regression model that the flyer promotion does not add any value. This conclusion may be incorrect, however, because the regression equation only measures the sales lift of the promotion on the day that the flyer appears. It could be the case that the flyer increases the overall brand awareness of the store and that, in its absence, overall sales would be lower over an extended period of time.
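The penalty built into adjusted R-squared can be illustrated with the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The sketch below uses hypothetical R-squared values and a hypothetical sample size, since the underlying figures are not reproduced here. Only the flyer P-value is taken from the text.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors fit to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values chosen for illustration; not taken from Figures 3.8 or 3.12.
n_obs = 30
one_var = adjusted_r2(0.9390, n_obs, k=1)  # temperature only
two_var = adjusted_r2(0.9393, n_obs, k=2)  # temperature plus flyer indicator

print(f"adjusted R-squared, one predictor:  {one_var:.4f}")
print(f"adjusted R-squared, two predictors: {two_var:.4f}")

# Significance check at the 5% level, using the flyer P-value reported in the text.
alpha = 0.05
p_value_fly = 0.715188
fly_is_significant = p_value_fly < alpha
print(f"flyer variable significant at the 5% level: {fly_is_significant}")
```

With these assumed inputs, R-squared rises slightly when the second predictor is added, yet the adjusted value falls, because the small gain in fit does not offset the lost degree of freedom.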