Linear Regression

Figure 12.0 Linear Regression
Overview
From birds and butterflies to street widths and urban sprawl, linear regression is used to examine relationships in detail. Researchers from many disciplines use linear regression to evaluate two or more variables and understand how those variables are related, or to reveal that no relationship exists.
In earlier chapters you learned how to use descriptive statistics, along with tools such as cross tabulation and correlation, to assess and describe associations between variables. In this chapter you will learn about more sophisticated tools that can assist practicing planners and academics alike. With a firm understanding of linear regression, you will be able to interpret a great deal of quantitative research and will gain an important technique for your own research. This chapter will also prepare you for later chapters in a companion book, Advanced Quantitative Research Methods for Urban Planners, which discusses more advanced statistical methods that build on concepts from linear regression.
Linear regression allows planning researchers to predict the value of one variable (known as the outcome, response, or dependent variable) from the values of one or more other variables (known as predictor, explanatory, or independent variables). With one independent variable the method is called simple regression; with more than one, multiple regression. Because it enables a straightforward analysis of relationships among multiple measurable constructs, linear regression is one of the most commonly used analytical methods in planning research. This chapter includes:
• A brief intellectual history of linear regression
• A general explanation of how and when planners use linear regression
• A detailed description of ordinary least squares (OLS) regression, including how to perform simple and multiple regression, how to evaluate model strength and reliability, and how to interpret model results, all using a simplified hypothetical planning example
• A realistic example of how to perform multiple regression for planning research, including screenshots from SPSS and R
• A discussion of the assumptions behind OLS linear regression, problems that violate these assumptions, ways to diagnose these problems, and potential solutions (all based on the realistic example)
• Case studies from the planning literature that utilize OLS regression
While textbooks on regression analysis are ubiquitous, the goal of this chapter is to make the method especially relevant and vivid for urban planners. Readers requiring more background detail should see the Works Cited at the end of this chapter.
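In one common notation, the simple and multiple regression models introduced in this overview can be written as follows. The symbols are a sketch and may differ from those used later in the chapter: y_i is the dependent variable for observation i, the x terms are independent variables, the β terms are coefficients to be estimated, and ε_i is an error term. Ordinary least squares (OLS) chooses the coefficient estimates that minimize the sum of squared differences between observed and fitted values.

```latex
% Simple regression: one independent variable
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

% Multiple regression: k independent variables
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i

% OLS picks \hat{\beta}_0, \ldots, \hat{\beta}_k to minimize the sum of squared residuals
\sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik} \right)^2
```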
Purpose
Linear regression is at the core of social statistics and is often the first inferential method researchers turn to for describing and analyzing patterns in empirical data. The objective is to identify the straight-line (hence linear) equation that best fits the data in question (see Figure 12.0 to determine when to use regression rather than other inferential statistical methods). Valid application of linear regression depends on a central assumption: that the relationship between the dependent variable and the independent variable(s) is linear. Other assumptions are listed and addressed in the Step by Step section of this chapter.
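As a minimal sketch of what "best fits" means for a single independent variable, the Python below computes the ordinary least squares slope and intercept from the standard closed-form formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. The function name `fit_simple_ols` and the street-width numbers are hypothetical illustrations, not data from any study cited in this chapter.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x does not vary, so the slope is undefined")
    slope = sxy / sxx
    # The OLS line always passes through the point of means (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical illustrative data: street width (ft) and average speed (mph)
    widths = [24, 28, 32, 36, 40]
    speeds = [22.0, 23.1, 23.6, 24.9, 25.4]
    a, b = fit_simple_ols(widths, speeds)
    print(f"speed = {a:.2f} + {b:.3f} * width")
```

Statistical packages such as SPSS and R produce the same estimates for a one-predictor model; in Python 3.10 and later, `statistics.linear_regression` offers a built-in equivalent.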
Regression models can be used for explanation (to better understand what is happening in a relationship), prediction (to estimate the outcome of one event in relation to another), and hypothesis testing. Often a single regression model serves all three purposes. The findings from such models can, in turn, help inform and influence planning and policy decisions. Before turning to the details of how to perform regression analysis, let us consider examples of research that used linear regression for each of these purposes.
Explanation
Regression helps researchers explain the factors behind observed patterns by modeling the relationship between dependent and independent variables with a linear equation. If the model is explanatory, each independent variable is assessed by the degree to which its variation can account for the variation in the dependent variable’s values.
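A common model-level summary of how much of the dependent variable's variation a model accounts for is the coefficient of determination, R² = 1 − SS_residual / SS_total. The sketch below assumes the predictions come from an OLS fit with an intercept, evaluated on the same data used to fit it; under that assumption R² is the share of variation explained. Assessing each independent variable's individual contribution involves additional measures, such as its coefficient and standard error. The function name `r_squared` and the numbers are hypothetical.

```python
def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    n = len(y_obs)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_obs and y_pred must be non-empty and the same length")
    mean_y = sum(y_obs) / n
    # Total variation of the dependent variable around its mean
    ss_total = sum((y - mean_y) ** 2 for y in y_obs)
    # Variation left unexplained by the model's predictions
    ss_resid = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))
    if ss_total == 0:
        raise ValueError("y_obs does not vary, so R-squared is undefined")
    return 1 - ss_resid / ss_total


if __name__ == "__main__":
    # Hypothetical observed speeds and predictions from a hypothetical fitted line
    widths = [24, 28, 32, 36, 40]
    observed = [22.0, 23.1, 23.6, 24.9, 25.4]
    predicted = [16.92 + 0.215 * w for w in widths]
    print(f"R-squared = {r_squared(observed, predicted):.3f}")
```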
In California, researchers examined whether particulate pollution on urban sidewalks (the dependent variable) was related to traffic and land use factors (the independent variables) (Boarnet et al., 2011). Their model was intended not to predict concentrations but to explore the relationships among the variables, and the researchers found meaningful associations between the dependent variable and the independent variables of interest after accounting for meteorological factors.
Prediction
Regression also allows researchers to model a dependent variable with the goal of predicting its values consistently and accurately in other places and times. If the model is predictive, the regression equation describes how well each independent variable performs as a predictor of the dependent variable.
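Prediction means fitting the equation on one set of observations and then applying it to new values of the independent variables. The sketch below does this for multiple regression by solving the OLS normal equations, (XᵀX)b = Xᵀy, with Gaussian elimination in plain Python. It is a teaching illustration under simplifying assumptions (a fixed absolute tolerance for detecting singularity, no standard errors); statistical packages typically use more numerically stable methods such as QR decomposition. The names `fit_ols`, `predict`, and `_solve` and the data are hypothetical.

```python
def _solve(a, b):
    """Solve the square system a * coefs = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # absolute tolerance: a simplification
            raise ValueError("X'X is singular: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below the pivot
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last coefficient to the first
    coefs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * coefs[j] for j in range(i + 1, n))
        coefs[i] = (m[i][n] - tail) / m[i][i]
    return coefs


def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk by least squares."""
    if not rows or len(rows) != len(y):
        raise ValueError("need one row of predictor values per observation")
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    p = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return _solve(xtx, xty)


def predict(coefs, row):
    """Apply a fitted equation to one new observation's predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


if __name__ == "__main__":
    # Hypothetical illustrative data: rows are (x1, x2), and y = 1 + 2*x1 + 3*x2 exactly
    rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [1, 3, 4, 6, 8]
    coefs = fit_ols(rows, y)
    print("coefficients:", [round(c, 6) for c in coefs])
    print("prediction for (3, 2):", round(predict(coefs, [3, 2]), 6))
```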
In Minnesota, researchers used data on residential proximity to a landfill to predict the landfill's negative effect on housing values. They found that houses within two miles of a landfill lost value because of the landfill; beyond two miles, the effect was not present. Their findings have implications for the siting of new landfills (Nelson, Genereux, & Genereux, 1992).
Hypothesis Testing
Linear regression also allows researchers to evaluate hypotheses about functional or causal relationships. A classic example is the “Broken Windows” theory, which asserts a positive relationship between vandalism and other crimes: the more broken windows a neighborhood has, the more other crimes will occur there. A regression analysis can indicate whether the data support this association. If the theory is correct, the regression line would slope upward, with crime (the dependent/outcome variable on the y-axis) increasing as the number of broken windows (the independent/predictor variable on the x-axis) increases, and the line would fit the data closely.
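In simple regression, a hypothesis such as "broken windows are associated with crime" is commonly tested by asking whether the slope differs from zero. The sketch below computes the slope, its standard error, the t statistic (slope divided by its standard error), and the degrees of freedom (n − 2). Turning t into a p-value requires the t distribution, which the Python standard library does not provide; if SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` gives a two-sided p-value, or |t| can be compared with a critical value from a t table. The function name `slope_t_test` and the data are hypothetical.

```python
import math


def slope_t_test(x, y):
    """Return (slope, standard_error, t, df) for testing H0: slope = 0 in simple OLS."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x does not vary, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2  # two parameters (intercept and slope) were estimated
    se = math.sqrt(ss_resid / df / sxx)
    if se == 0:
        raise ValueError("points lie exactly on a line; t is undefined")
    return slope, se, slope / se, df


if __name__ == "__main__":
    # Hypothetical illustrative data (predictor x, outcome y)
    x = [24, 28, 32, 36, 40]
    y = [22.0, 23.1, 23.6, 24.9, 25.4]
    slope, se, t, df = slope_t_test(x, y)
    print(f"slope={slope:.3f}, SE={se:.4f}, t={t:.2f}, df={df}")
```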
In the San Francisco Bay Area, researchers evaluated whether residential traffic speeds were related to street width (Daisa & Peers, 1997). They wanted to determine whether wider streets led to higher speeds. Their regression model indicated that wider residential streets do experience higher speeds (about 0.8 mph for each five-foot increase in width). Cities have used these findings when considering design standards for new residential streets and traffic calming measures for existing ones (Sacramento Transportation & Air Quality Collaborative, 2005). Later in this chapter, we use a hypothetical dataset based on this example to introduce the mathematics of linear regression.
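Reading the reported estimate as a regression slope, the arithmetic below converts it to a per-foot rate and applies it to a 10-foot difference in street width. The 10-foot difference is an illustrative assumption, not a value from the study, and such a calculation is only meaningful within the range of street widths the model was fitted on.

```latex
\hat{\beta}_{\text{width}} \approx \frac{0.8\ \text{mph}}{5\ \text{ft}} = 0.16\ \text{mph per ft}

\Delta\widehat{\text{speed}} = \hat{\beta}_{\text{width}} \times \Delta\text{width}
  \approx 0.16\ \tfrac{\text{mph}}{\text{ft}} \times 10\ \text{ft} = 1.6\ \text{mph}
```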
Wise Use of Regression
Authors of textbooks on regression have noted that students often know how to run a regression model on a computer but do not understand what they are doing (Draper & Smith, 1998). Statistical software has made linear regression, and many other techniques, accessible to nearly everyone. However, this accessibility often leads to misuse or misinterpretation, especially when the researcher is not familiar with the concepts and theory behind the model. This chapter aims to fill that gap. While linear regression is a simple and elegant tool, it depends on assumptions and details that require close attention. The successful application of regression analysis requires a balance of theoretical results, empirical rules, and subjective judgment (Chatterjee & Price, 2000).