# THE GENERAL LINEAR MODEL

Models that incorporate terms for more than one factor at a time can be used as an alternative to stratification to achieve control of confounding. These models succeed in controlling confounding because when several risk factors are included, the effect of each is unconfounded by the others. Let us consider an extension of the simple linear model in Figure 12-1 to a third variable.

Equation 12-1, like the one for Figure 12-1, has the same outcome variable, *Y* (also known as the *dependent* variable), but there are now two predictor variables, X) and *X _{2},* which are referred to as

*independent*variables. Suppose that

*Y*is the mortality rate from laryngeal cancer, as in Figure 12-1, and that

*X*as before, is the number of cigarettes smoked daily. The new variable, X

_{1},_{2}, might be the number of grams of alcohol consumed daily (alcohol is also a risk factor for laryngeal cancer). With two independent variables and one dependent variable, the data points must now be visualized as being located within a threedimensional space: two dimensions for the two independent variables and one dimension for the dependent variable. Imagine a room in which the edge of the floor against one wall is the axis for X

_{;}and the edge where the adjacent wall meets the floor is the axis for X

_{2}. The line from floor to ceiling where these two adjacent walls meet would be the

*Y*axis. Equation 12-1 is a straight line through the three-dimensional space of this room.

What is the advantage of adding the term X_{2} to the model? Ordinarily, because cigarette smoking and alcohol consumption are correlated, we might expect that cigarette smoking and alcohol drinking would be mutually confounding risk factors for laryngeal cancer. A stratified analysis could remove that confounding, but the confounding can also be removed by fitting Equation 12-1 to the data. In a model such as Equation 12-1 with terms for two predictive factors, smoking (X_{;}) and alcohol (X_{2}), the coefficients for these terms, *a _{l}* and

*a*respectively, provide estimates of the effects of cigarette smoking and alcohol drinking that are mutually unconfounded. Mathematically, there is no limit to the number of terms that could be included as independent variables in a model, although limitations of the data provide a practical limit. The general form of Equation 12-1 is referred to as the

_{2}*general linear model.*