# PARTIAL CORRELATION

Of course, I haven’t proven anything by all this laying out of tables. Don’t misunderstand me. Elaboration tables are a great start—they test whether your ideas about some antecedent or intervening variables are plausible by showing what could be going on—but they don’t tell you how things work or how much those antecedent or intervening variables are contributing to a correlation you want to understand. For that, we need something a bit more . . . well, elaborate. Partial correlation is a direct way to control for the effects of a third (or fourth or fifth . . .) variable on a relationship between two variables.

Here’s an interesting case. Across the 50 states in the United States, there is a stunning correlation (r = .778) between the percentage of live births to teenage mothers (15-19 years of age) and the number of motor vehicle deaths per hundred million miles driven. States that have a high rate of road carnage have a high rate of births to teenagers, and vice versa.

This one’s a real puzzle. Obviously, there’s no direct relation between these two variables. There’s no way that the volume of highway slaughter causes the number of teenage mothers (or vice versa), so we look for something that might cause both of them.

I have a hunch that these two variables are correlated because they are both the consequence of the fact that certain regions of the country are poorer than others. I know from my own experience, and from having read a lot of research reports, that the western and southern states are poorer, overall, than are the industrial and farming states of the Northeast and the Midwest. My hunch is that poorer states will have fewer miles of paved road per million people, poorer roads overall, and older vehicles. All this might lead to more deaths per miles driven.

Table 22.14 shows the zero-order correlations among three variables: motor vehicle deaths per hundred million miles driven (labeled MVD in table 22.14); the percentage of live births to young women 15-19 years of age (TEENBIRTH); and average personal income (INCOME). Zero-order correlations do not take into account the influence of other variables.

Table 22.14 Correlation Matrix for Three Variables

| | Variable 1 MVD | Variable 2 TEENBIRTH | Variable 3 INCOME |
|---|---|---|---|
| MVD | 1.00 | | |
| TEENBIRTH | .778 | 1.00 | |
| INCOME | -.662 | -.700 | 1.00 |

SOURCE: MVD for 1995, Table 1018, Statistical Abstract of the United States (1997). TEENBIRTH for 1996, Table 98, Statistical Abstract of the United States (1997). INCOME for 1996, Table 706, Statistical Abstract of the United States (1997).

We can use the formula for partial correlation to test directly what effect, if any, income has on the correlation between TEENBIRTH and MVD. The formula for partial correlation is:

$$r_{12\cdot 3} = \frac{r_{12} - (r_{13})(r_{32})}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{32}^2}} \qquad \text{(Formula 22.1)}$$

where $r_{12\cdot 3}$ means “the correlation between variable 1 (MVD) and variable 2 (TEENBIRTH), controlling for variable 3 (INCOME).”

Table 22.15 Calculating the Partial Correlations for the Entries in Table 22.14

| Pairs | Pearson's r | $r^2$ | $1 - r^2$ | $\sqrt{1 - r^2}$ |
|---|---|---|---|---|
| $r_{12}$ (MVD and TEENBIRTH) | .778 | 0.60528 | 0.39472 | 0.628264 |
| $r_{13}$ (MVD and INCOME) | -.662 | 0.43824 | 0.56176 | 0.749504 |
| $r_{32}$ (TEENBIRTH and INCOME) | -.700 | 0.49 | 0.51 | 0.714143 |

Table 22.15 shows the intermediate quantities needed to calculate the partial correlations for the entries in table 22.14. So, the partial correlation between MVD and TEENBIRTH, controlling for INCOME, is:

$$r_{12\cdot 3} = \frac{.778 - (-.662)(-.700)}{(.749504)(.714143)} = \frac{.3146}{.535253} \approx .5878$$

which we can round off to .59. (Remember, we have to use a lot of decimal places during the calculations in order to keep the rounding error in check. When we get through with the calculations we can round off to two or three decimal places.) In other words, when we partial out the effect of income, the correlation between MVD and TEENBIRTH drops from about 0.78 to about 0.59. That’s because income is correlated with motor vehicle deaths (r = -0.662) and with teenage births (r = -0.700).
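The arithmetic for this first-order partial correlation can be checked with a few lines of Python. This is a minimal sketch, not part of the original text: the function name `partial_r` and the variable names are made up for illustration, and the three input values are the zero-order correlations from table 22.14.

```python
import math


def partial_r(r12, r13, r32):
    """First-order partial correlation between variables 1 and 2,
    controlling for variable 3 (hypothetical helper, for illustration)."""
    numerator = r12 - r13 * r32
    denominator = math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r32 ** 2)
    return numerator / denominator


# Zero-order correlations taken from table 22.14
r_mvd_teen = 0.778      # MVD and TEENBIRTH
r_mvd_income = -0.662   # MVD and INCOME
r_teen_income = -0.700  # TEENBIRTH and INCOME

r_partial = partial_r(r_mvd_teen, r_mvd_income, r_teen_income)
print(round(r_partial, 2))  # 0.59
```

Working in full floating-point precision and rounding only at the end follows the same advice as the hand calculation: keep the decimal places until the last step.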

If a partial correlation of 0.59 between the rate of motor vehicle deaths and the rate of teenage births still seems high, then perhaps other variables are at work. You can “partial out” the effects of two or more variables at once, but as you take on more variables, the formula naturally gets more complicated.

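A simple correlation is referred to as a zero-order correlation. Formula 22.1 is for a first-order partial correlation. The formula for a second-order partial correlation (controlling for two variables at the same time) is:

$$r_{12\cdot 34} = \frac{r_{12\cdot 3} - (r_{14\cdot 3})(r_{24\cdot 3})}{\sqrt{1 - r_{14\cdot 3}^2}\,\sqrt{1 - r_{24\cdot 3}^2}}$$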
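In code, a second-order partial correlation is the first-order formula applied to first-order partials, and the same step can be repeated for any number of control variables. The sketch below assumes the correlations are stored in a symmetric matrix indexed from 0; the function name `partial_corr` is hypothetical. The first three rows and columns come from table 22.14. The fourth variable is a placeholder, not real data: it is set to be uncorrelated with everything else, purely to show the shape of the call.

```python
import math


def partial_corr(r, i, j, controls=()):
    """Partial correlation between variables i and j, controlling for
    the variables listed in `controls`. `r` is a symmetric correlation
    matrix given as a list of lists (hypothetical helper)."""
    if not controls:
        return r[i][j]  # zero-order correlation
    *rest, k = controls  # peel off one control variable at a time
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Indices 0, 1, 2 are MVD, TEENBIRTH, INCOME (table 22.14).
# Index 3 is a made-up placeholder variable with zero correlations.
R = [
    [1.000, 0.778, -0.662, 0.0],
    [0.778, 1.000, -0.700, 0.0],
    [-0.662, -0.700, 1.000, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

first_order = partial_corr(R, 0, 1, [2])      # controlling for INCOME
second_order = partial_corr(R, 0, 1, [2, 3])  # INCOME plus the placeholder
print(round(first_order, 2), round(second_order, 2))
```

Because the placeholder variable is uncorrelated with the others, controlling for it changes nothing, and the second-order result equals the first-order one. With a real fourth variable, its observed correlations would go in the last row and column.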

You can also work out and test a model of how several independent variables influence a dependent variable all at once. This is the task of multiple regression, which we’ll take up next. (For more on partial correlation, see Gujarati 2003.)