To give you an absolutely clear idea of how the regression formula works, table 21.16 shows all the predictions along the regression line for the data in table 21.14.

Table 21.16 Regression Predictions for the Dependent Variable in Table 21.14

| For the country of | Where the infant mortality rate in 2008 was | Predict that the TFR will be | And compare that to the actual TFR in table 20.8 |
|---|---|---|---|
| Armenia | 22.2 | 1.20 + .056(22.2) = 2.44 | 1.79 |
| Chad | 48.7 | 1.20 + .056(48.7) = 3.93 | 5.78 |
| Ghana | 67.0 | 1.20 + .056(67.0) = 4.95 | 4.00 |
| El Salvador | 17.5 | 1.20 + .056(17.5) = 2.18 | 2.22 |
| Iran | 24.2 | 1.20 + .056(24.2) = 2.56 | 1.74 |
| Latvia | 8.3 | 1.20 + .056(8.3) = 1.66 | 1.48 |
| Namibia | 27.2 | 1.20 + .056(27.2) = 2.72 | 3.07 |
| Panama | 15.7 | 1.20 + .056(15.7) = 2.08 | 2.41 |
| Slovenia | 3.6 | 1.20 + .056(3.6) = 1.40 | 1.47 |
| Suriname | 20.5 | 1.20 + .056(20.5) = 2.35 | 2.29 |
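Every prediction in table 21.16 is just the regression equation, ŷ = 1.20 + .056x, applied to one infant mortality rate. A minimal sketch in Python (the intercept and slope come from the regression equation in the text; the rates are copied from the table):

```python
# Regression equation from the text: predicted TFR = 1.20 + .056 * (infant mortality rate)
INTERCEPT = 1.20
SLOPE = 0.056

# Infant mortality rates (2008) from table 21.16
rates = {
    "Armenia": 22.2, "Chad": 48.7, "Ghana": 67.0, "El Salvador": 17.5,
    "Iran": 24.2, "Latvia": 8.3, "Namibia": 27.2, "Panama": 15.7,
    "Slovenia": 3.6, "Suriname": 20.5,
}

def predict_tfr(infant_mortality: float) -> float:
    """Apply the regression equation, rounding to two decimals as the table does."""
    return round(INTERCEPT + SLOPE * infant_mortality, 2)

for country, x in rates.items():
    print(f"{country}: 1.20 + .056({x}) = {predict_tfr(x)}")
```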

We now have two predictors of TFR: (1) the mean TFR, which is our best guess when we have no data about some independent variable like infant mortality, and (2) the values produced by the regression equation when we do have information about something like infant mortality.

Each of these predictors produces a certain amount of error, or variance, which is the difference between the predicted number for the dependent variable and the actual measurement. This is also called the residual—that is, what’s left over after making your prediction using the regression equation. (To anticipate the discussion of multiple regression in chapter 22: The idea in multiple regression is to use two or more independent variables in order to reduce the size of the residuals.)

You’ll recall from chapter 20, in the section on variance and the standard deviation, that in the case of the mean, the total variance is the average of the squared deviations of the observations from the mean, Σ(x − x̄)² / n. In the case of the regression line predictors, the variance is the sum of the squared deviations of the observations from the regression line. Table 21.17 compares these two sets of errors, or variances, for the data in table 21.14.
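The two error measures can be written as small functions. A sketch in Python, assuming plain lists of values (the function names here are mine, not the text's):

```python
def variance_around_mean(values):
    """Total variance around the mean: sum of (x - x̄)², divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sum_squared_residuals(actual, predicted):
    """Error around the regression line: sum of (y - ŷ)² over all cases."""
    return sum((y, y_hat) and (y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
```

The first function divides by n to give the average squared deviation; the second keeps the raw sum, which is what table 21.17 totals in its error columns.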

Table 21.17 Comparison of the Error Produced by Guessing the Mean TFR in Table 21.14 and the Error Produced by Applying the Regression Equation for Each Guess

| Country | TFR (y) | Old error (y − ȳ)² | Prediction using the regression equation (ŷ) | New error (y − ŷ)² |
|---|---|---|---|---|
| Armenia | 1.79 | 0.71 | 2.44 | 0.42 |
| Chad | 5.78 | 9.23 | 3.93 | 3.42 |
| El Salvador | 2.22 | 0.17 | 2.18 | 0.002 |
| Ghana | 4.00 | 1.88 | 4.95 | 0.90 |
| Iran | 1.74 | 0.79 | 2.56 | 0.67 |
| Latvia | 1.48 | 1.32 | 1.66 | 0.03 |
| Namibia | 3.07 | 0.19 | 2.72 | 0.12 |
| Panama | 2.41 | 0.05 | 2.08 | 0.11 |
| Slovenia | 1.47 | 1.35 | 1.40 | 0.005 |
| Suriname | 2.29 | 0.12 | 2.35 | 0.004 |
| | | Σ = 15.81 | | Σ = 5.68 |
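The new-error column can be reproduced directly from the actual TFRs and the regression predictions. A sketch using the values from tables 21.14 and 21.16; note that summing unrounded squared residuals gives about 5.70 rather than exactly 5.68, because the table rounds each entry to two or three decimals before summing:

```python
# (country, actual TFR, infant mortality rate), from tables 21.14 and 21.16
data = [
    ("Armenia", 1.79, 22.2), ("Chad", 5.78, 48.7), ("El Salvador", 2.22, 17.5),
    ("Ghana", 4.00, 67.0), ("Iran", 1.74, 24.2), ("Latvia", 1.48, 8.3),
    ("Namibia", 3.07, 27.2), ("Panama", 2.41, 15.7), ("Slovenia", 1.47, 3.6),
    ("Suriname", 2.29, 20.5),
]

new_error = 0.0
for country, y, x in data:
    y_hat = 1.20 + 0.056 * x    # prediction from the regression equation
    residual = y - y_hat        # what's left over after the prediction
    new_error += residual ** 2

print(f"Sum of squared residuals: {new_error:.2f}")  # ≈ 5.70, vs. the table's 5.68
```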

We now have all the information we need for a true PRE measure of association between two interval variables. Recall the formula for a PRE measure: the old error minus the new error, divided by the old error. For our example in table 21.14:

(15.81 − 5.68) / 15.81 = 10.13 / 15.81 = 0.64

In other words: The proportionate reduction of error in guessing the TFR in table 21.14, given that you know the distribution of infant mortality rates and can apply a regression equation, compared to just guessing the mean TFR, is 0.64, or 64%.

This quantity is usually referred to as r-squared (written r^{2}), or the amount of variance accounted for by the independent variable. It is also called the coefficient of determination because it tells us how much of the variance in the dependent variable is predictable from (determined by) the scores of the independent variable. The Pearson product-moment correlation, written as r, is the square root of this measure, or, in this instance, 0.80. (We calculated r in table 21.15 by applying formula 21.22 and got r = 0.81. The difference is rounding error when we do these calculations by hand. You won’t get this error when you use a computer to do the calculations.)
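The r² and r figures follow from the PRE arithmetic using the two error totals in table 21.17. A quick check in Python:

```python
import math

old_error = 15.81   # sum of squared deviations from the mean TFR (table 21.17)
new_error = 5.68    # sum of squared residuals from the regression line (table 21.17)

# PRE: (old error - new error) / old error
r_squared = (old_error - new_error) / old_error
r = math.sqrt(r_squared)

print(f"r-squared = {r_squared:.2f}")  # 0.64
print(f"r = {r:.2f}")                  # 0.80
```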