Pearson’s r measures how much changes in one variable correspond with equivalent changes in another variable. It can also be used as a measure of association between an interval and an ordinal variable, or between an interval and a dummy variable. (Dummy variables are nominal variables coded as 1 or 0, present or absent. See chapter 19 on text analysis.) The square of Pearson’s r is a PRE measure of association for linear relations between interval variables. R-squared tells us how much better we could predict the scores of a dependent variable if we knew the scores of some independent variable.
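To make the PRE reading of r-squared concrete, here is a minimal Python sketch. The five pairs of scores are hypothetical, made up for illustration only, and the function names `pearson_r` and `pre` are my own labels, not anything defined in the text. The sketch computes r directly and then, separately, the proportionate reduction in squared prediction error you get by predicting y from a least-squares line rather than from the mean of y.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from the deviations of x and y around their means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pre(x, y):
    """Proportionate reduction in squared error when y is predicted
    from a least-squares line on x instead of from the mean of y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    error_mean = sum((b - mean_y) ** 2 for b in y)
    error_line = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return (error_mean - error_line) / error_mean

# Hypothetical scores, for illustration only (not data from the text)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
print(round(r, 3), round(r ** 2, 3), round(pre(x, y), 3))  # 0.8 0.64 0.64
```

For these made-up scores, r squared and the reduction in error come out the same, which is the sense in which the square of r is a PRE measure for a linear relation.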

Table 21.14 shows data for two interval variables for a random sample of 10 of the 50 countries in table 20.8: (1) infant mortality (INFMORT) and (2) the total fertility rate (TFR).

To give you an idea of where we're going with this example, the correlation between INFMORT and TFR across the 50 countries in table 20.8 is around 0.91, and this is reflected in the sample of 10 countries, for which the correlation is r = 0.81.

Now, suppose you had to predict the TFR for each of the 10 countries in table 21.14 without knowing anything about the infant mortality rate for those countries. Your best guess—your lowest prediction error—would be the mean, 2.63 children per woman. You

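Table 21.13 Computing Spearman's Rank Order Correlation Coefficient for the Data in Table 21.12

| Hunter | Rank for meat | Rank for fish | Difference in the ranks (d) | d² |
|---|---|---|---|---|
| Alejandro | 1 | 10 | -9 | 81 |
| Jaime | 2 | 9 | -7 | 49 |
| Leonardo | 3 | 15 | -12 | 144 |
| Humberto | 4 | 6 | -2 | 4 |
| Daniel | 5 | 7 | -2 | 4 |
| Joel | 6 | 12 | -6 | 36 |
| Jorge | 7 | 14 | -7 | 49 |
| Timoteo | 8 | 16 | -8 | 64 |
| Tomas | 9 | 5 | 4 | 16 |
| Lucas | 10 | 8 | 2 | 4 |
| Guillermo | 11 | 2 | 9 | 81 |
| Victor | 12 | 11 | 1 | 1 |
| Manuel | 13 | 13 | 0 | 0 |
| Benjamin | 14 | 4 | 10 | 100 |
| Jonatan | 15 | 3 | 12 | 144 |
| Lorenzo | 16 | 1 | 15 | 225 |
| Total d² | | | | 1,002 |

$$r_s = 1 - \frac{6\sum d^{2}}{n(n^{2}-1)} = 1 - \frac{6012}{4080} = -.474$$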
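The arithmetic in table 21.13 can be checked with a short Python sketch. The two rank lists are copied from the table; the function name `spearman_rs` is my own label. The shortcut formula used here assumes there are no tied ranks, which holds for these 16 hunters.

```python
# Ranks from table 21.13, in the order the hunters are listed
meat = list(range(1, 17))
fish = [10, 9, 15, 6, 7, 12, 14, 16, 5, 8, 2, 11, 13, 4, 3, 1]

def spearman_rs(rank_a, rank_b):
    """Spearman's r from two sets of untied ranks:
    1 - 6 * (sum of squared rank differences) / (n * (n^2 - 1))."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

sum_d2 = sum((a - b) ** 2 for a, b in zip(meat, fish))
r_s = spearman_rs(meat, fish)
print(sum_d2, round(r_s, 3))  # 1002 -0.474
```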

Table 21.14 Infant Mortality by TFR for 10 Countries from Table 20.8

| Country | INFMORT (x) | TFR (y) |
|---|---|---|
| Armenia | 22.2 | 1.79 |
| Chad | 129.9 | 5.78 |
| El Salvador | 17.5 | 2.22 |
| Ghana | 67.0 | 4.00 |
| Iran | 24.2 | 1.74 |
| Latvia | 8.3 | 1.48 |
| Namibia | 27.2 | 3.07 |
| Panama | 15.7 | 2.41 |
| Slovenia | 3.6 | 1.47 |
| Suriname | 20.5 | 2.29 |

Mean of x = 25.49

Mean of y = 2.63

FIGURE 21.4. A plot of the data in table 21.14. The dotted line is the mean of TFR. The solid line is drawn from the regression equation y = 1.018 + .051x.

can see this in figure 21.4 where I’ve plotted the distribution of TFR and INFMORT for the 10 countries in table 21.14.
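The claim that the mean is the best guess can be checked directly. The sketch below is a minimal illustration and assumes that "prediction error" means the sum of squared differences between each country's TFR and the guess. The TFR values are those listed in table 21.14; the alternative guesses (2.0, 2.5, 3.0) and the function name `squared_error` are arbitrary choices of mine.

```python
# TFR values for the 10 countries in table 21.14
tfr = [1.79, 5.78, 2.22, 4.00, 1.74, 1.48, 3.07, 2.41, 1.47, 2.29]

def squared_error(guess, scores):
    """Total squared error if the same guess is used for every score."""
    return sum((s - guess) ** 2 for s in scores)

mean_tfr = sum(tfr) / len(tfr)  # about 2.63 children per woman

# Arbitrary alternative guesses, chosen only for comparison
other_guesses = [2.0, 2.5, 3.0]

print(f"mean = {mean_tfr:.3f}, error = {squared_error(mean_tfr, tfr):.2f}")
for guess in other_guesses:
    print(f"guess = {guess:.3f}, error = {squared_error(guess, tfr):.2f}")
```

Whichever single number you try in place of the mean, the total squared error comes out larger.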