We covered basic regression in chapter 21, but just to bring you back up to speed, remember that in simple regression we use an equation that expresses how an independent variable is related to a dependent variable. On the left-hand side of the equation, we have the unknown score for y, the dependent variable. On the right-hand side, we have the y-intercept, called a. It’s the predicted score for y if the independent variable were zero. We have another coefficient, called b, the slope, which tells us by how much y changes for each unit change in the independent variable.

The general form of the equation (from chapter 21, formula 21.19) is:

y = a + bx
which means that the dependent variable, y, equals some constant plus another constant times the independent variable, x. So, for example, a regression equation like:

y = $22,000 + $4,000x

predicts that, on average, people with a high school education will start out earning $22,000 a year; people with 1 year of college will earn $26,000; and so on. A person with 9 years of university education (say, someone who has a Ph.D.) would be predicted to start at $58,000:

$22,000 + $4,000(9) = $22,000 + $36,000 = $58,000
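The prediction equation above can be sketched in a few lines of Python; the function name is mine, but the figures are the ones from the example:

```python
# Simple regression prediction: salary = a + b * (years of education
# past high school), using the example figures from the text.

def predict_salary(years_past_high_school):
    a = 22_000  # intercept: predicted salary with a high school education only
    b = 4_000   # slope: added salary per additional year of education
    return a + b * years_past_high_school

print(predict_salary(0))  # high school only: 22000
print(predict_salary(1))  # 1 year of college: 26000
print(predict_salary(9))  # e.g., a Ph.D.: 58000
```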
Suppose that the average starting salary for someone who has a Ph.D. is $75,000. Several things could account for the discrepancy between our prediction and the reality. Sampling problems, of course, could be the culprit. Or it could be that there is just a lot of variability in starting salaries of people who have the Ph.D. English teachers who go to work in small, liberal arts colleges might start at $45,000 and people who have a Ph.D. in finance and who go to work for major brokerage companies might start at $150,000.

No amount of fixing the sample will do anything to get rid of the variance of starting salaries. In fact, the better the sample, the better it will reflect the enormous variance in those salaries.

In simple regression, if starting salary and years of education are related variables, we want to know ‘‘How accurately can we predict a person’s starting salary if we know how many years of education they have beyond high school?’’ In multiple regression, we build more complex equations that tell us how much each of several independent variables contributes to predicting the score of a single dependent variable.

A typical question for a multiple regression analysis might be ‘‘How well can we predict a person’s starting salary if we know how many years of college they have, and their major, and their gender, and their age, and their ethnic background?’’ Each of those independent variables contributes something to predicting a person’s starting salary after high school.

The regression equation for two independent variables, called x_{1} and x_{2}, and one dependent variable, called y, is:

y = a + b_{1}x_{1} + b_{2}x_{2}
which means that we need to find two separate constants, called b_{1} and b_{2}, by which to multiply each of the two independent variables. The general formula for multiple regression is:

y = a + b_{1}x_{1} + b_{2}x_{2} + ... + b_{n}x_{n}
(See box 22.1.)
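A minimal sketch of how the b coefficients in a two-variable equation are found by least squares, using numpy. The data are made up, and the salaries are generated exactly from a known equation so the fit is easy to check:

```python
import numpy as np

# Hypothetical data: years of education past high school (x1) and years
# of work experience (x2) as predictors of starting salary (y). The
# numbers are invented, and y is generated exactly from
# y = 22,000 + 4,000*x1 + 1,000*x2 so recovery of the coefficients is clear.
x1 = np.array([0, 2, 4, 4, 6, 9], dtype=float)
x2 = np.array([5, 3, 2, 6, 1, 0], dtype=float)
y = 22_000 + 4_000 * x1 + 1_000 * x2

# Design matrix: a column of 1s (for the intercept a), then x1 and x2
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares finds the (a, b1, b2) that minimize the squared residuals
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coeffs
print(round(a), round(b1), round(b2))  # recovers 22000 4000 1000
```

With real data, of course, y would not fall exactly on the regression plane, and the fitted coefficients would only estimate the population values.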

BOX 22.1

MULTIPLE REGRESSION IS ALSO A PRE MEASURE

Recall that simple regression yields a PRE (proportionate reduction of error) measure, r^{2}. It tells you how much better you can predict a series of measures of a dependent variable than you could by just guessing the mean for every measurement. Multiple regression is also a PRE measure. It, too, tells you how much better you can predict measures of a dependent variable than you could if you guessed the mean, but using all the information available in a series of independent variables.
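The PRE logic can be sketched numerically. The salary figures below are invented for illustration; the point is the comparison between the regression’s residual error and the error from guessing the mean:

```python
import numpy as np

# Sketch of r^2 as a PRE measure, on invented salary data.
x = np.array([0, 1, 2, 4, 6, 9], dtype=float)  # years past high school
y = np.array([21_000, 27_000, 29_500, 39_000, 45_000, 59_000], dtype=float)

# Fit the simple regression y = a + b*x by least squares
b, a = np.polyfit(x, y, 1)
predicted = a + b * x

ss_residual = np.sum((y - predicted) ** 2)  # error left over after regression
ss_total = np.sum((y - y.mean()) ** 2)      # error from guessing the mean each time
r_squared = 1 - ss_residual / ss_total      # proportionate reduction of error
print(round(r_squared, 3))
```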

The key to regression is those b coefficients, or weights, in formula 22.4. We want coefficients that, when multiplied by the independent variables, produce the best possible prediction of the dependent variable; that is, we want predictions that result in the smallest possible residuals. Those coefficients, by the way, are not existential constants. Remember that samples yield statistics, which only estimate parameters. Your statistics change with every sample you take and with the number of independent variables in the equation.
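The point that b coefficients are statistics, not constants, can be sketched by simulation. The population below is entirely made up; the point is that refitting the same model on different random samples yields a different slope estimate each time:

```python
import numpy as np

# Sketch: b coefficients are statistics, not constants. Fitting the same
# model to different random samples drawn from the same (simulated)
# population gives a different slope estimate each time.
rng = np.random.default_rng(42)
true_a, true_b = 22_000, 4_000  # hypothetical population parameters

slopes = []
for _ in range(5):
    x = rng.uniform(0, 9, size=30)          # years of education past high school
    noise = rng.normal(0, 8_000, size=30)   # salary variability around the line
    y = true_a + true_b * x + noise
    b, a = np.polyfit(x, y, 1)              # least-squares fit for this sample
    slopes.append(b)

print([round(b) for b in slopes])  # five different estimates of the "same" b
```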