# STRATIFIED SAMPLING

Stratified random sampling ensures that key subpopulations are included in your sample. You divide a population (a sampling frame) into subpopulations (subframes) based on key independent variables, and then take a random (unbiased) sample from each of those subpopulations. You might divide the population into men and women, or into rural and urban subframes, or into key age groups (18-34, 35-49, etc.) or key income groups. As the main sampling frame gets divided by key independent variables, the subframes presumably get more and more homogeneous with regard to the key dependent variable in the study.

In 2009, for example, the Quinnipiac University Poll asked a representative sample of 2,041 registered voters in the United States the following question: Do you think abortion should be legal in all cases, legal in most cases, illegal in most cases or illegal in all cases? Across all voters, 52% said that abortion should be legal in all (15%) or most (37%) cases and 41% said it should be illegal in all (14%) or most (27%) cases. (The remaining 7% had no opinion.)

These facts hide some important differences across religious, political, and other subgroups. Among Catholic voters, 50% said that abortion should be legal in all (8%) or most (42%) cases; among Jewish voters, 86% said that abortion should be legal in all (33%) or most (53%) cases. Among registered Democrats, 66% favored legal abortion in all or most cases; among registered Republicans, 30% took that position (Quinnipiac University 2009). Sampling from smaller chunks (by age, gender, and so on) ensures not only that you capture the variation, but that you also wind up understanding how that variation is distributed.

This is called maximizing the between-group variance and minimizing the within-group variance for the independent variables in a study. It’s what you want to do in building a sample because it reduces sampling error and thus makes samples more precise.
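The gain in precision can be sketched with the standard decomposition of variance. The notation here is an assumption introduced for illustration and does not come from the text: $W_h$ is the share of the population in stratum $h$, $\mu_h$ and $\sigma_h^2$ are that stratum's mean and variance on the dependent variable, $\mu$ and $\sigma^2$ are the overall mean and variance, and $n$ is the total sample size. The finite population correction is ignored and the sample is assumed to be allocated to strata in proportion to their size.

$$
\sigma^2 \;=\; \underbrace{\sum_h W_h\,\sigma_h^2}_{\text{within-group}} \;+\; \underbrace{\sum_h W_h\,(\mu_h-\mu)^2}_{\text{between-group}}
$$

$$
\operatorname{Var}(\bar{y}_{\text{simple}}) \approx \frac{\sigma^2}{n},
\qquad
\operatorname{Var}(\bar{y}_{\text{stratified}}) \approx \frac{1}{n}\sum_h W_h\,\sigma_h^2
$$

Under these assumptions, only the within-group part contributes to the sampling variance of the stratified estimate, so the more of the total variance that lies between strata, the more precise the stratified sample is compared with a simple random sample of the same size.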

This sounds like a great thing to do, but you have to know what the key independent variables are. Shoe size is almost certainly not related to what people think is the ideal number of children to have. Gender and generation, however, seem like plausible variables on which to stratify a sample. So, if you are taking a poll to find out the ideal number of children, you might divide the adult population into, say, four generations: 15-29, 30-44, 45-59, and over 59.

With two genders, this creates a sampling design with eight strata: men 15-29, 30-44, 45-59, and over 59; women 15-29, 30-44, 45-59, and over 59. Then you take a random sample of people from each of the eight strata and run your poll. If your hunch about the importance of gender and generation is correct, you’ll find the attitudes of men and the attitudes of women more homogeneous than the attitudes of men and women thrown together.
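The eight-stratum design can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the sampling frame is randomly generated, and the names `stratified_sample`, `age_band`, and `frame` are hypothetical. It assumes you have a list of people with a recorded sex and age and want the same number of respondents from every stratum.

```python
import random
from collections import defaultdict

def age_band(age):
    """Map an age in years to one of the four cohorts used in the text."""
    if age <= 29:
        return "15-29"
    if age <= 44:
        return "30-44"
    if age <= 59:
        return "45-59"
    return "over 59"

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    # Sample without replacement inside each stratum; a stratum smaller
    # than n_per_stratum is taken whole.
    return {
        label: rng.sample(units, min(n_per_stratum, len(units)))
        for label, units in strata.items()
    }

# Hypothetical sampling frame of 5,000 people aged 15 to 90 (made-up data).
_gen = random.Random(0)
frame = [
    {"id": i, "sex": _gen.choice(["M", "F"]), "age": _gen.randint(15, 90)}
    for i in range(5000)
]

sample = stratified_sample(
    frame, lambda p: (p["sex"], age_band(p["age"])), n_per_stratum=25, seed=1
)
for label in sorted(sample):
    print(label, len(sample[label]))
```

Taking an equal number from each stratum is only one option; the sample can also be allocated in proportion to each stratum's share of the population.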

Table 5.1 shows the distribution of gender and age cohorts for St. Lucia in 2001. The numbers in parentheses are percentages of the total population 15 and older (106,479), not percentages of the column totals.

Table 5.1 Estimated Population by Sex and Age Groups for St. Lucia, 2001

| Age cohort | Males | Females | Total |
|---|---|---|---|
| 15-29 | 21,097 (19.8%) | 22,177 (20.8%) | 43,274 (40.6%) |
| 30-44 | 15,858 (14.9%) | 16,763 (15.7%) | 32,621 (30.6%) |
| 45-59 | 8,269 (7.8%) | 8,351 (7.8%) | 16,620 (15.6%) |
| >59 | 7,407 (7.0%) | 6,557 (6.2%) | 13,964 (13.1%) |
| Total | 52,631 (49.5%) | 53,848 (50.5%) | 106,479 (100%) |

SOURCE: Govt. Statistics, St. Lucia. http://www.stats.gov.lc/cen2001.htm.

A proportionate stratified random sample of 800 respondents would include about 119 men between the ages of 30 and 44 (14.9% of 800 ≈ 119), but about 126 women between the ages of 30 and 44 (15.7% of 800 ≈ 126), and so on.
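The full allocation across all eight strata can be worked out from the counts in table 5.1. The sketch below is one way to do it, under an assumption that is not in the text: because rounding each stratum separately can leave the total a respondent or two off 800, it uses the largest-remainder rule to hand out the leftover places. The function name `proportionate_allocation` is hypothetical.

```python
# Counts from table 5.1 (St. Lucia, 2001, population 15 and older).
STRATA = {
    ("men", "15-29"): 21_097,
    ("women", "15-29"): 22_177,
    ("men", "30-44"): 15_858,
    ("women", "30-44"): 16_763,
    ("men", "45-59"): 8_269,
    ("women", "45-59"): 8_351,
    ("men", ">59"): 7_407,
    ("women", ">59"): 6_557,
}

def proportionate_allocation(strata_counts, sample_size):
    """Split sample_size across strata in proportion to their population."""
    total = sum(strata_counts.values())
    # Whole-number part of each stratum's exact share of the sample.
    alloc = {k: c * sample_size // total for k, c in strata_counts.items()}
    # Give the places lost to rounding down to the strata with the
    # largest fractional remainders (largest-remainder rule, an assumption).
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(
        strata_counts,
        key=lambda k: (strata_counts[k] * sample_size) % total,
        reverse=True,
    )
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

allocation = proportionate_allocation(STRATA, 800)
for stratum, n in allocation.items():
    print(stratum, n)
print("total", sum(allocation.values()))
```

Because the allocation is computed from the raw counts, it gives 119 men and 126 women aged 30 to 44; figures worked from rounded or truncated percentages can differ by a few respondents.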

Watch out, though. We’re asking people about their ideal family size and thinking about stratifying by gender because we’re accustomed to thinking in terms of gender on questions about family size. But gender-associated preferences are changing rapidly in late industrial societies, and we might be way off base in our thinking. Separating the population into gender strata might just be creating unnecessary work. Worse, it might introduce unknown error. If your guess about age and gender being related to desired number of children is wrong, then using table 5.1 to create a sampling design will just make it harder for you to discover your error (box 5.2).

BOX 5.2

THE RULES ON STRATIFYING SAMPLES

Here are the rules on stratification: (1) If differences on a dependent variable are large across strata like age, sex, ethnic group, and so on, then stratifying a sample is a great idea. (2) If differences are small, then stratifying just adds unnecessary work. (3) If you are uncertain about the independent variables that could be at work in affecting your dependent variable, then leave well enough alone and don't stratify the sample. You can always stratify the data you collect and test various stratification schemes in the analysis instead of in the sampling.
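Testing stratification schemes in the analysis, as rule (3) suggests, can be as simple as asking how much of the variation in the dependent variable lies between the groups that a candidate variable defines. The sketch below computes that share (often called eta squared) for each candidate. The responses are made up for illustration, and `between_group_share` is a hypothetical helper, not a method given in the text.

```python
from collections import defaultdict
from statistics import fmean

def between_group_share(records, group_key, outcome_key):
    """Share of total variation in the outcome that lies between groups."""
    values = [r[outcome_key] for r in records]
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[outcome_key])
    # Between-group sum of squares: group size times squared distance
    # of the group mean from the grand mean.
    between_ss = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss if total_ss else 0.0

# Made-up responses: ideal number of children, by sex and age cohort.
responses = [
    {"sex": "M", "cohort": "15-29", "ideal": 2},
    {"sex": "F", "cohort": "15-29", "ideal": 1},
    {"sex": "M", "cohort": "30-44", "ideal": 3},
    {"sex": "F", "cohort": "30-44", "ideal": 3},
    {"sex": "M", "cohort": "45-59", "ideal": 4},
    {"sex": "F", "cohort": "45-59", "ideal": 5},
]

for candidate in ("sex", "cohort"):
    share = between_group_share(responses, candidate, "ideal")
    print(candidate, round(share, 2))
```

In this invented data set, age cohort accounts for most of the variation and sex for none of it, which would argue for stratifying on cohort but not on sex. With real data the comparison would need a proper significance test and a much larger sample.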