Estimating Proportions in Samples for Smaller Populations

This general formula, 6.5, is independent of the size of the population. Florida has a population of about 18 million. A sample of 400 is .000022 of 18 million; a sample of 2,402 is .00013 of 18 million. Both proportions are microscopic. A random, representative sample of 400 from a population of 1 million gets you the same confidence level and the same confidence interval as you get with a sample of 400 from a population of 18 million.

Often, though, we want to take samples from relatively small populations. The key word here is “relatively.” When formula 6.4 or 6.5 calls for a sample that turns out to be 5% or more of the total population, we apply the finite population correction. The formula (from Cochran 1977) is:

$$ n' = \frac{n}{1 + (n - 1)/N} $$

where n is the sample size calculated from formula 6.5; n' (read: n-prime) is the new value for the sample size; and N is the size of the total population from which n is being drawn.

Here’s an example. Suppose you are sampling the 540 resident adult men in a Mexican village to determine how many have ever worked illegally in the United States. How many of those men do you need to interview to ensure a 95% probability sample, with a 5% confidence interval (that is, plus or minus 5 percentage points)? Answer: Because we have no idea what the percentage is that we’re trying to estimate, we set P and Q at .5 each in formula 6.4. With z = 1.96 for the 95% level, solving for n (sample size), we get:

$$ n = \frac{z^2 (P)(Q)}{(\text{confidence interval})^2} = \frac{(1.96)^2 (.5)(.5)}{(.05)^2} = 384.16 $$

which we round up to 385. Then we apply the finite population correction:

$$ n' = \frac{385}{1 + (384/540)} = 225 $$

A sample of 225 is still a hefty percentage of the 540 people in the population (about 42%), but it’s a lot smaller than the 385 called for by the standard formula (box 6.1).
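The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not a general sampling tool: the function names are hypothetical, z = 1.96 is assumed for the 95% level, and the corrected value is rounded to the nearest whole person.

```python
import math

def sample_size_for_proportion(p=0.5, margin=0.05, z=1.96):
    """Uncorrected sample size for estimating a proportion, rounded up."""
    q = 1.0 - p
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round((z ** 2) * p * q / margin ** 2, 6))

def finite_population_correction(n, population):
    """Cochran's correction: n' = n / (1 + (n - 1) / N)."""
    return n / (1 + (n - 1) / population)

n = sample_size_for_proportion()                    # 385
n_prime = round(finite_population_correction(n, 540))  # 225
print(n, n_prime)
```

For a population of 1 million or 18 million, the same correction leaves the sample at 385, which is why the correction only matters when the sample is a sizable share of the population.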

BOX 6.1

SETTLING FOR BIGGER CONFIDENCE INTERVALS

If we were willing to settle for a 10% confidence interval, we'd need only 82 people in this example, but the trade-off would be substantial. If 65 out of 225, or 29%, reported that they had worked illegally in the United States, we would be 95% confident that from 24% to 34% really did. But if 24 out of 82 (the same 29%) reported having worked illegally in the United States, we'd be 95% confident only that the true figure was between 19% and 39%. With a spread like that, you wouldn't want to bet much on the sample statistic of 29%.
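The trade-off between sample size and interval width can be checked directly. The sketch below assumes the usual normal-approximation margin of error for a proportion, multiplied by the finite population correction factor sqrt((N - n)/(N - 1)). That factor and the function name are assumptions for illustration, not taken from the text.

```python
import math

def margin_of_error(p, n, population, z=1.96):
    """Approximate margin of error for a sample proportion p from n
    respondents drawn without replacement from a finite population."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction factor (assumed form)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

# Worst case (P = .5) for the two sample sizes discussed in the box
for n in (225, 82):
    print(n, round(margin_of_error(0.5, n, 540), 3))
```

Under these assumptions, the 95% margin at P = .5 comes out at about 5 percentage points for 225 respondents out of 540 and about 10 points for 82.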

If it weren't for ethnography, this would be a major problem in taking samples from small populations—the kind we often study in anthropology. If you've been doing ethnography in a community of 540 people for 6 months, though, you may feel comfortable taking a confidence interval of 10% because you are personally (not statistically) confident that your intuition about the group will help you interpret the results of a small sample.

Another Catch

All of this discussion has been about estimating single parameters, whether proportions or means. You will often want to measure the interaction among several variables at once. Suppose you study a population of wealthy, middle-class, and poor people in India. That’s three kinds of people. Now add two sexes, male and female (that makes six kinds of people) and two religions, Hindu and Muslim (that makes 12 kinds). If you want to know how all those independent variables combine to predict, say, average number of children desired, the sampling strategy gets more complicated.
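The count of 12 kinds of people is just the product of the category counts, and it grows quickly as variables are added. A small sketch that enumerates the combinations (the variable names are illustrative):

```python
from itertools import product

classes = ["wealthy", "middle-class", "poor"]
sexes = ["male", "female"]
religions = ["Hindu", "Muslim"]

# Every combination of class, sex, and religion is one "kind" of person
cells = list(product(classes, sexes, religions))
print(len(cells))  # 3 * 2 * 2 = 12
```

Each of those cells needs enough respondents of its own to support an estimate, which is what makes the sampling design harder than estimating a single proportion.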

Representative sampling is one of the trickiest parts of social research. I recommend strongly that you consult an expert in sampling if you are going to do complex tests on your data (Further Reading: sampling theory and sample design).

FURTHER READING

Sampling theory and sample design: Jaeger (1984); Kish (1965); Levy and Lemeshow (1999); Sudman (1976).