THE CENTRAL LIMIT THEOREM
The fact that many variables are not normally distributed would make sampling a hazardous business, were it not for the central limit theorem. According to this theorem, if you take many samples of a population, and if the samples are big enough, then:
1. The mean of the sample means will usually approximate the true mean of the population, and the standard deviation of the sample means (the standard error) will approximate the population standard deviation divided by the square root of the sample size. (You’ll understand why this is so a bit later in the chapter, when we discuss confidence intervals.)
2. The distribution of sample means will approximate a normal distribution.
We can demonstrate both parts of the central limit theorem with some examples.
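Before turning to real data, a small simulation can give a feel for both claims. The sketch below is only an illustration under stated assumptions: the population is made up (100,000 values drawn from a strongly skewed exponential distribution), and the sample size of 50 and the 2,000 repeated samples are arbitrary choices for "big enough" and "many." It compares the mean of the sample means with the population mean, and it checks how many values fall within one standard deviation of the mean, which is about 68% for a normal distribution.

```python
import random
from statistics import fmean, pstdev

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical, strongly right-skewed population (not data from the text).
population = [random.expovariate(1.0) for _ in range(100_000)]
pop_mean = fmean(population)

SAMPLE_SIZE = 50     # assumed value for a "big enough" sample
NUM_SAMPLES = 2_000  # assumed value for "many samples"

# Draw many samples without replacement and record each sample's mean.
sample_means = [
    fmean(random.sample(population, SAMPLE_SIZE)) for _ in range(NUM_SAMPLES)
]
mean_of_means = fmean(sample_means)


def share_within_one_sd(data):
    """Fraction of values lying within one standard deviation of their mean."""
    center, spread = fmean(data), pstdev(data)
    return sum(abs(x - center) <= spread for x in data) / len(data)


pop_share = share_within_one_sd(population)
means_share = share_within_one_sd(sample_means)

print(f"population mean:        {pop_mean:.3f}")
print(f"mean of sample means:   {mean_of_means:.3f}")
print(f"within 1 SD, raw data:  {pop_share:.1%}")
print(f"within 1 SD, means:     {means_share:.1%}  (normal curve: about 68%)")
```

In a run like this, the raw values should pile up far more heavily within one standard deviation than a normal curve allows, while the sample means should land close to the 68% mark.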
Part 1 of the Central Limit Theorem
Table 6.1 shows the per capita gross domestic product (PCGDP) for the 50 poorest countries in the world in 2007.
Here is a random sample of five of those countries: Uzbekistan, Senegal, Guinea, Rwanda, and Liberia. Consider these five as a population of units of analysis. In 2007, these countries had annual per capita GDPs of $704, $908, $452, $354, and $195 (U.S. dollars), respectively. These five numbers sum to $2,613, and their average, 2,613/5, is $522.60.
There are 10 possible samples of two elements in any population of five elements (5 × 4 / 2 = 10). All 10 samples for the five countries in our example are shown in the left-hand column of table 6.2. The middle column shows the mean for each sample. This list of means is the sampling distribution. The right-hand column shows the cumulative mean.
Notice that the mean of the means for all 10 samples of two elements (that is, the mean of the sampling distribution) is $522.60, which is exactly the actual mean per capita GDP of the five countries in the population. In fact, it must be: Each country appears in exactly four of the 10 samples, so every value carries equal weight, and the mean of the means of all possible samples of size 2 is equal to the parameter that we’re trying to estimate.
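The arithmetic can be checked by listing the samples directly. The following is a minimal sketch that uses only the five GDP figures given above; the variable names are illustrative choices, and the order in which the pairs are listed may differ from the order in table 6.2.

```python
from itertools import combinations

# Per capita GDP in 2007 (U.S. dollars) for the five-country population.
gdp = {
    "Uzbekistan": 704,
    "Senegal": 908,
    "Guinea": 452,
    "Rwanda": 354,
    "Liberia": 195,
}

population_mean = sum(gdp.values()) / len(gdp)

# Every possible sample of two countries, paired with that sample's mean.
samples = list(combinations(gdp, 2))
sample_means = [(gdp[a] + gdp[b]) / 2 for a, b in samples]

# The mean of the sampling distribution.
mean_of_means = sum(sample_means) / len(sample_means)

for (a, b), m in zip(samples, sample_means):
    print(f"{a + ' and ' + b:<24} {m:>7.2f}")
print(f"{'Mean of sample means':<24} {mean_of_means:>7.2f}")
print(f"{'Population mean':<24} {population_mean:>7.2f}")
```

The last two lines of output should both read 522.60.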
Figure 6.4 (left) is a frequency polygon that shows the distribution of the five actual GDP values. A frequency polygon is just a histogram with lines connecting the tops of the bars so that the shape of the distribution is emphasized. Compare the shape of this distribution to the one in figure 6.4 (right) showing the distribution of the 10 sample means for the five GDP values we’re dealing with here. That distribution looks more like the shape of the normal curve: It’s got that telltale bulge in the middle.
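The contrast between the two distributions can be roughly reproduced in text form. The sketch below groups the five GDP values and the 10 sample means into bins. The $200 bin width is an assumption chosen for illustration, not necessarily the grouping used in figure 6.4.

```python
from collections import Counter
from itertools import combinations

values = [704, 908, 452, 354, 195]  # the five GDP figures from the text
sample_means = [(a + b) / 2 for a, b in combinations(values, 2)]

BIN_WIDTH = 200  # assumed bin width, in U.S. dollars


def bin_counts(data, width=BIN_WIDTH):
    """Count how many values fall in each bin, keyed by the bin's lower edge."""
    counts = Counter(int(x // width) * width for x in data)
    return dict(sorted(counts.items()))


raw_bins = bin_counts(values)
mean_bins = bin_counts(sample_means)

for label, bins in (("GDP values", raw_bins), ("Sample means", mean_bins)):
    print(label)
    for start, count in bins.items():
        print(f"  ${start:>3} to under ${start + BIN_WIDTH:<4} {'#' * count}")
```

With these bins, the five raw values fall one to a bin, while half of the 10 sample means fall in the $400 to $600 bin.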