of the Central Limit Theorem
Figure 6.5 shows the distribution of the 50 data points for GDP in table 6.1. The range is quite broad, from $118 to $978 per year per person, and the shape of the distribution is multimodal.
Table 6.1 Per Capita Gross Domestic Product (PCGDP) in U.S. Dollars for the 50 Poorest Countries in the World, 2007

| Country | PCGDP | Country | PCGDP |
| --- | --- | --- | --- |
| Burundi | 118 | Burkina Faso | 483 |
| DR-Congo | 151 | Mali | 554 |
| Zimbabwe | 159 | Tajikistan | 555 |
| Liberia | 195 | Comoros | 556 |
| Ethiopia | 201 | Cambodia | 598 |
| Guinea-Bissau | 211 | Haiti | 612 |
| Malawi | 257 | Benin | 618 |
| Eritrea | 271 | N. Korea | 618 |
| Niger | 289 | Ghana | 647 |
| Somalia | 291 | Chad | 692 |
| Sierra Leone | 330 | Kyrgyzstan | 704 |
| Afghanistan | 345 | Uzbekistan | 704 |
| Rwanda | 354 | Laos | 711 |
| Mozambique | 362 | Kiribati | 762 |
| Tanzania | 368 | Kenya | 786 |
| Gambia | 377 | Lesotho | 797 |
| Madagascar | 377 | Viet Nam | 815 |
| Myanmar | 379 | Mauritania | 874 |
| Togo | 386 | Senegal | 908 |
| Timor-Leste | 393 | Sao Tome and Principe | 912 |
| Central African Rep. | 394 | Papua New Guinea | 953 |
| Uganda | 403 | Yemen | 967 |
| Nepal | 419 | Zambia | 974 |
| Bangladesh | 428 | India | 976 |
| Guinea | 452 | Solomon Islands | 978 |

SOURCE: United Nations, Dept. of Economic and Social Affairs, Economic and Social Development. http://unstats.un.org/unsd/demographic/products/socind/inc-eco.htm.

Table 6.2 All Samples of Two from Five Elements

| Sample | Mean | Cumulative mean |
| --- | --- | --- |
| Uzbekistan and Senegal | (704 + 908)/2 = 806.0 | 806.0 |
| Uzbekistan and Guinea | (704 + 452)/2 = 578.0 | 1,384.0 |
| Uzbekistan and Rwanda | (704 + 354)/2 = 529.0 | 1,913.0 |
| Uzbekistan and Liberia | (704 + 195)/2 = 449.5 | 2,362.5 |
| Senegal and Guinea | (908 + 452)/2 = 680.0 | 3,042.5 |
| Senegal and Rwanda | (908 + 354)/2 = 631.0 | 3,673.5 |
| Senegal and Liberia | (908 + 195)/2 = 551.5 | 4,225.0 |
| Guinea and Rwanda | (452 + 354)/2 = 403.0 | 4,628.0 |
| Guinea and Liberia | (452 + 195)/2 = 323.5 | 4,951.5 |
| Liberia and Rwanda | (195 + 354)/2 = 274.5 | 5,226.0 |
| x̄ = | 5,226/10 = | 522.6 |

The actual mean of the data in table 6.1 (that is, the parameter we want to estimate) is $533.28. There are 2,118,760 samples of size 5 that can be taken from 50 elements. Table 6.3 shows the means from 10 samples of five countries chosen at random from the data in table 6.1.
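The sample counts quoted here and the arithmetic in table 6.2 can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative and not part of the original text.

```python
from itertools import combinations
from math import comb
from statistics import mean

# The five PCGDP values used in table 6.2 (taken from table 6.1).
five = {"Uzbekistan": 704, "Senegal": 908, "Guinea": 452, "Rwanda": 354, "Liberia": 195}
values = list(five.values())

# Every possible sample of two from the five elements, with its mean.
pair_means = [mean(pair) for pair in combinations(values, 2)]

n_pairs = comb(5, 2)          # number of samples of size 2 from 5 elements
n_samples_of_5 = comb(50, 5)  # number of samples of size 5 from 50 elements

print(n_pairs, len(pair_means))  # 10 10
print(mean(pair_means))          # 522.6, the mean of all 10 sample means
print(mean(values))              # 522.6, the mean of the five elements themselves
print(f"{n_samples_of_5:,}")     # 2,118,760
```

When every possible sample is taken, the mean of the sample means equals the mean of the elements exactly. With 2,118,760 possible samples of five, we have to settle for a handful of them.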
Even in this small set of 10 samples, the mean of the sample means is $504.72, quite close to the actual mean of $533.28. Figure 6.6 (left) shows the distribution of these 10 sample means. It has the look of a normal distribution straining to happen. Figure 6.6 (right) shows the means of 20 samples of five from the 50 countries in table 6.1. The strain toward the normal curve is unmistakable, and the mean of those 20 sample means is $505.18.
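The exercise is easy to repeat by simulation. The sketch below draws random samples of five from the 50 values in table 6.1 and averages their means. The function name `sample_means` and the seed are assumptions made for illustration, and the samples drawn will not be the ones behind table 6.3 or figure 6.6, so the printed results will differ from the figures quoted above.

```python
import random
from statistics import mean

# PCGDP values (US$) for the 50 countries in table 6.1.
pcgdp = [
    118, 151, 159, 195, 201, 211, 257, 271, 289, 291,
    330, 345, 354, 362, 368, 377, 377, 379, 386, 393,
    394, 403, 419, 428, 452, 483, 554, 555, 556, 598,
    612, 618, 618, 647, 692, 704, 704, 711, 762, 786,
    797, 815, 874, 908, 912, 953, 967, 974, 976, 978,
]

def sample_means(values, n_samples, size=5, seed=0):
    """Return the means of n_samples random samples drawn without replacement."""
    rng = random.Random(seed)  # arbitrary seed, used only to make runs repeatable
    return [mean(rng.sample(values, size)) for _ in range(n_samples)]

print(f"population mean: {mean(pcgdp):.2f}")  # 533.28
for k in (10, 20, 1000):
    means = sample_means(pcgdp, k)
    print(f"{k:>5} samples: mean of sample means = {mean(means):.2f}")
```

Changing the seed gives a different set of samples each time. With only 10 or 20 samples the average of the means moves around noticeably from run to run, and with many samples it settles near the population mean.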
FIGURE 6.4. Five cases and the distribution of samples of size 2 from those cases.

FIGURE 6.5. The distribution of the 50 data points for GDP in table 6.1.

Table 6.3 10 Means from Samples of Size 5 Taken from the 50 Elements in Table 6.1

| 522.60 | 652.80 | 434.40 | 461.20 | 586.20 |
| 489.20 | 468.20 | 458.60 | 465.00 | 509.00 |

Mean = 504.72; Standard Deviation = 67.51

The problem is that in real research, we don’t get to take 10 or 20 samples. We have to make do with one. The first sample of five elements that I took had a mean of $522.60, pretty close to the actual mean of $533.28. But it’s very clear from table 6.3 that any one sample of five elements from table 6.1 could be off by a lot. The sample means range, after all, from $434.40 to $652.80. That’s a very big spread, when the real average we’re trying to estimate is $533.28. Still, as you can see from figure 6.6, as we add samples, the mean of the samples gets closer and closer to the parameter we’re trying to estimate and the distribution of the means of the samples looks more and more like the normal distribution.
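To see how far a single sample can miss, the summary figures for table 6.3 can be recomputed from its ten means. This is a minimal sketch with illustrative variable names. It assumes the standard deviation was computed with the n - 1 denominator, which is what reproduces the reported 67.51.

```python
from statistics import mean, stdev

# The ten sample means reported in table 6.3.
table_6_3 = [522.60, 652.80, 434.40, 461.20, 586.20,
             489.20, 468.20, 458.60, 465.00, 509.00]
population_mean = 533.28  # mean of all 50 values in table 6.1

print(f"mean of sample means: {mean(table_6_3):.2f}")          # 504.72
print(f"standard deviation:   {stdev(table_6_3):.2f}")         # 67.51 (n - 1 denominator)
print(f"range: {min(table_6_3):.2f} to {max(table_6_3):.2f}")  # 434.40 to 652.80

# How far each single-sample estimate falls from the parameter.
errors = [m - population_mean for m in table_6_3]
worst = max(errors, key=abs)
print(f"largest miss: {worst:+.2f}")  # +119.52
```

The worst of the ten samples overshoots the parameter by about $120, which is more than a fifth of the value being estimated.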
We are much closer to answering the question: How big does a sample have to be?