CENTRAL TENDENCY: THE MEAN
The arithmetic mean, or the average, is the sum of the individual scores in a distribution, divided by the number of scores. The formula for calculating the mean is:
$$\bar{x} = \frac{\sum x}{n}$$
where $\bar{x}$ (read: x-bar) is the mean, $\sum x$ means “sum all the values of x,” and n is the number of values of x. To calculate the mean, or average, age of the 30 respondents whose data are shown in table 20.2, we add up the 30 ages and divide by 30. The mean age of these 30 respondents is 45.033 years. (We use $\bar{x}$ when we refer to the mean of a sample of data; we use the Greek letter $\mu$ when we refer to the mean of an entire population.)
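As a quick check on the arithmetic, here is a minimal sketch in Python. The list `ages` is an assumption on my part: it is rebuilt by expanding the counts in table 20.5, so the order of the values may not match the order of respondents in table 20.2. The order does not affect the mean.

```python
# The 30 ages, expanded from the frequency distribution in table 20.5.
# (Assumed reconstruction; the order may differ from table 20.2.)
ages = [20, 21, 24, 24, 25, 25, 31, 34, 35, 37, 38, 38, 41, 46, 46,
        47, 49, 51, 52, 53, 53, 53, 54, 56, 57, 60, 67, 67, 69, 78]

n = len(ages)          # number of values of x
total = sum(ages)      # "sum all the values of x"
mean_age = total / n   # x-bar

print(n, total, round(mean_age, 3))  # 30 1351 45.033
```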
The formula for calculating the mean of a frequency distribution is:
$$\bar{x} = \frac{\sum fx}{n}$$
where $\sum fx$ is the sum of the attributes of the variable times their frequencies.
Table 20.5 shows the calculation of the mean age for the frequency distribution shown in table 20.3b.
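The same calculation can be sketched in Python. The (age, count) pairs below are copied from table 20.5; the variable names are my own and are only for illustration.

```python
# (AGE x, count f) pairs from table 20.5.
freq = [(20, 1), (21, 1), (24, 2), (25, 2), (31, 1), (34, 1), (35, 1),
        (37, 1), (38, 2), (41, 1), (46, 2), (47, 1), (49, 1), (51, 1),
        (52, 1), (53, 3), (54, 1), (56, 1), (57, 1), (60, 1), (67, 2),
        (69, 1), (78, 1)]

n = sum(f for _, f in freq)           # total number of cases
sum_fx = sum(x * f for x, f in freq)  # each attribute times its frequency, summed
mean_age = sum_fx / n

print(n, sum_fx, round(mean_age, 3))  # 30 1351 45.033
```

Because every age is still recorded exactly, this gives the same result as adding up the 30 individual ages.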
Calculating the Mean of Grouped Data
Table 20.6 shows the calculation of the mean for the grouped data on AGE in table 20.4. When variable attributes are presented in ranges, as is the case here, we take the midpoint of each range as the value of x for everyone in that range.
Note the problem in taking the mean of the grouped data in table 20.6. If you go back to table 20.3b, you’ll see that all six of the people who are between 20 and 29 are really between 20 and 25. Counting them all as being 25 (the midpoint between 20 and 29) distorts the mean.
Also, there are five people aged 60 or older in this data set: one who is 60, two who are 67, and one each who are 69 and 78. Their average age is 68.2, but they are all counted as being just 60+ in table 20.4. In calculating the mean for these grouped data, I've assigned the midpoint to be 65, as if the range were 60-69, even though the actual range is 60-78, and the real midpoint of the 60+ category is 68.2, the average age of the five people in it.

Table 20.5 Calculating the Mean for the Data in Table 20.3b

| Count (f) | AGE (x) | fx |
|---|---|---|
| 1 | 20 | 20 |
| 1 | 21 | 21 |
| 2 | 24 | 48 |
| 2 | 25 | 50 |
| 1 | 31 | 31 |
| 1 | 34 | 34 |
| 1 | 35 | 35 |
| 1 | 37 | 37 |
| 2 | 38 | 76 |
| 1 | 41 | 41 |
| 2 | 46 | 92 |
| 1 | 47 | 47 |
| 1 | 49 | 49 |
| 1 | 51 | 51 |
| 1 | 52 | 52 |
| 3 | 53 | 159 |
| 1 | 54 | 54 |
| 1 | 56 | 56 |
| 1 | 57 | 57 |
| 1 | 60 | 60 |
| 2 | 67 | 134 |
| 1 | 69 | 69 |
| 1 | 78 | 78 |
| | | $\sum fx$ = 1,351 |

$\sum fx / n$ = 1,351/30 = 45.033

Table 20.6 Calculating the Mean for the Frequency Table of the Grouped Variable AGE

| AGE range | Midpoint (x) | f | fx |
|---|---|---|---|
| 20-29 | 25 | 6 | 150 |
| 30-39 | 35 | 6 | 210 |
| 40-49 | 45 | 5 | 225 |
| 50-59 | 55 | 8 | 440 |
| 60+ | 65 | 5 | 325 |
| | | n = 30 | $\sum fx$ = 1,350 |

$\bar{x}$ = 1,350/30 = 45.00
If you have grouped data, however, it will usually be because the data were collected in grouped form to begin with. In that case, there is no way to know what the real range or the real midpoint is for the 60+ category, so we have to assign a midpoint that conforms to the midpoints of the other ranges. Applying the rule used for all the other classes (the lower bound of the class plus 5), the midpoint for the 60+ category would be 65, which is what I’ve done in table 20.6.
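The grouped calculation in table 20.6 can be sketched the same way. The midpoints are the ones used in that table, including the assigned value of 65 for the open-ended 60+ class; the variable names are hypothetical.

```python
# (AGE range, assigned midpoint x, frequency f) from table 20.6.
# The midpoint of 65 for "60+" is an assignment, not an observed value.
grouped = [("20-29", 25, 6),
           ("30-39", 35, 6),
           ("40-49", 45, 5),
           ("50-59", 55, 8),
           ("60+",   65, 5)]

n = sum(f for _, _, f in grouped)
sum_fx = sum(x * f for _, x, f in grouped)
grouped_mean = sum_fx / n

print(n, sum_fx, grouped_mean)  # 30 1350 45.0
```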
You can see the result of all this distortion: The grouped data have a mean of 45.00, while the ungrouped data have a calculated mean of 45.033. In this case, the difference is teeny, but it won’t always be that way. If you collect data in groups about interval variables like age, you can never go back and see how much you’ve distorted things. So, to repeat: It’s always better to collect interval data at the interval level, if you can, rather than in grouped form. You can group the data later, during the analysis, but you can’t “ungroup” them if you collect data in grouped form to begin with.
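To illustrate grouping during the analysis, here is a minimal sketch that sorts the individual ages into the classes of table 20.4 and compares each class's real mean with the midpoint assigned to it. The list `ages` is rebuilt from table 20.5, and the helper `age_class` is a hypothetical function written for this example.

```python
from collections import Counter, defaultdict

# The 30 ages, expanded from the frequency distribution in table 20.5.
ages = [20, 21, 24, 24, 25, 25, 31, 34, 35, 37, 38, 38, 41, 46, 46,
        47, 49, 51, 52, 53, 53, 53, 54, 56, 57, 60, 67, 67, 69, 78]

def age_class(age):
    """Return the ten-year class label; ages of 60 and up go in '60+'."""
    if age >= 60:
        return "60+"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Grouping after collection: count the cases in each class.
counts = Counter(age_class(a) for a in ages)

# Because the raw ages were kept, the real mean of each class is recoverable.
members = defaultdict(list)
for a in ages:
    members[age_class(a)].append(a)
class_means = {label: sum(v) / len(v) for label, v in members.items()}

for label in ["20-29", "30-39", "40-49", "50-59", "60+"]:
    print(label, counts[label], round(class_means[label], 2))
```

The real mean of the 20-29 class comes out near 23.17, below the assigned midpoint of 25, and the real mean of the 60+ class is 68.2, above the assigned 65. In these data the two distortions pull in opposite directions, which is part of why the grouped mean lands so close to the ungrouped one.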