A lot of data are reported in intervals, or groups. For example, some people are uncomfortable with a straightforward question like “How much money do you make?” so researchers often ask something like:

Now we’d like to get an idea of about how much you earn each year. Do you earn:

(1) less than $10,000 per year?

(2) $10,000 or more but less than $20,000 per year?

(3) $20,000 or more but less than $30,000 per year?

(4) $30,000 or more but less than $40,000 per year?

(5) $40,000 or more but less than $50,000 per year?

and so on. This produces grouped data.

In table 20.3b, the two middle scores for AGE are 46 and 47, so the median is 46.5. Table 20.4 shows the data on AGE from table 20.3b, grouped into 10-year intervals. To find the median in grouped data, use the formula for finding any percentile score in a distribution of grouped data:

$$PS = L + i\left(\frac{n - C}{f}\right) \qquad \text{(Formula 20.1)}$$

where:

PS is the percentile score you want to calculate;

L is the true lower limit of the interval in which the percentile score lies;

n is the case number that represents the percentile score;

C is the cumulative frequency of the cases up to the interval before the one in which the percentile score lies;

i is the interval size; and

f is the count, or frequency, of the interval in which the percentile score (here, the median) lies.
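The definitions above translate directly into a few lines of code. Here is a minimal Python sketch, assuming the formula has the usual form PS = L + i(n − C)/f. The function name `grouped_percentile` is my own, and the values in the usage line are the ones worked out for the grouped AGE data in this section.

```python
def grouped_percentile(L, i, n, C, f):
    """Percentile score for grouped data, assuming PS = L + i * (n - C) / f.

    L: true lower limit of the interval holding the percentile score
    i: interval size
    n: case number that represents the percentile score
    C: cumulative frequency up to the interval before that one
    f: frequency of the interval holding the percentile score
    """
    if f <= 0:
        raise ValueError("the interval holding the percentile score needs at least one case")
    return L + i * (n - C) / f


# Values for the median of the grouped AGE data (30 cases, interval 40-49).
median_age = grouped_percentile(L=39.5, i=10, n=15, C=12, f=5)
print(median_age)  # 45.5
```

The function only does the arithmetic. You still have to find L, C, and f by reading the frequency table.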

Table 20.4 Frequency Table of the Grouped Variable AGE

| Variable AGE | Count | Cum. count |
|--------------|-------|------------|
| 20-29        | 6     | 6          |
| 30-39        | 6     | 12         |
| 40-49        | 5     | 17         |
| 50-59        | 8     | 25         |
| 60 +         | 5     | 30         |

In applying formula 20.1 to the data in table 20.4, the first thing to do is calculate n. There are 30 cases and we are looking for the score at the 50th percentile (the median), so n is (30)(.50) = 15. We are looking, in other words, for a number above which there are 15 cases and below which there are 15 cases. Looking at the data in table 20.4, we see that there are 12 cases up to 39 years of age and 17 cases up to 49 years of age. So, C is 12, and the median case lies somewhere in the 40-49 range.
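The search described in this paragraph can be sketched in Python: compute n, run down the cumulative counts, and stop at the first interval that reaches n. The counts come from table 20.4; the function name `locate_percentile_interval` is hypothetical, and treating a cumulative count that exactly equals n as belonging to that interval is an assumption on my part.

```python
from itertools import accumulate

# Interval labels and counts, copied from table 20.4.
intervals = ["20-29", "30-39", "40-49", "50-59", "60+"]
counts = [6, 6, 5, 8, 5]


def locate_percentile_interval(counts, percentile):
    """Return (n, index, C, f) for the interval that holds the percentile case.

    n is the case number, index is the position of the interval,
    C is the cumulative count before that interval, f is its own count.
    """
    cum_counts = list(accumulate(counts))  # running totals: 6, 12, 17, 25, 30
    n = cum_counts[-1] * percentile        # e.g., (30)(.50) = 15
    for index, cum in enumerate(cum_counts):
        if cum >= n:                       # first interval that reaches case n
            C = cum_counts[index - 1] if index > 0 else 0
            return n, index, C, counts[index]
    raise ValueError("percentile must be between 0 and 1")


n, index, C, f = locate_percentile_interval(counts, 0.50)
print(n, intervals[index], C, f)  # 15.0 40-49 12 5
```

The output matches the reading of the table: the median case is case 15, it falls in the 40-49 interval, 12 cases lie below that interval, and 5 cases lie inside it.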

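The true lower limit, L, of this interval is 39.5 (midway between 39 and 40, the boundary of the two groups), the interval, i, is 10 years, and the frequency, f, of the interval is 5. Putting all this into the formula, we get:

$$PS = 39.5 + 10\left(\frac{15 - 12}{5}\right) = 39.5 + 10(0.6) = 45.5$$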

So, the median age for the grouped data on 30 people is 45.5 years. Notice that: (1) the median for the ungrouped data on AGE in table 20.3b is 46.5, so in grouping the data we lose some accuracy in our calculation; and (2) none of the respondents in table 20.3b actually reported an age of 45.5 or 46.5.

I’ve given you this grand tour of the median as a specific percentile score because I want you to understand the conceptual basis for this statistic. I’m going to do the same thing for all the statistical procedures I introduce here and in the next two chapters. Once you understand the concepts behind these statistics (the median, the standard deviation, z-scores, chi-square, t-tests, and regression), you should do all future calculations by computer. It’s not just easier to calculate statistics by computer—it’s more accurate. You’re less likely to make mistakes in recording data on a computer and when you do make mistakes (I make them all the time), it’s easier to find and fix them. Just think of how easy it is to find spelling errors in a document when you use a word processor.