The first rule for coding quantitative data is: Don’t analyze while you’re coding. This rule is the exact opposite of the rule that applies to inductive coding of qualitative data. In that case, coding text is analysis—thinking about what each piece of text means, developing hypotheses about the people who are described, boiling the text down to a series of mnemonics.
It’s different with a set of numbers. Suppose you ask 400 randomly selected people, aged 20-70, how old they are. You could get as many as 51 different ages, and you’ll probably get at least 20 different ages.
I’ve seen many researchers code this kind of data into four or five categories—such as 20-29, 30-39, 40-49, 50 and older—before seeing what they’ve got. Recall from chapter 2 that this just throws away the interval-level power of data about age. You can always tell the computer to package data about age (or income, or any interval-level variable) into a set of ordinal chunks. But if you actually code the data into ordinal chunks to begin with, you can never go back.
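A minimal sketch of what "telling the computer to package the data" might look like, assuming ages were entered exactly as reported. The sample ages, the `bin_age` helper, and the break points are hypothetical illustrations, not data from any study.

```python
from collections import Counter

# Hypothetical raw ages, coded exactly as reported (interval level).
ages = [23, 37, 41, 58, 64, 29, 35, 70, 20, 49]

def bin_age(age, breaks=(30, 40, 50), lowest=20):
    """Return an ordinal label for an age, given ascending break points."""
    lower = lowest
    for upper in breaks:
        if age < upper:
            return f"{lower}-{upper - 1}"
        lower = upper
    return f"{lower}+"

# Package the same raw data two different ways, after the fact.
decades = Counter(bin_age(a) for a in ages)
two_groups = Counter(bin_age(a, breaks=(45,)) for a in ages)

print(dict(decades))
print(dict(two_groups))
```

Because the raw ages are kept, any other set of break points can be tried later. Had the ages been coded directly into decades, the split at 45 could not be recovered.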
Here’s a concrete example of something that’s a little more complex than age. Gene Shelley studied the strength of ties between friends and acquaintances (Shelley et al. 1990). Every other day for a month, she called 20 informants on the phone to talk about things they’d learned in the previous 2 days about their friends and acquaintances. People mentioned things like “So-and-so told me she was pregnant,” “So-and-so’s father called and told me my friend made his first jump in parachute school,” and so on. Shelley asked people to estimate how long it had been between the time something happened to one of their friends/acquaintances and the time they (the informants) heard about it. This estimated time was the major dependent variable in the research.
There were 20 informants, who submitted to 15 interviews each, and in each interview almost every informant was able to name several events of interest. Thus, there were over 1,000 data records (one for each event remembered by an informant). The length of time estimated by informants between an event happening to someone they knew and their hearing about it ranged from “immediately,” to “10 years,” with dozens of different time periods in between (“about 5 minutes,” “two and a half months,” etc.).
The temptation was to make up about five codes, like 1 = 5 minutes or less, 2 = 6 minutes to 19 minutes, 3 = 20 minutes to an hour, and so on. But how do you decide what the right breaks are? Shelley decided to code everything in days or fractions of days (1 minute is about .0007 days; 10 years is 3,650 days, without worrying about leap years) (Shelley et al. 1990). Coding this way, Shelley didn’t throw away data by turning a ratio-level variable (minutes) into an ordinal variable (arbitrary chunks of time).
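The conversion to days can be sketched as follows. This is an illustration under stated assumptions, not Shelley's actual procedure: the `to_days` helper and the unit table are hypothetical, a year is taken as 365 days as in the text, and treating a month as 30 days is an assumption made here for the example.

```python
# Days per unit. A year is 365 days (leap years ignored, as in the text).
# ASSUMPTION: a month is coded as 30 days; the source does not say.
DAYS_PER_UNIT = {
    "minute": 1 / (24 * 60),
    "hour": 1 / 24,
    "day": 1,
    "week": 7,
    "month": 30,
    "year": 365,
}

def to_days(amount, unit):
    """Convert an estimated elapsed time to days or fractions of days."""
    return amount * DAYS_PER_UNIT[unit]

print(round(to_days(1, "minute"), 4))   # one minute as a fraction of a day
print(to_days(10, "year"))              # ten years in days
print(to_days(2.5, "month"))            # "two and a half months"
```

With every estimate on a single ratio scale, the choice of break points (or whether to use any at all) can wait until the distribution has been inspected.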
Here’s another example, using a nominal variable. Suppose you are studying the personal histories of 200 Mexican men who have had experience as illegal labor migrants to the United States. If you ask them to name the towns in which they have worked, you might get a list of 300 communities, 100 more than you have informants! The temptation would be to collapse the list of 300 communities into a shorter list, using some kind of scheme. You might code them as Southeast, Southwest except California, California, Midwest, Northwest, mid-Atlantic, and so on. Once again, you’d be making the error of doing your analysis in the coding.
Once you’ve got all the data entered into a computer, you can print them, lay them out, stare at them, and start making some decisions about how to “package” them for statistical analysis. You might decide to label each of the 300 communities in the list according to its population size, or according to its ethnic and racial composition (more than 20% Spanish surname, for example), or its distance in kilometers from the Mexican-U.S. border. All those pieces of information are available from the U.S. Census and other sources online. But if you collapse the list into a set of categories during coding, then your option to add codes about the communities is closed off.
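One way to picture this is as a lookup table joined to the raw records after coding. The sketch below is hypothetical throughout: the town names, the attribute values, and the `attach` and `size_class` helpers are invented placeholders, not census figures.

```python
# Work histories, coded with town names exactly as informants gave them.
work_records = [
    {"informant": 1, "town": "Town A"},
    {"informant": 1, "town": "Town B"},
    {"informant": 2, "town": "Town A"},
]

# Hypothetical lookup table built later from outside sources.
# All values are invented placeholders for illustration.
town_attributes = {
    "Town A": {"population": 12_000, "km_from_border": 150},
    "Town B": {"population": 850_000, "km_from_border": 2_100},
}

def attach(records, attributes):
    """Return copies of the records with each town's attributes added."""
    return [{**r, **attributes.get(r["town"], {})} for r in records]

def size_class(population, cutoff=100_000):
    """Package population into an ordinal label; the cutoff is arbitrary."""
    return "large" if population >= cutoff else "small"

enriched = attach(work_records, town_attributes)
for row in enriched:
    print(row["informant"], row["town"], size_class(row["population"]))
```

This only works because the town names themselves were kept. Records coded as "Southwest" or "Midwest" could not be matched to population or distance after the fact.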