Sampling in Content Analysis
There are two components to sampling in content analysis. The first is identifying the corpus of texts; the second is identifying the units of analysis within the texts. If you collect 40 or 50 life histories, then you naturally analyze the whole corpus. But when the units of data run into the hundreds or even thousands, as with all television commercials that ran during prime time in August 2005, all front-page stories of the New York Times from 1851 to 2005, or all campaign speeches by John Kerry and George W. Bush during the 2004 presidential campaign, you must draw a representative sample of records.
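One common way to draw such a sample is simple random sampling from a sampling frame. The sketch below is only an illustration, not a procedure taken from this text. It treats the New York Times example as a frame of one front page per calendar day from 1851 through 2005, a simplifying assumption since the paper did not publish on every one of those days. The function names `daily_frame` and `simple_random_sample` are hypothetical, and the sample size of 500 is arbitrary.

```python
import random
from datetime import date, timedelta


def daily_frame(start: date, end: date) -> list[date]:
    """Build a sampling frame with one entry per calendar day, inclusive.

    Assumption: each day stands for one front page in the corpus.
    """
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct records, each equally likely, and return them in order.

    A fixed seed makes the draw reproducible, so others can audit the sample.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(frame, n))


frame = daily_frame(date(1851, 1, 1), date(2005, 12, 31))
sample = simple_random_sample(frame, 500, seed=42)
print(len(frame), len(sample), sample[:3])
```

Recording the seed alongside the frame definition lets another researcher reconstruct exactly which records were coded.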
Gilly (1988) did a cross-cultural study of gender roles in advertising. She videotaped a sample of 12 hours of programming in Los Angeles (United States), Monterrey (Mexico), and Brisbane (Australia), from 8 a.m. to 4 p.m. on Tuesday and from 7 p.m. to 11 p.m. on Wednesday. To control for seasonal variation between the hemispheres, the U.S. and Mexico samples were taken in September 1984 and the Australia sample in February 1985. The sample contained 617 commercials: 275 from the United States, 204 from Mexico, and 138 from Australia.
Because of her research question, Gilly used only adult men and women who were on camera for at least 3 seconds or who had at least one line of dialog. There were 169 women and 132 men in the U.S. ads; 120 women and 102 men in the Mexican ads; and 52 women and 49 men in the Australian ads.
Text analysis—particularly nonquantitative analysis—is often based on purposive sampling. Trost (1986) thought the relationship between teenagers and their families might be affected by five different dichotomous variables. To test this idea, he intentionally selected five cases from each of the 32 possible combinations of the five variables and conducted 160 interviews.
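The arithmetic of Trost's design follows from crossing five dichotomous variables: 2^5 = 32 cells, and five cases per cell gives 32 × 5 = 160 interviews. The sketch below enumerates those cells so that a quota can be tracked for each one. The variable names are hypothetical placeholders because the variables are not named here, and the code only builds the sampling grid; choosing which real families fill each cell remains a purposive judgment.

```python
from itertools import product

# Hypothetical placeholder names for the five dichotomous variables.
VARIABLES = ["var_a", "var_b", "var_c", "var_d", "var_e"]
CASES_PER_CELL = 5


def design_cells(variables):
    """Return every combination of values of the dichotomous variables.

    With k two-valued variables this yields 2**k cells, each a dict
    mapping variable name to True/False.
    """
    return [
        dict(zip(variables, values))
        for values in product((False, True), repeat=len(variables))
    ]


cells = design_cells(VARIABLES)
# One quota entry per cell, keyed by the tuple of variable values.
quota = {tuple(cell.values()): CASES_PER_CELL for cell in cells}
print(len(cells), sum(quota.values()))  # 32 160
```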
Nonquantitative studies in content analysis may also be based on extreme or deviant cases, cases that illustrate maximum variety on variables, cases that are somehow typical of a phenomenon, or cases that confirm or disconfirm a hypothesis. Even a single case may be enough to display something of substantive importance, but Morse (1994) suggests using at least six participants in studies that try to understand the essence of experience, and 30 to 50 interviews for ethnographies and grounded theory studies.
Once a sample of texts is established, the next step is to identify the basic, nonoverlapping units of analysis. This is called unitizing (Krippendorff 2004a) or segmenting (Tesch 1990). The units may be entire texts (books, interviews, responses to an open-ended question on a survey) or segments (words, word-senses, sentences, themes, paragraphs). If you want to compare across texts to see whether certain themes occur at all, the whole text (representing a respondent or an organization) is the appropriate unit of analysis. When the idea is to compare the number of times a theme occurs across a set of texts, you need to break each text down into smaller chunks, each of which reflects a theme.
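The difference between the two choices of unit can be sketched in a few lines. In the sketch below, paragraphs separated by blank lines serve as the nonoverlapping segments, and small keyword lists stand in for a coder's judgment about which theme a segment reflects. Both are simplifying assumptions: the theme names, keyword sets, and functions (`segment`, `presence`, `frequency`) are hypothetical, and real theme coding is usually done by human coders rather than by keyword matching.

```python
import re

# Hypothetical keyword lists standing in for a human coder's judgment.
THEMES = {
    "family": {"mother", "father", "family", "parents"},
    "school": {"school", "teacher", "class"},
}


def segment(text):
    """Unitize a text into nonoverlapping paragraphs (blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def themes_in(unit):
    """Return the set of themes whose keywords appear in one unit."""
    words = set(re.findall(r"[a-z']+", unit.lower()))
    return {theme for theme, keys in THEMES.items() if words & keys}


def presence(text):
    """Whole text as the unit: does each theme occur at all?"""
    found = themes_in(text)
    return {theme: theme in found for theme in THEMES}


def frequency(text):
    """Paragraph as the unit: in how many segments does each theme occur?"""
    counts = {theme: 0 for theme in THEMES}
    for unit in segment(text):
        for theme in themes_in(unit):
            counts[theme] += 1
    return counts


text = "My mother and father argued.\n\nAt school the teacher was kind.\n\nMy family moved."
print(presence(text))   # {'family': True, 'school': True}
print(frequency(text))  # {'family': 2, 'school': 1}
```

With the whole interview as the unit, both themes simply register as present; with paragraphs as units, the family theme shows up twice as often as the school theme.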