Analytic induction is a formal, qualitative method for building up causal explanations of phenomena from a close examination of cases. The method involves the following steps: (1) Define a phenomenon that requires explanation and propose an explanation. (2) Examine a single case to see if the explanation fits. (3) If it does, then examine another case. An explanation is accepted until a new case falsifies it.

When you find a case that doesn’t fit, then, under the rules of analytic induction, the alternatives are to change the explanation to include the new case or redefine the phenomenon to exclude the nuisance case. Ideally, the process continues until a universal explanation for all known cases of a phenomenon is attained. (Explaining cases by declaring them all unique is not an option of the method. That’s a convenient way out, but it doesn’t get us anywhere.)

Charles Ragin (1987, 1994) formalized the logic of analytic induction, using an approach based on Boolean logic. Boolean variables are dichotomous: true or false, present or absent, and so on. This seems simple enough, but it’s going to get very complicated, very quickly, so pay attention. Remember, there is no math in this. It’s entirely qualitative. In fact, Ragin (1994) calls his Boolean method of induction qualitative comparative analysis, or QCA.

Suppose you have four dichotomous variables, including three independent, or causal, variables, and one dependent, or outcome, variable. With one dichotomous variable, A, there are two possibilities: A and not-A. With two dichotomous variables, A and B, there are four possibilities: A and B, A and not-B, not-A and B, not-A and not-B. With three dichotomous variables, there are eight possibilities; with four there are 16 . . . and so on.
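The doubling described above can be checked by listing the combinations directly. Here is a minimal sketch in Python; the function name `configurations` and the variable labels A through D are illustrative assumptions, not part of Ragin's method.

```python
from itertools import product

def configurations(names):
    """Return every present/absent combination of the named dichotomous variables."""
    return [dict(zip(names, values))
            for values in product([True, False], repeat=len(names))]

# The number of possibilities doubles with each added variable: 2, 4, 8, 16.
for n_vars in range(1, 5):
    names = ["A", "B", "C", "D"][:n_vars]
    print(n_vars, "variable(s):", len(configurations(names)), "possibilities")

# Spell out the four possibilities for two variables, A and B.
for combo in configurations(["A", "B"]):
    print(" and ".join(name if present else "not-" + name
                       for name, present in combo.items()))
```

In general, n dichotomous variables produce 2 to the power n logical possibilities, which is why the table of possibilities grows so quickly.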

We’ve seen all this before—in the discussion about factorial designs of experiments (chapter 4); in the discussion of how to use the number of subgroups to figure out sample size (chapter 5); in the discussion of how to determine the number of focus groups you need in any particular study (chapter 8); and in the discussion of factorial questionnaires (chapter 9). The same principle is involved.

Thomas Schweizer (1991, 1996) applied this Boolean logic in his analysis of conflict and social status in Chen Village, China. In the 1950s, the village began to prosper with the application of technology to agriculture. The Great Leap Forward and the Cultural Revolution of the 1960s, however, reversed the village’s fortunes. Chan et al. (1984) reconstructed the recent history of Chen Village, focusing on the political fortunes of key actors there.

Schweizer coded the Chan et al. text for whether each of 13 people in the village experienced an increase or a decrease in status after each of 14 events (such as the Great Leap Forward, land reform and collectivization, the collapse of Red Brigade leadership, and an event known locally as “the great betrothal dispute”). Schweizer wound up with a 13-actor-by-14-event matrix, where a 1 in a cell meant that an actor had success in a particular event and a 0 meant a loss of status in the village.

When Schweizer looked at this actor-by-event matrix he found that, over time, nine of the actors consistently won or consistently lost. That means that for nine of the villagers, there was just one outcome, a win or a loss. But four of the actors lost sometimes and won other times. For each of these four people, there could be a win or a loss, which means there are eight possible outcomes.
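The counting rule here is that an actor who always wins or always loses contributes one case, and an actor with mixed results contributes two. The sketch below applies that rule to a small, made-up matrix. The actors and codes are hypothetical and are not Schweizer's data. It then repeats the arithmetic for the 9 consistent and 4 mixed actors reported above.

```python
# Hypothetical toy data, NOT Schweizer's actual codes.
# Rows are actors, columns are events: 1 = gained status, 0 = lost status.
toy_matrix = {
    "actor_1": [1, 1, 1, 1],   # consistently won
    "actor_2": [0, 0, 0, 0],   # consistently lost
    "actor_3": [1, 0, 1, 1],   # won sometimes, lost other times
}

def distinct_outcomes(row):
    """Return the sorted set of outcomes (0 and/or 1) an actor experienced."""
    return sorted(set(row))

def count_cases(matrix):
    """Each actor contributes one case per distinct outcome."""
    return sum(len(distinct_outcomes(row)) for row in matrix.values())

print(count_cases(toy_matrix))  # 1 + 1 + 2 = 4

# The same arithmetic with the counts reported in the text.
consistent_actors, mixed_actors = 9, 4
total_cases = consistent_actors * 1 + mixed_actors * 2
print(total_cases)  # 17
```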

In total, then, Schweizer needed to account for 17 unique combinations of actors and outcomes. He partitioned the 17 unique cases according to three binary independent variables (whether a villager was originally from a city or had been raised in the village, whether a villager had a proletarian or a nonproletarian background, and whether a villager had ties to people outside the village or not) and one dependent variable (whether the person was an overall success). There are a total of four variables. Table 19.8 shows the 16 outcomes that are possible with four binary variables, and the number of actual cases, out of 17, for each of those outcomes.

Table 19.8 The Outcome of 17 Cases from Schweizer's (1996) Text Analysis

Success   External ties   Proletarian background   Urban origin   No. of cases
0         0               0                        0              2
0         0               0                        1              2
0         0               1                        0              1
0         0               1                        1              0
0         1               0                        0              0
0         1               0                        1              0
0         1               1                        0              2
0         1               1                        1              0
1         0               0                        0              1
1         0               0                        1              3
1         0               1                        0              0
1         0               1                        1              0
1         1               0                        0              0
1         1               0                        1              1
1         1               1                        0              4
1         1               1                        1              1

By setting up the logical possibilities in table 19.8, Schweizer was able to test several hypotheses about success and failure in Chen Village. You can see, for example, that just two people who were originally from a city (people who had been sent to the village to work during the Cultural Revolution) turned out to be failures, but five people from the city turned out to be successful. People from an urban background have an advantage, but you can also see from table 19.8 that it’s not enough. To ensure success, you should come from a proletarian family OR have good external ties (which provide access to information and power at the regional level).
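The tallies in this paragraph can be reproduced from table 19.8. The following is a minimal sketch, with the table typed in as tuples; the names `TABLE_19_8`, `FIELDS`, and `count_cases` are illustrative assumptions rather than anything from Schweizer's analysis.

```python
# Rows of table 19.8: (success, external_ties, proletarian, urban, n_cases)
TABLE_19_8 = [
    (0, 0, 0, 0, 2), (0, 0, 0, 1, 2), (0, 0, 1, 0, 1), (0, 0, 1, 1, 0),
    (0, 1, 0, 0, 0), (0, 1, 0, 1, 0), (0, 1, 1, 0, 2), (0, 1, 1, 1, 0),
    (1, 0, 0, 0, 1), (1, 0, 0, 1, 3), (1, 0, 1, 0, 0), (1, 0, 1, 1, 0),
    (1, 1, 0, 0, 0), (1, 1, 0, 1, 1), (1, 1, 1, 0, 4), (1, 1, 1, 1, 1),
]

FIELDS = ("success", "external_ties", "proletarian", "urban")

def count_cases(rows, **conditions):
    """Sum the number of cases in rows that match every given condition."""
    total = 0
    for row in rows:
        values = dict(zip(FIELDS, row[:4]))
        if all(values[name] == wanted for name, wanted in conditions.items()):
            total += row[4]
    return total

print(count_cases(TABLE_19_8))                      # 17 cases in all
print(count_cases(TABLE_19_8, urban=1, success=0))  # 2 urban-origin failures
print(count_cases(TABLE_19_8, urban=1, success=1))  # 5 urban-origin successes
```

The same function can tally any other combination in the table, for example `count_cases(TABLE_19_8, external_ties=1, success=1)`.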

Failure is predicted even better: If an actor has failed in the Chen Village disputes, then he or she is of rural origin (comes from the village) OR comes from a nonproletarian family AND has no ties to authorities beyond the village. The Boolean formula for this statement is:

$$\text{failure} \rightarrow \text{rural origin} \lor (\text{nonproletarian} \land \text{no external ties})$$
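One way to see what this statement claims is to test it against every failure row of table 19.8. The sketch below is only an illustration: the function names are assumptions, and so is the reading of the sentence in which AND binds more tightly than OR. For comparison, it also tests the other possible grouping.

```python
# Rows of table 19.8: (success, external_ties, proletarian, urban, n_cases)
TABLE_19_8 = [
    (0, 0, 0, 0, 2), (0, 0, 0, 1, 2), (0, 0, 1, 0, 1), (0, 0, 1, 1, 0),
    (0, 1, 0, 0, 0), (0, 1, 0, 1, 0), (0, 1, 1, 0, 2), (0, 1, 1, 1, 0),
    (1, 0, 0, 0, 1), (1, 0, 0, 1, 3), (1, 0, 1, 0, 0), (1, 0, 1, 1, 0),
    (1, 1, 0, 0, 0), (1, 1, 0, 1, 1), (1, 1, 1, 0, 4), (1, 1, 1, 1, 1),
]

def violations(rows, rule):
    """Return failure rows with at least one case that the rule does not cover."""
    return [r for r in rows
            if r[0] == 0 and r[4] > 0 and not rule(r[1], r[2], r[3])]

def rule_and_first(ties, proletarian, urban):
    """Assumed reading: rural OR (nonproletarian AND no external ties)."""
    return (not urban) or ((not proletarian) and (not ties))

def rule_or_first(ties, proletarian, urban):
    """Alternative grouping: (rural OR nonproletarian) AND no external ties."""
    return ((not urban) or (not proletarian)) and (not ties)

print(violations(TABLE_19_8, rule_and_first))  # [] : every failure fits
print(violations(TABLE_19_8, rule_or_first))   # [(0, 1, 1, 0, 2)]
```

Under the first reading, all seven failure cases are covered. Under the alternative grouping, the two rural, proletarian failures who had external ties are left unexplained.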

The substantive conclusions from this analysis are intuitively appealing: In a communist revolutionary environment, it pays over the years to have friends in high places; people from urban areas are more likely to have those ties; and it helps to have been born into a politically correct (that is, proletarian) family.

Analytic induction helps identify the simplest model that logically explains the data. As in classic content analysis and cognitive mapping, human coders have to read and code the text into an event-by-variable matrix. The object of the analysis, however, is not to show the relations between all codes, but to find the minimal set of logical relations among the concepts that accounts for a single dependent variable.

With three binary independent variables (as in Schweizer’s data), two logical operators (OR and AND), and three implications (“if A then B,” “if B then A,” and “if A, then and only then, B”), there are 30 multivariate hypotheses: 18 when all three independent variables are used, plus 12 when two variables are used. With more variables, the analysis becomes much more difficult, but there are now computer programs that test all possible multivariate hypotheses and find the optimal solution (see appendix E).

These are not easy analyses to do, and some people I talk to about this kind of work wonder how qualitative analysis ever got so complicated. It just goes to show that qualitative doesn’t mean wimpy (Further Reading: qualitative comparative analysis) (box 19.5).