Respondents in the survey were 405 adults 18 years or older living in the Chicago metropolitan area, including 103 non-Hispanic Whites, 100 non-Hispanic Blacks, 102 Mexican-Americans, and 100 Korean-Americans. Fifty-two of the Mexican-American respondents were interviewed in English and 50 were interviewed in Spanish. Forty-one of the Korean-American respondents were interviewed in English and 59 were interviewed in Korean.

Interviews took place between August 13, 2008 and April 10, 2010 and were conducted by the Survey Research Laboratory (SRL) of the University of Illinois at Chicago (UIC). Respondents were recruited using Random Digit Dial (RDD) sampling procedures. Recruitment efforts targeted geographic areas with high proportions of residents in the targeted race/ethnic groups, and areas that were accessible to UIC. The oldest/youngest male/female approach was used to randomly select a household member between the ages of 18 and 70. The sample was limited to respondents in the four targeted racial and ethnic groups who lived in Chicago and spoke one of the targeted languages (English for Whites and Blacks, Spanish or English for Mexican-Americans, and Korean or English for Korean-Americans). To recruit Korean-Americans, we also purchased samples of and contacted those in households with Asian and Korean surnames (e.g., Kim), expanded the targeted geographic area to all of Cook County, sent advance letters (in English and Korean) to likely Korean households, and recruited alternative household members if the initially selected respondent was unavailable.

We cannot estimate a standard response rate for this study because we used a quota sampling strategy and discontinued recruitment when the targeted number of cases was reached. In the non-Korean sample, appointments were scheduled for 10.0% of the sample and interviews were completed with 61.1% of those scheduled. Among the Korean sample, these numbers were 1.8% and 77.6%, respectively. Mexican- and Korean-Americans were allowed to choose their interview language; those who were bilingual and had no language preference were randomly assigned to interview language until quotas were met.

Eligible respondents were interviewed at SRL. They completed an initial paper-and-pencil self-interview (PAPI) that included several measures not employed in this analysis, followed by a computer-assisted personal interview (CAPI; average length of 67 minutes), followed by a second PAPI that collected respondents' demographic characteristics. Interviewers and respondents were matched on race and ethnicity. Bilingual interviewers conducted the interviews with Mexican- and Korean-Americans. Respondents received $40 for participating. All study procedures were approved by the UIC Institutional Review Board.

We measured reading time and response latencies for 150 CAPI questions. The core of the questionnaire included 90 items that varied along a number of dimensions (see Appendix 17C of the online materials for question wording). Questions varied in the type of judgment they asked respondents to make: subjective judgments (e.g., attitudes, beliefs, or values), self-relevant knowledge (behaviors or characteristics), or factual knowledge. Within each of these sets of questions, half of the questions required time-qualified judgments (that required the respondent to recall whether something happened within a specific period of time) and half required judgments that were not time-qualified. The response format of questions was also manipulated. Six different response formats were used: (1) yes/no; (2) categorical; (3) fully labeled unipolar scales; (4) fully labeled bipolar scales with a midpoint; (5) fully labeled bipolar scales without a midpoint; and (6) numerical open ends. The remaining questions included semantic differential scales, feeling thermometer questions about groups in society, agree/disagree questions, questions that explicitly offered or omitted a "no-opinion" option or filter, and questions that asked respondents about the process of answering questions in the survey. A number of questions were deliberately designed to be problematic (as denoted in Appendix 17C of the online supplemental materials).

The CAPI interview was separated into four substantive modules plus a final demographic module. Half of the respondents were randomly assigned to be asked Modules I and II before Modules III and IV; the other half were randomly assigned to receive Modules III and IV before Modules I and II. All respondents received Module V questions (demographics) last.

Behavior coding was used to obtain information about the validity of timing measures (see below) and indicators of comprehension and mapping difficulties. Interviews were video and audio recorded if respondents gave permission, and trained staff coded interviewer and respondent behaviors from the video recordings. Of the 405 interviews, 398 were coded. Coding was validated for 77 cases across all three languages, comprising 29,281 behavior codes. Coder agreement for these validated cases was high (95.8% overall agreement).

Question reading time was measured as the amount of time it took interviewers to read the question, captured by a timer that started when the question screen loaded and ended when interviewers hit "enter" to indicate they had finished reading the question and were ready for the response screen. Response latencies were measured as the amount of time it took respondents to answer the question, captured by a timer that started when the response screen loaded and ended when a response was entered. Response latencies were transformed by taking the square root of this measure. Interviewer reading speed (IRS) was calculated in words per minute for each question using question length and question reading time. For more information on this time measurement procedure, which was modified from one suggested by Bassili and Fletcher (1991), see Appendix 17B. Some respondent and interviewer behaviors rendered question and response latencies invalid. Figures 17.A2 and 17.A3 show the distribution of invalidated question and response latencies, respectively. Eliminating these questions resulted in a total of 31,520 valid question-answer sequences included in our analyses.
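The IRS and latency measures described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function names and the example values are assumptions.

```python
# Sketch of the two timing-derived measures: interviewer reading speed (IRS)
# in words per minute, and the square-root-transformed response latency.
# Function names and inputs are hypothetical illustrations.
import math

def reading_speed_wpm(question_text: str, reading_time_sec: float) -> float:
    """IRS: question length in words divided by reading time in minutes."""
    n_words = len(question_text.split())
    return n_words / (reading_time_sec / 60.0)

def transform_latency(latency_sec: float) -> float:
    """Square-root transform applied to response latencies."""
    return math.sqrt(latency_sec)

# A 20-word question read in 8 seconds yields 150 words per minute;
# a 9-second latency transforms to 3.0.
irs = reading_speed_wpm(" ".join(["word"] * 20), 8.0)
latency = transform_latency(9.0)
```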

Variables were coded to assess each of our hypotheses about the predictors of IRS. For each interview, interviewer experience on the project was calculated as the number of successive interviews (e.g., an interviewer's first interview was coded 1, their second interview was coded 2, and so on). For each question, the number of previous questions asked of each respondent was coded. Question length was operationalized as the number of words in the question.

Question sensitivity was rated independently by two of the investigators. Each question was coded as not at all sensitive, somewhat sensitive, or very sensitive. Concordance between the two coders was over 80%; all disagreements were discussed and consensus was reached on a final code. Dummy variables were created for somewhat and very sensitive questions (with not at all sensitive questions as the comparison group).
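The dummy coding described above can be illustrated with a minimal sketch: a three-level rating becomes two indicator variables, with the lowest level as the omitted comparison group. The rating labels used here are shorthand assumptions.

```python
# Hypothetical sketch of the sensitivity dummy coding: "not at all" is the
# comparison group, so it maps to (0, 0) on both indicators.
def sensitivity_dummies(rating: str) -> dict:
    return {
        "somewhat_sensitive": 1 if rating == "somewhat" else 0,
        "very_sensitive": 1 if rating == "very" else 0,
    }
```

The same pattern applies to the abstraction ratings and judgment-type codes described later, each with its own comparison group.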

Behavior coding data were used to estimate two variables. A variable for comprehension difficulty was coded 1 (versus 0) if behavior coding indicated that a respondent showed any of the behaviors listed in the top panel of Figure 17.1 for a given question. A variable for mapping difficulty was coded 1 (versus 0) if behavior coding indicated that a respondent showed any of the behaviors listed in the bottom panel of Figure 17.1 for a given question.

A number of control variables were used to adjust for question characteristics. A variable representing whether a question was time-qualified was coded 1 for questions that requested time-qualified judgments and 0 for those that did not. Judgment type was coded using two dummy variables for subjective judgments and factual knowledge (self-relevant knowledge questions were the comparison group). Questions that used a show card were coded 1 and those that did not were coded 0. Question abstraction was rated independently by two of the investigators as not at all abstract (i.e., about a concrete or physical construct), somewhat abstract (about an objective or physical construct, but responding requires some interpretation of terms or ideas; e.g., "corruption in government"), or very abstract (not objective or grounded in the physical world; e.g., "obedience"). Concordance between the two coders was over 80%; all disagreements were discussed and consensus was reached on a final code. Dummy variables were created for somewhat and very abstract questions (with not at all abstract questions as the comparison group). Several deliberately problematic questions were included in the survey instrument, including those that: (1) asked about nonexistent policies or objects, (2) had mismatched question stems and response options, (3) had non-mutually exclusive or non-exhaustive response options, (4) requested information that was too specific, and (5) were double-barreled. A dummy variable was coded 1 for deliberately problematic questions and 0 for all other questions. Two dummy variables captured the inclusion of an explicitly offered "no-opinion" response option and the use of "no-opinion" filter questions. For the first variable, each question was coded 1 if it explicitly offered respondents a no-opinion response option and 0 otherwise. For the second variable, each question was coded 1 if it was preceded by a no-opinion filter question and 0 otherwise.


Figure 17.1 Respondent behavior codes used to identify comprehension and mapping problems.

Question format was captured with a set of dummy variables for questions with (1) agree/ disagree response options, (2) yes/no response options, (3) categorical response options, (4) fully labeled unipolar response scales, (5) fully labeled bipolar response scales with a midpoint, (6) fully labeled bipolar response scales without a midpoint, (7) semantic differentials, and (8) feeling thermometers (numerical open-ended response format was the comparison group).

Respondent demographics were also measured. Education was measured using a series of dummy variables for highest degree earned, including less than a high school degree (as the comparison group), high school graduate, some college, college graduate (four-year degree), and college graduate (advanced degree). Respondents' age was calculated from reports of birth year and was coded as a continuous variable ranging from 0 (age 18) to 1 (age 70). Gender was observed by the interviewer and coded 0 for females and 1 for males. Household income was coded into six categories (coded 0 = less than $10,000; .2 = $10,001-$20,000; .4 = $20,001-$30,000; .6 = $30,001-$50,000; .8 = $50,001-$70,000; and 1 = $70,001 or more). Respondent race/ethnicity was obtained via the initial RDD screening and confirmed at the beginning of the CAPI survey. Race/ethnicity was coded using three dummy variables (with non-Hispanic White as the comparison group) to indicate non-Hispanic Black, Mexican-American, and Korean-American. Language of interview was coded using two dummy variables, representing Spanish and Korean interviews; English-language interviews were the comparison group.
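The age rescaling and income category codes described above can be sketched as follows; the function name and category labels are illustrative assumptions.

```python
# Hypothetical sketch of the demographic recodes: age is rescaled so that
# 18 maps to 0 and 70 maps to 1, and household income categories map onto
# the reported values {0, .2, .4, .6, .8, 1}.
def rescale_age(age: int) -> float:
    """Linear rescaling of age from the [18, 70] range to [0, 1]."""
    return (age - 18) / (70 - 18)

# Category labels here are shorthand for the income brackets in the text.
INCOME_CODES = {
    "<=10000": 0.0,
    "10001-20000": 0.2,
    "20001-30000": 0.4,
    "30001-50000": 0.6,
    "50001-70000": 0.8,
    ">70000": 1.0,
}
```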

To examine the antecedents and consequences of IRS, we conducted a series of generalized linear models. The models were fitted with the identity link function [E(y) = x′β, y ~ Normal] for the continuous outcomes of interviewer pace and response latencies, and the logit link function [logit{E(y)} = x′β, y ~ Bernoulli] for the dichotomous outcomes of any comprehension and any mapping difficulty. All models were fitted with a cluster-robust sandwich variance estimator (Hardin and Hilbe 2018) to adjust for the cross-classified clusters of questions and interviewers in which IRS was nested. Only questions with valid response and question latencies were included (see Online Appendix 17B for more details about this process). All analyses were conducted using Stata 14 (StataCorp 2015) with unweighted data.
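The two link functions above can be sketched in a few lines. This is a minimal illustration of the mean functions only (not the authors' Stata estimation code, and it omits the cluster-robust variance step); the function names are assumptions.

```python
# Sketch of the two GLM link functions: for a linear predictor xb = x'beta,
# the identity link returns E(y) directly (Gaussian outcomes such as pace),
# while the inverse logit returns a Bernoulli probability (dichotomous
# outcomes such as any comprehension difficulty).
import math

def identity_link_mean(xb: float) -> float:
    """Identity link: E(y) = x'beta."""
    return xb

def inverse_logit(xb: float) -> float:
    """Inverse logit: E(y) = 1 / (1 + exp(-x'beta))."""
    return 1.0 / (1.0 + math.exp(-xb))
```

A linear predictor of 0 corresponds to a predicted probability of 0.5 under the logit link; large positive values approach 1.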
