Examining the Utility of Interviewer Observations on the Survey Response Process


Surveys of representative samples of persons selected from general populations are a vital source of information for policy makers and program evaluators. However, the utility of survey data depends heavily on the quality of the data collected. Survey data of poor quality could produce misleading, if not erroneous, estimates and inferences related to the characteristics of larger populations. Unfortunately, evaluating survey data quality can be difficult. Perhaps the best way to assess response quality would be to use true values available from an external source to validate answers reported by survey respondents, but these types of external validation data are usually unavailable or difficult to access.

In these situations, survey researchers and practitioners often turn to other sources of data that provide indirect indicators of data quality. These data sources capture breakdowns in the survey response process, which arise due to the inability and unwillingness of respondents to answer survey questions (Tourangeau, Rips, and Rasinski 2000).

One such data source is paradata, or data describing the survey data collection process (Kreuter 2013). Various types of paradata have been used to evaluate the quality of survey data, including response latency data (e.g., Callegaro et al. 2009), linguistic expressions of doubt and uncertainty (e.g., Schaeffer and Dykema 2011), call record data (such as contact attempts and call histories; e.g., Kreuter and Kohler 2009), patterns of attrition (e.g., Olson and Parkhurst 2013), and doorstep concerns (e.g., Yan 2017). These studies have all suggested that the various paradata were in fact useful indicators of data quality.

This chapter focuses on a different type of paradata that could provide information about breakdowns of the survey response process: post-survey interviewer observations of respondents and their behaviors during the interviewing process. Survey organizations often ask an interviewer to answer several closed-ended questions about the respondent and his/her behaviors (based on their observations) once the interviewer has finished an interview. For example, in the Health and Retirement Study (HRS), interviewers evaluate how interested the respondent was in the survey topic using a three-point scale (not at all interested, somewhat interested, and very interested) and assess how attentive he/she has been during the interview process using a four-point scale (excellent, good, fair, and poor). This practice assumes that respondents observed by the interviewers as showing undesirable behaviors (e.g., "not attentive") or having negative attitudes (e.g., "not interested") will provide data of poor quality.

The collection of these types of interviewer observations and evaluations dates back to 1948, when interviewers in a study were first asked to rate the dependability of the interviews that they obtained after each interview (Bennett 1948). More than seven decades later, many prominent government-sponsored surveys in the United States routinely collect post-survey interviewer observations, including the National Survey of Family Growth (NSFG). Several international survey programs also collect these observations, including the European Social Survey (ESS).

Survey programs generally design post-survey interviewer observations to tap into data quality issues, such as the overall quality of the information provided (e.g., Maclin and Calder 2006), respondents' understanding of the questions, interest in the interview, and attention to the questions. Interviewers record the observations at the end of the survey, after having just spent the entire interview witnessing the respondent's behaviors and recording their responses. Theoretically, these types of observations should correlate well with the quality of the answers provided by respondents. This potential led Billiet, Carton, and Loosveldt (2004) to recommend that interviewer observations of the survey response process be one evaluation criterion in the Total Quality Management paradigm.

Given their potential, several methodological studies have evaluated the utility of these observations. Fisk (1950) provided the first evaluation of interviewer observations of the survey response process, focusing on the variation of interviewer ratings of interest between interviewers. Later studies considering the relationships of individual interviewer observations with various indirect indicators of data quality found that respondents who received positive or favorable ratings tended to provide data of better quality, in terms of less missing data (Antoun 2012; Tarnai and Paxson 2005; Yan 2017), less measurement error (Wang, West, and Liu 2014), higher validity (Andrews 1990), more consistent reports (Antoun 2012), less "heaping" (Holbrook et al. 2014; Sakshaug 2013), less "satisficing" (Josten 2014), higher response propensity in later waves of panel studies (Plewis, Calderwood, and Mostafa 2017), and more codable answers to open-ended questions (Tarnai and Paxson 2005). A recent study has shown that interviewers do in fact use their observations of respondent behaviors during the interview process to answer the post-survey observation questions (Kirchner, Olson, and Smyth 2018).

Even though several methodological studies have now demonstrated the utility of individual observations for indicating data quality, little evidence exists of secondary analysts of survey data using these observations for substantive analysis purposes. The observations are generally not included in public-use data files or only available via restricted-access data user agreements (see, for example, "Other Data Files" at the web site www.cdc.gov/nchs/nsfg/nsfg_2011_2015_puf.htm). This raises questions about the cost-benefit tradeoffs of collecting these observations. For example, in the NSFG, interviewers spend an average of 5 to 6 minutes per interview completing these post-survey observations. Given that 22,682 interviews were completed in the NSFG from 2006 to 2010, an estimated 1,887 hours of production time (almost one full year of 40-hour work weeks) were spent on this task alone. Recent personal communication with NSFG managers at the National Center for Health Statistics suggests that public data users rarely use these observations, despite their public availability and the cost that it takes to collect them.
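The production-time estimate can be checked with a back-of-the-envelope calculation. The sketch below uses only the figures quoted in the text (22,682 interviews, 5 to 6 minutes each); the variable and function names are illustrative, and the 52-week work year is an assumption made for the comparison.

```python
# Figures quoted in the text; names are illustrative.
INTERVIEWS = 22_682          # NSFG interviews completed, 2006 to 2010
WORK_YEAR_HOURS = 52 * 40    # assumed: 52 weeks of 40-hour work weeks

def observation_hours(minutes_per_interview: float) -> float:
    """Total interviewer hours spent on post-survey observations."""
    return INTERVIEWS * minutes_per_interview / 60

low_hours = observation_hours(5)    # about 1,890 hours
high_hours = observation_hours(6)   # about 2,268 hours

print(f"5 min/interview: {low_hours:,.0f} hours "
      f"({low_hours / WORK_YEAR_HOURS:.2f} work years)")
print(f"6 min/interview: {high_hours:,.0f} hours "
      f"({high_hours / WORK_YEAR_HOURS:.2f} work years)")
```

The reported estimate of 1,887 hours sits at the lower end of this range, corresponding to an average of just under 5 minutes per interview.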

In this chapter, we apply latent class analysis (LCA) to multiple post-survey interviewer observations recorded in each of two major surveys (the NSFG and the ESS) to derive respondent-level data quality classes that could benefit secondary analysts of these data. The individual post-survey judgments and estimates recorded by interviewers that have been the focus of prior methodological work may be prone to quality issues themselves (e.g., Kirchner, Olson, and Smyth 2018; O'Muircheartaigh and Campanelli 1998; West and Peytcheva 2014). LCA offers the benefit of accounting for potential measurement error in the individual observations (Kreuter, Yan, and Tourangeau 2008), and (assuming that a given latent class model fits the data well) produces smoothed respondent-level predictions of the probability of membership in one of a small number of classes defined by patterns in the observations. If variables containing predicted response quality classes for the respondents based on the LCA effectively distinguish between respondents in terms of data quality, survey organizations could include these variables in public-use data files, enabling analysts to adjust for overall response quality in their analyses (e.g., analyzing changes in estimates across the quality classes). We therefore also compare the derived classes in terms of indirect indicators of data quality from each survey.
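To make the idea of deriving quality classes concrete, the following is a minimal sketch of a latent class model fitted with the EM algorithm. It is not the specification or software used in this chapter: it assumes binary (dichotomized) interviewer observations, two latent classes, and simulated data, and every name in it (fit_lca, item_prob, and so on) is hypothetical.

```python
import numpy as np

def fit_lca(X, n_classes=2, n_iter=200, seed=0):
    """EM for a latent class model with binary indicators.

    X is an (n_respondents, n_items) array of 0/1 interviewer observations.
    Returns class prevalences, item probabilities, and posterior memberships.
    """
    rng = np.random.default_rng(seed)
    n_items = X.shape[1]
    prevalence = np.full(n_classes, 1.0 / n_classes)
    item_prob = rng.uniform(0.25, 0.75, size=(n_classes, n_items))

    def posterior(prevalence, item_prob):
        # Log-likelihood of each response pattern under each class,
        # assuming items are independent within a class.
        log_lik = X @ np.log(item_prob).T + (1 - X) @ np.log(1 - item_prob).T
        log_post = np.log(prevalence) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        post = posterior(prevalence, item_prob)            # E-step
        prevalence = post.mean(axis=0)                     # M-step
        item_prob = (post.T @ X) / post.sum(axis=0)[:, None]
        item_prob = np.clip(item_prob, 1e-6, 1 - 1e-6)     # keep logs finite

    return prevalence, item_prob, posterior(prevalence, item_prob)

# Simulated data (an assumption for illustration, not survey data):
# five favorable/unfavorable ratings per respondent, two true classes.
rng = np.random.default_rng(42)
n = 2000
true_class = (rng.random(n) < 0.4).astype(int)     # 1 = hypothetical lower-quality class
true_prob = np.array([[0.9] * 5, [0.2] * 5])       # P(favorable rating | class)
X = (rng.random((n, 5)) < true_prob[true_class]).astype(int)

prevalence, item_prob, post = fit_lca(X)
predicted_class = post.argmax(axis=1)

# Class labels are arbitrary, so measure agreement up to relabeling.
agreement = (predicted_class == true_class).mean()
agreement = max(agreement, 1 - agreement)
print("Estimated prevalences:", prevalence.round(2))
print("Agreement with simulated classes:", round(agreement, 3))
```

Each row of post holds one respondent's predicted probabilities of class membership, which is the kind of smoothed respondent-level prediction described above. In practice the observations are often ordinal, so a polytomous latent class model and formal fit assessment would be needed.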
