Coding and data processing
The coding of information on subjective well-being is generally straightforward. Numerical scales should be coded as numbers, even if the scale bounds have labels. Much analysis of subjective well-being data is likely to be quantitative and will involve manipulating the data as if it were cardinal. Even for fully labelled response scales (such as the “yes/no” responses that apply to many questions), it is good practice to code the data numerically as well as in a labelled format, in order to facilitate use of the micro-data to produce summary measures such as affect balance or similar indices. “Don’t know” and “refused to answer” responses should be coded separately from each other, as the differences between them are of methodological interest.
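As an illustration of the coding practice described above, the following minimal Python sketch maps labelled “yes/no” answers to numeric codes, keeps “don’t know” and “refused” distinct, and computes a simple affect balance. The reserved codes (-8 and -9), the function names and the example answers are all assumptions made for illustration; they are not prescribed here, and an actual survey would use its own coding frame.

```python
from statistics import mean

# Hypothetical reserved codes: the only requirement illustrated here is that
# "don't know" and "refused" are kept distinct from each other.
DONT_KNOW = -8
REFUSED = -9
MISSING_CODES = {DONT_KNOW, REFUSED}

# Labelled yes/no responses mapped to numeric codes.
YES_NO_CODES = {"yes": 1, "no": 0, "don't know": DONT_KNOW, "refused": REFUSED}


def code_response(label):
    """Return the numeric code for a labelled yes/no response."""
    return YES_NO_CODES[label.strip().lower()]


def affect_balance(positive, negative):
    """Mean of valid positive-affect codes minus mean of valid negative-affect codes.

    Returns None when either group has no valid (non-missing) answers.
    """
    pos = [v for v in positive if v not in MISSING_CODES]
    neg = [v for v in negative if v not in MISSING_CODES]
    if not pos or not neg:
        return None
    return mean(pos) - mean(neg)


# Illustrative answers for one respondent (made-up values).
positive_items = [code_response(x) for x in ["Yes", "yes", "no", "don't know"]]
negative_items = [code_response(x) for x in ["no", "yes", "no", "refused"]]
balance = affect_balance(positive_items, negative_items)
```

Because the missing-value codes are stored rather than collapsed into a single blank, the rates of “don’t know” and “refused” answers remain available for methodological analysis.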
Normal data-cleaning procedures include looking for obvious errors such as numbers transposed by data coders, duplicate records, loss of records, incomplete responses, out-of-range responses or failure to follow correct skip patterns. Some issues are of particular relevance to subjective data. In particular, where a module comprising several questions with the same scale is used, data cleaning should also involve checking for response sets (see Chapter 2). Response sets occur when a respondent gives identical ratings to a series of different items. For example, a respondent may answer “0” to all ten domain evaluation questions from Module E. This typically suggests that the respondent is not, in fact, responding meaningfully to the questions and is simply moving through the questionnaire as rapidly as possible. Such responses should be treated as non-response and discarded. In addition, interviewer comments provide an opportunity to identify whether the respondent was responding correctly, and a robust survey process will make provision for such responses to be flagged without wiping the data record.
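Some of the checks described above can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: a 0-10 response scale, a record layout with an `id` and an `answers` mapping, `None` standing in for a missing answer, and a hypothetical threshold of five items before identical ratings are treated as a response set. Records are flagged rather than deleted, so the original data record is preserved and the decision to discard can be made at the analysis stage.

```python
from collections import Counter

# Assumed scale bounds for this sketch (0-10 response scale).
SCALE_MIN, SCALE_MAX = 0, 10


def out_of_range_items(answers):
    """Return the names of items whose value lies outside the scale bounds."""
    return [
        name
        for name, value in answers.items()
        if value is not None and not (SCALE_MIN <= value <= SCALE_MAX)
    ]


def is_response_set(answers, min_items=5):
    """True if at least `min_items` answers were given and all are identical.

    The `min_items` threshold is a hypothetical choice for this sketch.
    """
    values = [v for v in answers.values() if v is not None]
    return len(values) >= min_items and len(set(values)) == 1


def flag_records(records):
    """Attach a list of flags to each record without altering the data."""
    id_counts = Counter(r["id"] for r in records)
    report = []
    for r in records:
        flags = []
        if id_counts[r["id"]] > 1:
            flags.append("duplicate_id")
        if out_of_range_items(r["answers"]):
            flags.append("out_of_range")
        if is_response_set(r["answers"]):
            flags.append("response_set")
        report.append({"id": r["id"], "flags": flags})
    return report


# Made-up records: the first answers "0" to all ten domain items.
records = [
    {"id": 1, "answers": {f"domain_{i}": 0 for i in range(1, 11)}},
    {"id": 2, "answers": {"domain_1": 7, "domain_2": 11, "domain_3": 5}},
    {"id": 2, "answers": {"domain_1": 7, "domain_2": 8, "domain_3": 5}},
    {"id": 3, "answers": {"domain_1": 6, "domain_2": 8, "domain_3": None}},
]
report = flag_records(records)
```

A fuller cleaning routine would also cover skip-pattern violations, lost records and interviewer comments, which depend on the specific questionnaire and are not modelled here.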
Finally, it is important to emphasise that much of the value of collecting measures of subjective well-being comes from micro-data analysis. In particular, analysis of the joint distribution of subjective well-being and other outcomes, and the use of subjective well-being measures in cost-benefit analysis, cannot usually be accomplished through secondary use of tables of aggregate data. Because of this, a clear and comprehensive data dictionary should be regarded as an essential output of any project focusing on subjective well-being. This data dictionary should include information on survey methodology, the sampling frame and the correct application of survey weights, as well as a description of each variable (covering the variable name, the question used to collect it and how the data is coded). If a variable is collected from only part of the survey sample due to question routing, this should also be clearly noted in the data dictionary.
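One possible way to hold the data dictionary contents listed above in machine-readable form is sketched below in Python. The class names, field names and placeholder text are hypothetical and chosen only to mirror the elements named in the paragraph (methodology, sampling frame, weights, and per-variable name, question, coding and routing); they do not represent a required format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class VariableEntry:
    name: str                      # variable name in the micro-data file
    question: str                  # question used to collect the variable
    coding: dict                   # numeric code -> label, including missing codes
    routing: Optional[str] = None  # note if asked of only part of the sample


@dataclass
class DataDictionary:
    survey_methodology: str
    sampling_frame: str
    weighting_notes: str
    variables: list = field(default_factory=list)

    def to_json(self):
        """Serialise the whole dictionary, including nested variable entries."""
        return json.dumps(asdict(self), indent=2)


# Placeholder content only; real entries come from the survey documentation.
dictionary = DataDictionary(
    survey_methodology="(description of mode, timing and fieldwork)",
    sampling_frame="(description of the sampling frame)",
    weighting_notes="(which weight variable to apply, and when)",
    variables=[
        VariableEntry(
            name="example_item",
            question="(question wording as fielded)",
            coding={"0": "no", "1": "yes", "-8": "don't know", "-9": "refused"},
            routing="(asked only of respondents routed to this module)",
        )
    ],
)
exported = dictionary.to_json()
```

Keeping the routing note alongside each variable makes it harder for secondary users to mistake structurally missing answers for non-response.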