Aggregation of multi-item measures

Where a survey includes more than one question about subjective well-being, a key reporting decision for data producers will be whether to report responses to each question separately, or alternatively to aggregate some questions into broader multi-item measures. Single-item life evaluation questions are most often reported as stand-alone headline measures.12 However, in addition to the single-item life evaluation primary indicator, the suite of question modules proposed in Chapter 3 also includes several multi-item measures intended to capture evaluative, affective (or hedonic), eudaimonic and domain-specific aspects of subjective well-being.

Although there may be value in looking at responses to individual questions or scale items in more detailed analyses, it is desirable to summarise longer multi-item measures, particularly for the purposes of reporting outcomes to the general public. Furthermore, summing responses across multiple items should generally produce more reliable estimates of subjective phenomena, reducing some of the impact of random measurement error on mean scores - such as may result from problems with question wording, comprehension and interpretation or bias associated with a single item. However, whilst summing responses across different life evaluation questions should pose relatively few problems, affect and eudaimonia are by nature more multidimensional constructs, and thus there is a greater risk of information loss when data are aggregated.

Box 4.3. Distribution of subjective well-being among OECD and emerging countries

In How’s Life? (OECD, 2011a), the OECD used the gap between the 10th and 90th percentiles as a measure of distribution (the “90/10 gap”). Conceptually similar to the interquartile range, the 90/10 gap was used because the clustered nature of life satisfaction responses meant that the interquartile range provided little to distinguish between countries.

Figure 4.3. Inequality in life satisfaction in OECD and emerging economies, 2010

Point difference between the 90th percentile and the 10th percentile

Source: Gallup World Poll data, reported in How's Life? (OECD, 2011a).
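The 90/10 gap described in the box can be sketched in a few lines of Python. This is a minimal illustration, not the OECD's own procedure: the function name `percentile_gap`, the two sets of responses and the choice of the "inclusive" interpolation method are all assumptions made for the example.

```python
from statistics import quantiles


def percentile_gap(responses, low=10, high=90):
    """Point difference between the `high` and `low` percentiles.

    The default arguments give the 90/10 gap; low=25, high=75 gives
    the interquartile range.
    """
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99,
    # so percentile p sits at index p - 1.
    cuts = quantiles(responses, n=100, method="inclusive")
    return cuts[high - 1] - cuts[low - 1]


# Hypothetical 0-10 life satisfaction responses for two countries.
# Both are clustered around 7 and 8, but country_a has longer tails.
country_a = [3, 5, 7, 7, 7, 7, 7, 8, 8, 8, 10]
country_b = [5, 6, 7, 7, 7, 7, 7, 8, 8, 8, 9]

for name, data in (("A", country_a), ("B", country_b)):
    print(name, "90/10 gap:", percentile_gap(data),
          "IQR:", percentile_gap(data, 25, 75))
```

With these made-up data the interquartile range is 1 point in both countries, while the 90/10 gap is 3 points in the first and 2 points in the second, which is the kind of distinction the box describes.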

Options for aggregation, specific to each scale, include:

  • Positive and negative affect: Where several items are used to examine experienced affect, most scales are designed such that one can calculate positive and negative affect subtotals for each respondent, summarising across items of similar valence. For example, in the core affect measure proposed in Module A of Chapter 3, positive affect is calculated as the average score (excluding missing values) for questions on “enjoyment” and “calm”, and negative affect is calculated as the average score for questions on “worry” and “sadness”. As with any summary measure, this risks some degree of data loss, particularly where affect dimensions can be factored into one or more sub-dimensions - for example, the high-arousal/low-arousal dimensions identified in the Circumplex model of mood (Russell, 1980; Russell, Lewicka and Niit, 1989; Larsen and Fredrickson, 1999). However, for the purposes of high-level monitoring of affect, examining summary measures will be more feasible than looking at each affect item individually, and the increased reliability of multi-item scales will be advantageous.
  • Affect balance: Positive and negative affect measures can be further summarised into a single “affect balance” score for each respondent by subtracting the mean average negative affect score from the mean average positive affect score. This can then in turn be reported as either a mean score (positive minus negative affect) or as a proportion of the population with net positive affect overall.

• Where information is available on the frequency of positive and negative affect experiences throughout the day, such as that provided by time-use studies, it is also possible to calculate the proportion of time that people spend in a state where negative affect dominates over positive affect. This is described as the “U-index” (Kahneman and Krueger, 2006), and again this can also be reported at the aggregate population level. Time-use data also enable the mean affect balance associated with different activities to be described (Table 4.3).

Table 4.3. Mean net affect balance by activity, from Kahneman et al. (2004)

Activity                  Percentage of sample   Time spent (hours)   Net affect(1)
Intimate relations                 11                   0.21              4.74
Socialising after work             49                   1.15              4.12
Dinner                             65                   0.78              3.96
Relaxing                           77                   2.16              3.91
Lunch                              57                   0.52              3.91
Exercising                         16                   0.22              3.82
Praying                            23                   0.45              3.76
Socialising at work                41                   1.12              3.75
Watching TV                        75                   2.18              3.62
Phone at home                      43                   0.93              3.49
Napping                            43                   0.89              3.27
Cooking                            62                   1.14              3.24
Shopping                           30                   0.41              3.21
Computer at home                   23                   0.46              3.14
Housework                          49                   1.11              2.96
Childcare                          36                   1.09              2.95
Evening commute                    62                   0.62              2.78
Working                           100                   6.88              2.65
Morning commute                    61                   0.43              2.03

1. Net affect is the average of three positive adjectives (enjoyment, warm, happy) less the average of five negative adjectives (frustrated, depressed, angry, hassled, criticised). All the adjectives are reported on a 0-6 scale, ranging from “not at all” to “very much”. The “time spent” column is not conditional on engaging in the activity. The sample consists of 909 employed women in Texas.

Source: Kahneman, Krueger, Schkade, Schwartz and Stone (2004), Figure 2, p. 432.
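The net affect and U-index calculations can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure used by the cited authors: the episode data are invented, the function names are hypothetical, and "negative affect dominates" is operationalised here as the strongest negative rating strictly exceeding the strongest positive rating in an episode.

```python
from statistics import fmean


def net_affect(positive, negative):
    """Average of the positive ratings less the average of the negative ratings."""
    return fmean(positive) - fmean(negative)


def u_index(episodes):
    """Share of total time spent in episodes where negative affect dominates.

    `episodes` is a list of (duration_hours, positive_ratings, negative_ratings).
    Assumption: an episode counts as unpleasant when its highest negative
    rating is strictly greater than its highest positive rating.
    """
    total = sum(duration for duration, _, _ in episodes)
    if total <= 0:
        raise ValueError("total duration must be positive")
    unpleasant = sum(
        duration
        for duration, positive, negative in episodes
        if max(negative) > max(positive)
    )
    return unpleasant / total


# Hypothetical diary for one respondent: three positive and five negative
# adjectives per episode, each rated on a 0-6 scale.
day = [
    (0.5, [2, 1, 2], [4, 1, 0, 3, 0]),  # morning commute
    (7.0, [3, 3, 4], [2, 1, 1, 2, 0]),  # working
    (0.5, [1, 2, 1], [3, 0, 0, 2, 0]),  # evening commute
    (2.0, [5, 4, 5], [0, 0, 0, 1, 0]),  # relaxing
]

print("U-index:", u_index(day))
print("Net affect while working:", net_affect(day[1][1], day[1][2]))
```

In this invented diary only the two commuting episodes are unpleasant, so the U-index is 1 hour out of 10. Averaging respondent-level values would give the aggregate population figure mentioned above.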

Both affect balance and the U-index are similar to threshold-based measures, but ones that have both clear meaning and the considerable advantage of reducing affect data to a single variable. However, there is some risk of data loss in adopting these aggregation approaches, particularly when exploring group differences. For example, the ONS subjective well-being data release (ONS, 2012) found that, for most age groups, on average women reported slightly higher happiness yesterday than men, but they also reported higher anxiety yesterday. If aggregated as an affect balance measure, these differences may not be detectable.

Ultimately, the judgement of the most appropriate measure should be driven by the primary data use. For overall monitoring, the benefits of reporting affect balance are likely to outweigh the drawbacks - but when attempting to understand, for example, the links between affect and health outcomes, it may be more important to examine dimensions of affect separately (Cohen and Pressman, 2006).
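The affect subtotals, the affect balance and the two reporting options (mean score and share of the population with net positive affect) can be sketched as below. The item names follow the Module A example given earlier; the 0-10 response scale, the respondent records, the function names and the rule that the balance is left undefined when a whole subscale is missing are assumptions made for illustration.

```python
from statistics import fmean

POSITIVE_ITEMS = ("enjoyment", "calm")
NEGATIVE_ITEMS = ("worry", "sadness")


def subscale_mean(response, items):
    """Average of the answered items; None if every item is missing."""
    values = [response[item] for item in items if response.get(item) is not None]
    return fmean(values) if values else None


def affect_scores(response):
    """Return (positive affect, negative affect, affect balance) for one respondent."""
    positive = subscale_mean(response, POSITIVE_ITEMS)
    negative = subscale_mean(response, NEGATIVE_ITEMS)
    # Assumption: no balance is computed if either subscale is entirely missing.
    balance = None if positive is None or negative is None else positive - negative
    return positive, negative, balance


def summarise(responses):
    """Mean affect balance and share of respondents with net positive affect."""
    balances = [b for _, _, b in map(affect_scores, responses) if b is not None]
    return {
        "mean_balance": fmean(balances),
        "share_net_positive": sum(b > 0 for b in balances) / len(balances),
    }


# Hypothetical respondents; None marks a missing answer.
sample = [
    {"enjoyment": 8, "calm": 6, "worry": 3, "sadness": 1},
    {"enjoyment": 4, "calm": None, "worry": 6, "sadness": 4},
    {"enjoyment": 9, "calm": 7, "worry": 5, "sadness": 3},
]

print(summarise(sample))
```

The sketch also shows how a balance score can hide differences: the third respondent (positive 8, negative 4) has the same balance of 4 as a respondent scoring 7 on positive and 3 on negative affect, even though both of the underlying subtotals differ.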

  • Eudaimonia: Most of the literature regards eudaimonia as a multidimensional construct (e.g. Huppert and So, 2011; Ryff, 1989; Ryan and Deci, 2001), and therefore summarising across all items on a multi-item scale again risks some data loss. For detailed analysis, it may be important to examine each sub-component of eudaimonia separately, at least initially. Nonetheless, for the purposes of monitoring well-being, if positive correlations are found between each of the sub-dimensions, it may be appropriate to sum across items.13 The first option is to take the mean average value of all responses, omitting missing values. Alternatively, a threshold-based approach has been proposed by Huppert and So (2011; see Box 4.3), which categorises respondents according to whether they meet the criteria for “flourishing”. The “flourishing” construct may offer a powerful communicative device. However, partly because it is based on groups of items with different numbers and different response categories, Huppert and So’s operational definition of “flourishing” ends up being quite complex (with different thresholds being applied to differentially distributed data, and different items grouped according to various subscales assumed to be present). As noted earlier, the present difficulty with threshold-based measures is that there is little consensus on where the meaningful cut-off points lie. Further research is therefore needed before this approach can be regarded as preferable to reporting mean average scores.

  • Domain satisfaction: Questions about satisfaction with individual domains of life can be meaningful as stand-alone measures, and may be particularly useful for policy-makers seeking specific information on the effects of a particular policy intervention. However, some sets of domain-specific questions have been designed with a view to creating a composite measure of life evaluation overall, by summing responses across each of the domains (e.g. the Australian Personal Wellbeing Index - International Wellbeing Group, 2006, in Module E, Chapter 3). This overall approach requires making strong assumptions about the weights to apply to each life domain (as well as the universality with which those weights apply across the population) along with some judgements about which domains of life are relevant to subjective well-being overall. In the case of the Personal Wellbeing Index, domains have been selected as the most parsimonious list for capturing “satisfaction with life as a whole”, and equal weights are adopted for each domain, in recognition of the fact that empirically-derived weights may not generalise across data sets. These assumptions notwithstanding, composite measures of domain satisfaction may offer a more rounded and potentially more reliable picture of life “as a whole”, as respondents are encouraged to consider a variety of different aspects of life when forming their answers.
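A composite of domain satisfaction scores, with equal weights by default and missing answers omitted, might be computed along the following lines. This is a sketch only: the domain names are illustrative placeholders rather than the item list of any particular index, and the function name `composite_score`, the 0-10 scale and the example weights are assumptions.

```python
def composite_score(domain_scores, weights=None):
    """Weighted mean of the answered domain scores; None if nothing was answered.

    With weights=None every answered domain counts equally. Explicit weights
    are renormalised over the answered domains only.
    """
    answered = {d: s for d, s in domain_scores.items() if s is not None}
    if not answered:
        return None
    if weights is None:
        weights = {domain: 1.0 for domain in answered}
    total_weight = sum(weights[domain] for domain in answered)
    return sum(weights[d] * s for d, s in answered.items()) / total_weight


# Hypothetical respondent with 0-10 satisfaction ratings for four
# illustrative domains.
respondent = {
    "standard_of_living": 8,
    "health": 6,
    "relationships": 9,
    "safety": 7,
}

# Equal weights, as in the equal-weighting approach described above.
print(composite_score(respondent))

# Hypothetical unequal weights, to show how sensitive the composite is
# to the weighting assumption.
double_health = {"standard_of_living": 1.0, "health": 2.0,
                 "relationships": 1.0, "safety": 1.0}
print(composite_score(respondent, double_health))
```

For this invented respondent the equal-weight composite is 7.5, and doubling the weight on health lowers it to 7.2, which illustrates why the choice of weights is described above as a strong assumption.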
