Reporting central tendency and level

The most fundamental information to report with respect to subjective well-being is the level of the outcome. This can be thought of as addressing the issue of “how high or low is the level of subjective well-being in the population under consideration?”. There are three main approaches to describing the level of either single-item or summed multi-item aggregate measures. First, the frequency of responses can be described by category: this involves presenting the proportion of the population that select each response category of the subjective well-being scale used. Second, the data can be summarised in relation to one or more thresholds. This involves reporting the proportion of the population with a level of subjective well-being above or below a particular threshold level. Finally, the data can be summarised via some measure of central tendency, such as the mean, median or mode. Each of these three approaches has its own strengths and weaknesses.

Reporting the proportion of respondents selecting each response category is the method that requires the data producer to make the fewest decisions about presentation. Such an approach has some significant strengths with respect to information on subjective well-being. Because the entire distribution is described, no information is lost. Also, a presentation by category respects the ordinal nature of subjective well-being data6 and requires no assumptions about the differences among ordinal categories (i.e. there is no assumption that the difference between a 3 and a 4 is the same as that between a 7 and an 8).

However, presenting the whole distribution of responses for each measure also has significant drawbacks. In particular, for a non-specialist audience it is difficult to directly compare two distributions of this sort and reach judgements about which represents a higher or lower “level” of well-being, although non-parametric statistical tests are available for these purposes. While reporting the whole distribution may be a viable strategy where the number of response categories is relatively limited (e.g. the example shown in Box 4.1), as the number of categories increases it becomes more difficult to reach overall judgements from purely descriptive data.
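As a minimal sketch of reporting by response category, the snippet below computes the percentage of respondents selecting each category. The function name `category_shares` and the sample responses are hypothetical, chosen only for illustration; no survey weights are applied, which a real statistical release would normally require.

```python
from collections import Counter


def category_shares(responses, scale):
    """Return the percentage of respondents selecting each category in `scale`."""
    counts = Counter(responses)
    n = len(responses)
    # Iterate over the full scale so that unused categories are reported as 0%.
    return {category: 100 * counts[category] / n for category in scale}


# Hypothetical responses on a five-point scale (1 = lowest, 5 = highest).
responses = [4, 5, 3, 4, 2, 4, 5, 1, 4, 3]
shares = category_shares(responses, scale=range(1, 6))

for category, pct in shares.items():
    print(f"category {category}: {pct:.0f}%")
```

Because every category is reported, no information about the distribution is lost and no assumption is made about the distance between categories.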

Box 4.1. Reporting on the proportion of respondents by response category

Statistics New Zealand publishes a number of measures of subjective well-being in the statistical releases for the biennial New Zealand General Social Survey. These include overall life satisfaction and satisfaction with particular aspects of life, namely financial satisfaction and a subjective assessment of health status. In all cases a five-point labelled Likert scale is used for responding to the questions. Although such a measure is sub-optimal in many respects, it lends itself well to being presented as a proportion of respondents by response category (Figure 4.1).

Figure 4.1. Reporting the proportion of respondents selecting each response category

Source: Statistics New Zealand, New Zealand General Social Survey.

One way to manage a large number of scale responses is to report on the proportion of responses falling above or below a given threshold, or set of thresholds. For example, responses can be reported as the percentage of the sample falling above or below a certain cut-off point, or banded into “high”, “medium” and “low” categories (Box 4.2). Threshold descriptions of the data can be grasped quickly, providing an anchor for interpretation and offering a way of communicating something about the distribution of the data with a single figure. The use of thresholds is also consistent with the ordinal nature of much subjective well-being data, as it requires no assumptions about the cardinality of scale responses.

Box 4.2. Output presentation examples - threshold-based measures

The Gallup-Healthways Life Evaluation Index classifies respondents as “thriving”, “struggling”, or “suffering”, according to how they rate their current and future lives (five years from now) on the Cantril Ladder scale with steps numbered from 0 to 10, where “0” represents the worst possible life and “10” represents the best possible life. “Thriving” respondents are those who evaluate their current state as a “7” or higher and their future state as “8” or higher, while “suffering” respondents give a rating of “4” or lower on both evaluations. All other respondents are classified as “struggling”. Table 4.2 shows the shares of respondents classified as thriving, struggling and suffering in EU countries.

Table 4.2. Gallup data on thriving, struggling and suffering in the EU (sorted by percentage suffering)

| Country | % thriving | % struggling | % suffering | % thriving minus % suffering (pct. pts) |
|---|---|---|---|---|
| Bulgaria | 5 | 50 | 45 | -40 |
| Romania | 18 | 54 | 28 | -10 |
| Hungary | 15 | 57 | 28 | -13 |
| Greece | 16 | 60 | 25 | -9 |
| Latvia | 16 | 61 | 23 | -7 |
| Portugal | 14 | 65 | 22 | -8 |
| Estonia | 24 | 60 | 17 | 7 |
| Poland | 23 | 60 | 17 | 6 |
| Lithuania | 23 | 57 | 16 | 7 |
| Slovenia | 32 | 53 | 14 | 18 |
| Germany | 42 | 52 | 6 | 36 |
| Czech Republic | 34 | 53 | 13 | 21 |
| Slovak Republic | 27 | 61 | 12 | 15 |
| Malta | 34 | 55 | 11 | 23 |
| Spain | 39 | 54 | 7 | 32 |
| Cyprus | 44 | 49 | 7 | 37 |
| Italy | 23 | 71 | 6 | 17 |
| United Kingdom | 52 | 44 | 6 | 46 |
| Ireland | 54 | 43 | 4 | 50 |
| France | 46 | 50 | 4 | 42 |
| Austria | 59 | 38 | 3 | 56 |
| Finland | 64 | 34 | 3 | 61 |
| Denmark | 74 | 24 | 2 | 72 |
| Luxembourg | 45 | 54 | 1 | 44 |
| Netherlands | 66 | 33 | 1 | 65 |

Note: Data collected between March and June 2011. Data unavailable for Sweden and Belgium at time of publishing.

Source: Gallup World web article by Anna Manchin, 14 December 2011, “More suffering than thriving in some EU countries”, www.gallup.com/poll/151544/Suffering-Thriving-Countries.aspx.

Change in subjective well-being over time can also be presented relative to a given threshold (Figure 4.2).

Figure 4.2. Share of the French population classified as “thriving”, “struggling” and “suffering”

Source: Gallup World web article by Anna Manchin, 4 May 2012, “French Adults’ Life Ratings Sink in 2012”, www.gallup.com/poll/154487/French-Adults-Life-Ratings-Sink-2012.aspx.
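The three-way classification described in Box 4.2 can be sketched as follows. This is an illustration of the stated cut-offs only, not Gallup's own implementation; the function name `classify` and the sample ratings are hypothetical.

```python
from collections import Counter


def classify(current, future):
    """Classify one respondent from two 0-10 Cantril Ladder ratings."""
    if current >= 7 and future >= 8:
        return "thriving"
    if current <= 4 and future <= 4:
        return "suffering"
    return "struggling"


# Hypothetical (current life, life in five years) ratings.
ratings = [(8, 9), (7, 8), (5, 6), (3, 4), (7, 7), (4, 6), (2, 2), (9, 10)]

groups = Counter(classify(current, future) for current, future in ratings)
shares = {
    group: 100 * groups[group] / len(ratings)
    for group in ("thriving", "struggling", "suffering")
}
# Net figure of the kind shown in the last column of Table 4.2.
net = shares["thriving"] - shares["suffering"]

print(shares, net)
```

Note that a respondent with a high current rating but a future rating below 8, such as (7, 7), falls into “struggling”, since both conditions must hold for “thriving”.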

The downsides of threshold measures include losing some of the richness of the data,7 and the risk of encouraging a distorted emphasis on shifting people from just below to just above a threshold. This is a particular risk if only one threshold (e.g. “6 and above”) is used, because it may be important for policy-makers in particular to understand what characterises communities at both the high and low ends of the subjective well-being spectrum. Although thresholds have the potential to be more sensitive to change when carefully selected around the area of greatest movement on the scale, there is a considerable risk that a threshold positioned in the wrong part of the scale could mask important changes in the distribution of the data. For example, if the risk of clinically significant mental health problems is greatest for individuals scoring 5 or less on a 0-10 life evaluation measure, setting a threshold around 7 could lead to a failure to identify changes that could have significant consequences for policy. In addition, reporting based on thresholds runs the risk of presenting two very similar distributions as quite different, or vice versa. For example, for some countries the distribution of subjective well-being is bimodal, while for others there is a single mode. Depending on where a threshold is set, two such distributions might be presented as very different, or essentially the same. The central difficulty, therefore, lies in identifying meaningful threshold points that have real-world validity.
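To illustrate how a threshold placed in the wrong part of the scale can mask change, the sketch below compares two hypothetical survey waves on a 0-10 scale. The data and function names are invented for illustration; the cut-offs of 7 and 5 simply mirror the example in the paragraph above.

```python
def share_at_or_above(responses, cutoff):
    """Percentage of responses at or above `cutoff`."""
    return 100 * sum(r >= cutoff for r in responses) / len(responses)


def share_at_or_below(responses, cutoff):
    """Percentage of responses at or below `cutoff`."""
    return 100 * sum(r <= cutoff for r in responses) / len(responses)


# Hypothetical waves: the lower half of the distribution deteriorates in
# wave 2, while the upper part is unchanged.
wave_1 = [3, 5, 6, 6, 6, 6, 7, 8, 8, 9]
wave_2 = [2, 3, 4, 5, 5, 6, 7, 8, 8, 9]

for name, wave in (("wave 1", wave_1), ("wave 2", wave_2)):
    print(
        f"{name}: {share_at_or_above(wave, 7):.0f}% score 7 or more, "
        f"{share_at_or_below(wave, 5):.0f}% score 5 or less"
    )
```

In this constructed case a single “7 and above” threshold reports no change between waves, even though the share scoring 5 or less rises from 20% to 50%.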

Thresholds can be set through examining the underlying distribution of the data and identifying obvious tipping points, but this data-driven approach limits both meaningful interpretation (what is the real-world meaning of a data cliff?) and comparability among groups with different data distributions, whether within or between countries. A more systematic approach may be to adopt something similar to a “relative poverty line”, whereby individuals falling, for example, below half of the median value on a scale are classified as faring badly. This capitalises on thresholds’ ability to convey distributional characteristics, but has the downside of conveying relatively little about the average level, which is essential for both group and international comparisons.8 It also remains an essentially arbitrary method for identifying a threshold. The final option would be to select an absolute scale value below which individuals demonstrate a variety of negative outcomes (and an upper bound associated with particularly positive outcomes), based on the available empirical evidence. This would at least give the threshold some real-world meaning.
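A relative threshold of the “half of the median” kind mentioned above can be sketched as follows. The two groups are hypothetical and the helper name `share_below_half_median` is an assumption made for this example; the median is computed with Python's standard library.

```python
from statistics import mean, median


def share_below_half_median(responses):
    """Return (cut-off, percentage of responses strictly below the cut-off)."""
    cutoff = median(responses) / 2
    share = 100 * sum(r < cutoff for r in responses) / len(responses)
    return cutoff, share


# Two hypothetical groups with very different average levels.
group_a = [1, 2, 3, 6, 6, 7, 7, 8, 8, 9]
group_b = [0, 1, 1, 3, 3, 4, 4, 4, 5, 5]

for name, group in (("A", group_a), ("B", group_b)):
    cutoff, share = share_below_half_median(group)
    print(f"group {name}: mean {mean(group):.1f}, cut-off {cutoff}, {share:.0f}% below")
```

Both groups show 30% of respondents below their own relative line, although their means differ substantially (5.7 versus 3.0). This is the sense in which a relative threshold conveys distributional information but relatively little about the average level.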

Blanton and Jaccard (2006) make a strong case for linking psychological metrics to meaningful real-world events, highlighting the risk of assigning individuals to “high”, “medium” and “low” categories without justifying or evidencing what these categories mean in practice. In particular, they note the conceptual and practical problems associated with the intuitively appealing practice of “norming”, i.e. setting threshold values based on the proportion of the sample falling above or below that threshold. For example, in an obesity reduction programme, if an individual’s weight loss result was described as “high” because relative to others in the group they lost more weight, the clinical significance of the finding remains obscured: it is possible that everyone in the group lost a clinically significant amount of weight, or that no-one in the group did. In both of these scenarios, what matters is not how the individual fares relative to the rest of the sample, but how their weight loss is likely to relate to other health outcomes. There is a clear analogy here with both relative poverty lines and international comparisons of subjective well-being: what would be categorised as “high” life satisfaction by normative standards in Denmark will be quite different to “high” life satisfaction according to normative standards in Togo, making these two categorisations impossible to compare. This emphasises the challenges associated with setting suitable thresholds and cautions against emphasising threshold-based measures too strongly in data releases. Given the wide range of potential uses of the data, a wide range of thresholds may also be relevant to policy-makers and others.9

Summary statistics of central tendency provide a useful way of presenting and comparing the level of subjective well-being in a single number. The most commonly used measures of central tendency are the mean, the mode and the median. However, because the number of scale categories is limited (typically no more than the 11 points of a 0-10 scale), the median and modal values may lack sensitivity to changes in subjective well-being over time or to differences between groups. The mean is therefore generally more useful as a summary statistic of the level of subjective well-being.
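The lower sensitivity of the median and mode on a short scale can be seen in a small constructed example. The two samples below are hypothetical, and treating the 0-10 responses as numbers in order to take a mean is itself the cardinality assumption discussed in the next paragraph.

```python
from statistics import mean, median, mode

# Hypothetical 0-10 life evaluation responses for the same group at two points in time.
before = [5, 6, 7, 7, 7, 7, 8, 8, 9, 10]
after = [6, 7, 7, 7, 7, 7, 8, 9, 9, 10]

for name, sample in (("before", before), ("after", after)):
    print(
        f"{name}: mean {mean(sample):.1f}, "
        f"median {median(sample):.1f}, mode {mode(sample)}"
    )
```

Three respondents move up by one point each, which raises the mean from 7.4 to 7.7, while the median and the mode both stay at 7.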

Although the mean provides a good summary measure of the level of subjective well-being, it has shortcomings. First, the use of the mean requires treating the data from which it is calculated as cardinal. Although most subjective measures of well-being are assumed to be ordinal, rather than cardinal,10 evidence suggests that treating them as if they were cardinal in subsequent correlation-based analysis does not lead to significant biases: the practice is indeed common in the analysis of subjective well-being data, and there appear to be few differences between the conclusions of research based on parametric and nonparametric analyses (Ferrer-i-Carbonell and Frijters, 2004; Frey and Stutzer, 2000; Diener and Tov, 2012). That said, Diener and Tov also note that when it comes to simpler analyses, such as comparisons of mean scores, ordinal scales that have been adjusted for interval scaling using Item Response Theory can produce different results to unadjusted measures (p. 145). Second, the mean can be strongly affected by outliers and provides no information on the distribution of outcomes. Both of these issues therefore highlight the importance of complementing the mean with information on the distribution of data.

Distribution

It is also important to present information on the distribution of responses across the different response categories. If the primary way of presenting the data is by reporting the proportion of responses falling in each response category, the need for separate measures of distribution is less important. If, however, reporting is based on thresholds or summary statistics of central tendency, specific measures of distribution are important. The choice of distributional measure will depend partly on whether the data is treated as ordinal or cardinal.

When cardinality is assumed, it is possible to use summary statistics of distribution such as the standard deviation or the Gini coefficient. However, both the Gini coefficient and the standard deviation are based on calculations that are unlikely to hold much meaning for the general public, and may therefore be less effective as tools for public communication. The Gini in particular also perhaps has less intuitive meaning for subjective well-being than it does for its more traditional applications to income and wealth.11 This means that other measures of dispersion, such as the interquartile range (i.e. the difference between individuals at the 25th percentile and individuals at the 75th percentile of the distribution), or the point difference between the 90th and the 10th percentile (Box 4.3), may be preferred in simple data releases. Where space allows, graphical illustrations of distribution are likely to be the most intuitive way to represent distributions for non-specialist audiences, although such graphs can be difficult to compare in the absence of accompanying summary statistics.
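The dispersion measures named above can be computed as in the following sketch, which assumes that responses are treated as cardinal. The sample data and the helper names `gini` and `dispersion_summary` are hypothetical. The Gini coefficient is calculated here as the mean absolute difference between all pairs of responses divided by twice the mean, and percentiles use the "inclusive" interpolation method of Python's `statistics.quantiles`; other percentile conventions would give slightly different values.

```python
from statistics import mean, pstdev, quantiles


def gini(values):
    """Gini coefficient: mean absolute pairwise difference over twice the mean.

    Assumes non-negative values with a mean above zero.
    """
    n = len(values)
    total_abs_diff = sum(abs(x - y) for x in values for y in values)
    return total_abs_diff / (2 * n * n * mean(values))


def dispersion_summary(values):
    """Return several dispersion measures for responses treated as cardinal."""
    # quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        "sd": pstdev(values),
        "gini": gini(values),
        "iqr": cuts[74] - cuts[24],
        "p90_p10": cuts[89] - cuts[9],
    }


# Hypothetical 0-10 responses.
responses = [2, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10]
summary = dispersion_summary(responses)

for measure, value in summary.items():
    print(f"{measure}: {value:.2f}")
```

The interquartile range and the 90/10 difference are expressed in scale points, which is part of why they may be easier to communicate than the standard deviation or the Gini coefficient.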

 