# Methods for analysis: Tests of association

The most appropriate method of analysis depends largely on the type of data collected, the method of collection and the nature of the research question. The simplest test of the strength of a relationship between two variables is a bivariate correlation. The Pearson product-moment correlation coefficient can be calculated when the data are assumed to be normally distributed and the expected relationship between them is linear; Spearman’s rank correlation and other non-parametric tests are available for ordinal data and for relationships that are monotonic but not linear. Partial correlation enables examination of the relationship between two variables while removing the effect of one or more other variables. Correlations indicate the possible existence of a predictive relationship between two variables, but they do not imply causation.
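The three correlation measures can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, Spearman's coefficient is computed as the Pearson correlation of (tie-averaged) ranks, and partial correlation is computed as the correlation between the residuals of each variable after a linear regression on the control variables. The simulated variables (`health`, `income`, `swb`) are invented for illustration.

```python
import numpy as np


def pearson(x, y):
    """Pearson product-moment correlation between two 1-D arrays."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def ranks(a):
    """1-based ranks; tied values share the mean of their ranks."""
    a = np.asarray(a, dtype=float)
    r = np.empty(len(a))
    r[np.argsort(a, kind="mergesort")] = np.arange(1, len(a) + 1)
    for v in np.unique(a):
        mask = a == v
        r[mask] = r[mask].mean()
    return r


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_corr(x, y, controls):
    """Correlation of x and y after linearly removing the control variable(s)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.column_stack([np.ones(len(x)), controls])  # intercept + controls
    bx, *_ = np.linalg.lstsq(z, x, rcond=None)
    by, *_ = np.linalg.lstsq(z, y, rcond=None)
    return pearson(x - z @ bx, y - z @ by)


# Hypothetical data: health drives both income and well-being;
# income has no direct effect on well-being.
rng = np.random.default_rng(0)
health = rng.normal(size=500)
income = 0.6 * health + rng.normal(size=500)
swb = 0.5 * health + rng.normal(size=500)

print(pearson(income, swb))               # positive, via the shared driver
print(partial_corr(income, swb, health))  # near zero once health is removed
print(spearman(income, income ** 3))      # monotonic but non-linear: rank correlation is 1
```

The example also shows why correlation does not imply causation: income and well-being are correlated here only because both depend on health.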

For a more thorough examination of the drivers of subjective well-being in cross-sectional, international and longitudinal studies, regression analysis is widely adopted. Regression is a correlation-based statistical technique that examines how well a set of explanatory or *independent variables* can predict a given *dependent variable*, i.e. the chosen subjective well-being measure. Regression is particularly suited to complex real-life problems because it allows the impact of several independent variables to be assessed simultaneously in one model, and it can tolerate independent variables being correlated with one another. However, the “best” regression solution (in terms of variance explained per independent variable) is produced when each independent variable is strongly correlated with the outcome variable but uncorrelated with other variables, whether these other variables are included in or excluded from the model. If two correlated independent variables are both included in the same regression model, their relationship with the dependent variable may be obscured (Tabachnick and Fidell, 2001). Conversely, if an included independent variable is correlated with an excluded variable that has a causal influence on the outcome, the included variable will be wrongly credited with explanatory power that is really due to the excluded variable (a difficulty commonly described as the “omitted variable problem”).
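Both problems can be illustrated with a small simulation, assuming a hypothetical data-generating process in which well-being depends on income and on health, and income is itself correlated with health. Dropping health from the model inflates the income coefficient (the omitted variable problem). The variance inflation factor (VIF), a common collinearity diagnostic that is not named in the text above, is added here as one way to quantify how strongly the included predictors overlap. All variable names and coefficients are invented.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients; element 0 is the intercept, then one per column of X."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def vif(X):
    """Variance inflation factor 1 / (1 - R^2_j) for each column j of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        fitted = np.column_stack([np.ones(len(X)), others]) @ ols(others, X[:, j])
        r2 = 1 - (X[:, j] - fitted).var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)


# Hypothetical process: true income effect 0.2, true health effect 0.5.
rng = np.random.default_rng(1)
n = 5000
health = rng.normal(size=n)
income = 0.7 * health + rng.normal(size=n)  # correlated with health
swb = 0.2 * income + 0.5 * health + rng.normal(size=n)

full = ols(np.column_stack([income, health]), swb)  # both drivers included
short = ols(income, swb)                            # health omitted

print("income coefficient, full model:  ", full[1])   # close to the true 0.2
print("income coefficient, health omitted:", short[1])  # biased upwards
print("VIF (income, health):", vif(np.column_stack([income, health])))
```

In the short model, income absorbs part of the effect of health because the two are correlated; only the full model recovers the income coefficient used to generate the data.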

Given the ordinal nature of subjective well-being measures, linear regression models (estimated by ordinary least squares) are theoretically less appropriate than methods designed to analyse ordinal outcomes (e.g. ordered probit), because they treat the response categories as equally spaced cardinal values. However, Ferrer-i-Carbonell and Frijters (2004) compared both methods in relation to the drivers of subjective well-being and concluded that, in practice, there are few differences between estimates based on ordinary least squares and those based on probit methods. Frey and Stutzer (2000) reported similar results, and Diener and Tov (2012), reviewing the literature overall, reach the same conclusion. Because ordinary least squares outputs are more straightforward to interpret, these are often the results reported. It nonetheless remains advisable to estimate both probit and ordinary least squares models in the course of analysis, to test for (and report on) any major differences between the results.
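A comparison of the two approaches can be sketched on simulated data. The sketch below assumes a hypothetical 5-point rating generated by cutting a latent normal index at fixed thresholds, fits an ordered probit by maximum likelihood with SciPy, and fits OLS to the raw category codes. The two sets of coefficients are on different scales (latent units versus scale points), so the comparison that matters is their relative size. This is a hand-written illustration, not a production estimator; packaged implementations exist (for example in statsmodels).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def fit_ordered_probit(X, y, n_cat):
    """Maximum-likelihood ordered probit for y in {0, ..., n_cat - 1}.

    X has no intercept column: the cut points play that role.
    Returns (beta, cut_points).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    k = X.shape[1]

    def unpack(params):
        beta = params[:k]
        # First cut point is free; later ones add exp(.) > 0 so they stay ordered.
        steps = np.concatenate([params[k:k + 1], np.exp(params[k + 1:])])
        return beta, np.cumsum(steps)

    def neg_log_lik(params):
        beta, cuts = unpack(params)
        xb = X @ beta
        edges = np.concatenate([[-np.inf], cuts, [np.inf]])
        # P(y = j) = Phi(cut_j - xb) - Phi(cut_{j-1} - xb)
        prob = norm.cdf(edges[y + 1] - xb) - norm.cdf(edges[y] - xb)
        return -np.sum(np.log(np.clip(prob, 1e-12, None)))

    start = np.concatenate([np.zeros(k), [-1.0], np.zeros(n_cat - 2)])
    result = minimize(neg_log_lik, start, method="BFGS")
    return unpack(result.x)


def fit_ols(X, y):
    """OLS slopes (the intercept is estimated but dropped from the output)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


# Hypothetical 5-point rating driven by a latent normal index.
rng = np.random.default_rng(2)
n = 5000
X = rng.normal(size=(n, 2))
latent = X @ np.array([0.8, 0.4]) + rng.normal(size=n)
y = np.digitize(latent, [-1.5, -0.5, 0.5, 1.5])  # categories 0..4

beta_probit, cuts = fit_ordered_probit(X, y, n_cat=5)
beta_ols = fit_ols(X, y.astype(float))

print("ordered probit:", beta_probit, "ratio", beta_probit[0] / beta_probit[1])
print("OLS:           ", beta_ols, "ratio", beta_ols[0] / beta_ols[1])
```

In this simulated setting both methods rank the two predictors the same way and give a similar ratio between their coefficients, which is the kind of agreement the studies above report; with real data the check is worth running rather than assuming.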

Where curvilinear relationships are expected, as with both income and age in predicting subjective well-being, transformed terms are typically used in regression models: a squared term for age (entered alongside age itself, since the expected relationship is U-shaped) and the logarithm of income (since the expected relationship is concave, with well-being levelling off as income rises).
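A minimal sketch of such a specification, assuming a hypothetical data-generating process in which well-being is U-shaped in age with its minimum at 50 and rises linearly in log income. The model includes age, age squared and log income; the age at which the fitted U-shape bottoms out follows from setting the derivative of $\beta_1 \text{age} + \beta_2 \text{age}^2$ to zero, giving $-\beta_1 / (2\beta_2)$.

```python
import numpy as np

# Hypothetical data: U-shape in age (minimum at 50) and a log-income effect.
rng = np.random.default_rng(3)
n = 4000
age = rng.uniform(18, 85, size=n)
income = rng.lognormal(mean=10, sigma=0.6, size=n)
swb = 0.002 * (age - 50) ** 2 + 0.5 * np.log(income) + rng.normal(size=n)

# Columns: intercept, age, age squared, log income.
X = np.column_stack([np.ones(n), age, age ** 2, np.log(income)])
b, *_ = np.linalg.lstsq(X, swb, rcond=None)

# Turning point of the quadratic age profile: d/d(age) = b1 + 2 * b2 * age = 0.
turning_age = -b[1] / (2 * b[2])
print("age coefficients:", b[1], b[2])
print("estimated age at minimum well-being:", turning_age)
print("log-income coefficient:", b[3])
```

With the log specification, the income coefficient applies to proportional changes: a 10% rise in income is associated with roughly `b[3] * np.log(1.1)` more well-being at any income level, so each additional unit of income matters less the richer the respondent already is.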

Other analytical options that may be used to investigate the drivers of subjective well-being include structural equation modelling, also known as causal modelling, causal analysis, analysis of covariance structures or path analysis. Like regression, structural equation modelling examines a set of relationships between one or more independent variables and a dependent variable (or sometimes several dependent variables). Rather than using the raw measured variables as they are observed, however, it combines factor analysis^{33} with regression, performing multiple regression analysis on factors. The key advantage of this approach is that it enables complex pathways to be tested simultaneously, and because it focuses on relationships among underlying factors (rather than measured variables), the estimated relationships are regarded as “free” of measurement error.^{34} Detailed discussion of structural equation modelling is beyond the scope of this chapter, but because it also involves association- and regression-based techniques, some of the issues raised below remain relevant.
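Why measurement error matters can be illustrated without fitting a full structural equation model. The simulation below is a hedged sketch under simple assumptions, not an SEM implementation: a hypothetical latent factor drives well-being, and three survey items each measure that factor with independent noise. Regressing on a single noisy item attenuates the slope by the item's reliability; averaging items raises reliability and reduces the bias; and dividing by a reliability estimated from the covariance between two items (a classical correction for attenuation) approximately recovers the true slope. A latent-variable model addresses the same problem by estimating the measurement part and the structural part jointly.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
latent = rng.normal(size=n)                # hypothetical latent factor, never observed
swb = 0.6 * latent + rng.normal(size=n)    # true slope on the latent factor is 0.6

# Three hypothetical items: latent factor plus independent unit-variance noise.
items = latent[:, None] + rng.normal(scale=1.0, size=(n, 3))


def slope(x, y):
    """Bivariate OLS slope of y on x."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))


single = slope(items[:, 0], swb)            # attenuated: reliability of one item is 1/2
averaged = slope(items.mean(axis=1), swb)   # less attenuated: reliability of the mean is 3/4

# Reliability of item 0 estimated from its covariance with a parallel item.
reliability = np.cov(items[:, 0], items[:, 1])[0, 1] / np.var(items[:, 0], ddof=1)
corrected = single / reliability            # correction for attenuation

print("single item:", single)
print("three-item mean:", averaged)
print("corrected for attenuation:", corrected)
```

The correction relies on the assumption that the items are parallel measures with independent errors; when that assumption is doubtful, the measurement model has to be estimated explicitly, which is what structural equation modelling does.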