General measurement issues
Sample surveys serve the purpose of estimating the value of certain parameters for a population of interest, e.g. the median wealth or the average mortgage debt of households, in a cost-effective way, i.e. by collecting data only from an appropriate subset of the population. When designing a survey, data producers must keep in mind that their goal is to achieve the best possible estimates for the outcome measures of interest, subject to a budget constraint (Figure 6.2).
While several possible metrics exist to define what “best” means in this context, they all rely on the same sequence of steps that support the desired estimates. First, an instrument is constructed to obtain the information sought, generally in the form of a questionnaire. Second, a random sample of the population, sometimes called a theoretical sample, is selected. Third, fieldwork activities (generally, attempts to contact or interview sampled households) are conducted and data from the final sample are obtained. Fourth, the data are processed and weights are constructed. Finally, the target estimates are computed. There may be various choices of method at each of these steps, and often such choices have different implications for aspects of the quality of the ultimate results and the overall cost of the work.
Figure 6.2. The role of data producers in sample surveys
The information provided by respondents, sometimes called raw data, may be inaccurate or incomplete. Data producers must work on correcting inaccuracies through data-editing and dealing with missing information, typically through imputation. The resulting data set is sometimes referred to as a validated or final data set.
The final sample differs from the theoretical sample when the former is affected by unit non-response, that is, by the failure to obtain an interview with the desired sample element. If unit non-response rates are high and/or concentrated in specific sectors of the population, the final sample might look quite different from the theoretical sample. Estimation weights must be computed for each observation in order to account for any disproportion in the initial selection probabilities of sample units, to adjust for differential propensities to unit non-response and to align the final sample composition with that of the target population.
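The three roles of estimation weights described above can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not part of the text: the function name, the dictionary keys (`selection_prob`, `response_cell`, `post_stratum`) and the choice of weighting-class adjustment followed by post-stratification are hypothetical, and real surveys often use more elaborate methods.

```python
from collections import defaultdict


def estimation_weights(sample, population_totals):
    """Return final weights for responding units (illustrative sketch).

    sample: list of dicts with hypothetical keys 'id', 'selection_prob',
        'responded', 'response_cell' and 'post_stratum'.
    population_totals: known population count for each post-stratum.
    """
    # Step 1: design weight = inverse of the initial selection probability.
    design = {u["id"]: 1.0 / u["selection_prob"] for u in sample}

    # Step 2: non-response adjustment. Within each response cell, respondents
    # also carry the design weight of the non-respondents.
    sampled_sum = defaultdict(float)
    responded_sum = defaultdict(float)
    for u in sample:
        sampled_sum[u["response_cell"]] += design[u["id"]]
        if u["responded"]:
            responded_sum[u["response_cell"]] += design[u["id"]]
    respondents = [u for u in sample if u["responded"]]
    adjusted = {
        u["id"]: design[u["id"]]
        * sampled_sum[u["response_cell"]] / responded_sum[u["response_cell"]]
        for u in respondents
    }

    # Step 3: post-stratification. Scale the weights so that they add up to
    # the known population total of each post-stratum.
    adjusted_sum = defaultdict(float)
    for u in respondents:
        adjusted_sum[u["post_stratum"]] += adjusted[u["id"]]
    return {
        u["id"]: adjusted[u["id"]]
        * population_totals[u["post_stratum"]] / adjusted_sum[u["post_stratum"]]
        for u in respondents
    }
```

Non-responding units receive no final weight; their share of the population is carried by respondents in the same response cell.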
The desired estimates are obtained by applying mathematical formulae called estimators to the final data and weights. Because its value depends on the sample drawn, each estimator has a statistical distribution. Survey designers generally aim at minimising a version of its mean square error, i.e. the sum of the square of its bias (the distance between the expected value of the estimator and the true population parameter) and its variance (a measure of the variation of the estimate that would be expected as a result of repeated execution of sampling and all other steps toward the construction of the estimate). In other words, the distribution of a good estimator is tightly centred around the true parameter. The key steps in designing any sample survey are summarised in Figure 6.2.
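In symbols, the decomposition of the mean square error just described can be written as follows. The notation is an assumption introduced here for illustration: \theta denotes the true population parameter and \hat{\theta} its estimator.

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{\theta}] - \theta\big)^2}_{\text{squared bias}}
  + \underbrace{\mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]}_{\text{variance}}
```

An unbiased estimator therefore has a mean square error equal to its variance, while a biased estimator can still be preferable if its variance is sufficiently smaller.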
The first input tends to come from researchers or policy makers, and it is typically expressed in general terms, e.g. “there is a need for more information on household wealth” or “it is urgent to know who the highly indebted individuals are”. Data producers need to translate this policy demand into a clearly defined set of key indicators: for example, median net wealth, average debt-to-income ratio, shares of indebted individuals by employment status, etc. Subsequently, categories must be defined and sequences of questions designed to obtain such information for individual sample elements (see Box 6.1 for an example). Very often, there may be a desire for relatively broad information that can be used to address research or policy questions that are unknown at the time a survey is
Box 6.1. Measuring household financial vulnerability
The subprime crisis that hit the United States and, subsequently, the rest of the world in 2007-08 was triggered by the inability of a cluster of low- and middle-income households to repay their mortgages. Events such as this, which depend on the concentration of a given phenomenon in a specific segment of a population, cannot be predicted based on aggregate statistics. An increasing trend in aggregate household debt, or even the average debt-to-income ratio, does not necessarily signal increasing systemic vulnerability; this could also emerge during periods of solid economic expansion.
Sample surveys provide a tool for estimating the probability of financial difficulties at the micro level and the possible economy-wide effects that they may trigger. They allow household budgets to be reconstructed individually, while also controlling for characteristics such as education and employment history, which help in determining earning potential. They give a fuller picture of each debtor's situation and default risk. For this reason, since the crisis policy makers have expressed a growing demand for survey-based statistics to assess financial vulnerability. Data producers are key to this in that they have to translate this generic demand into a set of target estimates, and then devise optimal strategies for the collection of data, the production of the estimates and the communication of the results. The questions and possible answers involved in this process can be sketched as follows:
• What is “financial vulnerability”? The idea is clearly related to the likelihood of incurring financial difficulties, but measurement requires a clear definition, both in terms of content and in terms of reference unit. In turn, this implies a number of choices. At the time of writing, no international standard existed for this concept, but several countries have defined it as a binary indicator, valued positively if the amount of debt-related payments (capital and interest, summed over all existing debts) at the household level exceeds a certain share of aggregate household income in a given year. Some data producers look only at mortgage debt, while others estimate vulnerability at the individual level. Fine-grained versions of the indicator may also be produced, taking into account the depth of vulnerability.
Once a definition has been decided upon, and assuming a survey framework already exists, target variables must be selected. What is the essential information set? Should it be complemented by auxiliary variables and, if so, which ones? In the case of the most widely adopted definition outlined above, households need to provide at least an estimate of each debt-related payment or set of payments over the course of the reference period, along with an estimate of income. It may also be useful to collect additional information on each debt, in terms of stock (e.g. outstanding principal), the incoming flows of funds (e.g. any refinancing during the year), interest rate, mode of collateralisation and so on. While these items are not strictly necessary to estimate vulnerability in terms of a ratio between outgoing flows and income, they are instrumental in giving a fuller representation of each household's debt situation, which might be of help to policy makers. Since a balance must be struck between respondent burden and information completeness, any additional variables that go beyond what is essential to the original request should be chosen parsimoniously and, if possible, through a bilateral clarification process between the data producers and policy makers.
A measurement strategy should then be determined for each of the target variables. In the following, we forego issues related to the measurement of income and focus on debt. Different types of households may recall debt-related information with varying degrees of difficulty: for example, those who operate under a strict budget constraint might be more aware of the exact amount of each payment, while affluent respondents might not be
equally attentive and might even fail to recall some outflows, such as small-amount payments for consumer durables debited automatically every month on a credit card or bank account. One possible strategy to improve accuracy entails an initial set of Yes/No filter questions, i.e. asking households whether they hold a certain type of debt (mortgage on primary residence, mortgage on other real estate, consumer credit for vehicles, consumer credit for other durables, credit card debt, bank overdraft, informal debts towards friends and relatives and so on). For each debt identified by a positive answer, details are then requested. Another strategy, used in some broader-scope surveys, consists in asking how each type of asset is or was financed, and then investigating details whenever debt is mentioned as a form of financing. Additional questions are then needed to cover loans that do not go directly toward a specific asset, including the reason why they were taken out. Compared to the former measurement strategy, this one has the advantage of giving a clearer picture of how households plan and carry out the acquisition of assets; however, it generally entails a larger response effort.
Data producers should also consider in advance whether respondents may need help in answering certain questions; if so, they should prepare cognitive aids for respondents, such as cards and glossaries, and cover their use in interviewer training sessions. For example, in the case of Yes/No questions covering different types of debt, it may be useful to provide interviewers with a standard definition of concepts such as revolving credit or bank overdraft.
Once the data have been collected, they must be checked, validated and, where necessary, subjected to imputation procedures before they are fit for the production of estimates. Choices have to be made on editing rules, treatment of outliers, and computation of variability in results. Generally speaking, these choices should be made beforehand for the whole survey, and not on a variable-by-variable basis, in order to achieve methodological consistency.
Finally, the results have to be presented to policy makers and, in some cases, to the general public. Population-level statistics, such as the total share of financially vulnerable households, should generally be accompanied by meaningful information on the distribution of the phenomenon. Depending on the variables available in the surveys and on any external information pointing to problematic population segments, breakdowns by age, gender, education level, household size, employment status and/or sector, etc., and any combination thereof, can be offered to users.
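As an illustration of the box as a whole, the binary indicator and its weighted breakdowns might be computed along the following lines. This is a sketch only: the 30% threshold, the treatment of non-positive income, the function names and the record layout (`weight`, `debt_payments`, `income`) are assumptions made here, not definitions taken from any standard.

```python
from collections import defaultdict


def is_vulnerable(debt_payments, income, threshold=0.30):
    """Binary indicator: debt service exceeds a share of household income.

    The threshold is a hypothetical value. Households with non-positive
    income are flagged whenever they have any debt payment (an assumption).
    """
    total_payments = sum(debt_payments)
    if income <= 0:
        return total_payments > 0
    return total_payments / income > threshold


def vulnerable_share(households, by=None, threshold=0.30):
    """Weighted share of vulnerable households, overall or by a breakdown key."""
    total_weight = defaultdict(float)
    vulnerable_weight = defaultdict(float)
    for h in households:
        group = h[by] if by else "all"
        total_weight[group] += h["weight"]
        if is_vulnerable(h["debt_payments"], h["income"], threshold):
            vulnerable_weight[group] += h["weight"]
    return {g: vulnerable_weight[g] / total_weight[g] for g in total_weight}


households = [
    {"weight": 100, "debt_payments": [6000, 1000], "income": 20000, "status": "employed"},
    {"weight": 300, "debt_payments": [2000], "income": 40000, "status": "employed"},
    {"weight": 100, "debt_payments": [], "income": 15000, "status": "retired"},
]
print(vulnerable_share(households))               # overall share
print(vulnerable_share(households, by="status"))  # breakdown by employment status
```

Passing a different `by` key (age class, household size and so on) gives the other breakdowns mentioned above, provided the variable is present in the data.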
constructed. In the case of wealth measurement, this desire argues for binding the approach to question design as closely as feasible to a general accounting framework, such as that described in Chapter 3.
When establishing a survey that will be carried out regularly, as opposed to a one-off study, data producers should choose the frequency based on the characteristics of the target concept. An additional consideration is whether a repeated survey should be executed as a repeated cross-section or as a sequence of interviews with a fixed panel, possibly supplemented with additional elements to compensate for population changes since the formation of an initial panel. Repeated cross-sections can provide good estimates of changes in characteristics of population groups over time. In contrast, a panel (longitudinal) component may be desirable if changes over time at the level of individual households figure importantly in the desired estimates, or if other statistical concerns motivate repeated observation. Obtaining estimates that are representative both of a panel and of the population in periods after the initial panel formation typically requires supplementing the panel observations with elements that were either not present at the time of the panel formation or were present but in a different proportion in the population.
The sample should always be selected according to a probabilistic scheme, i.e. each unit in the population should have a known ex ante probability of being selected. Only in this case will the survey estimates have good statistical properties. Because such properties are undefined for non-probabilistic samples, it is usually not possible to describe scientifically what estimates based on such samples represent, or to provide meaningful measures of precision for those estimates.
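A minimal sketch of a probabilistic scheme is given below, assuming simple random sampling without replacement within strata. The function name and the layout of the sampling frame are hypothetical; the point is only that each selected unit carries its known ex ante selection probability.

```python
import random


def stratified_sample(frame, n_per_stratum, seed=0):
    """Draw a stratified simple random sample.

    frame: list of (unit_id, stratum) pairs covering the whole population.
    n_per_stratum: number of units to select in each stratum.
    Returns a list of (unit_id, stratum, selection_prob) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    units_by_stratum = {}
    for unit_id, stratum in frame:
        units_by_stratum.setdefault(stratum, []).append(unit_id)

    selected = []
    for stratum, unit_ids in units_by_stratum.items():
        n = min(n_per_stratum[stratum], len(unit_ids))
        # Every unit in the stratum has the same known selection probability.
        selection_prob = n / len(unit_ids)
        for unit_id in rng.sample(unit_ids, n):
            selected.append((unit_id, stratum, selection_prob))
    return selected
```

Because the selection probabilities are known, their inverses can serve as initial weights, and the weights of the selected units add up to the population size in each stratum.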
A tolerable level of error for each of the key estimates should be agreed upon with the researchers or policy makers requesting the information, subject to any cost constraints. Total survey error comprises both sampling error and non-sampling error. Sampling error arises from making estimates on the basis of a sample, rather than on the entire population. Non-sampling error arises from non-response, conceptual error, reporting error and processing error. Once an error tolerance has been set, the minimum sample size compatible with it and with cost constraints must be computed, exploring various possibilities until an optimal sampling plan has been determined. Particular care should be taken in making realistic assumptions about the response process and the full range of survey costs. If a sample design cannot satisfy both the desired error tolerance and the budget constraint, the project might have to be reconsidered: narrowing the scope of the survey, for example, might be preferable to delivering a large quantity of inaccurate results.
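The trade-off between error tolerance and budget can be illustrated with a textbook calculation for the mean of a variable under simple random sampling. This is a sketch under strong assumptions that the text does not make: only sampling error is considered, the population standard deviation is taken as known, and the design effect and linear cost model are hypothetical simplifications.

```python
import math


def required_sample_size(std_dev, margin, population_size, z=1.96, design_effect=1.0):
    """Smallest sample size giving a margin of error of at most `margin`.

    Uses n0 = (z * S / e)^2 with a finite population correction, inflated
    by an assumed design effect for complex sample designs.
    """
    n0 = (z * std_dev / margin) ** 2
    n_srs = n0 / (1 + n0 / population_size)
    return math.ceil(n_srs * design_effect)


def fits_budget(n, cost_per_interview, fixed_cost, budget):
    """Check a simple linear cost model against the available budget."""
    return fixed_cost + n * cost_per_interview <= budget


n = required_sample_size(std_dev=50_000, margin=5_000, population_size=1_000_000)
print(n, fits_budget(n, cost_per_interview=100, fixed_cost=10_000, budget=50_000))
```

If `fits_budget` returns False, either the error tolerance has to be relaxed or the scope of the survey reconsidered, as discussed above.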
A main sample should be drawn, with a size equal to the target size, supplemented by a reserve sample large enough to substitute non-responding units based on reasonable assumptions on response rates. For example, if the target size of the sample is 1 000 households and a response rate of 50% is anticipated, the total sample should comprise 2 000 households. Some variations on this approach are dealt with later in this chapter.
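The arithmetic of this example can be written as a small helper. The function name and the returned layout are hypothetical, and the calculation assumes that the anticipated response rate applies uniformly to main and reserve units.

```python
import math


def sample_with_reserve(target_size, expected_response_rate):
    """Split the total sample into a main part and a reserve part."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    # Units to draw so that the expected number of interviews reaches the
    # target. Rounding first guards against floating-point noise.
    total = math.ceil(round(target_size / expected_response_rate, 9))
    return {"main": target_size, "reserve": total - target_size, "total": total}


print(sample_with_reserve(1000, 0.5))
# {'main': 1000, 'reserve': 1000, 'total': 2000}
```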
The quality of estimates starts with the quality of the raw data. Questionnaire design and implementation, interview mode, interviewer selection and training, economic incentives offered to participants, and real-time quality control methods are critical contributors to data quality, and each should be considered carefully.
Audit activities should be carried out both during the fieldwork phase and after its conclusion. In all cases, data producers should have a clear monitoring scheme covering contact activities, refusals, substitutions, the contents of completed interviews, and any data manipulation taking place prior to transmission to the agency sponsoring the survey. If data collection is not outsourced, a third-party auditor should be involved in the process. As a part of audit activities, a share of the sampled households should be re-contacted in order to verify the truthfulness of interviewer statements.
When the results are released, measures of variability should be published, accompanied by a non-technical explanation of what these measures mean. If a micro-level data set is released for research or public use, it should contain information that allows users to compute the variability of their own estimates.
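One common way of letting users compute variability is to release replicate weights alongside the main weights. The sketch below assumes bootstrap-type replicate weights, for which the variance is the average squared deviation of the replicate estimates from the full-sample estimate; this is an assumption made here for illustration, and other replication methods (jackknife, balanced repeated replication) use different scaling factors. The function names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of a variable."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_standard_error(values, main_weights, replicate_weights):
    """Estimate and standard error from bootstrap-type replicate weights.

    replicate_weights: list of weight vectors, one per replicate.
    """
    estimate = weighted_mean(values, main_weights)
    replicate_estimates = [weighted_mean(values, rw) for rw in replicate_weights]
    variance = sum((r - estimate) ** 2 for r in replicate_estimates) / len(replicate_estimates)
    return estimate, math.sqrt(variance)
```

The same pattern applies to any statistic: the estimator is recomputed once per replicate, so users do not need access to the details of the sample design.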
As an ethical requirement and sometimes a legal requirement, a clear programme for protecting the confidentiality of the data collected must be developed and implemented. In some cases, a plan must be put in place to further restrict the use of the data; for example, use of the data might be permitted only for non-commercial purposes.
A thorough and continuing programme of evaluation of all steps in the survey should be instituted. Systematic evaluation enables quality improvements as well as the detection of changes in the behaviour of the population or in the opportunities available to it.