Evaluation Research at the International Level

Chapter 1 discussed two aspects of evaluation research: data generation (through surveys, qualitative data, event data and industry data) and research designs (descriptive analyses, randomised controlled experiments and quasi-experimental designs) (see 1.3). This chapter discusses special issues and complexities in conducting international evaluation research to prepare for a critical analysis of the ways in which WWA might be able to complement such research.

Research Designs

Randomised controlled trials provide a means by which researchers can attempt to isolate the causal effect of variables of key interest. But they are very difficult to implement at the national level (Babor et al. 2010b; Hall 2018) and are an impractical way of making comparisons between countries. Quasi-experimental national designs are feasible when changes are made to policy or when circumstances independently affect markets for psychoactive substances (Babor et al. 2010a; 2010b). Yet their main limitation lies in their inability to control for the effects of extraneous factors. Additionally, when they are attempted at a national level, challenges arise if analyses are conducted retrospectively using data that are inadequate because they were collected for other purposes (Hall 2018).

Problems are multiplied when quasi-experimental studies are attempted at the international level - for example by attempting to assess the impact of a new policy in one country by comparing drug use there with that in other countries where the policy was not implemented. Among other things, the large number of variables on which the countries may differ complicates international quasi-experimental studies.

Consequently, the main method of evaluation undertaken at the international level is often a comparison of descriptive statistics on drug use between countries. It is important not to underestimate the value of comparative analyses. While descriptive comparisons are limited in their capacity to evaluate policy with confidence, there is no doubt that they influence policy agendas, provide examples of 'better performance' for policy formation, and potentially sway political decision making. As the UNODC and WHO data illustrated in the previous section, descriptive trend data provide a global picture of markets in psychoactive substances. Without these, international agencies would have no means of identifying global priorities, and national governments would lack metrics with which to compare and monitor the 'performance' of their country in terms of drug consumption prevalence, substance use-related harms and the like.

Data Generation

Every method of collecting data on psychoactive substances has strengths and weaknesses, as was discussed at the end of the last chapter. This includes WWA (see 2.5). But for international comparative analyses there are additional overarching problems. Data are not collected in some jurisdictions, or are collected using inadequate methods. This problem, which is discussed further below, is particularly acute in low- and lower-middle-income countries.

However, another enduring problem complicates comparisons between even high-income countries with well-established systems for monitoring psychoactive substances. As highlighted by Kilmer, Reuter and Giommoni's (2015) analysis, most survey data in most countries are not comparable because they are collected with different procedures, report different measures or sample different age groups.


In surveys, the main categories of procedures include face-to-face interviews, telephone surveys and mail surveys. There are further variations within each of these categories, for instance as to whether they make use of a computer, audio technology, a human interviewer, or - in the case of mail surveys - whether participants' completed mail surveys are collected from their residences or returned by the participants through the postal system (Kilmer, Reuter and Giommoni 2015).

Literature on 'social desirability bias' suggests that survey participants may (wittingly or unwittingly) report information that presents themselves in a positive light (e.g. Harrison and Hughes 1997). This phenomenon may operate even when participants are able to provide information anonymously on topics that are not particularly sensitive or embarrassing. With this background, it is not surprising that the results of surveys about drug use - which is a topic that can attract comparatively high social stigma - appear to differ according to the degree of anonymity they afford participants. It is also known that these procedures differ in their capacity to recruit samples of participants who are representative of the general population (see e.g. Decorte et al. 2009; Johnson 2015; Kilmer, Reuter and Giommoni 2015).

These issues mean that international differences between countries in the reported prevalence of drug use may partly be an artefact of the procedures employed to measure it. Recent empirical analyses seem to support this point. Giommoni, Reuter and Kilmer (2017) compared the national population surveys of the US, Canada, Europe and Australia. Prevalence estimates were adjusted in an attempt to mimic the use of a standard data collection procedure across all countries. The results suggested that survey procedures did affect cross-country comparisons to a degree.

Problems may be further compounded when the age range of survey participants differs between countries. Kilmer, Reuter and Giommoni (2015) argue that broader age ranges reduce the estimated prevalence of drug use within a population because they incorporate a greater proportion of participants who are unlikely to use drugs - namely those aged between 12 and 14 years and those over 64 years.
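The effect of the sampled age range on a headline prevalence figure can be illustrated with a minimal Python sketch. The age bands, population sizes and user counts below are invented purely for illustration and are not drawn from any survey; the function name `prevalence` is likewise hypothetical.

```python
# Hypothetical age bands: (population, number reporting past-year use).
# All figures are invented for illustration only.
AGE_BANDS = {
    "12-14": (1_000, 10),
    "15-64": (10_000, 1_000),
    "65+": (3_000, 30),
}


def prevalence(bands, included):
    """Return reported prevalence (%) pooled across the included age bands."""
    population = sum(bands[name][0] for name in included)
    users = sum(bands[name][1] for name in included)
    return 100 * users / population


# Same underlying population, two different sampling frames.
narrow = prevalence(AGE_BANDS, ["15-64"])
broad = prevalence(AGE_BANDS, ["12-14", "15-64", "65+"])

print(f"Ages 15-64 only: {narrow:.1f}%")
print(f"Ages 12 and over: {broad:.1f}%")
```

Under these assumed figures the narrower frame gives 10.0% and the broader frame about 7.4%, although drug use within each age band is identical in both calculations. Two countries with the same underlying behaviour could therefore report different prevalence simply because their surveys cover different ages.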

Trend data are critical in most comparisons of drug use between countries. Two issues impair the value of trend data. First, the frequency of sampling varies. For instance, between 2000 and 2010 the frequency of sampling in some high-income jurisdictions ranged from annual surveys (e.g. US, England and Wales), to every three (e.g. Germany, Australia), four (e.g. Netherlands) and five years (e.g. France) (Kilmer, Reuter and Giommoni 2015). This means that an examination of trends over one decade might have ten data points for one country and only two for another.

Second, countries alter their sampling procedures in ways that alter the apparent prevalence of drug use. For instance, Kilmer, Reuter and Giommoni (2015) suggest that a reduction in reported cannabis use in Germany between 2003 and 2006 from 6.9% to 4.7% may have reflected a change in the age of participants and the questions employed.

The problems in comparing national surveys are offset to some degree by using data from large surveys conducted using comparable methods in multiple countries. By taking careful measures to ensure that procedures are consistent across countries and by recruiting participants of the same age range, these surveys provide greater confidence in data used for comparisons. The features of leading international surveys are presented in Table 3.1.

Three of the four representative international surveys focus on young people and children, namely the European School Survey Project on Alcohol and Other Drugs (ESPAD), WHO Health Behaviour in School-Aged Children (HBSC) and International Self-Report Delinquency Study (ISRD). This age cohort is easier to recruit and an important one to study because of its vulnerability to impaired physical, psychological, social and educational wellbeing. Trends observed in these surveys are valuable for comparative purposes but their utility for estimating prevalence of use in the general population is limited because they exclude adults.

The WHO World Mental Health Survey (WMHS) employs a very robust and resource-intensive procedure to train its interviewers and gather data. It has a particular focus on the most widely used drugs, namely, alcohol, cannabis and tobacco, and it provides information on prevalence of use, dependence on alcohol and drugs, and therapeutic needs (see Degenhardt et al. 2010).

Because substance use is relevant to different domains of life, the HBSC approaches the topic from the perspective of health, the WMHS from mental health. The ISRD adopts the standpoint of crime victimisation and offending (Enzmann et al. 2018). ESPAD examines psychoactive substance use in detail by asking participants about their use of a wide variety of substances.

In terms of use for monitoring purposes, ESPAD and HBSC produce data every four years. The ISRD has been conducted in three waves and to date the WMHS has been conducted once in each of the participating countries. None of the surveys is conducted in countries defined as low-income countries by the World Bank (2017).

Event Data

Chapter 1 shows that a wide variety of official statistics (event data) are generated by government agencies whose activities are related to psychoactive substances (see 1.3). One of the key challenges in interpreting these data at a local or national level is that it is difficult to determine the extent to which changes in the data reflect changes in agencies' practices or changes in drug market activity, including consumption levels.

Table 3.1 Features of leading international surveys

| Survey | Time Frame Between Sampling | Participants' Age (years) | Countries (N) | Low-Income Countries* (N) | Focus |
| --- | --- | --- | --- | --- | --- |
| European School Survey Project on Alcohol and Other Drugs [a] (ESPAD) | 4 years | | | | Substance use |
| WHO Health Behaviour in School-Aged Children [b] (HBSC) | 4 years | | | | Health (alcohol, tobacco, cannabis) |
| International Self-Report Delinquency Study [c] (ISRD) | 1991-1992, 2006-2008, 2012-present | | | | Crime victimisation & offending (alcohol & drugs) |
| WHO World Mental Health Survey [d] (WMHS) | Single samples[1] | | | | Mental health (alcohol, drugs, nicotine) |

* Based on World Bank 2017 listing.

International comparative analyses of event data are difficult because countries do not record event data in the same ways. To summarise some key arguments mounted by Kilmer, Reuter and Giommoni (2015), primarily about drug use in high-income countries:

  • (a) Estimates of numbers of people using drugs in problematic ways are produced infrequently, using different methods applied to administrative data often defined in dissimilar ways.
  • (b) Cross-country variations in the production of death certificates muddy comparisons of countries' drug-related deaths.
  • (c) Comparisons of drug-related arrests are complicated because the meaning of arrest varies across criminal justice systems, arrests are recorded differently, they are linked to law enforcement operational priorities, and criminal offences are defined inconsistently.
  • (d) Country seizure data may differ depending on the extent to which countries are drug producers, transit points, or consumers. For example, a seizure of heroin in North America would better reflect domestic consumption than would one in Africa (a transit point) or Afghanistan (a producer) (see UNODC 2016b). Disparities exist between countries in terms of the skill of the law enforcement agencies in interdiction, the care taken by traffickers to avoid detection, the operational priorities of the agencies, and whether the purity of seizures is tested and reported.
  • (e) Analysis is complicated even in the few countries that routinely collect and report price and purity data. For example, an increase in price may reflect a shortage of supply and hence reduced consumption. It could also mean that consumption is increasing and suppliers are confident that the market will accept price increases. With regard to purity data, practices in forensic laboratories vary widely, usually as a result of jurisdictional legal and/or workload influences. For example, some jurisdictions may opt to measure purity only for seizures that exceed prima facie limits, such as those that differentiate simple possession from possession for sale. Currently, it is impractical to use price and purity data for international comparisons.

Special Challenges in Measuring Alcohol and Tobacco Use

Comparative analyses of alcohol and tobacco surveys face similar problems of missing data, inadequate data, infrequent data collection, inconsistent data collection procedures and so forth. But there are other issues that are unique to alcohol and tobacco, respectively.

Unrecorded Alcohol Consumption

Industry data, including taxation and trade statistics, are used in many countries to estimate per capita alcohol consumption per year. These industry data cover 'recorded consumption' but significant amounts of the alcohol consumed in each country may not be recorded. This may occur because alcohol is brewed at home. Unrecorded alcohol may also be brewed locally for profit in an unregulated (and potentially illegal) way and then trafficked, or sold to cross-border shoppers, or diverted from industrial or medical sources (WHO 2014). Estimates of 'unrecorded consumption' are difficult to make and differ between countries in ways that probably reflect the affordability of recorded alcohol. Estimates by the WHO (2014) suggest that globally one-quarter of alcohol consumption is unrecorded. Proportionately about 8.5% of alcohol is unrecorded in high-income countries and the figure for low- and lower-middle-income countries is about 40%. In some Islamic states where alcohol is banned nearly 100% of alcohol consumption is unrecorded (WHO 2014). The WHO (2014, 84) estimates of unrecorded alcohol consumption are made in different ways, including direct metrics from national surveys, expert opinion, indirect estimates from government data on confiscated or seized alcohol, and indirect estimates from survey data.

The WHO (2014) has estimated unrecorded alcohol consumption for some time because a failure to do so would leave the picture of global patterns of alcohol use incomplete (Babor et al. 2010b). The potential adverse health implications of unrecorded alcohol have been highlighted by Rehm and colleagues (Rehm et al. 2014; Rehm, Kanteres and Lachenmeier 2010). The lower price of unrecorded alcohol means that it is disproportionately consumed by people from lower socio-economic backgrounds (in any country) and often in larger amounts than recorded alcohol. The consumption of unrecorded alcohol may also contribute more to the global disease burden because it may contain toxic metals, alcohol congeners (e.g. acetaldehyde), carcinogens, or highly toxic industrial alcohol that has been made unfit for human consumption (e.g. denatured alcohol, which may contain methanol).

Tobacco Monitoring

The tobacco industry may provide data in some countries, but globally the industry is seen to hamper rather than assist in monitoring tobacco use. Trends in tobacco use in survey data clearly show that global tobacco reduction strategies have successfully reduced global consumption to a degree not matched by strategies that have targeted alcohol or illicit drugs (USNCI and WHO 2016). Global figures indicate a reduction of 2.8 percentage points in the rate of smoking among those aged 15 years and older, from 23.5% in 2007 to 20.7% in 2015 (WHO 2017b). The decline has been largest in high-income countries. Reductions were observed in about half of the middle-income countries and a third of the low-income countries.
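Because a change in a prevalence rate can be expressed either in percentage points or as a relative change, it helps to compute both for the global smoking figures quoted above (23.5% in 2007 and 20.7% in 2015). The following Python sketch is illustrative only; the function name `change_in_prevalence` is a hypothetical helper, not part of any cited source.

```python
def change_in_prevalence(start_pct, end_pct):
    """Return (percentage-point change, relative change in %) between two rates."""
    point_change = end_pct - start_pct
    relative_change = 100 * point_change / start_pct
    return point_change, relative_change


# Global smoking prevalence among those aged 15+ (WHO 2017b, as quoted in the text).
points, relative = change_in_prevalence(23.5, 20.7)

print(f"{points:.1f} percentage points")
print(f"{relative:.1f}% relative to the 2007 rate")
```

The difference between the two rates is 2.8 percentage points, which corresponds to a relative fall of roughly 11.9% from the 2007 level. The two measures should not be used interchangeably when comparing countries or assessing progress against a target.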

Nonetheless, the scale of tobacco use is still very large and on current trajectories it may be difficult for WHO Member States to reach their 2025 target of a 30% reduction (USNCI and WHO 2016). Although the percentage of people who smoke declined between 2007 and 2015, the absolute number of smokers remained stable at about 1.1 billion people because of population growth (WHO 2017b). The disease burden attributable to tobacco is unevenly distributed between countries; 80% of adult men who smoke cigarettes currently live in low- and middle-income countries - "foreshadowing grave consequences for health in these countries" (USNCI and WHO 2016, xv).

The WHO (2017b) strongly emphasises the importance of maintaining and, if possible, improving tobacco monitoring. Between 2007 and 2014 the WHO recorded a steady increase from 46 to 77 in the number of countries using best-practice monitoring standards. The figure for 2016 was 76, suggesting a plateau in improvements to monitoring systems. This is partly driven by the difficulties that low-income (and some middle-income) countries face in conducting regular representative surveys of smoking in the adult and youth populations (see WHO 2017b).

The WHO (2017b) also identified a deficiency in data collection on the breadth of tobacco products. Notably, in the last decade only a third of countries worldwide have collected and reported survey data relating to tobacco use that includes not only cigarettes, but other forms of smoked tobacco (e.g. pipes, bidis, roll-your-own) and smokeless tobacco products, such as snuff, snus, gutka and chewing tobacco. While improvements have been made, particularly in youth surveys that adopt WHO protocols, the WHO (2017b, 56) has stated that "effectively combatting the tobacco epidemic requires all types of tobacco use to be monitored in all countries."

  • [1] The survey has been conducted once in each of the 25 countries, beginning in 2001 in Belgium, France, Italy, Mexico, China (Beijing, Shanghai), Spain and the US. The most recent data collection phase occurred in Argentina (2015).
  • [a] European School Survey Project on Alcohol and Other Drugs, www.espad.org/.
  • [b] Health Behaviour in School-Aged Children. About HBSC, www.hbsc.org/about/index.html. For participating countries see Health Behaviour in School-Aged Children. HBSC member countries, www.hbsc.org/membership/countries/index.html.
  • [c] Northeastern University. The International Self-Report Delinquency Study, https://web.northeastern.edu/isrd/.
  • [d] Harvard Medical School. The World Mental Health Survey Initiative, https://www.hcp.med.harvard.edu/wmh/. For participating countries see Harvard Medical School. WMH Cross National Sample, www.hcp.med.harvard.edu/wmh/national_sample.php.