Reading 5.2 A Comparison Between Two Retrospective Alcohol Consumption Measures and the Daily Drinking Diary Method With University Students

Repeated alcohol consumption among specific individuals is nearly impossible to observe, as drinking can occur in multiple locations at any time of the day. The most practical way to capture such data is to rely on people to self-report their drinking. As Patterson, Hogan, and Cox (2019) discovered, not every self-report method is equally valid and reliable. Researchers continue to search for measures that are easy for respondents to use but still provide accurate estimates of daily drinking habits. Patterson and colleagues compared three different ways to measure drinking habits, two retrospective and one concurrent. Each measure required respondents to catalog their drinking habits differently. This article is an excellent example of how the decisions we make when collecting data have an impact on the results that we find.

A Comparison Between Two Retrospective Alcohol Consumption Measures and the Daily Drinking Diary Method With University Students

Chris Patterson, Lee Hogan, and Miles Cox

Recent epidemiological evidence has identified that as little as one heavy-drinking episode per week increases a person’s chances of dying from a long-term illness (1). Considering the significant risks and consequences posed by excessive alcohol consumption, accurate alcohol consumption measurement is essential (2).

The most widely used method of assessing and evaluating alcohol consumption is self-report. This is because it: (a) is less intrusive and easier and cheaper to administer and interpret than alternative methods (i.e., biochemical), (b) provides detailed information that can be used to identify alcohol misuse and dependency, and (c) has consistently been shown to be a reliable and valid method of measuring alcohol consumption (3). There are three main categories of self-report measures: (1) retrospective, summary measures; (2) retrospective, daily drinking measures; and (3) concurrent, daily drinking measures. Retrospective summary measures (e.g., quantity-frequency measures) require respondents to report their average quantity of alcohol consumed on a drinking occasion and average frequency of drinking occasions for a specified period in the past. These quantity and frequency estimates are then multiplied together to calculate total alcohol consumption (4).

The earlier, more simplistic QF measures were criticized for being unable to identify the variability of a person’s drinking (5). For instance, they were unable to distinguish between different drinking patterns when the total number of units consumed was the same: (a) drinking two units of alcohol on each day of the week, (b) drinking seven units twice a week, and (c) drinking fourteen units once a week. Information about drinking pattern variability is essential, especially considering the unique risks associated with particular drinking patterns, such as heavy drinking (6). In response to this criticism, researchers (7) designed more sophisticated QF measures that could classify respondents’ drinking (e.g., abstainer, light, moderate, and heavy).
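The limitation described above can be made concrete with a short sketch. The function name and the data layout are illustrative assumptions, not part of any published QF instrument; only the three weekly patterns come from the text.

```python
def qf_total(units_per_occasion, occasions_per_week):
    """Simple QF estimate: average quantity multiplied by average frequency."""
    return units_per_occasion * occasions_per_week


# The three weekly patterns from the text, as (units per occasion, occasions per week).
patterns = {
    "two units every day": (2, 7),
    "seven units twice a week": (7, 2),
    "fourteen units once a week": (14, 1),
}

# The QF total is identical for all three patterns...
totals = {name: qf_total(q, f) for name, (q, f) in patterns.items()}

# ...even though the amount drunk in a single session differs sharply.
units_per_session = {name: q for name, (q, f) in patterns.items()}

for name in patterns:
    print(f"{name}: total = {totals[name]}, per session = {units_per_session[name]}")
```

Because the product collapses quantity and frequency into one number, a simple QF total cannot separate a daily light drinker from a once-a-week heavy drinker.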

Quantity-frequency measures have been consistently shown to produce reduced estimates of alcohol consumption and to be poor at distinguishing different drinking patterns, compared to daily drinking measures (8-10). For example, in Flegal’s study (10), 31% of heavy drinkers identified by a daily drinking measure were classified as moderate drinkers by the QF method. Similarly, in Redman et al.’s study (9), the QF method failed to detect 78% of heavy drinkers identified by a daily drinking measure.

Fitzgerald and Mulford (8) managed to quantify the QF method’s insensitivity for capturing atypical drinking. Participants in Fitzgerald and Mulford’s study (8) were asked to complete a standard QF measure (i.e., reporting their typical drinking pattern) and then separately asked to report any atypical drinking. The addition of atypical drinking questions resulted in 35% of participants reporting more drinking, thus increasing the total alcohol consumption estimate by 14%. Adapted QF measures (which also ask respondents to report atypical and heavy drinking) have been found to produce estimates of alcohol consumed that are similar to those obtained from concurrent drinking records (11).

The Timeline Followback (TLFB) (i.e., retrospective, daily drinking) is widely accepted as the gold standard alcohol consumption measure. This is because it can capture detailed information about people’s drinking behavior, including drinking pattern and variability (12). Moreover, the reliability and validity of drinking data captured by the TLFB have been shown to be high (13).

Clinicians favor the TLFB because its data can be reviewed to help identify individuals’ triggers to use, high-risk situations, and relapse periods and increase individuals’ motivation and commitment to change (14). As a product of the precision with which the TLFB captures alcohol consumption, however, the TLFB’s administration is more time consuming and more demanding than other measures (i.e., QF measures). The TLFB may, therefore, be unsuitable in time-limited situations where precise information is not required: for example, some survey studies. Survey studies that have used the TLFB have reported high rates of attrition (15).

Concurrent daily drinking measures require respondents to make concurrent, detailed records of their daily alcohol consumption (e.g., recording amount, frequency, mood, and urges). This approach has been used extensively to monitor different behaviors in clinical settings (16). Many researchers believe that the Daily Drinking Diary (DDD) method produces the most accurate reports of alcohol consumption, as respondents are less likely to misreport their consumption as a consequence of forgetting (17). As a result of its increased accuracy, the DDD method often produces higher estimates of drinking frequency in studies.

This concurrent daily drinking method is recommended to researchers and clinicians who require precise information about the frequency of alcohol consumption (12). Additionally, researchers and clinicians may ask their clients to keep a daily record of their drinking during treatment. Clients might also be asked to record other variables such as their mood at the time of drinking. This information can be used to identify the antecedents to their drinking; for example, a person might consume alcohol as a means of avoiding unwanted emotional experiences, such as anxiety. DDD data can also be used to track a client’s progress during treatment.

It is important to consider that the DDD method has a number of limitations. First, individuals might not adhere to self-monitoring instructions (18). Second, the DDD method cannot be used to gather information about pretreatment alcohol consumption. In cases in which pretreatment drinking information is required, researchers and clinicians would have to use a retrospective measure (12). Third, respondents tend to reduce their drinking as a product of recording their alcohol consumption concurrently (i.e., recording one’s own drinking is reactive). For instance, when DDDs have been used as control or waiting conditions in clinical trials, significant reductions in alcohol consumption have been reported (19). Reactivity, however, can be beneficial in clinical settings, where the aim of treatment is to reduce alcohol consumption and change harmful drinking patterns.

This study aimed to compare the accuracy of two retrospective alcohol consumption measures, the Timeline Followback (TLFB) and the Typical and Atypical Drinking Diary (TADD), to an assessment of alcohol consumption captured concurrently in Daily Drinking Diaries (DDD) during a 28-day period. The administration of the retrospective drinking questionnaires was to be delayed for a further 28 days after the completion of the diaries in order to prevent the easier recall of estimates following the daily diary procedure. Comparisons of drinking estimates would focus on three aspects of drinking behavior: (a) the total amount of alcohol consumed, (b) the total number of drinking days, and (c) the total number of heavy drinking episodes. There is a strong body of evidence indicating that the DDD method provides the most accurate record of alcohol consumption. It was hypothesized that the gold standard TLFB would provide the most accurate retrospective estimates of alcohol consumption. If the TADD was found to produce estimates that were as accurate as, or more accurate than, those provided by the TLFB, this would offer certain advantages to professionals within the field. There would then be evidence that the self-administered and easy to administer TADD can produce reliable and valid estimates of total alcohol consumption, drinking patterns, and drinking variability.



Out of the 75 psychology undergraduates who initially volunteered to participate in this study, 43 managed to complete the study and provide valid data: 34 females (79.1%) and 9 males (20.9%), whose ages ranged from 18 to 46 years (M = 20.8, SD = 5.1). Twenty participants were first-year students (46.5%), 21 were second-year students (48.8%), and 2 were third-year students (4.7%). The majority of participants identified themselves as White (76.7%), while the remainder identified themselves as Asian (23.3%). There were 33 British students (76.7%), 5 from other European countries (11.6%) and 5 from Asia (11.6%). Participation in this study was rewarded with course credit, as well as tickets for a prize draw where monetary prizes could be won: participants who provided all the necessary data gained the maximum number of tickets. The School of Psychology’s ethics committee approved this study prior to its commencement.


Participants met with the investigator in a quiet experimental room in groups of five or fewer (M = 2.41, SD = 1.40), between 25 and 33 days (M = 28.20, SD = 3.34) after the start of the second semester. Participants were required to read the participant information sheet and provide their informed consent before commencing this study. Participants were then provided with their first drinking diary and instructed on how to complete the diary correctly (see DDD section).

As agreed, participants met with the investigator for ten minutes once a week for four weeks. During these meetings, participants handed over their completed diaries and received new diaries. If participants completed their diaries correctly, they were praised. If they completed their diaries incorrectly, they were given extra instruction.

Between 28 and 41 days after completing the DDD (M = 34.02, SD = 3.96), participants met with the investigator, in groups of six or fewer (M = 2.38, SD = 1.52), for approximately 30 minutes. Participants filled in a demographics form and then estimated their alcohol consumption for the period of time that they recorded their drinking using the DDD (i.e., a time period that was approximately 56 days to 28 days previously) using two retrospective measures: the TADD and TLFB (in that order). Subsequently, participants were debriefed, thanked, and dismissed. A prize draw was conducted when data collection was complete.



The DDD is a method of concurrently recording alcohol consumption information. The DDD method has been used widely in research validating retrospective alcohol consumption measures. In this study, participants recorded their daily alcohol consumption for 28 consecutive days, detailing the alcohol percentages, volumes, and quantities of beverages consumed: this information enabled the calculation of units. Participants recorded their alcohol consumption the day after it had occurred in a seven-day diary created specifically for this study. They submitted each of their four weekly diaries directly to one of the researchers. Due to its not being a standardized alcohol consumption measure, the DDD does not have any psychometric properties to report.


The six-item TADD was developed by Hogan (20) as a method of retrospectively estimating alcohol consumption and drinking patterns for a specified time period. It can also calculate peak blood-alcohol concentration (BAC) for each drinking session if required (21). The TADD comprises two weekly diaries: one for typical weeks and the other for atypical weeks (21) (i.e., heavier or lighter drinking weeks). In the typical drinking diary section, respondents stated the types of drinks, alcohol percentages, volumes, and quantities of the beverages that they consumed for each day of a seven-day week (i.e., Monday through Sunday), and then they estimated how many weeks they drank this typical amount during a specified time (in this instance, four weeks). In the atypical drinking diary section, respondents provided all the same information, but for a pattern of drinking that might be an atypical week for the respondent (i.e., either greater or lesser than their typical weekly pattern). Again, they estimated how many weeks they drank this atypical amount during the four weeks of the drinking diary period. Typical beverage sizes and their alcohol content were shown in an accompanying table, to aid recall. Hogan (20) reported a Cronbach’s alpha, a measure of internal consistency (reliability), of .78 (n = 170) for the TADD. In terms of (concurrent) validity, when compared with the TLFB, the TADD’s ICC = .872 (95% CI = .677-.935) for an 84-day period.
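The way a typical-week diary and an atypical-week diary could be combined into a four-week estimate can be sketched as follows. This is a minimal illustration that assumes each weekly pattern is simply weighted by the number of weeks the respondent assigns to it; the article does not give the TADD's scoring rules, and the function names and diary values are hypothetical.

```python
def weekly_total(daily_units):
    """Sum the units entered for the seven days (Monday through Sunday) of one diary week."""
    if len(daily_units) != 7:
        raise ValueError("a diary week needs exactly seven daily values")
    return sum(daily_units)


def drinking_days(daily_units):
    """Count the days in one diary week on which any alcohol was recorded."""
    return sum(1 for units in daily_units if units > 0)


def tadd_total(typical_week, n_typical, atypical_week, n_atypical):
    """Assumed scoring: weight each weekly pattern by its number of weeks."""
    return (weekly_total(typical_week) * n_typical
            + weekly_total(atypical_week) * n_atypical)


# Hypothetical respondent (not study data): units per day, Monday through Sunday.
typical = [0, 0, 2, 0, 6, 8, 0]      # reported for three of the four weeks
atypical = [0, 0, 0, 0, 10, 12, 4]   # reported for the remaining week

total_units = tadd_total(typical, 3, atypical, 1)
total_days = drinking_days(typical) * 3 + drinking_days(atypical) * 1
print(total_units, total_days)
```

Keeping the two weekly patterns separate is what lets this kind of measure retain day-level variability that a single quantity-frequency product would lose.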


The TLFB (22) is a retrospective daily drinking alcohol consumption measure, which can be used to gain a detailed picture of a person’s daily drinking over a specified time period. There is a robust body of evidence attesting to the TLFB’s ability to produce reliable and valid estimates of alcohol consumption with a wide range of clinical and nonclinical populations (12, 22, 23). For example, the TLFB has been reported to have a Cronbach’s alpha of .84 (24) and test-retest reliability ranging between r = .80-1.00 for a number of drinking variables over a 90-day interval (25).

In an interview with the investigator, who was trained to administer the TLFB, respondents used the calendar-based TLFB form to retrospectively estimate their daily drinking over a specified period of time: the 28-day period when they concurrently recorded their daily alcohol consumption using the DDD method. To aid recall, participants were provided with a sheet outlining the volume and alcohol content of the most commonly consumed beverages.


This study used a within-subjects design to compare participants’ reports of alcohol consumption on three instruments.

Statistical Analysis

Summary statistics (i.e., means and standard deviations) were obtained for the three alcohol consumption measures on three drinking variables: total alcohol consumption, number of drinking days, and number of heavy drinking episodes.

Intra-class correlations (ICC) were employed to establish how similar the estimates provided by the TLFB and the TADD were in comparison to those produced by the DDD. ICC used the two-way mixed subjects and absolute agreement methods. Moreover, Koo and Li’s (26) guideline for reporting ICC was used. This guideline states that values less than 0.5 are ‘poor,’ between 0.5 and 0.75 are ‘moderate,’ between 0.75 and 0.9 are ‘good,’ and greater than 0.9 are ‘excellent.’
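A two-way, absolute-agreement ICC and the Koo and Li labels can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' analysis: the article does not say whether single-measure or average-measure coefficients were reported, so the single-measure form, often written ICC(A,1), is assumed here. The function names are hypothetical, and the text leaves the treatment of values exactly at 0.75 or 0.9 open.

```python
def icc_absolute_single(ratings):
    """Two-way, absolute-agreement, single-measure ICC.

    ratings: one row per respondent, one column per measure (e.g., DDD and TLFB).
    """
    n = len(ratings)       # respondents
    k = len(ratings[0])    # measures
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-respondent mean square
    msc = ss_cols / (k - 1)                # between-measure mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def koo_li_label(icc):
    """Koo and Li's bands as quoted in the text; boundary handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical diary and recall totals for five respondents (not study data).
diary = [10, 20, 30, 40, 50]
recall = [8, 15, 22, 30, 37]
icc = icc_absolute_single([[d, r] for d, r in zip(diary, recall)])
print(round(icc, 3), koo_li_label(icc))
```

Absolute agreement matters for this comparison because a measure that ranks respondents correctly but reports systematically lower totals is penalized, which a consistency-type ICC would not do.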

Paired-sample t-tests were applied to ascertain whether the TLFB’s and the TADD’s estimates were significantly different from those yielded from the DDD.
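The paired-samples t statistic can be sketched as below, using only the standard library. The function name and the example values are hypothetical. Turning t into a p value needs the t distribution, which this sketch leaves to a statistics package (SciPy's `scipy.stats.ttest_rel`, for example).

```python
import math


def paired_t(x, y):
    """Return (t, degrees of freedom) for paired observations x and y.

    Assumes the differences are not all identical; otherwise the
    standard error is zero and t is undefined.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical units for five respondents (not study data):
# concurrent diary totals versus retrospective estimates.
diary = [10, 12, 9, 14, 11]
recall = [8, 9, 9, 10, 9]
t_stat, df = paired_t(diary, recall)
print(round(t_stat, 2), df)
```

With this argument order, a positive t means the first measure gave higher values than the second. With 43 complete pairs, the degrees of freedom would be 42.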

Drinking Definitions

In this study, alcohol consumption was calculated using the UK’s unitary system. According to this system, one unit is equal to 8 g or 10 mL of pure alcohol. One UK unit is equivalent to 0.56 standard drinks in the USA. Heavy drinking is referred to in the following results section. This term refers to instances when male participants consumed eight or more UK units (4.51 or more standard drinks in the USA), and female participants consumed six or more UK units in a single drinking session (3.38 or more standard drinks in the USA).
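The unit definition and heavy-drinking thresholds above translate directly into a small calculation. The sketch below assumes beverages are recorded as a volume in millilitres, an alcohol percentage by volume, and a quantity, as in the diaries; the function names and example drinks are hypothetical.

```python
def uk_units(volume_ml, abv_percent, quantity=1):
    """UK units for a drink: one unit is 10 mL of pure alcohol.

    Pure alcohol in mL is volume_ml * abv_percent / 100, so the number
    of units is volume_ml * abv_percent / 1000.
    """
    return quantity * volume_ml * abv_percent / 1000


# Thresholds from the text: eight or more units (men), six or more units (women).
HEAVY_THRESHOLD_UNITS = {"male": 8, "female": 6}


def is_heavy_session(session_units, sex):
    """True if a single drinking session meets the study's heavy-drinking threshold."""
    return session_units >= HEAVY_THRESHOLD_UNITS[sex]


# Hypothetical session: two 250 mL glasses of wine at 12% alcohol by volume.
session = uk_units(250, 12, quantity=2)
print(session, is_heavy_session(session, "female"), is_heavy_session(session, "male"))
```

Summing `uk_units` over every drink recorded on a day gives that day's session total, which is the quantity compared against the threshold.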


Using data obtained from the concurrent DDD method, the mean weekly consumption of alcohol in UK units was 15.33 (SD = 11.10). On average, students drank alcohol on 8.83 days over the 28-day period they recorded their drinking (SD = 4.06) and drank heavily on 4.02 days (SD = 3.48). Overall, 86% of students drank heavily on at least one occasion during the 28-day period, and 46.5% drank heavily at least once per week. In Table 1, summary statistics are shown for each of the three methods for assessing alcohol consumption.

Table 1 Summary Statistics of Drinking Measures for the Concurrent Daily Drinking Diary (DDD) and the Retrospective Measures of the Typical and Atypical Drinking Diary (TADD) and the Timeline Followback (TLFB)

Measure    Total alcohol consumption    Number of drinking days    Number of heavy drinking episodes
           M (SD)                       M (SD)                     M (SD)
DDD        61.34 (44.39)                8.83 (4.06)                4.02 (3.48)
TADD       65.80 (43.34)                9.0 (4.05)                 4.86 (4.29)
TLFB       36.39 (33.13)                5.49 (2.91)                2.51 (2.93)

Note: Total alcohol consumption is reported in terms of the number of UK alcohol units consumed.

Comparison of the DDD and TLFB for overall alcohol consumption had an ICC = .735 (95% CI .229-.886), which is ‘moderate.’ When compared, the DDD’s and TADD’s overall alcohol consumption estimates had an ICC = .908 (95% CI .832-.950), which is ‘excellent.’ With regard to estimated number of drinking days, the DDD and TLFB had an ICC = .498 (95% CI -.087 to .758), which is ‘poor.’ Comparison of the DDD and TADD for number of days drinking had an ICC = .886 (95% CI .789-.938), which is ‘good.’ The ICC between the DDD’s and TLFB’s estimated number of heavy drinking episodes was .799 (95% CI .493-.907), which is ‘good.’ Similarly, the ICC between the DDD’s and TADD’s estimated number of heavy drinking episodes was .757 (95% CI .555-.868), which is ‘good.’

Paired-samples t-tests revealed no significant difference between the TADD and the DDD in terms of total alcohol consumption, t(42) = -1.16, p = .254, number of drinking days, t(42) = -0.41, p = .686, and number of heavy drinking episodes, t(42) = -1.61, p = .115. Further paired-samples t-tests demonstrated that the TLFB, in comparison to the DDD, produced significantly lower estimates of alcohol consumption, t(42) = 5.35, p < .001, number of drinking days, t(38) = 6.07, p < .001, and number of heavy drinking episodes, t(42) = 4.23, p < .001.


This study established that it is possible to accurately estimate alcohol consumption using a retrospective alcohol consumption measure. The TADD provided highly accurate estimates of three important drinking variables: total alcohol consumption, number of drinking days, and number of heavy drinking episodes. In contrast, the TLFB significantly underestimated total alcohol consumption, number of drinking days, and number of heavy drinking episodes.

Why did the TLFB underreport actual consumption? In line with Fishburne and Brown’s (27) social desirability hypothesis, students in this study might have feared that the interviewer was going to judge them negatively as the TLFB was administered in a one-to-one interview. In contrast, the TADD was completed independently without an interviewer scrutinizing the drinking estimates. It is hypothesized that this level of independence enabled respondents to describe their alcohol consumption patterns without fear of judgment.

The results from this study provide further evidence that QF measures can provide accurate retrospective estimates of alcohol consumption and drinking variability, as long as they ask questions about both typical and atypical drinking (8, 11, 28). Giving respondents a full weekly pattern of assessment for their alcohol consumption for a typical week and for an atypical week in the TADD provided sufficient range/variability to capture a reasonable estimate of actual drinking. The independent administration of the TADD gives it a greater advantage over the TLFB in terms of ease of administration and reduced burden on respondents.

A number of limitations exist in this study. For example, it is possible that the estimate of alcohol consumption using the DDD might not have accurately estimated actual consumption: the DDD was also an estimate of consumption rather than an objective independent report. The procedure of administering the TADD prior to the TLFB might have influenced the completion of the TLFB; however, it was felt that the completion of the TLFB first was highly likely to improve recall on the TADD.

The generalizability of this study’s results is also limited by the study’s sample size, as well as the underrepresentation of male students. While a sufficient number of individuals initially volunteered to participate in this study, a large number failed to complete it. This high rate of attrition is likely to have been associated with this study’s demanding data collection procedure. If this study were to be replicated, researchers should consider replacing face-to-face contact with communication via email. The ratio of male to female students in this study was characteristic of the anecdotal underrepresentation of male psychology undergraduates. In retrospect, it would have been beneficial if students studying other subjects were recruited to balance this ratio.


These results can be taken as preliminary evidence that the TADD can be a quick and easy to administer instrument that provides accurate estimates of total alcohol consumption, number of drinking episodes, and number of heavy drinking episodes. In the future, therefore, clinicians and researchers should consider using the TADD if they require accurate retrospective information about a client’s total alcohol consumption and drinking variability, especially if they are time limited.

Discussion Questions

  • 1. If the TADD and the TLFB were both retrospective consumption measures, what is the likely explanation for why they produced such different results?
  • 2. From a research participant’s standpoint, what were the pros and cons of having to fill out each of the three data collection instruments?
  • 3. How did the fact that participants had to complete one of the forms in front of a research staff member impact those results?


  • 1. Holmes J, Angus C, Buykx P, Ally A, Stone T, Meier P, Brennan A. Mortality and morbidity risks from alcohol consumption in the UK: analyses using the Sheffield Alcohol Policy Model (v. 2.7) to inform the Chief Medical Officers’ review of the UK lower risk drinking guidelines. Sheffield: ScHARR, University of Sheffield; 2016.
  • 2. Gmel G, Rehm J. Measuring alcohol consumption. Contemp Drug Probs. 2004;31:467. https://
  • 3. Del Boca FK, Noll JA. Truth or consequences: the validity of self-report data in health services research on addictions. Addiction. 2000 Nov 1;95(11s3):347-360. 1360-0443.95.1 ls3.5.x.
  • 4. Midanik LT. Comparing usual quantity/frequency and graduated frequency scales to assess yearly alcohol consumption: results from the 1990 US National Alcohol Survey. Addiction. 1994 Apr 1;89(4):407-412.
  • 5. Alanko T. An overview of techniques and problems in the measurement of alcohol consumption. In: Smart RG, Cappell HD, Glaser FB, et al., eds. Research advances in alcohol and drug problems. Boston (MA): Springer; 1984. Pp. 209-226. 7.
  • 6. Connor J. Alcohol consumption as a cause of cancer. Addiction. 2017 Feb 1;112(2):222-228. Ill 1/add. 13477.
  • 7. Polich JM, Orvis BR. Alcohol problems: patterns and prevalence in the US air force. Santa Monica (CA): RAND CORP; 1979 Jun.
  • 8. Fitzgerald JL, Mulford HA. Self-report validity issues. J Stud Alcohol. 1987 May;48(3):207-211.
  • 9. Redman S, Sanson-Fisher RW, Wilkinson C, Fahey PP, Gibberd RW. Agreement between two measures of alcohol consumption. J Stud Alcohol. 1987 Mar;48(2):104-108. 1987.48.104.
  • 10. Flegal KM. Agreement between two dietary methods in the measurement of alcohol consumption. J Stud Alcohol. 1990;51(5):408-414. https:// 1990.51.408.
  • 11. Wyllie A, Zhang JF, Casswell S. Comparison of six alcohol consumption measures from survey data. Addiction. 1994 Apr 1;89(4):425-430. https://doi.org/10.1111/j.1360-0443.1994.tb00917.x.
  • 12. Sobell LC, Sobell MB. Alcohol consumption measures. Assess Alcohol Prob. 1995;2:75-99.
  • 13. Sobell LC, Maisto SA, Sobell MB, Cooper AM. Reliability of alcohol abusers’ self-reports of drinking behavior. Behav Res Ther. 1979 Jan 1;17(2):157-160.
  • 14. Sobell LC, Cunningham JA, Sobell MB, Agrawal S, Gavin DR, Leo GI, Singh KN. Fostering self-change among problem drinkers: a proactive community intervention. Addict Behav. 1996 Nov 1;21(6):817-833. 4603(96)00039-1.
  • 15. Cunningham JA, Ansara D, Wild TC, Toneatto T, Koski-Jannes A. What is the price of perfection? The hidden costs of using detailed assessment instruments to measure alcohol consumption. J Stud Alcohol. 1999 Nov;60(6):756-758. https://
  • 16. Korotitsch WJ, Nelson-Gray RO. An overview of self-monitoring research in assessment and treatment. Psychol Assess. 1999 Dec;11(4):415. https://
  • 17. Sobell MB, Bogardis J, Schuller R, Leo GI, Sobell LC. Is self-monitoring of alcohol consumption reactive? Behav Assess. 1989 Jan 1;11(4):447-458.
  • 18. Sanchez-Craig M, Annis HM. ‘Self-monitoring’ and ‘recall’ measures of alcohol consumption: convergent validity with biochemical indices of liver function. Alcohol Alcohol. 1982 Dec 1;17(3):117-121.
  • 19. Kavanagh DJ, Sitharthan T, Spilsbury G, Vignaendra S. An evaluation of brief correspondence programs for problem drinkers. Behav Ther. 1999 Sep 1;30(4):641-656. https://doi.org/10.1016/S0005-7894(99)80030-6.
  • 20. Hogan LM. Developing and evaluating brief, computerised interventions for excessive drinkers [Doctoral dissertation]. Bangor: University of Wales; 2005.
  • 21. Hogan LM. Relationships among alcohol use, emotion, motivation, and goals [Doctoral dissertation]. Bangor: University of Wales; 2008.
  • 22. Sobell LC, Sobell MB. Timeline follow-back. In: Measuring alcohol consumption. Totowa, NJ: Humana Press; 1992. Pp. 41-72.
  • 23. Sobell LC, Sobell MB. Alcohol Timeline Followback (TLFB). In: Handbook of psychiatric measures. Washington, DC: American Psychiatric Association; 2000. Pp. 477-479.
  • 24. Wennberg P, Bohman M. The timeline follow back technique: psychometric properties of a 28-day timeline for measuring alcohol consumption. Ger J Psychiatry. 1998;2:62-68.
  • 25. Sobell LC, Sobell MB, Leo GI, Cancilla A. Reliability of a timeline method: assessing normal drinkers’ reports of recent drinking and a comparative evaluation across several populations. Br J Addict. 1988 Apr;83(4):393-402.
  • 26. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016 Jun 1;15(2):155-163. https://doi.org/10.1016/j.jcm.2016.02.012.
  • 27. Fishburne JW, Brown JM. How do college students estimate their drinking? Comparing consumption patterns among quantity-frequency, graduated frequency, and timeline follow-back methods. J Alcohol Drug Educ. 2006 Mar 1;50(1):15.
  • 28. Rehm J, Dawson D, Frick U, Gmel G, Roerecke M, Shield KD, Grant B. Burden of disease associated with alcohol use disorders in the United States. Alcoholism. 2014 Apr 1;38(4):1068-1077. https:// 12331.