The epistemology of data journalism amid challenges of misinformation
There is a significant body of literature in (digital) journalism studies focusing on the developments in data journalism over time as well as current practices, challenges, and outcomes (Appelgren, Linden, & van Dalen 2019; Hermida & Young 2019; Lewis 2015). Data journalism is associated with computer science and statistics, involving the use of computers and software programmes to process and analyse quantitative data to build new knowledge, and the publication of the results through information visualisation. The term data journalism is used to capture the multiple and fluid forms of data-driven journalism (Fink & Anderson 2015; Hermida & Young 2019; Mutsvairo 2019).
In the best of worlds, data journalists and other social actors can employ data to examine, reveal, and visualise complex phenomena in ways that advance journalism practice and offer important, accurate, and verified knowledge. In the worst-case scenario, journalists and news publishers produce data journalism that skews information and misinforms the public (Coddington 2015; Lewis & Westlund 2015). Raw, objective, and completely unbiased data is a fantasy rather than a reality. Even in countries where public authorities have ambitious intentions for collecting and compiling reliable datasets, analyses of such data can lead to inappropriate visualisations and conclusions, resulting in readily available misinformation that the public can share.
In the course of writing this chapter, the COVID-19 pandemic has changed society and life as we know it. Many nations enforced lockdown measures in an attempt to prevent the virus from spreading, with unprecedented consequences for macroeconomics and climate, as well as daily life. COVID-19 has resulted in packed hospitals and depleted stocks of medical equipment, panic buying by citizens, armed demonstrations against lockdown measures, and everyday acts of kindness such as coordinated live music performances across balconies. Napoli (2020) points out that amid COVID-19, a convergence is taking place between health misinformation and political misinformation.
COVID-19 has resulted in millions of news articles and news broadcasts by journalists. Then there have been the countless pictures, videos, and observations by the public. There is a sort of mechanical objectivity in the nature of pictures (Carlson 2019), showing the world ‘as it is’. Nevertheless, photos of empty shelves in grocery stores can misinform, even more so when it comes to the status of the supply chains. Data journalists can identify and report on the supply chains to grocery stores, such as from toilet paper factories, in order to debunk misinformation about perceived shortages. During such a pandemic, data journalists can play a significant role in gathering, analysing, and publishing data journalism of crucial importance alongside other journalists, authorities, the public, and various stakeholders. In the next two sections, we explore two key aspects integral to the epistemology of data journalism and problematise these in relation to misinformation and COVID-19.
What (data) journalists know and how they know it
Producing reliable information and knowledge through data journalism depends on a range of conditioning factors, including, but not limited to, (1) access, (2) expertise, and (3) coordinating practices.
First, news publishers and journalists must have access to relevant and reliable datasets, which is not the case for several areas of inquiry and varies in different countries (Lewis & Nashmi 2019; Porlezza & Splendore 2019a, 2019b). Journalists can also turn to international and accessible sources to extract data and reveal patterns in specific countries: for example, by using satellite images and data to track a multitude of aspects relating to climate change. In relation to this, journalists have been developing online sourcing to include satellite images to detect and analyse activities relating to news events that may contrast with misleading official accounts (Seo 2020).
Second, news publishers and related social actors must have relevant expertise to process, analyse, interpret, and present the data. Specifically, social actors must have fundamental or advanced knowledge of statistics and handling statistics software to process the data in appropriate ways (Coddington 2015; Lewis & Westlund 2015). In analysing and interpreting datasets, they should be sensitive to the strengths and weaknesses of the data and, ideally, be transparent about these. Data journalism may require expertise in how to develop algorithms to automatically collect large amounts of data from authorities and international organisations, such as the World Health Organization and the United Nations, and from social media platforms, such as Twitter. There is the additional step of presenting the data to the public, often through data visualisations that may offer some interactivity (Young, Hermida & Fulda 2018).
Third, data journalism is a specialised expertise that not all journalists have, and thus, coordinating practices can be critical for advancing and integrating tacit and explicit knowledge amongst members of the news organisation. Inside some news organisations, journalists collaborate with technologists towards shared goals by building on each other's tacit and explicit knowledge (Hermida & Young 2019; Lewis & Westlund 2015; Usher 2016). Some of the most well-known data journalism efforts, such as the Panama Papers, resulted from cross-cultural coordination among journalists who shared resources and efforts during the investigation. Data journalists may also have to coordinate their practices with actors outside journalism, such as civic technologists (Cheruiyot, Baack & Ferrer-Conill 2019).
Now let's turn to what data journalists know and how they know it in the salient case of COVID-19. In their reporting, news publishers can follow updates and access data collected and assembled by entities such as the WHO, Johns Hopkins University, and national governments. Data from the WHO about new cases, active cases, recovered cases, deaths, total cases, and so forth allows comparison across countries and over time. However, such comparisons depend on individual countries reporting accurately and regularly and using the same methods to count the number of cases and fatalities. Despite many inconsistencies, such figures have become a feature of daily reporting. Take The Guardian and its daily “Coronavirus latest: at a glance” report as an example. On 6 April 2020, it reported:
Italy registered 525 new coronavirus deaths on Sunday, the lowest daily rate since 19 March, while Spain recorded 674 deaths in the past 24 hours — the lowest daily death toll reported since 26 March. In France, 357 people died from COVID-19 in hospitals.
Do The Guardian and other news publishers producing similar news materials inform or misinform when reporting this data? The reporting on figures and developments depends on the reliability of the databases. For each country to produce reliable and comparable data, there must be systematic procedures of reporting diagnosed cases, the number of patients in treatment, recoveries, and deaths. There is good reason to assume the actual number of those infected is far higher than the number of reported diagnosed cases, which depend on the scale of testing conducted in each country. Researchers reported that even in the early stages of the spread of the coronavirus, the number of undiagnosed cases was high (Li et al. 2020).
Journalists, authorities, and publics are acting on publicly accessible data, despite such data being problematic and seemingly unstandardised. How some countries track infections and deaths has changed over time. For example, at the end of April 2020, the UK government changed how it reported deaths related to COVID-19 to include fatalities outside hospitals. The result was news stories about the UK being the ‘worst-hit European country’, outstripping Italy to have the highest number of coronavirus deaths in Europe (Campbell, Perraudin, Davis & Weaver 2020). Whether this was true is hard to ascertain as Italy used a different method to count cases, and the actual figures in both countries might be higher due to missed cases or delays in reporting. Some countries report all deaths and not only COVID-19-related deaths, which results in a higher number. Other countries only report deaths as being caused by COVID-19 when there has been a confirmed test. Thus, official figures are open to manipulation and/or misrepresentation.
At the end of April 2020, as COVID-19 deaths were rising in the UK, the government added a new graph to its news briefing. The slide offered a comparison of global deaths per million population, suggesting that the death rate in the UK was below those in Belgium, Italy, and Spain (Doyle 2020). The visuals told a politically convenient story, even if the small print acknowledged differences in fatalities attributed to COVID-19. Politicians and authorities elsewhere have adopted similar approaches to shape the communication of COVID-19 data. Moreover, a shortage of testing kits has meant testing the dead has been a low priority (Dupree, Hauslohner, Dalton & Sun 2020). Not only are there problems with testing accuracy and the availability of testing equipment; in some countries political leaders have questioned and/or seemingly downplayed the prevalence of the virus altogether.
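The effect of switching from absolute deaths to deaths per million population can be illustrated with a short sketch. The two countries and all figures below are hypothetical, chosen only to show that the ranking can reverse once population is taken into account; the function name is likewise an assumption, and nothing here reproduces the data in the government slide.

```python
# Hypothetical illustration: the same death counts ranked two ways.
# Country names and all figures are invented for this example.

def deaths_per_million(deaths: int, population: int) -> float:
    """Return deaths per one million inhabitants."""
    return deaths * 1_000_000 / population

# (reported deaths, population), both hypothetical
countries = {
    "Country A": (30_000, 60_000_000),
    "Country B": (9_000, 12_000_000),
}

# Ranking by absolute number of reported deaths, highest first
by_absolute = sorted(countries, key=lambda c: countries[c][0], reverse=True)

# Ranking by deaths per million population, highest first
by_rate = sorted(
    countries,
    key=lambda c: deaths_per_million(*countries[c]),
    reverse=True,
)

print(by_absolute)
print(by_rate)
```

Neither ranking is more "true" than the other. Both inherit whatever differences exist in how each country attributes deaths to COVID-19, which is why the small print matters.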
The problems with accessing and reporting reliable data for COVID-19 also extend to hospital beds, ventilators, masks, and so forth. Alternative news media and citizens across the globe have published and shared materials for digital media, some with videos, discussing immediate shortages at grocery stores and hospitals. This has fuelled fear, panic buying, hoarding, demonstrations, and violence. Journalists, authorities, and fact-checkers, as well as platform companies and citizens, play important roles in critically examining information and disinformation. Platform companies continuously moderate illegal content as well as misinformation (Gillespie 2018), and companies such as Facebook have ramped up these efforts during COVID-19.
Since institutions of journalism play an authoritative role in pursuing truthfulness and typically verify information with different and reliable sources (Carlson 2017), professional fact-checkers have been working on debunking misinformation in most countries. However, fact-checkers in Austria, Germany, the UK, and the US demonstrate substantially different approaches to transparency (Humprecht 2020), despite arguments about the need for transparent practices to gain trust. A study from Brazil found that people are not very receptive to debunking misinformation relating to new and unfamiliar diseases such as Zika, compared to more familiar diseases such as yellow fever (Carey, Chi, Flynn, Nyhan & Zeitzoff 2020).
Journalists can, of course, interview and quote reliable sources discussing inventory while data journalists seek to access datasets and visualise developments in real time as well as over time. Data journalists reporting about COVID-19 should have expertise in examining the strengths and weaknesses in such data and do their best to report in transparent ways as the pandemic evolves. In the rush to cover all aspects of the coronavirus pandemic, many news outlets have reassigned reporters and editors with no background or expertise in science or health communication to the story. Aside from getting to grips with the terminology, methodologies, and research on viruses and pandemics, there is the additional challenge of interpreting data such as national fatalities. Given the limitations of daily death rates, a more reliable approach advocated by health experts is to compare the number of deaths with the expected number, a measure known as excess mortality. To their credit, some news publishers such as the BBC, The Economist, The Financial Times, and the New York Times have been reporting excess mortality rates. Integrating information from different datasets can produce more reliable information and help debunk misinformation. Expertise as well as coordinating practices are important for achieving this.
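The arithmetic behind excess mortality can be sketched briefly. The example below assumes one simple way of estimating expected deaths, namely the average for the same week across earlier years; statistical agencies and news publishers may use more elaborate baselines. All figures and function names are hypothetical and serve only to illustrate the calculation.

```python
# Minimal sketch of an excess mortality calculation.
# Assumption: expected deaths = mean of the same week in prior years.
# All numbers are hypothetical and do not describe any real country.

def expected_deaths(baseline_years: list[list[float]]) -> list[float]:
    """Average deaths per week across the baseline years."""
    return [sum(week) / len(week) for week in zip(*baseline_years)]

def excess_deaths(observed: list[float], expected: list[float]) -> list[float]:
    """Observed minus expected deaths, week by week."""
    return [o - e for o, e in zip(observed, expected)]

# Weekly all-cause deaths in two earlier years (hypothetical)
baseline = [
    [100, 110, 105],
    [104, 106, 95],
]

# Weekly all-cause deaths in the pandemic year (hypothetical)
observed = [101, 150, 180]

expected = expected_deaths(baseline)
excess = excess_deaths(observed, expected)

# Total excess as a share of total expected deaths
relative_excess = sum(excess) / sum(expected)

print(expected)
print(excess)
print(round(relative_excess, 3))
```

Because the calculation uses all-cause deaths, it does not depend on whether a death was tested for or attributed to COVID-19, which is what makes it more comparable across countries than daily reported COVID-19 fatalities.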