Knowledge claims associated with data journalism
The allure of ‘big data’ is that ‘it is unencumbered by the conventional thinking and inherent biases implicit in the theories of a specific field’ (Mayer-Schonberger & Cukier 2013, p. 71). Despite critical questions (boyd & Crawford 2012), the news industry has been optimistic about new possibilities for producing information with a strong knowledge claim, because such information aligns with journalism’s assumed authoritative role in society to provide verified facts (Carlson 2017; Karlsson, Clerwall & Nord 2017). In traditional news reporting, journalists articulate knowledge claims through their text and the language of news, in the way they present sources, the subject, and themselves in text, talk, and visual representations. Visuals create a feeling of ‘out-there-ness’ (Montgomery 2007), resulting in a sort of mechanical objectivity associated with visuals and other forms of photojournalism. While there is a long history of data being visually represented, there has been renewed interest in the digital possibilities of data visualisation and interactivity (Young, Hermida & Fulda 2018). Data visualisations present a visible argument for a story and can be more persuasive than words and figures alone as they ‘look and feel objective, precise, and, as a consequence, seductive and convincing’ (Cairo 2015, p. 7). Yet choices of scale, form, colour, and hue all shape the narrative and impact of a visualisation (Munzner 2014).
In data journalism, descriptions of sources and epistemic truth claims made on the basis of the data are important. Following processes of computer-supported analyses of quantitative datasets, data journalists publish findings in the form of data visualisations, interactive representations, and/or textual storytelling (possibly with accessible datasets). The data journalist extracts or simply shares findings from the dataset used for the visualisations and/or interactives, based on an epistemic process of producing knowledge about a phenomenon. Statistics are repeatedly presented and interpreted as objective ‘facts’. However, any statistician knows that data is not objective: its characteristics and shortcomings can lead to misinformation, and data can even be manipulated for disinformation. Gitelman (2013) problematises this in discussing raw data as an oxymoron. Ultimately, it is important to ask whether data journalism and the findings it produces are presented as ‘facts’ or alongside descriptions of the biases and limitations of the data.
In terms of knowledge claims, there are multiple questions that data journalists can ask and provide answers to if they access and analyse reliable datasets. Ongoing research at the Swedish public service broadcaster Sveriges Television (SVT) by one of the authors finds that in its COVID-19 data journalism, the SVT data journalism team has systematically sought to be transparent about the data used, how they are analysed, and their shortcomings. The New York Times addressed the fundamental question of how the virus got out, using data to show how hundreds of millions of people travelled out of Wuhan in China in the early days of the virus (Wu, Cai, Watkins & Glanz 2020). In the analysis, the journalists used not only reported data about confirmed cases but also rough estimates of total cases at the time provided by scholars from two US universities. They also accessed data from technology giant Baidu and telecom operators, reporting that a million citizens left Wuhan for other cities on 1 January 2020, with another seven million travelling in the following three weeks. The storytelling also combines data from the airline industry, reports on diagnosed cases from China, and diverse estimates by US scholars. The headline of the story authoritatively states, ‘How the virus got out’. The overarching narrative is marked by robust knowledge claims on how Wuhan citizens travelled and spread the coronavirus across China and elsewhere in the world for multiple weeks before travel restrictions came into force. The piece concludes, ‘But by then, the virus had a secure foothold. It continued to spread locally throughout parts of Seattle, New York City and across the country, once again outpacing efforts to stop it’. In the weeks that followed, the number of cases and deaths in the US grew exponentially. While the data journalists are clear and transparent about sourcing, the only cues about uncertainty in the data and findings are phrases such as ‘estimates of’.
Ultimately, the news piece takes an authoritative voice, masking uncertainties about the data. Other sources have reported on COVID-19 emerging from places other than China, with the UN launching a well-resourced investigation into its origin and spread in May 2020.