Compiling a Data Set

Initial work on identifying texts that mentioned the BP Deepwater Horizon oil spill showed that there were many thousands in the Nexis database alone. My aim was to extract a greatly reduced text data set in a systematic way. I was potentially interested in both synchronic and diachronic patterns of representation—in other words, how were the events represented at a given time and did this change over time? One way of deriving a data set that would enable me to address both of these questions, while at the same time reducing the text numbers to a manageable level, was to look at all the texts on a number of single days. My first selected date was 27 April 2010. The reasoning behind this choice of one week after the Deepwater Horizon explosion was to allow reporting on the events to have become widespread across a number of publication types. Muralidharan, Dillistone, and Shin (2011) note that the first tweet and Flickr photo from BP were both posted on 27 April. However, the date chosen could equally well have been the day after the explosion, or three or four days later, as long as the crisis period was still in train. What I felt to be important, having chosen a preliminary date, was that subsequent data sets should be drawn from the same date in the years up to the time of analysis, that is, 27 April 2011 and 27 April 2012. This would generate texts from three days, each separated by one year. Using all the data from each date would allow for an exhaustive study of texts relating to the BP events within a narrowly specified time frame, although this does not entail that the dates are representative of general BP coverage.

Concurrently, in order to determine how many texts this time-based search would generate, I needed to define what constituted “texts relating to the BP events”. The main search term would be “BP” as there were no other viable candidates (the former name “British Petroleum” was superseded by “BP” in 1998, and is not used in news media texts). I used three additional terms that my own reading of newspaper and online reports suggested were reliably present each time the events were covered. These were “crisis”, “oil spill” and “disaster”. A search for any one of these three plus “BP” should, in my view, have found virtually all of the texts covering the BP events. I therefore carried out an initial search based on the following terms:

  • BP AND
  • “crisis” OR
  • “oil spill” OR
  • “disaster”

From my previous reading of news reports on the BP oil spill, I felt that these terms would return a near-complete set of texts; however, I was also aware of other references to the events that paired names such as “Macondo” (the company name for the well) and “Deepwater Horizon” (the name of the Transocean rig) with “oil spill” and so on, referring to exactly the same events. I wanted to test the possibility, particularly in shorter or later texts, that BP might not be a default descriptor. Further, if I was going to examine, for example, how the events were named, I needed to feel secure that I was not simply replaying the search terms I myself had defined. To address these issues, I performed an alternative search as follows:

  • BP OR
  • “Macondo” OR
  • “Deepwater Horizon” AND
  • “oil spill”

This alternative search produced almost identical lists to the original search, confirming my choice of search terms.
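The intended logic of the two searches, with the operator precedence made explicit by bracketing, can be sketched as follows. This is a hypothetical illustration only: the function names and the toy headlines are invented, and simple substring matching stands in for the tokenised matching a database such as Nexis actually performs.

```python
PRIMARY_TERMS = ("crisis", "oil spill", "disaster")
EVENT_NAMES = ("bp", "macondo", "deepwater horizon")

def primary_search(text: str) -> bool:
    # "BP" AND ("crisis" OR "oil spill" OR "disaster")
    t = text.lower()
    return "bp" in t and any(term in t for term in PRIMARY_TERMS)

def alternative_search(text: str) -> bool:
    # ("BP" OR "Macondo" OR "Deepwater Horizon") AND "oil spill"
    t = text.lower()
    return "oil spill" in t and any(name in t for name in EVENT_NAMES)

# Invented headlines, for illustration only:
corpus = [
    "BP crisis deepens as oil spill spreads",
    "Macondo well oil spill estimates revised",
    "Shell reports quarterly profits",
]
primary = {i for i, t in enumerate(corpus) if primary_search(t)}
alternative = {i for i, t in enumerate(corpus) if alternative_search(t)}
print(sorted(primary), sorted(alternative))
```

Comparing the two result sets in this way is the logic behind cross-checking the searches: where the sets coincide for a given day's texts, the original choice of terms is supported.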

Table 4.1 Sample of BP-related texts from Nexis UK database

Date            Texts including search terms “BP” AND     Sample for depth
                (“crisis” OR “oil spill” OR “disaster”)   analysis

27 April 2010   169                                       20

27 April 2011    94                                       20

27 April 2012    31                                       20

I cleaned the data by using the search filter to remove highly similar texts and then hand-sorting the remainder for exact duplication, retaining texts where a proportion of the content was similar or the same, but not all of it. In many cases this partial duplication appeared to arise from the direct reproduction of the wording of press releases, which was an interesting point for investigation in itself. The result of these selection decisions was three separate data sets: 169 texts for 27 April 2010, 94 texts for 27 April 2011 and 31 texts for 27 April 2012, shown in the first column of Table 4.1.

I considered that each full data set would contain a workable number of items for broad contextual analysis to be carried out using quantitative methods (although any number-based findings from 2012 would need to be treated with caution, as this is a small data set of 31 texts). The majority of my analysis would be qualitative, for which I judged a smaller sample appropriate. I selected 20 texts from each data set for deeper qualitative analysis, using systematic random sampling, that is, choosing every nth text in the larger data set to give me 20 texts (this does not imply that my sample is random in any other way). So, for example, from 94 texts in 2011, I selected every fifth text plus the final text, which yielded 20 texts. The final sample is shown in the second column of Table 4.1.
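The systematic sampling step for the 2011 data set — every fifth text, starting from the first, plus the final text — can be sketched as follows. The text identifiers are placeholders, but the arithmetic matches the procedure described: 19 texts at intervals of five, plus the 94th, gives 20.

```python
def systematic_sample(items, step):
    """Take every `step`-th item starting from the first, then append
    the final item if it was not already selected."""
    sample = items[::step]
    if items and items[-1] not in sample:
        sample.append(items[-1])
    return sample

texts_2011 = [f"text_{i:03d}" for i in range(1, 95)]  # the 94 texts from 2011
sample = systematic_sample(texts_2011, 5)
print(len(sample))  # 20: texts 1, 6, ..., 91 plus text 94
```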
