MOVING TO DATA-DRIVEN FOOD SAFETY
Major advances in many industries can be attributed to the convergence of multiple technologies whose synergistic effects enable the transformation of that industry. Defined as the coming together of two or more disparate disciplines or technologies, convergence has been associated with advances from early in the industrial age, from firearms and sewing machines at the beginning of the 20th century to jet engines today. The fax revolution was produced by a convergence of telecommunications technology, optical scanning technology, and printing technology. Today, fundamental shifts in our basic industries, including the food industry, are emerging from a confluence of the internet and related communication technologies with technical advances in specific domains.
A fortuitous convergence of internet and communications technologies with a new generation of sensors and analytical tools is reshaping the food industry and its ability to reduce the risk of food contamination and the resulting foodborne illness. The availability of low-cost sensors, scanners, and various mobile devices, along with new communications technologies linked to the internet, offers visibility across the food chain. When combined with data analysis tools capable of fusing extremely large quantities of data in different formats and extracting relevant information, these technologies are opening the door to real-time, end-to-end monitoring and control of the movement of food products across the chain. And, as we will see later in this chapter, this convergence can be marshaled to reduce the latencies in both detecting a food contamination event and responding to it.
The food supply chain starts at the farm and encompasses food transportation companies, processing facilities, distributors, retailers, brokers, importers, and governmental agencies responsible for overseeing and regulating the system—and ends at the consumer’s table. Given the large-scale and distributed nature of the food system, it can be viewed as a “system of systems” whose components are complex, heterogeneous, self-organizing networks of systems that operate independently but are ultimately integrated into a dynamic, evolving “organism” that expertly manages the continuous production, distribution, and sale of food. Bringing these stakeholder systems together into an efficient and effective food safety network has been the signature challenge of regulatory agencies such as the US Food and Drug Administration (FDA).
Across many of these food chains today, sensors and other hardware are able to record a wide range of parameters, from the location of a pallet or even an individual food item to its temperature while in transit from farm to fork. These sensors provide a level of granularity that was not available previously. A sensor attached to a carton of New Zealand milk will record the swings in temperature that the carton experiences as it moves from the New Zealand dairy farm by truck to an airplane hold and then by truck to a retailer in China or elsewhere in Asia. This information alone can assist in identifying milk that might have spoiled before it is placed on the grocer’s shelf. Temperature traces recorded en route, when combined with weather data and shelf-life curves for that product, can also let retailers know how much shelf life the product has left.
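The shelf-life idea in the preceding paragraph can be illustrated with a minimal sketch. It assumes a simple Q10 model, in which the rate of quality loss is multiplied by a fixed factor for every 10 °C rise above a reference temperature. The function name, the `Reading` type, the reference temperature of 4 °C, and the Q10 value of 2.0 are all hypothetical choices made for illustration; real shelf-life curves are product specific and would replace this model.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    hours: float   # duration of this leg of the journey, in hours
    temp_c: float  # mean temperature over that leg, in degrees Celsius


def remaining_shelf_life_hours(readings, shelf_life_ref_hours,
                               ref_temp_c=4.0, q10=2.0):
    """Estimate shelf life left, expressed in hours at the reference temperature.

    Assumption: each leg consumes shelf life at a rate scaled by
    q10 ** ((temp - ref_temp) / 10), i.e. a plain Q10 model.
    """
    consumed = 0.0
    for r in readings:
        rate = q10 ** ((r.temp_c - ref_temp_c) / 10.0)
        consumed += r.hours * rate
    # Shelf life cannot go below zero: the product is simply expired.
    return max(0.0, shelf_life_ref_hours - consumed)


# Hypothetical trace: 10 hours at 4 C in a chilled truck,
# then 5 hours at 14 C in an unrefrigerated hold.
trace = [Reading(hours=10, temp_c=4.0), Reading(hours=5, temp_c=14.0)]
remaining = remaining_shelf_life_hours(trace, shelf_life_ref_hours=100)
```

Under these assumptions, the five warm hours cost as much shelf life as ten chilled hours, which is the kind of information a flat "time since packing" figure would miss.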
In addition to preventing food spoilage and contamination, these new technologies are enabling better surveillance to detect the onset of foodborne illness. Although the specific authority varies from country to country, surveillance has typically been the purview of public health departments. Public health officials engage in surveillance activities to determine whether reported cases of foodborne illness are part of a larger outbreak. Local public health departments are usually the first to pick up the signals of foodborne illness. These signals may be causally linked and part of a larger outbreak, or they may be uncorrelated, isolated cases that are not precursors of an emerging event.
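As a rough illustration of how a signal might be separated from background noise, the sketch below flags a reporting period whose case count exceeds a historical baseline by more than a chosen number of standard deviations. This is a deliberately simple threshold rule chosen for illustration, not the method used by any particular health department; the function name, the weekly counts, and the threshold multiplier `k` are all hypothetical.

```python
from statistics import mean, stdev


def flag_possible_outbreak(baseline_counts, current_count, k=2.0):
    """Return True if current_count exceeds mean + k * stdev of the baseline.

    Assumption: baseline_counts holds case counts from comparable past
    periods (for example, the same weeks in earlier years).
    """
    if len(baseline_counts) < 2:
        raise ValueError("need at least two baseline periods")
    threshold = mean(baseline_counts) + k * stdev(baseline_counts)
    return current_count > threshold


# Hypothetical weekly counts of reported illness in one jurisdiction.
history = [2, 3, 2, 4, 3, 2, 3, 3]
is_signal = flag_possible_outbreak(history, current_count=9)
is_quiet = flag_possible_outbreak(history, current_count=4)
```

A rule like this only says that a count is unusual. Establishing that the cases are causally linked still requires the laboratory confirmation and field investigation described in the text.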
When public health officials suspect a set of causally related cases, samples are sent to official laboratories such as those of the CDC for DNA “fingerprinting” to confirm that the illnesses are due to the same pathogen. Confirmation of the pathogenic source becomes the starting point for investigations by response teams to determine the specific food types that are responsible for the illness. Numerous delays occur in the surveillance and response processes. The promise of data science and big data is that these latencies can be reduced by timely fusion and interpretation of information on potential cases of foodborne illness.
Large amounts of data are already collected during the surveillance and response processes. What separates “big data” from “small data” in the food chain? Big data is distinguished by five characteristics referred to as the “5 Vs”: the volume, velocity, variety, veracity, and value of the data generation process. More data is being collected faster and in many different formats. The highly structured data that is typical of processing histories, shipment records, and lab reports is being augmented by data generated and/or transmitted from many nontraditional sources. These include wireless devices such as RFID tags and the temperature and chemical sensors that monitor ambient conditions during transport, mobile technologies, satellite images, real-time data collected by drones, text data from telephone hotline calls, electronic medical data, and even social media.
Except in heavily instrumented food chains, the volume of data currently collected across a food chain is not extremely large when compared with other industries such as aerospace, where aircraft in flight report voluminous data to ground stations for analysis. Similarly, the velocity with which the data is gathered is not extremely high compared with domains such as financial systems. However, in both the food and agriculture industries, there is a proliferation of data variety, with different levels of value. To build the capabilities necessary for improved surveillance and response, data must be collected and combined from the multiple, heterogeneous sources listed previously. And as the number of sources increases, problems of data quality and confidence inevitably arise, so veracity is an issue as well.
The proliferation of multiple data systems and tools that lack interoperability hinders effective information gathering and timely response to emerging but as yet unconfirmed foodborne illness. As already noted, most of the public health and food safety informatics work in the United States, from early detection of food-related outbreaks by local and state health departments to confirmation by the CDC through “fingerprinting” of pathogenic contaminants, takes place at different local, state, and federal jurisdictional levels. The resulting delays carry a significant cost in lives and dollars. A data-driven approach to food safety would reduce these latencies by bringing together: (1) traditional and new nontraditional data sources across all stakeholders in the food safety network; (2) new information and communication technologies for fusing and interpreting this data; and (3) new informatics and visualization tools capable of extracting knowledge that establishes “evidence” that can be used effectively by all the stakeholders across the food chain.
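To make the idea of fusing traditional and nontraditional sources concrete, the sketch below combines three hypothetical feeds (structured shipment records, raw sensor readings, and illness reports) into one view keyed by lot identifier. The field names, the record shapes, and the use of a lot ID as the common key are assumptions made for illustration; real systems rarely share a clean common key, which is precisely the interoperability problem described above.

```python
from collections import defaultdict


def fuse_by_lot(shipments, sensor_readings, illness_reports):
    """Merge three heterogeneous feeds into one record per lot ID.

    Assumed input shapes (all hypothetical):
      shipments:        list of {"lot_id": str, "supplier": str}
      sensor_readings:  list of (lot_id, temp_c) tuples
      illness_reports:  list of {"lot_id": str}
    """
    fused = defaultdict(
        lambda: {"supplier": None, "max_temp_c": None, "case_count": 0}
    )
    for s in shipments:
        fused[s["lot_id"]]["supplier"] = s["supplier"]
    for lot_id, temp_c in sensor_readings:
        rec = fused[lot_id]
        # Track the warmest reading seen for this lot.
        if rec["max_temp_c"] is None or temp_c > rec["max_temp_c"]:
            rec["max_temp_c"] = temp_c
    for r in illness_reports:
        fused[r["lot_id"]]["case_count"] += 1
    return dict(fused)


view = fuse_by_lot(
    shipments=[{"lot_id": "A1", "supplier": "Farm X"}],
    sensor_readings=[("A1", 3.5), ("A1", 9.0)],
    illness_reports=[{"lot_id": "A1"}, {"lot_id": "A1"}],
)
```

Even this toy view puts a lot's origin, its temperature history, and the illness reports associated with it side by side, which is the kind of shared "evidence" the three elements above are meant to produce.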