Exposure Health Informatics Ecosystem
University of Utah
Brigham Young University
Sneha Kasera, Scott Collingwood, Mollie Cummins, Julio C. Facelli, and Katherine Sward
University of Utah
Determinants of Health
While estimates of the exact contributions vary between studies, at least 50% of a person’s health can be attributed to their environment, lifestyle, and behavior (Centers for Disease Control and Prevention 2019, Tarlov 1999, McGinnis, Williams-Russo, and Knickman 2002, Choi and Sonin 2019). A study in England attributed 40% of all disease burden to identifiable risk factors and almost 75% of these to a combination of individuals’ environmental, behavioral, and metabolic profiles (Newton et al. 2015). About one in four deaths worldwide, and a similar proportion of deaths among children under five, are due to modifiable environmental factors (Prüss-Ustün et al. 2016, Landrigan et al. 2018). These 12.6 million deaths are attributed to more than 100 different diseases (World Health Organization 2016a,b).
By itself, air pollution is linked to one in eight deaths globally (World Health Organization 2014, 2017). About two billion children live in areas exceeding the World Health Organization’s annual limits for fine particles (United Nations International Children’s Emergency Fund 2016). An estimated 169,250 and 531,000 child deaths are attributable to ambient and household air pollution, respectively (World Health Organization 2017). The global social cost of air pollution is about $3 trillion per year (Erickson and Jennings 2017). Recent studies have provided strong evidence associating air quality with pediatric asthma (Pollock, Shi, and Gimbel 2017). At the same time, there is discussion about our environment being cleaner than ever before and the unprimed nature of our immune systems being responsible for disease manifestation (Richtel 2019). While it is reasonably well understood that continuous exposure to high levels of pollution is unhealthy, much less is known about the health effects of low levels, intermittent exposure, and combinations of exposures. Studying the latter requires informatics infrastructures that can aggregate environmental and physiological data from multiple sources at high temporal and spatial resolutions. Research and development of such multi-scale and multi-model informatics infrastructure is just beginning. In this chapter, we describe the requirements, design, and development activities undertaken to address these issues.
The Exposome and Its Generation
Comprehensive quantification of the effects of the modern environment on health requires taking into account data from all contributing environmental exposures and how those exposures relate to health; this is termed the exposome, a complementary concept to the genome (Wild 2005, 2012). Measuring the exposome can span a lifetime of exposures starting from conception and includes endogenous processes within the body, biological responses of adaptation to environment, physiological manifestations of these responses, and socio-behavioral factors (Wild 2005, 2012). Generating exposomes at high resolution requires integration of data from wearable and stationary sensors, environmental monitors, personal activities, physiology, medication use and other clinical data, genomic and other biospecimen-derived data, person-reported data, and computational models. Exposomic research is translational in nature, as the exposome includes direct biological pathway alterations, as well as mutagenic and epigenetic mechanisms of environmental influences on the phenome (Miller 2013, Miller and Jones 2014, Lioy and Weisel 2014, NIOSH 2018). The phenome, which is an individual’s state of well-being and disease, is a result of the interaction between a person’s genome and their exposome. See Figure 16.1 for a holistic understanding of disease that integrates the exposome, genome, and other factors.
FIGURE 16.1 Holistic understanding of disease requires integration of the exposome with the genome and other biomedical data.
There is a need to understand the health effects of an individual’s total exposure, including simultaneous, cumulative, and latent exposure to multiple environmental species (Pollock, Shi, and Gimbel 2017). We refer to any physical (e.g., temperature, humidity), chemical (e.g., particulate matter (PM), ozone), or biological (e.g., pollen, mold) environmental or physiological (e.g., breath rate, forced expiratory volume) entity measured by a sensor as a species. Processes to support this aggregation and integration must accommodate variable spatio-temporal resolutions and account for multiple study, experimental, and analytical designs. Gaps in measured data may need to be filled with modeled data, along with a characterization of the associated uncertainties.
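As an illustration of filling gaps in measured data with modeled data while keeping track of uncertainty, the following is a minimal sketch and not the PRISMS implementation. The `Reading` record, the `fill_gaps` function, the hourly indexing, and the standard-error values are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reading:
    hour: int           # hour index within the study period (assumed unit)
    value: float        # concentration of one species, e.g. PM in ug/m^3
    source: str         # "measured" or "modeled"
    uncertainty: float  # assumed standard error, same units as value


def fill_gaps(measured: Dict[int, float],
              modeled: Dict[int, float],
              hours: range,
              measured_se: float = 1.0,   # hypothetical sensor error
              modeled_se: float = 5.0     # hypothetical model error
              ) -> List[Reading]:
    """Prefer sensor measurements; fall back to model output where missing."""
    series: List[Reading] = []
    for h in hours:
        if h in measured:
            series.append(Reading(h, measured[h], "measured", measured_se))
        elif h in modeled:
            series.append(Reading(h, modeled[h], "modeled", modeled_se))
        # Hours covered by neither source stay as gaps.
    return series


# Hypothetical data: the sensor missed hour 2; nothing covers hour 4.
measured = {0: 12.0, 1: 14.5, 3: 11.0}
modeled = {0: 10.0, 1: 13.0, 2: 13.5, 3: 12.0}
series = fill_gaps(measured, modeled, range(5))
for r in series:
    print(r.hour, r.value, r.source, r.uncertainty)
```

Keeping the source and uncertainty on every record lets a downstream analysis weight or exclude modeled values rather than treating them as equivalent to measurements.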
The air quality exposome is important to our improved understanding of pediatric asthma and other respiratory conditions (Pollock, Shi, and Gimbel 2017), cardiovascular disease (Lee, Kim, and Lee 2014), cancers (Santibanez-Andrade et al. 2017), pregnancy (Leiser et al. 2019), suicide (Bakian et al. 2015, Gladka, Rymaszewska, and Zatoński 2018), and its mechanistic role in damage to deoxyribonucleic acid (Bosco et al. 2018, Miri et al. 2019). It includes a combination of chemical (PM, ozone, and volatile organic compounds), biological (pollen, spores) and physical (temperature, humidity) environmental species. Studies involving the exposome can be observational, epidemiological, interventional, or mechanistic in nature (Rohrig et al. 2009).
The Pediatric Research Using Integrated Sensor Monitoring Systems Program
The Pediatric Research using Integrated Sensor Monitoring Systems (PRISMS) program was launched in 2015 to develop a sensor-based, data-intensive infrastructure for measuring environmental, physiological, and behavioral factors for performing pediatric and adult epidemiological studies (https://www.nibib.nih.gov/research-funding/pediatric-research-using-integrated-sensor-monitoring-systems). PRISMS is administered by the National Institutes of Health (NIH) National Institute of Biomedical Imaging and Bioengineering (NIBIB). Figure 16.2 shows the various projects under the PRISMS program.
The University of Utah was funded through this program to identify informatics challenges and develop solutions to address them (http://prisms.bmi.utah.edu/). Recognizing that solving these challenges will require a wide range of perspectives, the Utah team is a diverse group of faculty, research staff, software developers, post-doctoral fellows, and graduate and undergraduate students from atmospheric science, bioengineering, biomedical informatics, chemical engineering, chemistry, clinical and translational science, computer science, electrical and computer engineering, industrial engineering, nursing, occupational health, pediatrics and pulmonary medicine.
Exposomic Research Challenges and Informatics Solutions
Exposomic research requires simultaneous measurement of many types of environmental, physiological, and behavioral factors using sensors. These measurements can be obtained using sensor technologies that are often novel and in various stages of development. The sensors are evolving to capture novel species and to improve in sensitivity, performance, and validity in measuring different species, in form factor (so that they can be used in personal and mobile settings), and in price. In addition, these sensors use diverse device communication protocols and require additional hardware and software modifications for use in research studies and for secure data acquisition and transmittal.
Environmental species vary in space and time, and humans are mobile, spending time at home, commuting, at work or school, and in recreation.
FIGURE 16.2 The PRISMS program and tasks being performed by the University of Utah.
Generation of comprehensive spatio-temporal records of exposures requires collection and integration of data from different types of sensors that might be available at different locations and times corresponding to the locations of the subject under consideration (Gouripeddi et al. 2017). For example, an air quality exposome may require the integration of data from indoor and mobile sensors, stationary regulatory monitors, citizen networks, and finally supplementation with data from computational models to fill in the gaps when there is an absence of experimental data. All of these would require appropriate spatio-temporal dimensions and resolution, with their absence often limiting the quality of studies and potentially leading to erroneous results (Gouripeddi et al. 2017).
Moreover, sensors used for measurements of exposures are not always collocated with the subject under consideration. The lack of proximity of the sensor to the subject leads to uncertainties when using their measurements as exact quantifications of exposures. In addition, sensors have varied capabilities, granularities, and resolutions in measuring different environmental species, which need to be harmonized prior to analysis.
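One simple form of harmonization is to bring sensors that report at different temporal resolutions onto a common time interval before comparing them. The sketch below assumes timestamps in seconds and averages samples into fixed bins; the `resample_mean` helper, the sample values, and the choice of a 5-minute bin are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def resample_mean(samples: List[Tuple[int, float]],
                  bin_seconds: int) -> Dict[int, float]:
    """Average (timestamp, value) samples into fixed-width time bins.

    Each bin is keyed by its start time in seconds.
    """
    bins = defaultdict(list)
    for t, v in samples:
        bins[(t // bin_seconds) * bin_seconds].append(v)
    return {start: mean(vals) for start, vals in sorted(bins.items())}


# Hypothetical readings: a wearable sampling every minute (with a gap)
# and a stationary monitor reporting every five minutes.
personal = [(0, 10.0), (60, 12.0), (120, 14.0), (300, 20.0), (360, 22.0)]
ambient = [(0, 8.0), (300, 9.0)]

p5 = resample_mean(personal, 300)
a5 = resample_mean(ambient, 300)

# Keep only the bins in which both sensors reported.
aligned = {t: (p5[t], a5[t]) for t in p5 if t in a5}
print(aligned)
```

Averaging is only one possible aggregation. Depending on the species and the study design, a maximum, a median, or a time-weighted mean may be more appropriate, and the aggregation used should be recorded as metadata.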
Total exposure research studies need to be performed across health conditions, age ranges, and sensor types and utilize heterogeneous data at multiple levels of granularities in their semantics and temporalities. Different translational research archetypes require different data, data transformations, data integration workflows, and analytics to support observational and interventional study designs (Gouripeddi 2016).
Addressing these challenges in exposomic research requires an informatics architecture that embeds multiple features that are loosely coupled and interoperable (Sward et al. 2017, Martin Sanchez et al. 2014): 
- 1. Sensor data acquisition: The evolving nature of sensors requires a sensor data acquisition paradigm that is agnostic to the sensor and the type of species it is measuring. In addition, acquiring these sensor data should accommodate mobile and stationary devices that measure personal and ambient environments.
- 2. Selection of heterogeneous data sources: Prospective studies require use of sensors that are well-matched for the purposes of the study. Secondary analyses require descriptions of sources and methods, including types of sensors used, to support appropriate analysis. In both cases, research teams require metadata about the sensors and the data sources.
- 3. Computational modeling for filling gaps: It may not be possible to measure every environmental variable at the desired temporal and spatial resolution, owing to sensor availability, challenges with the use of sensors, cost, privacy, the number of sensors needed in large cohort studies, etc. Computational models that help fill gaps can substitute for or augment sensor-measured environmental factors, activities, and locations of individuals.
- 4. Uncertainty characterization of data: Understanding the limitations and data quality of sensors, their measurements and, similarly, computational models would enable their proper use within data pipelines, designing appropriate studies, and performing apt analysis. These limitations and uncertainties can be captured and shared as metadata.
- 5. Generation of a high-resolution spatio-temporal grid of exposures: Exposures are intrinsically tied to location and time. Different sources of exposure data, sensors, and computational models need to be combined to generate a high-resolution grid of personal exposure. These sources could have different granularities and resolutions, and their integration would need to support this heterogeneity.
- 6. Data integration: To support the above requirements, integration of these heterogeneous exposomic data would need to be semantically consistent (Habre et al. 2016) and metadata driven. In addition, the diversity of objects represented in translational exposomic research requires them to be integrated on their spatial and temporal dimensions. Representing data as events permits temporal analysis and reasoning around a diverse array of environmental measurements, physiological responses, and conditions. An event-based infrastructure would support multi-scale and multi-omics integration.
- 7. Presentation and visualization: In order to make meaningful use of the data and processes in exposomic research, there is a need for acceptable and user-friendly interfaces for study participant and investigator interactions. These interfaces will provide feedback, deliver instructions for interventions to participants, and give participants a means to input additional requested data. Investigators will be able to manage study processes, assess ongoing data collections, and tailor interventions. These presentation and visualization layers will likely need to be person-centered and available on mobile platforms.
- 8. Support a diverse set of translational research archetypes: The informatics infrastructure would need to support diverse study types, including observational, epidemiological, interventional, secondary analysis, and mechanistic studies. In addition, the infrastructure would need to enable reproducibility and transparency of study results, with metadata to track data and process provenance.
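To make the ideas of an event-based representation and a spatio-temporal grid of exposures more concrete, the following is a minimal sketch under stated assumptions and does not describe the architecture actually built by the Utah team. The `Event` record, the `cell_of`, `build_grid`, and `exposure_at` functions, the 0.01-degree by 1-hour cell size, and the coordinates are all hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str                      # e.g. "pm", "location", "symptom"
    start: int                     # seconds since study start (assumed)
    end: int
    lat: float
    lon: float
    value: Optional[float] = None  # measurement, if the event carries one


Cell = Tuple[int, int, int]        # (latitude index, longitude index, hour)


def cell_of(lat: float, lon: float, t: int,
            deg: float = 0.01, secs: int = 3600) -> Cell:
    """Map a position and time to a grid cell (assumed cell size)."""
    return (math.floor(lat / deg), math.floor(lon / deg), t // secs)


def build_grid(events: List[Event], species: str) -> Dict[Cell, float]:
    """Average all measurements of one species that fall in each cell."""
    acc = defaultdict(list)
    for e in events:
        if e.kind == species and e.value is not None:
            acc[cell_of(e.lat, e.lon, e.start)].append(e.value)
    return {cell: mean(vals) for cell, vals in acc.items()}


def exposure_at(grid: Dict[Cell, float], where: Event) -> Optional[float]:
    """Look up the gridded exposure for a subject's location event."""
    return grid.get(cell_of(where.lat, where.lon, where.start))


# Hypothetical measurement events from two nearby sensors.
events = [
    Event("pm", 0, 60, 40.765, -111.845, 10.0),
    Event("pm", 600, 660, 40.766, -111.846, 14.0),
    Event("pm", 4000, 4060, 40.765, -111.845, 30.0),  # falls in the next hour
]
grid = build_grid(events, "pm")

at_home = Event("location", 1200, 1800, 40.7655, -111.8455)
elsewhere = Event("location", 1200, 1800, 41.505, -112.505)
print(exposure_at(grid, at_home))    # mean of the two first-hour readings
print(exposure_at(grid, elsewhere))  # None: no data for that cell
```

Because measurements, locations, and symptoms share one event structure, the same temporal and spatial lookups apply to all of them. A cell that returns `None` marks a gap that modeled data, with its own uncertainty, could fill.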