Exposure Health Informatics Ecosystem
To meet the diverse requirements listed above, we are developing a scalable informatics infrastructure, the Exposure Health Informatics Ecosystem (EHIE) (Sward and Facelli 2016, Sward et al. 2017, Gouripeddi et al. 2019b), following an ecosystemic approach. An ecosystem is a collection of loosely coupled software and hardware platforms that co-evolve and interact with one another and with human actors to serve a common business need (i.e., research, in this case), maintaining symbiotic operational relationships with each other through exchange of data, metadata, knowledge, and process artifacts (Messerschmitt and Szyperski 2003, Jansen, Finkelstein, and Brinkkemper 2009, Lungu 2009, Popp and Meyer 2010, Jansen, Brinkkemper, and Cusumano 2013, Bala Iyer 2014). Adopting this approach enables researchers to sustain healthy large-scale infrastructures by having a diversity of tasks performed by different actors within the ecosystem (Manikas and Hansen 2013). In addition, this diversification into niche components simplifies the management and evolution of the ecosystem as a whole, as each component has its own development cycle managed by a team of experts (Dittrich 2014). A specific component can be independently modified and scaled as needed for a particular research use case. Similarly, operational use of the ecosystem paradigm for research studies would have support from several actors with appropriate expertise, who together provide greater value than they would on their own (Wnuk et al. 2014).
EHIE is derived from the federally funded National Institutes of Health’s (NIH) National Institute of Biomedical Imaging and Bioengineering (NIBIB) Pediatric Research Using Integrated Sensor Monitoring Systems (PRISMS) program (Sward et al. 2016). EHIE addresses the above list of exposomic research challenges by providing informatics solutions at scale, incorporating the latest Big Data approaches. The infrastructure is a comprehensive, standards-based, open-source informatics platform that provides semantically consistent, metadata-driven, event-based management of exposomic data. Using an event-driven architecture allows all activities related to the study itself and its operations to be modeled and stored in their primitive form on a timeline, as events that can be transformed into higher-level analytical models based on use cases. Moreover, its implementation using advanced graph and document store technologies limits semantic dissonance and enables the use of novel Big Data approaches in a natural way. EHIE is aligned with the goals of modern environmental health research supporting meaningful integration of sensor and biomedical data (National Institute of Environmental Health Sciences 2012, Barksdale Boyle et al. 2015, National Institute of Environmental Health Sciences 2018). See Figure 16.3 for an overview of the EHIE informatics ecosystem.
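To illustrate the event-based idea, the following minimal Python sketch stores primitive events on a timeline and derives a higher-level hourly summary from them. The `Event` class, its field names, and the `hourly_mean` helper are hypothetical names chosen for this example and are not taken from the EHIE implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class Event:
    """A primitive observation stored as-is on the timeline (hypothetical schema)."""
    timestamp: datetime
    participant_id: str
    kind: str      # e.g. "pm2.5" or "symptom_report"
    value: float


def hourly_mean(events, kind):
    """Derive an analytical view: mean value per participant per hour."""
    buckets = defaultdict(list)
    for ev in events:
        if ev.kind != kind:
            continue
        # Truncate the timestamp to the start of its hour.
        hour = ev.timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(ev.participant_id, hour)].append(ev.value)
    return {key: mean(values) for key, values in buckets.items()}


# Illustrative values only; these are not study data.
timeline = [
    Event(datetime(2019, 1, 1, 8, 5), "p01", "pm2.5", 10.0),
    Event(datetime(2019, 1, 1, 8, 35), "p01", "pm2.5", 14.0),
    Event(datetime(2019, 1, 1, 9, 10), "p01", "pm2.5", 30.0),
    Event(datetime(2019, 1, 1, 8, 20), "p01", "symptom_report", 1.0),
]
summary = hourly_mean(timeline, "pm2.5")
```

Because the primitive events are kept unchanged, other derived views (for example, a daily maximum or a pairing of symptom reports with preceding exposures) could be computed from the same timeline without altering what was stored.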
Conceptually, all the evolving software and hardware artifacts within EHIE can be grouped into the following components: 
FIGURE 16.3 Exposure Health Informatics Ecosystem (EHIE) and its main components.
5. Central Big Data federation/integration platform: Standards-based, open-access infrastructure that integrates measured and computationally modeled data with biomedical information along with characterizing uncertainties associated with using these data.
In the following sections, we describe key features of each of these components.
Data Acquisition Pipeline
Current Internet-of-Things (IoT) solutions are not necessarily designed for health research. Systems are not designed for large study-based deployments wherein the cost and resources required for management of IoT sensors exceed the cost of the sensors themselves. Research solutions are required to be compliant with pertinent privacy laws applicable in different jurisdictions (e.g., deployment site(s), study site, and/or study sponsor location) for data transmission (Luxton, Kayl, and Mishkind 2012) and storage. While IoT sensors provide low-cost and smart solutions to measure study participants’ environments, they usually use custom software and hardware, require regular maintenance, and have data integrity problems. We, therefore, needed to design an open-source platform that is customizable to different sensors, study designs, and participant requirements. Such a platform would need to have a short deployment time, provide high-quality data, and stream data in real time, enabling control loops of feedback and interventions.
To meet these needs, we developed a multi-pronged approach for data acquisition. We developed EpiFi (Figure 16.4) (Lundrigan et al. 2018) to overcome these limitations and extend the use of off-the-shelf sensor technologies as IoT solutions for health research. In addition, we developed methods and processes for sensors that can directly transmit data to data acquisition servers, using protocols such as the Message Queuing Telemetry Transport (MQTT) (Hunkeler, Truong, and Stanford-Clark 2008) or HTTP/HTTPS. Our collaborators at Columbia University have adopted AethLabs sensors (www.aethlabs.com) to use these protocols to measure and transmit measurements for PM composition, black carbon, temperature and relative humidity, accelerometry, and volatile organic compound levels (Cox et al. 2019). To help investigators choose an appropriate approach, we developed a framework that considers the type of sensors and their transmission, study design, and participant involvement (Tiase et al. 2018). EpiFi provides flexibility in using existing participant home infrastructure and accommodates participant-in-the-loop study designs.
EpiFi brings IoT to health research by giving consumer applications the robustness needed by different study designs. It allows researchers, participants and their families, and clinicians to process data in real time. It simplifies the process of IoT deployment and management in hundreds of participant homes, as might be needed in clinical studies. EpiFi consists of a small single-board computer (i.e., Raspberry Pi) gateway and the open-source Home Assistant home automation platform (Home Assistant 2019), with custom code to address the challenges of using sensors for research data acquisition. It has means to reliably transfer data to a remote database using a home WiFi router and local storage that can act as a buffer when transmission to the remote database is not available or required. The system architecture of EpiFi is shown in Figure 16.5.
FIGURE 16.4 Overview of EpiFi.
EpiFi (Lundrigan et al. 2018) supports multiple features that make it appropriate for use as an IoT solution in clinical studies:
- 1. Device observability: Allows a remote study manager to know if a WiFi device is functioning or not. It distinguishes between WiFi disruptions and other types of disruptions, so that appropriate troubleshooting can be performed.
- 2. Secure WiFi bootstrapping: Allows secure bootstrapping of WiFi connectivity of multiple devices by making the gateway a temporary access point.
By overloading the source and destination address fields of an Ethernet frame, the Secure Transfer of Association Protocol (STRAP) (Lundrigan, Kasera, and Patwari 2018) allows a trusted device on the network to send data to unconnected WiFi devices (Figure 16.6). This protocol addresses the challenges of securely connecting new sensors within a home. It protects against eavesdroppers, modified messages, replay attacks, and rogue access point attacks. STRAP also reduces deployment time by requiring home WiFi credentials to be entered only on EpiFi, eliminating the need to enter them on each individual sensor.
FIGURE 16.5 Architecture of EpiFi.
FIGURE 16.6 The Secure Transfer of Association Protocol (STRAP).
- 3. Secure sensor reuse: Tracking of sensors when their location changes and management of backlogged data on sensors. Sensors learn their locations based on network characteristics. A change in network characteristics indicates that the sensor is now in a new location, which then sets off processes to update deployment metadata. Also, each location has a key that is used to encrypt data, which prevents backlogged data from being read by a person at a different location.
- 4. Study management tools: Provides a presentation layer to support a diverse range of tools for study management (Figure 16.5). Integrated bi-directional communications with the gateway device help with remote management, troubleshooting of potential sources of signal disruption, and application of fixes. We currently use the following study management tools for the PRISMS pilot study (Collingwood et al. 2018, Gouripeddi et al. 2019c):
a. Deployment status page: Provides the status of various deployments.
b. Export tool: Export streaming data in various formats for ad hoc analysis.
c. Grafana (2019): Streaming data visualization, monitoring, and analytics.
- 5. Support of multiple wireless protocols: Supports, among others, cellular, Z-Wave (Yassein, Mardini, and Khalil 2016), ZigBee (Farahani 2011), LoRa (Lee and Ke 2018), and HaLow (802.11ah) (Adame et al. 2014).
- 6. Data integrity: Prevents data loss arising due to packet losses, gateway outages, home WiFi router outages, and internet outages by persisting data at every opportunity, deleting persisted data only after acknowledgment of receipt from remote storage, and sending multiple data packets when backlogged.
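The data integrity behavior described in item 6 (persist first, delete only after acknowledgment, send in batches when backlogged) can be sketched as a small store-and-forward buffer. This is a minimal illustration using Python's standard `sqlite3` module, not EpiFi's actual code; the class name `StoreAndForwardBuffer`, the table layout, and the `send` callback contract are assumptions made for the example.

```python
import json
import sqlite3


class StoreAndForwardBuffer:
    """Persist readings locally; delete them only after the remote side acknowledges."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def persist(self, reading):
        """Write the reading to local storage before any transmission attempt."""
        self.db.execute(
            "INSERT INTO pending (payload) VALUES (?)", (json.dumps(reading),)
        )
        self.db.commit()

    def pending_count(self):
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]

    def flush(self, send, batch_size=100):
        """Send backlogged readings in batches, oldest first.

        `send` is a caller-supplied function (hypothetical contract) that takes
        a list of readings and returns True only when the remote store has
        acknowledged receipt. Returns the number of readings delivered.
        """
        delivered = 0
        while True:
            rows = self.db.execute(
                "SELECT id, payload FROM pending ORDER BY id LIMIT ?", (batch_size,)
            ).fetchall()
            if not rows:
                return delivered
            batch = [json.loads(payload) for _, payload in rows]
            try:
                acknowledged = send(batch)
            except OSError:
                acknowledged = False
            if not acknowledged:
                return delivered  # keep the data; retry on the next flush
            # The batch holds the lowest ids, so this removes exactly the sent rows.
            self.db.execute("DELETE FROM pending WHERE id <= ?", (rows[-1][0],))
            self.db.commit()
            delivered += len(rows)


buffer = StoreAndForwardBuffer()
for i in range(5):
    buffer.persist({"sensor": "s1", "pm": float(i)})  # illustrative readings

received = []


def offline(batch):
    return False  # simulates an outage: nothing is acknowledged


def online(batch):
    received.extend(batch)
    return True


sent_during_outage = buffer.flush(offline, batch_size=2)
sent_after_recovery = buffer.flush(online, batch_size=2)
```

In the usage above, the first `flush` leaves all five readings in local storage because no acknowledgment arrives, and the second delivers the backlog in batches of two once the link is restored.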
EpiFi is currently deployed for pilot studies at Utah to facilitate acquisition of data from:
2. Bluetooth Low Energy (BLE) sensors
a. Wearable air quality sensor from George Washington University (Li et al. 2019): Nitrogen dioxide (NO2), ozone (O3), ambient temperature, formaldehyde, other aldehydes, and relative humidity.
b. Wearable air quality sensor from Arizona State University (Wang and Tao 2017): O3, volatile organic compounds (VOCs), ambient temperature, relative humidity, accelerometry, nitrogen oxides (NOx), formaldehyde (CH2O), and PM.
c. Wearable device from the University of Maryland (Chatterjee et al. 2014, Kukkapalli et al. 2016): PM, temperature, transcutaneous partial carbon dioxide (CO2) pressure, and respiratory rate.
EpiFi has been evaluated in different types of deployment designs, including:
- 1. High-resolution air sensing (Min et al. 2018): EpiFi acquired data from eight UMDs and AirUs deployed indoors and outdoors, respectively, to create a profile of air quality within a home due to various activities. For example, one of our findings for a home under study showed that, while the furnace fan rapidly improved PM levels in the kitchen, there were short-term increases in PM in other rooms.
- 2. Automation of interventions (Min et al. 2018): We demonstrated that EpiFi can be integrated with home automation control systems: an Ecobee thermostat triggered the furnace fan to switch on when PM levels measured by UMDs crossed a preset threshold. This smart control of the furnace fan led to a 70% reduction in power consumption compared with turning it on periodically.
- 3. Requisition of clinical status, feedback, and activity annotation (Collingwood et al. 2018): Using EpiFi, we were able to send text notifications to participants when specific thresholds of PM levels were crossed, in order to acquire their clinical status and feedback and to log the activities they were performing.
- 4. Acquisition of heterogeneous sensor data from participant homes: Including motion sensors, door sensors, tracking smartphones, participant locations, smart light bulbs, WiFi usage, temperature, humidity, energy meters, and any other commercial IoT device to get a sense of participant activities.
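As a rough illustration of the threshold-triggered intervention in item 2, the sketch below decides when a fan should run from a stream of PM readings. It is not the deployed control logic: the function name `fan_commands`, the threshold values, and the use of a separate switch-off threshold (hysteresis, added here to avoid rapid toggling) are all assumptions made for the example.

```python
def fan_commands(pm_readings, on_threshold, off_threshold):
    """Yield (reading, fan_on) pairs for a stream of PM readings.

    The fan switches on when a reading exceeds `on_threshold` and switches
    off again once a reading falls below `off_threshold`. The second
    threshold is an assumption of this sketch, not a documented feature.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    fan_on = False
    for reading in pm_readings:
        if not fan_on and reading > on_threshold:
            fan_on = True
        elif fan_on and reading < off_threshold:
            fan_on = False
        yield reading, fan_on


# Illustrative readings and thresholds only; these are not study values.
readings = [8.0, 12.0, 40.0, 33.0, 20.0, 9.0, 11.0]
states = [on for _, on in fan_commands(readings, on_threshold=35.0, off_threshold=10.0)]
```

In a real deployment the boolean state would be translated into a command to the thermostat; that integration is outside the scope of this sketch.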
Data from EpiFi is stored remotely in a time-series database (Figure 16.5), i.e., InfluxDB (InfluxData 2019). Metadata about the deployments is authored through a graphical user interface into a deployment metadata repository (DMDR) that has been instantiated in a MongoDB database (MongoDB 2019). Together these two stores, along with a set of Software Services (SS), support the presentation layer which provides displays that can be used by participants, researchers, and additional administrative tools. A tracking page was developed to provide the real-time health status of each deployed sensor allowing the administrative team to detect, analyze, and troubleshoot issues in various deployments using established procedures and protocols (Figure 16.7).
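One way to picture how the two stores work together is a lookup that attaches deployment metadata to each time-series reading, matching on sensor identifier and deployment interval. The sketch below uses plain Python dictionaries as stand-ins for records from the time-series database and the DMDR; all field names (`sensor_id`, `home`, `room`, `start`, `end`) and the `annotate` function are hypothetical and do not reflect the actual schemas.

```python
from datetime import datetime


def annotate(readings, deployments):
    """Attach deployment metadata to each reading by sensor ID and time interval."""
    annotated = []
    for r in readings:
        # Find the deployment that was active for this sensor at the reading time.
        match = next(
            (
                d for d in deployments
                if d["sensor_id"] == r["sensor_id"]
                and d["start"] <= r["time"]
                and (d["end"] is None or r["time"] < d["end"])
            ),
            None,
        )
        annotated.append(
            {
                **r,
                "home": match["home"] if match else None,
                "room": match["room"] if match else None,
            }
        )
    return annotated


# Stand-in for deployment metadata: the same sensor moves between two homes.
deployments = [
    {"sensor_id": "umd-1", "home": "H1", "room": "kitchen",
     "start": datetime(2019, 1, 1), "end": datetime(2019, 2, 1)},
    {"sensor_id": "umd-1", "home": "H2", "room": "bedroom",
     "start": datetime(2019, 2, 1), "end": None},
]
# Stand-in for time-series readings (illustrative values only).
readings = [
    {"sensor_id": "umd-1", "time": datetime(2019, 1, 15, 12), "pm2_5": 7.5},
    {"sensor_id": "umd-1", "time": datetime(2019, 2, 3, 9), "pm2_5": 21.0},
    {"sensor_id": "umd-9", "time": datetime(2019, 2, 3, 9), "pm2_5": 5.0},
]
records = annotate(readings, deployments)
```

Readings from a sensor with no matching deployment are kept with empty metadata rather than dropped, so that gaps in the metadata remain visible to whoever reviews the data.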
FIGURE 16.7 Example troubleshooting protocol.
In addition to supporting the presentation layer, data and metadata from the time-series database and the DMDR are consumed by the SS of the data federation and integration component for assimilation, generation of exposure records, and study analysis. Software details on EpiFi are available in Lundrigan (2019). We are currently evaluating blockchain approaches for systematically capturing the versioning of sensor deployment metadata as sensors go through life-cycles of deployment and maintenance and to support robust provenance of data arising from these sensors (Sarbhai et al. 2019). Moreover, implementation of blockchain technology will allow us to provide much higher control of data access.
1. Data acquisition pipeline: Hardware and software tools, wireless networking, and protocols to support easy sensor system deployment and robust sensor data collection.
2. Participant-facing tools: Collect and annotate a variety of patient-reported and activity data, as well as inform and provide feedback to study participants on their current clinical and environmental exposure status.
3. Researcher-facing platforms: Tools and processes for researchers undertaking exposomic studies for a variety of experimental designs or for clinical care.
4. Computational modeling platform: Generate comprehensive spatio-temporal data in the absence of measurements and for recognition of activity signatures from sensor measurements.
1. WiFi sensors
a. Utah Modified Dylos (UMD) (Min et al. 2018, Vercellino et al. 2018, Collingwood et al. 2019): PM as PM2.5 (i.e., PM which is 2.5 µm and smaller) and PM10 (i.e., 10 micrometers and smaller), temperature, and humidity.
b. AirU (Kelly et al. 2017): PM.