Section II: Advances in Toxicology and the -Omics


Personalizing Environmental Health for the Military—Striving for Precision

Christopher E. Bradburne

Johns Hopkins University

The Need for Precision in Military Environmental Health

Military service members operate in a broad range of environments and face known and unknown chemical and immunological challenges during deployments. High-profile chronic health conditions have been associated with toxic agents, such as Agent Orange in Vietnam, but there are also conditions with no clear cause, such as Gulf War Syndrome. These cases did more than cause health issues; they also resulted in enormous costs, lost time and productivity, and an erosion of trust in the military’s ability to protect its own. Better understanding of causes and effects could have provided better options for preventive medicine, personal protection, and other means of risk mitigation. Likewise, better understanding of individual susceptibilities, proximity to and duration of exposures, and extenuating factors could have provided additional avenues to protect those most vulnerable.

Current advances in both environmental health and precision medicine offer some intriguing possibilities. A goal of the environmental/occupational health field is to identify health risks and minimize their impact on workers. This involves identifying, characterizing, and mitigating threats; implementing preventive and protective strategies; detecting exposures and effects with health surveillance and diagnostic tools; and providing treatment where needed. Scientific approaches have historically focused on reducing threats to single chemical causes, using established epidemiological tools and population statistics to predict individual risks. While useful for sources and exposures with large or sustained effects, these approaches can break down for exposures with more moderate effects and/or multifactorial causes, and when trying to estimate individual risks and susceptibilities. It is difficult to overstate the need for better individual tracking and characterization of susceptibility to environmental threats; generally, the more that is known about the source and the individual, the better (Figure 6.1). Knowledge of ‘personalized’ susceptibility and individual exposures can illuminate health threats that might otherwise be invisible in population health data. Such information can identify toxicological sources and health effect modifiers and provide more precise, actionable information for decision makers, health care providers, and researchers.

Most strategies for connecting active health effects with sources are retrospective. For example, for an acute exposure that results in a health effect, an epidemiological investigation may involve ‘tracing back in time’ an individual’s geo-temporal activities to co-localize them with a source. Similar actions may be taken for chronic exposures, but it may be years before associations and other affected individuals are identified, if ever. Characterizing the threat from toxicological sources would benefit from better tools that allow precise measurement of components, geographic distribution, functionalization, and persistence. For the individual, better understanding of genetic susceptibility, individual risk factors, geographic proximity, and duration of exposure would also be useful. Many studies have associated toxicological health effects in populations with genetic markers. However, the analytic validity of these associations (i.e., the ability of the technology to detect true genomic markers) can shift over time as technologies change in their ability to measure sources of genetic variation (Figure 6.2).

FIGURE 6.1 ‘Knowledge is power’: The more that is known about a hazard source, the individual, and their overall interaction, the better the decisions that individuals, decision makers, and stakeholders can make.

FIGURE 6.2 Temporal nature of tools and actions for precision (genomic) environmental health. Actionable information for precision environmental health is developed using retrospective investigation techniques and a constantly changing genomic toolset. For population studies, GWAS done with earlier genomic technologies may miss variants that are predictive of susceptibility to an exposure event. For individual exposures, genomic tools used for detection of exposure-linked adducts or biomarkers may be inadequate.
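The retrospective ‘tracing back in time’ investigation described in this section can be illustrated with a minimal geo-temporal co-localization check. The sketch below assumes a hypothetical data format (a list of timestamped latitude/longitude fixes for one individual, and a point source with a known active window), uses the standard haversine great-circle distance, and flags any fix within a chosen radius during the active window as a potential exposure. It is an illustration only, not a method described in this chapter; real investigations would need far richer exposure models.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def co_located_times(track, source, radius_km):
    """Timestamps at which the individual was within radius_km of an active source.

    track:  list of (timestamp, lat, lon) fixes (hypothetical format)
    source: dict with 'lat', 'lon', 'start', 'end' keys (hypothetical format)
    """
    hits = []
    for ts, lat, lon in track:
        active = source["start"] <= ts <= source["end"]
        near = haversine_km(lat, lon, source["lat"], source["lon"]) <= radius_km
        if active and near:
            hits.append(ts)
    return hits


# Hypothetical source and location history for one individual.
source = {"lat": 0.0, "lon": 0.0,
          "start": datetime(2020, 1, 1), "end": datetime(2020, 1, 2)}
track = [
    (datetime(2020, 1, 1, 12), 0.0, 0.05),  # about 5.6 km away while the source is active
    (datetime(2020, 1, 1, 13), 0.0, 1.0),   # about 111 km away: outside the radius
    (datetime(2020, 1, 3, 12), 0.0, 0.05),  # nearby, but after the source stopped
]
print(co_located_times(track, source, radius_km=10.0))
```

Only the first fix is returned. With regularly sampled fixes, the number of returned timestamps multiplied by the sampling interval gives a crude estimate of exposure duration; the radius and data format are illustrative assumptions.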

One particular challenge is dealing with the constant change in technology and interpretation of impacts for precision medicine. The genomics and other ‘omics’ technologies present an array of new tools and techniques to characterize individual susceptibility and a range of biomarkers (Bradburne & Lewis 2018). However, interpretation can change over time as tools improve, meaning that an understanding of ‘where we have been’ is just as important as ‘where we are going’. This chapter will attempt to shed light on the history and trajectory of precision (genomic) medicine tools for determining genetic susceptibility to environmental toxicants and review a framework for how comprehensive risk outlooks can be developed which combine genomic and environmental diagnostic approaches.

From Past to Present: The Changing Landscape of Precision Medicine

Technology and Medicine

Technology is constantly changing, which can create both excitement and frustration in a technology-dependent field such as medicine. One example is the evolution of genomic tools over the past few decades, which has created the field of ‘precision medicine’.

Precision medicine is defined by the American College of Medical Genetics (ACMG) board of directors (Adler & Stead 2015) as ‘an approach to disease treatment and prevention that seeks to maximize effectiveness by taking into account individual variability in genes, environment, and lifestyle’.

It has at its core the sub-field of genomic medicine.

Genomic medicine is defined by the Clinical Pharmacogenetics Implementation Consortium (CPIC) (National Academies of Sciences 2014) as ‘... the use of genomic information and technologies to determine disease risk and predisposition, diagnosis and prognosis, and the selection and prioritization of therapeutic options’.

Neither term would have been coined without the emergence of genomic typing and sequencing tools. But as these tools have emerged and improved, how have they changed medicine, and how can we tell where we are at any given time in the medical landscape? Does a genomic test from 2010 provide the same efficacy as a genomic test from 2020?

Genetics versus Genomics

Historically, the field of genetics is quite distinct from the newer field of genomics. Genetics began more than 150 years ago with Gregor Mendel, who described patterns of inheritance that followed simple mathematical rules. These ‘Mendelian traits’ are phenotypes that can be localized to individual genetic loci. However, the majority of common traits in higher organisms, including humans, are complex (i.e., tied to more than one locus) and do not follow Mendelian patterns of inheritance. For example, most chronic diseases arise through some combination of inherited genetics and environmental exposures. As a result, the field of genetics evolved to rely heavily on mathematical inference, with little understanding of individual molecular mechanisms. The heyday of quantitative genetics was from the 1900s to the 1930s, when statisticians such as Ronald Fisher developed quantitative descriptions of inheritance, such as linear mixed models, the infinitesimal model, and others, that are still used today in agriculture and animal breeding (Bradburne & Lewis 2017).
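Fisher’s infinitesimal model can be made concrete with a small simulation. The sketch below is a finite-locus approximation under illustrative assumptions (every locus biallelic with allele frequency 0.5, equal additive effects, Gaussian environmental noise independent of genotype), and the function names are hypothetical. Because many small additive effects are summed, genetic values come out approximately normally distributed, and the ratio Var(G)/Var(P) recovers the genetic share of phenotypic variance built into the simulation.

```python
import math
import random
import statistics


def simulate_polygenic_trait(n_individuals, n_loci, env_sd, seed=0):
    """Finite-locus approximation of an infinitesimal (many small effects) model.

    Illustrative assumptions: allele frequency 0.5 at every locus, equal
    additive effects, and environmental deviation E ~ Normal(0, env_sd).
    Returns parallel lists of genetic values G and phenotypes P = G + E.
    """
    rng = random.Random(seed)
    effect = 1.0 / math.sqrt(n_loci)  # shrink per-locus effects as loci are added
    genetic, phenotype = [], []
    for _ in range(n_individuals):
        # Allele count at each locus is 0, 1 or 2 (two independent draws at p = 0.5).
        g = sum(effect * ((rng.random() < 0.5) + (rng.random() < 0.5))
                for _ in range(n_loci))
        e = rng.gauss(0.0, env_sd)
        genetic.append(g)
        phenotype.append(g + e)
    return genetic, phenotype


def variance_ratio(genetic, phenotype):
    """Share of phenotypic variance due to genetic values: Var(G) / Var(P)."""
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)


g, p = simulate_polygenic_trait(2000, 200, env_sd=math.sqrt(0.5), seed=1)
print(round(variance_ratio(g, p), 2))
```

With these parameters, Var(G) and Var(E) are both 0.5 by construction (allele-count variance 2 × 0.5 × 0.5 per locus, times effect² = 1/n_loci, summed over n_loci loci), so the printed ratio should lie near 0.5; the exact value depends on the seed.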

In the 1990s, the new field of genomics promised to change that paradigm. Genomics emerged with the advent of the Human Genome Project, which in 2001 generated a draft sequence of one person (Bradburne et al. 2015) and later went on to characterize genetic variation across dozens of human ethnic populations worldwide. Mapping the human genome and contrasting individual differences promised to provide the ‘roadmap’ to understanding molecular mechanisms by tying individual genomic variants, or groups of variants, to traits and diseases. The workhorse for tying genetic loci to traits was the Genome-Wide Association Study (GWAS), which genotypes control and trait/disease groups and looks for genetic variants that are statistically associated with the trait/disease. By 2011, there had been ~1400 GWAS on 380 traits and diseases (De Castro et al. 2016). The same year, the National Human Genome Research Institute (NHGRI) published a strategic roadmap predicting that contributions from basic human genomic science would begin to significantly advance the science of medicine and improve the effectiveness of health care over the next 10 years (De Castro et al. 2016).
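The core computation of a case-control GWAS can be sketched as a per-SNP association test repeated across the genome. The example below assumes allele counts are already available for each SNP and that no cell of the table is zero, uses a simple allelic 2×2 Pearson chi-square test (one common choice among several; real studies typically use regression models with covariates), and applies the conventional genome-wide significance threshold of 5 × 10^-8. SNP identifiers and counts are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic case-control test for one SNP (each person contributes two alleles).

    Returns (odds_ratio, chi_square, p_value) for a 1-degree-of-freedom
    Pearson chi-square test on the 2x2 table of allele counts.
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return odds_ratio, chi2, p_value


def scan_snps(snp_counts, alpha=GENOME_WIDE_ALPHA):
    """Test every SNP and keep those passing the significance threshold."""
    hits = []
    for snp_id, counts in snp_counts.items():
        odds_ratio, _, p_value = allelic_association(*counts)
        if p_value < alpha:
            hits.append((snp_id, odds_ratio, p_value))
    return hits


# Hypothetical allele counts: (case risk, case other, control risk, control other).
snps = {"snp_a": (1200, 800, 1000, 1000), "snp_b": (1000, 1000, 1000, 1000)}
print(scan_snps(snps))
```

Because the test is repeated for every SNP on the array, a SNP that the array does not assay (such as most structural variants) can never appear among the hits, whatever its true effect.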

Genomics and the Overpromise of GWAS

While the advancements have been exciting, some expectations have gone unrealized. From 2001 to 2015, the primary tool for assaying genotypic variation was not genome sequencing but single-nucleotide polymorphism (SNP) genotyping arrays. SNP genotyping arrays are an older technology (developed between the late 1990s and early 2000s) and generated almost all of the hundreds of GWAS-defined trait associations curated by the NHGRI and the European Bioinformatics Institute (EBI) (US National Library of Medicine 2019). However, SNPs are not very predictive for most traits and phenotypes, for several reasons. (1) SNPs do not represent all of the variation in the human genome; by most estimates, they account for less than half of it (National Human Genome Research Institute 2019). Other forms of genomic variation include insertions and deletions (INDELs), copy number variants (CNVs), segmental duplications, and others. These ‘structural variants’ are not readily assayed by standard SNP array technologies. (2) SNPs do not account for much of the heritability of most complex traits (Fisher 1918). The highest estimated heritability explained by SNPs for a non-Mendelian, complex disease is for age-related macular degeneration (~50% from 5 SNPs), which has made it a prime candidate for several focused gene therapy treatments currently underway (Venter et al. 2001), but most other complex diseases are not this straightforward. (3) Genetics (SNPs, etc.) does not account for the majority of the variation in most human chronic diseases (Green & Guyer 2011); most have a larger environmental, causative component.

Sequencing and ‘Omics’ Technology Advancements

It is important to note the limitations of genotyping technologies over the past two decades and how they affect the current practice of genomic medicine. The majority of the thousands of GWAS have been done with SNP arrays, which means they did NOT look for non-SNP variation such as INDELs, CNVs, and other structural variants. Interestingly, most commercial SNP genotyping arrays since the 2000s did include array-based CNV typing capabilities, but because of the variability in CNV morphologies in vivo, these calls have not been useful. In fact, curation of CNV variant calls generated from these chips over the last two decades was discontinued in early 2018 by the National Human Genome Research Institute-European Molecular Biology Laboratory (NHGRI-EMBL) SNP variation databases.

The original draft human genome from 2001 represents a higher quality construct than most of the individual genomes that came after it, as it was produced by an accurate but very low-throughput and painstaking technology called Sanger sequencing. High-throughput Illumina sequencing technologies emerged in 2007 and held most of the genomic sequencing market (~90%) from 2010 to 2019. However, this technology relies on breaking genomes into fragments of 50-300 base pairs (bps) for sequencing, and it requires large in silico resources and complex approaches to reconstruct the genome. Because much of the human genome consists of low-complexity regions (e.g., two-base-pair repeats such as ATATATATATAT...), a typical moderate-quality human genome construct comprises only 20%-30% of the actual human sequence. A related approach, exome sequencing, provides even less, typically targeting only the expressed regions of the human genome, or about 1%-2%. As most SNPs associated with complex diseases and traits are NOT causative and not in the expressed protein regions (Buniello et al. 2019), it is reasonable to assume that missing variation in structural variants may also be an important piece of the puzzle, and those variants are still largely uncategorized in global populations and in disease phenotypes.
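The difficulty that low-complexity regions pose for short fragments can be shown with a toy example. The sketch below builds a hypothetical 600 bp ‘genome’ containing a 400 bp (AT)n repeat and counts exact-match positions for reads of different lengths. Exact string matching is a simplifying assumption standing in for a real aligner, which would also tolerate sequencing errors, and the G/C-only flanks are chosen only so they cannot extend the repeat.

```python
# Toy sequences (hypothetical): G/C-only flanks cannot be confused with the AT repeat.
LEFT_FLANK = "GCGC" * 25    # 100 bp
REPEAT = "AT" * 200         # 400 bp (AT)n low-complexity repeat
RIGHT_FLANK = "CGCG" * 25   # 100 bp
genome = LEFT_FLANK + REPEAT + RIGHT_FLANK


def exact_match_positions(genome, read):
    """All 0-based start positions where `read` occurs exactly (overlaps allowed)."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


short_inside = genome[100:150]   # 50 bp read lying entirely inside the repeat
short_anchored = genome[80:130]  # 50 bp read crossing the flank/repeat boundary
long_read = genome[50:550]       # 500 bp read spanning the whole repeat

for name, read in [("short_inside", short_inside),
                   ("short_anchored", short_anchored),
                   ("long_read", long_read)]:
    print(name, len(read), "bp ->", len(exact_match_positions(genome, read)), "placements")
```

The 50 bp read inside the repeat matches 176 positions and cannot be placed, whereas the anchored short read and the long read each match exactly once; only the long read also reveals the full length of the repeat.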

More recently, new and disruptive ‘long-read’ technologies have emerged, highlighted by the Oxford Nanopore (OxNan) single-molecule sequencers. These sequencers can read thousands to hundreds of thousands of bps, alleviating many of the shortcomings of in silico reconstruction. Quality has lagged (typically Q10 or less, i.e., 1 error in 10 bps or worse), so resolving SNPs has been difficult. However, the newest upcoming platform forecasts read qualities comparable to Illumina.
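The Q values quoted here are Phred quality scores, defined as Q = -10 log10(P_error), where P_error is the probability that a base call is wrong. A minimal conversion sketch (function names are illustrative):

```python
import math


def phred_to_error_prob(q):
    """Invert Q = -10 * log10(P_error): P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p_error):
    """Phred quality score for a given per-base error probability."""
    return -10 * math.log10(p_error)


def expected_errors(read_length, q):
    """Expected number of miscalled bases in a read with uniform quality q."""
    return read_length * phred_to_error_prob(q)


for q in (10, 20, 30):
    print(f"Q{q}: P(error) = {phred_to_error_prob(q):.3f}")
print(expected_errors(10_000, 10))
```

For example, Q10 corresponds to an error probability of 0.1 and Q30 to 0.001, so a 10,000 bp read at a uniform Q10 would be expected to contain about 1,000 errors, which is why SNP calling from such reads is difficult.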

The ‘Great GWAS Do-over’?

Because older GWAS were done with early, less capable platforms, the DNA measurements of many GWAS may need to be redone to capture structural variants and SNPs in low-complexity genomic regions. Prime candidates for reassessment would be studies in which the combined effect of the statistically significant SNPs does not reach the level of known genetic heritability measured in monozygotic twins. An example is type 2 diabetes, a complex trait with ~42 SNPs associated with it through GWAS. Interestingly, most SNP risk is cumulative (Manolio et al. 2009). Essentially, one can potentially sum the risk of each SNP associated with a complex trait or disease (e.g., as log odds ratios, since odds ratios combine multiplicatively) and compare it to the level of heritability, measured as $H^2 = \mathrm{Var}(G)/\mathrm{Var}(P)$, where $H^2$ is the (broad-sense) heritability, $\mathrm{Var}(G)$ is the genetic variance, and $\mathrm{Var}(P)$ is the total phenotypic variance. The difference between the two may provide a measure of how much variability remains that is heritable but has not been accurately measured by SNP arrays. In other words, this ‘missing heritability’ (Fisher 1918) could very likely consist of structural variants missed by SNP arrays, which could be measured by OxNan and other long-read sequence typing technologies.
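The heritability comparison in this paragraph can be sketched in a few lines. The code assumes that the variance components and the heritability already explained by SNPs are known inputs (converting odds ratios into variance explained requires further assumptions about allele frequencies and disease prevalence and is not shown), and that SNP effects combine multiplicatively, so per-allele log odds ratios add. Function names are hypothetical, and all numbers are placeholders, not estimates for type 2 diabetes or any other trait.

```python
import math


def broad_sense_heritability(var_g, var_p):
    """H^2 = Var(G) / Var(P): the share of phenotypic variance that is genetic."""
    if var_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / var_p


def missing_heritability(h2_total, h2_explained):
    """Heritability not yet accounted for by the SNPs found so far."""
    return h2_total - h2_explained


def combined_odds_ratio(odds_ratios, risk_allele_counts):
    """Combine per-SNP odds ratios under a multiplicative (log-additive) model.

    Each SNP contributes count * log(OR); summing on the log scale and
    exponentiating is equivalent to multiplying the odds ratios.
    """
    if len(odds_ratios) != len(risk_allele_counts):
        raise ValueError("need one risk-allele count per odds ratio")
    log_or = sum(n * math.log(o) for o, n in zip(odds_ratios, risk_allele_counts))
    return math.exp(log_or)


# Hypothetical placeholder values, not published estimates.
h2_twin = broad_sense_heritability(var_g=30.0, var_p=60.0)
print(missing_heritability(h2_twin, h2_explained=0.1))
print(combined_odds_ratio([1.2, 1.5, 1.1], [2, 1, 0]))
```

Working on the log scale is the design choice that makes ‘summing the risk’ meaningful: adding raw odds ratios has no probabilistic interpretation, whereas adding log odds ratios corresponds to multiplying the odds.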

A final consideration is the growth in computational power available to assess genomic information and find variants or more complex multi-locus haplotypes. Because of the significant in silico resources required for genomic bioinformatics, most analysis approaches and algorithms are designed to sacrifice rigor for speed and memory conservation (Chen et al. 2010). Analysis becomes very computationally intensive even for pairwise epistatic computations on SNPs alone, and the cost grows rapidly with higher-order combinations. As an example, evaluating all four-way combinations of a modest collection of 30 million SNPs could take $2.4 \times 10^{20}$ CPU hours per phenotype. The advent of ultra-high-speed computing, such as the Summit supercomputer at Oak Ridge National Laboratory (ORNL), provides the opportunity to evaluate the data more robustly. A team led by Dan Jacobson at ORNL has pioneered the use of Ricker wavelets to comprehensively mine large genomic datasets for new features. Briefly, a Ricker wavelet transform provides a coefficient for every scale and translation of a genomic dataset (Rappaport 2016) by exhaustively evaluating every combination of scale and position. As greater computational power allows more robust analyses of genomic data, such as Ricker wavelets, more features may be discerned in higher complexity genomic data, allowing new associations to be made. This will become increasingly important as more layers from other ‘omics’ technologies are added to complex trait phenotypes (Raychaudhuri et al. 2009).
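Two points from this paragraph lend themselves to a short sketch: how fast the number of SNP combinations grows with interaction order, and what a Ricker-wavelet transform computes. The example below uses `math.comb` for the counts and a hand-written continuous wavelet transform in NumPy, treating the genomic data as a hypothetical one-dimensional numeric signal (for example, a per-position score along a chromosome). It illustrates the general technique only and is not the ORNL team’s implementation; the kernel length of about ten widths is an assumption.

```python
import math

import numpy as np


def n_combinations(n_snps, order):
    """Number of distinct SNP sets of a given interaction order, C(n, k)."""
    return math.comb(n_snps, order)


def ricker(points, a):
    """Ricker ('Mexican hat') wavelet of width a, sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0  # centre the wavelet on t = 0
    amplitude = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    x = (t / a) ** 2
    return amplitude * (1.0 - x) * np.exp(-x / 2.0)


def cwt_ricker(signal, widths):
    """Continuous wavelet transform: one coefficient per (width, position) pair."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    if n == 0:
        raise ValueError("signal must be non-empty")
    max_len = n if n % 2 == 1 else n - 1  # odd kernel no longer than the signal
    coeffs = np.empty((len(widths), n))
    for i, a in enumerate(widths):
        points = min(2 * int(5 * a) + 1, max_len)
        # The wavelet is symmetric, so convolution equals correlation; 'same' keeps length n.
        coeffs[i] = np.convolve(signal, ricker(points, a), mode="same")
    return coeffs


print(f"{n_combinations(30_000_000, 2):.2e} pairwise, "
      f"{n_combinations(30_000_000, 4):.2e} four-way combinations")

signal = np.zeros(101)
signal[50] = 1.0  # a single spike at position 50
coeffs = cwt_ricker(signal, widths=[1, 2, 4, 8])
print(coeffs.shape, coeffs.argmax(axis=1))
```

The four-way count is roughly 3.4 × 10^28 combinations, which shows why exhaustive epistasis scans need supercomputing. Applied to a single spike, every scale peaks at the spike’s position, while broader widths respond to broader features; scanning all widths is what lets the transform surface patterns at several resolutions at once.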

Incorporating the Changing Landscape of Genomics into the Clinic over Time

Ultimately, the utility of genomic information will change over time as genomic assessment improves, variation studies are expanded, population traits and disease cohorts are better characterized, and clinical interpretation is improved. Figure 6.3 shows the continuum of improving laboratory characterization of individual genomes and how that will be reflected and assessed in downstream interpretation and clinical reporting. Assessment of the utility of genomic information can be divided into three categories:

  • 1. Analytic validity: Does the test for the allele work in the laboratory?
  • 2. Clinical validity: Does the test for the allele work and provide actionable information in the clinic?
  • 3. Clinical utility: Does providing the results in the clinic have a net positive benefit?
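Clinical validity is commonly quantified with metrics such as sensitivity, specificity, and predictive values computed against a reference standard. The chapter does not prescribe specific metrics, so this is an assumed, illustrative choice. The sketch also shows why a test can be analytically and clinically valid yet offer limited utility in a low-prevalence population: the positive predictive value falls as prevalence falls. Counts and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """2x2 counts of test result versus reference (true) disease status."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int


def clinical_validity_metrics(c):
    """Sensitivity, specificity and predictive values from a validation study."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Bayes' rule: the share of positive results that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical validation study counts.
metrics = clinical_validity_metrics(
    ValidationCounts(true_pos=80, false_pos=20, false_neg=20, true_neg=880))
print(metrics)
print(ppv_at_prevalence(metrics["sensitivity"], metrics["specificity"], prevalence=0.01))
```

Clinical utility, the third category, cannot be computed from these counts alone; it depends on whether acting on the result improves outcomes.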

Across each of these areas, health informatics is a foundation providing storage and analysis environments, interoperability, and caching and retrieval in medical records. Several large university health systems, such as Vanderbilt and Johns Hopkins, are beginning to establish health informatics data analytics capabilities encompassing all of these areas (e.g., electronic health records (EHRs), ‘omics’, and others) to perform corporate and academic assessment of all patient data. This application of data science tools to health system data at scale will likely be important to clinical utility assessment in the future.

 