Uncovering the history of intestinal host–microbiome interactions through vertebrate comparative genomics
Colin R. Lickwar and John F. Rawls
In all forms of life on Earth, the genome encodes the necessary information to build and sustain that life form across generations, including its sustained interactions with other life forms (symbioses). From a scientific perspective, the genome also provides a relatively reliable record of the processes of adaptation and selection that the organism has undergone over the course of its natural history. In other words, each organism’s genome serves not only as an instruction manual for that life form, but also as a historical record of that organismal lineage, including its symbioses.
Natural selection acts at all levels of biological complexity: from gene content and genomic functions; to protein, organellar, cellular, tissue, and organismal functions; to symbiotic, community, and ecosystem functions (Ley et al. 2006). Natural selection at each of these levels is intrinsically interconnected with all other levels and applies to microorganisms and macroorganisms alike. Because of these interconnections, selective pressures acting at any of these levels can be expected to affect genomic content, structure, and function over evolutionary timescales. The tools of genome science therefore hold tremendous potential to advance our understanding of the different levels at which natural selection has shaped the natural history of organismal lineages, including their symbioses.
Indeed, advances in genome science have already facilitated dramatic progress in our understanding of the diversity of microbial life on Earth, as well as of the symbioses in which microbes engage with each other or with animal and plant hosts. By coupling genomic approaches with experimental microbial manipulations (e.g., gnotobiotics, antibiotics, probiotics, prebiotics), researchers have also gained powerful new insights into the phenotypic consequences of host-microbe symbioses. In many cases, host-microbe symbioses have been found to contribute significantly to emergent traits that could alter the fitness, and thereby the natural selection, of the organisms involved (McFall-Ngai et al. 2013). This perspective has led to the concept that all organisms engaged in a given symbiotic relationship can be treated as a single discrete level of biological complexity, called a “holobiont” (Mindell 1992; Kutschera 2018). This concept has been further extended to the collective genomes of all organisms involved in a symbiosis, or a “hologenome” (Zilber-Rosenberg and Rosenberg 2008; Bordenstein and Theis 2015). On one hand, the terms holobiont and hologenome have been used simply to describe, respectively, the aggregate organisms engaged in a given symbiotic relationship and their genomes. On the other hand, and more contentiously, some scientists have augmented the operational definitions of these terms to suggest that these levels of biological complexity act as distinct units of natural selection (Roughgarden et al. 2018).
The debates that have ensued are important for the field, and seem to have revolved around three issues: (1) that the term “unit of selection” can be defined in different ways; (2) that host-microbe symbioses display a wide range of duration and fidelity of transmission; and (3) that natural selection acts at all levels of biological complexity, with the holobiont/hologenome representing just one of those potential levels (Moran and Sloan 2015; Douglas and Werren 2016; Roughgarden et al. 2018). This chapter will explore what genome science can contribute to this ongoing discussion and to our larger understanding of host-microbe symbiosis.
Genome science offers diverse opportunities to explore the mechanisms by which organisms interact with symbiotic partners. Symbiotic relationships may be mediated by essentially any form of communication between symbiotic partners, including production of a signal by one partner and the ability of the other partner to perceive, interpret, and respond to that signal. These symbiotic signals can take many different forms, including biosynthetic, nutritional, chemical, and physical signals, with an equally diverse array of signal response mechanisms. If an organism has engaged in the production of, or response to, a symbiotic signal over the course of its natural history, then evidence of that symbiotic signaling should be stored within its genome. Genomic evidence of a symbiotic signal could consist of specific gene structure or content, or aspects of genome organization or function that are specifically involved in production or reception of symbiotic signals. Genomic analysis can be used to better understand how an identified symbiotic signal is produced or perceived, but it can also be used as a discovery platform to identify new symbiotic signaling mechanisms.
A history of symbiotic interactions captured within microbial and host genomes
Ultimately, any enduring symbiotic relationship should manifest in attributable changes to the primary genomic DNA sequences of at least one organism within that symbiosis, including alteration, gain, loss, or transfer of DNA sequence. Attributable changes may also include differences in the abundance of a given microbe and its genome within a population. Important differences in the organization and evolution of multicellular eukaryotic genomes and primarily unicellular prokaryotic genomes require different strategies to identify evidence of symbiotic relationships (Koonin and Wolf 2010). These differences arise from the distinct lifecycles and modes of DNA sequence evolution in multicellular eukaryotes and prokaryotes. Whereas animals are inexorably colonized by microbiomes in each generation, the microbial lineages found in association with a given animal host may also occupy other host-associated or environmental niches. Microbiome composition in some animals is indicative of a long-standing symbiotic relationship with a “core microbiome” consisting partly of the same microbial lineages across multiple generations. However, substantial microbiome variation can also occur between individual animal hosts, and a large proportion of microbes are not necessarily shared across individuals of a particular species (Qin et al. 2010; Roeselers et al. 2011; Human Microbiome Project 2012; Hacquard et al. 2015; Adair and Douglas 2017). Furthermore, microbiome composition can change as a function of age, developmental stage, environment, diet, genotype, and disease (Yatsunenko et al. 2012; Hacquard et al. 2015). A particular complication is that different host species can share or effectively inherit microbial species with no clear temporal boundaries, at times interacting directly through predation, shared environments, or coprophagia (Rosenberg and Zilber-Rosenberg 2016; Moeller et al. 2018).
These behaviors substantially complicate efforts to identify the origin of selective pressures and to determine where exactly to look for evidence of symbiotic signals in microbial genomes. In addition, different host-associated microbes can contribute similar or identical functions, distributing selective pressures across multiple microorganisms (Louca et al. 2018), and functions can be transmitted through horizontal gene transfer (HGT) (Smillie et al. 2011) or through gain or loss of plasmid or chromosomal DNA. This property of the microbiome presumably allows relatively transient microbiome members or gene products to contribute to the symbiosis while remaining difficult to quantify (Booth et al. 2016). Together, these aspects of microbial ecology and DNA sequence evolution make it difficult both to trace the heredity of microbial lineages over evolutionary timescales and to attribute symbiotic signals in microbial genomes to a particular unit of selection such as a hologenome, so much so that the very premise of a hologenome has been questioned (Moran and Sloan 2015; Douglas and Werren 2016).
Here, we will review how to detect evidence of symbiotic signals within the host genome at three levels of biological complexity: coding genome content, transcriptional responses, and cis-regulatory mechanisms. Though these concepts apply to symbioses broadly, we will focus our remarks here on the most salient form of symbiosis that occurs across bilaterian animals, namely that between animal hosts and the microbial communities that reside in their digestive tracts. We posit that focusing on host genomics in tissues like the intestine, which are in direct contact with microbial communities, will enrich for evidence of unique symbiotic signals. This in turn offers opportunities to understand the broader genomic bases of host-microbe symbioses and the levels at which natural selection acts upon symbiotic organisms. We will also highlight the utility of comparative genomic approaches to discern not just the symbiotic signals that exist in extant animals, but also those that have been conserved during animal evolution.
Capturing symbiotic signals within coding regions of the host genome
Substantial efforts have been directed towards interpreting conserved and evolving coding regions of transcripts in eukaryotic and prokaryotic genomes (Facco et al. 2019). The level of DNA conservation at coding regions is typically high relative to other genomic regions due to the presence of large, often syntenic protein domains with strings of amino acids specified by codons (Zheng et al. 2011). Compared to prokaryotes, rates of horizontal gene transfer in animals are very low, but HGT has nonetheless contributed to important innovations such as the evolution of placental mammals (Dupressoir et al. 2012; Boto 2014). Identifying coding regions with known functions, including those facilitating symbiotic interactions, and interpreting the rate, location, or functional nature of changes in those domains across species are critical components of understanding symbiotic relationships and phylogeny (Rosenberg and Zilber-Rosenberg 2016; Adrian et al. 2019). However, because coding regions in multicellular eukaryotic hosts are typically used by many cell types in a number of contexts that are difficult to quantify, it is often challenging to determine whether a particular change in a coding region is indicative of a host-microbe interaction, or whether it is relevant in a particular tissue. Nevertheless, important context can be gained from host proteins that directly interact with microbial proteins, components, or signals. For example, Toll-like receptors (TLRs), a well-described class of proteins that can directly recognize microbial products, display both substantial conservation in detecting microbial-associated molecular patterns (MAMPs) and the expansion and evolution of additional family members across species (Roach et al. 2005; Li et al. 2017). TLR family members and components of their downstream signaling pathways are found in the genomes of basal bilaterian animals, where they mediate recognition of microbial products as well as developmental processes (Tassia et al. 2017; Brennan and Gilmore 2018). Variation at TLR gene loci within a single species can also contribute to risk of infectious and inflammatory diseases (Mukherjee et al. 2019). However, TLR proteins can be expressed by different cells in different tissues at different times to serve different functions in recognizing pathogenic as well as commensal microbes (Abreu 2010). Similar challenges apply to antimicrobial proteins (AMPs), another salient class of proteins that appear to have evolved to assist animals in their encounters with microbes (Tennessen 2005).
The nature of host-microbe interactions also inherently encompasses greater conceptual complexity than simple, direct protein domain functions or interactions. For example, coding regions do not inherently provide information on the amount, dynamics, or specificity of transcription of protein-coding genes. Furthermore, in eukaryotes, transcript isoform usage and alternative splicing dramatically increase the number of unique transcriptional units that are utilized, but this information is not easily discerned strictly from coding region sequence. As a result, analyses of coding regions have strengths but also significant limitations for detecting the origin or nature of the symbiotic signals that shape coding region variation in host genomes. Therefore, the following sections will highlight how additional layers of genomic information outside of coding region sequence encode a history of symbiotic interactions and associated selective pressures.