The genome as a regulatory system

According to the New York Times (Angier 2003), the DNA molecule experienced its "midlife crisis" in 2003, the 50th anniversary of its structure's publication by Watson and Crick and the year that the complete human genome sequence was released. This crisis point has grown even more urgent, as new findings have brought increasingly into question the mainstream view of DNA sequence as a directive blueprint for gene function and hence organismic form. Is phenotypic expression in fact the determinate readout of a coded, self-contained program? Do genes specify all of the essential information for development, or do developmental instructions emerge from a system in which genes play a part? After a decade of extraordinary progress in molecular genetics and developmental biology, we can move forward from this crisis point to an expanded understanding of genes in their genomic, epigenetic, and environmental contexts.

There is a great deal of new and, in some cases, astonishing information to consider as we move toward this new understanding. In just the past decade, complete genomic sequences have become available for over 250 eukaryotes, from the common mouse, chicken, and housefly to the flying lemur and the Chinese pangolin (Kanehisa et al. 2014; The Genome Institute at Washington University 2014). These sequences reveal not, as expected, a straightforward text coding for discrete functional genes with neutral, noncoding sections interspersed, but a surprisingly high proportion of mobile genetic elements (such as transposons), transcription factors, and selectively constrained noncoding sequences, all pointing to complex, interactive, regulatory mechanisms (references in Szathmary et al. 2001; Sarkar 2006; Pagel and Pomiankowski 2008; Garfield and Wray 2010; Lenhard et al. 2012; K. Morris and Mattick 2014). For example, the results of the multiyear ENCODE project, which aimed to identify all functional elements in the human "blueprint of life," show that much of the human genome codes for regulatory elements such as noncoding RNAs (ENCODE Project Consortium 2012). With the advent of sophisticated biochemical tools to directly investigate the functional protein-protein and DNA-protein interactions that regulate gene transcription and translation, as well as in situ hybridization and immunohistochemistry techniques for visualizing the precise location and timing of gene activity and studies of transcriptomes that permit its quantification, a richly complicated picture has taken shape to replace the "wonderfully simple" one-way informational pathways from genes to proteins envisioned in the early decades of molecular biology (Keller 2000; see also S. B. Carroll et al. 2005). This picture reveals a functional complexity and pliancy, which as yet are far from understood, in the path from gene to organism (ENCODE Project Consortium 2012; Qu and Fang 2013). At this point, it has become abundantly clear that phenotypic outcomes are not rigidly predetermined by the organism's DNA sequence; indeed, in eukaryotes, the very premise that the presence of a specific DNA sequence necessarily codes for a specific protein has begun to "unravel" (Sarkar 2006; detailed history provided by K. Morris and Mattick 2014). The genome has been revealed not as a string of set instructions but as a remarkably dynamic system of coactive signals and feedbacks.

In her 1983 Nobel prize acceptance remarks (quoted by Keller 2000, 34), the iconoclastic geneticist Barbara McClintock described the genome as "a highly sensitive organ of the cell." Recent insights confirm this view of the genome in its cellular context as a developmental system: a complex of interacting factors that can be both robust and flexible in response to myriad internal and external inputs.[1] This system property explains not only epistasis and pleiotropy but also a number of otherwise unaccountable observations: why inactivating single genes has little or no phenotypic impact for the great majority of loci in eukaryotic genomes, including those of yeast, nematodes, angiosperms, and mammals (Pagel and Pomiankowski 2008 and references therein); why the expression of most mutations varies in different genetic backgrounds and environmental conditions (Lewontin 2000; Remold and Lenski 2004; Brem et al. 2005); and why there is no correspondence between the number of genes and phenotypic complexity (Szathmary et al. 2001; Gluckman and Hanson 2005). Furthermore, the multiple interacting signals and chemical switches that characterize gene regulatory pathways in eukaryotes lend their developmental systems a diffuse interdependence termed weak linkage to create an inherently flexible type of regulatory organization (Kirschner and Gerhart 1998).

Our point of departure, then, is that phenotypic expression is guided not by DNA sequence per se but by the genome's highly resilient systemic regulatory processes, which shape the extraordinarily precise tissue- and stage-specific expression of genes. In fact, one of the key insights of contemporary molecular genetics is that the majority of genes participate in such regulatory interactions rather than code for proteins (Mattick 2012). This regulatory modulation explains how evolutionarily conserved coding sequences such as specific Hox genes may lead to entirely different phenotypic outcomes, depending on the phylogenetically distinct "developmental system" in which they occur (Gottlieb 2004). The genes of the Ultrabithorax and abdominal-A complexes, for instance, are present in all arthropods (including insects, crustaceans, and myriapods) as well as in their more ancient sister taxon, the Onychophora, or velvet worms. Evolved changes in the regulation of these shared Hox genes along the body axis result in distinct expression domains, and consequently body plans, in these different types of arthropod: Ultrabithorax and abdominal-A are expressed in the abdominal segment in the fruit fly Drosophila, in the thorax of the crustacean Artemia, in all body segments except the head in the centipede, and only in the hindmost tip of the worm Onychophora (Grenier et al. 1997; Figure 1.1). Regulatory changes have also evolved in the effects of these genes on downstream target genes; their products repress the expression of Distal-less to prevent limb formation in the insect abdomen but not in their expression domains in the other arthropod taxa. The contributions these genes make to developmental specification are thus entirely dependent on the regulatory (that is, genomic) context in which they occur.

Figure 1.1 The genes of the Ultrabithorax (Ubx) and abdominal-A (abdA) complexes are present in all arthropods and in their ancient sister taxon, the velvet worms. Body-plan diversity in different arthropod groups reflects evolved regulatory changes that cause different expression domains of these shared Hox genes along the body axis. Ultrabithorax (indicated in gray) and abdominal-A (indicated in black) are expressed in the abdominal segment in the fruit fly Drosophila; in the thorax of the crustacean Artemia; in all body segments except the head in the centipede (a myriapod); and only in the hindmost tip of the velvet worm, Onychophora. Image courtesy of Jen Grenier and Steve Paddock. From Grenier et al. 1997 (caption modified), reproduced with permission of Elsevier publishers.

Such coordinated effects on gene expression result from the combined influence of multiple regulatory elements and events (e.g., Lenhard et al. 2012; for comprehensive discussions, see S. B. Carroll et al. 2005 and Davidson 2006). Identifying these factors and understanding how their enzyme kinetics and other real-time chemical interactions condition gene expression and hence development have proved complex beyond all expectation. Cis-regulatory elements located within or near a structural gene locus serve as regulatory sites to which activator or repressor proteins specifically bind to mediate transcriptional activity at that locus. Simultaneous binding of particular combinations of transcription factors may be needed to activate transcription and hence gene expression (Davidson et al. 2002). Different cis-elements come into play at different developmental stages or locations to influence the gene's transcription level, often depending on signals from multiple transcription factors (S. B. Carroll et al. 2005). At the same time, a single transcription factor or other signaling pathway component can play a number of different roles within a given organism by regulating distinct target genes in diverse tissues and developmental stages (S. B. Carroll 2008; V. Lynch and Wagner 2008). As a result of these complex interactions, subtle changes in cis-regulatory elements can have important and diverse pleiotropic effects (S. B. Carroll et al. 2005; G. Wagner and Zhang 2011) or, conversely, can lead to strong but more modular (i.e., localized) phenotypic impacts (Wray 2007 and references therein).
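The combinatorial logic described in this paragraph can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: binding is treated as all-or-nothing, and every name in it (CisElement, TF1, TF2, TF3, R1, early_enhancer, late_enhancer) is hypothetical rather than drawn from any real gene. It treats a cis-element as active only when all of its activators are bound and none of its repressors is, and treats the gene as transcribed when any one of its cis-elements is active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CisElement:
    """One regulatory site near a gene (all names here are hypothetical)."""
    name: str
    activators: frozenset                # factors that must ALL be bound
    repressors: frozenset = frozenset()  # any ONE bound repressor blocks the site

    def is_active(self, bound_factors):
        all_activators_bound = self.activators <= bound_factors
        any_repressor_bound = bool(self.repressors & bound_factors)
        return all_activators_bound and not any_repressor_bound


def gene_is_transcribed(cis_elements, bound_factors):
    """The gene is on if at least one of its cis-elements is active."""
    return any(e.is_active(bound_factors) for e in cis_elements)


# Two illustrative cis-elements for a single gene, used in different contexts.
ELEMENTS = [
    CisElement("early_enhancer", frozenset({"TF1", "TF2"}), frozenset({"R1"})),
    CisElement("late_enhancer", frozenset({"TF3"})),
]

# Each set stands for the factors present in one tissue or stage.
for bound in ({"TF1"}, {"TF1", "TF2"}, {"TF1", "TF2", "R1"}, {"TF3", "R1"}):
    print(sorted(bound), gene_is_transcribed(ELEMENTS, bound))
```

Even in this caricature, the same coding sequence is switched on or off depending entirely on which factors are present, and a change to a single element (for instance, dropping R1 from the early enhancer) alters expression in one context without affecting the other.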

The astounding complexity of these cis-regulatory networks is illustrated by the exceptionally well-characterized signaling events that activate embryonic expression of the protein-coding gene endo-16 in the sea urchin Strongylocentrotus purpuratus. The Endo-16 protein contributes to the specification of a particular embryonic cell lineage to form the endomesoderm tissue in the early embryo, tissue that will eventually form the larval gut lining and skeletal rods (Davidson et al. 2002; Balhoff and Wray 2005; Oliveri et al. 2008). The endomesoderm forms from the differentiation of cell lineages that derive from a ring of very early (sixth-cleavage) embryonic cells (Figure 1.2a). Based on an impressive body of molecular developmental, gene expression, and experimental embryological data, a network model has been constructed showing the cis-regulatory gene interactions that underlie this single developmental step (Figure 1.2b). Note that this system was chosen as the subject of a complete "genomic regulatory network" study precisely because of its relative simplicity: rather few genes are expressed in the sea urchin embryo; the embryo produces a morphologically simple larva with few distinct cell types; and there are relatively few regulatory steps between gene expression in the embryo and final cell fate (Davidson et al. 2002). The most important result of this intensive collaborative study may be not the specific biochemical events uncovered but the "deep, layered and hierarchical" regulatory complexities (Davidson 2010, 912) revealed to underlie even this simple developmental transition.

Further complexity is introduced to genomic systems by the existence of another major aspect of gene interaction. Trans-regulatory proteins and RNAs derived from more distant sites in the genome can also initiate or block a gene's transcription and can influence its mRNA dynamics and stability (see Lemos et al. 2008 for an excellent overview). Unlike the more modular cis-regulatory elements, these trans-acting enhancers and repressors can occur thousands of base pairs from a gene's promoter site or even on a separate chromosome and are inherited independently of the gene they regulate. But the utility of this cis/trans distinction is limited, because trans-acting factors often interact with or bind to cis-regulatory sequences to jointly regulate gene expression, as has been found in yeast, humans, insects, and plants (Lemos et al. 2008). Even more broadly, a strict distinction between structural and regulatory genes has broken down as the ubiquity of both direct and indirect gene-gene interactions has been revealed (Yukilevich et al. 2008). It is perhaps most accurate to see all genes as "regulatory"—that is, as components of an inherently epistatic developmental system rather than as discrete, fixed bits of information. Interestingly, this is equally true at the level of quantitative trait loci (QTLs), heritable factors typically composed of multiple DNA regions which jointly influence continuously varying (or quantitative) traits: the expression of a particular QTL is contingent on the rest of the genome, and QTL-QTL epistasis is well known (Mackay 2013; e.g., Weinig and Schmitt 2004; Bloom et al. 2013).

These regulatory systems provide for stunning evolutionary lability, in part because their many interacting elements provide numerous sites where mutation can create novel phenotypic effects, and hence where natural selection can act (Sultan and Stearns 2005; Garfield and Wray 2010; Moczek et al. 2011). For example, nucleotide substitutions in cis-regulatory regions can add or remove binding sites for particular transcription factors or change the intensity of protein binding at those sites, and new regulatory functions can be acquired if binding sites are relocated to new target genes due to recombination (Balhoff and Wray 2005). Evidence suggests that sequence variation affecting gene expression is abundant across genomes (Rifkin et al. 2005) and, further, that under artificial selection substantial differences in gene expression can evolve in very few generations (e.g., Toma et al. 2002). Indeed, contrary to expectations for neutral evolution of noncoding sequences, such sequences can be subject to intense selection because of their roles in regulatory interactions (Rifkin et al. 2005; Hemberg et al. 2012; and references therein), which leads either to the evolution of highly conserved developmental mechanisms or to morphological diversification. Although mutations that alter developmental outcomes can arise in either regulatory or coding regions (Galant and Carroll 2002), cis-regulatory regions may constitute evolutionary "hot spots," where diversifying regulatory mutations accumulate while protein-coding regions remain stable (Stern and Orgogozo 2009). Among species of Drosophila, for example, there are many cryptic sequence changes that create different cis-binding sites for a given set of transcription factors that are essential to body patterning (Ludwig et al. 1998).

Figure 1.2 A sea urchin embryo (a) is shown at a very early (sixth-cleavage) stage of development. Four of the embryo's micromere cells (two of which are visible here, marked with asterisks) give rise to the endomesoderm cell lineage that subsequently differentiates into the animal's gut lining and skeletal rods. Image courtesy of Andrew Ransick, California Institute of Technology. In (b), a data-based network model shows the complex cis-regulatory gene interactions that lead to the differentiation of these endomesoderm cells. Arrows and solid lines indicate gene activation or repression; rectangles show downstream differentiation genes. The signaling pathways that activate endo-16 expression are shown in the second panel from the right. Details are given by Davidson et al. (2002) and Peter and Davidson (2010, 2011). Image downloaded 2/22/2015 from the Davidson Lab website; reproduced (with modified caption) by permission of Eric Davidson, California Institute of Technology (copyright Hamid Bolouri and Eric Davidson). For the color image, see Plate 1.

Based on these insights, a consensus is emerging that the key to patterns of trait conservation and diversity in multicellular organisms is not change in DNA sequence at functional genes but rather the evolution of regulatory interactions (S. B. Carroll 2008; V. Lynch and Wagner 2008; Garfield and Wray 2010). Evolutionary developmental (evo-devo) biologists explicitly study change in regulatory pathways as the basis of morphological innovations, an approach that restores developmental processes to the study of evolution after a long period of exclusion (Amundsen 2001). Evo-devo studies have produced a number of key insights into the origins of phylogenetic diversity (Wagner 2000). For instance, humans share a number of noncoding regulatory elements (such as enhancers that mediate tissue-specific gene expression) with distant vertebrate relations such as zebrafish and pufferfish, teleosts with whom humans last shared a common ancestor almost 450 million years ago (Venkatesh et al. 2006; Figure 1.3a). The extraordinary evolutionary stability of these regulatory regions provides evidence that such regions can indeed be selectively maintained. Surprisingly, though, humans share an even larger number of conserved noncoding elements with our even more remote relatives, the cartilaginous fishes (exemplified by the elephant shark, Callorhinchus milii; Figure 1.3b), although we last shared a common ancestor with this early group of jawed vertebrates well over 500 million years ago (Venkatesh et al. 2006). Evidently the radiation of the teleost fishes, the largest and most diverse vertebrate group, was characterized by diversification of these various regulatory elements.

Figure 1.3 Noncoding, regulatory genetic elements can be highly evolutionarily conserved. (a) The pufferfish Fugu rubripes, a teleost fish, shares with humans many noncoding enhancer regions that mediate tissue-specific gene expression. These regions are evidently under strong stabilizing selection. Image courtesy of Byrappa Venkatesh. (b) The genome of the elephant shark, Callorhinchus milii, a member of the yet more evolutionarily remote class of cartilaginous fishes, contains an even greater number of regulatory elements in common with the human genome. Image credit and copyright Doug Perrine.

The shift from studying macroevolution as change in DNA sequence to a focus instead on changes in gene regulation is both exciting and daunting. Perhaps an even more difficult step is to incorporate gene regulatory dynamics into population-genetic studies of microevolution. The particular challenges of studying selection on regulatory variation arise from the epistatic complexity and pliancy of developmental systems (Garfield and Wray 2010). Genetic variation in trans-acting molecules such as transcription factors can be particularly difficult to identify, since these effects often arise jointly from several sites across the genome (Lemos et al. 2008). Even sophisticated genomic mapping approaches are problematic in traits involving this kind of complex genetic architecture (Shimizu and Purugganan 2005).

One approach to this "confusing maze of revealed connections" (Koonin and Wolf 2008, 15) is to analyze networks of molecular gene regulation as complex systems of sequential interconnections and feedbacks (Davidson 2010 and references therein). Perhaps paradoxically, these analytical network studies can illuminate certain fundamental properties of genomes as evolved, biological systems. When gene regulatory networks are modeled as sets of "nodes" (i.e., functional genes) and "edges" (i.e., interactions with other genes or shared regulatory factors), a rich diversity of possible outcomes is generated, just as a given genome can result in diverse individual phenotypes. This diversity is produced even when model regulatory networks are greatly simplified by allowing only on and off nodal states and one-way edges (Lemos et al. 2008). Network approaches to actual gene expression data also provide key insights into the nature and evolution of genomes. A meta-analysis of network topologies, based on published genome sequence data and expression profiles across a broad taxonomic sample, showed significantly more highly connected gene loci compared with random gene networks (S. Bergmann et al. 2004). Using available knockout data, this study also found that the most highly connected genes tended to be implicated in essential aspects of function and were evolutionarily the most conserved (S. Bergmann et al. 2004). This result confirmed a model by A. Wagner (1996) showing that more densely connected networks were less sensitive to disruption by mutation, that is, more evolutionarily stable. Both theoretical and empirical studies thus indicate that dense interconnectedness of regulatory interactions may be a fundamental property of genomes as evolving, robust developmental systems.
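The simplified on/off networks mentioned above can be sketched in a few lines. The example below is a hypothetical three-gene network, not one taken from the studies cited: genes A and B repress each other, and gene C is switched on only when A is on and B is off. Each gene is either on or off, every edge is one-way, and all genes update simultaneously; both the rules and the synchronous updating scheme are assumptions made for illustration.

```python
from itertools import product

# Hypothetical regulatory rules: each gene's next state depends on the
# current on/off states of the genes that regulate it.
RULES = {
    "A": lambda s: not s["B"],           # B represses A
    "B": lambda s: not s["A"],           # A represses B
    "C": lambda s: s["A"] and not s["B"],  # A activates C, B represses C
}
GENES = tuple(RULES)


def step(state):
    """Apply all rules at once (synchronous update) to a tuple of booleans."""
    current = dict(zip(GENES, state))
    return tuple(bool(RULES[gene](current)) for gene in GENES)


def attractor(state):
    """Follow the dynamics from `state` until a state repeats; return the cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return frozenset(seen[seen.index(state):])


def all_attractors():
    """Collect the distinct end states reached from every possible start."""
    starts = product((False, True), repeat=len(GENES))
    return {attractor(start) for start in starts}


for cycle in all_attractors():
    print(sorted(cycle))
```

With these rules, the eight possible starting states settle into three distinct outcomes: two stable expression patterns (A and C on with B off; B on with A and C off) and one two-state oscillation. A single fixed set of regulatory connections thus yields several alternative expression states, which is the point the network models make about a given genome producing diverse phenotypes.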

A further layer of dynamic complexity must be considered in the initial step from gene sequences to phenotypic outcomes: at the posttranscriptional stage, alternative splicing and editing of exons (coding sequences) can produce different mRNA transcripts and hence different proteins from a single gene at different times and locations (Maniatis 1991; Mazin et al. 2013; and references therein).

Remarkably, then, even the amino acid sequence in a polypeptide product may not be determined by a gene's DNA sequence (Lewontin 2000). This mechanism can produce considerable functional diversity: for example, alternative splicing of exons composed of tandem arrays in a single Drosophila axon guidance receptor gene, Dscam, can potentially generate 38,016 different protein isoforms (Crayton et al. 2006), and even more extreme cases are known. Although initially considered a rare genetic quirk, alternative splicing is now recognized as a widespread mechanism for tissue-, sex-, and stage-specific regulation of gene expression in eukaryotes (V. Lynch and Wagner 2008). In humans, for example, estimates of the proportion of genes that produce multiple (and largely tissue-specific) products due to alternative mRNA splicing have risen from approximately 10% a decade ago to between 92% and 94% (E. Wang et al. 2008). In particular, studies of the human brain reveal that, during both postnatal development and aging, widespread splicing changes occur in genes associated with ontogenetically specific events such as synapse formation and with brain function and neurodegenerative conditions (Mazin et al. 2013).
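The Dscam figure quoted above follows from simple combinatorics. As a worked check, the sketch below assumes the commonly reported structure of the gene, which is not spelled out in the text: four clusters of mutually exclusive alternative exons (exons 4, 6, 9, and 17) containing 12, 48, 33, and 2 variants respectively, with exactly one variant from each cluster included in any mature transcript.

```python
from math import prod

# Number of mutually exclusive alternative exons in each Dscam cluster
# (commonly reported values; an assumption not stated in the text above).
EXON_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One variant is chosen independently from each cluster, so the counts multiply.
potential_isoforms = prod(EXON_CLUSTERS.values())
print(potential_isoforms)  # 38016
```

Because the choices multiply rather than add, 95 alternative exons in total are enough to specify tens of thousands of potential proteins from one locus.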

The regulation of these functionally precise "variable readings" of eukaryote genes is not yet fully understood, although several mechanisms have been implicated. These include RNA binding factors that interact with cis-acting RNA regions, "riboswitches" that bind to small metabolites to act as regulatory sensors, and noncoding small RNAs that silence or degrade specific mRNA sequences (Lemos et al. 2008; E. Wang et al. 2008; Mazin et al. 2013; also see Section 1.2 on epigenetic regulation by noncoding RNAs). Alternative splicing introduces a wholly unexpected openness to the process of DNA-based information transfer: the final mRNA transcripts produced by editing and splicing are emergent, short-lived entities that "do not reside on the chromosome" and may even be assembled after the transcript has entered the cytoplasm rather than in the nucleus (Keller 2000, 64). In addition, functional diversity can be generated from a given coding DNA sequence by posttranscriptional editing of RNA base-pair sequences (Mattick and Mehler 2008 and references therein). Although it has been found in diverse organisms (and indeed was first identified in a protozoan; Covello and Gray 1989), this type of RNA editing appears to be most common in vertebrates, where it may play an important role in regulating gene expression associated with developmental and functional brain plasticity (Mattick and Mehler 2008). However, most RNA editing appears to occur in noncoding RNAs, which (although still poorly understood) have been increasingly implicated as regulators of gene expression (Ha and Kim 2014). In this light, it seems likely that RNA editing is of primary importance as a modulator of RNA-mediated epigenetic regulation (Nishikura 2010 and references therein; also see Section 1.2).

  • [1] This recognition by biologists follows two decades of prescient argument by philosophers of biology in favor of a "developmental systems approach" to both heredity and evolution, an approach built on a growing critique of gene-centric conceptual models; for an excellent introduction to this literature, see the edited volume by Oyama et al. 2001.