DNA Sequencing—Advent of the Genomics Era
DNA sequencing is a powerful tool used to determine the exact order of nucleotides (A, T, G, and C) in purified genetic material. The first DNA sequencing methods were developed independently by Maxam and Gilbert and by Sanger in 1977, work that was recognized with the 1980 Nobel Prize in Chemistry. In the Gilbert method, DNA is radiolabeled while being replicated in E. coli, carefully fragmented, and separated by size for analysis. The Sanger, or terminator, method has been more widely adopted and is therefore described in detail in Chapter 2 (see 2.2.7). Sanger’s technique involves the in vitro replication of DNA in the presence of radiolabeled dideoxynucleotides, also called terminator nucleotides because they terminate strand synthesis when incorporated into DNA. In 1986, Leroy Hood updated the Sanger method by replacing the radioactive labels on the terminator nucleotides with fluorescent ones (Institute 2000-2004b). Doing so collapsed sequencing into a single reaction, the products of which are detected by computer-aided lasers and fluorescence sensors (Institute 2000-2004b).
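The logic of reading a sequence from terminated fragments can be illustrated with a short, highly idealized Python sketch. It assumes that every position of the newly made strand is captured by at least one terminated fragment and that fragments are resolved perfectly by length. The function names (`synthesize`, `termination_lanes`, `read_gel`) are hypothetical and chosen only for illustration; they do not describe any real instrument or software.

```python
# Idealized illustration of terminator (Sanger-style) sequencing.
# Assumption: each terminator base stops synthesis at every position
# where that base would be added, and all fragments are detected.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize(template):
    """Return the new strand (5'->3') built against a template given 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template)


def termination_lanes(new_strand):
    """Group fragment lengths by the terminator base that ends each fragment."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering fragments from shortest to longest."""
    by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(by_length[n] for n in sorted(by_length))


lanes = termination_lanes(synthesize("TACGGATC"))
print(lanes)
print(read_gel(lanes))
```

With radioactive labels, each of the four lists would correspond to a separate reaction and gel lane. With four distinguishable fluorescent labels, the same information can be recovered from a single reaction, because the color of each fragment identifies its terminating base.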
The development of automated sequencers ushered in a new biotechnological era, genomics, by making the sequencing of the human genome possible. The highly publicized and federally funded Human Genome Project (HGP) began in October 1990 and was estimated to take 15 years and cost three billion dollars (Institute 2000-2004b). Just 11 years later, the U.S. White House celebrated the “completion” of the whole human genome when two draft sequences were published in Nature and Science (Venter et al. 2001; Lander et al. 2001). By 2003 the draft was polished into a final version (Green, Watson, and Collins 2015; Institute 2016b). All told, thanks to the unprecedented collaboration between thousands of international scientists, the HGP took just 13 years and ~$2.7 billion to complete (Institute 2016b). Some 25 years after this grand project began, researchers are still assigning meaning to the sequences obtained (Green, Watson, and Collins 2015). As the first large-scale project of its kind, the HGP modeled a new kind of research, consortium-based and interdisciplinary, that some are calling “big” science (Green, Watson, and Collins 2015).
As the HGP unfolded, sequencing and computational technologies were invented and modified to meet the needs of the project. For example, whole genome sequencing of less complicated genomes was used to practice and fine-tune the technologies employed by the HGP. The first genome of a free-living organism, a strain of the bacterium Haemophilus influenzae, was published by J. Craig Venter and 39 others in the July 1995 issue of Science (Fleischmann et al. 1995). Their work was the first to use shotgun cloning, in which a whole genome is fragmented, cloned, sequenced, and reassembled computationally by identifying overlapping sequences (Fleischmann et al. 1995). In this way, individual reads are assembled into one contiguous sequence, or contig (see Figure 2.11). Just a year later, the first whole genome sequence of a eukaryotic organism, Saccharomyces cerevisiae, was also published in Science (Goffeau et al. 1996). The sequencing of other significant model organisms followed soon after.
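The idea of reassembling reads through their overlaps can be sketched in a few lines of Python. This is a minimal greedy illustration, not the assembler used by Fleischmann et al.: it assumes error-free reads that all come from the same strand and a genome without repeats, and the names `overlap`, `greedy_assemble`, and `min_len` are hypothetical.

```python
# Minimal greedy overlap assembly (illustrative only).
# Assumptions: error-free reads, a single strand, no repeated regions.


def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` matching a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [
            r for idx, r in enumerate(reads) if idx not in (best_i, best_j)
        ] + [merged]
    return reads


reads = ["CATCGTAC", "AGCTTAGG", "GTACCGGA", "TAGGCATC"]
print(greedy_assemble(reads))
```

If no overlaps remain, the function returns more than one sequence, which corresponds to several unjoined contigs. Real genomes contain repeats, sequencing errors, and reads from both strands, so practical assemblers rely on considerably more elaborate methods than this greedy merge.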
Whole genome sequencing took off in 2005 with the advent of Next Generation Sequencing (NGS, or simply Next Gen) methods. 454 Life Sciences published a sequencing-by-synthesis method that year in Nature (Margulies et al. 2005). This approach directly measures the pyrophosphate molecules released as single nucleotides are added to a growing DNA strand. Doing so eliminates the need for terminator nucleotides and labor-intensive sequencing gels. Sequencing-by-synthesis methods are described in more detail in Chapter 2. Around the same time, George Church’s lab at Harvard Medical School described a multiplex polony sequencing method in their 2005 Science publication (Shendure et al. 2005). Both the sequencing-by-synthesis and multiplex polony methods resulted in significant savings, due to reduced reaction volumes and enhanced throughput. The authors of the multiplex polony strategy estimated a cost per base roughly one-ninth that of conventional methods (Shendure et al. 2005). As Next Gen sequencing methods have evolved to increase accuracy and sequence read lengths, the cost of genome sequencing has plummeted. If the HGP had been done in 2015, the cost would have been just $1,500, assuming that the same rate of technological innovation occurred in the absence of the HGP-fostered collaboration of the 1990s and early 2000s (Institute 2016b). Biotechnologists today are working to achieve a $1,000 human genome sequence, the price required to routinely use whole genome sequencing in the clinic.
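The pyrophosphate-based sequencing-by-synthesis approach described above can be pictured as a series of nucleotide "flows," each producing a signal proportional to the number of bases incorporated. The Python sketch below is an idealized illustration under stated assumptions: the signal is noise-free and exactly equal to the number of bases added, and nucleotides are supplied one type at a time in a fixed cycle. The flow order `"TACG"` and the names `flowgram` and `decode` are assumptions made for this example.

```python
# Idealized sequencing-by-synthesis readout (illustrative only).
# Assumptions: one nucleotide type is supplied per flow, and the
# measured signal equals the number of bases incorporated in that flow.


def flowgram(new_strand, flow_order="TACG"):
    """Return (base, signal) pairs for each nucleotide flow."""
    if any(base not in flow_order for base in new_strand):
        raise ValueError("strand contains a base missing from flow_order")
    signals = []
    pos = 0
    while pos < len(new_strand):
        for base in flow_order:
            run = 0
            # A run of identical bases is incorporated in a single flow.
            while pos < len(new_strand) and new_strand[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
            if pos >= len(new_strand):
                break
    return signals


def decode(signals):
    """Rebuild the sequence from the flow signals."""
    return "".join(base * count for base, count in signals)


signals = flowgram("TTAGC")
print(signals)
print(decode(signals))
```

A flow with a signal of zero simply means that the supplied nucleotide was not the next base in the growing strand, so no terminator nucleotides or gels are needed to infer the order of bases.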