Sanger and Shotgun Sequencing

After Crick and his collaborators deciphered the first genetic code in 1961, it became clear that sequencing DNA molecules was the key to many biological questions. In 1971, Wu and Taylor [304] published the first DNA sequence, made of 12 bases and printed in the abstract of their paper. Two years later, Gilbert and Maxam [95] published a DNA sequence made of 24 bases

Table 2.2

Number of citations (N) of methods based on DNA (source: Web of Science; accessed 2019-11-04)

Keyword search              N          Reference search               N
AFLP AND gen*               11,630     Vos et al. 1995 [286]          9,216
DNA Microarray              36,120
RAD-Seq OR RADSeq           757        Miller et al. 2007 [188]       528
                                       Peterson et al. 2012 [221]     954
RFLP AND gen*               34,686     Botstein et al. 1980 [22]      5,257
minisatellites AND gen*     710        Wyman and White 1980 [305]     590
microsatellites AND gen*    21,101     Weller et al. 1984 [295]       122
                                       Jeffreys et al. 1985 [130]     2,894
DNA sequenc*                426,877    Sanger et al. 1977 [243]       67,595

(also printed in the abstract of their paper) obtained with a new method. Four years later, Sanger and his collaborators [243] published a simpler and more efficient method which became extremely popular and eventually earned Sanger his second Nobel Prize [242]. The idea is to reproduce the process of DNA replication in vitro, providing the required nucleotides together with a small proportion of dideoxynucleotides. These lack the oxygen atom at the 3'-position (see Fig. 1.1), so no further nucleotide can be bound to them and replication stops. The dideoxynucleotides are labelled (with radioactivity or fluorescence), so it is possible to identify the positions of the different bases after migrating the final fragments on a gel. The reactions are conducted separately with dideoxynucleotides containing each base, and the fragments are loaded into four different wells of the gel. The final result is four columns of bands whose positions give the sequence of bases (Fig. 2.5).
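The way band positions translate into a sequence can be illustrated with a small Python sketch. It is an idealized model, not a simulation of the actual chemistry: it assumes that every position of the newly synthesized strand is terminated in at least one fragment, and that fragment lengths are resolved exactly on the gel. The function names `sanger_lanes` and `read_gel` are made up for this illustration.

```python
BASES = "ACGT"


def sanger_lanes(synthesized):
    """Return, for each base, the lengths of the fragments ending with it.

    Idealized assumption: each position of the synthesized strand is
    terminated by a dideoxynucleotide in at least one fragment, so the
    lane of base X holds one band per occurrence of X.
    """
    lanes = {base: [] for base in BASES}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence by visiting the bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


strand = "GATTACAGC"
lanes = sanger_lanes(strand)
print(lanes)            # one list of fragment lengths per lane
print(read_gel(lanes))  # the sequence recovered from the four lanes
```

Each lane alone only tells where one base occurs; the sequence emerges when the bands of the four lanes are ordered together by fragment length.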

PCR combined with Sanger sequencing was the major approach to acquire DNA sequence data for decades [58]. This approach has seen a few variants: one is known as shotgun sequencing, first proposed in 1979, where overlapping DNA fragments are sequenced (the “reads”) and then assembled to reconstruct the whole sequence [257]. Another important innovation came with the first automatic sequencers in the mid-1980s [255]. These two innovations paved the way for high-throughput sequencing (Sect. 2.3.2).
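The principle of assembling overlapping reads can be sketched with a toy greedy procedure in Python. This is only an illustration under strong assumptions (error-free reads, all from the same strand, no repeated regions), and it is not the method of [257]; the function names and the `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Returns a list of contigs; a single contig if all reads could be joined.
    """
    reads = list(dict.fromkeys(reads))  # drop duplicated reads
    # Drop reads entirely contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlap left: the remaining reads stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ACGTTAGC", "GCGTACGTT", "ATGGCGTA"]
print(greedy_assemble(reads))
```

Real assemblers must also handle sequencing errors, reads from both strands, and repeats, which a greedy merge of this kind resolves poorly.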

In the early 1980s, the scientific community realized that it would be useful to provide a public database of DNA sequences acquired throughout the world: GenBank was first released in 1982 with 606 sequences and a total of 680,338 bases (see Sect. 2.3.8 for updated numbers).

DNA Methylation and Bisulfite Sequencing

The study of the chemical transformations of nucleic acids has a long history. Methylation of cytosine was shown to occur in bacterial DNA as early as

Figure 2.5

Sketch of the Sanger sequencing method. (A) The DNA to be sequenced (template), usually after PCR. (B) The template is replicated with the four nucleotides and a small proportion of one of them with the 3'-OH removed. The single letters represent the deoxynucleotides, so dN are the dideoxynucleotides, normally written ddNTP, with N = {A,C,G,T}. (C) In each tube, replication ends randomly when a dideoxynucleotide is incorporated, so the lengths of the DNA fragments give the positions of the bases.

1925 [131]. Since then, a lot of research has been done on DNA methylation, showing the ubiquity and complexity of this phenomenon [21, 308]. Frommer et al. [86] developed a method, known as bisulfite sequencing, that can identify the nucleotides carrying methyl-cytosine (Met-C). The DNA is first treated with sodium bisulfite (NaHSO3), a salt used in the food industry as an additive. This treatment changes unmethylated C into U (like in RNA but with deoxyribose) whereas Met-C is unchanged. DNA is then amplified by PCR so U is copied as T and Met-C as C, and the PCR products are sequenced with the Sanger method. Comparing the result with the untreated sequence shows which C's were methylated. In many genomes, a nucleotide with C followed by one with G forms a CpG site, where the “p” is for the phosphate linking the two nucleotides on the same strand; regions rich in such sites are called “CpG islands” [73]. DNA methylation and CpG islands are thought to play a role in the regulation of gene expression [135]. Other variants make use of the high-throughput technologies introduced in the next section [179].
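The logic of bisulfite sequencing can be sketched in a few lines of Python. The sketch assumes complete conversion of unmethylated C, a single strand, and a read already aligned to the untreated reference with no sequencing errors; the function names are invented for this illustration.

```python
def bisulfite_convert(seq, methylated):
    """Sequence read after bisulfite treatment followed by PCR.

    seq: DNA string; methylated: set of 0-based positions carrying Met-C.
    Unmethylated C becomes U, which PCR copies as T; Met-C stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted):
    """Positions where C is still read as C after the treatment."""
    if len(reference) != len(converted):
        raise ValueError("sequences must be aligned and of equal length")
    return {
        i
        for i, (ref, conv) in enumerate(zip(reference, converted))
        if ref == "C" and conv == "C"
    }


reference = "ACGTCCGA"
converted = bisulfite_convert(reference, methylated={1, 5})
print(converted)                               # C at positions 1 and 5 is kept
print(call_methylation(reference, converted))  # the methylated positions
```

A C that reads as T after treatment is inferred to be unmethylated, which is why the untreated reference is needed: without it, a converted C cannot be told apart from a genuine T.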

 