Sanger and Shotgun Sequencing
After Crick and his collaborators began deciphering the genetic code in 1961, it became clear that sequencing DNA molecules was the key to many biological questions. In 1971, Wu and Taylor [304] published the first DNA sequence, 12 bases long, printed in the abstract of their paper. Two years later, Gilbert and Maxam [95] published a 24-base DNA sequence (also printed in the abstract of their paper) obtained with a new method. Four years later, Sanger and his collaborators [243] published a simpler and more efficient method, which became extremely popular and eventually earned Sanger his second Nobel Prize [242]. The idea is to reproduce DNA replication in vitro, supplying the required nucleotides together with a small proportion of dideoxynucleotides, which lack the oxygen atom at the 3' position (see Fig. 1.1), so that no further nucleotide can be attached to them. The dideoxynucleotides are labelled (with radioactivity or fluorescence), so the positions of the different bases can be identified after the resulting fragments are separated on a gel. The reactions are run separately, one for each dideoxynucleotide, and their products are loaded into four different wells of the gel. The result is four lanes of bands whose positions give the sequence of bases (Fig. 2.5).
Table 2.2
Number of citations (N) of methods based on DNA (source: Web of Science; accessed 2019-11-04)
| Keyword search | N | Reference search | N |
| AFLP AND gen* | 11,630 | Vos et al. 1995 [286] | 9216 |
| DNA Microarray | 36,120 | | |
| RAD-Seq OR RADSeq | 757 | Miller et al. 2007 [188] | 528 |
| | | Peterson et al. 2012 [221] | 954 |
| RFLP AND gen* | 34,686 | Botstein et al. 1980 [22] | 5257 |
| minisatellites AND gen* | 710 | Wyman and White 1980 [305] | 590 |
| microsatellites AND gen* | 21,101 | Weller et al. 1984 [295] | 122 |
| | | Jeffreys et al. 1985 [130] | 2894 |
| DNA sequenc* | 426,877 | Sanger et al. 1977 [243] | 67,595 |
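The logic of chain termination (one tube per dideoxynucleotide, fragment lengths marking the positions of that base) can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not a model of the real biochemistry: the template is assumed to be written 3'->5' so the new strand is its plain complement, each required base is assumed to be a dideoxynucleotide with a fixed probability `dd_fraction`, and the function names `run_tube` and `read_gel` are hypothetical.

```python
import random

# Watson-Crick pairing used by the polymerase when copying the template.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def run_tube(template: str, dd_base: str, copies: int = 2000,
             dd_fraction: float = 0.1, seed: int = 0) -> set[int]:
    """Simulate one tube containing the dideoxy form of dd_base.

    Returns the set of fragment lengths, i.e. the bands in that lane.
    """
    rng = random.Random(seed)
    # Assumption: the template is written 3'->5', so the new strand
    # (5'->3') is simply its base-by-base complement.
    new_strand = "".join(COMPLEMENT[b] for b in template)
    bands = set()
    for _ in range(copies):
        for length, base in enumerate(new_strand, start=1):
            # Where dd_base is needed, a dideoxynucleotide is incorporated
            # with probability dd_fraction and no further base can be added.
            if base == dd_base and rng.random() < dd_fraction:
                bands.add(length)
                break
        # Copies that reach the end without a terminator are ignored here.
    return bands


def read_gel(lanes: dict[str, set[int]]) -> str:
    """Read the bands of the four lanes from the shortest fragment upwards."""
    base_at = {length: base for base, bands in lanes.items() for length in bands}
    return "".join(base_at[length] for length in sorted(base_at))


template = "TACGGATCCATGCA"  # hypothetical template, written 3'->5'
lanes = {base: run_tube(template, base, seed=i) for i, base in enumerate("ACGT")}
print(read_gel(lanes))  # prints ATGCCTAGGTACGT
```

In this toy model, lowering `copies` or `dd_fraction` too much leaves some long fragments unobserved, so `read_gel` would silently skip positions; the proportion of dideoxynucleotides controls how far along the strand the bands extend.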
PCR combined with Sanger sequencing remained the major approach to acquiring DNA sequence data for decades [58]. This approach has had a few variants. One, known as shotgun sequencing and first proposed in 1979, sequences overlapping DNA fragments (the “reads”) and then assembles them to reconstruct the whole sequence [257]. Another important innovation came with the first automatic sequencers in the mid-1980s [255]. These two innovations paved the way for high-throughput sequencing (Sect. 2.3.2).
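The assembly step of shotgun sequencing can be sketched with a greedy overlap strategy: repeatedly merge the two contigs whose suffix and prefix overlap the most. This is a simple textbook heuristic chosen for illustration, not necessarily the procedure of [257]; it assumes error-free reads from a single strand, a minimum overlap of 3 bases (`min_len`), and hypothetical function names `overlap` and `greedy_assemble`.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Longest proper suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    unique = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Reads lying entirely inside another read add no information.
    contigs = [r for r in unique if not any(r != s and r in s for s in unique)]
    while len(contigs) > 1:
        best_k, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:  # no overlap left: several contigs remain
            break
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_k:])
    return contigs


# Hypothetical error-free reads from the sequence ATTAGACCTGCCGGAATAC.
reads = ["ACCTGCCG", "CTGCC", "CCGGAATAC", "ATTAGACC"]
print(greedy_assemble(reads))  # prints ['ATTAGACCTGCCGGAATAC']
```

Greedy merging works here because no repeat is longer than the overlaps; with real data, repeats and sequencing errors make it unreliable, which is why practical assemblers rely on overlap graphs or de Bruijn graphs instead.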
In the early 1980s, the scientific community realized that it would be useful to provide a public database of DNA sequences acquired throughout the world: GenBank was first released in 1982 with 606 sequences and a total of 680,338 bases (see Sect. 2.3.8 for updated numbers).
Figure 2.5
Sketch of the Sanger sequencing method. (A) The DNA to be sequenced (template), usually obtained after PCR. (B) The template is replicated with the four nucleotides and a small proportion of one of them with the 3'-OH removed. The single letters represent the deoxynucleotides, so dN are the dideoxynucleotides, normally written ddNTP, with N ∈ {A, C, G, T}. (C) In each tube, replication ends randomly when a dideoxynucleotide is incorporated, so the lengths of the DNA fragments give the positions of the bases.
DNA Methylation and Bisulfite Sequencing
The study of the chemical transformations of nucleic acids has a long history. Methylation of cytosine was reported in bacterial DNA as early as 1925 [131]. Since then, extensive research on DNA methylation has shown the ubiquity and complexity of this phenomenon [21, 308]. Frommer et al. [86] developed a method, known as bisulfite sequencing, that identifies the nucleotides carrying methyl-cytosine (Met-C). The DNA is first treated with sodium bisulfite (NaHSO₃), a salt used as an additive in the food industry. This treatment converts C into U (uracil, as in RNA but attached to deoxyribose), whereas Met-C is left unchanged. The DNA is then amplified by PCR, so that U is copied as T and Met-C as C, and the PCR products are sequenced with the Sanger method. In many genomes, a nucleotide carrying C is followed on the same strand by one carrying G, forming a CpG dinucleotide, where the “p” stands for the phosphate linking the two nucleotides; regions rich in such dinucleotides are called “CpG islands” [73]. DNA methylation and CpG islands are thought to play a role in the regulation of gene expression [135]. Other variants make use of the high-throughput technologies introduced in the next section [179].
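The reading logic of bisulfite sequencing (a C that survives treatment and PCR was methylated, a C read as T was not) can be sketched as string transformations. This is a minimal illustration assuming complete conversion, a single analysed strand, and no genuine C-to-T mutations; the function names and the example sequence are hypothetical, and positions are counted from 0.

```python
def bisulfite_treat(seq: str, methylated: set[int]) -> str:
    """Sodium bisulfite turns unmethylated C into U; Met-C (listed positions) stays C."""
    return "".join("U" if base == "C" and i not in methylated else base
                   for i, base in enumerate(seq))


def pcr_amplify(treated: str) -> str:
    """U pairs with A during PCR, so the amplified copies carry T where U was."""
    return treated.replace("U", "T")


def call_methylation(reference: str, read: str) -> list[int]:
    """Positions where a reference C is still read as C, i.e. were Met-C."""
    return [i for i, (ref, obs) in enumerate(zip(reference, read))
            if ref == "C" and obs == "C"]


def cpg_sites(seq: str) -> list[int]:
    """Positions of C followed by G on the same strand (CpG dinucleotides)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


reference = "ACGTTCGACCGA"  # hypothetical sequence, positions counted from 0
read = pcr_amplify(bisulfite_treat(reference, methylated={1, 9}))
print(read)                               # prints ACGTTTGATCGA
print(call_methylation(reference, read))  # prints [1, 9]
print(cpg_sites(reference))               # prints [1, 5, 9]
```

Comparing the two last outputs shows that position 5 is a CpG site that was not methylated in this example, so its C is read as T after treatment and PCR.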