Exome Sequencing

The exome is the set of exons in a genome. Targeting these sequences makes it more likely to find variation with functional effects, for instance variants linked with diseases. In practice, a technique called “capture” is used: the genome is first fragmented into small DNA molecules, which are then mixed with probes so that only fragments containing exons hybridize. The probes may be fixed on a microarray or free in solution. The unhybridized DNA is washed away, so that only the DNA of interest is kept and then sequenced. As with microarray methods, and in contrast to RNA-Seq, the probes must be designed using prior knowledge of the genome.
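The selection step of capture can be mimicked in silico as an interval-overlap filter. The sketch below is only an illustration under simplifying assumptions: a single chromosome, half-open coordinates, and perfect hybridization (any fragment overlapping an exon is kept). The names `capture`, `overlaps`, and the toy coordinates are hypothetical and do not come from any library.

```python
from typing import List, Tuple

# A half-open interval [start, end) on a single chromosome.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def capture(fragments: List[Interval], exons: List[Interval]) -> List[Interval]:
    """Keep only the fragments that overlap at least one exon.

    The exon list plays the role of the probes; fragments that do not
    overlap any exon are "washed away".
    """
    return [f for f in fragments if any(overlaps(f, e) for e in exons)]


# Hypothetical toy data: two exons and four genomic fragments.
exons = [(100, 200), (500, 650)]
fragments = [(0, 90), (150, 300), (300, 480), (640, 800)]

captured = capture(fragments, exons)
print(captured)
```

With these toy coordinates, only the second and fourth fragments overlap an exon and are retained. A real capture experiment is less clean: some off-target fragments are sequenced and some exons are poorly covered.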

Sequencing of Pooled Individuals

Sequencing of pooled individuals (pool-Seq) is an approach where samples from several individuals are pooled and sequenced together [248]. This approach is attractive because it considerably reduces the overall cost of the lab work (the sequencing run on an HTS platform is typically far more expensive than the other steps of a genomic study, such as DNA extraction or PCR). Furthermore, the approach is useful when the individuals are difficult to analyze separately, such as swarms of fish larvae [176] or microbes (see Sect. 10.3). A relatively large number of studies have been devoted to developing statistical and computational tools to analyze pool-Seq data [e.g., 113].
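To give an idea of what such tools compute, the sketch below estimates an allele frequency from pooled read counts together with an approximate standard error. It is not the method of any cited study. It assumes that all individuals contribute equally to the pool, that there are no sequencing errors, and that both the sampling of chromosomes from the population and the sampling of reads from the pool are binomial. The function name and the example counts are hypothetical.

```python
import math
from typing import Tuple


def pool_allele_frequency(
    alt_reads: int, depth: int, n_individuals: int, ploidy: int = 2
) -> Tuple[float, float]:
    """Estimate an allele frequency and its standard error from a pool.

    Two sampling stages are accounted for: chromosomes drawn from the
    population (n_chrom = ploidy * n_individuals), then reads drawn from
    the pool (depth). Under binomial sampling at both stages:

        Var(p_hat) = p (1 - p) [1 / n_chrom + (1 - 1 / n_chrom) / depth]

    The unknown p is replaced by its estimate (plug-in approximation).
    """
    if depth <= 0 or n_individuals <= 0 or ploidy <= 0:
        raise ValueError("depth, n_individuals and ploidy must be positive")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must be between 0 and depth")

    p_hat = alt_reads / depth
    n_chrom = ploidy * n_individuals
    variance = p_hat * (1 - p_hat) * (1 / n_chrom + (1 - 1 / n_chrom) / depth)
    return p_hat, math.sqrt(variance)


# Hypothetical example: 30 reads out of 100 carry the alternative allele
# in a pool of 50 diploid individuals.
p_hat, se = pool_allele_frequency(alt_reads=30, depth=100, n_individuals=50)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
```

The formula shows why increasing the sequencing depth alone is not enough: once the depth is much larger than the number of pooled chromosomes, the variance is dominated by the 1 / n_chrom term.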

Designing a Study With HTS

Designing a population genomic study based on HTS is not trivial because many parameters must be taken into account. Lowry et al. developed a model-based approach to help set up such designs [172]. Interestingly, they provide R code with their article to do the calculations for several methods: RADseq, RNA-Seq, exome sequencing, whole genome sequencing (WGS), and pool-Seq [248]. Their model includes technical details related to the specific sequencing technologies as well as considerations on the size of linkage blocks in the genome (see also the follow-up paper by the same authors for further discussion [173]). Currently, this model considers sequencing outputs made of short reads (≈ 100 bp), and it will be interesting to see how it can be extended to long reads (> 1 kb).
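One of the simplest quantities entering such a design is the expected sequencing depth. The sketch below is not the model of Lowry et al.; it only applies the standard relation depth = (number of reads × read length) / target size, assuming that all reads map to the target, that coverage is uniform, and that individuals receive equal shares of the run. The function names and the numbers in the example are hypothetical.

```python
def mean_depth(
    n_reads: int, read_length: int, target_size: int, n_individuals: int = 1
) -> float:
    """Expected mean depth per individual for one sequencing run.

    n_reads       total number of reads produced by the run
    read_length   read length in bp
    target_size   size of the sequenced target in bp (genome, exome, ...)
    n_individuals number of individuals sharing the run equally
    """
    return n_reads * read_length / (target_size * n_individuals)


def max_individuals(
    n_reads: int, read_length: int, target_size: int, min_depth: int
) -> int:
    """Largest number of individuals that still reach min_depth on average."""
    return (n_reads * read_length) // (target_size * min_depth)


# Hypothetical run: 400 million reads of 100 bp on a 50 Mb exome target,
# with a required mean depth of 20x per individual.
n_reads, read_length, target_size = 400_000_000, 100, 50_000_000

n_max = max_individuals(n_reads, read_length, target_size, min_depth=20)
depth = mean_depth(n_reads, read_length, target_size, n_individuals=n_max)
print(n_max, depth)
```

In practice the usable depth is lower than this upper bound because of off-target reads, duplicates, and uneven coverage, which is one reason why a fuller model is needed.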

The Future of DNA Sequencing

HTS technology is a fast-moving field, and innovation will not slow down soon. Current progress in nanotechnology will very likely bring new technologies in the near future [e.g., 57, 62]. In the last few years, HTS has contributed to a substantial acceleration in the accumulation of genomic data. GenBank is now hosted by the National Center for Biotechnology Information (NCBI) together with the “Whole Genome Projects” started in 2002. GenBank now contains ≈ 380 Gb in more than 216 million sequences, and the Whole Genome Projects repository hosts > 5.9 Tb in 630,128 projects with more than 1 billion sequences.[1]

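Table 2.4. Main file formats used in this book

| Data        | Format   | Text or binary | Extensions               | Genomic positions |
|-------------|----------|----------------|--------------------------|-------------------|
| Allelic     | Tabular  | both           | .txt .tab .csv .xls ...  | No                |
|             | Specific | text           | .dat .gen .gtx .str ...  | No                |
|             | VCF      | text           | .vcf                     | Yes               |
|             | BCF      | binary         | .bcf                     | Yes               |
| DNA         | FASTA    | text           | .fa .fas .fasta .fna     | Yes (a)/No        |
|             | FASTQ    | text           | .fastq                   | No                |
|             | SAM      | text           | .sam                     | Yes               |
|             | BAM      | binary         | .bam                     | Yes               |
| SNP         | PED      | text           | .ped                     | Yes               |
|             | BED      | binary         | .bed                     | Yes               |
| Annotations | GFF      | text           | .gff .gtf                | Yes               |

(a) Implicit if a whole genome sequence is stored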
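Part of Table 2.4 can be encoded as a small lookup keyed on the file extension, which is sometimes convenient when sorting the files of a project. The sketch below is only an illustration: the names `FORMATS` and `describe_format` are hypothetical, the generic tabular and software-specific extensions are left out because they do not identify a single format, and guessing a format from an extension is a heuristic (compressed files such as `.vcf.gz` are not handled).

```python
from pathlib import Path
from typing import Optional, Tuple

# extension -> (format, text or binary, genomic positions), after Table 2.4.
# ".bed" follows the table (binary SNP data); other formats share this
# extension, so the lookup is only a first guess.
FORMATS = {
    ".vcf": ("VCF", "text", "Yes"),
    ".bcf": ("BCF", "binary", "Yes"),
    ".fa": ("FASTA", "text", "Yes/No"),
    ".fas": ("FASTA", "text", "Yes/No"),
    ".fasta": ("FASTA", "text", "Yes/No"),
    ".fna": ("FASTA", "text", "Yes/No"),
    ".fastq": ("FASTQ", "text", "No"),
    ".sam": ("SAM", "text", "Yes"),
    ".bam": ("BAM", "binary", "Yes"),
    ".ped": ("PED", "text", "Yes"),
    ".bed": ("BED", "binary", "Yes"),
    ".gff": ("GFF", "text", "Yes"),
    ".gtf": ("GFF", "text", "Yes"),
}


def describe_format(filename: str) -> Optional[Tuple[str, str, str]]:
    """Return (format, text/binary, genomic positions) or None if unknown."""
    extension = Path(filename).suffix.lower()
    return FORMATS.get(extension)


# Hypothetical file names.
for name in ["calls.vcf", "reads.BAM", "genome.fasta", "notes.xyz"]:
    print(name, describe_format(name))
```

Before opening a file in text mode, the second element of the returned tuple indicates whether that is appropriate: binary formats such as BCF and BAM need dedicated readers.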

[1] https://www.ncbi.nlm.nih.gov/genbank/statistics/ (accessed 2019-10-24).
 