Beyond mCG: DNA Methylation in Noncanonical Sequence Context
E.A. Mukamel¹, R. Lister²,³
¹University of California San Diego, La Jolla, CA, United States; ²The University of Western Australia, Perth, WA, Australia; ³The Harry Perkins Institute of Medical Research, Perth, WA, Australia
INTRODUCTION: BEYOND CG METHYLATION
The genetic code based on the sequence of DNA nucleotides A, C, G, and T is thought to be a universal and invariant feature of life on Earth. Advances in genome sequencing have enabled comprehensive approaches to epigenome profiling, and these increasingly point to a diverse set of extensions to the genomic code in different cell types, even within the same species. Symbolic diversity is a familiar feature of human languages: French rarely uses the letters k or w, but it has access to a range of diacritical marks, such as é, that are not used in English. Just as we can easily recognize "walked" as English and "marché" as French from the symbols in each word, recent epigenomic profiling efforts allow a new appreciation of how biological cells and tissues use distinct alphabets of epigenomic marks to regulate their specialized cellular functions. The new findings increasingly show that the mammalian epigenome is a symbolic system that encodes, stores, and transmits information through development and, potentially, across generations. Epigenomic marks, including DNA methylation and covalent modification of histone proteins, enhance the coding capacity of the genome by expanding the number of symbols available for representing gene regulatory information. To understand genomic information processing, we must make sense of the variety of symbolic elements used by particular cell types in each species.
Copyright © 2017 Elsevier Inc.
All rights reserved.
DNA Modifications in the Brain ISBN 978-0-12-801596-4
Methylation of cytosine in genomic DNA is an essential epigenetic modification that primarily represses transcription and regulates other genomic processes across most, although not all, plant and animal species. The widespread presence and functional role of methylcytosine at CG dinucleotides have long been recognized (Suzuki & Bird, 2008); however, methylation at CA, CT, and CC positions (collectively called non-CG methylation, or mCH, where H denotes A, C, or T) has also been established in mammalian cells (Ramsahoye et al., 2000). By combining modern whole-genome shotgun DNA sequencing with sodium bisulfite conversion, in a technique called MethylC-seq (Lister & Ecker, 2009), the methylation status of more than 90% of genomic cytosines can now be determined experimentally at single-base resolution. Because bisulfite conversion leaves both methylcytosine and hydroxymethylcytosine (mC and hmC) unconverted, MethylC-seq detects the two modifications without distinguishing between them; techniques such as Tet-assisted bisulfite sequencing (TAB-seq) profiling (Yu et al., 2012) enable the two modifications to be distinguished at base resolution throughout the genome.
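The readout logic behind MethylC-seq and TAB-seq can be illustrated with a toy simulation. This is a minimal sketch, not the published analysis pipeline: it assumes error-free, perfectly aligned top-strand reads and complete conversion, and the function names (`bisulfite_convert`, `site_levels`) and the example molecules are hypothetical. It shows why bisulfite reads alone report mC plus hmC, and how subtracting a TAB-seq level, in which only hmC still reads as C, yields an estimate of mC.

```python
from typing import Dict, List, Set


def bisulfite_convert(seq: str, protected: Set[int]) -> str:
    """Simulate how one strand reads after bisulfite conversion and PCR.

    Unprotected C is deaminated to uracil and reads as T; cytosines at the
    positions in `protected` still read as C. (Toy model: 100% conversion.)
    """
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def site_levels(reference: str, reads: List[str]) -> Dict[int, float]:
    """Fraction of aligned reads in which each reference C still reads as C."""
    levels = {}
    for i, base in enumerate(reference):
        if base != "C":
            continue
        calls = [read[i] for read in reads if read[i] in "CT"]
        if calls:
            levels[i] = calls.count("C") / len(calls)
    return levels


reference = "ACGTCAGCCT"
# Hypothetical molecules from a cell population: positions carrying mC or hmC.
molecules = [
    {"mC": {1, 4}, "hmC": set()},
    {"mC": {1}, "hmC": {4}},
    {"mC": {1}, "hmC": set()},
    {"mC": set(), "hmC": set()},
]

# MethylC-seq: bisulfite leaves both mC and hmC reading as C.
bs_reads = [bisulfite_convert(reference, m["mC"] | m["hmC"]) for m in molecules]
# TAB-seq: hmC is protected (glucosylated) while TET oxidizes mC to 5caC,
# which bisulfite converts, so only hmC still reads as C.
tab_reads = [bisulfite_convert(reference, m["hmC"]) for m in molecules]

bs = site_levels(reference, bs_reads)    # estimates mC + hmC per site
tab = site_levels(reference, tab_reads)  # estimates hmC per site
mc = {i: bs[i] - tab.get(i, 0.0) for i in bs}  # estimates mC alone
print(mc)  # {1: 0.75, 4: 0.25, 7: 0.0, 8: 0.0}
```

Subtracting two independently measured levels is the simplest way to separate the marks; with real data, both levels carry sampling noise, so the difference can come out slightly negative at low-coverage sites.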
This advance in methylome profiling technology first showed that although IMR90 human fetal lung fibroblast cells contain <0.02% of their methylcytosine in the non-CG context, human embryonic stem (ES) cells harbor nearly a quarter of their methylcytosines at non-CG positions (Lister et al., 2009). Subsequent surveys of a range of cells initially seemed to confirm this pattern, showing abundant non-CG methylation in pluripotent cells (Laurent et al., 2010; Lister et al., 2011) but little or no non-CG methylation across differentiated cell types, including primary tissue samples and differentiated cells derived from pluripotent cells (Xie et al., 2013; Ziller et al., 2011). It was surprising, then, that MethylC-seq profiling of brain tissue from mouse (Xie et al., 2012) and human (Lister et al., 2013; Varley et al., 2013; Zeng et al., 2012) revealed a substantial amount of non-CG methylation. Cell type-specific profiling of purified nuclei from neurons expressing the marker NeuN showed that non-CG methylation accounts for roughly half of all methylcytosine in adult frontal cortex neurons (Lister et al., 2013). This is the highest level of non-CG methylation observed in any cell type to date. TAB-seq profiling in mouse frontal cortex and human ES cells showed that almost all of the non-CG methylation is in the form of mC and not hmC (Lister et al., 2013; Yu et al., 2012).
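At its core, the share of methylcytosine in the non-CG context is a tally of methylated sites by their dinucleotide context. The sketch below is a toy illustration, not the method of the cited studies: it assumes methylated positions have already been called on a single strand, ignores the opposite strand and per-site methylation levels (a real analysis might instead weight each site by its read counts), and uses hypothetical function names and data.

```python
from collections import Counter
from typing import Iterable, Tuple


def cytosine_context(seq: str, i: int) -> str:
    """Return the dinucleotide context of the cytosine at position i.

    The context is 'C' plus the next base on the same strand: 'CG' for
    canonical sites, or 'CA', 'CC', 'CT' for the non-CG (CH) contexts.
    """
    if seq[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    if i + 1 >= len(seq):
        raise ValueError(f"position {i} has no downstream base")
    return "C" + seq[i + 1]


def non_cg_share(seq: str, methylated: Iterable[int]) -> Tuple[Counter, float]:
    """Tally methylated sites by context and return the fraction outside CG."""
    counts = Counter(cytosine_context(seq, i) for i in methylated)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no methylated cytosines given")
    return counts, (total - counts["CG"]) / total


# Hypothetical top-strand sequence and called methylcytosine positions.
seq = "ACGCACTCCGA"
methylated = {1, 3, 5, 8}  # contexts: CG, CA, CT, CG
counts, share = non_cg_share(seq, methylated)
print(counts)
print(share)  # 0.5
```

In this made-up example, half of the methylcytosines fall outside CG, which is the kind of proportion described above for adult cortical neurons.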
To appreciate the potential significance of non-CG methylation, it is important to consider the density of CG and non-CG positions in the human genome (Fig. 5.1). CG dinucleotides occur at around 1 in 100 positions in the genome, far fewer than the 1 in 16 positions expected in a random sequence. CG sequences have been lost during evolution due to the higher mutation rate of methylcytosine, which can spontaneously deaminate to thymine (Saxonov, Berg, & Brutlag, 2006). Around 11% of CG positions are concentrated in CG islands, a small genomic compartment associated with gene promoters and covering ~1.4% of the genome. By contrast, non-CG positions occur approximately once every 2.1 bp. Thus, even a low rate of methylation at non-CG sites may have a substantial impact by virtue of the roughly 50-fold higher density of these sites relative to CG positions.
Figure 5.1 The 16 dinucleotides in the human genome are unevenly distributed, with CG dinucleotides (green) greatly depleted compared with non-CG positions (blue). As a result, the average spacing between CG positions is ~100 bp, whereas non-CG positions occur every ~2.1 bp.
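CG depletion and the resulting difference in site density can be checked on any sequence with a short script. This is a minimal Python sketch under stated assumptions: it counts dinucleotides on a single strand only, the function names are hypothetical, and the uniform random sequence stands in for a real genome (which would be read from a FASTA file), so its output shows the 1-in-16 random expectation rather than the human genome's ~1-in-100 rate. The observed/expected ratio uses the common definition, CG count × length / (C count × G count). The last lines reproduce the arithmetic behind the 50-fold density difference from the spacings of about 100 bp and 2.1 bp given in the text.

```python
import random
from collections import Counter
from typing import Tuple


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides read 5'->3' along one strand."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def cg_and_ch_rates(seq: str) -> Tuple[float, float]:
    """Return (CG, CH) dinucleotides per position on one strand; H = A, C or T."""
    counts = dinucleotide_counts(seq)
    n = len(seq) - 1  # number of dinucleotide positions
    if n <= 0:
        raise ValueError("sequence must be at least 2 bp long")
    ch = counts["CA"] + counts["CC"] + counts["CT"]
    return counts["CG"] / n, ch / n


def cg_observed_over_expected(seq: str) -> float:
    """Observed CG count divided by (C count x G count / length)."""
    seq = seq.upper()
    expected = seq.count("C") * seq.count("G") / len(seq)
    if expected == 0:
        raise ValueError("sequence contains no C or no G")
    return dinucleotide_counts(seq)["CG"] / expected


# In a uniform random sequence, CG should appear at ~1/16 of positions and
# CH at ~3/16, with an observed/expected CG ratio near 1.
random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(200_000))
cg_rate, ch_rate = cg_and_ch_rates(random_seq)
print(f"CG {cg_rate:.4f}/bp (1/16 = {1/16:.4f}); CH {ch_rate:.4f}/bp (3/16 = {3/16:.4f})")
print(f"CG observed/expected: {cg_observed_over_expected(random_seq):.3f}")

# Fold difference in site density implied by the spacings quoted in the text.
fold = 100 / 2.1
print(round(fold, 1))  # 47.6
```

Run on a real human chromosome, the same functions would be expected to show a CG observed/expected ratio well below 1, reflecting the evolutionary depletion described above.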