MAPPING AND QUANTIFICATION OF THE DERIVATIVES OF 5mC
To understand the mechanism and significance of the dynamic balance between cytosine methylation and demethylation in biological processes, in both normal and disease states, it is essential to obtain a genome-wide view of the cytosine modification landscape and to quantify the differential enrichment or depletion of each derivative at single-base resolution. Over the past few years, researchers have developed several techniques for mapping and quantifying the derivatives of 5mC (5hmC, 5fC, and 5caC). Building on advances in next-generation sequencing, these techniques have raised the scale of study from a single locus to the whole genome, with resolution at the single-base level. Here we discuss widely used methods in two different, but interdependent, categories: pull-down-based methods and bisulfite sequencing (BS-seq)-based methods (Fig. 3.3). In brief, all these mapping methods rely on selective modification, enrichment, or both: chemicals or enzymes alter the affinity or the reactivity of specific residue(s) so that they can be read out selectively. In general, affinity-based profiling is relatively cost-effective, but it has lower resolution and does not report the relative abundance of the modification at each locus. The low resolution of affinity- or pull-down-based methods, together with recognition of the importance of the derivatives of 5mC in biological processes, has driven the field to develop additional technologies that map 5mC and its oxidized forms at single-base resolution and distinguish the individual forms quantitatively. The principle of BS-seq-based methods is that a selective treatment protects a chosen modification from bisulfite conversion (or exposes it to conversion), and the resulting readout is compared with, or subtracted from, the standard BS-seq signal. Combined with standard BS-seq, oxidative bisulfite sequencing (oxBS-seq) and Tet-assisted bisulfite sequencing (TAB-seq) provide detailed insights into the genome-wide distribution of 5hmC at single-base resolution.
Figure 3.3 Mapping of 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine with bisulfite sequencing and modified bisulfite sequencing methods at single-base resolution. In standard bisulfite (BS) sequencing, both 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are read as C. In the oxidative BS (oxBS) method, KRuO4 oxidizes 5hmC to 5-formylcytosine (5fC), which is read as T after the subsequent BS treatment. Thus, 5hmC in the original genomic DNA can be determined by subtraction (ie, C from standard BS sequencing minus C from oxBS). In TET-assisted BS (TAB) sequencing, β-glucosyltransferase (β-GT) first converts 5hmC to 5-glucosylmethylcytosine (5gmC). In the subsequent treatment with TET, 5mC and 5fC become 5-carboxylcytosine (5caC). Thus, upon BS treatment, unmodified C and 5caC are read as T, whereas only 5gmC (ie, 5hmC in the original genomic DNA) is read as C, so the 5hmC signal can be read directly from the results. In 5fC chemical modification-assisted bisulfite (fCAB) sequencing, O-ethylhydroxylamine (EtONH2)-treated 5fC is protected from the subsequent BS treatment and thus read as C. Similarly, in the reduced bisulfite (redBS) method, treatment with NaBH4 reduces 5fC to 5hmC. Thus, in fCAB or redBS sequencing, 5fC in the original genomic DNA is read as C. In 5caC chemical modification-assisted bisulfite (caCAB) sequencing, 5caC treated with 1-ethyl-3-[3-dimethylaminopropyl]-carbodiimide hydrochloride (EDC) is protected from the subsequent BS treatment and thus read as C. In methylation-assisted bisulfite (MAB) sequencing, unmodified C is methylated to 5mC by the S-adenosyl-methionine-dependent CpG methyltransferase M.SssI. Upon BS treatment, only 5fC and 5caC are read as T. Signals specific for 5fC or 5caC can be determined after subtracting the standard BS results.
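The subtraction used to obtain 5hmC from paired BS-seq and oxBS-seq libraries can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a published pipeline: the `SiteCounts` class, the `estimate_levels` function, and the read counts are all hypothetical, and the sketch assumes that C and T read counts at a single cytosine position are already available for both libraries.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one cytosine position in one library (hypothetical type)."""
    c_reads: int  # reads reporting C (base was protected from conversion)
    t_reads: int  # reads reporting T (base was converted)

    def c_fraction(self) -> float:
        total = self.c_reads + self.t_reads
        if total == 0:
            raise ValueError("no coverage at this site")
        return self.c_reads / total


def estimate_levels(bs: SiteCounts, oxbs: SiteCounts) -> dict:
    """Estimate 5mC and 5hmC levels at one site from paired BS/oxBS counts.

    BS-seq C fraction   = 5mC + 5hmC
    oxBS-seq C fraction = 5mC only (5hmC was oxidized to 5fC and reads as T)
    """
    mc = oxbs.c_fraction()
    # Sampling noise can make the difference slightly negative; clamp at 0.
    hmc = max(0.0, bs.c_fraction() - mc)
    return {"5mC": mc, "5hmC": hmc}


# Made-up counts: 80% C in BS-seq, 60% C in oxBS-seq at the same position.
levels = estimate_levels(SiteCounts(80, 20), SiteCounts(60, 40))
print(levels)
```

Clamping the difference at zero is a simplification chosen here for brevity; a real analysis would more likely model the sampling error of both libraries rather than truncate it.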
Here we list only some representative reagents as examples to clarify the procedure. Sodium bisulfite (NaHSO3) is used in BS-seq to deaminate unmethylated cytosine while leaving 5mC and 5hmC intact. Antibodies against 5mC, 5hmC, 5fC, and 5caC are used for DNA immunoprecipitation sequencing (Ficz et al., 2011; Jin, Wu, Li, & Pfeifer, 2011; Shen et al., 2013; Stroud, Feng, Morey Kinney, Pradhan, & Jacobsen, 2011; Williams et al., 2011; Wu et al., 2011; Xu et al., 2011). For selective chemical labeling, glucosylation, periodate oxidation, biotinylation (GLIB), or J-binding protein 1 (JBP1) pull-down of glucosylated 5-hydroxymethylcytosine (g5hmC), T4 bacteriophage β-glucosyltransferase (β-GT) adds azide-modified or unmodified glucose to 5hmC to give g5hmC, JBP1 pulls down g5hmC, and a biotin probe adds biotin to 5hmC (Pastor et al., 2011; Raiber et al., 2012; Robertson et al., 2011; Song et al., 2013, 2011; Terragni, Bitinaite, Zheng, & Pradhan, 2012).
Another great value of BS-seq is that it provides a baseline for subtractive readout in combination with a multitude of sequencing methods modified by diverse selective chemical treatments. Some of the reported methods to map 5fC and 5caC also build on BS-seq (Fig. 3.3). In the reduced bisulfite sequencing (redBS-seq) method, 5fC is selectively reduced to 5hmC by sodium borohydride (NaBH4) before bisulfite treatment, so 5fC is read as C in redBS-seq and as T in BS-seq. 5fC can therefore be quantified at single-base resolution by subtracting the BS-seq readout from the redBS-seq readout (Booth, Marsico, Bachman, Beraldi, & Balasubramanian, 2014). Booth and colleagues also developed oxBS-seq, which selectively oxidizes 5hmC to 5fC and relies on the same principle of readout subtraction (Booth et al., 2014). In TAB-seq, 5hmC is first glucosylated by β-GT to g5hmC; subsequent treatment with TET converts 5mC and 5fC to 5caC, so that after bisulfite treatment g5hmC is the only base read as C.
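The readout rules described for BS-seq, oxBS-seq, redBS-seq, and TAB-seq can be collected into a small lookup table, which makes it easier to see which modification each comparison with standard BS-seq isolates. The sketch below is illustrative only: the dictionary `READS_AS_C` and the helper names `readout` and `resolved_against_bs` are hypothetical, and the table simply transcribes the rules stated in the text, assuming complete conversion and complete protection.

```python
# Forms that survive bisulfite treatment and read as C in each library,
# transcribed from the text; every other form reads as T.
READS_AS_C = {
    "BS": {"5mC", "5hmC"},
    "oxBS": {"5mC"},                  # 5hmC is oxidized to 5fC, then converted
    "redBS": {"5mC", "5hmC", "5fC"},  # 5fC is reduced to 5hmC, so it is protected
    "TAB": {"5hmC"},                  # only glucosylated 5hmC stays protected
}
ALL_FORMS = ("C", "5mC", "5hmC", "5fC", "5caC")


def readout(form: str, method: str) -> str:
    """Return the base ('C' or 'T') reported for `form` by `method`."""
    if form not in ALL_FORMS:
        raise ValueError(f"unknown cytosine form: {form}")
    return "C" if form in READS_AS_C[method] else "T"


def resolved_against_bs(method: str) -> set:
    """Forms whose readout differs between `method` and standard BS-seq."""
    return READS_AS_C[method] ^ READS_AS_C["BS"]


# Print one row per cytosine form with its readout in every library.
for form in ALL_FORMS:
    row = "  ".join(f"{m}={readout(form, m)}" for m in READS_AS_C)
    print(f"{form:>5}: {row}")

print(resolved_against_bs("oxBS"))   # form isolated by comparing oxBS with BS
print(resolved_against_bs("redBS"))  # form isolated by comparing redBS with BS
```

Representing each library as the set of forms read as C keeps the comparison with BS-seq a one-line set operation; TAB-seq is included for completeness, although the text treats its 5hmC signal as a direct readout rather than a subtraction.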
Identification of 5caC throughout the genome at single-base resolution is made possible by the chemical modification-assisted bisulfite sequencing (CAB-seq) method (Lu et al., 2013). 5caC is protected from bisulfite treatment by 1-ethyl-3-[3-dimethylaminopropyl]-carbodiimide hydrochloride (EDC) and reads as C. Since 5caC reads as T in conventional BS-seq, 5caC can be identified as the signal that reads as C in the CAB-seq output but as T in the conventional BS-seq output (ie, by subtracting the BS-seq C signal from the CAB-seq C signal).
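At the level of individual positions, the comparison between CAB-seq and conventional BS-seq amounts to finding reference cytosines that read as T in one library and as C in the other. The following sketch shows that logic on toy data. The function name `call_5cac_positions` and the three sequences are hypothetical, and using one already-aligned read per library is a deliberate simplification: in practice many reads cover each site and the call is made on C fractions rather than on single bases.

```python
def call_5cac_positions(reference: str, bs_read: str, cab_read: str) -> list:
    """Return 0-based positions where a reference C reads as T in BS-seq
    but as C in CAB-seq, which is the pattern expected for 5caC."""
    if not (len(reference) == len(bs_read) == len(cab_read)):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (ref, bs, cab) in enumerate(zip(reference, bs_read, cab_read))
        if ref == "C" and bs == "T" and cab == "C"
    ]


# Toy example (made-up sequences, same strand, no mismatches or indels).
reference = "ACGTCCGA"
bs_read = "ACGTTTGA"   # C at index 1 is protected in BS-seq (5mC or 5hmC)
cab_read = "ACGTTCGA"  # C at index 5 is protected only after EDC treatment
positions = call_5cac_positions(reference, bs_read, cab_read)
print(positions)
```

Positions that read as C in both libraries are left uncalled because BS-seq alone already reports them as 5mC or 5hmC, and positions that read as T in both remain ambiguous between unmodified C and 5fC.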