Global Analysis of Protein Phosphorylation by Mass Spectrometry
With the advancement of MS instrumentation that allows for the rapid and sensitive sequencing of peptides and the concurrent development of phosphopeptide enrichment strategies, we now have the ability to analyze changes in protein phosphorylation at an unprecedented scale. These studies have led to a rethinking of the prevalence of phosphorylation throughout the proteome and have revealed the true complexity of signaling pathways in living cells.
Global phosphoproteome studies initially focused on qualitative cataloging of phosphosites in a number of model organisms. One of the earliest global studies analyzed the phosphoproteome of the budding yeast S. cerevisiae and identified 383 phosphosites, 365 of which had never been identified before. Since that time, the yield of global phosphoproteomics studies has expanded to the point where it is not uncommon to identify >20,000 phosphosites from a single biological sample [5, 33, 132, 134, 135]. From hundreds of studies, more than 300,000 phosphorylation sites have been identified across a multitude of both prokaryotic and eukaryotic species, with more than 200,000 of these coming from mammals alone. More than 95% of these sites have been identified using MS-based phosphoproteomics. Prior to the onset of global phosphoproteomics, it was often estimated that about 30% of proteins could be phosphorylated, but these studies now suggest that up to 75% of the eukaryotic proteome is phosphorylated in at least some cells or tissues, and the true percentage may be even higher. Based on the natural abundance of serine, threonine, and tyrosine, there are approximately 700,000 potential sites of phosphorylation in the human genome alone.
For a true comparative analysis of global phosphorylation, a quantitative proteomics strategy is necessary. Quantitative phosphoproteomics allows researchers to investigate signaling pathways in different model systems to identify phosphorylation events that vary in abundance and duration as a result of a given stimulation. A particularly challenging aspect of signal transduction research is the identification of the protein kinases and phosphatases that are activated by a given stimulus. Global quantitative phosphoproteomics, in conjunction with a suite of bioinformatic tools, has been particularly useful in addressing this challenge. Integrating the kinase consensus motifs that are enriched among the phosphopeptides modulated by the stimulus of interest with known protein-protein interactions can lead to the initial mapping of the pathways affected (see following section on bioinformatics).
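The motif-enrichment step described above can be sketched as follows. This is a minimal illustration, not the method of any particular bioinformatics tool: the two motif patterns are simplified stand-ins (assumptions) for curated kinase consensus motifs, each phosphosite is assumed to be represented by a 7-residue window centered on the modified residue, and the regulated sites are assumed to be a subset of the background set. Enrichment is scored with a one-sided hypergeometric test.

```python
import re
from math import comb

# Simplified, illustrative motifs (assumptions, not a curated motif set).
# Each pattern is matched against a 7-residue window whose index 3 is the
# phosphorylated residue.
MOTIFS = {
    "proline_directed": re.compile(r"^...[ST]P.."),  # S/T followed by P
    "basophilic": re.compile(r"^R..[ST]..."),        # R at position -3
}


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n are successes."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def motif_enrichment(regulated, background):
    """Return {motif: (hits_in_regulated, hits_in_background, p_value)}.

    `regulated` must be a subset of `background` for the test to be valid.
    """
    results = {}
    for name, pattern in MOTIFS.items():
        bg_hits = sum(bool(pattern.match(w)) for w in background)
        reg_hits = sum(bool(pattern.match(w)) for w in regulated)
        p = hypergeom_sf(reg_hits, len(background), bg_hits, len(regulated))
        results[name] = (reg_hits, bg_hits, p)
    return results
```

In practice the background would be every phosphosite quantified in the experiment, and p-values would need correction for the number of motifs tested.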
The success of these large-scale studies often relies on choosing the right biological system and the correct controls. Several different approaches have been used to map signaling pathways. A common technique is to expose a cell to an extracellular agonist of a receptor of interest and measure changes to the cellular phosphoproteome (Figure 2.5a), sometimes in combination with small-molecule inhibitors to determine the roles of individual pathway components. One example of this approach is to compare starved cells treated with serum and starved cells treated with serum and a MAPK inhibitor to map the MAPK-dependent phosphoproteome. A similar approach is to compare cells with aberrant signaling (e.g., as a result of expression of a constitutively active kinase) to the same cells treated with a chemical inhibitor of that pathway. This method has been used to study signaling downstream of mTOR and BRAF [141-143]. When specific inhibitors are not available, RNAi-mediated knockdown or CRISPR-based genome editing can be used to inactivate components of signaling pathways to see how the phosphoproteome is affected. It is important to remember that chemical inhibition allows direct, short-term effects on signaling to be studied, whereas the effects of genetic knockdown are longer term and more likely to be indirect.
Figure 2.5 Quantitative global phosphoproteomics studies. (a) Static comparisons of phosphoproteomes can come in a variety of forms. This could include (1) a comparison of a normal cell to a cancerous one or to the normal cell where a constitutively activated kinase is overexpressed, (2) a comparison of a cell line with an activated signaling pathway to one where a component of that pathway has been either genetically or pharmacologically ablated, and (3) a comparison of two different cell types or tissues. (b) Time-course studies allow phosphorylation events downstream of a given stimulus to be ordered. By mapping waves of phosphorylation as a function of time, signaling pathways can be organized from stimulus to phenotype, and hypotheses about epistatic relationships between pathway members can be generated.
Spatial context can also be assessed by using subcellular fractionation to study how discrete components of the cell respond to a stimulus; moreover, this can help to enrich for relevant phosphorylation sites. Time-course studies can add another layer of resolution to signaling pathway maps (Figure 2.5b). By collecting and analyzing samples at several time points after a stimulus, it can be possible to map waves of phosphorylation events and more easily trace how early phosphorylation events lead to later ones. The choice of time points for a quantitative phosphoproteomics experiment should be carefully considered. For example, tyrosine phosphorylation of transmembrane receptors immediately downstream of external stimuli often peaks at 5-10 min after stimulus, but phosphorylation of transcription factors as a result of a stimulus can often last for many hours.
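As a rough illustration of how time-course data can be used to order phosphorylation events, the sketch below sorts sites by the time at which their fold change peaks and reports how long each stays above a threshold. The site names, time points, fold changes, and the 2-fold threshold are all hypothetical values chosen for the example.

```python
def peak_time(profile):
    """Return the time point (minutes) with the highest fold change."""
    return max(profile, key=profile.get)


def last_time_above(profile, threshold=2.0):
    """Return the latest time point whose fold change is >= threshold."""
    times = [t for t, fold_change in profile.items() if fold_change >= threshold]
    return max(times) if times else None


def order_by_peak(profiles):
    """Sort phosphosites from earliest-peaking to latest-peaking."""
    return sorted(profiles, key=lambda site: peak_time(profiles[site]))


# Hypothetical profiles: minutes after stimulus -> fold change versus t = 0.
profiles = {
    "tf_pS": {0: 1.0, 5: 1.1, 15: 2.0, 60: 5.0, 240: 4.5},
    "receptor_pY": {0: 1.0, 5: 8.0, 15: 4.0, 60: 1.5, 240: 1.0},
    "kinase_pT": {0: 1.0, 5: 2.0, 15: 6.0, 60: 3.0, 240: 1.2},
}

for site in order_by_peak(profiles):
    print(site, peak_time(profiles[site]), last_time_above(profiles[site]))
```

A peak can only be resolved as finely as the sampling allows, which is one reason the choice of time points matters: a receptor site that peaks at 5-10 min is missed entirely if the first sample is taken at 30 min.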
Although global phosphoproteomics studies can generate testable hypotheses and allow inferences to be made about the pathways that are modulated in a particular cell state, understanding the functional significance of any given phosphorylation site requires further validation, which usually involves mutation of the phosphosite. Deciding how to measure the effect of this mutation may depend on prior knowledge of the function of the protein. Disruption of the phosphosite may affect the protein structure or localization or affect interactions with other proteins. Phosphosite mutations can also have effects on other PTMs, as it has been shown in multiple studies that phosphorylation can both positively and negatively regulate other modifications such as ubiquitylation, acetylation, and glycosylation [127, 138, 147]. Finally, when a phosphosite mutation causes a loss of function, it is important to rule out the possibility that the mutation resulted in nonspecific unfolding of the protein.
Quantitative phosphoproteomics studies usually follow strategies similar to those of conventional global protein-centric quantitative proteomics experiments. Both label-free and stable isotope labeling strategies have been successfully applied to phosphoproteomics studies. Metabolic labeling, as typified by stable isotope labeling by amino acids in cell culture (SILAC), is an efficient strategy for both expression proteomics and phosphoproteomics studies, particularly for cell culture experiments (Figure 2.6a, right). It has been demonstrated that whole organisms, up to the rat, can be metabolically labeled [149, 150], though this approach is time consuming and expensive. In recent years isotope-labeled chemical tags have become more common, as these approaches are more easily adapted to primary cells and tissues (Figure 2.6b and c). Chemical tags include nonisobaric tags, such as dimethyl labeling and mass tags for relative and absolute quantification (mTRAQ), where quantitation is done using the MS1 scan [151, 152], and isobaric tags, such as isobaric tags for relative and absolute quantitation (iTRAQ) and tandem mass tags (TMT), where quantitation is done on the MS2 scan [153, 154]. Although few studies have directly compared these methods specifically for phosphoproteomics, at least one indicates that iTRAQ is a better choice than mTRAQ for these types of experiments. Using a HeLa cell-derived sample, iTRAQ-based quantitation resulted in threefold more phosphopeptides being identified when compared to mTRAQ. This is likely due to less complex MS1 scans in iTRAQ experiments, as all labeled peptides are isobaric. An advantage that MS2-based isobaric labeling approaches have over the various MS1-based approaches is in multiplexing (Figure 2.6c). MS1-based approaches are typically 2plex experiments, though it is possible to perform a 3plex experiment with triple-label SILAC (Figure 2.6a, right) or differentially labeled chemical tags (Figure 2.6b). The increase from 2- to 3plex comes with a loss in sensitivity, however, as the sample becomes proportionally more complex. Isobaric tagging methods, on the other hand, are commonly available in 6-, 8-, and 10plex formats, with no significant loss in sensitivity since the complexity in the MS1 scan is unchanged.
Figure 2.6 Quantitative strategies for phosphoproteomics studies. (a) Metabolic labeling approaches as typified by SILAC are most efficiently utilized with samples from cell culture (right side). In simple binary comparisons the control cells are grown in heavy label (H) and the experimental cells in light (L) label. After harvesting, the populations are mixed to produce one sample. Three biological replicates of the experiment would require three final samples (1 × 3 = 3). A third experimental condition can be evaluated using an intermediate or medium (M) labeled population. By using a common condition in a triple-label experiment, a 5-point time course can be evaluated with two final samples. The final number of samples for three biological replicates would be six. In cases where it is difficult or impossible to metabolically label a sample, such as primary cells or tissues (left side), a labeled sample can be produced from cell culture or whole organism labeling and spiked into the experimental samples including the controls. In this case, peptides from the labeled sample act as internal references permitting comparisons across all the samples. While enabling accurate relative quantification across multiple samples, this approach has no option for multiplexing, and the number of final samples is equal to the number of experimental conditions. (b) Chemical labeling strategies where the quantitation is done using precursor ion intensities from the MS1 scans are set up in much the same way as metabolically labeled cell culture experiments. This is largely due to the limited number of labeling options for MS1 readouts. The major advantage of this approach is that it can be used with any type of sample. (c) Chemical labeling where the quantitation is derived from reporter ions in the MS2 spectrum has a major advantage in terms of multiplexing. Each sample, up to ten, is separately labeled with a different reporter ion (1, 2, 3, ...) and the samples are combined. In the simple binary comparison, all three biological replicates can be evaluated in one final 6plex sample. The five-point time course can be done as three 5plex samples, as shown (*). Additional time points (up to ten) might be evaluated in this approach, with no additional cost in instrument time. Alternatively, an additional replicate might be added and all four samples analyzed in two 10plex experiments. (d) Label-free quantitation has the advantage that the sample is analyzed without any prior manipulations. However, there are no options for multiplexing. Each sample is analyzed independently.
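The sample counts quoted in the Figure 2.6 caption can be reproduced with a little arithmetic. The sketch below is only a bookkeeping aid under simple assumptions: MS1-based mixes are prepared once per biological replicate and reserve one channel for a common condition when the conditions do not fit in a single mix, and isobaric channels are packed without regard to which replicate lands in which mix. The function names are hypothetical.

```python
from math import ceil


def label_free_runs(conditions, replicates):
    """Label-free: every sample is analyzed on its own."""
    return conditions * replicates


def ms1_label_runs(conditions, replicates, channels):
    """MS1-based labeling (e.g., SILAC), one set of mixes per replicate.

    If all conditions fit in one mix, a single mix is needed. Otherwise one
    channel carries a common condition that is shared by every mix.
    """
    if conditions <= channels:
        mixes = 1
    else:
        mixes = ceil((conditions - 1) / (channels - 1))
    return mixes * replicates


def isobaric_runs(conditions, replicates, plex):
    """Isobaric labeling: pack condition x replicate channels into mixes."""
    return ceil(conditions * replicates / plex)


# Binary comparison, three biological replicates.
print(ms1_label_runs(2, 3, channels=2))   # double-label SILAC
print(isobaric_runs(2, 3, plex=6))        # 6plex isobaric tags
print(label_free_runs(2, 3))

# Five-point time course, three biological replicates.
print(ms1_label_runs(5, 3, channels=3))   # triple-label SILAC
print(isobaric_runs(5, 3, plex=5))        # one 5plex per replicate
print(label_free_runs(5, 3))
```

The comparison makes the trade-off concrete: the number of label-free runs grows with every added sample, whereas isobaric tags absorb additional conditions or replicates until the plex limit is reached.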
On the other hand, reporter ion compression is a problem that is particular to isobaric labeling approaches. If another precursor (peptide or otherwise) is coisolated with the targeted peptide precursor ion in the MS isolation window, the resulting MS/MS spectrum will have fragment ions and reporter ions from both species. The targeted peptide will likely still be identifiable using this spectrum, but the reporter ion ratios may have been diluted or distorted by the reporter ions arising from the other peptide. Since most peptides are present at a 1:1 ratio in the samples being studied, ratios tend to be compressed by this interference toward 1, resulting in an underestimation of actual protein/peptide abundance differences. It has even been suggested that fragmentation of background noise can lead to this compression problem. This problem can be magnified for quantification of post-translationally modified peptides, such as phosphopeptides. For protein-level quantitation, ratio compression effects on individual peptides can be diluted out by measurements of other peptides where there is no compression. Phosphopeptide quantification, however, often relies on just one or two spectra.
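A small numerical model shows why coisolation pulls ratios toward 1. It assumes a two-channel experiment in which a fraction of the total reporter signal comes from a coisolated ion whose own ratio is 1:1; the function name and the example values are illustrative only.

```python
def observed_ratio(true_ratio, interference_fraction, interference_ratio=1.0):
    """Reporter ratio (channel A / channel B) in the presence of coisolation.

    interference_fraction is the share of the total reporter signal, summed
    over both channels, that originates from the coisolated ion.
    """
    if not 0.0 <= interference_fraction < 1.0:
        raise ValueError("interference_fraction must be in [0, 1)")
    f = interference_fraction
    r, q = true_ratio, interference_ratio
    # Split each ion's share of the signal between the two channels.
    channel_a = (1 - f) * r / (1 + r) + f * q / (1 + q)
    channel_b = (1 - f) / (1 + r) + f / (1 + q)
    return channel_a / channel_b


for fraction in (0.0, 0.1, 0.3, 0.5):
    print(fraction, round(observed_ratio(4.0, fraction), 2))
```

Under these assumptions a true fourfold change is reported as roughly 2.4-fold when 30% of the reporter signal comes from interference, and a true 0.25-fold change is raised toward 1 in the same way.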
Several methods have been adopted to minimize this challenging issue. Narrowing the isolation window and targeting a peptide for MS/MS when it reaches its apex of elution have both been applied to improve isobaric quantitation. Reducing sample complexity using high-resolution sample fractionation has been shown to somewhat alleviate this problem, although this can significantly increase analysis time. Removal of coisolated peptide ions of different charge states than the precursor improves the quantification accuracy as well; however, this requires specialized instrumentation. MS3 methods (described earlier) have been shown to drastically decrease reporter ion contamination but significantly reduce the sensitivity of the analysis. More recently, coisolation and cofragmentation of multiple fragments have been shown to increase the reporter ion signals and increase the sensitivity and accuracy of quantitation compared to standard MS3 methods [134, 161]. As of now, this feature is only available on the most state-of-the-art instrumentation, although in theory it could be applied to most instruments that are amenable to isobaric quantitation. Finally, a computational approach has been developed that estimates the degree of ratio compression for each tandem mass spectrum based on potential contaminating peaks observed in the preceding and subsequent MS1 spectra and corrects for this interference accordingly. This computational approach is more amenable to protein-based quantitative studies than to phosphoproteomics studies, however, given the variability seen with this correction on a peptide-to-peptide basis.
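The idea behind a purity-based correction can be sketched in a few lines. This is not the published algorithm referred to above, only a simplified illustration under stated assumptions: precursor purity is taken as the target's share of the ion intensity inside the isolation window of one MS1 scan (isotope peaks and averaging over the flanking MS1 scans are ignored), and the interfering signal is assumed to be spread equally across all reporter channels.

```python
def precursor_purity(ms1_peaks, target_mz, window_width, tol=0.01):
    """Target's share of the MS1 intensity inside the isolation window.

    ms1_peaks is a list of (m/z, intensity) pairs from one MS1 scan.
    Returns None if no signal falls inside the window.
    """
    half = window_width / 2
    in_window = [(mz, inten) for mz, inten in ms1_peaks
                 if abs(mz - target_mz) <= half]
    total = sum(inten for _, inten in in_window)
    if total == 0:
        return None
    target = sum(inten for mz, inten in in_window
                 if abs(mz - target_mz) <= tol)
    return target / total


def correct_reporters(reporters, purity):
    """Subtract an interference share assumed equal in every channel."""
    per_channel = (1.0 - purity) * sum(reporters) / len(reporters)
    return [max(r - per_channel, 0.0) for r in reporters]


# Hypothetical MS1 scan: the target at m/z 500.25 plus two contaminants
# inside a 2 m/z isolation window, and one peak outside it.
peaks = [(500.25, 700.0), (500.60, 200.0), (499.90, 100.0), (502.00, 999.0)]
purity = precursor_purity(peaks, target_mz=500.25, window_width=2.0)
corrected = correct_reporters([0.71, 0.29], purity)
print(purity, corrected[0] / corrected[1])
```

Because the correction depends on a single purity estimate per spectrum, any error in that estimate passes directly into the corrected ratio, which is consistent with the caution above about applying it to phosphosites quantified from only one or two spectra.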
Label-free quantitation is an alternative to isotope labeling strategies (Figure 2.6d). For this approach, the ion intensity of a peptide is measured over its chromatographic elution profile. The integrated intensities across the peak for a given peptide are compared between LC-MS runs of different samples to measure the relative abundance of the peptide between samples. Samples from any biological source can be analyzed using this approach, without the significant extra cost of stable isotope labeling. While this seems like a relatively straightforward approach, it is essential that individual chromatograms can be unambiguously assigned to a given peptide to ensure accurate quantitation. In addition, the ionization efficiency of a peptide is affected by the presence of coeluting peptides, and therefore changes in retention time can have dramatic effects on ionization and measured intensity. It is therefore imperative to have a highly reproducible LC-MS system for these experiments, with narrow peak widths and robust retention time stability. High mass accuracy and resolution of the instrument can significantly help with unambiguous peptide assignment. In addition, chromatographic peak alignment software is necessary to define peptide elution profiles across multiple data files. These types of experiments often require multiple technical replicates to make sure that the LC-MS and data analysis tools are robust. Unlike metabolic labeling strategies, the number of samples that can be compared is not limited. However, each sample that is added to the experiment increases instrument time, as each must be analyzed individually (Figure 2.6d). Nevertheless, this approach has been successfully utilized for a number of global quantitative phosphoproteomics studies [167, 168].
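The core calculation in label-free quantitation, integrating a peptide's intensity across its elution peak and comparing the areas between runs, can be illustrated as below. The sketch assumes the extracted ion chromatograms have already been assigned to the same peptide and aligned between runs; the trapezoidal rule and the example values are illustrative choices.

```python
from math import log2


def peak_area(times, intensities):
    """Integrate an extracted ion chromatogram with the trapezoidal rule."""
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need at least two matching time/intensity points")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def log2_ratio(xic_a, xic_b):
    """log2 of the peak-area ratio between two runs (A over B)."""
    return log2(peak_area(*xic_a) / peak_area(*xic_b))


# Hypothetical chromatograms of one phosphopeptide in two LC-MS runs
# (retention time in minutes, intensity in arbitrary units). Run B elutes
# 0.5 min later, which does not change its integrated area.
run_a = ([30.0, 31.0, 32.0, 33.0, 34.0], [0.0, 10.0, 20.0, 10.0, 0.0])
run_b = ([30.5, 31.5, 32.5, 33.5, 34.5], [0.0, 5.0, 10.0, 5.0, 0.0])
print(peak_area(*run_a), peak_area(*run_b), log2_ratio(run_a, run_b))
```

The integration itself is simple; the difficulty described above lies in deciding which points belong to the peptide's peak in each run, which is why retention time stability and peak alignment matter so much.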
Quantitative proteomics expression studies almost always rely on quantitative data from multiple peptides that are used together to infer the relative protein abundance. This is an important distinction from quantitative phosphoproteomics studies, where quantitation of a phosphosite is usually based on a single, or at most a few, phosphopeptides. Therefore, there is usually less confidence built into a quantitative phosphosite experiment than there would be for a protein abundance experiment, since significantly fewer intensity measurements are used for calculating these numbers. One can increase the confidence in quantitative phosphoproteomics experiments by including more technical and biological replicates in the study, so that more measurements of relative abundance can be made.
A very important consideration for quantitative global phosphoproteomics studies is that any measured change in phosphopeptide abundance could be the result of a change to that PTM itself but could also be due to a more general change in the protein level [127, 138]. Therefore, it is almost always necessary to include a quantitative proteomics experiment to measure changes in protein abundance from the same samples used in the quantitative phosphoproteomics experiment. This is particularly important for longer-term studies, such as siRNA treatments, or comparisons of different tissues. At these time scales, effects on gene and protein expression can confound measurements of phosphosite abundance. However, even for short-term manipulations such as growth factor stimulations, effects on protein stability can lead to changes at the protein level that cannot be measured in a phosphoproteomics experiment. Normalization of phosphosite abundance to protein abundance needs to be considered in all experiments of this type.
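One common way to express this normalization is to subtract the log2 protein ratio from the log2 phosphosite ratio, so that a site whose change merely tracks its protein scores zero. The sketch below assumes both ratios were measured between the same two conditions; the protein names, site names, and values are hypothetical.

```python
from math import log2


def normalize_sites(site_ratios, protein_ratios):
    """Return log2 phosphosite ratios corrected for protein-level change.

    site_ratios maps (protein, site) to a phosphosite ratio; protein_ratios
    maps protein to a protein abundance ratio. Sites whose protein was not
    quantified are reported as None rather than left uncorrected.
    """
    normalized = {}
    for (protein, site), ratio in site_ratios.items():
        protein_ratio = protein_ratios.get(protein)
        if protein_ratio is None:
            normalized[(protein, site)] = None
        else:
            normalized[(protein, site)] = log2(ratio) - log2(protein_ratio)
    return normalized


site_ratios = {
    ("PROT_A", "S10"): 4.0,   # site up fourfold, protein up twofold
    ("PROT_B", "T55"): 2.0,   # site change fully explained by protein level
    ("PROT_C", "Y7"): 3.0,    # protein not quantified
}
protein_ratios = {"PROT_A": 2.0, "PROT_B": 2.0}
print(normalize_sites(site_ratios, protein_ratios))
```

Returning None for unquantified proteins keeps uncorrected sites visibly separate from corrected ones, which matters because phosphopeptide enrichment often identifies sites on proteins that the unenriched proteome measurement misses.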