Sulfation Site Predictions

Protein, or rather peptide, identification from mass spectrometry data has become straightforward, and for “obscure” species the process is aided by homology to related sequences and/or by transcriptome data. However, PTMs cannot be predicted from genome sequences.

Identification of consensus motifs, that is, structural details that are recognized by the modifying enzyme(s), may aid the analytical process. For example, the most promising cleavage method can be selected for generating the “best” modified peptide for chromatographic separation and mass spectrometric characterization. Similarly, if the motif to be considered can be specified, the search space can be restricted accordingly; for example, Byonic and ProteinProspector offer this option for N-glycosylation. Thus, a database search with the MS/MS data becomes significantly faster, and the resulting identifications much more reliable. Narrowing down the options and predicting which sites can be modified has been a desire whenever a new PTM has been reported.
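As an illustration of how a specified motif narrows the search space, the following minimal sketch scans sequences for the well-known N-glycosylation sequon N-X-S/T (X being any residue except Pro) and keeps only peptides that carry a candidate site. The function names are hypothetical, and the sketch is not meant to reflect how Byonic or ProteinProspector implement this option.

```python
import re

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead consumes only the Asn, so overlapping sequons
# (e.g. the two Asn residues in "NNST") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_sites(sequence):
    """Return 1-based positions of Asn residues inside an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


def peptides_with_motif(peptides):
    """Keep only peptides that contain at least one candidate site."""
    return [p for p in peptides if sequon_sites(p)]
```

Only the peptides returned by `peptides_with_motif` would then need to be considered with the modification, which is where the gain in search speed would come from.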

As far as Tyr sulfation is concerned, the first such studies investigated the primary structure of proteins, that is, the amino acid sequences around the sulfated Tyr residues. Several physical attributes were tested, and the presence of acidic residues seemed to be the most distinctive feature for Tyr sulfation [37, 68]. It soon became obvious that considering only the adjacent residues does not provide sufficient information for reliable prediction of the modification. A wider net was cast, on the assumption that secondary structure may influence target recognition: a sequence stretch extending ±5 amino acids from the Tyr residue was investigated using a position-specific scoring matrix (PSSM) [69]. Although no unambiguous motif could be identified by this approach, conclusions were drawn about potential Tyr sulfation in biologically significant transmembrane receptors [70] and viral proteins [71].
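The PSSM idea can be sketched as follows. This is only a minimal illustration under stated assumptions: the training windows are invented acidic-rich sequences (not real sulfation sites), the background is assumed uniform over the 20 amino acids, and all names are hypothetical. It is not the matrix or procedure of [69].

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FLANK = 5  # residues on each side of the central Tyr

# Invented, acidic-rich training windows (NOT real sulfation sites).
TOY_WINDOWS = ["DEDEDYEDDEE", "EEDDAYDEEDG", "GDEEEYDDEAD"]


def tyr_windows(sequence, flank=FLANK, pad="-"):
    """Yield (1-based position, window) for every Tyr, padding the termini."""
    padded = pad * flank + sequence + pad * flank
    for i, aa in enumerate(sequence):
        if aa == "Y":
            yield i + 1, padded[i:i + 2 * flank + 1]


def build_pssm(windows, pseudocount=1.0):
    """Log-odds matrix against a uniform background, one dict per column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*windows):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the column scores; padding characters contribute nothing."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
```

Each Tyr in a query sequence would be scored by passing its window from `tyr_windows` to `score_window`; where to place a cutoff on that score is a separate choice that this sketch does not address.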

Table 9.2 Prediction software tested with a few proteins with reliable site assignments.




Sulfinator (cutoff E = 55) (default value) | sulfosites (prediction sensitivity: 90%) | probability = 0.5)

75, 78, 282, 417, 420 | 75, 78, 282, 417, 420 | 54, 75, 78, 239 | 75, 78
Histatin 1 (P15515) | 46, 49, 53, 55
46, 49, 53
22, 25, 31, 39, 51, 58, 77 | 8, 39 | 31, 39, 51 | 39, 416, 417
Lumican (P51884) | 20, 21, 23, 30
20, 38, 39, 45, 47, 53, 55
313 or 314 | 259, 263, 265, 271, 278, 290, 293, 297, 299 | 265, 305

Still relying on analysis of the linear amino acid sequence, a new prediction tool was established using four different hidden Markov models; it is available as part of ExPASy's Proteomics Tool package ( sulfinator/) [72]. We have tested the predictive power of this tool with six proteins with reliably assigned modification sites: vitronectin, fibromodulin, osteomodulin, histatin 1, lumican, and bone sialoglycoprotein (see Table 9.2). Sulfinator identified all the assigned sites for vitronectin, but did not find any potential sites in fibromodulin and lumican; it produced 10 candidates in bone sialoglycoprotein, but the assigned site was not among them; it predicted 2 sites for osteomodulin, of which 1 was correct; and it indicated a single (assigned) site for histatin 1. The ExPASy toolbox features an additional piece of software, sulfosites (http://sulfosite.mbc. [73]. This prediction tool was developed using support vector machine (SVM) learning, and both the linear sequence and the secondary structure around the sulfated sites were considered. This predictor was also tested and missed the mark approximately as often as Sulfinator (Table 9.2). The newest prediction website has been developed along the same lines, using SVM and considering both the secondary and primary structures of the protein [74]. It did not perform significantly better than the previous two when tested with the same proteins (Table 9.2).
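A comparison of this kind, predicted positions against experimentally assigned ones, can be expressed with a few set operations. The sketch below is only an illustration: the function name and the example positions are hypothetical and are not taken from Table 9.2.

```python
def compare_sites(assigned, predicted):
    """Compare predicted positions against experimentally assigned ones."""
    assigned, predicted = set(assigned), set(predicted)
    hits = assigned & predicted
    return {
        "hits": sorted(hits),
        "missed": sorted(assigned - predicted),
        "extra": sorted(predicted - assigned),
        # Fraction of assigned sites that were recovered.
        "sensitivity": len(hits) / len(assigned) if assigned else None,
        # Fraction of predictions that match an assigned site.
        "precision": len(hits) / len(predicted) if predicted else None,
    }


# Hypothetical positions, for illustration only.
example = compare_sites(assigned=[10, 20, 30, 40], predicted=[20, 40, 55])
```

Note that an "extra" prediction is not necessarily wrong: a site may be modified without having been assigned experimentally, so the precision computed this way is best read as a lower bound.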
