Prediction of Protein Aggregation from the Primary Sequence

Here we describe in detail how the elucidation of the physico-chemical, sequential, and structural determinants of protein aggregation into amyloid-like structures has been exploited to develop a variety of mathematical tools intended to accurately predict the deposition propensities of polypeptides. Although these methods have also been employed to analyze the aggregation and propagation of prions and prion-like proteins, the singular features of this particular kind of amyloid have led to the development of specific tools for its prediction (Alberti et al. 2009; Toombs et al. 2012; Espinosa Angarica et al. 2013; Lancaster et al. 2014; Sabate et al. 2015; Zambrano et al. 2015a), whose underlying rationale lies outside the scope of this chapter.

The improved understanding of the determinants of protein aggregation described above, together with the realization that they are mostly encoded in the primary sequence, has inspired the development of a variety of mathematical methods that aim to predict in silico the aggregation propensity of a polypeptide chain from its primary structure alone. To date, more than 20 such computational tools have been made publicly available (Table 7.2), each focusing on a particular set of determinants of protein aggregation to perform its prediction. Depending on the nature of the determinants evaluated and on the rationale used to implement the prediction, the methods can be classified into three main families (Caflisch 2006; Belli et al. 2011). Empirical or phenomenological predictors are based on the experimental assessment of the different intrinsic determinants of protein aggregation. Structure-based approaches, on the other hand, analyze the conformational compatibility of sequence stretches within the evaluated polypeptide with the structural determinants of amyloid-like structures. Most methods in this second class approximate such compatibility by assessing the specific features that β-strands or β-sheets adopt when they assemble into a cross-β supersecondary conformation. Finally, consensus methods start from the premise that the analysis of a single determinant, or of a discrete set of determinants, is not sufficient for an accurate prediction of APRs. Therefore, these predictors attempt to identify such "hot spots" by deriving a consensus from the outputs of other methods, both phenomenological and structure-based.
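The consensus rationale can be illustrated with a minimal majority-vote sketch. It assumes that each predictor has already been reduced to a per-residue binary call (inside or outside an APR) and applies a rule of the form "at least n/2 of n predictors agree", similar to the one listed for AmylPred2 in Table 7.2. The function names and the input calls are hypothetical; real consensus servers add their own predictor selection and post-processing.

```python
from typing import List, Optional, Sequence, Tuple


def consensus_calls(calls: Sequence[Sequence[bool]]) -> List[bool]:
    """Majority-vote consensus over per-residue binary APR calls.

    calls[i][j] is True when (hypothetical) predictor i places residue j in an APR.
    A residue is kept when at least n/2 of the n predictors flag it.
    """
    n = len(calls)
    if n == 0:
        raise ValueError("at least one predictor is required")
    length = len(calls[0])
    if any(len(c) != length for c in calls):
        raise ValueError("all predictors must cover the same sequence length")
    # Compare 2 * votes with n to apply the n/2 threshold without floating point.
    return [2 * sum(c[j] for c in calls) >= n for j in range(length)]


def to_segments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Merge consecutive flagged residues into (start, end) segments, 0-based, end exclusive."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for j, flagged in enumerate(flags):
        if flagged and start is None:
            start = j
        elif not flagged and start is not None:
            segments.append((start, j))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


# Four hypothetical predictors scoring a ten-residue chain.
calls = [
    [False, True,  True,  True,  False, False, False, True,  True,  False],
    [False, False, True,  True,  True,  False, False, False, True,  False],
    [False, False, True,  True,  False, False, False, False, False, False],
    [True,  False, False, True,  False, False, False, False, True,  False],
]
print(to_segments(consensus_calls(calls)))  # → [(2, 4), (8, 9)]
```

Residues supported by a single predictor (positions 0, 1, 4 and 7) are discarded, which is the intended effect of a consensus: idiosyncratic calls of one method are filtered out.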

Table 7.2

| Method | Underlying principle | Level of development | URL | References |
|---|---|---|---|---|
| Phenomenological | | | | |
| Chiti et al. (2003) | Rationalization of the impact of mutations on the aggregation kinetics, based on hydrophobicity, secondary structure propensity, and net charge | Equation | | Chiti et al. (2003) |
| DuBay et al. (2004) | Refinement of the equation by Chiti et al. to predict aggregation rates by considering hydrophobicity, net charge, hydrophobic/hydrophilic patterns, pH, ionic strength, and polypeptide concentration | Equation | | DuBay et al. (2004) |
| Pawar et al. (2005) | Adaptation of the expression by DuBay et al. to derive intrinsic aggregation propensity scales at different pH values for the 20 naturally occurring proteinogenic amino acids, on the basis of hydrophobicity, secondary structure propensity, hydrophobic/hydrophilic patterning, and net charge | Equation | | Pawar et al. (2005) |
| Tartaglia et al. (2004) | Rationalization of the impact of mutations on the aggregation rate, according to β-sheet propensity, accessible surface area, π-stacking interactions, and dipole moment of side chains | Equation | | Tartaglia et al. (2004) |
| Tartaglia et al. (2005a) | Adaptation of the equation by Tartaglia et al. to predict β-aggregating stretches along the sequence, discriminating their preferred (parallel or antiparallel) orientation | Equation | | Tartaglia et al. (2005a) |
| Zyggregator | Development of the equation by Pawar et al. to incorporate the impact of gatekeeper residues along the sequence and the influence of structural protection against aggregation | Server | www-mvsoftware.ch.cam.ac.uk/ | Tartaglia and Vendruscolo (2008); Tartaglia et al. (2008) |
| TANGO | Estimation of the population of different states, including β-aggregates, according to a partition function that considers amino acid physico-chemical properties and conformational preferences, as well as extrinsic physico-chemical parameters | Server | tango.crg.es/ | Fernandez-Escamilla et al. (2004) |
| Idicula-Thomas and Balaji | Calculation of an amyloidogenic propensity score on the basis of tripeptide-based secondary structure propensity along the sequence, compositional bias towards order-promoting residues, and estimated protein half-life and thermostability | Equation | | Idicula-Thomas and Balaji (2005) |
| AGGRESCAN | Experimental determination of an in vivo aggregation propensity scale for the 20 naturally occurring proteinogenic amino acids | Server | bioinf.uab.es/aggrescan/ | Conchillo-Sole et al. (2007) |
| SALSA | Calculation of an averaged tendency to adopt β-strand conformation, according to the Chou and Fasman secondary structure propensity scale | Equation | Available through AmylPred2 | Zibaee et al. (2007) |
| Pafig | Statistical selection of physico-chemical properties that discriminate hexapeptides forming amyloid-like structures | Software | mobioinfor.cn/pafig/index.htm | Tian et al. (2009) |
| Structure-based | | | | |
| NetCSSP | Detection of hidden β-propensity through contact-dependent secondary structure prediction, employing artificial neural networks | Server | cssp2.sookmyung.ac.kr/ | Yoon and Welsh (2004, 2005); Yoon et al. (2007); Kim et al. (2009) |
| SecStr | Consensus detection of coequal α and β conformation propensities by at least 3 of 6 different secondary structure predictors | Software | Available through AmylPred2 | Hamodrakas et al. (2007) |
| FoldAmyloid | Determination, for the 20 naturally occurring proteinogenic amino acids, of an “average packing density” and different H-bonding probability scales derived from protein structural data | Server | bioinfo.protres.ru/fold-amyloid/ | Galzitskaya et al. (2006a); Garbuzynskiy et al. (2010) |
| PASTA 2.0 | Calculation of β-pairing probability between polypeptide stretches, on the basis of interaction potentials statistically derived from amino acid pair occurrences in experimentally resolved β-sheets, either parallel or antiparallel | Server | protein.bio.unipd.it/pasta2/ | Trovato et al. (2006); Walsh et al. (2014) |
| Saiki et al. | Calculation of a suitability score for sequence stretches to fit a predefined amyloid structural template, according to hydrophobic and H-bonding interactions between contiguous side chains in hydrogen-bonded β-strands (computed employing presupposed hydrophobic and H-bonding parameters for different groups of amino acids) | Equation | | Saiki et al. (2006) |
| PIMA | Calculation of β-pairing interaction energy for polypeptide segments of variable length threaded onto an in-register (parallel or antiparallel) β-sheet template, employing a physics-based energy potential | Equation | | Bui et al. (2008) |
| BETASCAN | Estimation of β-strand pairing propensity, according to the probabilities of residue pairs being H-bonded in amphiphilic β-sheets | Server | groups.csail.mit.edu/cb/betascan/ | Bryan et al. (2009) |
| 3D Profile | Conformational modelling onto structural templates derived from hexapeptides forming amyloid-like structures, employing a physics-based force field | Server | services.mbi.ucla.edu/zipperdb/ | Thompson et al. (2006); Goldschmidt et al. (2010) |
| Pre-Amyl | Conformational modelling onto structural templates derived from the coordinates of the amyloid-like crystal formed by NNQQNY, employing statistically derived interaction potentials | Server | Available through AmylPred2 | Zhang et al. (2007) |
| Amyloidogenic Pattern | Determination of a sequence pattern for amyloidogenicity, based on extensive mutational analysis of the amyloid-forming STVIIE peptide | Amino acid pattern | Available through AmylPred2 | Lopez de la Paz and Serrano (2004) |
| Waltz | Identification of amyloidogenic polypeptide regions based on a position-specific scoring matrix (PSSM) that differentiates amyloid-forming from non-forming hexapeptides, complemented with a parameter evaluating physico-chemical properties important for amyloid-like assembly and a structural factor assessing conformational fit to an amyloid-like template | Server | waltz.switchlab.org/ | Maurer-Stroh et al. (2010) |
| FISH Amyloid | Discrimination between patterns of position-specific amino acid co-occurrence associated with amyloidogenic or non-amyloidogenic polypeptide stretches, employing a machine learning approach | Server | comprec.pwr.wroc.pl/fish/fish.php | Gasior and Kotulska (2014) |
| GAP | Differential potentials for amino acid pairs in amyloid-like or β-amorphous hexapeptides, derived from position-specific pairing frequencies, discriminating their relative orientation along the β-strand | Server | http://www.iitm.ac.in/bioinfo/GAP/ | Gromiha et al. (2012); Thangakani et al. (2013); Thangakani et al. (2014) |
| AmyloidMutants | Evaluation of the population of different accessible states (topologically restricted to conform with known amyloid structural models) employing a partition function, according to statistically derived amino acid interaction potentials | Server | amyloid.csail.mit.edu/ | O'Donnell et al. (2011) |
| STITCHER | Calculation of β-strand pairing probabilities (employing BETASCAN scores) and of their most likely assembly into a topologically constrained model of natural amyloids following additional energetic rules, including Q/N- and π-stacking, van der Waals interactions, and inter-strand linker entropy | Server | stitcher.csail.mit.edu/ (broken) | Bryan et al. (2012) |
| Consensus | | | | |
| AmylPred2 | Consensus between the outputs of at least n/2 of the n selected aggregation predictors | Server | aias.biol.uoa.gr/AMYLPRED2/ | Frousios et al. (2009); Tsolis et al. (2013) |
| MetAmyl | Statistical derivation of weighting parameters for a linear combination of outputs from other aggregation predictors | Server | metamyl.genouest.org/ | Emily et al. (2013) |

Prediction of Protein Aggregation and Amyloid Formation 217

218 R. Grana-Montes et al.

Aside from differences in the properties they evaluate (which define the class to which they belong) and in their mathematical implementation, the predictors may also vary in the type of output they provide. Commonly, this output comprises the identification of APRs along the polypeptide sequence together with their aggregation potential, as well as a relative or virtually absolute aggregation propensity for the whole protein (Fig. 7.2a). Some approaches provide additional valuable estimates, such as the type of pairing between β-strands (either parallel or antiparallel) in the β-sheets that could form the cross-β core of an amyloid-like fibril, or even attempt to forecast the quaternary assembly of the polypeptide chain within the amyloid-like structure.
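The most common type of output, a per-residue profile from which APRs and a whole-protein score are derived, can be sketched for a generic scale-based predictor. The sketch assumes a per-residue propensity scale averaged over a sliding window, calls as APRs the runs of at least `min_len` residues whose averaged value exceeds a threshold, and reports the length-normalized area above the threshold as a global score. The toy scale (1.0 for V, I, L, F, M, W, Y and 0.0 otherwise), the window of 5, the threshold of 0.5, the minimum length, and all function names are hypothetical; they do not reproduce AGGRESCAN, SALSA, or any other published predictor, each of which uses its own scale, window, and normalization.

```python
from typing import Dict, List, Optional, Tuple

# Toy scale for illustration only (NOT a published aggregation propensity scale).
TOY_SCALE: Dict[str, float] = {aa: (1.0 if aa in "VILFMWY" else 0.0)
                               for aa in "ACDEFGHIKLMNPQRSTVWY"}


def window_profile(seq: str, scale: Dict[str, float], window: int = 5) -> List[float]:
    """Mean scale value over a window centred on each residue (truncated at the termini)."""
    values = [scale[aa] for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def find_aprs(profile: List[float], threshold: float,
              min_len: int = 5) -> List[Tuple[int, int]]:
    """Runs of at least min_len residues above threshold, as (start, end) with end exclusive."""
    aprs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(profile + [float("-inf")]):  # sentinel closes a trailing run
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if i - start >= min_len:
                aprs.append((start, i))
            start = None
    return aprs


def global_score(profile: List[float], threshold: float) -> float:
    """Area of the profile above the threshold, divided by the sequence length."""
    if not profile:
        return 0.0
    return sum(max(0.0, v - threshold) for v in profile) / len(profile)


seq = "KKDGSVIVIFVGSKDE"  # hypothetical sequence with one hydrophobic stretch
profile = window_profile(seq, TOY_SCALE)
aprs = find_aprs(profile, threshold=0.5)
print(aprs, [seq[s:e] for s, e in aprs])  # → [(5, 11)] ['VIVIFV']
print(round(global_score(profile, threshold=0.5), 4))
```

Changing the window size or the threshold shifts both the APR boundaries and the global score, so outputs from predictors built on different parameter choices are not directly comparable.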

In this section, we provide a brief description of the prediction methods that have been most widely used by the scientific community working on protein aggregation (Fig. 7.2b), and of all those that have been employed to build consensus predictors.
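Consensus predictors that combine scores instead of binary calls, such as MetAmyl in Table 7.2, rely on a weighted linear combination of the outputs of other predictors. A minimal sketch of that combination step follows; it assumes that each predictor's per-residue scores are first min-max rescaled to [0, 1] so that they become comparable. The predictor names, weights, intercept, and 0.5 cut-off are hypothetical and are not MetAmyl's statistically derived parameters.

```python
from typing import Dict, List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale one predictor's per-residue scores to [0, 1] (all zeros if constant)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def linear_consensus(outputs: Dict[str, Sequence[float]],
                     weights: Dict[str, float],
                     intercept: float = 0.0) -> List[float]:
    """Per-residue score: intercept + sum over predictors of weight * rescaled output."""
    rescaled = {name: min_max(scores) for name, scores in outputs.items()}
    lengths = {len(r) for r in rescaled.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must cover the same sequence length")
    length = lengths.pop()
    return [intercept + sum(weights[name] * r[j] for name, r in rescaled.items())
            for j in range(length)]


# Two hypothetical predictors scoring a five-residue chain on different scales.
outputs = {"predictor_a": [0.0, 1.0, 2.0, 3.0, 4.0],
           "predictor_b": [10.0, 10.0, 30.0, 30.0, 20.0]}
weights = {"predictor_a": 0.6, "predictor_b": 0.4}  # hypothetical weights
combined = linear_consensus(outputs, weights)
print([score > 0.5 for score in combined])  # → [False, False, True, True, True]
```

In a real consensus method the weights would be fitted on a labelled benchmark, which is the step MetAmyl performs statistically; the sketch only shows how such fitted weights would be applied to new predictions.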
