The Open Biology Journal, 2009, 2, 176-184
Open Access
Multiple Evolutionary Mechanisms Reduce Protein Aggregation
Joke Reumers1,2, Frederic Rousseau*,1,2 and Joost Schymkowitz*,1,2
1 VIB Switch Laboratory, Brussels, Belgium
2 Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
Abstract: The folding of polypeptides into stable globular protein structures requires protein sequences with a relatively
high hydrophobicity and secondary structure propensity. These biophysical properties, however, also favor protein
aggregation via the formation of intermolecular beta-sheets and, as a result, globular structure and aggregation are
inextricable properties of protein polypeptides. Aggregates that are enriched in beta-sheet structures have been found in
diseased tissues in association with at least twenty different human disorders, and the effects of aggregation on protein
function include simple loss of function but often also a gain of toxicity. Given both the ubiquity and the potentially lethal
consequences of protein aggregation, negative selective pressure strongly minimizes aggregation. Various evolutionary
strategies keep aggregation in check, including (1) the optimisation of the thermodynamic stability of the protein, which
precludes aggregation by burial of the aggregation-prone regions in solvent-inaccessible regions of the structure, (2)
segregation between folding nuclei and aggregation nuclei within a protein sequence, (3) the placement of so-called
gatekeeper residues at the flanks of aggregating segments, which reduce the aggregation rate of (partially) unfolded proteins,
and (4) molecular chaperones that target aggregation-nucleating sequences directly, thereby further suppressing
aggregation in a cellular environment. In this review we describe the intrinsic features built into protein sequence and
structure that protect against aggregation.
Keywords: Amyloid, protein aggregation, protein aggregation and evolution, protein evolution, protein folding and
aggregation, evolutionary pressure.
INTRODUCTION
Misfolding and the associated aggregation of proteins
have been the object of intensive study in the last decade, as
they appear to be the molecular basis of neurodegenerative
disorders such as Alzheimer's and Parkinson's disease, and
other diseases such as type 2 diabetes [1]. To date, circa 40
disorders have been linked to protein aggregation [2]. Aggregation is unavoidable in globular proteins, because nonnative conformations can be adopted during or immediately
after synthesis, under stress conditions or as a consequence
of mutations or proteolysis. Although it seems that almost all
proteins are able to form aggregates when expressed at high
concentrations in vitro, they differ substantially in their
intrinsic propensity to do so under physiological conditions
[3]. Most importantly, aggregation is nucleated by short
sequence segments with specific physical properties and the
amino acid residues involved in aggregation are usually
segregated in the primary structure from the residues that are
critical for proper folding [4]. The major contributors to
aggregation propensity have been identified as hydrophobicity, net charge and propensity to form secondary structure, i.e. a predisposition for beta-sheet formation and an
aversion for alpha-helical structures [4]. The identification of
these determinants of aggregation facilitated the development of prediction algorithms that assess the effect of mutations on aggregation, identify the regions in the protein sequence that promote aggregation, and quantify the aggregation rates of unfolded proteins [5-9]. These computational
methods have enabled large-scale analyses of the
aggregation behavior of full proteomes [10-12], which have
confirmed the ubiquity of aggregation propensity in proteomes of all kingdoms of life. Protein aggregation represents an enormous burden for cellular organisms: stress is imposed on the cell not only
by the loss of function of the individual aggregating proteins,
but also by the energy consumed by
the ATP-dependent protection mechanisms of the protein
quality control machinery. Hence, proteomes are subject to
strong evolutionary pressure to minimize aggregation [13].
The different mechanisms hindering protein aggregation in the cell are illustrated in Fig. (1) with the example
of alpha-1-antitrypsin (A1AT). Deficiency of antitrypsin
has been associated with aggregation of this protein and
results in liver dysfunction [14]. The protein has two
predicted aggregation-prone regions, which are buried in the
correctly folded form of the protein (i). These two regions
are flanked by so-called gatekeeper residues, which in the
unfolded state of the protein prevent self-association
through charge repulsion or steric hindrance (ii). In addition
to these two mechanisms embedded in the protein’s
sequence and structure, the cell has developed a highly
advanced protein quality control system (iii) [15]. A large
variety of chaperones in the cell hinder the formation of
aggregates, not only by shielding the aggregation-nucleating
regions in the nascent chain, but also by sequestering
unfolded proteins from other identical proteins, and by
untangling partial aggregates [16]. In this review we will
focus on the intrinsic protein characteristics that counteract
aggregation.
*Address correspondence to this author at the Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; Tel: +321 6346227; Fax: 321 6347181; E-mail: [email protected] or [email protected]
1874-1967/09
2009 Bentham Open
Fig. (1). The different strategies used to oppose the formation of protein aggregates. The structure and sequence shown are those of alpha-1-antitrypsin (AAT), the deficiency of which, caused by aggregation, is associated with liver disease [14]. 1) Folding buries the sticky regions
in the core of the protein. 2) Well-placed gatekeeper residues prevent self-association by charge or steric repulsion and therefore inhibit the
aggregation process. 3) The protein quality control system has evolved to oppose and reverse aggregation. For example, members of the
Hsp70 family recognize the positively charged residues at the flanks of aggregation-nucleating regions. In the case of secreted AAT, quality
control is performed in the ER by the Hsp70 family member BiP [14].
AGGREGATION-PRONE SEQUENCES ARE BURIED
INSIDE PROTEIN STRUCTURES
Protein folding and aggregation are competing conformational reactions. As a result, the first defense mechanism
against aggregation is the stability of the native protein conformation itself: in a folded protein the backbone is locked in
the tertiary structure of the protein and therefore not accessible to form the inter-chain hydrogen bonds that are a determining factor in cross-beta aggregated structures [17].
Cooperativity in folding is related to resistance against
aggregation [18], and studies of the folding of a computationally designed protein suggest that the smooth folding
pathways of small polypeptides are the result of negative
selection against aggregation, and not a general property of
proteins that fold into a unique stable structure [19].
Although it has been discovered that globular native structures can also form aggregates through intermolecular strand interactions at edges of individual β-sheets [20] or
three-dimensional domain swapping [21-23], it is still
universally accepted that unfolded or partially unfolded
proteins generally have a higher propensity to aggregate than
the fully native states [24].
In a large-scale study using experimentally determined
stability measurements of 2351 mutations in globular proteins, Serrano and colleagues showed that stability is the
main evolutionary pressure in the absence of other factors
such as binding and catalysis [25]. Their analysis revealed
that misfolding is avoided primarily by selection for
stability, and also that avoiding misfolding-prone sequences
compromises stability, emphasizing the inextricable tie
between protein structure and aggregation. However, the
maintenance of an aggregating segment within a sequence
does not have negative consequences if the aggregation load
is not too high. There exists a “permissive” window for
aggregation: highly aggregating sequences are prevented but
moderately aggregating ones are tolerated [26]. This is
confirmed in proteome-wide studies of aggregation
propensities where the majority of the proteins have low
predicted aggregation scores, and only a small portion have
a very high tendency to aggregate [10-12].
The relation between the tolerance for aggregation-prone
regions and the burial of these regions in folded proteins is
underlined by the differences in aggregation propensities between globular and intrinsically disordered proteins (IDPs).
Proteome-wide studies of aggregation propensities showed
that globular proteins from the all-alpha, all-beta and mixed
alpha/beta SCOP classes have similar levels of aggregation propensity, while natively unstructured proteins show
much lower average aggregation loads [10, 27]. In a Monte
Carlo simulation of small hydrophobic peptides with and
without disordered flanks, Abeln & Frenkel showed that
disordered flanks next to aggregating regions even prevent
aggregation [28]. Small hydrophobic peptides without disordered flanks aggregated, while the peptides with unstructured flanks were stable as monomers or small micelle-like
clusters. The disordered flanks have no effect on the native
function of the motif, i.e. binding energy is not affected.
Fig. (2). Interplay between stability and net charge determines the age of onset in familial ALS. The average survival time after
diagnosis is plotted as a function of the protein stability changes (ΔΔG) in SOD1. ΔΔG for ALS-associated mutations that do not alter the net
charge of SOD1 shows a high correlation with survival time (R = 0.91). Increasing the net charge of the protein causes a shift toward longer
survival time, whereas decreasing the charge has the opposite effect. Reprinted with permission from [32].
POINT MUTATIONS CAN MODULATE PROTEIN
STABILITY AND AGGREGATION TENDENCY
The amyloidogenicity of a protein can be reduced by
stabilization of the native structure (reviewed in [29]); conversely, many mutations associated with increased aggregation have been shown to destabilize the native structure.
This has been shown experimentally for several disease-related
proteins, such as transthyretin in amyloidosis [30]
and Cu/Zn-superoxide dismutase (SOD1) in amyotrophic
lateral sclerosis (ALS) [31]. In the latter case, Oliveberg and
co-workers could link experimentally determined stability
differences of apo-SOD1 to survival time of ALS patients
[32]. The stability change for ALS-associated mutations that
do not alter the net charge of SOD1 shows a high correlation
with survival time (Fig. 2). An additional large scale study
showed that the combination of increased aggregation propensity and decreased protein stability can account for 69%
of the variability in familial ALS patient survival times [31].
The link between increased destabilization and more aggressive disease development can also be found among transthyretin mutations in amyloidosis [30], where the rate of
tetramer dissociation needed for amyloid formation influences both disease penetrance and age of onset.
Chiti and co-workers examined the interplay between
decreased stability and increased aggregation in vivo: the
solubility in the cell of mutants of the N-terminal domain of the
Escherichia coli HypF protein (HypF-N) was compared with
the effect of the mutations on the stability of the protein. HypF-N has been
shown to convert to amyloid fibrils in vitro that are
morphologically similar to those found in amyloid disease
[33]. HypF-N variants carrying destabilizing mutations
aggregate after expression, whereas mutants with stability
similar to the wild type protein remain soluble in the E. coli
cytosol [34]. Although these studies show that destabilisation
is the major factor contributing to misfolding, it is certainly
not the only factor. Destabilisation does not always imply
misfolding and vice versa, as demonstrated by mutations that
affect aggregation independent of stability [35, 36].
GATEKEEPER RESIDUES DISRUPT STRETCHES
OF HYDROPHOBIC RESIDUES TO MINIMIZE
AGGREGATION PROPENSITY
The term structural gatekeeper was first introduced in
the context of the two-state folding pathway of protein S6
[37], to describe residues that steer the folding process by blocking
certain paths. It was later used in the context of
Aβ amyloid peptide aggregation by the same researchers for
“charged side chains that prevent aggregation by interrupting contiguous stretches of hydrophobic residues in the
primary sequence” [38]. A computational analysis of the
aggregation properties of 26 proteomes by Rousseau and co-workers [12] with the TANGO algorithm [6] revealed a
strong enrichment of charged residues (arginine, lysine,
aspartate and glutamate) and proline at the flanks of aggregation-prone regions. Their study showed that 90% of aggregation-prone regions are capped with at least one gatekeeper
residue, with a bias for positively charged residues at regions
with the highest aggregation propensities. A similar result
was obtained by Chiti and co-workers in the analysis of the
human proteome [10] with a different computational method
[7]. In accordance with the aforementioned study by
Rousseau et al., they found that Arg, Lys and Pro had higher
frequencies at the flanks of regions with high aggregation
propensity. In a follow-up study of the human proteome,
Rousseau et al. investigated the composition of the three
amino acid positions before and after aggregation-prone
regions [11]. Due to the long-range effects of electrostatic
interactions, the boundaries of aggregation-nucleating zones
may not be strictly defined. The elevated usage of the five
previously identified gatekeepers (P, R, K, D, E) on the
direct flanks of aggregation-prone regions (Fig. 3A) was
confirmed in the three C-terminal and three N-terminal
flanking positions (Fig. 3B). The enrichment was most prominent for the charged residues and less pronounced for
proline. Another feature of gatekeeper motifs that was highlighted in this study is the use of multiple gatekeepers: nearly
75% of all aggregation-nucleating regions in the human proteome use two or more gatekeepers. The type of gatekeeper
used varies between single and multiple gatekeeper motifs:
when a single gatekeeper residue is used, proline is used
most often, but its usage decreases with the introduction of
more gatekeepers. Using multiple gatekeepers may be a
protection mechanism against mutation: redundancy in the
gatekeeper motif reduces the risk of initiating aggregation by
a single point mutation.
Fig. (3). Disruption of aggregation motifs by polar residues and structure breakers. A+B. Enrichment of gatekeeper residues at the
flanks of aggregating regions. The ratio of amino acid frequency in the flanks versus the frequency of amino acids in the full data set is
shown for each gatekeeper type, considering one position (A) or three positions (B) before and after each aggregation-nucleating region. All
gatekeepers (P, R, K, D, E) are enriched in the flanks (ratio >1). The pattern is very distinct at the first position before and after the regions,
but the broader flanks also show this enrichment. When taking into account three positions (B) we also see an enrichment of histidine (H)
and asparagine (N), and to a lesser extent glycine (G) and glutamine (Q). Adapted from [11]. C+D. Opposition of aggregation by conserved
structure breakers. C. Aggregation of fibronectin type III domains is limited by conserved proline residues. Adapted from [40]. D. Conserved
glycines in human muscle acylphosphatase slow down the formation of aggregates. Adapted from [41].
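The flank enrichment described here is a simple frequency ratio: the frequency of a residue in the positions flanking aggregation-prone regions, divided by its frequency in the full data set. The following is a minimal sketch of that calculation, not the implementation used in [11]. The function name `flank_enrichment` and the input format (regions given as half-open `(start, end)` index pairs, as a predictor such as TANGO might be post-processed to yield) are assumptions made for illustration.

```python
from collections import Counter


def flank_enrichment(sequences, regions, flank=1):
    """Ratio of residue frequency in flanks to frequency in the full data set.

    sequences: dict mapping a protein name to its one-letter sequence
    regions:   dict mapping a protein name to a list of (start, end) half-open
               intervals marking aggregation-prone regions (hypothetical
               predictor output, 0-based indices)
    flank:     number of positions taken before and after each region
    """
    flank_counts, all_counts = Counter(), Counter()
    for name, seq in sequences.items():
        all_counts.update(seq)
        for start, end in regions.get(name, []):
            # Residues immediately before and after the region.
            flank_counts.update(seq[max(0, start - flank):start])
            flank_counts.update(seq[end:end + flank])
    n_flank = sum(flank_counts.values())
    n_all = sum(all_counts.values())
    if n_flank == 0:
        return {}
    return {
        aa: (flank_counts[aa] / n_flank) / (all_counts[aa] / n_all)
        for aa in all_counts
    }
```

A ratio above 1 for a residue means it occurs more often in the flanks than in the data set as a whole. Calling the function with `flank=1` and `flank=3` corresponds to the one-position and three-position analyses, respectively.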
GLOBAL NET CHARGE AND STERIC HINDRANCE
PROTECT AGAINST AGGREGATION
In addition to safeguarding the flanks of aggregation
nuclei, charged residues and structure breakers such as
glycine and proline provide protection on the overall protein
sequence. For instance, the use of multiple structure breakers
to oppose aggregation was also found in Huntingtin,
whose aggregation is associated with Huntington’s disease,
and in which the polyglutamine stretch is flanked by a proline-rich
region that keeps aggregation in check [39]. Other studies
show examples of prolines [40] and glycines [41] that were
evolutionarily conserved to modulate aggregation. In the
investigation of three highly conserved prolines in fibronectin type III domains (Fig. 3C), no obvious structural or
functional role could be assigned to these conserved
residues. The stability of alanine mutants of these three
prolines in the 10th domain of human fibronectin was
similar to that of the wild-type domain, but the aggregation
rate of the mutant proteins was significantly higher than that
of the original domain [40]. Analogous results were obtained
in the study of conserved glycine residues in human muscle
acylphosphatase (AcP, Fig. 3D) [41]: mutating these
glycines to alanine does not affect stability more than
mutating non-conserved positions, but it does accelerate
amyloid formation of AcP. Furthermore, an earlier extensive
mutation study of the same enzyme already demonstrated
that the aggregation behaviour of AcP could be modified by
the mutation of single amino acids. More specifically, the
authors showed an inverse correlation between the net
charge of the protein and its aggregation rate [42]. This anticorrelation between charge and aggregation is also illustrated
in amyotrophic lateral sclerosis, where the majority of
disease-related SOD1 mutations reduce the net charge of the
protein [43]. Oliveberg et al. performed a computational
analysis of 100 ALS-associated mutations in SOD1 and
showed that in comparison with other well-described disease
related genes and mutations, the charge bias of SOD1 is
significantly higher. In a similar comparison of SOD1 with
all disease-related proteins in the SwissProt database with
more than 50 causal mutations, the average charge difference
of SOD1 mutations was ranked second. The generality of
these protection mechanisms has been shown in mutational
studies of several proteins, where the introduction of charged
residues, proline and glycine resulted in reduced aggregation
kinetics or compromised stability of the formed aggregates
[34, 42, 44, 45]. The protection against aggregation provided
by these charged residues and structure breakers is also
employed in intrinsically disordered proteins, where a higher
proline content [46] and higher net charge [27, 47] contribute to lower aggregation.
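The net charge discussed in this section can be approximated directly from sequence. The sketch below is a deliberately crude estimate, assuming neutral pH and counting only lysine and arginine as positive and aspartate and glutamate as negative. Histidine, the chain termini and any bound metal ions are ignored, and the function names are hypothetical.

```python
def net_charge(sequence):
    """Approximate net charge at neutral pH: (K + R) - (D + E).

    Simplifying assumption: histidine and the chain termini are ignored.
    """
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative


def charge_change(sequence, position, new_residue):
    """Change in approximate net charge caused by a point mutation.

    position is a 0-based index into sequence.
    """
    mutant = sequence[:position] + new_residue + sequence[position + 1:]
    return net_charge(mutant) - net_charge(sequence)
```

The returned value is signed; `abs(net_charge(seq))` gives the magnitude of the charge, which is the quantity relevant to repulsion between identical chains.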
In addition to its involvement in the pathology of
misfolding disorders, protein aggregation also poses a problem in vitro, in biotechnology and biomedical research.
Building further on the aforementioned observations of
opposing aggregation with charges, Liu and colleagues set
out to design supercharged versions of naturally occurring
proteins [48]. By replacing solvent-exposed residues of a
monomeric (green fluorescent protein, GFP), a dimeric (glutathione-S-transferase, GST) and a tetrameric protein (streptavidin, SAV) with charged amino acids, they demonstrated
that by supercharging proteins it is feasible to obtain
correctly folded variants of the natural protein. The design of
GST and SAV mutants was performed with an automated
mutagenesis strategy: residues were ranked by increasing
solvent exposure calculated from the crystallographic structure, and then the highest ranked residues were replaced by
lysine for positive supercharging, or by glutamate or
aspartate for negative supercharging. These supercharged
variants displayed their native functionality in vitro but also
remained soluble in conditions that normally cause the
proteins to aggregate. This approach might solve some of the
unwanted behavior of de novo designed proteins, and may
contribute to the adaptation of natural proteins to thrive in
non-natural conditions, such as increased temperature or the
presence of denaturing chemical additives [49]. These combined results suggest a strong evolutionary pressure on the
flanks of aggregation-prone regions, and confirm the use of
structural gatekeepers as a universal mechanism against
aggregation.
CHAPERONE BINDING IS MODULATED BY GATEKEEPER RESIDUES
However different in mechanism, most chaperones display a remarkable resemblance in substrate specificity and
prefer binding to hydrophobic stretches flanked by positive
charges. This has been shown by affinity studies for Hsp70
[50, 51], Hsp90 [52], and many other chaperones [53-55].
Although Hsp60 substrate specificity studies on GroEL have
not revealed such clear charge preferences, it is suggested
that proteins with negative charges fold rapidly by repulsion
forces in the negatively charged cage [56]. Application of the
substrate specificity preferences for DnaK and trigger factor
developed by Bukau and co-workers [50, 54] to the aggregating segments of the Escherichia coli proteome showed
that together these two chaperones target almost 100% of the
strongly aggregating sequences in E. coli [12]. This suggests
that chaperones recognize aggregation-prone regions by the
double criterion of having a hydrophobic stretch flanked by
(mostly positive) charges. The high prevalence of these
motifs in proteomes in the various kingdoms of life suggests
that the evolutionary pressure on proteomes to counteract
aggregation with charges and structure breakers also shaped
the specificity of chaperones to recognize these patterns.
These findings are in accordance with the observation that
intrinsically disordered proteins, which have significantly
lower aggregation propensities than globular proteins [27],
also bind less to chaperones [57].
GATEKEEPER MUTATIONS CONTRIBUTE TO HUMAN DISEASE
Although gatekeeper residues appear to be very effective
in guarding stretches of aggregation-prone residues, they
also imply a risk for disease development. As seen in several
examples, such as mutations of tau [58], the Alzheimer beta-peptide [59] and α-synuclein [60], mutating a single amino
acid can substantially change aggregation propensity and can
have dramatic effects on disease etiology. The TANGO
algorithm was used to study the differences in aggregation caused
by known human disease mutations and neutral single nucleotide polymorphisms (SNPs) from the UniProt database
[11]. The two main observations of this analysis were that i)
the distribution of differences in the TANGO aggregation
scores for disease mutations showed more extreme differences and a smaller fraction of neutral changes than the
distribution for the neutral SNPs; and ii) the fraction of
disease mutations that cause a significant increase of protein
aggregation due to the disruption of a gatekeeper motif was
almost twice as large as the fraction of these mutations found
among SNPs (3.5% of the disease mutations versus 1.9% of
the SNPs). These findings suggest that indeed gatekeeper
residues are crucial for correct protein function and that
disruption of the gatekeeper pattern introduces a risk of
disease.
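As an illustration of what "disruption of a gatekeeper motif" means at the sequence level, the sketch below flags point mutations that replace a gatekeeper residue in the flank of an aggregation-prone region with a non-gatekeeper residue. It is only a positional check under stated assumptions: the analysis in [11] additionally required a significant increase in the TANGO score, which is not modelled here. The function name, the region format (half-open, 0-based intervals) and the one-residue default flank are hypothetical.

```python
GATEKEEPERS = set("PRKDE")  # the five gatekeeper residues named in the text


def disrupts_gatekeeper(sequence, regions, position, new_residue, flank=1):
    """Return True if a point mutation removes a flanking gatekeeper.

    regions:  list of (start, end) half-open intervals marking
              aggregation-prone regions (hypothetical predictor output)
    position: 0-based index of the mutated residue
    """
    old_residue = sequence[position]
    # Only gatekeeper -> non-gatekeeper substitutions count as disruption.
    if old_residue not in GATEKEEPERS or new_residue in GATEKEEPERS:
        return False
    for start, end in regions:
        in_left_flank = start - flank <= position < start
        in_right_flank = end <= position < end + flank
        if in_left_flank or in_right_flank:
            return True
    return False
```

Applying such a check to a set of disease mutations and a set of neutral SNPs gives the two fractions that are compared in the text, where 3.5% versus 1.9% corresponds to a ratio of roughly 1.8.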
NEGATIVE SELECTION AGAINST UNWANTED
SELF-ASSOCIATION IN CORRECTLY FOLDED
PROTEINS
There are some examples where proteins with native
structure, more specifically β-sheet-rich proteins, can form
aggregates through intermolecular interactions of peripheral
β-strands [20]. These external β-strands pose a risk of
self-interaction and thus β-aggregation, but the
placement of β-bulges, superposition of short loops, helices
or distorted β-strands on the peripheral strand, and other
ways to distort the β-structure are used to avoid
aggregation of peripheral strands with those of other
molecules [20]. Two types of (functional) self-interactions of
identical or homologous sequences that can be found in
nature are homo-oligomeric complex formation and domain
repeats in multidomain proteins. Especially in the former
case, formation of functional homo-oligomers and nonfunctional aggregates are competing processes, as has been
shown in studies of the c-Src SH3 domain [61]. Using
protein-protein interaction data of fly, yeast and worm, Chen
& Dokholyan showed that proteins with native self-interaction patterns (such as homo-oligomeric complex formation)
have overall lower aggregation scores than proteins without
these patterns [62], suggesting negative selection against aggregation in these proteins. Dobson and co-workers investigated
the multidomain constructs of immunoglobulin domains in
human cardiac titin and the ability of these homologous
domains to co-aggregate [63]. Their conclusion was that the
efficiency of co-aggregation decreases with decreasing
sequence identity, with a lower bound at 30-40% sequence
identity. Further computational analysis of homologous
domains in large multidomain proteins (i.e. the immunoglobulin and fibronectin type III superfamilies) showed that
the sequence identity between repeats remains largely below
this threshold. Comparison of the sequence identity between
adjacent and non-adjacent domain pairs also revealed that
there is a higher evolutionary pressure on adjacent domains:
sequence identity between adjacent pairs is significantly
lower.
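The comparison of adjacent domains rests on pairwise sequence identity. The sketch below shows one common way to compute it from sequences that have already been aligned to equal length. Identity definitions differ in how they treat gaps, so the choice made here (skipping gapped columns) is an assumption, as are the function names.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def adjacent_identities(domains):
    """Identity for each pair of neighbouring domains in an aligned list."""
    return [
        percent_identity(domains[i], domains[i + 1])
        for i in range(len(domains) - 1)
    ]
```

The values returned by `adjacent_identities` can then be compared with the 30-40% range below which co-aggregation was reported to become inefficient.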
DETAILED FEATURE ANALYSIS OF AGGREGATING PROTEINS REVEALS ADDITIONAL SELECTION AGAINST AGGREGATION
Besides the apparent evolutionary pressure related to
folding, gatekeeper patterns, chaperone binding and native
self-interaction, various studies have revealed additional
evidence for selection against aggregation-prone segments.
Simple patterns that favor aggregation, such as the alternation of polar and non-polar stretches, are rare in natural
proteins [64]. This is a first indication that not only the amino
acid composition itself is a determinant of aggregation
propensity, but also the order of the residues. Further proof
was provided in a detailed study using horse heart apomyoglobin (apoMb) [26]. The core of the amyloid fibrils formed
by apoMb is the region spanning residues 7 to 18.
Independently of the full-length protein, the N-terminal
region of apoMb (residues 1-29) is soluble at neutral pH but
self-assembles into fibrils at pH 2. Keeping the same amino
acid composition and length, four scrambled versions of the
N-terminus were designed and their aggregation properties
were investigated. The naturally occurring sequence is at the
lower boundary of aggregation. Comparing the aggregation
profile of the scrambled sequences with that of 745 peptides
from the globin family homologous to the apoMb N-terminus showed that the former had significantly higher
aggregation tendencies than their natural counterparts,
confirming that the prevention of aggregation has been a
driving force in protein evolution. Another piece of evidence
that corroborates this evolutionary pressure was provided by
investigating the aggregation propensities of essential versus
non-essential proteins in Saccharomyces cerevisiae and
Caenorhabditis elegans [62]. Essential genes were defined
as those genes whose knockdown led to lethality. Both
in yeast and worm it was shown that essential proteins have
lower aggregation propensity than non-essential ones, which
is consistent with a higher evolutionary pressure on essential
proteins.
Fig. (4). Ranking of the aggregation propensity in different subcellular regions in human and yeast. The average aggregation
propensities of proteins in different subcellular locations in Homo sapiens and Saccharomyces cerevisiae are very similar: the aggregation
propensity of intracellular districts such as the nucleus and ribosome is much lower than that of secreted proteins or those located in the
endoplasmic reticulum. Adapted from [10] and [67].
EXPRESSION LEVELS AND SUBCELLULAR LOCALIZATION ARE OTHER DETERMINANTS IN PROTECTION AGAINST AGGREGATION
As almost all proteins can be driven to aggregate when
overexpressed in vitro, the high divergence in the expression
levels of proteins in the cell is another determinant in the risk
for aggregation [17]. Vendruscolo et al. examined the in vivo
expression levels, as measured by DNA microarray technology, of 12 human proteins for which experimentally determined aggregation rates were available. The expression
levels of these human genes were anti-correlated with the
aggregation rates of the corresponding proteins in vitro [65].
This suggests that polypeptide chains have co-evolved with
their cellular environments to be soluble as far as is needed
to effectively perform their functional role. These results are
in accordance with previous observations that even small
perturbations of expression levels can have dramatic pathological consequences in misfolding diseases [66]. In eukaryotes, not only the individual expression levels of proteins
but also the overall biochemical properties in different cellular compartments can vary greatly. Studies on the differences
in aggregation propensity between proteins of different subcellular locations in the yeast [67] and human proteome [10]
agree on the observation that the aggregation propensity of
secreted and ER proteins is on average higher than that of
proteins in intracellular districts such as the nucleus and ribosome (Fig.
4). This evolutionary pressure against aggregation in cellular
organelles is expected because on the one hand overall
protein concentrations are high in these compartments, and
on the other hand it has been shown that these compartments
contain a large portion of unfolded molecules [68].
CONCLUSION
Since the biophysical properties underlying the correct folding of globular proteins and the formation of protein aggregates are alike, the two processes are inescapably linked. The combination of the generality of protein aggregation propensity of globular proteins and the putative detrimental effects of protein aggregation on the cell has resulted in negative selective pressure to minimize aggregation. In this review we have described the intrinsic features of protein sequence and structure that keep aggregation in check. The main contributor to the avoidance of aggregation is correct folding: aggregation nuclei are buried within the hydrophobic core of globular proteins. However, during the lifetime of a protein (partial) unfolding cannot always be avoided. Charge repulsion and steric hindrance are used to disrupt the formation of intermolecular β-sheets by placing so-called structural gatekeepers (aspartate, glutamate, lysine, proline) at the flanks of aggregation nuclei. The use of these charged residues and structure breakers to minimize aggregation is not only observed at the flanks of aggregation nuclei, but also global net charge and conservation of well-placed prolines and glycines can limit the aggregation propensity of a protein. The charged-hydrophobic-charged pattern that characterizes the regions with high aggregation propensity flanked by gatekeepers is recognized by molecular chaperones, and optimizes chaperone binding to potentially dangerous motifs. Negative selection of aggregation-prone regions in multimeric proteins and within protein families further illustrates the evolutionary pressure against unwanted self-association. Selective pressure within the cell can vary between the different cellular compartments, related to variability in concentration, partial unfolding, and presence of chaperones in these compartments. Modulation of the protection level against aggregation in these varying situations can be achieved by combining different types of protection. This type of redundancy in protection can also serve as a fail-safe if mutation disrupts one of the protection mechanisms.
ACKNOWLEDGEMENTS
Joke Reumers was supported by the Institute for the Encouragement of Scientific Research and Innovation of Brussels (ISRIB), Belgium.
The VIB Switch Laboratory was supported by a grant from the Federal Office for Scientific Affairs, Belgium (IUAP P6/43) and the Fund for Scientific Research (FWO Vlaanderen).
REFERENCES
[1] Stefani M. Protein misfolding and aggregation: new examples in medicine and biology of the dark side of the protein world. Biochim Biophys Acta 2004; 1739(1): 5-25.
[2] Chiti F, Dobson CM. Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem 2006; 75: 333-66.
[3] Dobson CM. Principles of protein folding, misfolding and aggregation. Semin Cell Dev Biol 2004; 15(1): 3-16.
[4] Chiti F, Stefani M, Taddei N, Ramponi G, Dobson CM. Rationalization of the effects of mutations on peptide and protein aggregation rates. Nature 2003; 424(6950): 805-8.
[5] Conchillo-Sole O, de Groot NS, Aviles FX, et al. AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides. BMC Bioinformatics 2007; 8: 65.
[6] Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol 2004; 22(10): 1302-6.
[7] Pawar AP, Dubay KF, Zurdo J, et al. Prediction of "aggregation-prone" and "aggregation-susceptible" regions in proteins associated with neurodegenerative diseases. J Mol Biol 2005; 350(2): 379-92.
[8] Tartaglia GG, Cavalli A, Pellarin R, Caflisch A. Prediction of aggregation rate and aggregation-prone segments in polypeptide sequences. Protein Sci 2005; 14(10): 2723-34.
[9] Trovato A, Seno F, Tosatto SC. The PASTA server for protein aggregation prediction. Protein Eng Des Sel 2007; 20(10): 521-3.
[10] Monsellier E, Ramazzotti M, Taddei N, Chiti F. Aggregation propensity of the human proteome. PLoS Comput Biol 2008; 4(10): e1000199.
[11] Reumers J, Maurer-Stroh S, Schymkowitz J, Rousseau F. Protein sequences encode safeguards against aggregation. Hum Mutat 2009; 30(3): 431-7.
[12] Rousseau F, Serrano L, Schymkowitz JW. How evolutionary pressure against protein aggregation shaped chaperone specificity. J Mol Biol 2006; 355(5): 1037-47.
[13] Monsellier E, Chiti F. Prevention of amyloid-like aggregation as a driving force of protein evolution. EMBO Rep 2007; 8(8): 737-42.
[14] Knaupp AS, Bottomley SP. Serpin polymerization and its role in disease--the molecular basis of alpha1-antitrypsin deficiency. IUBMB Life 2009; 61(1): 1-5.
[15] Bukau B, Weissman J, Horwich A. Molecular chaperones and protein quality control. Cell 2006; 125(3): 443-51.
[16] Hartl FU, Hayer-Hartl M. Converging concepts of protein folding in vitro and in vivo. Nat Struct Mol Biol 2009; 16(6): 574-81.
[17] Dobson CM. Protein misfolding, evolution and disease. Trends Biochem Sci 1999; 24(9): 329-32.
[18] Clark LA. Protein aggregation determinants from a simplified model: cooperative folders resist aggregation. Protein Sci 2005; 14(3): 653-62.
[19] Watters AL, Deka P, Corrent C, et al. The highly cooperative folding of small naturally occurring proteins is likely the result of natural selection. Cell 2007; 128(3): 613-24.
[20] Richardson JS, Richardson DC. Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation. Proc Natl Acad Sci USA 2002; 99(5): 2754-9.
[21] Rousseau F, Schymkowitz JWH, Wilkinson HR, Itzhaki LS. Three-dimensional domain swapping in p13suc1 occurs in the unfolded state and is controlled by conserved proline residues. Proc Natl Acad Sci USA 2001; 98(10): 5596-601.
[22] Liu Y, Eisenberg D. 3D domain swapping: as domains continue to swap. Protein Sci 2002; 11(6): 1285-99.
[23] Liu Y, Gotte G, Libonati M, Eisenberg D. Structures of the two 3D domain-swapped RNase A trimers. Protein Sci 2002; 11(2): 371-80.
[24] Uversky VN, Fink AL. Conformational constraints for amyloid fibrillation: the importance of being unfolded. Biochim Biophys Acta 2004; 1698(2): 131-53.
[25] Sanchez IE, Tejero J, Gomez-Moreno C, Medina M, Serrano L. Point mutations in protein globular domains: contributions from function, stability and misfolding. J Mol Biol 2006; 363(2): 422-32.
[26] Monsellier E, Ramazzotti M, de Laureto PP, et al. The distribution of residues in a polypeptide sequence is a determinant of aggregation optimized by evolution. Biophys J 2007; 93(12): 4382-91.
[27] Linding R, Schymkowitz J, Rousseau F, Diella F, Serrano L. A comparative study of the relationship between protein structure and beta-aggregation in globular and intrinsically disordered proteins. J Mol Biol 2004; 342(1): 345-53.
[28] Abeln S, Frenkel D. Disordered flanks prevent peptide aggregation. PLoS Comput Biol 2008; 4(12): e1000241.
[29] Uversky VN, Fernandez A, Fink AL. Structural and conformational prerequisites for amyloidogenesis. In: Uversky VN, Fink AL, Eds. Protein misfolding, aggregation, and conformational diseases. Berlin: Springer Verlag 2006.
[30] Hammarstrom P, Jiang X, Hurshman AR, Powers ET, Kelly JW. Sequence-dependent denaturation energetics: a major determinant in amyloid disease diversity. Proc Natl Acad Sci USA 2002; 99: 16427-32.
[31] Wang Q, Johnson JL, Agar NY, Agar JN. Protein aggregation and protein instability govern familial amyotrophic lateral sclerosis patient survival. PLoS Biol 2008; 6(7): e170.
[32] Lindberg MJ, Bystrom R, Boknas N, Andersen PM, Oliveberg M. Systematically perturbed folding patterns of amyotrophic lateral sclerosis (ALS)-associated SOD1 mutants. Proc Natl Acad Sci USA 2005; 102(28): 9754-9.
[33] Chiti F, Bucciantini M, Capanni C, et al. Solution conditions can promote formation of either amyloid protofilaments or mature fibrils from the HypF N-terminal domain. Protein Sci 2001; 10(12): 2541-7.
[34] Calloni G, Zoffoli S, Stefani M, Dobson CM, Chiti F. Investigating the effects of mutations on protein aggregation in the cell. J Biol Chem 2005; 280(11): 10607-13.
[35] Ramirez-Alvarado M, Merkel JS, Regan L. A systematic exploration of the influence of the protein stability on amyloid fibril formation in vitro. Proc Natl Acad Sci USA 2000; 97(16): 8979-84.
[36] Chiti F, Taddei N, Bucciantini M, et al. Mutational analysis of the propensity for amyloid formation by a globular protein. EMBO J 2000; 19(7): 1441-9.
[37] Otzen DE, Oliveberg M. Salt-induced detour through compact regions of the protein folding landscape. Proc Natl Acad Sci USA 1999; 96(21): 11746-51.
[38] Otzen DE, Kristensen O, Oliveberg M. Designed protein tetramer zipped together with a hydrophobic Alzheimer homology: a structural clue to amyloid assembly. Proc Natl Acad Sci USA 2000; 97(18): 9907-12.
[39] Dehay B, Bertolotti A. Critical role of the proline-rich region in Huntingtin for aggregation and cytotoxicity in yeast. J Biol Chem 2006; 281(47): 35608-15.
[40] Steward A, Adhya S, Clarke J. Sequence conservation in Ig-like domains: the role of highly conserved proline residues in the fibronectin type III superfamily. J Mol Biol 2002; 318(4): 935-40.
[41] Parrini C, Taddei N, Ramazzotti M, et al. Glycine residues appear to be evolutionarily conserved for their ability to inhibit aggregation. Structure 2005; 13(8): 1143-51.
[42] Chiti F, Calamai M, Taddei N, et al. Studies of the aggregation of mutant proteins in vitro provide insights into the genetics of amyloid diseases. Proc Natl Acad Sci USA 2002; 99: 16419-26.
[43] Sandelin E, Nordlund A, Andersen PM, Marklund SS, Oliveberg M. Amyotrophic lateral sclerosis-associated copper/zinc superoxide dismutase mutations preferentially reduce the repulsive charge of the proteins. J Biol Chem 2007; 282(29): 21230-6.
[44] Calamai M, Tartaglia GG, Vendruscolo M, Chiti F, Dobson CM. Mutational analysis of the aggregation-prone and disaggregation-prone regions of acylphosphatase. J Mol Biol 2009; 387: 965-74.
[45] Fowler SB, Poon S, Muff R, et al. Rational design of aggregation-resistant bioactive peptides: reengineering human calcitonin. Proc Natl Acad Sci USA 2005; 102(29): 10105-10.
[46] Tompa P. Intrinsically unstructured proteins. Trends Biochem Sci 2002; 27(10): 527-33.
[47] Uversky VN. Natively unfolded proteins: a point where biology waits for physics. Protein Sci 2002; 11(4): 739-56.
[48] Lawrence MS, Phillips KJ, Liu DR. Supercharging proteins can impart unusual resilience. J Am Chem Soc 2007; 129(33): 10110-2.
[49] Vendruscolo M, Dobson CM. Chemical biology: more charges against aggregation. Nature 2007; 449(7162): 555.
[50] Rudiger S, Germeroth L, Schneider-Mergener J, Bukau B. Substrate specificity of the DnaK chaperone determined by screening cellulose-bound peptide libraries. EMBO J 1997; 16(7): 1501-7.
[51] Rudiger S, Mayer MP, Schneider-Mergener J, Bukau B. Modulation of substrate specificity of the DnaK chaperone by alteration of a hydrophobic arch. J Mol Biol 2000; 304(3): 245-51.
[52] Xu W, Yuan X, Xiang Z, et al. Surface charge and hydrophobicity determine ErbB2 binding to the Hsp90 chaperone complex. Nat Struct Mol Biol 2005; 12(2): 120-6.
[53] Knoblauch NT, Rudiger S, Schonfeld HJ, et al. Substrate specificity of the SecB chaperone. J Biol Chem 1999; 274(48): 34219-25.
[54] Patzelt H, Rudiger S, Brehmer D, et al. Binding specificity of Escherichia coli trigger factor. Proc Natl Acad Sci USA 2001; 98(25): 14244-9.
[55] Schlieker C, Weibezahn J, Patzelt H, et al. Substrate recognition by the AAA+ chaperone ClpB. Nat Struct Mol Biol 2004; 11(7): 607-15.
[56] Tang Y-C, Chang H-C, Roeben A, et al. Structural features of the GroEL-GroES nano-cage required for rapid folding of encapsulated protein. Cell 2006; 125(5): 903-14.
[57] Hegyi H, Tompa P. Intrinsically disordered proteins display no preference for chaperone binding in vivo. PLoS Comput Biol 2008; 4(3): e1000017.
[58] von Bergen M, Barghorn S, Li L, et al. Mutations of tau protein in frontotemporal dementia promote aggregation of paired helical filaments by enhancing local beta-structure. J Biol Chem 2001; 276(51): 48165-74.
[59] Hardy J. Testing times for the amyloid cascade hypothesis. Neurobiol Aging 2002; 23(6): 1073-4.
[60] Conway KA, Harper JD, Lansbury PT, Jr. Fibrils formed in vitro from alpha-synuclein and two mutant forms linked to Parkinson's disease are typical amyloid. Biochemistry 2000; 39(10): 2552-63.
[61] Ding F, Dokholyan NV, Buldyrev SV, Stanley HE, Shakhnovich EI. Molecular dynamics simulation of the SH3 domain aggregation suggests a generic amyloidogenesis mechanism. J Mol Biol 2002; 324(4): 851-7.
[62] Chen Y, Dokholyan NV. Natural selection against protein aggregation on self-interacting and essential proteins in yeast, fly, and worm. Mol Biol Evol 2008; 25(8): 1530-3.
[63] Wright CF, Teichmann SA, Clarke J, Dobson CM. The importance of sequence diversity in the aggregation and evolution of proteins. Nature 2005; 438(7069): 878-81.
[64] Broome BM, Hecht MH. Nature disfavors sequences of alternating polar and non-polar amino acids: implications for amyloidogenesis. J Mol Biol 2000; 296(4): 961-8.
[65] Tartaglia GG, Pechmann S, Dobson CM, Vendruscolo M. Life on the edge: a link between gene expression levels and aggregation rates of human proteins. Trends Biochem Sci 2007; 32(5): 204-6.
[66] Lansbury PT, Lashuel HA. A century-old debate on protein aggregation and neurodegeneration enters the clinic. Nature 2006; 443(7113): 774-9.
[67] Tartaglia GG, Caflisch A. Computational analysis of the S. cerevisiae proteome reveals the function and cellular localization of the least and most amyloidogenic proteins. Proteins 2007; 68(1): 273-8.
[68] Hageman J, Vos MJ, van Waarde MA, Kampinga HH. Comparison of intra-organellar chaperone capacity for dealing with stress-induced protein unfolding. J Biol Chem 2007; 282(47): 34334-45.
Received: April 21, 2009
Revised: July 07, 2009
Accepted: July 09, 2009
© Reumers et al.; Licensee Bentham Open.
This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.