Mol Biol Evol. Jan 2011; 28(1): 59–62.
Published online Oct 29, 2010. doi:  10.1093/molbev/msq291
PMCID: PMC3108607

Evolution of Structurally Disordered Proteins Promotes Neostructuralization

Abstract

Protein structure is generally more conserved than sequence, but does this hold true for regions that can adopt different structures in different environments? How structurally disordered regions evolve altered secondary structure element propensities, as well as altered conformational flexibility, among paralogs is a fundamental question for our understanding of protein structural evolution. We have investigated the evolutionary dynamics of structural disorder in protein families containing both orthologs and paralogs using phylogenetic tree reconstruction, protein structure disorder prediction, and secondary structure prediction in order to shed light on these questions. Our results indicate that the extent and location of structurally disordered regions are not universally conserved. As structurally disordered regions often have high conformational flexibility, this is likely to affect how protein structure evolves, because spatially altered conformational flexibility can also change the secondary structure propensities of homologous regions in a protein family.

Keywords: protein structure evolution, structural disorder, gene duplication, neostructuralization, adaptive evolution, conformational ensemble

Introduction

Protein structure is generally regarded to be more conserved than sequence (Chothia and Lesk 1986). High sequence divergence, far beyond detectable sequence similarity, often still results in the same protein structure or fold (Orengo et al. 1994). The concept of protein structure conservation among homologous proteins provides the foundation for three-dimensional protein structure modeling. However, not all homologous proteins have a conserved structure (Grishin 2001), and how frequent these structural swaps are remains to be determined. Moreover, many proteins are found in different conformational states regulated by an allosteric effector. Traditionally, allosteric proteins have been viewed as having two states, a tense and a relaxed state (Monod et al. 1965; Koshland et al. 1966). Recent studies of thermodynamics have improved our understanding of allosterism to include a model of different conformational populations in equilibrium, where the equilibrium can be shifted in response to allosteric signals (Gunasekaran et al. 2004). However, the redistribution of conformations is not exclusive to allostery but is a physical property of proteins (Gunasekaran et al. 2004). Viewed from the folding energy landscape, conformational dynamics increase from globular proteins with a well-defined global minimum to proteins that are present as highly dynamic ensembles of interconverting conformational states separated by low-energy barriers, such as the intrinsically disordered proteins (IDPs) (Turoverov et al. 2010). IDPs are fully or partly structurally disordered and are prone to adopt different conformations in different environments. Different conformational states are favored in interactions with different structural scaffolds, and posttranslational modifications are often involved in regulating conformational ensembles.
An example of conformational flexibility was recently shown for p53; the same sequence stretch can adopt a β-sheet, an α-helix, or two distinct coils when interacting with different proteins (Oldfield et al. 2008). Disordered regions within proteins show variation of sequence conservation (Chen et al. 2006), and the fraction of structurally disordered protein (disorder length >30) increases from a few percent in prokaryotes to one-third in multicellular eukaryotes (Ward et al. 2004). So far, the main focus of studying IDPs has been centered on the conserved disordered regions, whereas the regions that are not conserved have been ignored. The fundamental properties of structurally disordered regions make them interesting candidates for studying how protein structure evolves, especially in regard to neostructuralization. We have investigated the evolutionary dynamics of structural disorder in two different protein families after gene duplication. Gene duplication yields functional redundancy, and for retained duplicate genes, subfunctionalization and neofunctionalization are common events (He and Zhang 2005; Hughes and Liberles 2007). These events enable the possibility of retaining a certain set of the conformational states, especially if the duplicated gene has many roughly equally populated conformations. The proteins studied are the clusterin and synuclein protein families.

Patterns of Structural Disorder among Paralogs Show Variation

Clusterin is a molten globule-like protein, which has been found to be partially disordered in mammals (Bailey et al. 2001). Furthermore, clusterin is a functionally promiscuous protein (Wilson and Easterbrook-Smith 2000). Phylogenetically, clusterin is a vertebrate-specific protein that has undergone at least one gene duplication in early vertebrates, resulting in two, now distantly related, copies, clusterin and the clusterin-like protein (fig. 1A). These paralogs show two distinct patterns of protein structure disorder, but within each orthologous family, the patterns of disorder are similar (fig. 1B). The regions that show high disorder in either clusterin or clusterin-like tend to vary between the two groups, whereas low disorder propensity regions tend to be more conserved. Secondary structure prediction reveals conserved secondary structures within each paralogous group, with exceptions only in Danio rerio (supplementary fig. S3, Supplementary Material online). Disorder and secondary structure propensities appear to change in concert (fig. 1B). The region between 151 and 201 shows higher disorder in clusterin-like, whereas an additional β-strand and α-helix are predicted in clusterin. Similarly, the region around 251 shows higher disorder in clusterin-like, whereas a β-strand is predicted in clusterin. The region from 301 to 351 has had two indel events; high disorder is predicted for both paralogs, and the secondary structure elements predicted in this region differ between the paralogs.
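A comparison of this kind can be sketched in a few lines of Python. The sketch below is an illustration only, not the procedure used in this study: it assumes that per-residue disorder scores have already been mapped onto alignment columns, with `None` marking gaps, and the function names, the `min_diff` threshold of 0.2, and the toy scores are all hypothetical.

```python
from statistics import mean

def column_means(group):
    """Mean disorder score per alignment column, skipping gaps (None)."""
    means = []
    for col in range(len(group[0])):
        values = [seq[col] for seq in group if seq[col] is not None]
        means.append(mean(values) if values else None)
    return means

def divergent_regions(group_a, group_b, min_diff=0.2):
    """Return 1-based inclusive (start, end) column ranges where the mean
    disorder of the two groups differs by more than min_diff.
    min_diff is an arbitrary, hypothetical threshold."""
    a, b = column_means(group_a), column_means(group_b)
    regions, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        differs = x is not None and y is not None and abs(x - y) > min_diff
        if differs and start is None:
            start = i                      # a divergent run begins here
        elif not differs and start is not None:
            regions.append((start + 1, i))  # run ended at previous column
            start = None
    if start is not None:
        regions.append((start + 1, len(a)))
    return regions

# Toy scores (invented for illustration), two sequences per paralog group.
clusterin = [[0.1, 0.2, 0.3, 0.2, 0.1, 0.8],
             [0.2, 0.2, 0.2, None, 0.1, 0.9]]
clusterin_like = [[0.1, 0.7, 0.8, 0.9, 0.2, 0.8],
                  [0.2, 0.6, 0.9, 0.8, 0.1, 0.7]]

print(divergent_regions(clusterin, clusterin_like))  # [(2, 4)]
```

Averaging within each orthologous group before comparing reflects the observation that disorder patterns are similar within a group but differ between groups; other summaries, such as the median, would work equally well in this sketch.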

FIG. 1.
The clusterin protein family has two paralogs, clusterin (C) and clusterin-like (CL). The phylogenetic tree (A) is well supported, and all posterior probabilities are 1.0 unless denoted by asterisks where the posterior probability ranges between 0.54 and ...

Conserved Pattern of Structural Disorder but Variation in Secondary Structure Element Prediction among Paralogs

The synuclein protein family consists of three paralogs, alpha (αS), beta (βS), and gamma (γS). All human synucleins are fully disordered (Uversky 2003; Bertoncini et al. 2007; Singh et al. 2007). Similar to the clusterin family, αS, βS, and γS are found throughout the vertebrate clade (fig. 2A). The synucleins show fairly similar patterns of protein structure disorder for all paralogs (fig. 2B), yet interesting differences appear in the predicted secondary structures. Conserved secondary structures are predicted for each paralogous group, but there are differences between groups (fig. 2). From 66 to 86, γS has a long α-helical segment, whereas αS has β-strands and a short α-helix, and βS has a small β-strand in addition to a gap. Similar differences are seen for the region around 110–120.
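One way to locate such between-group differences is to reduce each paralogous group to a consensus secondary structure string and then compare the consensus strings column by column. The following Python sketch assumes aligned per-residue predictions encoded as `H` (helix), `E` (strand), and `C` (coil), with `-` for alignment gaps; this encoding, the function names, and the toy strings are assumptions made for illustration and do not reproduce the Jpred output used here.

```python
from collections import Counter

def consensus(predictions, gap="-"):
    """Most common predicted state per alignment column, ignoring gaps.
    Ties are resolved in favor of the state seen first in the column."""
    cons = []
    for column in zip(*predictions):
        states = [s for s in column if s != gap]
        cons.append(Counter(states).most_common(1)[0][0] if states else gap)
    return "".join(cons)

def differing_columns(cons_a, cons_b, gap="-"):
    """1-based alignment columns where two consensus strings disagree,
    skipping columns that are a gap in either consensus."""
    return [i + 1
            for i, (a, b) in enumerate(zip(cons_a, cons_b))
            if a != b and gap not in (a, b)]

# Toy aligned predictions (invented for illustration).
gamma_group = ["CHHHHHC", "CHHHHHC"]
alpha_group = ["CEEHHCC", "CEECHCC", "CEEHHCC"]

gamma_cons = consensus(gamma_group)
alpha_cons = consensus(alpha_group)
print(gamma_cons, alpha_cons)                     # CHHHHHC CEEHHCC
print(differing_columns(gamma_cons, alpha_cons))  # [2, 3, 6]
```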

FIG. 2.
The synuclein protein family has three paralogs, α-synuclein (αS), β-synuclein (βS), and γ-synuclein (γS). The phylogenetic tree (A) is well supported, here midpoint rooted (Zmasek and Eddy 2001). The disorder ...

Concluding Remarks

These results imply that disordered regions are not universally conserved but interconvert with secondary structural elements on evolutionary timescales. As regions of structural disorder often are involved in protein–protein interactions, frequently mediated by posttranslational modifications (reviewed in Gsponer and Babu 2009), changes in disorder propensity indicate functional differences between paralogs. Nonconserved structural disorder in domain–domain linkers yields possibilities for alternative domain–domain packing in multidomain proteins, whereas variation of structural disorder within a domain increases the chance of fold transitions or secondary structural element propensity changes among homologs. Here, we aimed to focus on the changes in secondary structure propensity that occur in the nonconserved disordered regions. However, we find the same secondary structure transitions in the seemingly conserved disorder patterns as well, indicating that similar patterns of disorder may not correspond to the same type of conformational interplay. This study is based upon computational predictions of secondary structure and disorder, using methods approaching 80% accuracy (Cole et al. 2008; Szappanos et al. 2010). Although this study is too limited for identifying general trends, the fact that these transitions are found in both protein families analyzed implies the importance of structural disorder for protein structure evolution. This research raises important questions about the coevolution of posttranslational modification and disorder as well as the role of gene duplication in altering the selective pressures on conformational ensembles.

Methods

Based on two sequences from the DisProt database of experimentally verified disordered proteins (Sickmeier et al. 2007), clusterin from Rattus norvegicus and αS from Homo sapiens, two data sets were gathered by BLAST searches against selected genomes in the RefSeq protein database. Both data sets were aligned using MUSCLE (Edgar 2004). Evolutionary model selection was performed for both data sets in ProtTest v2.4 (Abascal et al. 2005) with all substitution matrices implemented there and with the possible combinations of invariant sites, a gamma distribution with four rate categories, and observed amino acid frequencies. Phylogenies were reconstructed using MrBayes (Ronquist and Huelsenbeck 2003) with the JTT substitution matrix (Jones et al. 1992) and a gamma distribution, which ProtTest proposed as the best evolutionary model according to the Akaike information criterion (Akaike 1973). Both data sets were run for 50 million generations (two runs of four chains each). The consensus trees were built with the default burn-in phase, disregarding the first 25% of the samples. Protein structure disorder was predicted using IUPred (Dosztanyi et al. 2005) (disorder length >30), and secondary structure was predicted using Jpred (Cole et al. 2008) as implemented in Jalview (Waterhouse et al. 2009).
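The "disorder length >30" criterion can be illustrated with a short Python sketch that extracts long disordered segments from a list of per-residue scores. This is not the IUPred implementation: the score cutoff of 0.5 is an assumption (the Methods do not state the cutoff used), the function name is hypothetical, and the input scores are invented.

```python
def long_disordered_segments(scores, cutoff=0.5, min_length=31):
    """Return 1-based inclusive (start, end) ranges of consecutive residues
    scoring above `cutoff` that span at least `min_length` residues.
    min_length=31 encodes "longer than 30"; cutoff=0.5 is an assumption."""
    segments, start = [], None
    for i, score in enumerate(scores):
        if score > cutoff:
            if start is None:
                start = i                  # a disordered run begins
        elif start is not None:
            if i - start >= min_length:    # run ended; keep it if long enough
                segments.append((start + 1, i))
            start = None
    # Handle a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start + 1, len(scores)))
    return segments

# Invented scores: 10 ordered residues, a 35-residue disordered run,
# 5 ordered residues, then a 20-residue disordered run (too short to count).
scores = [0.2] * 10 + [0.8] * 35 + [0.3] * 5 + [0.9] * 20
print(long_disordered_segments(scores))  # [(11, 45)]
```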

Supplementary Material

Supplementary figures S1–S8 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

Thanks to David Liberles for carefully reading the manuscript. This project was supported by Award Number P20RR016474 from the National Center for Research Resources. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Center for Research Resources or the National Institutes of Health.

References

  • Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005;21:2104–2105. [PubMed]
  • Akaike H. Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki BF, editors. Proceedings of 2nd International Symposium on Information Theory. Budapest (Hungary): 1973. pp. 267–281.
  • Bailey RW, Dunker AK, Brown CJ, Garner EC, Griswold MD. Clusterin, a binding protein with a molten globule-like region. Biochemistry. 2001;40:11828–11840. [PubMed]
  • Bertoncini CW, Rasia RM, Lamberto GR, Binolfi A, Zweckstetter M, Griesinger C, Fernandez CO. Structural characterization of the intrinsically unfolded protein beta-synuclein, a natural negative regulator of alpha-synuclein aggregation. J Mol Biol. 2007;372:708–722. [PubMed]
  • Chen JW, Romero P, Uversky VN, Dunker AK. Conservation of intrinsic disorder in protein domains and families: I. A database of conserved predicted disordered regions. J Proteome Res. 2006;5:879–887. [PMC free article] [PubMed]
  • Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. Embo J. 1986;5:823–826. [PMC free article] [PubMed]
  • Cole C, Barber JD, Barton GJ. The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 2008;36:W197–W201. [PMC free article] [PubMed]
  • Dosztanyi Z, Csizmok V, Tompa P, Simon I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics. 2005;21:3433–3434. [PubMed]
  • Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. [PMC free article] [PubMed]
  • Grishin NV. Fold change in evolution of protein structures. J Struct Biol. 2001;134:167–185. [PubMed]
  • Gsponer J, Babu MM. The rules of disorder or why disorder rules. Prog Biophys Mol Biol. 2009;99:94–103. [PubMed]
  • Gunasekaran K, Ma B, Nussinov R. Is allostery an intrinsic property of all dynamic proteins? Proteins. 2004;57:433–443. [PubMed]
  • He X, Zhang J. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005;169:1157–1164. [PMC free article] [PubMed]
  • Hughes T, Liberles DA. The pattern of evolution of smaller-scale gene duplicates in mammalian genomes is more consistent with neo- than subfunctionalisation. J Mol Evol. 2007;65:574–588. [PubMed]
  • Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992;8:275–282. [PubMed]
  • Koshland DE, Jr., Nemethy G, Filmer D. Comparison of experimental binding data and theoretical models in proteins containing subunits. Biochemistry. 1966;5:365–385. [PubMed]
  • Monod J, Wyman J, Changeux JP. On the nature of allosteric transitions: a plausible model. J Mol Biol. 1965;12:88–118. [PubMed]
  • Oldfield CJ, Meng J, Yang JY, Yang MQ, Uversky VN, Dunker AK. Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners. BMC Genomics. 2008;9(Suppl 1):S1. [PMC free article] [PubMed]
  • Orengo CA, Jones DT, Thornton JM. Protein superfamilies and domain superfolds. Nature. 1994;372:631–634. [PubMed]
  • Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574. [PubMed]
  • Sickmeier M, Hamilton JA, LeGall T, et al. (12 co-authors) DisProt: the database of disordered proteins. Nucleic Acids Res. 2007;35:D786–D793. [PMC free article] [PubMed]
  • Singh VK, Zhou Y, Marsh JA, Uversky VN, Forman-Kay JD, Liu J, Jia Z. Synuclein-gamma targeting peptide inhibitor that enhances sensitivity of breast cancer cells to antimicrotubule drugs. Cancer Res. 2007;67:626–633. [PubMed]
  • Szappanos B, Suveges D, Nyitray L, Perczel A, Gaspari Z. Folded-unfolded cross-predictions and protein evolution: the case study of coiled-coils. FEBS Lett. 2010;584:1623–1627. [PubMed]
  • Turoverov KK, Kuznetsova IM, Uversky VN. The protein kingdom extended: ordered and intrinsically disordered proteins, their folding, supramolecular complex formation, and aggregation. Prog Biophys Mol Biol. 2010;102:73–84. [PMC free article] [PubMed]
  • Uversky VN. A protein-chameleon: conformational plasticity of alpha-synuclein, a disordered protein involved in neurodegenerative disorders. J Biomol Struct Dyn. 2003;21:211–234. [PubMed]
  • Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol. 2004;337:635–645. [PubMed]
  • Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–1191. [PMC free article] [PubMed]
  • Wilson MR, Easterbrook-Smith SB. Clusterin is a secreted mammalian chaperone. Trends Biochem Sci. 2000;25:95–98. [PubMed]
  • Zmasek CM, Eddy SR. ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics. 2001;17:383–384. [PubMed]

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press