Markov models of amino acid substitution to study proteins with intrinsically disordered regions

PLoS One. 2011;6(5):e20488. doi: 10.1371/journal.pone.0020488. Epub 2011 May 27.

Abstract

Background: Intrinsically disordered proteins (IDPs) or proteins with intrinsically disordered regions (IDRs) do not have a well-defined tertiary structure, but perform a multitude of functions, often relying on their native disorder to achieve binding flexibility by switching between alternative conformations. Intrinsic disorder is frequently found in all three kingdoms of life, and may occur in short stretches or span whole proteins. To date, most studies contrasting ordered and disordered proteins have focused on simple summary statistics. Here, we propose an evolutionary approach to study IDPs, and contrast patterns specific to ordered protein regions and the corresponding IDRs.

Results: Two empirical Markov models of amino acid substitutions were estimated, based on a large set of multiple sequence alignments with experimentally verified annotations of disordered regions from the DisProt database of IDPs. We applied new methods to detect differences in Markovian evolution and evolutionary rates between IDRs and the corresponding ordered protein regions. Further, we investigated the distribution of IDPs among functional categories and biochemical pathways, and their propensity to contain tandem repeats.

Conclusions: We find significant differences in evolution between ordered and disordered regions of proteins. Most importantly, we find that disorder-promoting amino acids are more conserved in IDRs, indicating that in some cases not only amino acid composition but also the specific sequence is important for function. This conjecture is reinforced by the observation that for part of our data set IDRs evolve more slowly than the ordered parts of the proteins, although we still support the common view that IDRs in general evolve more quickly. The improvement in model fit suggests possible gains for various types of analyses; for example, de novo disorder prediction using a phylogenetic Hidden Markov Model based on our matrices performed similarly to other disorder predictors.
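
To make the notion of a Markov model of amino acid substitution more concrete, the sketch below builds a small time-reversible rate matrix from exchangeabilities and equilibrium frequencies, scales it to one expected substitution per unit time, and computes the substitution probabilities P(t) = exp(Qt). It is an illustration of the general model class only, not the matrices or estimation procedure of the paper: the three-letter alphabet, the values in `FREQS` and `EXCH`, and the function names are all hypothetical, and a real model would use 20 amino acids with parameters estimated from alignments.

```python
# Toy continuous-time Markov substitution model (hypothetical values throughout).
ALPHABET = ["A", "G", "P"]            # stand-in for the 20 amino acids
FREQS = [0.5, 0.3, 0.2]               # assumed equilibrium frequencies (sum to 1)
EXCH = [[0.0, 1.0, 2.0],              # assumed symmetric exchangeabilities
        [1.0, 0.0, 0.5],
        [2.0, 0.5, 0.0]]


def build_rate_matrix(exch, freqs):
    """Return Q with Q[i][j] = exch[i][j] * freqs[j] (i != j), rows summing
    to zero, scaled so the expected substitution rate at equilibrium is 1."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i])          # diagonal makes the row sum to zero
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[v / rate for v in row] for row in q]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by a truncated Taylor series on a
    scaled-down matrix, followed by repeated squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[v * scale for v in row] for row in q]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


Q = build_rate_matrix(EXCH, FREQS)
P = transition_probs(Q, 0.3)          # substitution probabilities at distance 0.3
for letter, row in zip(ALPHABET, P):
    print(letter, [round(x, 4) for x in row])
```

In such a model, the diagonal entries of P(t) give the probability that a residue is unchanged after evolutionary distance t, so estimating separate matrices for ordered and disordered regions allows their conservation patterns to be compared residue by residue.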

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Substitution*
  • Animals
  • Evolution, Molecular
  • Glycine N-Methyltransferase / chemistry
  • Glycine N-Methyltransferase / genetics
  • Glycine N-Methyltransferase / metabolism
  • Markov Chains*
  • Mice
  • Models, Molecular
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / genetics*
  • Proteins / metabolism
  • Rats
  • Suppressor of Cytokine Signaling 3 Protein
  • Suppressor of Cytokine Signaling Proteins / chemistry
  • Suppressor of Cytokine Signaling Proteins / genetics
  • Suppressor of Cytokine Signaling Proteins / metabolism
  • Tandem Repeat Sequences

Substances

  • Proteins
  • Socs3 protein, mouse
  • Suppressor of Cytokine Signaling 3 Protein
  • Suppressor of Cytokine Signaling Proteins
  • Glycine N-Methyltransferase
  • Gnmt protein, rat