Covariation analysis of local amino acid sequences in recurrent protein local structures

J Bioinform Comput Biol. 2005 Dec;3(6):1391-409. doi: 10.1142/s0219720005001648.

Abstract

Local structural information is thought to be frequently encoded in local amino acid sequences. Previous research indicated only that certain positions in particular local structures have specific residue preferences. However, correlated pairwise replacements of interacting residues in recurrent local structural motifs from unrelated proteins have not been studied systematically. We introduce a new method that combines statistical covariation analysis with local structure-based alignment. Systematic analysis of structure-based multiple alignments of recurrent local structures from unrelated proteins in a representative subset of the Protein Data Bank indicates that statistically significant covarying residue pairs exist in local structural motifs, in particular beta-turns and helix caps. These residue pairs are mostly linked through polar functional groups by direct or indirect hydrogen bonding. Hydrophobic interaction is also a major factor constraining pairwise amino acid residue replacement in recurrent local structures. We also found correlated residue pairs that are not clearly linked by through-space interactions; the physical constraints underlying these covariations are less clear. Overall, statistically significant covarying residue pairs exist in local structures from unrelated proteins. The existence of sequence covariation in local structural motifs from unrelated proteins indicates that many relics of local relations are retained in the tertiary structure after protein folding. This supports the notion that some local structural information is encoded in local sequences and that these local structural codes could play important roles in determining the folding topology of the native state.
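
The abstract does not say which covariation statistic was used, so the following is only a minimal sketch of one common way to test two alignment columns for covariation: mutual information between the columns, with a permutation test for significance. The function names, the toy four-residue motif instances, and the choice of mutual information are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two equal-length alignment columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        # p_ab * log2(p_ab / (p_a * p_b)), expressed with raw counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_i[a] * count_j[b]))
    return mi


def permutation_p_value(col_i, col_j, n_shuffles=1000, seed=0):
    """Fraction of column shuffles whose MI is at least the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(col_i, col_j)
    shuffled = list(col_j)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any pairing between the two columns
        if mutual_information(col_i, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical structure-aligned instances of one four-residue motif
# (made-up data, one row per unrelated protein).
motifs = ["DPSG", "DPSG", "NGTG", "NGTG", "DPSG", "NGTG", "DASG", "NGTA"]
columns = list(zip(*motifs))  # transpose rows into alignment columns

mi_02 = mutual_information(columns[0], columns[2])
p_02 = permutation_p_value(columns[0], columns[2])
print(f"positions 0 and 2: MI = {mi_02:.3f} bits, p = {p_02:.3f}")
```

In a real analysis every pair of motif positions would be tested, so the p-values would need a multiple-testing correction, and the alignment would need far more than eight instances for the statistic to be reliable.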

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Computer Simulation
  • Models, Chemical*
  • Models, Molecular*
  • Molecular Sequence Data
  • Protein Conformation
  • Proteins / analysis
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods*
  • Statistics as Topic

Substances

  • Proteins