[Prediction of protein domain boundaries based on statistics of appearance of amino acid residues]

Mol Biol (Mosk). 2006 Jan-Feb;40(1):111-21.
[Article in Russian]

Abstract

We have compiled a database of 452 two-domain proteins with less than 25% homology. Using one half of this set, we obtained statistics on the occurrence of amino acid residues at the domain boundaries of multidomain proteins. Small and hydrophilic amino acids (proline, glycine, asparagine, glutamic acid, arginine and others) occur at domain boundaries more often than in the protein as a whole. Conversely, hydrophobic amino acid residues (tryptophan, methionine, phenylalanine and others) occur at domain boundaries less often. The occurrence scales derived from these boundary-region statistics were then used to predict domain boundaries in the proteins of the second half of the database. The best result was given by the probability scale obtained by averaging residue occurrence over a boundary region of 8 residues (±4 residues from the real domain boundary): for 57% of proteins the predicted boundary was less than 40 residues from the boundary assigned from the three-dimensional structure, and for 41% it was less than 20 residues from the real boundary. The probability scale was also used to predict domain boundaries for proteins of unknown three-dimensional structure in the international CASP6 competition.
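The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the scale, so the log-ratio score, the pseudocount of 1, the choice of the single highest-scoring window as the prediction, and all function names and toy sequences here are assumptions made for illustration. Only the 8-residue boundary region (±4 residues) and the 20- and 40-residue tolerances come from the abstract.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HALF_WINDOW = 4  # +/-4 residues around the boundary, i.e. an 8-residue region


def boundary_scale(training_set, half_window=HALF_WINDOW):
    """Build a per-residue scale from (sequence, boundary_index) pairs.

    Assumption: score = log(frequency near boundary / overall frequency),
    with a pseudocount of 1 per residue type so unseen residues stay finite.
    """
    near, overall = Counter(), Counter()
    for seq, boundary in training_set:
        overall.update(seq)
        lo = max(0, boundary - half_window)
        hi = min(len(seq), boundary + half_window)
        near.update(seq[lo:hi])
    n_near = sum(near.values()) + len(AMINO_ACIDS)
    n_overall = sum(overall.values()) + len(AMINO_ACIDS)
    return {
        aa: math.log(((near[aa] + 1) / n_near) / ((overall[aa] + 1) / n_overall))
        for aa in AMINO_ACIDS
    }


def predict_boundary(seq, scale, half_window=HALF_WINDOW):
    """Return the position whose surrounding window has the highest mean score.

    Returns None if the sequence is shorter than one full window.
    """
    best_pos, best_score = None, -math.inf
    for pos in range(half_window, len(seq) - half_window + 1):
        window = seq[pos - half_window:pos + half_window]
        score = sum(scale.get(aa, 0.0) for aa in window) / len(window)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos


def fraction_within(predicted, actual, tolerance):
    """Fraction of proteins whose predicted boundary is closer than `tolerance`."""
    hits = sum(abs(p - a) < tolerance for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Toy data, invented for illustration only (not from the paper's database).
toy_training = [("W" * 10 + "GP" * 4 + "W" * 10, 14)]
toy_scale = boundary_scale(toy_training)
toy_query = "W" * 12 + "GP" * 4 + "W" * 12
toy_prediction = predict_boundary(toy_query, toy_scale)
```

In this toy setting glycine and proline receive positive scores and tryptophan a negative one, mirroring the direction of the trend reported in the abstract. To reproduce the split described there, `boundary_scale` would be fitted on one half of the database, and `fraction_within` would be evaluated on the other half with tolerances of 20 and 40 residues.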

Publication types

  • English Abstract

MeSH terms

  • Algorithms
  • Amino Acids / chemistry*
  • Computational Biology
  • Databases, Protein
  • Models, Molecular*
  • Protein Structure, Tertiary*
  • Sequence Analysis, Protein

Substances

  • Amino Acids