PFIT and PFRIT: bioinformatic algorithms for detecting glycosidase function from structure and sequence

Protein Sci. 2004 Jan;13(1):221-9. doi: 10.1110/ps.03274104.

Abstract

Identifying the enzymes involved in the metabolism of simple and complex carbohydrates is one of the bioinformatic challenges of the post-genomic era. Here, we present the PFIT and PFRIT algorithms for identifying which proteins adopting the alpha/beta barrel fold function as glycosidases. These algorithms are based on the observation that proteins adopting the alpha/beta barrel fold share positions in their tertiary structures that have equivalent sets of atomic interactions. These conserved tertiary interaction positions have been implicated in both structure and function. Glycosidases adopting the alpha/beta barrel fold share more conserved tertiary interactions than alpha/beta barrel proteins with other functions. PFIT and PFRIT use this enrichment pattern of conserved tertiary interactions to predict whether a given alpha/beta barrel protein functions as a glycosidase. On a test set of 19 glycosidase and 45 nonglycosidase alpha/beta barrel proteins with low sequence similarity, PFIT and PFRIT correctly predict glycosidase function for 84% of the proteins known to function as glycosidases, and incorrectly predict glycosidase function for 25% of the nonglycosidases. The sequence-based program PSI-BLAST also correctly identifies 84% of the 19 glycosidases; however, it incorrectly predicts glycosidase function for 50% of the nonglycosidases, a false-positive rate twice that of PFIT and PFRIT. Overall, we demonstrate that the structure-based PFIT and PFRIT algorithms are as sensitive as the sequence-based PSI-BLAST algorithm for predicting glycosidase function and substantially more selective.
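The comparison above rests on two rates per method: sensitivity (fraction of the 19 glycosidases predicted as glycosidases) and false-positive rate (fraction of the 45 nonglycosidases predicted as glycosidases). The sketch below, a minimal illustration rather than part of PFIT or PFRIT, turns those reported rates into expected counts, specificity, and precision. It assumes the rounded percentages from the abstract (84%, 25%, 50%) apply exactly, so the derived counts are approximate and not the paper's actual confusion matrices; the names `Rates` and `summarize` are hypothetical.

```python
from dataclasses import dataclass

# Test-set sizes stated in the abstract.
N_GLYCOSIDASES = 19
N_NONGLYCOSIDASES = 45


@dataclass(frozen=True)
class Rates:
    """Hypothetical container for the rounded rates reported in the abstract."""
    name: str
    sensitivity: float          # fraction of glycosidases predicted as glycosidases
    false_positive_rate: float  # fraction of nonglycosidases predicted as glycosidases


def summarize(r: Rates, n_pos: int = N_GLYCOSIDASES,
              n_neg: int = N_NONGLYCOSIDASES) -> dict:
    """Derive expected counts, specificity, and precision from the rates.

    Assumes the rounded percentages are exact, so counts may be fractional.
    """
    tp = r.sensitivity * n_pos           # expected true positives
    fp = r.false_positive_rate * n_neg   # expected false positives
    return {
        "name": r.name,
        "expected_tp": tp,
        "expected_fp": fp,
        "specificity": 1.0 - r.false_positive_rate,
        "precision": tp / (tp + fp),     # fraction of positive calls that are correct
    }


methods = [
    Rates("PFIT/PFRIT", sensitivity=0.84, false_positive_rate=0.25),
    Rates("PSI-BLAST", sensitivity=0.84, false_positive_rate=0.50),
]

for m in methods:
    s = summarize(m)
    print(f"{s['name']:<11} TP~{s['expected_tp']:.1f} FP~{s['expected_fp']:.1f} "
          f"specificity={s['specificity']:.2f} precision={s['precision']:.2f}")

# Ratio of false-positive rates: the "twofold" difference in the abstract.
fpr_ratio = methods[1].false_positive_rate / methods[0].false_positive_rate
print(f"PSI-BLAST / PFIT false-positive ratio: {fpr_ratio:.1f}")
```

With identical sensitivity, the difference between the methods shows up entirely in specificity and precision: halving the false-positive rate raises the share of correct glycosidase calls, which is what "more selective" means here.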

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Computational Biology*
  • Conserved Sequence
  • Databases, Protein
  • Evolution, Molecular
  • Genomics
  • Glycoside Hydrolases / chemistry*
  • Glycoside Hydrolases / genetics
  • Glycoside Hydrolases / metabolism*
  • Hydrogen Bonding
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Folding
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Sequence Homology, Amino Acid
  • Structure-Activity Relationship

Substances

  • Amino Acids
  • Glycoside Hydrolases