Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs

Appl Environ Microbiol. 2013 Jun;79(11):3380-91. doi: 10.1128/AEM.03803-12. Epub 2013 Mar 22.

Abstract

Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
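The grouping idea described in the abstract (sorting proteins by a shared set of short conserved peptides) can be illustrated with a minimal sketch. This is not the authors' PPR implementation: the greedy strategy, the function names, the toy sequences, and the parameters `k`, `n_top`, and `min_hits` are all assumptions chosen for illustration only.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct length-k peptides in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_by_shared_peptides(seqs, k=4, n_top=2, min_hits=2):
    """Greedily split sequences into groups that share frequent peptides.

    seqs maps a name to an amino acid string. The default parameter
    values are illustrative assumptions, not values from the paper.
    Returns a list of (motif_set, member_names) tuples.
    """
    remaining = {name: kmers(s, k) for name, s in seqs.items()}
    groups = []
    while remaining:
        # Count in how many remaining sequences each peptide occurs.
        counts = Counter(p for peps in remaining.values() for p in peps)
        # Rank by frequency, breaking ties alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        motifs = {p for p, _ in ranked[:n_top]}
        # A sequence joins the group if it contains enough of the motifs.
        members = sorted(
            name for name, peps in remaining.items()
            if len(peps & motifs) >= min_hits
        )
        if not members:
            # Nothing shares enough motifs: leave the rest ungrouped.
            groups.append((set(), sorted(remaining)))
            break
        groups.append((motifs, members))
        for name in members:
            del remaining[name]
    return groups


# Hypothetical toy sequences: two "families" with different conserved cores.
toy = {
    "a1": "MKHGWDEAAA",
    "a2": "TTHGWDEPLL",
    "b1": "QQNEPRWSSS",
    "b2": "VVNEPRWGGG",
    "b3": "IINEPRWYYY",
}
groups = group_by_shared_peptides(toy)
for motifs, members in groups:
    print(sorted(motifs), members)
```

On the toy input, the sequences separate by their conserved cores even though the flanking residues differ, which is the property the abstract relies on for low-identity families. Each group's motif set also serves as a candidate list of conserved peptides for that group.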

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics*
  • Base Sequence
  • Catalytic Domain / genetics*
  • Conserved Sequence / genetics
  • DNA Primers / genetics
  • Databases, Protein
  • Electrophoresis, Agar Gel
  • Fungi / enzymology*
  • Glycoside Hydrolases / classification*
  • Glycoside Hydrolases / genetics*
  • Glycoside Hydrolases / physiology*
  • Likelihood Functions
  • Models, Genetic
  • Molecular Sequence Data
  • Polymerase Chain Reaction
  • Sequence Analysis, DNA / methods

Substances

  • DNA Primers
  • Glycoside Hydrolases

Associated data

  • GENBANK/HF565034
  • GENBANK/HF565035
  • GENBANK/HF565036
  • GENBANK/HF565037
  • GENBANK/HF565038
  • GENBANK/HF565039
  • GENBANK/HF565040
  • GENBANK/HF565041
  • GENBANK/HF565042
  • GENBANK/HF565043
  • GENBANK/HF565044
  • GENBANK/HF565045
  • GENBANK/HF565046
  • GENBANK/HF565047