Bioinformatic flowchart and database to investigate the origins and diversity of clan AA peptidases

Biol Direct. 2009 Jan 27;4:3. doi: 10.1186/1745-6150-4-3.

Abstract

Background: Clan AA of aspartic peptidases evolutionarily relates the family of pepsin monomers to all dimeric peptidases encoded by eukaryotic LTR retroelements. Recent findings describing various pools of single-domain nonviral host peptidases, in both prokaryotes and eukaryotes, indicate that the diversity of clan AA is greater than previously thought. The natural next step in investigating this enzyme group is to study its phylogeny. However, clan AA is difficult to study because of the low sequence similarity among its members and their different rates of evolution. This work is an ongoing attempt to investigate the different clan AA families and to understand the causes of their diversity.

Results: In this paper, we describe an in-progress database and a bioinformatic flowchart designed to characterize the clan AA protein domain, based on all possible protein families, through ancestral reconstructions, sequence logos, and hidden Markov models (HMMs). The flowchart includes the characterization of a major consensus sequence based on six amino acid patterns that correspond to Andreeva's model, the structural template describing the clan AA peptidase fold. This set of tools, still a work in progress, is organized in a database within the GyDB project, referred to as the Clan AA Reference Database (http://gydb.uv.es/gydb/phylogeny.php?tree=caard).
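Sequence logos and consensus sequences are central outputs of the flowchart described above. As an illustration only, the sketch below computes logo letter heights (residue frequency times the column's information content, using the standard sequence-logo formulation without small-sample correction) and a majority-rule consensus from a small alignment. The toy sequences, function names, and alphabetical tie-breaking rule are hypothetical assumptions; this is not the authors' pipeline and uses no data from the database.

```python
import math
from collections import Counter

# Hypothetical toy alignment around an aspartic peptidase active-site region;
# these sequences are illustrative only, not data from the database.
ALIGNMENT = [
    "LVDTGA",
    "LIDSGA",
    "LVDTGA",
    "FIDTGS",
]

AMINO_ACIDS = 20  # size of the protein alphabet


def column_info(column):
    """Return (information content in bits, residue frequencies) for one column.

    Uses Shannon entropy with no small-sample correction:
    R = log2(20) - H, where H = -sum(p * log2(p)).
    """
    counts = Counter(column)
    n = len(column)
    freqs = {aa: c / n for aa, c in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return math.log2(AMINO_ACIDS) - entropy, freqs


def logo_and_consensus(alignment):
    """Compute per-column logo letter heights and a majority-rule consensus."""
    length = len(alignment[0])
    assert all(len(s) == length for s in alignment), "sequences must be aligned"
    heights, consensus = [], []
    for i in range(length):
        column = [s[i] for s in alignment]
        info, freqs = column_info(column)
        # Letter height in a logo = residue frequency * column information.
        heights.append({aa: p * info for aa, p in freqs.items()})
        # Most frequent residue; ties broken alphabetically for determinism.
        consensus.append(min(freqs, key=lambda aa: (-freqs[aa], aa)))
    return heights, "".join(consensus)


if __name__ == "__main__":
    heights, consensus = logo_and_consensus(ALIGNMENT)
    print(consensus)
    for i, h in enumerate(heights, start=1):
        print(i, {aa: round(v, 2) for aa, v in sorted(h.items())})
```

A fully conserved column (here the aspartate) reaches the maximum height of log2(20), about 4.32 bits, while variable columns shrink and split their height among residues.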

Conclusion: The pre-existing classification, combined with the evolutionary history of LTR retroelements, permits a consistent taxonomic collection of sequence logos and HMMs. This set is useful not only for gene annotation but also as a reference for evaluating the diversity of, and the relationships among, the different families. Comparisons among HMMs suggest a common ancestor for all dimeric clan AA peptidases that is halfway between single-domain nonviral peptidases and those encoded by Ty3/Gypsy LTR retroelements. Sequence logos reveal that all clan AA families follow a similar protein domain architecture related to the peptidase fold. In particular, each family nucleates a particular consensus motif at the sequence position corresponding to the flap. The different motifs form a network in which a variable alanine-asparagine-like motif predominates, rather than the canonical flap of the HIV-1 peptidase and its close relatives.
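The conclusions note that the family HMMs are useful for gene annotation. Full profile HMMs (with insert and delete states, as built by tools such as HMMER) are beyond a short example, so the sketch below swaps in a simpler technique, a gapless position-specific scoring matrix (PSSM), to show the underlying idea: score every window of a query against per-column log-odds values and keep the best hit. The family alignment, query sequence, pseudocount, uniform background, and function names are all hypothetical assumptions, not part of the database.

```python
import math
from collections import Counter

# Standard 20 amino acids; queries containing other symbols (e.g. "X")
# are not handled in this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical aligned family members (illustrative only, not GyDB data).
FAMILY = ["LVDTGA", "LIDSGA", "LVDTGA", "FIDTGS"]


def build_pssm(alignment, pseudocount=1.0):
    """Build a gapless log-odds profile (in bits) against a uniform background.

    Each column is smoothed by adding `pseudocount` to every residue count,
    so residues never seen in the alignment get a finite negative score.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm, sequence):
    """Slide the profile along `sequence` and return (best score, start index).

    Returns (-inf, -1) if the sequence is shorter than the profile.
    """
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[aa] for column, aa in zip(pssm, window))
        if score > best_score:  # keep the first of any tied windows
            best_score, best_start = score, start
    return best_score, best_start


if __name__ == "__main__":
    profile = build_pssm(FAMILY)
    # Hypothetical query sequence containing a family-like segment.
    score, start = best_hit(profile, "MKKLLVDTGADHS")
    print(f"best window starts at {start} with score {score:.2f} bits")
```

A real annotation workflow would rely on profile HMMs that model insertions and deletions and use a background fitted to protein composition; the uniform background and fixed-width window here are simplifications chosen to keep the scoring logic visible.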

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Aspartic Acid Endopeptidases / chemistry
  • Aspartic Acid Endopeptidases / genetics*
  • Computational Biology / methods*
  • Consensus Sequence
  • Databases, Protein*
  • Genetic Variation*
  • Markov Chains
  • Molecular Sequence Data
  • Phylogeny*
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Sequence Analysis, Protein
  • Software Design*
  • Templates, Genetic

Substances

  • Aspartic Acid Endopeptidases