The natural product domain seeker NaPDoS: a phylogeny based bioinformatic tool to classify secondary metabolite gene diversity

PLoS One. 2012;7(3):e34064. doi: 10.1371/journal.pone.0034064. Epub 2012 Mar 29.

Abstract

New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites, while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results show that NaPDoS offers a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.
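
The idea of comparing sequence tags to a reference database can be illustrated with a minimal sketch. This is not the NaPDoS implementation: the abstract describes phylogeny-based classification, whereas the sketch below substitutes a much simpler k-mer Jaccard similarity as a stand-in. The reference entries, their sequences, the function names, and the 0.5 threshold are all hypothetical placeholders chosen for illustration only.

```python
from typing import Dict, Set, Tuple

# Hypothetical reference "database": name -> (domain class, sequence).
# These sequences are invented placeholders, not real KS or C domains.
REFERENCES: Dict[str, Tuple[str, str]] = {
    "ref_KS_1": ("KS", "MKTAYIAKQRQISFVKSHFSRQ"),
    "ref_C_1": ("C", "GSHMLEDPVDAFWNNCTGRRLLA"),
}


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """k-mer Jaccard similarity; a crude stand-in for phylogenetic placement."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def classify_tag(
    tag: str,
    references: Dict[str, Tuple[str, str]] = REFERENCES,
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Tuple[str, str, float]:
    """Return (label, closest reference name, similarity score).

    Tags whose best score falls below the threshold are labelled as
    candidate novel lineages instead of receiving a domain class.
    """
    best_name, best_class, best_score = "", "", -1.0
    for name, (domain_class, ref_seq) in references.items():
        score = jaccard(tag, ref_seq)
        if score > best_score:
            best_name, best_class, best_score = name, domain_class, score
    if best_score >= threshold:
        return best_class, best_name, best_score
    return "candidate novel lineage", best_name, best_score


if __name__ == "__main__":
    print(classify_tag("MKTAYIAKQRQISFVKSHFSRQ"))  # identical to ref_KS_1
    print(classify_tag("WWWWWWWWWWWW"))            # shares no 3-mers with any reference
```

In this toy version, a close match inherits the class of its nearest reference, and a tag with no close match is flagged for follow-up. That mirrors the two outcomes the abstract describes (close database matches versus new lineages), without reproducing the tool's actual phylogenetic analysis.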

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Computers
  • Genetic Variation
  • Genome
  • Internet
  • Likelihood Functions
  • Markov Chains
  • Peptide Biosynthesis, Nucleic Acid-Independent
  • Peptide Synthases / genetics
  • Peptide Synthases / metabolism
  • Phylogeny
  • Polyketide Synthases / genetics
  • Polyketide Synthases / metabolism
  • Protein Structure, Tertiary
  • Sequence Analysis, DNA
  • Soil Microbiology
  • Streptomyces / metabolism

Substances

  • Polyketide Synthases
  • Peptide Synthases
  • non-ribosomal peptide synthase