Woods: A fast and accurate functional annotator and classifier of genomic and metagenomic sequences

Genomics. 2015 Jul;106(1):1-6. doi: 10.1016/j.ygeno.2015.04.001. Epub 2015 Apr 8.

Abstract

Functional annotation of very large metagenomic datasets is one of the most time-consuming and computationally demanding tasks, and it is currently a bottleneck for efficient analysis. The commonly used homology-based methods for functionally annotating and classifying proteins are extremely slow. Therefore, to achieve fast and accurate functional annotation, we have developed 'Woods', an orthology-based functional classifier that combines machine learning and similarity-based approaches. Woods displayed a precision of 98.79% on an independent genomic dataset, 96.66% on a simulated metagenomic dataset and >97% on two real metagenomic datasets. In addition, it ran >87 times faster than BLAST on the two real metagenomic datasets. Woods can therefore be used as a highly efficient and accurate classifier whose high-throughput capability makes it practical for large metagenomic datasets.
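The precision figures reported above can be read with the help of a small worked example. The sketch below is illustrative only: the abstract does not state exactly how precision was computed, so it assumes the common definition (correct assignments divided by all assignments made, with unclassified queries excluded). The function name `precision` and the orthologous-group labels `OG1`, `OG2`, and so on are hypothetical and do not come from the paper.

```python
from typing import Optional, Sequence


def precision(predicted: Sequence[Optional[str]], truth: Sequence[str]) -> float:
    """Return the fraction of assigned labels that match the reference label.

    Assumption: a prediction of None means the query was left unclassified,
    and such queries are excluded from the denominator.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    # Keep only the queries that actually received a label.
    assigned = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    if not assigned:
        return 0.0
    correct = sum(1 for p, t in assigned if p == t)
    return correct / len(assigned)


# Hypothetical reference and predicted orthologous-group labels.
truth = ["OG1", "OG2", "OG3", "OG4", "OG5"]
predicted = ["OG1", "OG2", "OG9", None, "OG5"]

# Four queries were assigned a label and three of those are correct.
example_precision = precision(predicted, truth)
print(f"{example_precision:.2%}")
```

Under this definition, a classifier that declines to label uncertain queries can keep precision high at the cost of coverage, so precision is best read alongside the fraction of queries that were classified.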

Keywords: Functional annotation; Machine learning; Metagenome; Random Forest.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genomics / methods*
  • Humans
  • Machine Learning*
  • Metagenomics / methods*
  • Molecular Sequence Annotation / methods*
  • Proteins / chemistry
  • Proteins / classification
  • Proteins / genetics*
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • Proteins