Bioinformatics. 2014 Jan 1;30(1):38-9. doi: 10.1093/bioinformatics/btt254. Epub 2013 May 7.

A poor man's BLASTX--high-throughput metagenomic protein database search using PAUDA.

Author information

Singapore Centre on Environmental Life Sciences Engineering, School of Biological Sciences, Nanyang Technological University, Singapore 637551; Center for Bioinformatics, University of Tübingen, 72076 Tübingen, Germany; and Life Sciences Institute, National University of Singapore, Singapore 117456.

In the context of metagenomics, we introduce PAUDA, a new approach to protein database search that runs about 10,000 times faster than BLASTX. PAUDA achieves about one-third of BLASTX's rate of assigning reads to KEGG orthology groups, yet produces gene and taxon abundance profiles that are highly correlated with those obtained using BLASTX. PAUDA requires less than 80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil, for which a previous BLASTX analysis (of a subset of 176 million reads) reportedly required 800,000 CPU hours; both analyses yield the same clustering of samples by functional profile.
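
The two CPU-hour figures above cover different numbers of reads, so a per-read comparison needs one normalization step. The sketch below does that arithmetic under a stated assumption: CPU time scales linearly with the number of reads. Because the PAUDA figure is an upper bound ("less than 80 CPU hours"), the resulting ratio is a lower bound on the per-read speedup. The constant and function names are illustrative only.

```python
# Back-of-envelope per-read cost comparison using the figures quoted in the abstract.
# Assumption (not from the paper): CPU time scales linearly with the number of reads,
# so the BLASTX figure (176M-read subset) and the PAUDA figure (246M reads) can be
# compared on a per-read basis.

BLASTX_CPU_HOURS = 800_000   # reported for a subset of 176 million reads
BLASTX_READS = 176_000_000
PAUDA_CPU_HOURS = 80         # upper bound ("less than 80 CPU hours") for 246 million reads
PAUDA_READS = 246_000_000


def cpu_seconds_per_read(cpu_hours: float, reads: int) -> float:
    """Convert a total CPU-hour budget into CPU seconds spent per read."""
    return cpu_hours * 3600 / reads


def per_read_speedup() -> float:
    """BLASTX-to-PAUDA ratio of CPU time per read.

    This is a lower bound, because the PAUDA CPU time is itself an upper bound.
    """
    blastx = cpu_seconds_per_read(BLASTX_CPU_HOURS, BLASTX_READS)
    pauda = cpu_seconds_per_read(PAUDA_CPU_HOURS, PAUDA_READS)
    return blastx / pauda


if __name__ == "__main__":
    blastx_s = cpu_seconds_per_read(BLASTX_CPU_HOURS, BLASTX_READS)
    pauda_ms = cpu_seconds_per_read(PAUDA_CPU_HOURS, PAUDA_READS) * 1000
    print(f"BLASTX: {blastx_s:.2f} CPU s/read")
    print(f"PAUDA:  {pauda_ms:.2f} CPU ms/read")
    print(f"per-read speedup: at least ~{per_read_speedup():,.0f}x")
```

Under the linear-scaling assumption this gives roughly 16.4 CPU seconds per read for BLASTX versus about 1.2 CPU milliseconds per read for PAUDA, a ratio of about 14,000. That is the same order of magnitude as the quoted ~10,000-fold speedup; the abstract does not say exactly how its own figure was measured.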

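A lower assignment rate and a high profile correlation are not contradictory. A correlation coefficient ignores overall scale, so a tool that assigns fewer reads in roughly the same proportions can still match the reference profile closely. The abstract does not say which correlation measure was used. The minimal sketch below assumes Pearson correlation, and its counts are hypothetical toy values, not data from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length abundance profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts per KEGG orthology group (illustrative only, not from the paper).
blastx_counts = [900.0, 450.0, 300.0, 150.0, 60.0]
# A hypothetical tool that assigns one-third as many reads, in the same proportions.
scaled_counts = [c / 3 for c in blastx_counts]
# The same total, but with the proportions of the last two groups swapped.
reshuffled_counts = [300.0, 150.0, 100.0, 20.0, 50.0]

if __name__ == "__main__":
    print(pearson(blastx_counts, scaled_counts))
    print(pearson(blastx_counts, reshuffled_counts))
```

The first correlation equals 1 up to floating-point rounding despite the three-fold drop in assigned reads. The second correlation is slightly lower because the relative proportions differ. Real differences between PAUDA and BLASTX assignments are not a uniform rescaling, so this only illustrates why the two claims in the abstract can both hold.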

PAUDA is freely available from:
Supplementary method details are also available from this website.

[PubMed - indexed for MEDLINE]