Nucleic Acids Res. 2011 Nov 1;39(20):8792-802. doi: 10.1093/nar/gkr576. Epub 2011 Jul 19.

Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies.

Author information

  • Department of Ocean Sciences, University of California, Santa Cruz, CA 95064, USA.


In the course of analyzing 9,522,746 pyrosequencing reads from 23 stations in the Southwestern Pacific and equatorial Atlantic oceans, it came to our attention that misannotation of rRNA as protein is now so widespread that false positive matching of rRNA pyrosequencing reads to the National Center for Biotechnology Information (NCBI) non-redundant protein database approaches 90%. One conserved portion of 23S rRNA was misannotated often enough to prompt curators at Pfam to create a spurious protein family. Detailed examination of the annotation history of each seed sequence in the spurious Pfam protein family (PF10695, 'Cw-hydrolase') uncovered issues in the standard operating procedures and quality assurance programs of major sequencing centers, as well as issues in the curation practices of those managing public databases such as GenBank and SwissProt. We offer recommendations for all of these issues, and also recommend that workers in the field of metatranscriptomics take extra care to avoid including false positive matches in their datasets.

[PubMed - indexed for MEDLINE]