BMC Bioinformatics. 2012 Jun 21;13:141. doi: 10.1186/1471-2105-13-141.

The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools.

Author information

Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.

Abstract

BACKGROUND:

Computing sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format could reduce the need for computational reanalysis of these data sets. A prerequisite for sharing similarity results is a common reference database.

DESCRIPTION:

We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity search results into many annotation namespaces, e.g., KEGG or NCBI's GenBank.
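The core idea can be sketched in a few lines. Identical sequences from different sources collapse to one entry, each entry keeps every source's annotation, and a single similarity hit can then be reported in any namespace. This is a minimal sketch under stated assumptions. Entries are keyed by the MD5 checksum of the normalized (whitespace-stripped, upper-cased) sequence. The record layout, the function names (seq_id, build_nr, translate_hits) and the accession strings are hypothetical. None of this is the M5nr's actual schema or tooling.

```python
import hashlib
from collections import defaultdict


def seq_id(sequence: str) -> str:
    """Hypothetical identifier: MD5 hex digest of the normalized sequence."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_nr(records):
    """Collapse (source, accession, function, sequence) records by sequence.

    Returns (sequences, annotations): one stored sequence per MD5 key, and
    every (source, accession, function) annotation attached to that key.
    """
    sequences = {}
    annotations = defaultdict(list)
    for source, accession, function, sequence in records:
        key = seq_id(sequence)
        # Only the first copy of an identical sequence is stored.
        sequences.setdefault(key, "".join(sequence.split()).upper())
        annotations[key].append((source, accession, function))
    return sequences, annotations


def translate_hits(hits, annotations, namespace):
    """Map similarity hits (MD5 keys) to annotations from one source namespace."""
    return [
        (hit, accession, function)
        for hit in hits
        for source, accession, function in annotations.get(hit, [])
        if source == namespace
    ]


# Toy usage with made-up accessions: two sources share one sequence.
records = [
    ("GenBank", "hypothetical_gb_1", "kinase", "MKV LLA"),
    ("KEGG", "hypothetical_kegg_1", "K00001", "mkvlla"),
    ("GenBank", "hypothetical_gb_2", "permease", "MSTQ"),
]
sequences, annotations = build_nr(records)
hit = seq_id("MKVLLA")
print(len(sequences))                                # 2 unique sequences
print(translate_hits([hit], annotations, "KEGG"))
print(translate_hits([hit], annotations, "GenBank"))
```

The search runs once against the deduplicated sequences. Switching the `namespace` argument then yields a KEGG-flavoured or GenBank-flavoured result set from the same hits, which is the "single computation, multiple result sets" property described here.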

CONCLUSIONS:

The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.

PMID:
22720753
PMCID:
PMC3410781
DOI:
10.1186/1471-2105-13-141
[Indexed for MEDLINE]
Free PMC Article