Bioinformatics. 2001 Mar;17(3):282-3.

Clustering of highly homologous sequences to reduce the size of large protein databases.

Author information

  • San Diego Supercomputer Center, La Jolla, CA 92093, USA. liwz@sdsc.edu

Abstract

We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560,000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches.
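
As an illustration of what clustering at a sequence identity level and keeping only representative sequences can look like, here is a minimal Python sketch. It is not the program described in the abstract, and the abstract does not say how that program works. The sketch assumes a simple greedy scheme (longest sequences first, each later sequence joining the first representative it matches at or above the threshold) and uses `difflib.SequenceMatcher` as a crude stand-in for alignment-based identity. The names `identity`, `cluster`, and the toy sequences are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude stand-in for alignment identity (assumption, not a real aligner):
    residues in common matching blocks divided by the shorter length."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def cluster(sequences: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy clustering: map each representative id to its member ids."""
    clusters: dict[str, list[str]] = {}
    # Process longest sequences first; break length ties by id for determinism.
    for seq_id in sorted(sequences, key=lambda k: (-len(sequences[k]), k)):
        seq = sequences[seq_id]
        for rep_id in clusters:
            if identity(sequences[rep_id], seq) >= threshold:
                clusters[rep_id].append(seq_id)  # close enough: join this cluster
                break
        else:
            clusters[seq_id] = [seq_id]  # no match: becomes a new representative
    return clusters


# Toy input (made-up sequences, for illustration only).
toy = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSR",
    "p3": "GSHMLEDPVAGWTRNLC",
}
result = cluster(toy, threshold=0.9)

# The reduced "output database" holds only the representative sequences.
representatives = {rep: toy[rep] for rep in result}
```

Here `p2` joins the cluster represented by `p1`, while `p3` becomes its own representative, so the reduced set contains two sequences instead of three. This naive version compares every sequence against every representative, which would not scale to a database of over 560,000 sequences without further filtering.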

PMID: 11294794 [PubMed - indexed for MEDLINE]