Bioinformatics. 1998 Jun;14(5):430-8.

A set-theoretic approach to database searching and clustering.

Author information

Deutsches Krebsforschungszentrum (DKFZ), Theoretische Bioinformatik, Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany. a.krause@dkfz-heidelberg.de

Abstract

MOTIVATION:

In this paper, we introduce an iterative method of database searching and use it to design a clustering algorithm applicable to an entire protein database. The clustering procedure relies on the quality of the database searching routine and further improves its results through a set-theoretic analysis of a cluster system that is highly redundant yet efficient to generate.

RESULTS:

Overall, we achieve unambiguous assignment of 80% of SWISS-PROT sequences to non-overlapping sequence clusters in an entirely automatic fashion. Our results are compared to an expert-generated clustering for validation. The database searching method is fast, and the clustering technique does not require a time-consuming all-against-all comparison. This allows fast clustering of large numbers of sequences.

AVAILABILITY:

The resulting clustering for the PIR1 (Release 51) and SWISS-PROT (Release 34) databases is available over the Internet from http://www.dkfz-heidelberg.de/tbi/services/modest/browsesysters.pl.

CONTACT:

a.krause@dkfz-heidelberg.de; m.vingron@dkfz-heidelberg.de

PMID:
9682056
[Indexed for MEDLINE]