    BMC Bioinformatics. 2005 Jan 22;6:15.

    Large scale hierarchical clustering of protein sequences.

    Source

    Max Planck Institute for Molecular Genetics, Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin, Germany. akrause@igw.tfh-wildau.de

    Abstract

    BACKGROUND:

    Searching a biological sequence database for homologues of a query sequence has become a routine operation in computational biology. Despite the high degree of sophistication of currently available search routines, it is still virtually impossible to identify quickly and clearly the group of sequences to which a given query sequence belongs.

    RESULTS:

    We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences, resulting in a hierarchical clustering that is being made available for querying and browsing at http://systers.molgen.mpg.de/.
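
    The abstract does not spell out the algorithms themselves, so the following is only a generic illustration of how a two-level (superfamily/family) grouping can be derived from a similarity graph. It is not the authors' method. The sequence names, similarity scores, and the two thresholds (`loose`, `strict`) are all hypothetical; the sketch simply takes connected components of the graph at a loose threshold and then splits each component again at a stricter one.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort for a deterministic result.
    return sorted(sorted(g) for g in groups.values())


def two_level_clustering(nodes, scores, loose, strict):
    """Map each superfamily (tuple of members) to its list of families.

    Assumes strict >= loose, so families always nest inside superfamilies.
    """
    hierarchy = {}
    loose_edges = [pair for pair, s in scores.items() if s >= loose]
    for superfamily in connected_components(nodes, loose_edges):
        members = set(superfamily)
        strict_edges = [
            (a, b)
            for (a, b), s in scores.items()
            if s >= strict and a in members and b in members
        ]
        hierarchy[tuple(superfamily)] = connected_components(
            superfamily, strict_edges
        )
    return hierarchy


# Hypothetical toy data: pairwise similarity scores in [0, 1].
nodes = ["s1", "s2", "s3", "s4", "s5"]
scores = {
    ("s1", "s2"): 0.9,
    ("s2", "s3"): 0.5,
    ("s4", "s5"): 0.8,
    ("s1", "s4"): 0.1,
}
hierarchy = two_level_clustering(nodes, scores, loose=0.4, strict=0.7)
```

    With this toy input, `s1`, `s2`, and `s3` fall into one superfamily that splits into the families `[s1, s2]` and `[s3]`, while `s4` and `s5` form a second superfamily with a single family. Fixed global thresholds are the simplest possible choice here; a method that adapts to the local topology of the data, as the abstract describes, would choose its cut points differently.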

    CONCLUSIONS:

    Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences.

    PMID:
    15663796
    [PubMed - indexed for MEDLINE]
    PMCID:
    PMC547898
    Free PMC Article
