Pac Symp Biocomput. 2001:384-95.

Textquest: document clustering of Medline abstracts for concept discovery in molecular biology.

Author information

  • Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK. ioannis@ebi.ac.uk


We present an algorithm for large-scale document clustering of biological text, obtained from Medline abstracts. The algorithm is based on statistical treatment of terms, stemming, the idea of a 'go-list', unsupervised machine learning and graph layout optimization. The method is flexible and robust, controlled by a small number of parameter values. Experiments show that the resulting document clusters are meaningful as assessed by cluster-specific terms. Despite the statistical nature of the approach, with minimal semantic analysis, the terms provide a shallow description of the document corpus and support concept discovery.
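
The abstract names the ingredients of the pipeline but not their details, so the following is only a minimal sketch of how term statistics, stemming, a go-list and unsupervised clustering might fit together. Everything specific here is an assumption and not taken from the paper: the suffix-stripping `stem` function (a stand-in for a real stemmer), the document-frequency thresholds `min_df` and `max_df_ratio` used to build the go-list, and the use of k-means with cosine similarity as the unsupervised learner. The graph layout step is not covered.

```python
import math
import re
from collections import Counter


def stem(word):
    # Crude suffix stripping; a hypothetical stand-in for a real stemmer.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, keep alphabetic runs only, then stem each token.
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def build_go_list(docs_tokens, min_df=2, max_df_ratio=0.8):
    # Assumed go-list rule: keep terms found in at least min_df documents
    # but in no more than max_df_ratio of the corpus.
    n = len(docs_tokens)
    df = Counter(t for tokens in docs_tokens for t in set(tokens))
    return sorted(t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio)


def vectorize(tokens, go_list):
    # Term-count vector over the go-list, scaled to unit length.
    counts = Counter(tokens)
    vec = [float(counts[t]) for t in go_list]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    # Dot product; equals cosine similarity for unit-length vectors.
    return sum(x * y for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    # Deterministic start: first vector, then the vector least similar
    # to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Recompute each centroid as the normalized mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                norm = math.sqrt(sum(x * x for x in mean))
                centroids[j] = [x / norm for x in mean] if norm else mean
    return labels


# Toy corpus standing in for Medline abstracts (invented for illustration).
docs = [
    "kinase phosphorylation signaling pathway",
    "kinase signaling cascade phosphorylation",
    "ribosome translation rna binding",
    "rna translation ribosome assembly",
]
docs_tokens = [tokenize(d) for d in docs]
go_list = build_go_list(docs_tokens)
vectors = [vectorize(tokens, go_list) for tokens in docs_tokens]
labels = kmeans(vectors, k=2)
print(go_list)
print(labels)
```

On this toy corpus the two kinase texts end up in one cluster and the two ribosome texts in the other. The thresholds and the number of clusters `k` play the role of the "small number of parameter values" mentioned above, though the paper's actual parameters may differ.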

[PubMed - indexed for MEDLINE]