    Pac Symp Biocomput. 2003:391-402.

    Evaluation of the vector space representation in text-based gene clustering.

    Source

    Department of Electrical Engineering, ESAT-SISTA, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium.

    Abstract

    Thanks to its increasing availability, electronic literature can now be a major source of information when developing complex statistical models where data are scarce or noisy. This raises the question of how to deeply integrate information from domain literature with experimental data. Which statistical text representations can integrate literature knowledge into clustering remains an insufficiently explored topic. In this work we discuss how the bag-of-words representation can be used successfully to represent genetic annotation and free-text information coming from different databases. We demonstrate the effect of various weighting schemes and information sources in a functional clustering setup. As a quantitative evaluation, we contrast, for different parameter settings, the functional groupings obtained from text with those obtained from expert assessments, and link each of the results to a biological discussion.
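
As a rough illustration of the bag-of-words idea mentioned in the abstract, the sketch below builds term-weight vectors from short text annotations and compares them with cosine similarity. It is a minimal example under stated assumptions, not the authors' method: the gene names, the annotation strings, the whitespace tokenizer, and the tf * log(N / df) weighting are all hypothetical choices made for illustration, and the abstract does not say which weighting schemes were actually evaluated.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy annotations (invented for this sketch, not real data).
annotations = {
    "geneA": "dna repair nuclear protein",
    "geneB": "dna repair helicase protein",
    "geneC": "glucose transport membrane protein",
}
names = list(annotations)
vectors = tfidf_vectors([annotations[n] for n in names])

sim_ab = cosine(vectors[0], vectors[1])  # geneA vs geneB share "dna", "repair"
sim_ac = cosine(vectors[0], vectors[2])  # geneA vs geneC share only "protein"
print(f"geneA-geneB: {sim_ab:.3f}  geneA-geneC: {sim_ac:.3f}")
```

In this toy setup the term "protein" occurs in every annotation, so its weight is zero and it contributes nothing to any similarity. A pairwise similarity matrix built this way could then be passed to any standard clustering routine.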

    PMID: 12603044 [PubMed - indexed for MEDLINE]
    Free full text
