    BMC Bioinformatics. 2005 Apr 7;6:88.

    Building a protein name dictionary from full text: a machine learning term extraction approach.

    Source

    Institute for Computational Biomedicine, Dept. of Physiology and Biophysics, Weill Cornell Medical College, 1300 York Ave, New York, NY 10021, USA. les2007@med.cornell.edu

    Abstract

    BACKGROUND:

    The majority of information in the biological literature resides in full text articles rather than in abstracts. Yet abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants that are used to refer to biological entities in the literature.

    RESULTS:

    We present an approach to recognize named entities in full text. The approach collects high-frequency terms in an article and uses support vector machines (SVM) to identify biological entity names. It is computationally efficient and robust to the noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text.
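
    The first step described above, collecting high-frequency terms from a single article, might look roughly like the sketch below. The abstract does not say how terms are tokenized, how long they may be, or what frequency cutoff is used, so the regular expression and the `max_len` and `min_count` parameters are assumptions made for illustration. The SVM step that decides which candidates are biological entity names is not shown.

```python
import re
from collections import Counter


def candidate_terms(text, max_len=3, min_count=3):
    """Return word n-grams (1..max_len words) seen at least min_count times.

    Hypothetical helper: tokenization and thresholds are assumptions,
    not the settings used in the paper.
    """
    # Keep alphanumeric tokens, allowing internal hyphens (e.g. "IL-2").
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", text)
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # Only terms that recur within the article become candidates.
    return {term: c for term, c in counts.items() if c >= min_count}


article = (
    "Cyclin D1 binds CDK4. Cyclin D1 levels rise in G1. "
    "Loss of cyclin D1 delays entry, whereas Cyclin D1 overexpression shortens G1."
)
terms = candidate_terms(article, max_len=2, min_count=3)
print(terms)
```

    Counting is case-sensitive here, so "cyclin D1" and "Cyclin D1" are tallied separately. A classifier would then score each surviving candidate as a name or a non-name.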

    CONCLUSION:

    This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt.
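
    To illustrate what dictionary term lookup can mean in practice, here is a minimal sketch of a greedy longest-match scan over tokenized text. The matching strategy, the case folding, and the `find_names` helper are assumptions for illustration; the abstract does not describe how lookup is implemented.

```python
import re


def find_names(text, dictionary, max_len=4):
    """Greedy longest-match lookup of dictionary names in text.

    Hypothetical helper: case-insensitive matching and the max_len
    window are illustrative assumptions.
    """
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", text)
    lowered = {name.lower() for name in dictionary}
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so "cyclin D1" beats "cyclin".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in lowered:
                hits.append(candidate)
                i += n
                break
        else:
            i += 1  # No name starts at this token.
    return hits


names = find_names(
    "Loss of cyclin D1 delays entry; CDK4 activity drops.",
    {"Cyclin D1", "CDK4", "cyclin"},
)
print(names)
```

    Trying longer windows first means a multi-word name is reported once, rather than as a shorter name contained within it.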

    PMID: 15817129 [PubMed - indexed for MEDLINE]
    PMCID: PMC1090555
    Free PMC Article
