BMC Bioinformatics. 2005;6 Suppl 1:S5. Epub 2005 May 24.

Exploring the boundaries: gene and protein identification in biomedical text.

Author information

  • Department of Computer Science, Stanford University, Stanford, CA 94305-9040, USA. jrfinkel@stanford.edu

Abstract

BACKGROUND:

Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools.

METHODS:

We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts.

RESULTS:

This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation.
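
The abstract reports only precision and recall. For readers who want a single figure to compare the two tracks, the balanced F-measure (harmonic mean of precision and recall) can be derived from those numbers. The sketch below is an illustration, not part of the original record: the F1 values are computed here rather than quoted from the paper, and the names f1_score, reported, and f1_by_track are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall exactly as stated in the RESULTS paragraph.
reported = {
    "open": (0.83, 0.84),
    "closed": (0.78, 0.85),
}

# Derived values (not reported in the abstract itself).
f1_by_track = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_track.items():
    print(f"{name}: F1 = {f1:.3f}")
# open: F1 = 0.835
# closed: F1 = 0.813
```

Because the inputs are already rounded to two decimal places, the derived values are approximate and may differ slightly from any F-measure computed on the unrounded evaluation counts.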

CONCLUSION:

Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.

PMID: 15960839 [PubMed - indexed for MEDLINE]
PMCID: PMC1869019
Free PMC Article
