Display Settings:


Send to:

Choose Destination
See comment in PubMed Commons below
Pac Symp Biocomput. 1998:707-18.

Toward information extraction: identifying protein names from biological papers.

Author information

  • 1Human Genome Center, University of Tokyo, Japan. ichiro@ims.u-tokyo.ac.jp


To solve the mystery of the life phenomenon, we must clarify when genes are expressed and how their products interact with each other. But since the amount of continuously updated knowledge on these interactions is massive and is only available in the form of published articles, an intelligent information extraction (IE) system is needed. To extract these information directly from articles, the system must firstly identify the material names. However, medical and biological documents often include proper nouns newly made by the authors, and conventional methods based on domain specific dictionaries cannot detect such unknown words or coinages. In this study, we propose a new method of extracting material names, PROPER, using surface clue on character strings. It extracts material names in the sentence with 94.70% precision and 98.84% recall, regardless of whether it is already known or newly defined.

[PubMed - indexed for MEDLINE]
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Loading ...
    Write to the Help Desk