Display Settings:


Send to:

Choose Destination
See comment in PubMed Commons below
Gene. 2000 Dec 23;259(1-2):245-52.

Using BLAST for identifying gene and protein names in journal articles.

Author information

  • 1Department of Medical Informatics, Columbia University, New York, NY, USA. mk651@columbia.edu


We describe a system which automatically identifies gene and protein names in journal articles, an important and non-trivial first step in knowledge extraction of protein and gene actions. Our system uses a database of gene and protein names and is based on BLAST [Altschul et al., Nucleic Acids Res. 25 (1997) 3389-3402], a popular tool for DNA and protein sequence comparison. We describe a method that consists of mapping sequences of text characters into sequences of nucleotides that can be processed by BLAST. We demonstrate that this approach is feasible: the system matches gene and protein names with a recall of 78.8% and a precision of 71.7%, which includes names that are not part of the system database. An analysis of the results suggests techniques that can be used to improve performance further.

[PubMed - indexed for MEDLINE]
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Icon for Elsevier Science
    Loading ...
    Write to the Help Desk