Send to

Choose Destination
See comment in PubMed Commons below
Gene. 2000 Dec 23;259(1-2):245-52.

Using BLAST for identifying gene and protein names in journal articles.

Author information

Department of Medical Informatics, Columbia University, New York, NY, USA.


We describe a system which automatically identifies gene and protein names in journal articles, an important and non-trivial first step in knowledge extraction of protein and gene actions. Our system uses a database of gene and protein names and is based on BLAST [Altschul et al., Nucleic Acids Res. 25 (1997) 3389-3402], a popular tool for DNA and protein sequence comparison. We describe a method that consists of mapping sequences of text characters into sequences of nucleotides that can be processed by BLAST. We demonstrate that this approach is feasible: the system matches gene and protein names with a recall of 78.8% and a precision of 71.7%, which includes names that are not part of the system database. An analysis of the results suggests techniques that can be used to improve performance further.

[Indexed for MEDLINE]
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for Elsevier Science
    Loading ...
    Support Center