Automated extraction of information on protein-protein interactions from the biological literature

Bioinformatics. 2001 Feb;17(2):155-61. doi: 10.1093/bioinformatics/17.2.155.

Abstract

Motivation: To understand biological processes, we must clarify how proteins interact with each other. However, because information about protein-protein interactions still resides primarily in the scientific literature, it is not accessible in a computer-readable format. Processing large numbers of interactions efficiently therefore requires an intelligent information extraction method. Our aim is to develop an efficient method for extracting information on protein-protein interactions from the scientific literature.

Results: We present a method for extracting information on protein-protein interactions from the scientific literature. This method, which employs only a protein name dictionary, surface clues on word patterns and simple part-of-speech rules, achieved high recall and precision rates for yeast (recall = 86.8% and precision = 94.3%) and Escherichia coli (recall = 82.5% and precision = 93.5%). The extraction results suggest that our method should be applicable to any species for which a protein name dictionary has been constructed.
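
To make the idea of dictionary-plus-pattern extraction concrete, the following is a minimal illustrative sketch and not the authors' program. The abstract does not specify the actual word patterns or part-of-speech rules, so everything here is an assumption: the tiny protein name dictionary, the list of interaction phrases, the single "PROTEIN phrase PROTEIN" pattern, and the helper names `build_pattern`, `extract_interactions` and `precision_recall` are all hypothetical. The last function shows how recall and precision figures like those reported above would be computed against a hand-curated set of interactions.

```python
import re

# Hypothetical mini-dictionary; a real one would list the known protein
# names for a whole species (e.g. yeast or Escherichia coli).
PROTEIN_NAMES = {"Cdc28", "Cln2", "Sic1", "Clb5"}

# Hypothetical interaction phrases used as surface clues.
INTERACTION_PHRASES = (
    "interacts with",
    "associates with",
    "binds to",
    "binds",
    "phosphorylates",
)


def build_pattern(names, phrases):
    """Compile one 'PROTEIN phrase PROTEIN' pattern from the two word lists."""
    # Longer alternatives first so that "binds to" is tried before "binds".
    name_alt = "|".join(sorted(map(re.escape, names), key=len, reverse=True))
    phrase_alt = "|".join(sorted(map(re.escape, phrases), key=len, reverse=True))
    return re.compile(rf"\b({name_alt})\b\s+(?:{phrase_alt})\s+\b({name_alt})\b")


PATTERN = build_pattern(PROTEIN_NAMES, INTERACTION_PHRASES)


def extract_interactions(sentence):
    """Return (protein_a, protein_b) pairs matched by the pattern in a sentence."""
    return PATTERN.findall(sentence)


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted pairs against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentences = [
        "Cdc28 binds to Cln2 in late G1.",
        "Sic1 interacts with Clb5.",
        "Cdc28 is essential for growth.",
    ]
    predicted = [pair for s in sentences for pair in extract_interactions(s)]
    # Hypothetical hand-curated answer key for these sentences.
    gold = {("Cdc28", "Cln2"), ("Sic1", "Clb5"), ("Cdc28", "Sic1")}
    print(predicted)
    print(precision_recall(predicted, gold))
```

A single regular expression like this ignores negation, passive voice and protein name variants, which is where part-of-speech rules and a fuller dictionary of the kind the abstract describes would come in.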

Availability: The program is available on request from the authors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Electronic Data Processing
  • Information Storage and Retrieval*
  • Literature
  • Proteins / metabolism*

Substances

  • Proteins