Extracting protein interactions from text with the unified AkaneRE event extraction system

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jul-Sep;7(3):442-53. doi: 10.1109/TCBB.2010.46.

Abstract

Currently, relation extraction (RE) and event extraction (EE) are the two main streams of biological information extraction. In 2009, the majority of these RE and EE research efforts were centered around the BioCreative II.5 Protein-Protein Interaction (PPI) challenge and the "BioNLP event extraction shared task." Although these challenges took somewhat different approaches, they share the same ultimate goal of extracting bio-knowledge from the literature. This paper compares the two challenge task definitions, and presents a unified system that was successfully applied in both these and several other PPI extraction task settings. The AkaneRE system has three parts: A core engine for RE, a pool of modules for specific solutions, and a configuration language to adapt the system to different tasks. The core engine is based on machine learning, using either Support Vector Machines or Statistical Classifiers and features extracted from given training data. The specific modules solve tasks like sentence boundary detection, tokenization, stemming, part-of-speech tagging, parsing, named entity recognition, generation of potential relations, generation of machine learning features for each relation, and finally, assignment of confidence scores and ranking of candidate relations. With these components, the AkaneRE system produces state-of-the-art results, and the system is freely available for academic purposes at http://www-tsujii.is.s.u-tokyo.ac.jp/satre/akane/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Data Mining / methods*
  • Databases, Genetic
  • Information Storage and Retrieval
  • Natural Language Processing*
  • Protein Interaction Mapping / methods*