Extracting protein interactions from text with the unified AkaneRE event extraction system

Rune Saetre; Kazuhiro Yoshida; Makoto Miwa; Takuya Matsuzaki; Yoshinobu Kano; Jun'ichi Tsujii

doi:10.1109/TCBB.2010.46

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jul-Sep;7(3):442-53. doi: 10.1109/TCBB.2010.46.

Authors

Rune Saetre¹, Kazuhiro Yoshida, Makoto Miwa, Takuya Matsuzaki, Yoshinobu Kano, Jun'ichi Tsujii

Affiliation

¹ Department of Information Science, University of Tokyo, Tokyo, Japan. rune.saetre@is.s.u-tokyo.ac.jp

PMID: 20671316
DOI: 10.1109/TCBB.2010.46

Abstract

Currently, relation extraction (RE) and event extraction (EE) are the two main streams of biological information extraction. In 2009, the majority of these RE and EE research efforts were centered around the BioCreative II.5 Protein-Protein Interaction (PPI) challenge and the "BioNLP event extraction shared task." Although these challenges took somewhat different approaches, they share the same ultimate goal of extracting bio-knowledge from the literature. This paper compares the two challenge task definitions and presents a unified system that was successfully applied in both of these and in several other PPI extraction task settings. The AkaneRE system has three parts: a core engine for RE, a pool of modules for specific solutions, and a configuration language to adapt the system to different tasks. The core engine is based on machine learning, using either Support Vector Machines or Statistical Classifiers and features extracted from given training data. The specific modules solve tasks like sentence boundary detection, tokenization, stemming, part-of-speech tagging, parsing, named entity recognition, generation of potential relations, generation of machine learning features for each relation, and finally, assignment of confidence scores and ranking of candidate relations. With these components, the AkaneRE system produces state-of-the-art results, and the system is freely available for academic purposes at http://www-tsujii.is.s.u-tokyo.ac.jp/satre/akane/.
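
The module chain listed in the abstract (sentence splitting, tokenization, entity recognition, candidate relation generation, feature generation, confidence scoring and ranking) can be illustrated with a minimal Python sketch. This is not AkaneRE code: every function name, the dictionary-based entity lookup, the two features, and the hand-written scoring rule are assumptions made for illustration. The scoring rule stands in for a trained classifier such as an SVM, and stemming, part-of-speech tagging, parsing and the configuration language are not modeled. The protein names are placeholders.

```python
import re
from itertools import combinations

# Hypothetical stand-ins for pipeline modules; none of this is AkaneRE's actual code.

def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def find_proteins(tokens, lexicon):
    """Dictionary-based entity recognition: indices of tokens found in the lexicon."""
    return [i for i, tok in enumerate(tokens) if tok in lexicon]

def candidate_pairs(entity_positions):
    """Every unordered pair of entity mentions is a potential relation."""
    return list(combinations(entity_positions, 2))

def features(tokens, pair, trigger_words):
    """Two toy features: token distance, and a trigger word between the mentions."""
    i, j = pair
    between = tokens[i + 1 : j]
    return {
        "distance": j - i,
        "has_trigger": any(t.lower() in trigger_words for t in between),
    }

def score(feats):
    """Hand-written confidence score used in place of a trained classifier."""
    return (1.0 if feats["has_trigger"] else 0.0) + 1.0 / feats["distance"]

def extract(text, lexicon, trigger_words):
    """Run the module chain and return candidates ranked by descending confidence."""
    results = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for pair in candidate_pairs(find_proteins(tokens, lexicon)):
            conf = score(features(tokens, pair, trigger_words))
            results.append((tokens[pair[0]], tokens[pair[1]], conf))
    return sorted(results, key=lambda r: r[2], reverse=True)

ranked = extract(
    "ProtA binds ProtB in vitro. ProtC and ProtD were also measured.",
    lexicon={"ProtA", "ProtB", "ProtC", "ProtD"},
    trigger_words={"binds", "interacts"},
)
for first, second, conf in ranked:
    print(first, second, conf)
```

In this sketch the pair with a trigger word between its mentions ranks above the merely co-occurring pair. A learned model would replace `score`, and a richer feature set would replace `features`, without changing the surrounding chain.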

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computational Biology / methods*
Data Mining / methods*
Databases, Genetic
Information Storage and Retrieval
Natural Language Processing*
Protein Interaction Mapping / methods*