Bioinformatics. 2018 May 1;34(9):1547-1554. doi: 10.1093/bioinformatics/btx815.

GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text.

Author information

1. National Science Foundation Center for Big Learning, University of Florida, Gainesville, FL 32611, USA.
2. Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA.
3. Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA.
4. Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
5. Genomics of Gene Expression Laboratory, Centro de Investigación Príncipe Felipe, Valencia 42012, Spain.

Abstract

Motivation:

Best-performing named entity recognition (NER) methods for biomedical literature rely on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. In non-biomedical NER tasks, end-to-end neural networks achieve state-of-the-art performance without hand-crafted features or task-specific knowledge. However, in the biomedical domain, the same architectures do not yield performance competitive with conventional machine learning models.

Results:

We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages local context through n-gram character and word embeddings processed by a Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To label a word automatically, the method uses the local information surrounding that word; it therefore requires no task-specific knowledge or feature engineering and can, in principle, be applied to a wide range of existing NER problems. We evaluated GRAM-CNN on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. These results place GRAM-CNN among the leading biomedical NER methods. To the best of our knowledge, we are the first to apply CNN-based structures to BioNER problems.
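
The following is a minimal, hypothetical PyTorch sketch of a GRAM-CNN-style model, included only to illustrate the idea stated above: character convolutions over several kernel widths approximate n-gram features, these are concatenated with word embeddings, and word-level convolutions over the local context produce per-token tag scores. All names, dimensions and layer choices are illustrative assumptions, not the published architecture (which may include additional components not shown here).

# Hypothetical sketch of a GRAM-CNN-style tagger; not the authors' exact model.
import torch
import torch.nn as nn

class GramCNNSketch(nn.Module):
    def __init__(self, vocab_size=10000, char_size=100, num_tags=5,
                 word_dim=100, char_dim=30, char_filters=30, context_filters=200):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_size, char_dim, padding_idx=0)
        # Character convolutions with several kernel sizes approximate n-gram features.
        self.char_convs = nn.ModuleList(
            nn.Conv1d(char_dim, char_filters, kernel_size=k, padding=k // 2)
            for k in (2, 3, 4)
        )
        feat_dim = word_dim + char_filters * 3
        # Word-level convolutions capture the local context around each token.
        self.context_convs = nn.ModuleList(
            nn.Conv1d(feat_dim, context_filters, kernel_size=k, padding=k // 2)
            for k in (3, 5)
        )
        self.out = nn.Linear(context_filters * 2, num_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        b, t, w = char_ids.shape
        chars = self.char_emb(char_ids.view(b * t, w)).transpose(1, 2)
        # Max-pool each character n-gram filter over the word's characters.
        char_feats = torch.cat(
            [conv(chars).max(dim=2).values for conv in self.char_convs], dim=1
        ).view(b, t, -1)
        feats = torch.cat([self.word_emb(word_ids), char_feats], dim=2)
        feats = feats.transpose(1, 2)                  # (batch, feat_dim, seq_len)
        ctx = torch.cat([torch.relu(conv(feats)) for conv in self.context_convs], dim=1)
        return self.out(ctx.transpose(1, 2))           # per-token tag scores

# Usage: score BIO-style tags for a toy batch of two 6-token sentences.
model = GramCNNSketch()
words = torch.randint(1, 10000, (2, 6))
chars = torch.randint(1, 100, (2, 6, 12))
print(model(words, chars).shape)  # torch.Size([2, 6, 5])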

Availability and implementation:

The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN.

Contact:

andyli@ece.ufl.edu or aconesa@ufl.edu.

Supplementary information:

Supplementary data are available at Bioinformatics online.

PMID: 29272325
PMCID: PMC5925775
DOI: 10.1093/bioinformatics/btx815
[Indexed for MEDLINE]