Bioinformatics. 2015 Jun 15;31(12):i339-47. doi: 10.1093/bioinformatics/btv237.

MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

Author information

1
School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; School of Information Science & Engineering, Central South University, Changsha 410083, China; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan; and Center for Computational System Biology, Fudan University, Shanghai 200433, China.
2
School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; School of Information Science & Engineering, Central South University, Changsha 410083, China; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan; and Center for Computational System Biology, Fudan University, Shanghai 200433, China.
3
School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; School of Information Science & Engineering, Central South University, Changsha 410083, China; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan; and Center for Computational System Biology, Fudan University, Shanghai 200433, China.

Abstract

MOTIVATION:

Medical Subject Headings (MeSH) are used by the National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), to assist MeSH annotation; it uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as predictions from separately trained MeSH classifiers, can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple types of evidence for MeSH annotation.

METHODS:

We propose a novel framework, MeSHLabeler, that integrates multiple types of evidence for accurate MeSH annotation by using 'learning to rank'. The evidence includes, among others, numerous predictions from MeSH classifiers, KNN, pattern matching and MTI, as well as the correlation between different MeSH terms. Each MeSH classifier is trained independently, so prediction scores from different classifiers are not directly comparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy.
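To illustrate why scores from independently trained classifiers need normalizing before they are ranked together, here is a minimal Python sketch. It is not the procedure used in MeSHLabeler, which the abstract does not describe; it substitutes a plain per-classifier z-score, and the function names, MeSH terms and score values are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_normalize(raw_scores):
    """Rescale one classifier's raw scores to zero mean and unit variance.

    Assumption: a z-score stands in for the paper's normalization procedure.
    """
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    if sigma == 0:
        # A classifier that always outputs the same score carries no ranking signal.
        return [0.0 for _ in raw_scores]
    return [(s - mu) / sigma for s in raw_scores]


def rank_terms(scores_by_term, citation_index):
    """Order MeSH terms for one citation by descending score."""
    return sorted(
        scores_by_term,
        key=lambda term: scores_by_term[term][citation_index],
        reverse=True,
    )


# Hypothetical raw scores of two independently trained classifiers on three
# citations. The "Humans" classifier outputs high scores for every citation,
# while the "Zebrafish" classifier is usually low.
raw = {
    "Humans": [0.90, 0.95, 0.85],
    "Zebrafish": [0.10, 0.05, 0.60],
}
normalized = {term: zscore_normalize(scores) for term, scores in raw.items()}

# Ranking of candidate terms for the third citation, before and after normalization.
raw_ranking = rank_terms(raw, 2)
normalized_ranking = rank_terms(normalized, 2)
```

On the raw scores, "Humans" ranks first for the third citation simply because that classifier scores everything highly. After normalization, "Zebrafish" ranks first, because 0.60 is unusually high for that classifier. A learning-to-rank model would then take such comparable scores as features alongside the other evidence.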

RESULTS:

MeSHLabeler won first place in Task 2A of the 2014 BioASQ challenge, achieving a Micro F-measure of 0.6248 on the 9,040 citations provided by the challenge. This is a relative improvement of around 9.15% over the Micro F-measure of 0.5724 obtained by MTI.
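The two figures above can be made concrete with a short Python sketch. It assumes the usual micro-averaged definition of the F-measure, in which true positives, false positives and false negatives are pooled over all citations before precision and recall are computed; the function names and the toy annotations are hypothetical, and only the values 0.6248 and 0.5724 come from the abstract.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure over per-citation sets of MeSH terms."""
    tp = fp = fn = 0
    for gold_terms, predicted_terms in zip(gold, predicted):
        gold_terms, predicted_terms = set(gold_terms), set(predicted_terms)
        tp += len(gold_terms & predicted_terms)   # correctly predicted terms
        fp += len(predicted_terms - gold_terms)   # predicted but not in gold
        fn += len(gold_terms - predicted_terms)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, as a fraction of `baseline`."""
    return (new - baseline) / baseline


# Toy example with two citations (hypothetical annotations).
gold = [{"Humans", "Neoplasms"}, {"Mice", "Animals"}]
predicted = [{"Humans", "Female"}, {"Mice", "Animals"}]
toy_score = micro_f_measure(gold, predicted)

# Relative gain of MeSHLabeler (0.6248) over MTI (0.5724), in percent.
gain_percent = round(100 * relative_improvement(0.6248, 0.5724), 2)
```

In the toy example, three terms are correct, one is spurious and one is missed, so precision and recall are both 0.75. The relative gain evaluates to 9.15%, which matches the figure reported in the abstract and shows that it is a relative rather than an absolute difference.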

AVAILABILITY AND IMPLEMENTATION:

The software is available upon request.

PMID: 26072501
PMCID: PMC4765864
DOI: 10.1093/bioinformatics/btv237
[Indexed for MEDLINE]
Free PMC Article
