J Am Med Inform Assoc. 2014 Sep-Oct;21(5):925-37. doi: 10.1136/amiajnl-2014-002767. Epub 2014 Jun 13.

Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.

Author information

1. Department of Pediatrics, University of Michigan Medical School, Ann Arbor, Michigan, USA.
2. Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA.
3. Department of Health Management and Policy, University of Michigan School of Public Health, Ann Arbor, Michigan, USA; School of Information, University of Michigan, Ann Arbor, Michigan, USA.
4. School of Information, University of Michigan, Ann Arbor, Michigan, USA; Department of Electronic Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA.
5. Center for Statistical Consultation and Research, University of Michigan, Ann Arbor, Michigan, USA.
6. Lister Hill Center, National Library of Medicine, Bethesda, Maryland, USA.
7. Department of Computer Science, Discovery Analytics Center, Virginia Tech, Arlington, Virginia, USA.

Abstract

OBJECTIVE:

We describe experiments designed to determine the feasibility of distinguishing known from novel associations in a clinical dataset comprising International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients, by comparing them with associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations that appear in the clinical dataset but not in Medline citations are potentially novel.

METHODS:

Pairwise associations of ICD-9 codes were independently identified in both the clinical and Medline datasets, which were then compared to quantify their degree of overlap. We also performed a manual review of a subset of the associations to validate how well MetaMap performed in identifying diagnoses mentioned in Medline citations that formed the basis of the Medline associations.
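The comparison described above can be illustrated with a minimal sketch. The abstract does not state how an association was defined or thresholded, so this example assumes the simplest possible rule: two codes are associated if they co-occur in at least `min_count` records. The function names, the `min_count` parameter, and the toy records are all hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def pairwise_associations(records, min_count=1):
    """Return unordered code pairs that co-occur in at least min_count records.

    Assumption: plain co-occurrence counting stands in for whatever
    association measure the study actually used.
    """
    counts = Counter()
    for codes in records:
        # Deduplicate and sort so each pair is counted once per record
        # and always appears in the same canonical order.
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def overlap_fraction(clinical_pairs, medline_pairs):
    """Fraction of clinical associations that also appear in Medline."""
    if not clinical_pairs:
        return 0.0
    return len(clinical_pairs & medline_pairs) / len(clinical_pairs)


# Toy data: each inner list holds the ICD-9 codes for one patient
# (clinical dataset) or one citation (Medline dataset).
patients = [
    ["250.00", "401.9"],
    ["250.00", "401.9", "272.4"],
    ["493.90", "477.9"],
]
citations = [
    ["250.00", "401.9"],
    ["250.00", "585.9"],
]

clinical = pairwise_associations(patients)
medline = pairwise_associations(citations)

shared = clinical & medline        # associations present in both datasets
candidates = clinical - medline    # clinical-only, i.e. potentially novel
fraction = overlap_fraction(clinical, medline)
```

In this toy case only one of the four clinical pairs also appears in the citation set. A real pipeline would additionally need the MetaMap step that maps citation text to ICD-9 codes, which is outside the scope of this sketch.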

RESULTS:

The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations.
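As a rough worked calculation, the reported percentages imply the following approximate counts. Both inputs are rounded figures from the abstract (3.1 million and 6.6%), so the results are estimates only; the variable names are illustrative.

```python
# Rounded figures as reported in the abstract.
clinical_associations = 3_100_000   # "3.1 million associations"
overlap_per_mille = 66              # 6.6% expressed in tenths of a percent

# Integer arithmetic avoids floating-point rounding noise.
shared_estimate = clinical_associations * overlap_per_mille // 1000
clinical_only_estimate = clinical_associations - shared_estimate

print(f"Also in Medline (approx.): {shared_estimate:,}")
print(f"Clinical only (approx.):   {clinical_only_estimate:,}")
```

Under these rounded inputs, roughly 0.2 million associations were shared and roughly 2.9 million appeared only in the clinical dataset.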

DISCUSSION:

Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations.

CONCLUSIONS:

In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility.

KEYWORDS:

Data Mining; Electronic Health Records; International Classification of Diseases; Medline; Natural Language Processing; Unified Medical Language System

PMID: 24928177
PMCID: PMC4147617
DOI: 10.1136/amiajnl-2014-002767
[Indexed for MEDLINE]
Free PMC Article