J Biomed Inform. 2015 Jun;55:206-17. doi: 10.1016/j.jbi.2015.04.006. Epub 2015 Apr 24.

Toward a complete dataset of drug-drug interaction information from publicly available sources.

Author information

Department of Computer Science, Kent State University, 241 Math and Computer Science Building, Kent, OH 44242, USA.
Department of Pharmacy, School of Pharmacy and University of Washington Medicine, Pharmacy Services, University of Washington, H375V Health Sciences Bldg, Box 357630, Seattle, WA 98195, USA.
IBM T.J. Watson Research Center, 1101 Kitchawan Rd Route 134, P.O. Box 218, Yorktown Heights, NY 10598, USA.
Department of Information Systems, University of Maryland Baltimore County, Baltimore, MD 21250, USA.
Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, 622 West 168th St VC5, New York, NY 10032, USA.
Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, 622 West 168th St VC5, New York, NY 10032, USA.
Division of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W. Markham St, #782, Little Rock, AR 72205-7199, USA.
Section for Medical Expert and Knowledge-Based Systems, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria.
Biomedical Statistics & Informatics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.
Stanford Center for Biomedical Informatics Research, Stanford, CA 94305, USA.
Department of Biomedical Informatics, Suite 419, 5607 Baum Blvd, Pittsburgh, PA 15206-3701, USA.


Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified through a broad, comprehensive search were combined into a single dataset. The combined dataset merged 14 sources: 5 clinically oriented information sources, 4 natural language processing (NLP) corpora, and 5 bioinformatics/pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks and by specifying elements that can be useful for future PDDI extraction. An analysis of the data sources showed little overlap between them. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. Despite the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. The combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol.
Future work will focus on improving the methods for mapping between PDDI information sources, identifying ways to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data.
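
The overlap analysis described in the abstract can be illustrated with a minimal sketch. It is not the study's method: the abstract does not define how overlap was measured, so the sketch assumes a directional measure (the share of one source's drug pairs that also appear in another) and assumes that drug names have already been mapped to a common vocabulary. The function names, source names, and drug identifiers are all hypothetical placeholders.

```python
from itertools import permutations


def normalize_pair(drug_a, drug_b):
    """Return an order-independent, case-insensitive key for a drug pair."""
    return frozenset((drug_a.strip().lower(), drug_b.strip().lower()))


def overlap_table(sources):
    """Compute directional overlap between every ordered pair of sources.

    `sources` maps a source name to an iterable of (drug, drug) tuples.
    table[(x, y)] is the fraction of x's pairs that also appear in y.
    This directional definition is an assumption made for illustration.
    """
    normalized = {
        name: {normalize_pair(a, b) for a, b in pairs}
        for name, pairs in sources.items()
    }
    table = {}
    for x, y in permutations(normalized, 2):
        shared = normalized[x] & normalized[y]
        table[(x, y)] = len(shared) / len(normalized[x]) if normalized[x] else 0.0
    return table


# Toy data with placeholder identifiers; these are not real interactions.
toy_sources = {
    "source_a": [("drug1", "drug2"), ("drug3", "drug4"),
                 ("Drug2", "drug5"), ("drug6", "drug7")],
    "source_b": [("drug2", "drug1"), ("drug8", "drug9")],
}

toy_table = overlap_table(toy_sources)
for (x, y), fraction in sorted(toy_table.items()):
    print(f"{fraction:.0%} of {x} pairs also appear in {y}")
```

Storing each pair as a `frozenset` makes (A, B) and (B, A) count as the same interaction. In practice the harder step is the one this sketch assumes away: mapping each source's drug identifiers to a shared vocabulary before comparison.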


Drug–drug interaction; Natural language processing; Pharmacovigilance; Record linkage

[Indexed for MEDLINE]
Free PMC Article