Learning disease relationships from clinical drug trials

J Am Med Inform Assoc. 2017 Jan;24(1):13-23. doi: 10.1093/jamia/ocw003. Epub 2016 May 17.

Abstract

Objective: Our objective is to test the limits of the assumption that better learning from data in medicine requires more granular data. We hypothesize that clinical trial metadata contains latent scientific, clinical, and regulatory expert knowledge that can be accessed to draw conclusions about the underlying biology of diseases. We seek to demonstrate that this latent information can be uncovered from the whole body of clinical trials.

Materials and methods: We extract free-text metadata from 93 654 clinical drug trials and introduce a representation that allows us to compare different trials. We then construct a network of diseases using only the trial metadata. We view each trial as the summation of expert knowledge of biological mechanisms and medical evidence linking a disease to a drug believed to modulate the pathways of that disease. Our network representation allows us to visualize disease relationships based on this underlying information.
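As a rough illustration of the idea in the methods paragraph, the sketch below builds a small disease network from free-text trial metadata. The abstract does not specify the representation used, so everything here is an assumption made for illustration: the toy `trials` records, the bag-of-words vectors, the cosine similarity measure, and the `threshold` parameter are hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records; real trial metadata would carry many more fields.
trials = [
    {"disease": "asthma", "text": "inhaled corticosteroid bronchodilator airway inflammation"},
    {"disease": "copd", "text": "bronchodilator inhaled airway obstruction"},
    {"disease": "type 2 diabetes", "text": "insulin glucose metformin glycemic control"},
    {"disease": "obesity", "text": "glucose weight metformin diet"},
]

def disease_vectors(trials):
    """Pool the words of all trials for each disease into one count vector."""
    vectors = defaultdict(Counter)
    for trial in trials:
        vectors[trial["disease"]].update(trial["text"].lower().split())
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_network(trials, threshold=0.3):
    """Return {(disease_1, disease_2): similarity} for pairs at or above threshold."""
    vectors = disease_vectors(trials)
    names = sorted(vectors)
    edges = {}
    for i, d1 in enumerate(names):
        for d2 in names[i + 1:]:
            similarity = cosine(vectors[d1], vectors[d2])
            if similarity >= threshold:
                edges[(d1, d2)] = similarity
    return edges

network = build_network(trials)
for pair, weight in sorted(network.items()):
    print(pair, round(weight, 3))
```

In this toy setting, diseases become connected only because their trials share vocabulary, which mirrors the premise that trial metadata alone can carry information about disease relatedness.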

Results: Our disease network shows surprising agreement both with a disease network based on genetic data and with the Medical Subject Headings (MeSH) taxonomy, yet it also contains unique disease similarities.
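One plausible way to quantify agreement between two disease networks is the area under the ROC curve, which appears among the MeSH terms for this article. The sketch below is an assumption about how such a comparison could be set up, not a description of the authors' evaluation: similarity scores from one network are treated as predictions, and the presence of an edge in a reference network as the label. The function name `auc` and the example values are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen linked pair
    scores higher than a randomly chosen unlinked pair (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical disease-pair similarities from a trial-based network ...
pair_scores = [0.9, 0.7, 0.4, 0.2]
# ... and whether each pair is linked in a reference network (1 = linked).
reference_links = [1, 0, 1, 0]

agreement = auc(pair_scores, reference_links)
print(agreement)
```

A value near 0.5 would indicate no more agreement than chance, while values approaching 1.0 would indicate that the two networks rank disease pairs similarly.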

Discussion and conclusion: The agreement of our results with other sources indicates that our premise regarding latent expert knowledge holds. The disease relationships unique to our network may be used to generate hypotheses for future biological and clinical research as well as drug repurposing and design. Our results provide an example of using experimental data on humans to generate biologically useful information and point to a set of new and promising strategies to link clinical outcomes data back to biological research.

Keywords: clinical trials; machine learning; networks.

MeSH terms

  • Area Under Curve
  • Clinical Trials as Topic*
  • Data Mining / methods*
  • Disease*
  • Drug Evaluation*
  • Drug Therapy
  • Humans
  • Machine Learning*
  • Medical Subject Headings
  • Metadata
  • Vocabulary, Controlled