Pac Symp Biocomput. 2019;24:439-443.

Merging heterogeneous clinical data to enable knowledge discovery.

Author information

Department of Biomedical Data Science, Stanford University, 1265 Welch Rd, Stanford, CA 94305, United States.


The vision of precision medicine relies on the integration of large-scale clinical, molecular, and environmental datasets. Data integration may be thought of along two axes: data fusion across institutions and data fusion across modalities. Cross-institutional data sharing that maintains semantic integrity hinges on the adoption of data standards and a push toward ontology-driven integration. The goal should be the creation of queryable data repositories spanning primary and tertiary care providers, disease registries, research organizations, etc., to produce rich longitudinal datasets. Cross-modality sharing involves the integration of multiple data streams, from structured EHR data (diagnosis codes, laboratory tests) to genomics, imaging, monitors, and patient-generated data, including data from wearable devices. This integration presents unique technical, semantic, and ethical challenges; however, recent work suggests that multi-modal clinical data can significantly improve the performance of phenotyping and prediction algorithms, powering knowledge discovery at the patient and population levels.

[Indexed for MEDLINE]
Free PMC Article
