Sci Rep. 2019 Sep 27;9(1):13954. doi: 10.1038/s41598-019-50346-2.

A multi-source data integration approach reveals novel associations between metabolites and renal outcomes in the German Chronic Kidney Disease study.

Author information

1. Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Am Biopark 9, 93053, Regensburg, Germany.
2. Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany.
3. Department of Physics, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany.
4. Institute of Genetic Epidemiology, Department of Biometry, Epidemiology, and Medical Bioinformatics, Faculty of Medicine and Medical Center, University of Freiburg, 79106, Freiburg, Germany.
5. Department of Nephrology, Medical Center, University of Freiburg, 79106, Freiburg, Germany.
6. Institute for Functional Genomics, University of Regensburg, Am Biopark 9, 93053, Regensburg, Germany.
7. Institute of Computational Biomedicine, Weill Cornell University, New York, NY, 10021, USA.
8. Institute for Functional Genomics, University of Regensburg, Am Biopark 9, 93053, Regensburg, Germany. Wolfram.Gronwald@klinik.uni-regensburg.de.

Abstract

Omics data facilitate the gain of novel insights into the pathophysiology of diseases and, consequently, their diagnosis, treatment, and prevention. To this end, omics data are integrated with other data types, e.g., clinical, phenotypic, and demographic parameters of categorical or continuous nature. We exemplify this data integration issue for a chronic kidney disease (CKD) study, comprising complex clinical, demographic, and one-dimensional 1H nuclear magnetic resonance metabolic variables. Routine analysis screens for associations of single metabolic features with clinical parameters while accounting for confounders typically chosen by expert knowledge. This knowledge can be incomplete or unavailable. We introduce a framework for data integration that intrinsically adjusts for confounding variables. We give its mathematical and algorithmic foundation, provide a state-of-the-art implementation, and evaluate its performance by sanity checks and predictive performance assessment on independent test data. Particularly, we show that discovered associations remain significant after variable adjustment based on expert knowledge. In contrast, we illustrate that associations discovered in routine univariate screening approaches can be biased by incorrect or incomplete expert knowledge. Our data integration approach reveals important associations between CKD comorbidities and metabolites, including novel associations of the plasma metabolite trimethylamine-N-oxide with cardiac arrhythmia and infarction in CKD stage 3 patients.
