Pharmacoepidemiol Drug Saf. 2016 Mar;25(3):307-16. doi: 10.1002/pds.3893. Epub 2015 Nov 3.

A normalization method for combination of laboratory test results from different electronic healthcare databases in a distributed research network.

Author information

Department of Biomedical Informatics, Ajou University School of Medicine, Ajou University, Suwon, Korea.
Observational Health Data Sciences and Informatics, New York, NY, USA.
Janssen Research and Development LLC, Titusville, FL, USA.
Seoul National University Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea.
Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea.
Mibyeong Research Center, Korea Institute of Oriental Medicine, Daejeon, Korea.
Centre for u-Healthcare, Gachon University Gil Hospital, Korea.
Center for Medical Informatics, Seoul National University Bundang Hospital, Seongnam, Korea.



Distributed research networks (DRNs) afford statistical power by integrating observational data from multiple partners for retrospective studies. However, laboratory test results across care sites are derived using different assays from varying patient populations, making it difficult to simply combine data for analysis. Additionally, existing normalization methods are not suitable for retrospective studies. We developed a method, subgroup-adjusted normalization (SAN), that normalizes laboratory results from different data sources by adjusting for the heterogeneous clinico-epidemiologic characteristics of the data.


Subgroup-adjusted normalization renders the means and standard deviations of distributions identical under population structure-adjusted conditions. To evaluate its performance, we compared SAN with existing methods for simulated and real datasets consisting of blood urea nitrogen, serum creatinine, hematocrit, hemoglobin, serum potassium, and total bilirubin. Various clinico-epidemiologic characteristics can be applied together in SAN. For simplicity of comparison, age and gender were used to adjust population heterogeneity in this study.
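The idea of matching means and standard deviations within population subgroups can be illustrated with a minimal sketch. This is not the authors' published algorithm: it assumes a simple per-stratum rescaling, in which each source value is standardized against its own subgroup (for example an age band and gender label) and mapped onto the mean and standard deviation of the same subgroup in a reference dataset. The function names and the `(subgroup, value)` record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def subgroup_stats(records):
    """Return {subgroup: (mean, sd)} for an iterable of (subgroup, value) pairs."""
    groups = defaultdict(list)
    for subgroup, value in records:
        groups[subgroup].append(value)
    # Population SD is used here for simplicity (an assumption of this sketch).
    return {g: (mean(v), pstdev(v)) for g, v in groups.items()}


def normalize_to_reference(source, reference):
    """Rescale source values so each subgroup matches the reference mean and SD.

    Illustrative assumption: values in a subgroup that is absent from the
    reference, or that has zero spread in the source, are left unchanged.
    """
    src = subgroup_stats(source)
    ref = subgroup_stats(reference)
    normalized = []
    for subgroup, value in source:
        s_mean, s_sd = src[subgroup]
        if subgroup not in ref or s_sd == 0:
            normalized.append((subgroup, value))
            continue
        r_mean, r_sd = ref[subgroup]
        z = (value - s_mean) / s_sd  # position within the source subgroup
        normalized.append((subgroup, r_mean + z * r_sd))
    return normalized


# Toy data with hypothetical subgroup labels built from gender and age band.
source = [("F<50", 10.0), ("F<50", 14.0), ("M<50", 20.0), ("M<50", 30.0)]
reference = [("F<50", 11.0), ("F<50", 13.0), ("M<50", 22.0), ("M<50", 26.0)]
adjusted = normalize_to_reference(source, reference)
```

After this rescaling, each subgroup in `adjusted` has the same mean and standard deviation as the corresponding reference subgroup, which is the property the abstract describes under population structure-adjusted conditions.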


In simulations, SAN had the lowest standardized difference in means (SDM) and Kolmogorov-Smirnov values for all tests (p < 0.05). In a real dataset, SAN had the lowest SDM and Kolmogorov-Smirnov values for blood urea nitrogen, hematocrit, hemoglobin, and serum potassium, and the lowest SDM for serum creatinine (p < 0.05).
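The two comparison metrics can be sketched as follows. The abstract does not give the exact formulas used, so this assumes the common definitions: the SDM as the absolute difference in means divided by the square root of the average of the two sample variances, and the two-sample Kolmogorov-Smirnov statistic as the largest absolute gap between the empirical cumulative distribution functions. Function names are hypothetical.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, variance


def standardized_difference_in_means(a, b):
    """Absolute mean difference scaled by the averaged sample variances.

    Assumes at least two values per sample and non-zero pooled variance.
    """
    pooled_sd = sqrt((variance(a) + variance(b)) / 2)
    return abs(mean(a) - mean(b)) / pooled_sd


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        # Fraction of each sample that is <= x.
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


site_a = [1.0, 2.0, 3.0, 4.0]
site_b = [3.0, 4.0, 5.0, 6.0]
sdm = standardized_difference_in_means(site_a, site_b)
ks = ks_statistic(site_a, site_b)
```

Lower values of both metrics indicate that two distributions are closer, which is the sense in which the results above compare normalization methods.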


Subgroup-adjusted normalization performed better than normalization using other methods. The SAN method is applicable in a DRN environment and should facilitate analysis of data integrated across DRN partners for retrospective observational studies.


distributed research networks; electronic health records; laboratory test; normalization; pharmacoepidemiology

[Indexed for MEDLINE]
