Send to

Choose Destination
Pharmacoepidemiol Drug Saf. 2018 Sep;27(9):1034-1041. doi: 10.1002/pds.4615. Epub 2018 Jul 18.

Comparison of privacy-protecting analytic and data-sharing methods: A simulation study.

Author information

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA.
Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA.



Privacy-protecting analytic and data-sharing methods that minimize the disclosure risk of sensitive information are increasingly important due to the growing interest in utilizing data across multiple sources. We conducted a simulation study to examine how avoiding sharing individual-level data in a distributed data network can affect analytic results.


The base scenario had four sites of varying sizes with 5% outcome incidence, 50% treatment prevalence, and seven confounders. We varied treatment prevalence, outcome incidence, treatment effect, site size, number of sites, and covariate distribution. Confounding adjustment was conducted using propensity score or disease risk score. We compared analyses of three types of aggregate-level data requested from sites: risk-set, summary-table, or effect-estimate data (meta-analysis) with benchmark results of analysis of pooled individual-level data. We assessed bias and precision of hazard ratio estimates as well as the accuracy of standard error estimates.


All the aggregate-level data-sharing approaches, regardless of confounding adjustment methods, successfully approximated pooled individual-level data analysis in most simulation scenarios. Meta-analysis showed minor bias when using inverse probability of treatment weights (IPTW) in infrequent exposure (5%), rare outcome (0.01%), and small site (5,000 patients) settings. SE estimates became less accurate for IPTW risk-set approach with less frequent exposure and for propensity score-matching meta-analysis approach with rare outcomes.


Overall, we found that we can avoid sharing individual-level data and obtain valid results in many settings, although care must be taken with meta-analysis approach in infrequent exposure and rare outcome scenarios, particularly when confounding adjustment is performed with IPTW.


distributed databases; pharmacoepidemiology; privacy-protecting methods; propensity score

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for Wiley Icon for PubMed Central
Loading ...
Support Center