Am J Epidemiol. 2019 Apr 1;188(4):709-723. doi: 10.1093/aje/kwy265.

Validity of Privacy-Protecting Analytical Methods That Use Only Aggregate-Level Information to Conduct Multivariable-Adjusted Analysis in Distributed Data Networks.

Author information

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts.
Division of Research, Kaiser Permanente Northern California, Oakland, California.
Division of Clinical Immunology and Rheumatology, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama.
Kaiser Permanente Washington Health Research Institute, Seattle, Washington.
The Permanente Medical Group, Kaiser Permanente Northern California, Oakland, California.
StatLog Econometrics Inc., Montreal, Quebec, Canada.
Institute for Health Research, Kaiser Permanente Colorado, Denver, Colorado.
CreakyJoints, Global Healthy Living Foundation, Upper Nyack, New York.
Limeade, Bellevue, Washington.


Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using 2 empirical examples within a 3-site distributed data network, we tested combinations of 3 aggregate-level data-sharing approaches (risk-set, summary-table, and effect-estimate), 4 confounding adjustment methods (matching, stratification, inverse probability weighting, and matching weighting), and 2 summary scores (propensity score and disease risk score) for binary and time-to-event outcomes. We assessed the performance of combinations of these data-sharing and adjustment methods by comparing their results with results from the corresponding pooled individual-level data analysis (reference analysis). For both types of outcomes, the method combinations examined yielded results identical or comparable to the reference results in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on adjustment method; for example, risk-set data-sharing with matched or stratified analysis of summary scores produced identical results, while weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data-sharing generally performed better, while summary-table and effect-estimate data-sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing of individual-level data.
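As an illustration of the effect-estimate data-sharing approach described above, the following is a minimal sketch in which each site shares only an adjusted log effect estimate and its standard error, and the coordinating center combines them. The abstract does not state how site-specific estimates were combined; fixed-effect inverse-variance weighting is assumed here as one common choice. The function name `pool_fixed_effect` and all numeric values are hypothetical and are not taken from the study.

```python
import math

def pool_fixed_effect(estimates):
    """Combine site-level (log_effect, standard_error) pairs using
    fixed-effect inverse-variance weighting (an assumed pooling rule)."""
    # Each site's weight is the inverse of its estimate's variance.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    # Weighted average of the site-level log effect estimates.
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    # Standard error of the pooled estimate.
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical adjusted log hazard ratios and standard errors from 3 sites.
site_estimates = [
    (math.log(1.20), 0.10),
    (math.log(0.95), 0.20),
    (math.log(1.10), 0.15),
]

pooled_log_hr, pooled_se = pool_fixed_effect(site_estimates)
ci_lower = math.exp(pooled_log_hr - 1.96 * pooled_se)
ci_upper = math.exp(pooled_log_hr + 1.96 * pooled_se)
print(f"Pooled HR: {math.exp(pooled_log_hr):.3f} "
      f"(95% CI: {ci_lower:.3f}, {ci_upper:.3f})")
```

Only two numbers per site cross institutional boundaries in this sketch, which is why the approach is privacy-protecting. It is also consistent with the finding that discrepancies were more frequent with rare outcomes and small samples, where site-specific estimates can be unstable.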


Keywords: confounding control; data-sharing; disease risk score; distributed data networks; meta-analysis; multicenter studies; privacy protection; propensity score

[Available on 2020-04-01]
