Am J Epidemiol. 2018 Dec 7. doi: 10.1093/aje/kwy265. [Epub ahead of print]

Privacy-Protecting Analytical Methods Using Only Aggregate-Level Information to Conduct Multivariable-Adjusted Analysis in Distributed Data Networks.

Author information

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts.
Division of Research, Kaiser Permanente Northern California, Oakland, California.
University of Alabama at Birmingham, Birmingham, Alabama.
Kaiser Permanente Washington Health Research Institute, Seattle, Washington.
The Permanente Medical Group, Kaiser Permanente Northern California, Oakland, California.
StatLog Econometrics Inc., Montreal, Quebec, Canada.
Institute for Health Research, Kaiser Permanente Colorado, Denver, Colorado.
Global Healthy Living Foundation, CreakyJoints, Upper Nyack, New York.
Limeade®, Bellevue, Washington.


Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using two empirical examples within a three-site distributed data network, we tested combinations of three aggregate-level data-sharing approaches (risk-set, summary-table, and effect-estimate), four confounding adjustment methods (matching, stratification, inverse probability weighting, and match weighting), and two summary scores (propensity score and disease risk score) for binary and time-to-event outcomes. We assessed the performance of these combinations of data-sharing approach and adjustment method by comparing their results against those from the corresponding pooled individual-level data analysis (the reference). For both outcome types, the method combinations examined yielded results identical or comparable to the reference in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on the adjustment method; for example, risk-set data sharing with matched or stratified analysis of summary scores produced identical results, while weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data sharing generally performed better, while summary-table and effect-estimate data sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing individual-level data.
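As a rough illustration of the summary-table idea combined with stratification on a summary score, the sketch below assumes each site shares only one 2x2 count table per propensity-score stratum, and that the analysis center pools them with a Mantel-Haenszel odds ratio. The abstract does not state which estimator the authors used, so the estimator, the function name `mantel_haenszel_or`, the site labels, and all counts are hypothetical.

```python
from typing import Dict, List, Tuple

# One 2x2 table per (site, propensity-score stratum), shared as counts only:
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
Table = Tuple[int, int, int, int]


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Pooled Mantel-Haenszel odds ratio across strata (illustrative choice)."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator


# Hypothetical aggregate counts from a three-site network.
# No individual-level records leave any site.
site_tables: Dict[str, List[Table]] = {
    "site_A": [(10, 90, 5, 95), (20, 80, 10, 90)],
    "site_B": [(8, 92, 4, 96)],
    "site_C": [(15, 85, 10, 90)],
}

# The analysis center treats every site-by-stratum table as one stratum.
all_tables = [t for tables in site_tables.values() for t in tables]
pooled_or = mantel_haenszel_or(all_tables)
print(f"Pooled odds ratio: {pooled_or:.3f}")
```

Sparse strata are where an approach like this would be expected to become unstable, which is in line with the abstract's remark that summary-table data sharing more often produced discrepancies with rare outcomes and small samples.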

