Am J Epidemiol. 2018 Dec 7. doi: 10.1093/aje/kwy265. [Epub ahead of print]

Privacy-Protecting Analytical Methods Using Only Aggregate-Level Information to Conduct Multivariable-Adjusted Analysis in Distributed Data Networks.

Author information

1. Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts.
2. Division of Research, Kaiser Permanente Northern California, Oakland, California.
3. University of Alabama at Birmingham, Birmingham, Alabama.
4. Kaiser Permanente Washington Health Research Institute, Seattle, Washington.
5. The Permanente Medical Group, Kaiser Permanente Northern California, Oakland, California.
6. StatLog Econometrics Inc., Montreal, Quebec, Canada.
7. Institute for Health Research, Kaiser Permanente Colorado, Denver, Colorado.
8. Global Healthy Living Foundation, CreakyJoints, Upper Nyack, New York.
9. Limeade®, Bellevue, Washington.

Abstract

Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using two empirical examples within a three-site distributed data network, we tested combinations of three aggregate-level data-sharing approaches (risk-set, summary-table, and effect-estimate), four confounding adjustment methods (matching, stratification, inverse probability weighting, and match weighting), and two summary scores (propensity score and disease risk score) for binary and time-to-event outcomes. We assessed the performance of these combinations by comparing their results against those from the corresponding pooled individual-level data analysis (the reference). For both outcome types, the method combinations examined yielded results identical or comparable to the reference in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on the adjustment method. For example, risk-set data sharing with matched or stratified analysis of summary scores produced identical results, while weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data sharing generally performed better, while summary-table and effect-estimate data sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing individual-level data.
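
As an illustration of the effect-estimate data-sharing approach named in the abstract, the sketch below shows one common way a coordinating center might combine site-level results: fixed-effect inverse-variance weighting of log effect estimates. This is an assumption made for illustration; the abstract does not state which pooling procedure the authors used. The function name `pool_fixed_effect` and the three site values are hypothetical and are not results from the study.

```python
import math


def pool_fixed_effect(site_estimates):
    """Pool per-site log effect estimates by inverse-variance weighting.

    site_estimates: list of (log_estimate, standard_error) tuples, one per
    site. Only these two aggregate numbers leave each site; no
    individual-level records are shared.
    Returns (pooled_log_estimate, pooled_standard_error).
    """
    # Each site's weight is the inverse of its variance (1 / SE^2).
    weights = [1.0 / se ** 2 for _, se in site_estimates]
    total_weight = sum(weights)
    pooled_log = sum(
        w * est for w, (est, _) in zip(weights, site_estimates)
    ) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_log, pooled_se


# Hypothetical site-level hazard ratios and standard errors of the log
# hazard ratio (illustrative values only, not taken from the paper).
sites = [
    (math.log(1.20), 0.10),
    (math.log(1.10), 0.15),
    (math.log(1.35), 0.20),
]

pooled_log, pooled_se = pool_fixed_effect(sites)
ci_low = math.exp(pooled_log - 1.96 * pooled_se)
ci_high = math.exp(pooled_log + 1.96 * pooled_se)
print(f"Pooled HR: {math.exp(pooled_log):.3f} "
      f"(95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Because each site contributes only an estimate and its standard error, sparse data at a single site can make that site's estimate unstable, which is consistent with the abstract's observation that effect-estimate sharing more often produced discrepancies when outcomes were rare and samples small.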

PMID: 30535131
DOI: 10.1093/aje/kwy265
