J Am Med Inform Assoc. 2015 Nov;22(6):1187-95. doi: 10.1093/jamia/ocv017. Epub 2015 Jul 3.

A system to build distributed multivariate models and manage disparate data sharing policies: implementation in the scalable national network for effectiveness research.

Author information

1. Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; Information Sciences Institute, University of Southern California, Marina Del Rey, CA. dmeeker@usc.edu
2. Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093.
3. Geriatrics Research, Education, and Clinical Care Service; Department of Biomedical Informatics, Division of General Internal Medicine, Department of Biostatistics.
4. Information Sciences Institute, University of Southern California, Marina Del Rey, CA.
5. Geriatrics Research, Education, and Clinical Care Service.
6. Department of Pathology and Laboratory Medicine and Department of Internal Medicine, University of California Davis, Sacramento, CA.
7. Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego.
8. Lahey Hospital and Medical Center, Burlington, MA, USA.

Abstract

BACKGROUND:

Both centralized and federated models for sharing data in research networks currently exist. Multivariate analysis in a centralized network requires transferring patient-level data to a central computation resource. The authors implemented distributed multivariate models for federated networks, in which patient-level data are kept at each site and data exchange policies are managed in a study-centric manner.

OBJECTIVE:

The objective was to implement infrastructure that supports the functionality of some existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding important new features: algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous responses to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies.

MATERIALS AND METHODS:

Based on the requirements gathered from statisticians, administrators, and investigators from multiple institutions, the authors developed infrastructure and tools to support multisite comparative effectiveness studies using web services for multivariate statistical estimation in the SCANNER federated network.

RESULTS:

The authors implemented massively parallel (map-reduce) computation methods and a new policy management system that lets each study initiated by network participants define how data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions that have markedly different policies and operate under different state laws.

DISCUSSION AND CONCLUSION:

Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models. Multivariate analyses can be efficiently and securely conducted without patient-level data transport, allowing institutions with strict local data storage requirements to participate in sophisticated analyses based on federated research networks.
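
As an illustration of how a multivariate model can be estimated without moving patient-level data, the following is a minimal sketch of a map-reduce style logistic regression. It is not taken from the SCANNER code base: the function names (`site_summary`, `solve`, `fit_distributed`), the choice of Newton-Raphson updates, and the toy data are all assumptions made for this example. Each site computes only an aggregate gradient and Hessian (the map step), and a coordinator sums them and updates the coefficients (the reduce step).

```python
import math

def site_summary(X, y, beta):
    """Map step (hypothetical helper), run locally at one site.

    Returns the gradient and Hessian of the logistic log-likelihood at beta.
    Only these aggregates leave the site, never the rows of X or y.
    """
    p = len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-z))   # predicted probability
        w = mu * (1.0 - mu)               # variance weight
        for j in range(p):
            grad[j] += (label - mu) * row[j]
            for k in range(p):
                hess[j][k] += w * row[j] * row[k]
    return grad, hess

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_distributed(sites, n_features, iterations=25, tol=1e-10):
    """Reduce step (hypothetical coordinator): sum site aggregates, update beta."""
    beta = [0.0] * n_features
    for _ in range(iterations):
        summaries = [site_summary(X, y, beta) for X, y in sites]
        grad = [sum(g[j] for g, _ in summaries) for j in range(n_features)]
        hess = [[sum(h[j][k] for _, h in summaries) for k in range(n_features)]
                for j in range(n_features)]
        step = solve(hess, grad)          # Newton-Raphson step
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Toy data (made up): first column is the intercept, second a single covariate.
site_a = ([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5]], [0, 0, 1, 0])
site_b = ([[1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0]], [0, 1, 0, 1])

beta_distributed = fit_distributed([site_a, site_b], n_features=2)
beta_pooled = fit_distributed(
    [(site_a[0] + site_b[0], site_a[1] + site_b[1])], n_features=2)
```

Because the gradient and Hessian are sums over patients, adding the per-site aggregates gives the same update as pooling all rows in one place, so `beta_distributed` and `beta_pooled` agree up to floating-point rounding. A production system would additionally need the policy checks, authentication, and disclosure controls that the abstract describes, none of which are modeled here.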

KEYWORDS:

comparative effectiveness research; distributed analytics; federated research network; privacy-preserving network infrastructure

PMID: 26142423
PMCID: PMC4639714
DOI: 10.1093/jamia/ocv017
[Indexed for MEDLINE]
Free PMC Article
