AMIA Annu Symp Proc. 2018 Apr 16;2017:1695-1704. eCollection 2017.

A Scalable Privacy-preserving Data Generation Methodology for Exploratory Analysis.

Author information

Rutgers University, Newark, NJ, USA.
Lahore University of Management Sciences, Lahore, Punjab, Pakistan.
University of California at San Diego, La Jolla, CA, USA.


Big data coupled with precision medicine has the potential to significantly improve our understanding and treatment of complex disorders such as cancer, diabetes, and depression. However, the essential problem is that data are stuck in silos, and it is difficult to precisely identify which data would be relevant and useful for any particular type of analysis. While acquiring and accessing biomedical data requires significant effort, in many cases the data may not provide much insight into the problem at hand. Therefore, there is a need to measure the utility or relevance of additional datasets for a particular biomedical research task without direct access to the data. To this end, we develop a privacy-preserving approach to create synthetic data that can provide a first-order approximation of utility. We evaluate the proposed approach with several biomedical datasets in the context of regression and classification tasks and discuss how it can be incorporated into existing data management systems such as REDCap.

[Indexed for MEDLINE]
Free PMC Article
