Bioinformatics. 2017 Dec 1;33(23):3716-3725. doi: 10.1093/bioinformatics/btx470.

Mechanisms to protect the privacy of families when using the transmission disequilibrium test in genome-wide association studies.

Author information

1 Department of Genetics, Stanford University, Stanford, CA 94305, USA.
2 Department of Biomedical Informatics, UC San Diego, La Jolla, CA 92093, USA.

Abstract

Motivation:

Inappropriate disclosure of human genomes may put the privacy of study subjects and of their family members at risk. Existing privacy-preserving mechanisms for Genome-Wide Association Studies (GWAS) mainly focus on protecting individual information in case-control studies. Protecting privacy in family-based studies is more difficult. The transmission disequilibrium test (TDT) is a powerful family-based association test employed in many rare disease studies. It gathers information about families (most frequently involving parents, affected children and their siblings). It is important to develop privacy-preserving approaches to disclose TDT statistics with a guarantee that the risk of family 're-identification' stays below a pre-specified risk threshold. 'Re-identification' in this context means that an attacker can infer the presence of a family in a study.

Methods:

In the context of protecting family-level privacy, we developed and evaluated a suite of differentially private (DP) mechanisms for TDT. They include Laplace mechanisms based on the TDT test statistic, P-values and projected P-values, as well as exponential mechanisms based on the TDT test statistic and the shortest Hamming distance (SHD) score.
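To make the two families of mechanisms concrete, the following is a minimal Python sketch, not the dpTDT implementation. It assumes the standard McNemar-type TDT statistic (b - c)^2 / (b + c), where b and c count transmissions and non-transmissions of an allele from heterozygous parents. The function names, the `sensitivity` argument and the even split of the privacy budget `epsilon` across the K selected SNPs are all illustrative assumptions; the family-level sensitivity bounds derived in the paper are not reproduced here and must be supplied by the caller.

```python
import numpy as np


def tdt_statistic(b, c):
    """Standard TDT statistic (b - c)^2 / (b + c) per SNP; 0 where b + c == 0."""
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    total = b + c
    return np.divide((b - c) ** 2, total, out=np.zeros_like(total), where=total > 0)


def laplace_release(stats, epsilon, sensitivity, rng):
    """Laplace mechanism: add noise with scale sensitivity / epsilon.

    `sensitivity` is a caller-supplied assumption (hypothetical parameter).
    """
    stats = np.asarray(stats, dtype=float)
    return stats + rng.laplace(0.0, sensitivity / epsilon, size=stats.shape)


def exponential_top_k(scores, k, epsilon, sensitivity, rng):
    """Exponential mechanism: draw k SNP indices without replacement.

    Each draw uses budget epsilon / k (an assumed, simple budget split) and
    picks index i with probability proportional to
    exp((epsilon / k) * score_i / (2 * sensitivity)).
    """
    scores = np.asarray(scores, dtype=float)
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        logits = (epsilon / k) * scores[remaining] / (2.0 * sensitivity)
        logits -= logits.max()  # stabilise the exponentials
        probs = np.exp(logits)
        probs /= probs.sum()
        pick = int(rng.choice(len(remaining), p=probs))
        chosen.append(remaining.pop(pick))
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy transmission counts for 5 SNPs (made-up numbers for illustration).
    b = [30, 12, 8, 20, 5]
    c = [10, 11, 9, 18, 5]
    stats = tdt_statistic(b, c)
    print("TDT statistics:", stats)
    print("Laplace release:", laplace_release(stats, 1.0, 1.0, rng))
    print("Selected SNPs:", exponential_top_k(stats, 2, 1.0, 1.0, rng))
```

The same `exponential_top_k` routine could in principle take any per-SNP score, such as an SHD-style score, in place of the TDT statistic, provided a matching sensitivity is used.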

Results:

Using simulation studies with a small cohort and a large one, we showed that the exponential mechanism based on the SHD score preserves the highest utility and privacy among all proposed DP methods. We provide guidelines for applying our DP TDT to a real dataset from a Kawasaki disease study with 187 families and 906 SNPs. There are some limitations, including: (1) our implementation is too slow for real-time result generation and (2) handling missing data is still challenging.

Availability and implementation:

The software dpTDT is available at https://github.com/mwgrassgreen/dpTDT.

Contact:

mengw1@stanford.edu.

Supplementary information:

Supplementary data are available at Bioinformatics online.

PMID: 29036461
PMCID: PMC5860319
DOI: 10.1093/bioinformatics/btx470
[Indexed for MEDLINE]
Free PMC Article
