J Am Med Inform Assoc. 2013 Jan 1;20(1):95-101. doi: 10.1136/amiajnl-2012-001026. Epub 2012 Jul 21.

Reducing patient re-identification risk for laboratory results within research datasets.

Author information

1 Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37232-8340, USA. ravi.v.atreya@vanderbilt.edu

Abstract

OBJECTIVE:

To lower patient re-identification risk in biomedical research databases that contain laboratory test results while minimizing changes to the clinical interpretation of the data.

MATERIALS AND METHODS:

In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived, clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. The re-identification risk and the retention of clinical meaning were assessed for each model's perturbed laboratory results.
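
The threat model and the simpler of the two perturbation models can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform offset of up to ±5%, and the distance score (a plain sum of absolute differences) are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random

def perturb_random_offset(value, max_fraction=0.05, rng=random):
    """Shift a lab value by a uniform random offset of up to
    +/- max_fraction of the value (assumed form of a 'simple random offset')."""
    return value * (1.0 + rng.uniform(-max_fraction, max_fraction))

def rank_candidates(known_results, database):
    """Rank patient IDs from closest to farthest match.

    known_results: dict mapping test name -> value held by the attacker.
    database: dict mapping patient ID -> dict of test name -> value.
    The score is a hypothetical one: the sum of absolute differences,
    with a missing test counted as an infinite distance.
    """
    scored = []
    for patient_id, record in database.items():
        distance = sum(
            abs(record.get(test, float("inf")) - value)
            for test, value in known_results.items()
        )
        scored.append((distance, patient_id))
    scored.sort()
    return [patient_id for _, patient_id in scored]

# Toy data: the attacker knows three results belonging to patient "p2".
database = {
    "p1": {"Na": 140, "K": 4.0, "Glu": 90},
    "p2": {"Na": 135, "K": 5.1, "Glu": 160},
    "p3": {"Na": 142, "K": 3.6},
}
known = {"Na": 135, "K": 5.1, "Glu": 160}
ranking = rank_candidates(known, database)
```

In this toy case the true record ranks first because its values match exactly. Perturbing the stored values before release (for example with `perturb_random_offset`) is meant to make such a top-ranked match less reliable. A realistic score would also need to account for the different units and scales of each test, which this sketch ignores.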

RESULTS:

Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
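
One way to picture the "altered clinical meaning" percentages is as the share of results whose interpretation category changes after perturbation. The sketch below assumes a three-way low/normal/high classification against a reference range; the abstract does not state how the authors defined clinical meaning, and the potassium range of 3.5-5.0 used here is only an illustrative value.

```python
def interpret(value, low, high):
    """Classify a result against a reference range (assumed categories)."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

def fraction_meaning_changed(originals, perturbed, low, high):
    """Fraction of results whose category differs after perturbation."""
    changed = sum(
        interpret(o, low, high) != interpret(p, low, high)
        for o, p in zip(originals, perturbed)
    )
    return changed / len(originals)

# Hypothetical potassium values before and after perturbation.
originals = [3.4, 4.0, 5.0, 5.2]
perturbed = [3.6, 4.1, 5.1, 5.3]
rate = fraction_meaning_changed(originals, perturbed, low=3.5, high=5.0)
# Two of the four results cross a range boundary, so rate is 0.5.
```

Under this reading, a meaning-preserving scheme would aim to keep perturbed values on the same side of each boundary as the originals.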

DISCUSSION AND CONCLUSION:

With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate the risk of re-identification are important. A practical, expert-derived perturbation algorithm was developed and demonstrated potential utility. Similar approaches might enable administrators to select data protection scheme parameters that match their preferred trade-off between protecting privacy and retaining the clinical meaning of shared data.

PMID: 22822040
PMCID: PMC3555327
DOI: 10.1136/amiajnl-2012-001026
[Indexed for MEDLINE]