J Am Med Inform Assoc. 2016 Apr;23(e1):e131-7. doi: 10.1093/jamia/ocv154. Epub 2015 Nov 13.

A multi-institution evaluation of clinical profile anonymization.

Author information

  • 1 Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
  • 2 Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
  • 3 Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA.
  • 4 Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
  • 5 Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA.
  • 6 Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN, USA; b.malin@vanderbilt.edu.

Abstract

BACKGROUND AND OBJECTIVE:

There is an increasing desire to share de-identified electronic health records (EHRs) for secondary uses, but there are concerns that clinical terms can be exploited to compromise patient identities. Anonymization algorithms mitigate such threats while enabling novel discoveries, but their evaluation has been limited to single institutions. Here, we study how an existing clinical profile anonymization approach fares at multiple medical centers.

METHODS:

We apply a state-of-the-art k-anonymization algorithm, with k set to the standard value of 5, to the International Classification of Diseases, Ninth Revision (ICD-9) codes for patients in a hypothyroidism association study at three medical centers: Marshfield Clinic, Northwestern University, and Vanderbilt University. We assess utility when anonymizing at three population levels: all patients in 1) the EHR system; 2) the biorepository; and 3) a hypothyroidism study. We evaluate utility using 1) changes to the number included in the dataset, 2) the number of codes included, and 3) the regions where generalization and suppression were required.
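To make the k-anonymity requirement concrete, the following is a minimal sketch and not the algorithm evaluated in the study. It assumes the simplest reading of k-anonymity for code profiles: every distinct set of codes must be shared by at least k patients. The function names, the toy records, and the generalization mapping are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def profile_counts(records):
    """Count how many patients share each exact set of codes."""
    return Counter(frozenset(codes) for codes in records.values())


def is_k_anonymous(records, k=5):
    """True if every distinct code profile is shared by at least k patients."""
    return all(count >= k for count in profile_counts(records).values())


def generalize(records, mapping):
    """Replace each code with its generalized group label, if it has one."""
    return {
        pid: {mapping.get(code, code) for code in codes}
        for pid, codes in records.items()
    }


# Toy data (hypothetical): five patients, two distinct profiles.
toy_records = {
    "p1": {"244.9"}, "p2": {"244.9"}, "p3": {"244.9"},
    "p4": {"244.8"}, "p5": {"244.8"},
}
# Hypothetical generalization that merges the two codes into one group.
toy_mapping = {"244.8": "244.8|244.9", "244.9": "244.8|244.9"}

print(is_k_anonymous(toy_records, k=5))                           # False
print(is_k_anonymous(generalize(toy_records, toy_mapping), k=5))  # True
```

In this toy case, merging two codes into one group is enough to bring all five patients into a single profile, which illustrates how generalization trades code specificity for protection.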

RESULTS:

We report two notable results. First, we show that anonymizing in the context of the entire EHR yields a significantly greater quantity of data by reducing the proportion of generalized regions from ∼15% to ∼0.5%. Second, in the largest anonymization, ∼70% of the codes that needed generalization were generalized into groups of only two or three codes.
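Summaries of this kind could be computed along the following lines. This is a sketch under stated assumptions, not the study's evaluation code: it assumes a generalization is recorded as a mapping from each generalized code to its group label, and it measures the share of codes that were generalized and the share of those that ended up in groups of at most three codes. The function name, the `max_group_size` parameter, and the toy codes are hypothetical.

```python
from collections import Counter


def generalization_summary(all_codes, mapping, max_group_size=3):
    """Summarize how many codes were generalized and how small their groups are."""
    # Number of original codes merged into each group label.
    group_sizes = Counter(mapping.values())
    generalized = [code for code in all_codes if code in mapping]
    in_small_groups = [
        code for code in generalized
        if group_sizes[mapping[code]] <= max_group_size
    ]
    return {
        "fraction_generalized": len(generalized) / len(all_codes),
        "fraction_in_small_groups": (
            len(in_small_groups) / len(generalized) if generalized else 0.0
        ),
    }


# Toy data (hypothetical): eight codes, six of them generalized into two groups.
toy_codes = ["a", "b", "c", "d", "e", "f", "g", "h"]
toy_groups = {"a": "G1", "b": "G1", "c": "G2", "d": "G2", "e": "G2", "f": "G2"}

summary = generalization_summary(toy_codes, toy_groups)
print(summary)
```

Here group G1 merges two codes and G2 merges four, so only the two codes in G1 count as falling in a small group.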

CONCLUSIONS:

Sharing large volumes of clinical data in support of phenome-wide association studies is possible while safeguarding the privacy of the underlying individuals.

KEYWORDS:

anonymization; clinical codes; generalization; privacy; secondary use

PMID: 26567325
PMCID: PMC4954623 [Available on 2017-04-01]
DOI: 10.1093/jamia/ocv154
[PubMed - indexed for MEDLINE]