AMIA Annu Symp Proc. 2013 Nov 16;2013:560-9. eCollection 2013.

Location bias of identifiers in clinical narratives.

Author information

  • (1) Dept. of Pediatrics, Medical School
  • (2) School of Information
  • (3) Dept. of Biomedical Informatics; Dept. of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN.
  • (4) School of Information; School of Public Health, University of Michigan, Ann Arbor, MI.


Scrubbing identifying information from narrative clinical documents is a critical first step in preparing the data for secondary uses such as translational research. Evidence suggests that the differential distribution of protected health information (PHI) in clinical documents could provide additional features to improve the performance of automated de-identification algorithms and toolkits. However, there has been little investigation into the extent to which this phenomenon occurs in practice. To assess this issue empirically, we identified the location of PHI in 140,000 clinical notes from an electronic health record system and characterized its distribution as a function of location within a document. In addition, we calculated the 'word proximity' of nearby PHI elements to determine their co-occurrence rates. The PHI elements were found to have non-random distribution patterns. Location within a document and proximity between PHI elements might therefore be used to help de-identification systems better label PHI.
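
The two measurements described in the abstract, position of PHI within a document and word proximity between PHI elements, can be illustrated with a minimal sketch. The abstract does not specify how either quantity was computed, so everything below is an assumption made for illustration: the function names `relative_position_histogram` and `word_proximities` are hypothetical, documents are assumed to be whitespace-tokenized, PHI is assumed to be given as token indices, and proximity is assumed to mean the token distance between consecutive PHI elements.

```python
from collections import Counter


def relative_position_histogram(num_tokens, phi_indices, bins=10):
    """Count PHI tokens per equal-width bin of relative document position.

    Assumption: position is the token index divided by document length,
    so notes of different lengths can be compared on the same scale.
    """
    if num_tokens == 0:
        return [0] * bins
    counts = Counter()
    for i in phi_indices:
        # Map the token index to a bin in [0, bins - 1].
        counts[min(i * bins // num_tokens, bins - 1)] += 1
    return [counts[b] for b in range(bins)]


def word_proximities(phi_indices):
    """Token distance from each PHI element to the next one in the note."""
    ordered = sorted(phi_indices)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical note and hypothetical PHI annotations (not from the study).
tokens = "Patient John Smith seen on 03/12/2011 by Dr. Jones for follow-up of asthma .".split()
phi = [1, 2, 5, 8]  # John, Smith, 03/12/2011, Jones

histogram = relative_position_histogram(len(tokens), phi, bins=5)
distances = word_proximities(phi)
print(histogram)  # [2, 1, 1, 0, 0]
print(distances)  # [1, 3, 3]
```

Under these assumptions, a histogram that is far from uniform across many notes would indicate location bias, and a concentration of small distances would indicate that PHI elements tend to co-occur. Either quantity could in principle be supplied to a de-identification model as an extra feature.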

[PubMed - indexed for MEDLINE]
Free PMC Article