J Biomed Inform. 2017 Nov;75S:S19-S27. doi: 10.1016/j.jbi.2017.06.006. Epub 2017 Jun 7.

A hybrid approach to automatic de-identification of psychiatric notes.

Author information

1. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States.
2. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States. Electronic address: hua.xu@uth.tmc.edu.
3. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States. Electronic address: kirk.roberts@uth.tmc.edu.

Abstract

De-identification, the identification and removal of protected health information (PHI) from clinical data, is a critical step in making clinical data available for clinical applications and research. This paper presents a natural language processing system for the automatic de-identification of psychiatric notes, designed to participate in Track 1 of the 2016 CEGS N-GRID shared task. The system has a hybrid structure that combines machine learning techniques and rule-based approaches. The rule-based components exploit the structure of the psychiatric notes as well as characteristic surface patterns of PHI mentions. The machine learning components use supervised learning with rich features. In addition, system performance was boosted by integrating additional data into the training set through domain adaptation. The hybrid system achieved an overall micro-averaged F-score of 90.74 on the test set, the second-best result among all participants in the CEGS N-GRID task.
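The abstract describes a hybrid design: rule-based components catch PHI that has characteristic surface patterns, supervised models tag the remaining PHI, and performance is reported as a micro-averaged F-score. The sketch below illustrates those three ideas in miniature under stated assumptions. The two regex patterns, the `(start, end, type)` span format, the exact-match scoring criterion, the merge policy that prefers rule output on overlap, and the hard-coded stand-in for the machine learning tagger's predictions are all hypothetical. None of them are taken from the paper.

```python
import re
from typing import Iterable, List, Set, Tuple

# A PHI mention as (start offset, end offset, PHI type); this span format is an assumption.
Span = Tuple[int, int, str]

# Hypothetical surface patterns for two PHI types; NOT the authors' actual rules.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def rule_based_spans(text: str) -> Set[Span]:
    """Tag every regex match as a PHI span of the pattern's type."""
    spans: Set[Span] = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.add((match.start(), match.end(), label))
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """True if two character spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def merge(rule_spans: Set[Span], ml_spans: Iterable[Span]) -> Set[Span]:
    """Union of both outputs; on overlap, keep the rule-based span (assumed policy)."""
    merged = set(rule_spans)
    for span in ml_spans:
        if not any(overlaps(span, r) for r in rule_spans):
            merged.add(span)
    return merged


def micro_f1(gold_docs: List[Set[Span]], pred_docs: List[Set[Span]]) -> float:
    """Micro-averaged F1: pool exact-match counts over all documents and PHI types."""
    tp = sum(len(g & p) for g, p in zip(gold_docs, pred_docs))
    n_pred = sum(len(p) for p in pred_docs)
    n_gold = sum(len(g) for g in gold_docs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


text = "Dr. Lee saw pt on 3/14/2016; call 713-555-0100."
# Hypothetical stand-in for a supervised tagger's output (e.g. a sequence labeler).
ml_spans = {(4, 7, "DOCTOR"), (34, 45, "PHONE")}
gold = {(4, 7, "DOCTOR"), (18, 27, "DATE"), (34, 46, "PHONE")}

hybrid = merge(rule_based_spans(text), ml_spans)
print(sorted(hybrid))
print(micro_f1([gold], [ml_spans]), micro_f1([gold], [hybrid]))
```

In this toy run, the stand-in tagger misses the date and truncates the phone number. The regex rules supply both spans, so the merged output scores higher than the tagger alone. Pooling true positives across documents before computing precision and recall is what makes the score micro-averaged. A macro average would instead compute F per document or per PHI type and then average those scores.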

KEYWORDS:

De-identification; Natural language processing; Psychiatric notes

PMID:
28602904
PMCID:
PMC5705430
DOI:
10.1016/j.jbi.2017.06.006
[Indexed for MEDLINE]
Free PMC Article
