Eur J Epidemiol. 2018 Dec 10. doi: 10.1007/s10654-018-0470-0. [Epub ahead of print]

Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem.

Author information

1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, USA. qyzhong@mail.harvard.edu.
2. Division of Women's Mental Health, Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA.
3. Department of Psychiatry and Behavioral Neurosciences, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
4. Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA, USA.
5. Children's Hospital Informatics Program, Boston Children's Hospital, Boston, MA, USA.
6. Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, USA.
7. Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
8. Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
9. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Abstract

We developed algorithms to identify pregnant women with suicidal behavior using information extracted by natural language processing (NLP) from clinical notes in electronic medical records. Using both codified data and NLP applied to unstructured clinical notes, we first screened pregnant women in Partners HealthCare for suicidal behavior. Psychiatrists manually reviewed clinical charts to identify relevant features for suicidal behavior and to obtain gold-standard labels. Using the adaptive elastic net, we developed algorithms to classify suicidal behavior. We then validated the algorithms in an independent validation dataset. Of 275,843 women with codes related to pregnancy or delivery, 9,331 screened positive for suicidal behavior by either codified data (N = 196) or NLP (N = 9,145). Using expert-curated features, our algorithm achieved an area under the receiver operating characteristic curve (AUC) of 0.83. By fixing the positive predictive value (PPV) at a level comparable to that of diagnostic codes related to suicidal behavior (0.71), we obtained a sensitivity of 0.34, a specificity of 0.96, and a negative predictive value (NPV) of 0.83. The algorithm identified 1,423 pregnant women with suicidal behavior among the 9,331 women who screened positive. Mining unstructured clinical notes using NLP resulted in an 11-fold increase in the number of pregnant women identified with suicidal behavior, compared with relying solely on diagnostic codes.
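The evaluation step described above (reporting an AUC, then choosing an operating point with a fixed PPV and reading off sensitivity, specificity and NPV) can be illustrated with a short, dependency-free Python sketch. It assumes the classifier's predicted probabilities and the chart-review gold-standard labels are already available as lists. The function names and toy data are hypothetical. The sketch does not reproduce the adaptive elastic net itself, and its toy numbers have no relation to the study's results.

```python
from typing import Dict, List, Optional, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the share of (positive, negative)
    pairs in which the positive case scores higher (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def metrics_at(threshold: float, scores: Sequence[float],
               labels: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics when 'score >= threshold' is called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")  # undefined ratios (empty denominator) never meet a target
    return {
        "threshold": threshold,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }


def threshold_for_ppv(scores: Sequence[float], labels: Sequence[int],
                      target_ppv: float) -> Optional[Dict[str, float]]:
    """Return metrics at the lowest threshold whose PPV still reaches the target.

    PPV is not monotone in the threshold, so every distinct score is checked.
    Scanning from high to low, the last qualifying threshold is the lowest one,
    which has the highest sensitivity among thresholds meeting the PPV target.
    """
    best: Optional[Dict[str, float]] = None
    for t in sorted(set(scores), reverse=True):
        m = metrics_at(t, scores, labels)
        if m["ppv"] >= target_ppv:
            best = m
    return best


if __name__ == "__main__":
    # Hypothetical toy data: predicted probabilities and 0/1 gold-standard labels.
    scores: List[float] = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
    labels: List[int] = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
    print(f"AUC = {auc(scores, labels):.2f}")  # AUC = 0.72
    m = threshold_for_ppv(scores, labels, target_ppv=0.71)
    if m is not None:
        # threshold=0.80 PPV=0.75 sens=0.60 spec=0.80 NPV=0.67
        print(f"threshold={m['threshold']:.2f} PPV={m['ppv']:.2f} "
              f"sens={m['sensitivity']:.2f} spec={m['specificity']:.2f} "
              f"NPV={m['npv']:.2f}")
```

In the toy data, a threshold of 0.85 misses the 0.71 PPV target while the lower threshold of 0.80 meets it, which is why the sketch scans all candidate thresholds instead of stopping at the first failure.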

KEYWORDS:

Classification algorithm; Electronic medical records; Natural language processing; Pregnant women; Suicidal behavior

PMID: 30535584
DOI: 10.1007/s10654-018-0470-0
