Biol Psychiatry. 2018 Jun 15;83(12):997-1004. doi: 10.1016/j.biopsych.2018.01.011. Epub 2018 Feb 26.

High Throughput Phenotyping for Dimensional Psychopathology in Electronic Health Records.

Author information

1. Center for Quantitative Health and Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts. Electronic address: thmccoy@partners.org.
2. Harvard School of Public Health, Boston, Massachusetts; Tsinghua University, Beijing, China.
3. Center for Quantitative Health and Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
4. Harvard School of Public Health, Boston, Massachusetts.

Abstract

BACKGROUND:

Relying on diagnostic categories of neuropsychiatric illness obscures the complexity of these disorders. Capturing multiple dimensional measures of neuropathology could facilitate the clinical and neurobiological investigation of cognitive and behavioral phenotypes.

METHODS:

We developed a natural language processing-based approach to extract five symptom dimensions, based on the National Institute of Mental Health Research Domain Criteria definitions, from narrative clinical notes. Estimates of Research Domain Criteria loading were derived from a cohort of 3619 individuals with 4623 hospital admissions. We applied this tool to a large corpus of psychiatric inpatient admission and discharge notes (2010-2015), and using the same cohort we examined face validity, predictive validity, and convergent validity with gold standard annotations.
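
As a rough illustration of what scoring a narrative note along five symptom dimensions can look like, the following minimal sketch reports, for each domain, the share of that domain's terms that appear in a note. It is not the study's method: the term lists and the function name `score_note` are hypothetical, chosen only for illustration, whereas the loadings described above were estimated from the cohort data.

```python
import re

# Hypothetical term lists for illustration only; these are NOT the
# lexicon or loadings used in the study.
DOMAIN_TERMS = {
    "negative": {"anxious", "sad", "hopeless", "fear"},
    "positive": {"pleasure", "reward", "motivated", "euphoric"},
    "cognitive": {"memory", "attention", "confused", "disoriented"},
    "social": {"isolated", "withdrawn", "family", "relationship"},
    "arousal": {"insomnia", "agitated", "sleep", "restless"},
}


def score_note(text):
    """Return, per domain, the fraction of that domain's terms found in the note."""
    # Lowercase the note and keep the set of distinct alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {
        domain: len(tokens & terms) / len(terms)
        for domain, terms in DOMAIN_TERMS.items()
    }


note = "Patient is anxious and sad, reports insomnia; family visited."
scores = score_note(note)
```

Because each score is a simple fraction over a visible term list, any value can be traced back to the exact words that produced it, which is one way to read the abstract's emphasis on transparent scoring.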

RESULTS:

In mixed-effect models adjusted for sociodemographic and clinical features, greater negative and positive symptom domains were associated with a shorter length of stay (β = -.88, p = .001 and β = -1.22, p < .001, respectively), while greater social and arousal domain scores were associated with a longer length of stay (β = .93, p < .001 and β = .81, p = .007, respectively). In fully adjusted Cox regression models, a greater positive domain score at discharge was also associated with a significant increase in readmission risk (hazard ratio = 1.22, p < .001). Positive and negative valence domains were correlated with expert annotation (by analysis of variance [df = 3], R² = .13 and .19, respectively). Likewise, in a subset of patients, neurocognitive testing was correlated with cognitive performance scores (p < .008 for three of six measures).

CONCLUSIONS:

These results show that natural language processing can be used to efficiently and transparently score clinical notes in terms of cognitive and psychopathologic domains.

KEYWORDS:

Computed phenotype; Electronic health record; Natural language processing; Research Domain Criteria; Topic modeling; Transdiagnostic

PMID: 29496195
PMCID: PMC5972065 [Available on 2019-06-15]
DOI: 10.1016/j.biopsych.2018.01.011