Methods for Identifying Suicide or Suicidal Ideation in EHRs
K Haerian, MD, H Salmasian, MD, and C Friedman, PhD
Abstract
Electronic health records contain important data elements for detection of novel adverse drug reactions, genotype/phenotype identification and psychosocial factor analysis, and the role of each of these as risk factors for suicidality warrants further investigation. Suicide and suicidal ideation are documented in clinical narratives. The specific purpose of this study was to define an algorithm for automated detection of this serious event. We found that ICD-9 E-Codes had the lowest positive predictive value: 0.55 (90% CI: 0.42–0.67), while combining ICD-9 and NLP had the best PPV: 0.97 (90% CI: 0.92–0.99). A qualitative analysis and classification of the types of errors by ICD-9 and NLP automated coding compared to manual review are also discussed.
INTRODUCTION
Suicide is the fourth leading cause of death for Americans aged 15–65.1 Depression, suicidal ideation, non-fatal suicide attempts, and suicide are important areas for potential psychiatric intervention. An increased risk for suicidal behavior may be a potential complication of certain medications, and it is important to ascertain which drugs place patients at higher risk for neuropsychiatric events, such as suicide, so that appropriate precautionary measures can be enacted to improve public health. For example, the US Food and Drug Administration (FDA) has issued safety alerts and has required that suicide be included in the drug label for leukotriene-modifying agents.2
Children and adolescents may be especially affected by the adverse drug reaction of increased suicidality, defined as suicidal thinking and behavior. Evidence shows that suicidal ideation and behavior are more likely to occur in pediatric patients and young adults taking antidepressants than in those taking placebo.3 The FDA requires that all antidepressants carry a black box warning for suicidality risk in children (beginning in 2005) and young adults, ages 18–24 (beginning in 2007).4
Automated methods could be used on large data sets to find drugs associated with suicide. In order to establish best practices to leverage existing healthcare databases for drug safety studies, several partnerships have formed such as the European Union Adverse Drug Reaction (EU-ADR) group, the Mini-Sentinel program, and the Observational Medical Outcomes Partnership.5 A crucial step for pharmacoepidemiological surveillance and research is to clearly define the health outcome of interest, such as suicide, and to determine the algorithms needed to reliably extract essential data elements.
Pharmacovigilance study of drug-adverse event associations has long relied on spontaneous reporting system (SRS) databases, such as the FDA Adverse Event Reporting System (AERS). In an attempt to address certain limitations of SRSs, such as underreporting and biased reporting rates, other complementary sources of data are being explored in a systematic way.6 Electronic Health Records (EHRs) contain data elements of interest for drug studies, such as demographic information, medication lists, diseases, symptoms, and adverse events. In a hospital cohort, it may be possible to observe drug effects, both positive and negative, that were not detected in clinical trials because of limited population size, different demographics, and other patient variables.
In the Sentinel Initiative and the Mini-Sentinel pilot project, the FDA is seeking to leverage existing longitudinal healthcare data, including data in EHRs, to monitor the safety of drugs.7 As part of the Mini-Sentinel program, a recent review paper evaluated the validity of algorithms to identify suicide and suicidal ideation (SI) in the EHR by using International Classification of Diseases, Ninth Revision (ICD-9), E-codes (950–959) for intentional self-injury.8 One potential complication of using E-codes is that they are often incomplete in Medicare and private insurance claims databases.9 The authors, who reviewed all currently available validated methods for identifying suicide or SI in observational databases such as EHRs, concluded that caution should be exercised when interpreting research that relies on the ICD-9 E-codes as a measure of outcome.8 There is a need for better algorithms to identify suicidality in an automated way.
Natural Language Processing (NLP) has been used to convert free-text electronic health records into standardized output.10,11 For this work, we used MedLEE (Medical Language Extraction and Encoding System) to obtain encoded structured output from EHR notes.12,13 NLP structured output may complement the use of ICD-9 codes to identify health outcomes of interest. Some research has been published analyzing the content of suicide notes, which are written by an individual, usually to friends or family members, prior to a suicide attempt in order to provide reasons or express an emotional state. To the best of our knowledge, however, there have been no studies in the literature on identifying suicidality in EHRs using NLP.14,15 This study focuses on analyzing the clinical content of medical documents, such as admission notes and discharge summaries, to see whether we can develop a phenotype algorithm for identifying patients with suicidal thoughts or behaviors. Once a phenotype algorithm has been developed and studied, it may potentially be used in a variety of studies and across different research institutions. A recent study has shown the successful implementation and portability of a phenotype identification algorithm to identify rheumatoid arthritis using different NLP systems.16
The purpose of our study was to compare the ICD-9 E-code algorithm described by the Mini-Sentinel effort with an automated method that uses concept unique identifiers (CUIs) generated by NLP of clinical notes to identify the suicide health outcome of interest. The objectives were to gain a preliminary quantitative and qualitative understanding of the number and types of classification errors that occur using each source of data. Although the focus of this paper is suicidality, the use of ICD-9 versus NLP has important implications for methods to extract EHR data for a variety of other studies, including pharmacovigilance.
METHODS
Data Source:
After obtaining IRB approval, we used the Clinical Data Warehouse (CDW) and WebCIS, which are the research database and the clinical electronic health record interface at NewYork Presbyterian Hospital/Columbia University Medical Center, to obtain the relevant patient information. We included pediatric and adult inpatients seen during the years 2004 to 2010 in this study. For these patients, we used ICD-9 codes and admission notes and/or discharge summaries as the source for the automated algorithms. For manual review of cases, which was used to obtain a reference standard, we used the full electronic health record, including admission, discharge, psychiatry consult, nursing, social work, sign-out, and emergency room notes if available.
Methodology:
Figure 1 provides an overview of our method, which consists of 6 steps. The details of the steps are as follows:
- Step 1: ICD-9 Algorithm Applied to Collect Discharge Summaries
As the Mini-Sentinel project is a large multi-center research collaboration with the FDA, we relied on their expertise for the ICD-9 Algorithm. ICD-9 positive cases were defined as patients admitted during 2004–2010 who had one or more of the E-Codes E950–959 for intentional self-injury. Inpatient ICD-9 codes also have corresponding admission and discharge dates, which were used to collect the corresponding discharge summaries for these patients.
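The code-matching part of Step 1 can be sketched as follows. This is a minimal illustration, not the Mini-Sentinel implementation: the function names and the `(admission_id, codes)` input layout are assumptions, and the pattern accepts E-codes written with or without a decimal point because the storage format in the warehouse is not described in the text.

```python
import re

# Matches E950-E959 with an optional one-digit sub-code,
# e.g. "E950.0", "E9583", or "e959". The accepted formats are an assumption.
_SELF_INJURY_ECODE = re.compile(r"^E95[0-9](\.?\d)?$")


def is_self_injury_ecode(code: str) -> bool:
    """Return True if the code falls in the E950-E959 range."""
    return bool(_SELF_INJURY_ECODE.match(code.strip().upper()))


def flag_admissions(admissions):
    """Return ids of admissions carrying at least one E950-E959 code.

    `admissions` is a hypothetical iterable of (admission_id, list_of_codes).
    """
    return [
        admission_id
        for admission_id, codes in admissions
        if any(is_self_injury_ecode(code) for code in codes)
    ]
```

In practice the flagged admission ids would then be joined to admission and discharge dates to retrieve the matching discharge summaries.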
- Step 2: NLP of Collected Discharge Summaries
The discharge summaries were parsed by MedLEE, to obtain structured and coded data, which included UMLS codes of all clinical terms identified in the notes, including procedures, medications, diseases, and signs and symptoms not related to suicide. Therefore, it was necessary to identify the terms in the NLP output that were relevant for identifying the patients of interest.
- Step 3: Choose Preliminary NLP Algorithm
This step involved identifying the relevant NLP concepts generated from the notes corresponding to the ICD-9 identified suicidal patients. NLP concepts which were negated or associated with family history were first filtered out. The aligned list of ICD-9 and NLP concepts was manually reviewed by a clinician, who annotated all possible NLP concepts of interest (Preliminary NLP Algorithm).
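The filtering described in this step might look like the sketch below. The `Concept` record and its field names are hypothetical stand-ins for the structured NLP output, whose actual schema is not given here, and the CUI strings are placeholders rather than real UMLS identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical flattened view of one coded concept from an NLP parse."""
    patient_id: str
    cui: str                      # placeholder identifier, not a real UMLS CUI
    negated: bool = False         # e.g. "denies suicidal ideation"
    family_history: bool = False  # e.g. "mother attempted suicide"


def keep_asserted(concepts):
    """Drop concepts that are negated or attributed to family history."""
    return [c for c in concepts if not c.negated and not c.family_history]


def candidate_cuis(concepts):
    """Distinct CUIs left after filtering, ready for clinician review."""
    return sorted({c.cui for c in keep_asserted(concepts)})
```

The output of `candidate_cuis` corresponds to the list that a clinician would annotate by hand; the annotation itself is a manual step and is not modeled.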
- Step 4: Training Set of Patients Selected
Fifty patients who had one of the concepts in the Preliminary NLP Algorithm were randomly selected. The records for these patients were manually read by a clinician and classified as positive or negative (no suicidal ideation or behavior documented) for suicidality.
- Step 5: Obtain Revised NLP Algorithm
Based on the training set, a concept frequency count for our preliminary algorithm was calculated for the positive (A) and negative (B) groups. We used the preliminary terms in the selected patient reports from Step 4 to identify the frequency of positive and negative cases at the term level. These data were used to create our Revised NLP Algorithm, which excluded any CUIs that were wrong more often than correct (where B was more than twice A), because these terms frequently denoted non-suicidal cases. For example, overdose frequently denoted accidental overdose and not a suicide attempt.
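The exclusion rule can be written down compactly. The sketch below assumes the stricter reading of the rule stated in the text, namely that a CUI is dropped only when its negative count B exceeds twice its positive count A; the input format and function name are illustrative.

```python
from collections import Counter


def revise_algorithm(preliminary_cuis, observations):
    """Return the CUIs kept after applying the B > 2 * A exclusion rule.

    `observations` is a hypothetical iterable of (cui, is_positive) pairs,
    one per occurrence of a preliminary concept in a manually reviewed
    training record. A = positive count, B = negative count.
    """
    positive = Counter()
    negative = Counter()
    for cui, is_positive in observations:
        if cui not in preliminary_cuis:
            continue
        if is_positive:
            positive[cui] += 1
        else:
            negative[cui] += 1
    # Counter returns 0 for unseen keys, so concepts that never appeared
    # in the training set are kept.
    return {
        cui for cui in preliminary_cuis
        if not negative[cui] > 2 * positive[cui]
    }
```

Whether concepts with no training observations should be kept or dropped is a design choice the text does not address; this sketch keeps them.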
- Step 6: Evaluation
A block random sample of EHR records was drawn from each algorithm group to create our evaluation set. Patients from our training set were excluded from the evaluation set for the Revised Algorithm. To assure adequate sample size, we selected a total of 280 cases, which provides a confidence level of 90% and a margin of error of 5% according to the Krejcie & Morgan formula.16 A manual review of the evaluation set was performed by two physicians, who cross-reviewed a subset (n=30) of patients to calculate Cohen’s Kappa for inter-rater agreement. The manual review classification of cases was used as our gold standard to calculate the Positive Predictive Value (PPV) for a) ICD-9 Algorithm alone, b) Preliminary NLP Algorithm alone, c) Revised NLP Algorithm alone, and d) overlapping ICD-9 and NLP. In addition to calculating a PPV, a qualitative analysis was performed when an algorithm identified a case but manual chart review concluded that no suicidality was present, in order to determine the reasons for false positive cases.
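For readers who want to reproduce the sample size reasoning, the Krejcie & Morgan formula is s = X²NP(1−P) / (d²(N−1) + X²P(1−P)). The sketch below assumes the usual conservative proportion P = 0.5 and the chi-square value for one degree of freedom at 90% confidence (2.706, the square of 1.645). The population size used by the authors is not stated, so any value passed in is illustrative.

```python
import math


def krejcie_morgan(population: int,
                   chi2: float = 2.706,   # chi-square, 1 df, 90% confidence
                   p: float = 0.5,        # assumed population proportion
                   margin: float = 0.05) -> int:
    """Required sample size for a finite population (rounded up)."""
    numerator = chi2 * population * p * (1 - p)
    denominator = margin ** 2 * (population - 1) + chi2 * p * (1 - p)
    return math.ceil(numerator / denominator)
```

Under these settings the required size approaches X²P(1−P)/d², roughly 270.6, as the population grows, so the formula never asks for more than 271 cases and a sample of 280 satisfies it for any population size.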
RESULTS
ICD-9 Algorithm aligned with NLP to isolate related CUIs
Use of the ICD-9 algorithm identified 469 cases that were also associated with discharge summaries. When the discharge summaries were parsed by MedLEE, output was generated consisting of 5644 disease or symptom concepts. Review of these concepts took the clinician 3.5 hours and resulted in 31 codes chosen for the Preliminary Algorithm.
Preliminary NLP Algorithm and Revised NLP Algorithm
Based on the process described in methods, three concepts (1: poisoning/toxic ingestion, 2: overdose, and 3: suicide attempt) were removed because they resulted in a majority of false positives in our manual review. Our revised algorithm contained 28 concepts, which are shown in Table 1.
Evaluation Set of Cases Manually Reviewed
Two physicians reviewed a collective total of 280 patient EHR records. They cross-reviewed the same 30 cases and had an inter-rater agreement Kappa of 0.867 (95% confidence interval: 0.688 to 1.045). This strength of agreement is considered to be ‘very good’.
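Cohen’s Kappa compares observed agreement with the agreement expected by chance, kappa = (p_o − p_e) / (1 − p_e). The sketch below is a plain-Python version of that definition; the label lists in the usage note are invented for illustration and are not the study’s review data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases with identical labels
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal rate, per label
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    # Undefined (division by zero) when both raters use a single label
    return (observed - expected) / (1 - expected)
```

As a hypothetical example, two raters who agree on 28 of 30 cases, with each rater calling 15 cases positive, would obtain a kappa of about 0.867.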
Comparison of Algorithms
Using ICD-9 E-Codes, 469 potential cases were identified. The Revised NLP Algorithm identified 4087 potential cases. Use of the intersection of ICD-9 and NLP cases identified 260 potential cases. The potential cases for each algorithm were compared to the evaluation set of cases manually reviewed by physicians to determine the positive predictive value for each of the algorithms. The results are shown in Table 2.
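The comparison can be sketched as a set intersection followed by a PPV calculation against the manual review labels. The interval method used in the study is not stated; the Wilson score interval below (z = 1.645 for 90% confidence) is an assumption chosen for illustration, and all identifiers are hypothetical.

```python
import math


def combined_cases(icd9_cases: set, nlp_cases: set) -> set:
    """Cases flagged by both the ICD-9 and the NLP algorithm."""
    return icd9_cases & nlp_cases


def ppv_with_wilson_ci(true_positives: int, flagged: int, z: float = 1.645):
    """PPV = TP / (TP + FP) with a Wilson score interval.

    `flagged` is the number of reviewed cases the algorithm identified;
    `true_positives` is how many of them manual review confirmed.
    """
    if flagged <= 0 or not 0 <= true_positives <= flagged:
        raise ValueError("need 0 <= true_positives <= flagged and flagged > 0")
    p = true_positives / flagged
    denom = 1 + z ** 2 / flagged
    centre = (p + z ** 2 / (2 * flagged)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / flagged + z ** 2 / (4 * flagged ** 2)
    ) / denom
    return p, centre - half_width, centre + half_width
```

Requiring a case to appear in both sets trades coverage for precision: the intersection is necessarily no larger than either input set.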
Error Analysis
For the cases where the automated algorithm (either ICD-9 or Revised NLP) identified a potential case, but manual review determined that it was a false positive, a qualitative classification of the error was performed. The results for ICD-9 errors can be seen in Table 3 and the results for NLP errors can be seen in Table 4.
DISCUSSION
More than ten years ago, the Surgeon General called specifically for expanded data collection and suicide research to identify gaps in the scientific knowledge on suicide.18 Since that time, scientists have linked several medications to an increased risk for suicidality. The European Union Adverse Drug Reaction (EU-ADR) project identified suicide as one of the key 23 adverse events to monitor for signal detection in pharmacovigilance in EHRs.19 Our study provides an examination of methods that can be used to extract relevant cases for both pharmacovigilance studies and for expanded data collection and studies on suicidality.
We found that ICD-9 E-Codes from the FDA Mini-Sentinel Project had a low PPV: 0.55 (90% CI: 0.42–0.67), and our final NLP algorithm had a PPV of 0.60 (90% CI: 0.52–0.67). The best performing algorithm was the one which combined the ICD-9 codes and NLP concepts by requiring the presence of at least one code from each algorithm, and resulted in a PPV of 0.98 (90% CI: 0.92–0.99). It is difficult to compare an algorithm developed to specifically identify suicidality to published algorithms developed to identify other diseases. This is because in a clinical context, it is important to document both the presence and absence of suicidality in depressed patients. As standard of care, in history taking for a patient with mental illness, a clinician would ask about suicidal ideation (SI), whereas with other clinical conditions, such as diabetes or rheumatoid arthritis, there is less need to document the absence of the condition. There are also limited data elements for suicidality, unlike other clinical conditions which may have abnormal laboratory values or imaging studies, which could also be incorporated into the detection algorithm.
Through the combination of ICD-9 and NLP, our algorithm for identifying patients with suicidality obtained a high PPV of 0.98. There were 260 cases identified during our six-year study period. The results of this research have a number of potential applications. They can support data mining and other pharmacovigilance studies that require reliable identification of a patient cohort.
Our research also has implications for research areas other than pharmacovigilance. For example, phenotype extraction from EHR data is being combined with genome-wide association studies (GWAS). The Electronic Medical Records and Genomics (eMERGE) network has developed phenotype definitions for approximately 13 diseases and conditions, such as dementia/Alzheimer’s.19 Our research for identifying suicidality would provide another possible phenotype algorithm for this collaborative consortium. There are many questions about susceptibility to suicidality. Familial/genetic relationships have been found linking schizophrenia and bipolar disorder. It is possible that a genetic predisposition to suicide may exist, which could be explored via GWAS. Published exploratory analysis has implicated a specific gene variant in suicide-attempt behavior.21
Our suicide phenotype algorithm could also be used to study the relationship between the presence of psychosocial stressors and the incidence of suicidality. Some of the relevant psychosocial stressors that could be examined include post-traumatic stress disorder, post-partum depression, eating disorders, and exposure to domestic abuse. High-risk patients may benefit from more intensive psychological treatment or therapy. Lessons learned may supplement the US National Strategy for Suicide Prevention.22
Beyond potential discovery of other medications that result in a higher risk of suicide or the identification of genotypic or psychosocial predisposing factors, identifying patients with suicidal ideation may be beneficial for clinical decision support. There are already several medications on the market that are labeled with the side effect of suicidality. An alert could be created that combines the list of drugs known or strongly suspected to cause this ADR with the phenotype identification of suicidal patients, to warn clinicians about administration of a potentially hazardous drug.
Our study has several limitations. For pharmacovigilance purposes, it would be important to also identify patients with fatal suicide attempts. Because it uses EHR data only, our study underrepresents patients whose attempts resulted in death prior to hospital admission, as these cases would instead be routed to the Medical Examiner’s office. Our study also presents a limited analysis of the potential of an existing NLP system to identify cases. The NLP system MedLEE was not specifically trained to identify suicidality. For example, there was ambiguity in the processing of the abbreviation SI, which was not handled correctly by the system. MedLEE currently recognizes SI as suicidal ideation, but manual chart review revealed that it was also used as an abbreviation for sacroiliac and Staten Island. Modification of the program to resolve this ambiguity and fix some of the errors could increase the PPV of NLP alone. An example of a simple modification would be recognizing the compound concept ‘depression with suicidal ideation’. Currently ‘no depression with suicidal ideation’ is coded as (no depression) with (suicidal ideation) but should be coded as no (depression with suicidal ideation). Our study was also limited by the lack of a unified terminology related to self-directed violence. Future work in this area, including potential adoption of the Centers for Disease Control and Prevention’s (CDC) Self-Directed Violence Classification System, may be beneficial.23
In conclusion, combining ICD-9 E-Codes (950–959) for intentional self-injury with the NLP-extracted concepts shown in Table 1 yields a better PPV for identifying suicidality than NLP or ICD-9 alone. Qualitative classification of coding errors demonstrated that non-intentional patient overdose was a consistent source of false positives across methods.
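The negation-scope issue for compound concepts can be illustrated with a toy matcher. This is not how MedLEE works internally; it is a simplified sketch in which the term lists and the negation cues ("no", "denies") are assumptions. It shows only that matching the compound phrase before its parts lets a leading negation cover the whole phrase.

```python
import re

# Illustrative vocabularies; compound terms are tried before simple ones.
COMPOUND_TERMS = ["depression with suicidal ideation"]
SIMPLE_TERMS = ["suicidal ideation", "depression"]
NEGATION_PREFIX = r"(\b(?:no|denies)\s+)?"


def extract_concepts(sentence: str):
    """Return (term, negated) pairs found in the sentence."""
    text = sentence.lower()
    found = []
    for term in COMPOUND_TERMS + SIMPLE_TERMS:
        pattern = re.compile(NEGATION_PREFIX + re.escape(term))

        def record(match, term=term):
            found.append((term, match.group(1) is not None))
            # Blank out the matched span so its parts are not matched again
            return " " * len(match.group(0))

        text = pattern.sub(record, text)
    return found
```

With this ordering, ‘no depression with suicidal ideation’ yields a single negated compound concept rather than a negated ‘depression’ plus an asserted ‘suicidal ideation’.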
Acknowledgments
The authors thank Lyuda Ena for assistance with MedLEE. This research is supported by National Library of Medicine grants: R01 LM010016, R01 LM010016-0S1, R01 LM010016-0S2, R01 LM008635, and R01 LM06910. KH is supported by the NLM training grant: T15 LM007079.


