Am J Psychiatry. 2015 Apr;172(4):363-72. doi: 10.1176/appi.ajp.2014.14030423. Epub 2014 Dec 12.

Validation of electronic health record phenotyping of bipolar disorder cases and controls.

Author information

From Research Information Systems and Computing, Partners HealthCare System, Boston; the Laboratory of Computer Science, the Department of Neurology, the Department of Psychiatry, the Center for Anxiety and Traumatic Stress Disorders, the Center for Experimental Drugs and Diagnostics, and the Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston; the Center for Biomedical Informatics, Harvard Medical School, Boston; the Department of Biostatistics, Harvard School of Public Health, Boston; the Psychotic Disorders Division, McLean Hospital, Belmont, Mass.; the Department of Public Health and Preventive Medicine, Oregon Health and Science University, Portland; and the Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York.

Abstract

OBJECTIVE:

The study was designed to validate the use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects.

METHOD:

EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semistructured interviews of 190 patients by trained clinicians blind to EHR diagnosis.
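The abstract says the natural language processing algorithm was set to 95% specificity but does not say how that operating point was chosen. Below is a minimal sketch of one common approach, assuming the trained model gives each chart a numeric score and that chart-review labels (1 = bipolar disorder, 0 = not) are available. The function name `threshold_for_specificity` is hypothetical and is not taken from the study.

```python
from typing import Sequence


def threshold_for_specificity(scores: Sequence[float], labels: Sequence[int],
                              target_specificity: float = 0.95) -> float:
    """Return the smallest score cutoff whose specificity on chart-reviewed
    patients is at least target_specificity.

    A patient is classified as a case when score >= cutoff. Specificity is
    TN / (TN + FP), computed over the chart-reviewed non-cases (label 0).
    Choosing the smallest qualifying cutoff keeps sensitivity as high as
    possible under the specificity constraint (an assumed design choice).
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one chart-reviewed non-case")
    n_neg = len(negatives)
    # Classifications only change at observed score values, so checking each
    # distinct score in ascending order is enough.
    for cutoff in sorted(set(scores)):
        true_negatives = sum(1 for s in negatives if s < cutoff)
        if true_negatives / n_neg >= target_specificity:
            return cutoff
    # No observed score qualifies: call nobody a case.
    return float("inf")
```

Raising the cutoff can only turn predicted cases into predicted non-cases, so specificity never decreases as the loop advances, and the first cutoff that meets the target is the least restrictive one.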

RESULTS:

The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses.
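Each PPV reported here is the fraction of EHR-classified patients whose classification the direct interview confirmed: PPV = TP / (TP + FP). The sketch below shows that calculation. The function name and the toy inputs are illustrative assumptions, not study data. For the control rule, the same function applies with "EHR-classified control" as the positive call and "no bipolar disorder at interview" as the confirmation.

```python
from typing import Sequence


def positive_predictive_value(ehr_positive: Sequence[bool],
                              interview_positive: Sequence[bool]) -> float:
    """PPV = TP / (TP + FP), counted only over patients the EHR rule calls positive.

    ehr_positive[i]       -- the EHR-based rule classified patient i as positive
    interview_positive[i] -- the blinded direct interview confirmed that status
    """
    tp = fp = 0
    for ehr, interview in zip(ehr_positive, interview_positive):
        if not ehr:
            continue  # patients the rule did not call positive do not enter the PPV
        if interview:
            tp += 1
        else:
            fp += 1
    if tp + fp == 0:
        raise ValueError("the EHR rule classified no interviewed patient as positive")
    return tp / (tp + fp)


# Toy example (hypothetical data): 10 EHR-positive patients, 8 confirmed at interview.
ehr_calls = [True] * 10 + [False] * 3
interview = [True] * 8 + [False] * 2 + [True] * 3
print(positive_predictive_value(ehr_calls, interview))  # 0.8
```

Because PPV conditions on the EHR call, it shows how trustworthy an EHR-derived case or control label is when that label is used to accrue a sample for genetic analysis.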

CONCLUSIONS:

Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.

PMID:
25827034
PMCID:
PMC4441333
DOI:
10.1176/appi.ajp.2014.14030423
[Indexed for MEDLINE]
Free PMC Article
