Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning

Kevin Bretonnel Cohen; Benjamin Glass; Hansel M Greiner; Katherine Holland-Bouley; Shannon Standridge; Ravindra Arya; Robert Faist; Diego Morita; Francesco Mangano; Brian Connolly; Tracy Glauser; John Pestian

doi:10.4137/BII.S38308

Biomed Inform Insights. 2016 May 22;8:11-8. doi: 10.4137/BII.S38308. eCollection 2016.

Affiliations

¹ Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA.
² Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA.
³ Division of Neurology, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA.
⁴ Division of Pediatric Neurosurgery, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA.

Abstract

We describe the development and evaluation of a system that uses machine learning and natural language processing to identify potential candidates for surgical intervention in drug-resistant pediatric epilepsy. The data consist of free-text clinical notes extracted from the electronic health record (EHR). Known clinical outcomes from the EHR and manual chart annotations each provide a gold standard for patient status. Two hypotheses are tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do, and 2) machine learning methods can identify candidates earlier than physicians do. The hypotheses are tested by systematically evaluating the effects of data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. Feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. Classification performance was also better than the highest agreement between two annotators, even at one year before documented surgery referral. These results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and to reducing the lag time to surgery referral.
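
For readers unfamiliar with the metric, the F-measure reported in the abstract combines precision and recall into a single score. The sketch below is illustrative only: it assumes the balanced form (F1, beta = 1), which the abstract does not state explicitly, and the function name and counts are hypothetical rather than taken from the study.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Compute the F-measure from raw counts.

    tp: true positives (e.g., surgery candidates correctly flagged)
    fp: false positives (non-candidates flagged as candidates)
    fn: false negatives (candidates the classifier missed)
    beta: weight of recall relative to precision; 1.0 gives balanced F1
          (assumed here; the abstract does not specify the variant).
    """
    if tp == 0:
        # No correct positive predictions: precision and recall are both
        # zero (or undefined), so the score is reported as zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration; these are not results from the paper.
score = f_measure(tp=40, fp=10, fn=10)
print(round(score, 2))
```

With these made-up counts, precision and recall are both 0.8, so the balanced F-measure is also about 0.8. Because the score ignores true negatives, it is sensitive to class balance, which is one of the factors the study varies.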

Keywords: epilepsy; epilepsy surgery; machine learning; natural language processing; neurosurgery.