Clin Epidemiol. 2018 Oct 16;10:1509-1521. doi: 10.2147/CLEP.S160764. eCollection 2018.

A phenotyping algorithm to identify acute ischemic stroke accurately from a national biobank: the Million Veteran Program.

Author information

Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Cooperative Studies Program, VA Boston Healthcare System, Boston, MA, USA.
Department of Medicine, Cardiology Section, Boston Medical Center, Boston University School of Medicine, Boston, MA, USA.
Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
Department of Medicine, Division of Aging, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Department of Neurology, Baltimore VA Medical Center and University of Maryland School of Medicine, Baltimore, MD, USA.
Harvard T. H. Chan School of Public Health, Boston, MA, USA.
Atlanta VA Medical Center, Decatur, GA, USA.
Department of Medicine, Division of Cardiovascular Disease, Emory University School of Medicine, Atlanta, GA, USA.



Large databases provide an efficient way to analyze patient data. A challenge with these databases is the inconsistency of ICD codes and a potential for inaccurate ascertainment of cases. The purpose of this study was to develop and validate a reliable protocol to identify cases of acute ischemic stroke (AIS) from a large national database.


Using the national Veterans Affairs electronic health-record system, Centers for Medicare & Medicaid Services, and National Death Index data, we developed an algorithm to identify cases of AIS. Using a combination of inpatient and outpatient ICD9 codes, we selected cases of AIS and controls from 1992 to 2014. Diagnoses determined after medical-chart review were considered the gold standard. We used a machine-learning algorithm and a neural network approach to identify AIS from ICD9 codes and electronic health-record information and compared this approach with a previous rule-based stroke-classification algorithm.
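To make the idea of selecting candidates from inpatient and outpatient ICD9 codes concrete, the following is a minimal sketch of a rule-based screen. It is not the study's algorithm or its code list: the `Encounter` record, the `screen_candidates` function, the candidate code pattern (433.x1, 434.x1, 436), and the threshold of two outpatient codes are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Set


class Encounter(NamedTuple):
    """Hypothetical record layout; real EHR extracts will differ."""
    patient_id: str
    setting: str  # "inpatient" or "outpatient"
    icd9: str


# Illustrative candidate codes only (an assumption, not the study's list):
# 433.x1 and 434.x1 (occlusion with cerebral infarction) and 436.
CANDIDATE_CODE = re.compile(r"^(43[34]\.\d1|436)$")


def screen_candidates(encounters: Iterable[Encounter],
                      min_outpatient: int = 2) -> Set[str]:
    """Flag patients with >=1 inpatient or >=min_outpatient outpatient codes.

    The thresholds are hypothetical; flagged patients would still need
    chart review or a trained classifier to confirm the diagnosis.
    """
    inpatient = set()
    outpatient = Counter()
    for enc in encounters:
        if not CANDIDATE_CODE.match(enc.icd9):
            continue
        if enc.setting == "inpatient":
            inpatient.add(enc.patient_id)
        elif enc.setting == "outpatient":
            outpatient[enc.patient_id] += 1
    repeated = {pid for pid, n in outpatient.items() if n >= min_outpatient}
    return inpatient | repeated


if __name__ == "__main__":
    demo = [
        Encounter("A", "inpatient", "434.91"),
        Encounter("B", "outpatient", "433.11"),
        Encounter("B", "outpatient", "433.11"),
        Encounter("C", "outpatient", "436"),     # only one outpatient code
        Encounter("D", "inpatient", "433.10"),   # code without infarction
    ]
    print(sorted(screen_candidates(demo)))
```

A screen of this kind only produces candidates; the abstract describes a machine-learning step and chart review as the gold standard on top of code-based selection.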


We reviewed administrative hospital data, ICD9 codes, and medical records of 268 patients in detail. Compared with the gold standard, this AIS algorithm had a sensitivity of 91%, specificity of 95%, and positive predictive value of 88%. A total of 80,508 highly likely cases of AIS were identified using the algorithm in the Veterans Affairs national cardiovascular disease-risk cohort (n=2,114,458).
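The three validation measures reported above follow from a 2x2 comparison of algorithm output against chart review. The sketch below shows the standard definitions. The `ConfusionCounts` class and the example counts are hypothetical and are not the study's data; only the cohort figures (80,508 of 2,114,458) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of algorithm calls versus the chart-review gold standard."""
    tp: int  # algorithm positive, chart review positive
    fp: int  # algorithm positive, chart review negative
    tn: int  # algorithm negative, chart review negative
    fn: int  # algorithm negative, chart review positive


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true cases that the algorithm flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-cases that the algorithm correctly leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Share of flagged patients who are true cases."""
    return c.tp / (c.tp + c.fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the study).
    example = ConfusionCounts(tp=40, fp=10, tn=90, fn=10)
    print(f"sensitivity: {sensitivity(example):.2f}")
    print(f"specificity: {specificity(example):.2f}")
    print(f"PPV:         {positive_predictive_value(example):.2f}")

    # Cohort figures quoted in the abstract: share of the cohort flagged.
    flagged, cohort = 80_508, 2_114_458
    print(f"flagged share of cohort: {flagged / cohort:.1%}")
```

Unlike sensitivity and specificity, positive predictive value depends on how common the condition is in the reviewed sample, so it may differ when the same algorithm is applied to another database.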


Our algorithm had high specificity for identifying AIS in a nationwide electronic health-record system. This approach may be utilized in other electronic health databases to accurately identify patients with AIS.


Keywords: acute ischemic stroke; administrative health data; algorithm; big data; cerebrovascular accident; large databases
