Clin Epidemiol. 2018 Jul 6;10:771-788. doi: 10.2147/CLEP.S166545. eCollection 2018.

Automated data-adaptive analytics for electronic healthcare data to study causal treatment effects.

Author information

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.



Decision makers in health care increasingly rely on nonrandomized database analyses to assess the effectiveness, safety, and value of medical products. Health care data scientists use data-adaptive approaches that automatically optimize confounding control to study causal treatment effects. This article summarizes relevant experiences and extensions.


The literature on the use of the high-dimensional propensity score (HDPS) and related approaches in health care database analyses was reviewed, including methodological articles on their performance and improvement. Articles were grouped into applications, comparative performance studies, and statistical simulation experiments.


The HDPS algorithm has been referenced frequently in a variety of clinical applications and with data sources from around the world. The appeal of HDPS for database research rests on 1) its superior performance in situations of unobserved confounding, through proxy adjustment, 2) its predictable efficiency in extracting confounding information from a given data source, 3) its ability to automate estimation of causal treatment effects to the extent achievable in a given data source, and 4) its independence from data source and coding system. Extensions of the HDPS approach have focused on improving variable selection when exposure is sparse, on using free-text information, and on adjusting for time-varying confounding.
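The abstract describes HDPS by its properties rather than its mechanics. As a rough illustration of the proxy-adjustment idea, the Python sketch below shows two steps commonly associated with HDPS: turning per-patient code counts into "once", "sporadic", and "frequent" binary indicators, and ranking candidate covariates by the Bross formula for confounding bias so that only the top k enter a propensity score model. It is a simplified sketch under stated assumptions, not the published HDPS implementation: the function names `recurrence_covariates`, `bross_bias`, and `prioritize` are hypothetical, and the percentile method, zero-cell handling, and tie-breaking rule are choices made for illustration.

```python
import math
import statistics


def recurrence_covariates(code_counts):
    """Expand per-patient counts of one code into binary proxy covariates.

    Hypothetical helper. Thresholds: 'once' (count >= 1), 'sporadic'
    (count >= median) and 'frequent' (count >= 75th percentile), where the
    median and percentile are taken over patients who have the code at all.
    Indicators whose threshold repeats an earlier one are dropped.
    """
    positive = sorted(c for c in code_counts if c > 0)
    if not positive:
        return {}
    median = statistics.median(positive)
    if len(positive) > 1:
        # 'inclusive' interpolation is an assumption; other methods differ slightly.
        q3 = statistics.quantiles(positive, n=4, method="inclusive")[2]
    else:
        q3 = positive[0]
    covariates = {}
    last = 0
    for name, threshold in (("once", 1), ("sporadic", median), ("frequent", q3)):
        if threshold > last:
            covariates[name] = [int(c >= threshold) for c in code_counts]
            last = threshold
    return covariates


def bross_bias(covariate, exposure, outcome):
    """Bross multiplicative bias for one binary covariate C.

    bias = (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1), where p_c1 and
    p_c0 are the prevalence of C among exposed and unexposed patients and
    rr_cd is the outcome risk ratio for C present vs absent (inverted when
    below 1). Returns 1.0 (no bias) when a term is undefined; this zero-cell
    rule is an assumption, not taken from the source.
    """
    exposed = [c for c, e in zip(covariate, exposure) if e]
    unexposed = [c for c, e in zip(covariate, exposure) if not e]
    with_c = [o for c, o in zip(covariate, outcome) if c]
    without_c = [o for c, o in zip(covariate, outcome) if not c]
    if not (exposed and unexposed and with_c and without_c):
        return 1.0
    risk_with = sum(with_c) / len(with_c)
    risk_without = sum(without_c) / len(without_c)
    if risk_with == 0 or risk_without == 0:
        return 1.0
    p_c1 = sum(exposed) / len(exposed)
    p_c0 = sum(unexposed) / len(unexposed)
    rr_cd = risk_with / risk_without
    if rr_cd < 1:
        rr_cd = 1 / rr_cd
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def prioritize(candidates, exposure, outcome, k):
    """Return the names of the k candidates with the largest |log(bias)|.

    Ties are broken alphabetically by name (an illustrative choice).
    """
    scored = {
        name: abs(math.log(bross_bias(cov, exposure, outcome)))
        for name, cov in candidates.items()
    }
    return sorted(scored, key=lambda name: (-scored[name], name))[:k]
```

For example, a covariate that is more prevalent among exposed patients and raises outcome risk gets a bias above 1 and ranks ahead of one that is equally prevalent in both groups (bias exactly 1). In a full analysis, the selected indicators would typically be added to investigator-specified covariates in a propensity score model; that step is omitted here.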


Semiautomated and optimized confounding adjustment in health care database analyses has proven successful across a wide range of settings. Machine-learning extensions further automate its use in estimating causal treatment effects across a range of data scenarios.


Keywords: artificial intelligence; automation; causal conclusions; confounding (epidemiology); confounding adjustment; health care databases; high-dimensional data; machine learning; propensity scores; real-world data

Conflict of interest statement

Disclosure: SS is a consultant to WHISCON, LLC, and to Aetion, Inc., a software manufacturer of which he also owns equity. He is a principal investigator of research grants to the Brigham and Women’s Hospital from Bayer, Genentech, and Boehringer Ingelheim unrelated to the topic of this article. The author reports no other conflicts of interest in this work.
