Philos Trans A Math Phys Eng Sci. 2018 Sep 13;376(2128). pii: 20170356. doi: 10.1098/rsta.2017.0356.

Improving reproducibility by using high-throughput observational studies with empirical calibration.

Schuemie MJ1,2, Ryan PB3,2,4, Hripcsak G3,4,5, Madigan D3,6, Suchard MA3,7,8,9.

Author information

1. Observational Health Data Sciences and Informatics (OHDSI), New York, NY 10032, USA.
2. Epidemiology Analytics, Janssen Research and Development, Titusville, NJ 08560, USA.
3. Observational Health Data Sciences and Informatics (OHDSI), New York, NY 10032, USA.
4. Department of Biomedical Informatics, Columbia University Medical Center, New York, NY 10032, USA.
5. Medical Informatics Services, New York-Presbyterian Hospital, New York, NY 10032, USA.
6. Department of Statistics, Columbia University, New York, NY 10027, USA.
7. Department of Biomathematics, University of California, Los Angeles, CA 90095, USA.
8. Department of Biostatistics, University of California, Los Angeles, CA 90095, USA.
9. Department of Human Genetics, University of California, Los Angeles, CA 90095, USA.


Concerns over reproducibility in science extend to research using existing healthcare data; many observational studies investigating the same topic produce conflicting results, even when using the same data. To address this problem, we propose a paradigm shift. The current paradigm centres on generating one estimate at a time using a unique study design with unknown reliability and publishing (or not) one estimate at a time. The new paradigm advocates for high-throughput observational studies using consistent and standardized methods, allowing evaluation, calibration and unbiased dissemination to generate a more reliable and complete evidence base. We demonstrate this new paradigm by comparing all depression treatments for a set of outcomes, producing 17 718 hazard ratios, each using methodology on par with current best practice. We furthermore include control hypotheses to evaluate and calibrate our evidence generation process. Results show good transitivity and consistency between databases, and agree with four out of the five findings from clinical trials. The distribution of effect size estimates reported in the literature reveals an absence of small or null effects, with a sharp cut-off at p = 0.05. No such phenomena were observed in our results, suggesting more complete and more reliable evidence. This article is part of a discussion meeting issue 'The growing ubiquity of algorithms in society: implications, impacts and innovations'.


Keywords: medicine; observational research; publication bias; reproducibility
