Stat Med. 2019 Sep 30;38(22):4199-4208. doi: 10.1002/sim.8215. Epub 2019 Aug 22.

A plea to stop using the case-control design in retrospective database studies.

Schuemie MJ(1,2,3), Ryan PB(1,2,4), Man KKC(5,6,7,8), Wong ICK(5,6), Suchard MA(1,3,9,10), Hripcsak G(1,4,11).

Author information

1. Observational Health Data Sciences and Informatics, New York, New York.
2. Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey.
3. Department of Biostatistics, University of California, Los Angeles, California.
4. Department of Biomedical Informatics, Columbia University Medical Center, New York, New York.
5. Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong.
6. Research Department of Practice and Policy, UCL School of Pharmacy, London, UK.
7. Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands.
8. Department of Social Work and Social Administration, Faculty of Social Science, The University of Hong Kong, Pokfulam, Hong Kong.
9. Department of Biomathematics, University of California, Los Angeles, California.
10. Department of Human Genetics, University of California, Los Angeles, California.
11. Medical Informatics Services, NewYork-Presbyterian Hospital, New York, New York.


The case-control design is widely used in retrospective database studies, often leading to spectacular findings. However, results of these studies often cannot be replicated, and the advantage of this design over others is questionable. To demonstrate the shortcomings of applications of this design, we replicate two published case-control studies. The first investigates isotretinoin and ulcerative colitis using a simple case-control design. The second focuses on dipeptidyl peptidase-4 inhibitors and acute pancreatitis, using a nested case-control design. We include large sets of negative control exposures (where the true odds ratio is believed to be 1) in both studies. Both replication studies produce effect size estimates consistent with the original studies, but also generate estimates for the negative control exposures showing substantial residual bias. In contrast, applying a self-controlled design to answer the same questions using the same data reveals far less bias. Although the case-control design in general is not at fault, its application in retrospective database studies, where all exposure and covariate data for the entire cohort are available, is unnecessary, because alternatives such as cohort and self-controlled designs are available. Moreover, by focusing on cases and controls, it opens the door to inappropriate comparisons between exposure groups, leading to confounding that the design has few options to adjust for. We argue that this design should no longer be used in these types of data. At the very least, negative control exposures should be used to prove that the concerns raised here do not apply.
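
As a rough illustration of how negative control exposures can expose residual bias: because the true odds ratio for a negative control is believed to be 1, an unbiased design should see only about 5% of their 95% confidence intervals exclude 1. The sketch below is not the authors' analysis; the function name, the tuple layout, and all numbers are assumptions made up for illustration.

```python
import math


def negative_control_summary(estimates):
    """Summarize odds-ratio estimates for negative control exposures.

    estimates: list of (odds_ratio, ci_lower, ci_upper) tuples with 95% CIs.
    Returns (fraction of CIs that exclude 1, mean log odds ratio).
    """
    if not estimates:
        raise ValueError("need at least one negative control estimate")
    # A CI excludes 1 when it lies entirely above or entirely below 1.
    excluded = sum(1 for _, lo, hi in estimates if lo > 1.0 or hi < 1.0)
    # The mean log odds ratio is a crude indicator of systematic shift.
    mean_log_or = sum(math.log(or_) for or_, _, _ in estimates) / len(estimates)
    return excluded / len(estimates), mean_log_or


# Hypothetical, made-up estimates for illustration only (not from the paper).
made_up = [
    (1.02, 0.85, 1.22),
    (1.45, 1.10, 1.91),
    (0.97, 0.80, 1.18),
    (1.60, 1.25, 2.05),
]
fraction, mean_log_or = negative_control_summary(made_up)
print(f"{fraction:.2f} of CIs exclude 1; mean log OR = {mean_log_or:.3f}")
```

A fraction well above 0.05, or a mean log odds ratio far from 0, would suggest the kind of residual bias the abstract describes, although a real assessment would use a much larger set of negative controls.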


Keywords: case control; database studies; methods; retrospective studies
