Multiple Imputation for Incomplete Data in Epidemiologic Studies

Ofer Harel; Emily M Mitchell; Neil J Perkins; Stephen R Cole; Eric J Tchetgen Tchetgen; BaoLuo Sun; Enrique F Schisterman

doi:10.1093/aje/kwx349

Am J Epidemiol. 2018 Mar 1;187(3):576-584. doi: 10.1093/aje/kwx349.

Authors

Ofer Harel¹, Emily M Mitchell², Neil J Perkins³, Stephen R Cole⁴, Eric J Tchetgen Tchetgen⁵, BaoLuo Sun⁵, Enrique F Schisterman³

Affiliations

¹ Department of Statistics, College of Liberal Arts and Sciences, University of Connecticut, Storrs, Connecticut.
² Centers for Financing, Access and Cost Trends, Agency for Healthcare Research and Quality, Rockville, Maryland.
³ Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, Maryland.
⁴ Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina.
⁵ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.

Abstract

Epidemiologic studies are frequently susceptible to missing information. Omitting observations with missing values (complete-case analysis) remains a common strategy in epidemiologic studies, yet this simple approach can severely bias parameter estimates of interest if the values are not missing completely at random. Even when missingness is completely random, complete-case analysis can reduce the efficiency of estimated parameters, because large amounts of available data are discarded along with the incomplete observations. Alternative methods for mitigating the influence of missing information, such as multiple imputation, are becoming increasingly popular as a way to retain all available information, reduce potential bias, and improve efficiency in parameter estimation. In this paper, we describe the theoretical underpinnings of multiple imputation, and we illustrate application of this method as part of a collaborative challenge to assess the performance of various techniques for dealing with missing data (Am J Epidemiol. 2018;187(3):568-575). We detail the steps necessary to perform multiple imputation on a subset of data from the Collaborative Perinatal Project (1959-1974), where the goal is to estimate the odds of spontaneous abortion associated with smoking during pregnancy.
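
As a rough illustration of the final pooling step in multiple imputation, the sketch below combines estimates from several imputed data sets using Rubin's rules. It is not taken from the paper: the function name `pool_rubin` and the input numbers are hypothetical placeholders, not results from the Collaborative Perinatal Project, and the abstract does not state which software or pooling details the authors used.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates (e.g. log odds ratios) with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    return q_bar, t


# Hypothetical log odds ratios and squared standard errors from m = 5
# imputed data sets (illustrative values only, not study results).
log_ors = [0.40, 0.50, 0.45, 0.55, 0.35]
ses_sq = [0.04, 0.05, 0.04, 0.05, 0.04]

q_bar, t = pool_rubin(log_ors, ses_sq)
print(f"pooled OR = {math.exp(q_bar):.2f}, SE(log OR) = {math.sqrt(t):.3f}")
```

The total variance adds the average within-imputation variance to an inflated between-imputation variance, so the pooled standard error reflects the extra uncertainty due to the missing values. Interval estimates would typically also use a t reference distribution with Rubin's degrees of freedom, which this sketch omits.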

Publication types

Multicenter Study
Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural

MeSH terms

Bias
Data Accuracy*
Data Interpretation, Statistical*
Epidemiologic Research Design*
Epidemiologic Studies*
Female
Humans
Pregnancy

Grants and funding

K01 MH087219/MH/NIMH NIH HHS/United States