Bias and mean squared error in Mendelian randomization with invalid instrumental variables

Genet Epidemiol. 2024 Feb;48(1):27-41. doi: 10.1002/gepi.22541. Epub 2023 Nov 16.

Abstract

Mendelian randomization (MR) is a statistical method that utilizes genetic variants as instrumental variables (IVs) to investigate causal relationships between risk factors and outcomes. Although MR has gained popularity in recent years due to its ability to analyze summary statistics from genome-wide association studies (GWAS), it requires a substantial number of single nucleotide polymorphisms (SNPs) as IVs to ensure sufficient power for detecting causal effects. Unfortunately, the complex genetic heritability of many traits can lead to the use of invalid IVs, which affect the outcome not only through the risk factor but also directly or through an unobserved confounder. This can result in biased and imprecise estimates, as reflected by a larger mean squared error (MSE). In this study, we focus on the widely used two-stage least squares (2SLS) method and derive formulas for its bias and MSE when estimating causal effects using invalid IVs. Using these formulas, we identify conditions under which the 2SLS estimate is unbiased and reveal how independent or correlated pleiotropic effects influence the accuracy and precision of the 2SLS estimate. We validate the formulas through extensive simulation studies and demonstrate their application in an MR study evaluating the causal effect of the waist-to-hip ratio on various sleeping patterns. Our results can aid in designing future MR studies and serve as benchmarks for assessing more sophisticated MR methods.
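
To make the setting concrete, the following is a minimal simulation sketch of how invalid IVs can inflate the bias and MSE of a 2SLS estimate. It is not the paper's derivation or its simulation design. Every name and parameter value here (`simulate`, `alpha_mean`, the allele frequency of 0.3, the effect-size ranges, the sample size) is an illustrative assumption, and the direct SNP-to-outcome effects are set to a common constant only to mimic one simple form of pleiotropy.

```python
import numpy as np


def two_stage_least_squares(G, x, y):
    """2SLS estimate of the effect of x on y using the columns of G as IVs."""
    # Center everything so no intercept term is needed.
    G = G - G.mean(axis=0)
    x = x - x.mean()
    y = y - y.mean()
    # Stage 1: regress the risk factor on the instruments.
    coef, *_ = np.linalg.lstsq(G, x, rcond=None)
    x_hat = G @ coef
    # Stage 2: regress the outcome on the fitted risk factor.
    return float(x_hat @ y / (x_hat @ x))


def simulate(alpha_mean, n=5000, p=20, beta=0.3, reps=100, seed=0):
    """Return (bias, MSE) of the 2SLS estimate over repeated data sets.

    alpha_mean is the (assumed) direct effect of every SNP on the outcome;
    alpha_mean = 0 corresponds to all IVs being valid.
    """
    rng = np.random.default_rng(seed)
    gamma = rng.uniform(0.1, 0.3, size=p)   # SNP effects on the risk factor
    alpha = np.full(p, alpha_mean)          # direct SNP effects on the outcome
    estimates = np.empty(reps)
    for r in range(reps):
        G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotypes 0/1/2
        u = rng.normal(size=n)                               # unobserved confounder
        x = G @ gamma + u + rng.normal(size=n)
        y = beta * x + G @ alpha + u + rng.normal(size=n)
        estimates[r] = two_stage_least_squares(G, x, y)
    bias = estimates.mean() - beta
    mse = np.mean((estimates - beta) ** 2)
    return bias, mse


if __name__ == "__main__":
    for label, a in [("valid IVs", 0.0), ("invalid IVs", 0.05)]:
        bias, mse = simulate(alpha_mean=a)
        print(f"{label}: bias = {bias:.3f}, MSE = {mse:.4f}")
```

Under these assumptions, the run with nonzero direct effects should show a clearly larger bias and MSE than the run with valid IVs. Other pleiotropy patterns (for example, direct effects that are independent of or correlated with the SNP effects on the risk factor) could be explored by changing how `alpha` is generated.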

Keywords: Mendelian randomization; bias; instrumental variable; mean squared error; two-stage least squares estimate.

MeSH terms

  • Bias
  • Causality
  • Genome-Wide Association Study*
  • Humans
  • Mendelian Randomization Analysis* / methods
  • Models, Genetic
  • Risk Factors