Biometrics. 2019 Mar 11. doi: 10.1111/biom.13053. [Epub ahead of print]

High dimensional mediation analysis with latent variables.

Author information

Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland.
Department of Mathematics and Statistics, Laval University, Quebec City, Canada.


We propose a model for high-dimensional mediation analysis that includes latent variables. We describe our model in the context of an epidemiologic study of incident breast cancer with one exposure and a large number of biomarkers (i.e., potential mediators). We assume that the exposure directly influences a group of latent, or unmeasured, factors that are associated with both the outcome and a subset of the biomarkers. The biomarkers associated with the latent factors linking the exposure to the outcome are considered "mediators." We derive the likelihood for this model and develop an expectation-maximization algorithm to maximize an L1-penalized version of this likelihood, which limits the number of factors and associated biomarkers. We show that the resulting estimates are consistent and that the estimates of the nonzero parameters have an asymptotically normal distribution. In simulations, procedures based on this new model can have significantly higher power for detecting the mediating biomarkers than simpler approaches. We apply our method to a study that evaluates the relationship between body mass index, 481 metabolic measurements, and estrogen-receptor positive breast cancer.
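The structure described in the abstract (exposure, latent factor, a sparse subset of biomarkers, binary outcome) can be illustrated with a minimal simulation sketch. This is not the authors' expectation-maximization procedure. Every name and numeric value below (`simulate`, `screen`, `alpha`, `beta`, `gamma`, the loading of 0.9, the penalty of 0.3) is a hypothetical choice made for illustration, and the marginal covariance screen is a deliberately simple stand-in that only shows how an L1-type soft-threshold sets small estimates exactly to zero.

```python
import math
import random


def soft_threshold(value, penalty):
    """Minimiser of 0.5 * (b - value)**2 + penalty * abs(b) over b."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def simulate(n_subjects=500, n_biomarkers=20, n_mediators=5, seed=0):
    """Draw (exposure, biomarkers, outcome) under an assumed one-factor model."""
    rng = random.Random(seed)
    alpha = 0.8  # assumed effect of the exposure on the latent factor
    beta = 1.0   # assumed effect of the latent factor on the outcome log-odds
    gamma = 0.3  # assumed direct effect of the exposure on the outcome
    # Sparse loadings: only the first n_mediators biomarkers depend on the factor.
    loadings = [0.9 if j < n_mediators else 0.0 for j in range(n_biomarkers)]
    rows = []
    for _ in range(n_subjects):
        x = rng.gauss(0.0, 1.0)                # exposure
        f = alpha * x + rng.gauss(0.0, 1.0)    # latent (unobserved) factor
        m = [lam * f + rng.gauss(0.0, 1.0) for lam in loadings]  # biomarkers
        p = 1.0 / (1.0 + math.exp(-(gamma * x + beta * f - 1.0)))
        y = 1 if rng.random() < p else 0       # binary outcome
        rows.append((x, m, y))
    return loadings, rows


def screen(rows, penalty=0.3):
    """Soft-threshold the sample covariance of the exposure with each biomarker."""
    n = len(rows)
    n_biomarkers = len(rows[0][1])
    x_mean = sum(x for x, _, _ in rows) / n
    result = []
    for j in range(n_biomarkers):
        m_mean = sum(m[j] for _, m, _ in rows) / n
        cov = sum((x - x_mean) * (m[j] - m_mean) for x, m, _ in rows) / n
        result.append(soft_threshold(cov, penalty))
    return result


loadings, rows = simulate()
shrunk = screen(rows)
selected = [j for j, value in enumerate(shrunk) if value != 0.0]
print("biomarkers kept after soft-thresholding:", selected)
```

In this toy setting the exposure-biomarker covariance is about 0.72 for the biomarkers that load on the factor and about 0 for the rest, so the threshold separates them cleanly. The penalized-likelihood approach in the abstract works on the joint model, including the outcome, rather than on marginal covariances.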


Keywords: direct effect; factor analysis; mediation analysis; oracle property; penalized likelihood

