Stat Biosci. 2018 Dec;10(3):587-608. doi: 10.1007/s12561-018-9219-2. Epub 2018 Jul 10.

Conditional Regression Based on a Multivariate Zero-Inflated Logistic-Normal Model for Microbiome Relative Abundance Data.

Li Z1,2,3,4, Lee K5, Karagas MR2,3, Madan JC2,3,6, Hoen AG1,2,3, O'Malley AJ1,7, Li H8.

Author information

1
Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA.
2
Children's Environmental Health and Disease Prevention Research Center at Dartmouth, Hanover, New Hampshire.
3
Department of Epidemiology, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA.
4
Department of Biostatistics, University of Florida, Gainesville, FL 32611, USA.
5
Phillips Exeter Academy, Exeter, NH 03833, USA.
6
Division of Neonatology, Department of Pediatrics, Children's Hospital at Dartmouth, Lebanon, New Hampshire.
7
The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA.
8
Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.

Abstract

The human microbiome plays critical roles in human health and has been linked to many diseases. While advanced sequencing technologies can characterize the composition of the microbiome in unprecedented detail, it remains challenging to disentangle the complex interplay between the human microbiome and disease risk factors because of the complicated nature of microbiome data. Excessive numbers of zero values, high dimensionality, the hierarchical phylogenetic tree, and the compositional structure compound one another, leaving existing methods unable to address these issues appropriately. We propose a multivariate two-part zero-inflated logistic normal (MZILN) model to analyze the association of disease risk factors with individual microbial taxa and with overall microbial community composition. This approach naturally handles the excessive zeros through the discrete part of the model and the compositional data structure through the logistic-normal part. For parameter estimation, we employ an estimating equations approach that accounts for the complex inter-taxa correlation structure induced by the hierarchical phylogenetic tree and the compositional data structure. The model can incorporate standard regularization approaches to deal with high dimensionality. Simulations show that our model outperforms existing methods. Our approach is also compared with others in an analysis of real microbiome data.
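The two-part idea described above can be illustrated with a minimal data-transformation sketch. It assumes a single sample, a reference taxon that is always non-zero, and the additive log-ratio (the log of each taxon's abundance relative to the reference) as the continuous part. The function names are hypothetical, and the sketch does not implement the authors' regression model, estimating equations, or regularization; it only shows how zeros and positive abundances separate into a binary part and a log-ratio part, and how the inverse map restores a composition that sums to 1.

```python
import math
from typing import List, Optional, Tuple


def two_part_transform(
    abundances: List[float], ref: int
) -> Tuple[List[int], List[Optional[float]]]:
    """Hypothetical helper: split one sample's relative abundances into
    presence indicators (discrete part) and additive log-ratios against a
    reference taxon (continuous part). Zeros get no log-ratio (None)."""
    x_ref = abundances[ref]
    if x_ref <= 0:
        # Assumption of this sketch: the reference taxon is never zero.
        raise ValueError("reference taxon must be non-zero in this sketch")
    indicators: List[int] = []
    log_ratios: List[Optional[float]] = []
    for j, x in enumerate(abundances):
        if j == ref:
            continue  # the reference taxon is the denominator, not modeled
        if x < 0:
            raise ValueError("relative abundances must be non-negative")
        present = 1 if x > 0 else 0
        indicators.append(present)
        # Log-ratio is only defined for positive abundances.
        log_ratios.append(math.log(x / x_ref) if present else None)
    return indicators, log_ratios


def from_two_parts(
    indicators: List[int], log_ratios: List[Optional[float]], ref: int
) -> List[float]:
    """Hypothetical inverse: additive logistic transform with the reference
    fixed at log-ratio 0; absent taxa stay exactly zero."""
    weights = [math.exp(z) if d else 0.0 for d, z in zip(indicators, log_ratios)]
    total = 1.0 + sum(weights)  # the reference contributes exp(0) = 1
    others = [w / total for w in weights]
    # Re-insert the reference taxon at its original position.
    return others[:ref] + [1.0 / total] + others[ref:]


if __name__ == "__main__":
    sample = [0.5, 0.0, 0.25, 0.25]
    d, z = two_part_transform(sample, ref=0)
    print(d, z)
    print(from_two_parts(d, z, ref=0))
```

Keeping zeros out of the log-ratio part, rather than replacing them with a small pseudo-count, mirrors the motivation for a two-part model: whether a taxon is present and how abundant it is when present are treated as separate pieces of information.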

PMID:
30923584
PMCID:
PMC6432796
[Available on 2019-12-01]
DOI:
10.1007/s12561-018-9219-2
