Send to

Choose Destination
See comment in PubMed Commons below
Int J Biostat. 2010;6(1):Article 27.

Bivariate zero-inflated regression for count data: a Bayesian approach with application to plant counts.

Author information

Arizona State University, AZ, USA.


Lately, bivariate zero-inflated (BZI) regression models have been used in many instances in the medical sciences to model excess zeros. Examples include the BZI Poisson (BZIP), BZI negative binomial (BZINB) models, etc. Such formulations vary in the basic modeling aspect and use the EM algorithm (Dempster, Laird and Rubin, 1977) for parameter estimation. A different modeling formulation in the Bayesian context is given by Dagne (2004). We extend the modeling to a more general setting for multivariate ZIP models for count data with excess zeros as proposed by Li, Lu, Park, Kim, Brinkley and Peterson (1999), focusing on a particular bivariate regression formulation. For the basic formulation in the case of bivariate data, we assume that Xi are (latent) independent Poisson random variables with parameters λ i, i = 0, 1, 2. A bi-variate count vector (Y1, Y2) response follows a mixture of four distributions; p0 stands for the mixing probability of a point mass distribution at (0, 0); p1, the mixing probability that Y2 = 0, while Y1 = X0 + X1; p2, the mixing probability that Y1 = 0 while Y2 = X0 + X2; and finally (1 - p0 - p1 - p2), the mixing probability that Yi = Xi + X0, i = 1, 2. The choice of the parameters {pi, λ i, i = 0, 1, 2} ensures that the marginal distributions of Yi are zero inflated Poisson (λ 0 + λ i). All the parameters thus introduced are allowed to depend on co-variates through canonical link generalized linear models (McCullagh and Nelder, 1989). This flexibility allows for a range of real-life applications, especially in the medical and biological fields, where the counts are bivariate in nature (with strong association between the processes) and where there are excess of zeros in one or both processes. Our contribution in this paper is to employ a fully Bayesian approach consolidating the work of Dagne (2004) and Li et al. (1999) generalizing the modeling and sampling-based methods described by Ghosh, Mukhopadhyay and Lu (2006) to estimate the parameters and obtain posterior credible intervals both in the case where co-variates are not available as well as in the case where they are. In this context, we provide explicit data augmentation techniques that lend themselves to easier implementation of the Gibbs sampler by giving rise to well-known and closed-form posterior distributions in the bivariate ZIP case. We then use simulations to explore the effectiveness of this estimation using the Bayesian BZIP procedure, comparing the performance to the Bayesian and classical ZIP approaches. Finally, we demonstrate the methodology based on bivariate plant count data with excess zeros that was collected on plots in the Phoenix metropolitan area and compare the results with independent ZIP regression models fitted to both processes.

[Indexed for MEDLINE]
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Loading ...
    Support Center