Modelling count data with excessive zeros: the need for class prediction in zero-inflated models and the issue of data generation in choosing between zero-inflated and generic mixture models for dental caries data

Stat Med. 2009 Dec 10;28(28):3539-53. doi: 10.1002/sim.3699.

Abstract

Count data may possess an 'excess' of zeros relative to standard distributions. Zero-inflated Poisson (ZiP) or binomial (ZiB) models and generic mixture models have been proposed to deal with such data. We consider biomedical count data with an excess number of zeros and seek to address the following: (i) do zero-inflated models need covariates in the distribution part to predict class membership; (ii) what model-fit criteria have clinical relevance to predicted counts; (iii) can very different model parameterizations have near-identical fit; and (iv) how could model selection, and hence model interpretation, be aided by considering data generation processes? We show that covariates in the distribution part of zero-inflated models are needed to predict class membership. A range of model-fit criteria should be considered, as consensus is rarely achieved, and considering predicted outcomes may be just as valuable as likelihood-based criteria. Zero-inflated and generic mixture models may be indistinguishable according to both likelihood-based model-fit criteria and predicted outcomes, in which case model differentiation, and hence model selection and interpretation, might be guided by the consideration of a priori data generation processes. Zero-inflated models reflect whether or not there are (or have been) risk differences in disease onset and disease progression, while generic mixture models identify sub-types of individuals with similar risks of disease onset and progression. One or both modelling strategies may be used, though a priori knowledge or clinical impression of data generation might help to distinguish between two or more parameterizations that exhibit similar fit and yield near-identical predicted counts.
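
As a rough illustration of point (i), the following is a minimal Python sketch, not the authors' code, of class prediction in a zero-inflated Poisson model. It assumes the textbook ZiP form, in which an observation is an 'always zero' with probability pi and otherwise Poisson with mean lam, and it applies Bayes' theorem to an observed count. The function names, the log-linear form of the Poisson mean, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson probability of observing count y."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # The extra point mass pi applies only at y == 0.
    return pi * (y == 0) + (1.0 - pi) * poisson


def prob_structural_zero(y, pi, lam):
    """Posterior probability (Bayes' theorem) of the always-zero class given y."""
    if y > 0:
        return 0.0  # a positive count can only come from the Poisson part
    return pi / zip_pmf(0, pi, lam)


def poisson_mean(x, b0, b1):
    """Hypothetical log-linear covariate effect in the distribution (Poisson) part."""
    return math.exp(b0 + b1 * x)


# Illustrative values only (not estimates from the paper).
pi, b0, b1 = 0.3, 0.5, 1.0

for x in (0, 1):
    lam = poisson_mean(x, b0, b1)
    print(x, round(lam, 3), round(prob_structural_zero(0, pi, lam), 3))
```

Under these assumptions, if pi and lam were the same for everyone, every observed zero would receive the same posterior class probability. Once a covariate enters the Poisson mean, two individuals who both have a zero count can be assigned different probabilities of belonging to the always-zero class, which is one way to read the abstract's statement that covariates in the distribution part are needed for class prediction.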

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem*
  • Brazil
  • Child
  • Computer Simulation
  • DMF Index
  • Data Interpretation, Statistical*
  • Dental Caries / epidemiology*
  • Humans
  • Incidence
  • Models, Statistical*
  • Prevalence