
# A Bayesian model for longitudinal count data with non-ignorable dropout

*Address for correspondence*: Niko A. Kaciroti, 300 N. Ingalls Building, 10th Floor, Center for Human Growth and Development, University of Michigan, Ann Arbor, MI 48109, USA. Email: ude.hcimu@alocin

## Summary

Asthma is an important chronic disease of childhood. An intervention programme for managing asthma was designed on principles of self-regulation and was evaluated by a randomized longitudinal study. The study focused on several outcomes, and, typically, missing data remained a pervasive problem. We develop a pattern–mixture model to evaluate the outcome of intervention on the number of hospitalizations with non-ignorable dropouts. Pattern–mixture models are not generally identifiable as no data may be available to estimate a number of model parameters. Sensitivity analyses are performed by imposing structures on the unidentified parameters. We propose a parameterization which permits sensitivity analyses on clustered longitudinal count data that have missing values due to non-ignorable missing data mechanisms. This parameterization is expressed as ratios between event rates across missing data patterns and the observed data pattern and thus measures departures from an ignorable missing data mechanism. Sensitivity analyses are performed within a Bayesian framework by averaging over different prior distributions on the event ratios. This model has the advantage of providing an intuitive and flexible framework for incorporating the uncertainty of the missing data mechanism in the final analysis.

**Keywords:** Gibbs sampling, Longitudinal data, Non-linear mixed effects models, Poisson outcomes, Randomized trials, Transition Markov models

## 1. Introduction

In longitudinal studies complete follow-up data are often not available for all subjects. Several approaches are available for analysing these incomplete data, e.g. mixed effects models or imputation-based techniques. Using such methods, however, inferences are only valid when the missing data mechanism is ignorable, i.e. we can correctly condition on variables that are necessary to yield a missingness at random (MAR) mechanism. When the missing data mechanism is non-ignorable, inferences based on only the observed data will not be valid. Thus, analysing such data requires more complex models which incorporate the missing data mechanism in the analysis. Two broad approaches are available: selection models and pattern–mixture models (Little and Rubin, 2002). These two approaches arise from different partitions of the observables *y* and the missing data indicator *R*. Selection models partition the joint distribution of Pr(*Y*, *R*) as the product of Pr(*Y*) and Pr(*R*|*Y*) (Heckman, 1979; Little, 1995; Kenward, 1998). They require explicit modelling of the missing data mechanism where the probability that a subject would drop out depends on the unobserved values. Pattern–mixture models (Little, 1993; Little and Rubin, 2002), in contrast, express the joint distribution as the product of Pr(*Y*|*R*) and Pr(*R*). Then they stratify the data by dropout patterns and allow distinct model parameters for each stratum. The marginal estimates in pattern-mixture models can be derived as a weighted average across pattern-specific estimates (Little, 1995) or by using multiple imputation (Demirtas and Schafer, 2003). Regardless of which partition is used, additional assumptions or data are needed to identify the parameters in the joint distribution. Pattern–mixture models are commonly used as they do not require specific modelling of the dropout mechanism and the estimates of the identified parameters are not affected by the nature of the dropout mechanism.
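The two partitions described above can be written side by side. This is only a restatement of the text in symbols; the last line, the marginal distribution as a mixture over dropout patterns, follows from the pattern–mixture partition by the law of total probability.

```latex
\begin{aligned}
\text{selection model:}\quad & \Pr(Y, R) = \Pr(Y)\,\Pr(R \mid Y),\\
\text{pattern-mixture model:}\quad & \Pr(Y, R) = \Pr(Y \mid R)\,\Pr(R),\\
\text{marginal over patterns:}\quad & \Pr(Y) = \sum_{r} \Pr(Y \mid R = r)\,\Pr(R = r).
\end{aligned}
```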

Little (1995), Little and Wang (1996), Molenberghs *et al.* (1998), Daniels and Hogan (2000) and Kenward *et al.* (2003) identified parameters in the pattern–mixture model by using constraints. In a model with no constraints, Demirtas (2005) used a Bayesian smoothed pattern–mixture model for normal outcomes. Other approaches for identifying the parameters in pattern–mixture models have been proposed by Wu and Carroll (1988), Little (1994), Hogan and Laird (1997), Albert and Follmann (2000) and Guo *et al.* (2004), who all used latent random effects to relate the response and the missing data indicator. For a detailed literature review see Little (1995), Kenward and Molenberghs (1999) and Thijs *et al.* (2002).

Here we propose a Bayesian pattern–mixture model for analysing clustered longitudinal count data with non-ignorable dropouts. The model is identified by using easy-to-understand parameters, namely, ratios of event rates across missing data patterns with the observed data pattern as the reference group. Each parameter, which we refer to as an ignorability index, provides an intuitive way to capture the effect of a non-ignorable missing data mechanism and is easily used for sensitivity analyses. We have used similar parameters previously in pattern–mixture models for ordinal outcomes (Kaciroti, 2002; Kaciroti *et al.*, 2006). Now we extend the use of such ignorability index parameters for Poisson outcomes. Because the ignorability index cannot be defined by using only the observed data, we introduce an informative prior distribution, whereby the prior distribution reflects the nature of the missing data mechanism. By using a prior distribution we can incorporate in the final inferences any uncertainty, as well as prior knowledge related to the missing data mechanism. Within this framework, models with missing data generated by an ignorable missing data mechanism are a special case, where the ignorability index parameters are set equal to 1. An additional feature of the particular application that is considered in this paper is clustering of subjects. We account for clustering by introducing random-effects parameters. Bayesian inferences are constructed by using Markov chain Monte Carlo simulations.

The model proposed is motivated by, and fitted to, data from an asthma intervention study, which is described in Section 2. In Section 3 the complete-data model is defined and the pattern–mixture model with a potentially non-ignorable missing data mechanism is proposed. A Gibbs sampling algorithm for fitting such models is described in Section 4. The model is then applied to the asthma data in Section 5. A conclusion is given in Section 6.

The data that are analysed in the paper and the programs that were used to analyse them can be obtained from http://www.blackwellpublishing.com/rss

## 2. Asthma intervention study

Asthma is the most common chronic disease of childhood; for example in the USA it affects an estimated 9 million children under age 18 years (National Center for Health Statistics, 2002). Thus, managing asthma is important for both reducing the medical costs as well as for improving quality of life. The intervention that is considered here focused on educating physicians about establishing strong partnerships with asthma patients and their families. The intervention took the form of an interactive seminar between general practice paediatricians and their asthma patients and was based on the theoretical principles of self-regulation (Clark *et al.*, 1998). The efficacy of the intervention was evaluated by using a randomized study with the following outcomes:

- treatment practices and communication behaviour of physicians
- health status and medical care use by their asthma patients
- satisfaction of the patients’ parents with the medical care.

The intervention programme has already been shown to decrease health care usage over a 2-year period (Clark *et al.*, 2000).

In this paper we explore in more detail how the intervention mechanism accomplished a decrease in health care use over time. We focus on three important questions.

- Does the effect of the intervention vary between the first and the second year and, if yes, how?
- Does the effect of the intervention vary with the initial severity?
- Are the results sensitive to the missing data assumptions?

Answering such questions should result in a better understanding of how the intervention works and could be a starting point to improve future interventions. We address these questions by using a transition Markov model with random effect, as described in Section 3. Our broader methodological aim is to develop statistical models for clustered longitudinal Poisson outcomes with incomplete data.

In this study, physicians were randomized into either an intervention group (38 physicians) or a control group (36 physicians); both groups were compared at two post-intervention follow-ups. The time between each wave was up to 12 months and varied between subjects. Out of the 74 physicians, seven (with a total of 20 patients) decided not to participate (four in the intervention group and three in the control group). Data on hospitalizations were available for 74 physicians and 635 patients at baseline (*t* = 1), 67 physicians and 446 patients at first follow-up (*t* = 2) and 67 physicians and 302 patients at second follow-up (*t* = 3). No information was available about the reasons why patients dropped out during the first follow-up period, other than for 20 patients who dropped out because their physicians withdrew. For the 144 patients who dropped out after first follow-up, 68% was due to disconnected telephones or families moving, 22% no longer had asthma symptoms and about 10% reported other reasons, such as dislike of the research study.

Although the missing data pattern was essentially monotone, the methods that are developed here could be generalized to any pattern of missing data. We classified the subjects into one of three broad patterns: *r* = 1 (*n* = 189) for patients with observed data only on y_{1}; *r* = 2 (*n* = 144) for patients with observed data only on y_{1} and y_{2}; *r* = 3 (*n* = 302) for cases who provided data y_{1}, y_{2} and y_{3}.

The study was subject to considerable missing data; therefore, it is important to investigate the nature of the missing data mechanism. Following Ridout (1991) and Diggle *et al.* (2002) we used a logistic regression model to predict the probability of dropout at first and second follow-up. All demographic factors (i.e. parental income, age, education, sex and race), treatment indicator and medication use were initially included in each model. In addition, the number of hospitalizations at baseline and at first follow-up, and the change from baseline to first follow-up, plus their interaction with treatment, were used as independent variables.

For the first follow-up period only parents’ income and baseline medication intake predicted the dropout during this period. Patients whose parents had lower incomes or were not taking medication at baseline were more likely to drop out between the baseline and first follow-up. During the first to second follow-up period patients who were not taking prescription medication at first follow-up were more likely to drop out during this period. The dropout process was different between the two groups. In the control group, patients whose number of hospitalizations increased at first follow-up compared with baseline were more likely to drop out, whereas, in the intervention group, patients whose number of hospitalizations increased at first follow-up compared with baseline were more likely to remain in the study. Such findings show evidence that dropouts differ from the subjects who remained in the study. Therefore, when performing data analysis, it was important to investigate the sensitivity of the results to potentially non-ignorable missing data mechanisms.

## 3. The model

To evaluate the intervention effect over time on *y*, the number of hospitalizations, we propose a transition Markov model of first order with random intercept, similar to the model that was proposed by Zeger and Qaqish (1988). The random intercept is used to model the correlation across subjects having the same physician. The within-subject serial correlation is modelled by the transition Markov model, in which the expected response at a given time depends not only on the associated covariates but also on past responses. The analysis addresses the per-protocol question, under an ignorable missing data mechanism where all the randomized subjects comply with the treatment assigned.

### 3.1. Complete-data model

We model the number of asthma-related hospitalizations at each follow-up time by using Poisson regression. Let $y_{ijt}$ be the number of asthma-related hospitalizations for patient *j* who is under the care of physician *i* (cluster) at time *t* = 1, 2, … , *T*. Let $\mathbf{y}_{ijt} = (y_{ij1}, y_{ij2}, \ldots, y_{ijt})$ be the collection of the responses up to, and including, time *t*. Let $x_{ij}$ be the set of fixed covariates; then the joint distribution of the follow-up responses $(y_{ij2}, y_{ij3}, \ldots, y_{ijT})$ for subject *j* in cluster *i* conditioned on $x_{ij}$ and the baseline measure $y_{ij1}$ can be factorized as

$$f(y_{ij2}, y_{ij3}, \ldots, y_{ijT} \mid x_{ij}, y_{ij1}, \beta, b_i) = \prod_{t=2}^{T} f(y_{ijt} \mid x_{ij}, \mathbf{y}_{ijt-1}, \beta_t, b_{it}),$$

where $\beta = (\beta_2, \beta_3, \ldots, \beta_T)$ is the collection of regression coefficients, and $b_i = (b_{i2}, b_{i3}, \ldots, b_{iT})$ is the collection of random effects. We assume that the distribution of $y_{ijt}$ for *t* = 2, 3, … , *T*, conditional on $u_{ijt} = (x_{ij}, \mathbf{y}_{ijt-1})$ and the random effects $b_{it}$, is Poisson with mean $\mu_{ijt}$ modelled by

$$\log(\mu_{ijt}) = u_{ijt}\beta_t + b_{it} + o_{ijt}, \qquad (1)$$

where $\beta_t = (\beta_{t0}, \beta_{t1}, \ldots, \beta_{tp})$ and *p* is the number of predictors. Since the observation period may not be the same across all the individuals, an offset term $o_{ijt}$ is introduced. Specifically, $o_{ijt} = \log(n_{ijt})$ where $n_{ijt}$ is the number of months (or any time unit) over which $y_{ijt}$ events have been reported. Random effects $b_{it}$ are introduced to account for the correlation due to clustering. We assume that random effects $b_i = (b_{i2}, b_{i3}, \ldots, b_{iT})$ are independently and identically multivariate normally distributed with mean 0 and covariance matrix Σ, for *i* = 1, … , *K*, where *K* is the number of physicians. Let $b = (b_1, b_2, \ldots, b_K)$; thus the joint posterior distribution is

$$p(\beta, b, \Sigma \mid y, x) \propto \left\{\prod_{i=1}^{K}\left[\prod_{t=2}^{T}\prod_{j=1}^{n_{it}} f(y_{ijt} \mid u_{ijt}, \beta_t, b_{it})\right]\phi(b_i)\right\} p(\beta, \Sigma), \qquad (2)$$

where $\phi(b_i) = (2\pi)^{-T/2}|\Sigma|^{-1/2}\exp(-b_i\Sigma^{-1}b_i^{\mathrm{T}}/2)$ and $n_{it}$ is the number of patients who are seen by the *i*th physician at time *t*. To complete the model specification, a diffuse but proper prior distribution for *β* and Σ, p(*β*, Σ), is assumed, with *β* having a diffuse normal prior with mean 0 and some large variance. The prior for the variance-covariance matrix Σ follows an inverse Wishart distribution, Σ ~ IW(*R*, *ν*), where *R* is a prior guess of the magnitude of Σ, and *ν* is a number larger than dim(Σ) + 1 (Spiegelhalter *et al.*, 2003). The primary parameter of interest is *β*, but other parameters (Σ, $b_i$) are also of interest. Given the complexity of the model, inferences are based on simulation techniques. For example, Gibbs sampling or other Markov chain Monte Carlo methods can be used to construct inferences on the basis of values drawn from the posterior distribution (2).

When there are missing values in *y*, and if the missing data mechanism is ignorable, Gibbs sampling for the complete-data model can be easily modified as described in Section 4.2.
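As a minimal sketch of the complete-data model for one observation, the Python below evaluates a Poisson mean with a cluster-level random intercept and a log offset, together with the Poisson log-probability. It assumes the usual log-linear form, log(mean) = linear predictor + random effect + log(months). The function names and all numerical values are hypothetical and are not taken from the study.

```python
import math

def poisson_mean(u, beta, b_it, n_months):
    """Conditional mean: log(mu) = u . beta + b_it + log(n_months)."""
    if len(u) != len(beta):
        raise ValueError("u and beta must have the same length")
    linear = sum(uk * bk for uk, bk in zip(u, beta))
    return math.exp(linear + b_it + math.log(n_months))

def poisson_log_pmf(y, mu):
    """log Pr(Y = y) for a Poisson variable with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Hypothetical inputs: a constant term and one previous-response covariate,
# no cluster effect, and a 12-month reporting period.
mu = poisson_mean(u=[1.0, 1.0], beta=[-3.0, 0.5], b_it=0.0, n_months=12)
log_prob_zero_events = poisson_log_pmf(0, mu)
```

The offset enters additively on the log-scale, so doubling the reporting period doubles the expected count without changing the regression coefficients.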

### 3.2. Pattern-mixture model for non-ignorable missing data mechanisms

When the missing data mechanism is non-ignorable we use pattern-mixture models to derive inferences. Thus, we assume that model (1) applies to each missing data pattern but allow *β*-parameters to differ across patterns. Let $\beta_t^{(r)}$ denote the parameters of model (1) for missing data pattern *r* at time *t*, where *r* indicates the time of last measurement with *r* = *T* corresponding to completers. Because there are no data to estimate all parameters $\beta_t^{(r)}$, for *r* < *t*, the pattern-mixture model is underidentified. Thus, restrictions or prior information about parameters in the model are required. Let $\beta_t^{(0)}$ be the identified parameters at time *t* corresponding to the observed data pattern at time *t* (*r* ≥ *t*). Following Little and Rubin (2002), we specify a prior distribution $p(\beta_t^{(r)} \mid \beta_t^{(0)})$ on the unidentified parameters, $\beta_t^{(r)}$, *r* < *t*, conditioned on the identified parameters $\beta_t^{(0)}$. We have used a similar approach previously to identify pattern-mixture models with a non-ignorable missing data mechanism for ordinal outcomes (Kaciroti *et al.*, 2006). In that situation the prior distribution was constructed by relating the distribution of the missing data to the distribution of the observed data on the basis of the differences in the cumulative odds. Here we extend the same method for Poisson outcomes with non-ignorably missing data by relating the event rates in the missing data patterns with the event rate in the observed data patterns.

Specifically, let $\mu_t^{(r)} = E(Y_t^{(r)} \mid u_t, \beta_t^{(r)}, b_t)$ for pattern *r* at time *t*. Then, there is some function of $u_t$, $\tilde{\lambda}_t^{(r)}(u_t)$, such that, for *r* = 1, 2, … , *t* − 1 and *t* ≥ 2,

$$\mu_t^{(r)} = \tilde{\lambda}_t^{(r)}(u_t)\,\mu_t^{(0)}, \qquad (3)$$

where $\mu_t^{(0)}$ is the mean at time *t* for the observed data at time *t*. Here $\tilde{\lambda}_t^{(r)}(u_t)$ is the ratio between the event rate in the *r*th missing data pattern and the event rate in the observed data pattern at time *t*; it measures the departure from ignorable dropout. Further, it can be seen as a relative risk of the *r*th missing data pattern with the observed data pattern as the reference group. We assume that $\tilde{\lambda}_t^{(r)}(u_t)$ has a log-normal distribution with mean $l_t^{(r)}(u_t)$ and variance $c^2\{l_t^{(r)}(u_t)\}^2$, where *c* is the coefficient of variation. In this approach, the uncertainty in the relationship between the distribution of the missing data and the distribution of the observed data is captured by the prior distribution (probabilistic range) that is given to $\tilde{\lambda}_t^{(r)}(u_t)$.

The distribution $p(\beta^{(r)} \mid \beta^{(0)})$ is derived on the basis of the prior distribution of $\tilde{\lambda}_t^{(r)}$. Let $\beta_t^{(r)} = (\beta_{t0}, \beta_{t1}^{(r)})$ where $\beta_{t0}$ is the set of parameters that are the same in missing data pattern *r* and the observed data pattern at time *t* and $\beta_{t1}^{(r)}$ is the set of parameters in pattern *r* that are different from the corresponding parameters in the observed data pattern. Let $u_{t0}$ be the set of covariates that are associated with $\beta_{t0}$ and $u_{t1}$ be the set of covariates that are associated with $\beta_{t1}^{(r)}$. Thus, we have

$$\log\{\tilde{\lambda}_t^{(r)}(u_t)\} = u_{t1}(\beta_{t1}^{(r)} - \beta_{t1}^{(0)}). \qquad (4)$$

We then consider the case where $u_{t1}$ is a vector of dummy variables, $u_{t1} = (u_{t11}, u_{t21}, \ldots, u_{tp1})$, and where each dummy indicator corresponds to a subgroup. Continuous variables can be categorized into groups, and then we can proceed as in the case of categorical variables. We assume that for each subgroup that is identified by $u_{tk1} = 1$, *k* = 1, … , *p*, the number of subjects with observed values at time *t* is non-zero. Let $\tilde{\lambda}_{\beta_{tk}}^{(r)} = \tilde{\lambda}_t^{(r)}(u_{tk1} = 1)$ be the ratio between the event rates of the missing data pattern and the observed data pattern at time *t* for the subgroup that is identified by $u_{tk1} = 1$ for *k* = 1, … , *p*. Then

$$\tilde{\lambda}_{\beta_{tk}}^{(r)} = \exp(\beta_{tk}^{(r)} - \beta_{tk}^{(0)}) \qquad (5)$$

or

$$\beta_{tk}^{(r)} = \beta_{tk}^{(0)} + \log(\tilde{\lambda}_{\beta_{tk}}^{(r)}). \qquad (6)$$

From equation (6), the prior distribution $p(\beta_{tk}^{(r)} \mid \beta_{tk}^{(0)})$ is defined on the basis of the distribution of the $\tilde{\lambda}_{\beta_{tk}}^{(r)}$-parameters. Thus, the identifiability of the pattern–mixture model is translated into defining a distribution on $\tilde{\lambda}_{\beta_{tk}}^{(r)}$ for each subgroup, identified by $u_{tk1}$, *k* = 1, 2, … , *p*, at time *t* = 2, 3, … , *T*. Giving a distribution to $\tilde{\lambda}_{\beta_{tk}}^{(r)}$ is easier to understand than working directly with $\beta_{tk}$. For instance, let $u_{t1}$ be the indicator for intervention, which corresponds to $\beta_{t1}$. Then $\tilde{\lambda}_{\beta_{t1}}^{(r)} \sim$ log-normal with mean *l* = 0.5 and *c* = 0.1 indicates that in the intervention group, on average, for a subject who was in missing data pattern *r*, the adjusted event rate of $y_t$ is half (95% confidence interval CI = (0.41, 0.61)) of that for a subject in the observed data pattern. Then inferences derived on the basis of this $\tilde{\lambda}_{\beta_{t1}}^{(r)}$ would be approximately valid even when the missing data mechanism is non-ignorable but is within the range that is identified by $\tilde{\lambda}_{\beta_{t1}}^{(r)}$. The *c*-parameter captures the uncertainty that is related to the missing data mechanism, i.e. the range of $\tilde{\lambda}_{\beta_{t1}}^{(r)}$. Thus, in the above example, if *c* = 0.5, the 95% CI of $\tilde{\lambda}_{\beta_{t1}}^{(r)}$ would be wider, CI = (0.16, 1.16).
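The interval quoted for *l* = 0.5 and *c* = 0.1 can be checked numerically. The sketch below assumes that the logarithm of the event-rate ratio is treated as normal with mean log(*l*) − *c*²/2 and variance *c*², which is a small-*c* approximation. An exact log-normal parameterization would use log(1 + *c*²) for the variance and gives slightly different limits. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def lambda_interval(l, c, level=0.95):
    """Central interval for an event-rate ratio with prior mean l and
    coefficient of variation c, treating log(ratio) as
    N(log(l) - c**2 / 2, c**2) (small-c approximation)."""
    if l <= 0 or c < 0:
        raise ValueError("need l > 0 and c >= 0")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = math.log(l) - c ** 2 / 2
    return math.exp(centre - z * c), math.exp(centre + z * c)

low, high = lambda_interval(l=0.5, c=0.1)
print(round(low, 2), round(high, 2))  # 0.41 0.61
```

Setting *c* = 0 collapses the interval to the single value *l*, which corresponds to a deterministic constraint on the ratio.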

The log-normal distribution family is an attractive choice for $\tilde{\lambda}_{\beta_{tk}}^{(r)}$ as it yields a normal prior distribution for $\beta_{tk}^{(r)}$, although other distributions for $\tilde{\lambda}_{\beta_{tk}}^{(r)}$ are possible. Under the log-normal distribution for $\tilde{\lambda}_{\beta_{tk}}^{(r)}$ the distribution $p(\beta_{tk}^{(r)} \mid \beta_{tk}^{(0)})$ is $\mathcal{N}\{E(\beta_{tk}^{(r)}), \text{var}(\beta_{tk}^{(r)})\}$, where $E(\beta_{tk}^{(r)})$ and $\text{var}(\beta_{tk}^{(r)})$ are derived by using equations (5) and (6). On the basis of equation (5) we obtain

$$l_{\beta_{tk}}^{(r)} = E(\tilde{\lambda}_{\beta_{tk}}^{(r)}) = \exp\{E(\beta_{tk}^{(r)}) - \beta_{tk}^{(0)} + \tfrac{1}{2}\text{var}(\beta_{tk}^{(r)})\}$$

or

$$E(\beta_{tk}^{(r)}) = \beta_{tk}^{(0)} + \log(l_{\beta_{tk}}^{(r)}) - \tfrac{1}{2}\text{var}(\beta_{tk}^{(r)}).$$

Alternatively, taking expectations on both sides of equation (6), after using Taylor series expansion for $\log(\tilde{\lambda}_{\beta_{tk}}^{(r)})$, we obtain

$$E(\beta_{tk}^{(r)}) \approx \beta_{tk}^{(0)} + \log(l_{\beta_{tk}}^{(r)}) - \tfrac{1}{2}c^2, \qquad (7)$$

from which $\text{var}(\beta_{tk}^{(r)}) \approx c^2$.
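For comparison with the approximate variance *c*², the exact log-scale moments of a log-normal variable with mean *l* and coefficient of variation *c* are standard results. The symbols *m* and *s*² below are introduced only for this illustration.

```latex
\begin{aligned}
E(\tilde{\lambda}) = l,\quad \operatorname{var}(\tilde{\lambda}) = c^2 l^2
&\;\Longrightarrow\; \log\tilde{\lambda} \sim \mathcal{N}(m, s^2),\\
s^2 &= \log(1 + c^2) \approx c^2,\\
m &= \log(l) - \tfrac{1}{2}\log(1 + c^2) \approx \log(l) - \tfrac{1}{2}c^2 .
\end{aligned}
```

For *c* = 0.1 the exact and approximate variances are 0.00995 and 0.01, so the approximation is tight for small *c* and loosens as *c* grows.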

For *c* = 0, the model proposed is equivalent to a deterministic constraint. The MAR model under this pattern-mixture framework is a special case with $\tilde{\lambda}_{\beta_{tk}}^{(r)} \equiv 1$ (*l* = 1; *c* = 0) for all *t* ≥ 2, *r* < *t* and *k*. Indeed, the Poisson distribution is uniquely defined by its mean structure, so $\tilde{\lambda}_{\beta_{tk}}^{(r)} \equiv 1$ for *t* ≥ 2, *r* < *t*, and *k* is equivalent to $f(y_t \mid \mathbf{y}_{t-1}, x, r = j) = f(y_t \mid \mathbf{y}_{t-1}, x, r \ge t)$, ∀ *t* ≥ 2, ∀ *j* < *t*. The latter are the available case missing value restrictions that were defined by Molenberghs *et al.* (1998), which are equivalent to MAR. Thus, $\tilde{\lambda}_t^{(r)}(u_t)$ can be seen as an ignorability index that is equivalent to 1 if the missing data mechanism is ignorable and different from 1 when the missing data mechanism is non-ignorable.

## 4. Computations: Gibbs sampling

The posterior distribution of the parameters in model (1) is analytically intractable; hence samples from the posterior are obtained by using Gibbs sampling (Gelfand and Smith, 1990).

### 4.1. Complete-data inference

With no missing data, draws from Pr(β, b, Σ|y, x) are generated via Gibbs sampling based on the following conditional distributions:

- [*β*, b, Σ | y, x]:
  - (i) [*β* | b, Σ, y, x]
  - (ii) [b | *β*, Σ, y, x]
  - (iii) [Σ | *β*, b, y, x].

Distributions (i) and (ii) do not have a closed form and draws from them are based on the Metropolis algorithm (Metropolis *et al.*, 1953) or the Metropolis-Hastings algorithm (Hastings, 1970); the draws for distribution (iii) are obtained from an inverse Wishart distribution.
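A minimal sketch of a random-walk Metropolis update of the kind used for conditionals without a closed form is given below. It is deliberately simplified to a single intercept in a Poisson log-linear model with a diffuse normal prior, and it is not the authors' WinBUGS implementation. The data, prior standard deviation, step size and function names are all hypothetical.

```python
import math
import random

def log_posterior(beta, y, offsets, prior_sd=10.0):
    """Log posterior (up to a constant) for an intercept-only Poisson
    log-linear model with a N(0, prior_sd**2) prior on beta."""
    log_lik = sum(yi * (beta + oi) - math.exp(beta + oi)
                  for yi, oi in zip(y, offsets))
    return log_lik - beta ** 2 / (2 * prior_sd ** 2)

def metropolis(y, offsets, n_iter=5000, step=0.3, seed=1):
    """Random-walk Metropolis draws of beta."""
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta, y, offsets)
    draws = []
    for _ in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, y, offsets)
        # Accept with probability min(1, posterior ratio); 1 - random()
        # lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        draws.append(beta)
    return draws

# Hypothetical counts with a zero offset (one time unit each).
counts = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]
draws = metropolis(counts, [0.0] * len(counts))
posterior_mean = sum(draws[1000:]) / len(draws[1000:])
```

In a full sampler this update would be applied in turn to each coefficient and each random effect, holding the remaining quantities at their current values.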

### 4.2. Inference under ignorable missing data

Under ignorable missing data mechanisms for *y*, inferences can be made by drawing values from Pr(*β*, b, Σ | y_{obs}, x), which is equivalent to drawing from Pr(*β*, Σ, b, y_{mis} | y_{obs}, x) with *l* = 1 and *c* = 0 fixed. These draws are obtained by using Gibbs sampling based on the data augmentation algorithm (Tanner and Wong, 1987) as applied in the following conditional distributions:

- (a) [*β*, b, Σ | y, x]:
  - (i) [*β* | b, Σ, y, x]
  - (ii) [b | *β*, Σ, y, x]
  - (iii) [Σ | *β*, b, y, x]
- (b) [y_{mis} | *β*, b, Σ, y_{obs}, x].

The two blocks (a) and (b) represent an ‘outer’ Gibbs sampling from which draws from Pr(*β*, Σ, b, y_{mis}|y_{obs}, *x*) are obtained. The first block represents the posterior distribution of the parameters from the outcome model, and the second the posterior distribution of the missing values.

### 4.3. Inference under non-ignorable missing data

Under non-ignorable missing data mechanisms for *y*, the Gibbs sampling that was just described is modified to suit pattern-mixture models. The posterior distribution Pr(*β*, Σ, b, y_{mis} | y_{obs}, x, l, c) is identified by introducing informative prior distributions $p(\beta_t^{(r)} \mid \beta_t^{(0)})$ for *r* = 1, 2, … , *t* − 1 and *t* ≥ 2. The draws from the posterior are obtained by using Gibbs sampling in the following conditional distributions:

- (a) [*β*, b, Σ | y, x, l, c]:
  - (i) [*β* | b, Σ, y, x, l, c]
  - (ii) [b | *β*, Σ, y, x, l, c]
  - (iii) [Σ | *β*, b, y, x, l, c]
- (b) [y_{mis} | *β*, b, Σ, y_{obs}, x, l, c].

For a given prior distribution on $\tilde{\lambda}_{\beta_{tk}}^{(r)}$, the prior distribution $p(\beta_{tk}^{(r)} \mid \beta_{tk}^{(0)})$ is fully determined (described in Section 3.2) and, hence, so are the corresponding posterior distributions. A log-normal distribution for $\tilde{\lambda}_{\beta_{tk}}^{(r)}$ is used with mean $l_{\beta_{tk}}^{(r)}$ and variance $c^2\{l_{\beta_{tk}}^{(r)}\}^2$. Step (i) is then modified to incorporate this important prior distribution by conditioning on *l*, the vector of all $l_{\beta_{tk}}^{(r)}$, and *c*. Values of $l_{\beta_{tk}}^{(r)}$ and *c* are varied to explore the sensitivity of the conclusions across different $l_{\beta_{tk}}^{(r)}$ and *c*. WinBUGS software is used to implement the draws and to derive inferences on parameters of interest (Spiegelhalter *et al.*, 2003; Gilks *et al.*, 1996).
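One way to picture the imputation of a missing count under a given (*l*, *c*) is sketched below. It is a simplification of the sampler described above: the event-rate ratio is drawn directly from its prior at each imputation, using the small-*c* normal approximation on the log-scale, and the pattern-specific coefficients are not updated. The function names, the seed and the observed-pattern mean of 2.0 are hypothetical.

```python
import math
import random

def draw_poisson(mu, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small mu)."""
    limit = math.exp(-mu)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def impute_missing_count(mu_observed, l, c, rng):
    """Draw a ratio with log(ratio) ~ N(log(l) - c**2 / 2, c**2), then draw
    y_mis ~ Poisson(ratio * mu_observed)."""
    if c > 0:
        log_ratio = rng.gauss(math.log(l) - c ** 2 / 2, c)
    else:
        log_ratio = math.log(l)  # deterministic constraint when c = 0
    return draw_poisson(math.exp(log_ratio) * mu_observed, rng)

rng = random.Random(2024)
# l = 1, c = 0 reproduces imputation under an ignorable mechanism.
ignorable = [impute_missing_count(2.0, 1.0, 0.0, rng) for _ in range(20000)]
# l = 0.5, c = 0.1: dropouts are assumed to have about half the event rate.
halved = [impute_missing_count(2.0, 0.5, 0.1, rng) for _ in range(20000)]
mean_ignorable = sum(ignorable) / len(ignorable)
mean_halved = sum(halved) / len(halved)
```

Comparing the two means shows how the choice of *l* shifts the imputed counts relative to the observed-pattern mean.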

Pattern-mixture models and selection models are two paths leading to the same joint distribution. Ekholm and Skinner (1998) used a pattern-mixture model and a selection interpretation for their sensitivity analysis. Similarly here, we shall relate the pattern-mixture model to a selection model, which will offer some reassurance that the assumed pattern-mixture models provide sensible implications about the selection mechanism. This can be easily derived in WinBUGS by adding a logistic regression that relates the missing data indicator to the full data (observed and imputed in step (b)). We implement this part in WinBUGS by using the cut(*y*) function as a valve to control the flow of the information from the pattern-mixture model to the selection model, but not vice versa.

## 5. Application to the asthma study

We now apply the model that was described in Section 3 to evaluate the effect of the asthma intervention programme on reducing the number of asthma-related hospitalizations over time, as well as the sensitivity of the results to different missing data mechanisms.

### 5.1. Inferences under an ignorable missing data mechanism

We fit the following transition Markov model of first order for *t* = 2, 3:

Here $\eta_{ijt-1}$ is the yearly hospitalization rate at time *t* − 1, $I_i$ is the intervention indicator, and $o_{ijt} = \log(n_{ijt})$ is the offset variable for the number of months $n_{ijt}$ over which the hospitalizations are counted. We use $\log\{(\eta_{ijt-1} + 1)/2\}$ as a predictor because it fits the model better and is defined when $\eta_{ijt-1} = 0$. In equation (8), $\beta_{t0}$ and $\beta_{t1}$ estimate the adjusted event rate at time *t* for the control and the intervention group respectively. Thus, testing the average intervention effect over time corresponds to estimating $\beta_{11} - \beta_{10}$ and $\beta_{21} - \beta_{20}$, which represent the treatment effect on hospitalizations during the first and second follow-up period respectively when $\eta_{t-1} = 1$.

Inferences under MAR are derived by using a pattern-mixture model (Section 3.2) and assuming that all $\tilde{\lambda}^{(r)}$ are 1. The missing values of the offset variable are set at $o_{ijt} = \log(12\ \text{months}) = 2.48$. Model (8) is fitted by using a Bayesian approach implemented through WinBUGS software. Parameter inferences are derived on the basis of five chains of 20000 iterations, each with different random starting points and following a burn-in of 5000 iterations. The convergence of iterations for each parameter is monitored by using the Gelman and Rubin (1992) univariate scale reduction factor (SRF), with all being less than 1.01. The overall convergence is monitored by using the Brooks and Gelman (1998) multivariate SRF, which equals 1.05, indicating that no gain will result if the iterations continue. The results under an MAR mechanism are shown in Table 1. For hospitalizations, no intervention effect occurs during the first follow-up period ($\beta_{11} - \beta_{01} = -0.03$; *p* = 0.94), but a significant effect occurs at second follow-up ($\beta_{12} - \beta_{02} = -1.67$; *p* = 0.03). When the intervention group is compared with the control group, the rate of hospitalization is reduced by 71% (95% CI = (27–96%)) between the first and second follow-up. The interaction between intervention and previous hospitalization is non-significant at either time point. The variances of the random intercepts are comparable with their standard errors, which reflects weak clustering within physician once we condition on the previous response. As expected, inferences that are derived by using a pattern-mixture model that assumes that all $\tilde{\lambda}^{(r)}$s are 1 are the same as inferences that are derived on the basis of only observed data.
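The univariate scale reduction factor used to monitor convergence compares between-chain and within-chain variability. The sketch below is a simplified form of the Gelman and Rubin (1992) diagnostic that omits the degrees-of-freedom correction. It is not the WinBUGS routine, and the function name and toy chains are hypothetical.

```python
from statistics import mean, variance

def scale_reduction_factor(chains):
    """Simplified univariate potential scale reduction factor for
    equal-length chains (no degrees-of-freedom correction)."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(ch) != n for ch in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(ch) for ch in chains)            # W
    between_over_n = variance([mean(ch) for ch in chains])  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5

# Toy chains: identical chains versus chains stuck in different regions.
srf_same = scale_reduction_factor([[1, 2, 3, 4], [1, 2, 3, 4]])
srf_shifted = scale_reduction_factor([[1, 2, 3, 4], [11, 12, 13, 14]])
```

Values close to 1 suggest that the chains are sampling the same distribution, while values well above 1, as for the shifted toy chains, indicate that they have not mixed.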

### 5.2. Inferences under a non-ignorable missing data mechanism

For the pattern–mixture model, different parameters for each missing data pattern are used in model (8) for the control group ${\beta}_{t0}^{\left(r\right)}$ and the intervention group ${\beta}_{t1}^{\left(r\right)}$, at both follow-ups. The effect of the other covariates is held the same across the missing data patterns. Prior distributions on the parameters ${\stackrel{~}{\lambda}}_{{\beta}_{t0}}^{\left(r\right)}$ and ${\stackrel{~}{\lambda}}_{{\beta}_{t1}}^{\left(r\right)}$ are log-normal with means ${l}_{{\beta}_{t0}}^{\left(r\right)}$ and ${l}_{{\beta}_{t1}}^{\left(r\right)}$ and coefficient of variation *c*. The overall *β*-parameters are derived by averaging across the missing data patterns (Little, 1995), i.e. ${\beta}_{t}={\Sigma}_{r}{\pi}_{tr}{\beta}_{t}^{\left(r\right)}$, where *π*_{tr} = Pr (*R _{it}*=r) is the proportion of subjects in pattern

*r*at time

*t*. The

*π*is estimated by

_{tr}*m*/

_{tr}*n*where

*m*is the number of subjects in pattern

_{tr}*r*at time

*t*and

*n*is the total number of subjects. The uncertainty in

*π*is negligible and is ignored.

The missing data mechanism is determined primarily by mean parameters ${l}_{{\beta}_{t0}}^{\left(r\right)}$ and ${l}_{{\beta}_{t1}}^{\left(r\right)}$, with different values of ${l}_{{\beta}_{t0}}^{\left(r\right)}$ and ${l}_{{\beta}_{t1}}^{\left(r\right)}$ representing different dropout mechanisms. Several values of ${l}_{{\beta}_{t0}}^{\left(r\right)}$ and ${l}_{{\beta}_{t1}}^{\left(r\right)}$ are used for sensitivity analyses. Because the treatment effect at *t* = 2 is highly non-significant we focus the sensitivity analyses on the missing data at *t* = 3. We assume that the rates of hospitalizations between the patients in the missing data pattern and those in the observed data pattern at *t* = 2 are the same on average, i.e. ${l}_{{\beta}_{20}}^{\left(2\right)}={l}_{{\beta}_{21}}^{\left(2\right)}=1$. We also let ${l}_{{\beta}_{30}}^{\left(2\right)}={l}_{{\beta}_{30}}^{\left(3\right)}={l}_{{\beta}_{30}}$ and ${l}_{{\beta}_{31}}^{\left(2\right)}={l}_{{\beta}_{31}}^{\left(3\right)}={l}_{{\beta}_{31}}$. With this assumption the sensitivity analyses are based on different values of ${l}_{{\beta}_{30}}$ and ${l}_{{\beta}_{31}}$, which represent the relationship between the missing values across all the missing data patterns at *t* = 3 (*r* < 3) and the observed values at time *t* = 3 (*r* ≥ 3). We consider several values for ${l}_{{\beta}_{30}}$ and ${l}_{{\beta}_{31}}$ including a combination that would make the treatment effect at *t* = 3 non-significant. Initially we set *c* = 0.1, which corresponds to a 95% CI for ${\lambda}_{{\beta}_{t}}$ neither too narrow nor too wide. For example, for ${\lambda}_{{\beta}_{t1}}$ = 0.5 the 95% CI is (0.41, 0.61).
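The quoted interval can be checked with a short Python sketch. It assumes that the prior is centred at *l* on the log-scale and that the log-scale standard deviation is taken to be *c*, which is the usual small-*c* approximation to a log-normal coefficient of variation; the exact identity is offered as an option. The function names are ours.

```python
import math


def log_sd_from_cv(c, exact=False):
    """Log-scale standard deviation implied by a coefficient of variation c.

    exact=False uses the small-c approximation sd = c (assumed here);
    exact=True uses the log-normal identity sd^2 = log(1 + c^2).
    """
    return math.sqrt(math.log(1.0 + c * c)) if exact else c


def prior_interval(l, c, z=1.959964, exact=False):
    """Central 95% interval of a log-normal centred at l on the log-scale."""
    s = log_sd_from_cv(c, exact)
    return l * math.exp(-z * s), l * math.exp(z * s)


lower, upper = prior_interval(0.5, 0.1)
print(round(lower, 2), round(upper, 2))  # 0.41 0.61
```

Under these assumptions the interval is multiplicative around *l*, so the same *c* gives the same relative width for any choice of *l*.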

Our primary parameter of interest is ${\beta}_{t1}-{\beta}_{t0}$, which estimates the average intervention effect. Inferences on this parameter are affected by the ratio ${\stackrel{~}{\eta}}_{{\beta}_{t1}-{\beta}_{t0}}={\stackrel{~}{\lambda}}_{{\beta}_{t1}}/{\stackrel{~}{\lambda}}_{{\beta}_{t0}}$, which is the ratio of the relative risks (intervention *versus* control) between patients in the missing data pattern *r* and patients in the observed data pattern, i.e. the relative risk among the missing subjects divided by the relative risk among the observed subjects.

This follows a log-normal distribution with mean $\mathrm{log}({l}_{{\beta}_{t1}}/{l}_{{\beta}_{t0}})$ and variance $2{c}^{2}$ on the log-scale. With this parameterization the inferences on the treatment effect would be the same across different specifications of ${l}_{{\beta}_{t0}}$ and ${l}_{{\beta}_{t1}}$, as long as the distribution of their ratio is the same. Without loss of generality, we set ${l}_{{\beta}_{30}}=1$ and we vary ${l}_{{\beta}_{31}}$ to perform sensitivity analyses on the $({\beta}_{31}-{\beta}_{30})$-parameter for a range of ${l}_{{\beta}_{31}}/{l}_{{\beta}_{30}}$.
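The distribution of the ratio can be checked by simulation. The sketch below assumes that the two group-specific ratios are independent a priori and that each is log-normal with log-scale centre log(*l*) and log-scale standard deviation *c*; both are our working assumptions for illustration. It also shows that two specifications with the same ratio of means give the same distribution for the log-ratio.

```python
import math
import random
import statistics


def simulate_log_ratio(l0, l1, c, n_draws=100_000, seed=1):
    """Draw log(lambda_1 / lambda_0) for independent log-normal lambdas."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        lam0 = rng.lognormvariate(math.log(l0), c)  # control-group ratio
        lam1 = rng.lognormvariate(math.log(l1), c)  # intervention-group ratio
        draws.append(math.log(lam1 / lam0))
    return draws


c = 0.1
draws_a = simulate_log_ratio(l0=1.0, l1=1.76, c=c)
draws_b = simulate_log_ratio(l0=2.0, l1=3.52, c=c)  # same ratio l1 / l0

mean_a = statistics.fmean(draws_a)
var_a = statistics.pvariance(draws_a)
mean_b = statistics.fmean(draws_b)
print(mean_a, math.log(1.76))  # sample mean versus log(l1 / l0)
print(var_a, 2 * c * c)        # sample variance versus 2 c^2
print(mean_b)                  # matches mean_a: only the ratio matters
```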

Choosing ${l}_{{\beta}_{31}}/{l}_{{\beta}_{30}}=1.76$ yields an estimate for ${\beta}_{31}-{\beta}_{30}$ that is borderline significant (95% CI (0.05–1.00)). Several other combinations of ${l}_{{\beta}_{30}}$ and ${l}_{{\beta}_{31}}$, including ${l}_{{\beta}_{31}}/{l}_{{\beta}_{30}}=1$, are used for sensitivity analyses. In addition, for each pattern–mixture model, a logistic selection model is fitted to relate the missing data indicator (at *t* = 3) to the intervention group variable, current and previous *y*s, and their interaction:

The random effect for the physician is ${g}_{i}$, normally distributed with mean 0 and variance ${\sigma}_{g}^{2}$. Inferences for each model are derived from five chains of 20000 iterations each, started from different random points, following a burn-in of 5000 iterations. The convergence of iterations is monitored by using the SRF; all univariate SRFs are less than 1.02 and all multivariate SRFs are less than 1.09, indicating that the iterations converged. The results for both the pattern–mixture model and the logistic selection model are shown in Table 2. Dependence of the missing data indicator ${R}_{3}$ on ${y}_{3}$ in the intervention group (${\gamma}_{5}$) becomes stronger as ${l}_{{\beta}_{31}}$ moves further from 1. The parameter estimates that are derived for the logistic selection model are consistent with assumptions in the pattern–mixture model. Thus, in the pattern–mixture model with ${l}_{{\beta}_{30}}=1$ and ${l}_{{\beta}_{31}}=4$, the selection model yields a positive estimate of ${\gamma}_{5}=1.26$. Both models indicate that, in the intervention group, subjects with a higher number of hospitalizations are more likely to drop out. For the control group, no difference is assumed on the rate of hospitalizations at *t* = 3 between subjects who are missing and subjects who are observed. This is consistent with the close-to-zero estimate of ${\gamma}_{4}=0.01$.
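For readers who wish to reproduce this kind of convergence check outside WinBUGS, the following Python sketch computes a basic univariate scale reduction factor, assuming that SRF denotes the statistic of Gelman and Rubin (1992). It omits the degrees-of-freedom correction and the multivariate version of Brooks and Gelman (1998), so its values may differ slightly from those reported by the software. The simulated chains are artificial.

```python
import math
import random
import statistics


def scale_reduction_factor(chains):
    """Basic Gelman-Rubin scale reduction factor for one scalar parameter.

    chains: list of m equal-length lists of post-burn-in draws.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(ch) for ch in chains]
    within = statistics.fmean([statistics.variance(ch) for ch in chains])  # W
    between = n * statistics.variance(chain_means)  # B
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(pooled / within)


rng = random.Random(0)
# Five artificial chains that sample the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(5)]
# Five artificial chains stuck around different values.
stuck = [[rng.gauss(float(k), 1.0) for _ in range(2000)] for k in range(5)]

srf_mixed = scale_reduction_factor(mixed)
srf_stuck = scale_reduction_factor(stuck)
print(srf_mixed, srf_stuck)
```

Values close to 1 suggest that the between-chain variation is small relative to the within-chain variation, which is the sense in which the thresholds 1.02 and 1.09 above are read.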

Next, other values of *c* are used for sensitivity analyses. These evaluate how the boundary ratio ${l}_{{\beta}_{31}}/{l}_{{\beta}_{30}}$, for which the intervention effect at second follow-up is just significant, relates to *c*. A graphical display of this relationship is shown in Fig. 1. As the coefficient of variation increases, the boundary ratio decreases. This is mainly related to having larger CIs on the estimate of ${\beta}_{31}-{\beta}_{30}$ as the prior on $\stackrel{~}{\lambda}$ is less informative (high *c*). We fixed *c* = 0.1 for sensitivity analyses that are shown in Table 2, but other choices of *c* can be used if deemed appropriate.

Fig. 1. The boundary ratio ${l}_{{\beta}_{31}}/{l}_{{\beta}_{30}}$ at which the estimated intervention effect is just significant declines as the coefficient of variation *c* increases

Finally, the sensitivity analyses show that the parameter estimates for the intervention effect at second follow-up differ across missing data mechanisms. However, the evidence of an intervention effect at second follow-up is robust to a range of missing data mechanisms. It becomes non-significant only when the relative risk of hospitalization for intervention *versus* control at *t* = 3 is on average at least 1.76 times as high among the missing subjects as among the observed subjects, assuming that *c* = 0.1. No variation appeared on the other parameters. This is consistent with the fact that in all analyses we assumed that these parameters did not vary by missingness patterns. As a result, the overall estimates will be the same as the estimates by using an MAR mechanism.

## 6. Conclusions

In this paper we have developed a Bayesian model to fit clustered longitudinal data from Poisson outcomes with potential non-ignorably missing observations. The model for non-ignorably missing values was developed by using a pattern–mixture model that was identified on the basis of easy-to-understand assumptions about missing data. These assumptions used prior distributions on an ignorability index parameter ${\stackrel{~}{\lambda}}_{{\beta}_{tk}}^{\left(r\right)}$, which represents the ratio of event rates between the missing data pattern *r* and the observed data pattern (conditioned on other covariates and previous responses), for group *k* at time *t*. The distributions of ${\stackrel{~}{\lambda}}_{{\beta}_{tk}}^{\left(r\right)}$ are unknown and cannot be derived from the data. However, it is possible for an investigator to give a range for each ${\stackrel{~}{\lambda}}_{{\beta}_{tk}}^{\left(r\right)}$, and then to explore the sensitivity of statistical inferences over that range. Such a parameterization is intuitive and easy to understand. It contains the MAR mechanism as a special case where all ${\stackrel{~}{\lambda}}_{{\beta}_{tk}}^{\left(r\right)}$ are 1. The method was implemented by using WinBUGS 1.4 software and applied to the asthma intervention study for evaluating the effect of the interactive seminar on the hospitalization outcome. WinBUGS gives added strength because it can relate the pattern–mixture model and its assumptions to a logistic selection model. Such flexibility provides additional assurance and validity to the sensitivity analyses.

In the asthma study following per-protocol analysis, under an ignorable missing data mechanism, intervention did not show a significant effect in reducing the overall rate of hospitalizations during the first follow-up period. During the second follow-up period the overall rate of hospitalizations in the intervention group compared with the control group was reduced by 71% (95% CI (27–96%)). The intervention showed similar beneficial effects across patients with different rates of hospitalizations in either period. The sensitivity analyses showed that the parameter estimates for the intervention group variable differ across missing data mechanisms. However, the evidence of an intervention effect at second follow-up was fairly robust to possible departures from an ignorable missing data mechanism.

Finally, although the model that was used answered specific questions for the asthma study, it can also be applied more generally, e.g. to impute non-ignorably missing values on count outcomes. Then, once the missing data have been imputed, additional analyses can be performed on the complete-data set by using standard statistical techniques.
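The imputation idea can be sketched in Python under strong simplifying assumptions. The expected count under the observed-pattern model (`mu_observed`) is treated as given, the rate ratio is drawn from a log-normal prior centred at *l* on the log-scale with log-scale standard deviation *c*, and the missing count is then drawn from a Poisson distribution with the scaled rate. The function names and numerical values are hypothetical, and this is an illustration of the principle rather than the procedure that was used for the asthma study.

```python
import math
import random


def draw_poisson(rate, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def impute_count(mu_observed, l, c, rng):
    """Impute one missing count under a non-ignorable mechanism.

    mu_observed: expected count under the observed-pattern model (assumed given).
    l, c: log-scale centre and spread of the prior on the rate ratio.
    """
    lam = rng.lognormvariate(math.log(l), c)  # ratio of missing to observed rate
    return draw_poisson(lam * mu_observed, rng)


rng = random.Random(2008)
imputed = [impute_count(mu_observed=0.5, l=2.0, c=0.1, rng=rng)
           for _ in range(20_000)]
mean_imputed = sum(imputed) / len(imputed)
print(mean_imputed)  # close to 2.0 * 0.5 = 1.0
```

Setting *l* = 1 centres the imputations on the observed-pattern rate, which corresponds to the MAR special case. In a full multiple-imputation analysis `mu_observed` would itself be drawn from its posterior distribution rather than fixed.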

## Acknowledgements

Part of the work and the data that are described in this paper were supported by Physician-Family Partnership Education in Asthma Management grant HL-44976 from the Lung Division of the National Heart, Lung, and Blood Institute. The authors thank the referees for their comments, which substantially improved the paper.

## Footnotes

Access to the WinBUGS code can be obtained from ude.hcimu@alocin.

## References

- Albert PS, Follmann DA. Modeling repeated count data subject to informative dropout. Biometrics. 2000;56:667–677. [PubMed]
- Brooks SP, Gelman A. Alternative methods for monitoring convergence of iterative simulations. J. Computnl Graph. Statist. 1998;17:434–455.
- Clark NM, Gong M, Schork MA, Evans D, Roloff D, Hurwitz M, Maiman LA, Mellins RB. Impact of education for physicians on patients outcomes. Pediatrics. 1998;101:831–836. [PubMed]
- Clark NM, Gong M, Schork MA, Kaciroti N, Evans D, Roloff D, Hurwitz M, Maiman LA, Mellins RB. Long-term effects of asthma education for physicians on patient satisfaction and use of health services. Eur. Resp. J. 2000;16:15–21. [PubMed]
- Daniels M, Hogan J. Reparameterizing the pattern mixture model for sensitivity analysis under informative dropout. Biometrics. 2000;56:1241–1248. [PubMed]
- Demirtas H. Multiple imputation under Bayesianly smoothed pattern-mixture models for non-ignorable drop-out. Statist. Med. 2005;24:2345–2363. [PubMed]
- Demirtas H, Schafer JL. On the performance of random-coefficient pattern-mixture models for non-ignorable drop-out. Statist. Med. 2003;22:2553–2575. [PubMed]
- Diggle PJ, Heagerty P, Liang KY, Zeger S. Analysis of Longitudinal Data. 2nd edn. Oxford University Press; New York: 2002.
- Ekholm H, Skinner C. The Muscatine children’s obesity data reanalysed using pattern mixture models. Appl. Statist. 1998;47:251–263.
- Gelfand AE, Smith AFM. Sampling based approaches to calculate marginal densities. J. Am. Statist. Ass. 1990;85:398–409.
- Gelman A, Rubin DB. Inference from iterative simulations using multiple sequences. Statist. Sci. 1992;7:457–472.
- Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov Chain Monte Carlo in Practice. Chapman and Hall; London: 1996.
- Guo W, Ratcliffe SJ, Ten Have TT. A random pattern-mixture model for longitudinal data with dropouts. J. Am. Statist. Ass. 2004;99:929–937.
- Hastings WK. Monte Carlo sampling method using Markov chains and their applications. Biometrika. 1970;57:97–109.
- Heckman JJ. Sample selection bias as a specification error. Econometrica. 1979;47:153–162.
- Hogan JW, Laird NM. Mixture models for joint distribution of repeated measures and event times. Statist. Med. 1997;16:239–257. [PubMed]
- Kaciroti N. PhD Thesis. Department of Biostatistics, University of Michigan; Ann Arbor: 2002. Modeling nonignorable missing data for clustered longitudinal discrete outcomes: a Bayesian approach.
- Kaciroti N, Raghunathan TE, Schork MA, Clark NM, Gong M. A Bayesian approach for clustered longitudinal ordinal outcome with nonignorable missing data: evaluation of an asthma education program. J. Am. Statist. Ass. 2006;101:435–446.
- Kenward MG. Selection models for repeated measures with non-random dropout: an illustration of sensitivity. Statist. Med. 1998;17:2723–2732. [PubMed]
- Kenward MG, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal data. Statist. Meth. Med. Res. 1999;8:51–83. [PubMed]
- Kenward MG, Molenberghs G, Thijs H. Pattern-mixture models with proper time dependence. Biometrika. 2003;90:53–71.
- Little RJA. Pattern-mixture models for multivariate incomplete data. J. Am. Statist. Ass. 1993;88:125–134.
- Little RJA. A class of pattern-mixture models for normal missing data. Biometrika. 1994;81:471–483.
- Little RJA. Modeling the dropout mechanism in repeated measures studies. J. Am. Statist. Ass. 1995;90:1113–1121.
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd edn. Wiley; New York: 2002.
- Little RJA, Wang Y. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics. 1996;52:98–111. [PubMed]
- Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH. Equations of state calculations by fast computing machines. J. Chem. Phys. 1953;21:1087–1091.
- Molenberghs G, Michiels B, Kenward MG, Diggle PJ. Monotone missing data and pattern-mixture models. Statist. Neerland. 1998;52:153–161.
- National Center for Health Statistics. National Health Interview Survey, 2001-2003. National Center for Health Statistics; Atlanta: 2002.
- Ridout MS. Testing for random dropouts in repeated measurement data. Biometrics. 1991;47:1617–1621. [PubMed]
- Spiegelhalter DJ, Thomas A, Best NG, Lunn D. WinBUGS User Manual: Version 1.4. Medical Research Council Biostatistics Unit; Cambridge: 2003.
- Tanner M, Wong WH. The calculation of posterior distribution by data augmentation (with discussion) J. Am. Statist. Ass. 1987;82:528–550.
- Thijs H, Molenberghs G, Michiels B, Verbeke G, Curran D. Strategies to fit pattern-mixture models. Biostatistics. 2002;3:245–264. [PubMed]
- Wu M, Carroll R. Estimation and comparison of change in presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.
- Zeger SL, Qaqish B. Markov regression models for time series: a quasi-likelihood approach. Biometrics. 1988;44:1019–1031. [PubMed]
