NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Hong H, Carlin BP, Chu H, et al. A Bayesian Missing Data Framework for Multiple Continuous Outcome Mixed Treatment Comparisons [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Jan.

Results

Results for OA Data

Table 2 compares the fit of six models to our OA data. We apply homogeneous variance across arms in LARE and homogeneous covariance matrices for CBRE2 and ABRE2; that is, kOut and ΛkOut are the same for all k, respectively. All CB and AB models incorporate the missingness into the model, and only the CBRE2hom and ABRE2hom models allow a correlation structure between outcomes. The fixed effects model gives the largest mean deviance score D̄ when applied to the OA data, and an unacceptably large DIC score. ABRE1 fits the data best, with the smallest D̄, but there is no significant difference in fit across the random effects models. AB models give slightly higher pD than CB models because they are less constrained and more parameters need to be estimated. Since our data are sparse, the heterogeneous variance assumption, a feature of CBRE1 and ABRE1, is not a good choice here. Considering both goodness of fit and complexity, CBRE2hom gives the smallest DIC, though again the DIC differences between this model and ABRE2hom or LAREhom are not of practical importance (less than five units). The estimated variability on the standard deviation scale is always between 1 and 1.5, with associated 95% credible interval widths around 0.4, based on the posterior medians in the LAREhom, CBRE2hom, and ABRE2hom models. The posterior medians of the correlation between the two outcomes are 0.494 (95% credible interval 0.18 to 0.71) and 0.377 (0.06 to 0.61) for the CBRE2hom and ABRE2hom models, respectively, revealing the two outcomes to be positively but weakly correlated (data not shown).
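The fit comparison above relies on three quantities: D̄, pD, and DIC. As a minimal sketch, assuming the standard definitions pD = D̄ − D(θ̄) and DIC = D̄ + pD, the arithmetic can be written as follows. The function names and the deviance values are hypothetical and are not taken from Table 2; the five-unit threshold mirrors the rule of thumb quoted in the text.

```python
from statistics import mean


def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Return (D_bar, pD, DIC) under the standard DIC definitions.

    deviance_draws: deviance evaluated at each MCMC draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior
        mean of the parameters.
    """
    d_bar = mean(deviance_draws)                # mean deviance (fit)
    p_d = d_bar - deviance_at_posterior_mean    # effective number of parameters
    return d_bar, p_d, d_bar + p_d              # DIC = fit + complexity


def practically_different(dic_a, dic_b, threshold=5.0):
    """True when two DIC scores differ by at least `threshold` units."""
    return abs(dic_a - dic_b) >= threshold


# Hypothetical deviance draws, for illustration only.
draws = [100.0, 104.0, 102.0, 98.0]
d_bar, p_d, dic = dic_summary(draws, 95.0)
```

With these toy inputs, two models whose DIC scores are 107 and 110 would not be treated as practically different.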

Table 2. Model comparisons for the OA data.

Table 3 displays the results from four models (LAFE, LAREhom, CBRE2hom, and ABRE2hom) with respect to the pain outcome. Here, smaller values of dk1 and μk1 indicate a better condition, and the “best” treatment based on the Best12 probability is in bold. In the LAREhom model, it is essentially tied with aquatic and proprioception exercises for first place. Our CBRE2hom and ABRE2hom models suggest that proprioception exercise is the best treatment, followed by strength exercise, but the Best12 probability of proprioception exercise from ABRE2hom is much larger than that from CBRE2hom. However, since the standard deviations are somewhat large, there is no significant difference between these two treatments. There are large differences in Best12 probabilities across the three random effects models. This might be due to the different model assumptions and settings, but also to the network structure of the data.
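The Best12 probabilities discussed above can be approximated directly from posterior draws. The sketch below assumes that Best12 means the posterior probability that a treatment ranks first or second; the definition appears earlier in the report, not in this section, so that reading is an assumption. The function name and the toy draws are hypothetical.

```python
def best12_probabilities(draws, smaller_is_better=True):
    """Fraction of posterior draws in which each treatment ranks 1st or 2nd.

    draws: list of posterior draws; each draw holds one value per treatment.
    """
    n_treat = len(draws[0])
    counts = [0] * n_treat
    for draw in draws:
        # Order treatment indices from best to worst within this draw.
        order = sorted(range(n_treat), key=lambda k: draw[k],
                       reverse=not smaller_is_better)
        for k in order[:2]:          # the two best treatments in this draw
            counts[k] += 1
    return [c / len(draws) for c in counts]


# Toy draws for three treatments (index 0 plays the reference arm).
toy_draws = [
    [0.0, -1.2, -0.8],
    [0.0, -0.5, -0.9],
    [0.0, 0.3, -0.4],
    [0.0, -1.0, -0.2],
]
probs = best12_probabilities(toy_draws)
```

Because two treatments are counted in every draw, the probabilities sum to 2 across treatments, not to 1.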

Table 3. Estimates of treatment effects and Best12 probabilities from four models with outcome pain.

Table 4 shows similar information with respect to the disability outcome. Aerobic exercise performs best based on Best12 probabilities from the LAREhom model. Proprioception and aerobic exercises are tied for first place in the CBRE2hom model, and proprioception exercise is the best treatment, followed by strength exercise, in ABRE2hom. It seems that proprioception and aerobic exercises help to reduce disability across all models, but there is still no strong evidence of a significant difference among the treatments.

Table 4. Estimates of treatment effects and Best12 probabilities from four models with outcome disability.

Figure 3 presents the findings above graphically, in terms of the mean difference between each therapy and no treatment (dkl) with 95% credible intervals across the four models. We indicate the best treatment with respect to each outcome in each model with a triangle, and the worst treatment with a square. For the pain outcome, strength and proprioception exercises perform significantly better than no active treatment across all models, whereas for the disability outcome, only aerobic exercise is significantly different from no active treatment under the three random effects models. Compared with the pain outcome, the 95% credible sets for disability are wider because only about half as many studies reported this outcome.

Figure 3 is an interval plot of the mean differences between each OA physical therapy and no treatment, with 95% credible intervals, from four models for each outcome. This figure is described further in the fourth paragraph of the Results for OA Data section as follows: we indicate the best treatment with respect to each outcome in each model with a triangle, and the worst treatment with a square. For the pain outcome, strength and proprioception exercises perform significantly better than no active treatment across all models, whereas for the disability outcome, only aerobic exercise is significantly different from no active treatment under the three random effects models. Compared with the pain outcome, the 95 percent credible sets for disability are wider because only about half as many studies reported this outcome.

Figure 3

OA data interval plot of difference between fixed mean of therapies and no treatment for each outcome. Abbreviations: ABRE2 = arm-based random effects model assuming dependence between outcomes; CBRE2 = contrast-based random effects model assuming dependence (more...)

Figures 4 and 5 exhibit the posterior probabilities of each treatment taking each possible ranking from 1 (best) to 11 (worst) for both the pain reduction and disability improvement outcomes [25]. Although these graphs cannot reveal significant differences in rankings among treatments or the magnitudes of any treatment differences, they still give a sense of the uncertainty in the rank of each treatment. Note that in both figures the positive correlation between the two outcomes leads to generally similar treatment ranking probabilities for both outcomes. In Figure 5, proprioception exercise's probability of being the best treatment for pain is roughly 0.8, leaving the other 10 treatments to share the remaining 0.2 probability of being the best; this treatment also has the single largest probability of being best for disability improvement (about 0.4). By contrast, the LA model rankings in Figure 4 do not suggest a dominant treatment for either outcome, though aerobic exercise has a nearly 0.4 chance of being best for disability improvement, and placebo is unequivocally worst for pain reduction.
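The ranking probabilities plotted in such rankograms can be tabulated from posterior draws by ranking the treatments within each draw and counting how often each treatment lands at each rank. The following is a minimal sketch under that assumption; the function name and the toy draws are hypothetical and do not reproduce the report's computation.

```python
def rank_probabilities(draws, smaller_is_better=True):
    """Return table[k][r]: share of draws in which treatment k has rank r + 1.

    draws: list of posterior draws; each draw holds one value per treatment.
    """
    n_treat = len(draws[0])
    counts = [[0] * n_treat for _ in range(n_treat)]
    for draw in draws:
        # Treatment indices ordered from best (rank 1) to worst.
        order = sorted(range(n_treat), key=lambda k: draw[k],
                       reverse=not smaller_is_better)
        for rank, k in enumerate(order):
            counts[k][rank] += 1
    n_draws = len(draws)
    return [[c / n_draws for c in row] for row in counts]


# Toy draws for three treatments; smaller values mean a better condition.
toy_draws = [
    [0.0, -1.2, -0.8],
    [0.0, -0.5, -0.9],
    [0.0, 0.3, -0.4],
    [0.0, -1.0, -0.2],
]
table = rank_probabilities(toy_draws)
```

Each row of the table corresponds to one panel of a rankogram. Rows sum to 1 because every treatment takes exactly one rank per draw, and columns sum to 1 because every rank is taken by exactly one treatment per draw.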

Figure 4 is a rankogram displaying the ranks of each OA treatment in terms of the pain (solid line) and disability (dashed line) outcomes from the LAREhom model. There are 11 graphs arranged in a 3 by 4 grid of panels, and the order of treatments is no treatment, education, placebo, low intensity diathermy, high intensity diathermy, electrical stimulation, aerobic exercise, aquatic exercise, strength exercise, proprioception exercise, and ultrasound, from left to right and top to bottom. The vertical axis gives the posterior probability of the indicated treatment taking each of the ranks on the horizontal axis, where 1 is best and 11 is worst. In this figure the positive correlation between the two outcomes leads to generally similar treatment ranking probabilities. No therapy dominates for either outcome, though aerobic exercise has a nearly 0.4 chance of being best for disability improvement, and placebo is unequivocally worst for pain reduction.

Figure 4

Ranking of treatments for reducing pain and improving disability from the homogeneous Lu and Ades-style random effects model (LAREhom). Note: The vertical axis gives the posterior probability of the indicated treatment taking each of the ranks on the (more...)

Figure 5 is a rankogram of the same form as Figure 4, but under the ABRE2hom model. Again, in this figure, the positive correlation between the two outcomes leads to generally similar treatment ranking probabilities. Proprioception exercise's probability of being the best treatment for pain is roughly 0.8, leaving the other 10 treatments to share the remaining 0.2 probability of being the best; this treatment also has the single largest probability of being best for disability improvement (about 0.4).

Figure 5

Ranking of treatments for reducing pain and improving disability from the homogeneous arm-based random effects model 2 (ABRE2hom). Note: The vertical axis gives the posterior probability of the indicated treatment taking each of the ranks on the horizontal (more...)

To obtain Best12 probabilities with the combined score in Equation (8), we investigate three sets of weights: (w1, w2) = (0.5, 0.5), (0.8, 0.2), and (0.2, 0.8). Our CBRE2hom and ABRE2hom models give proprioception exercise as the global winner for all three sets of weights. Aerobic exercise is the overall winner in the LAREhom model (results not shown). The weights do not have much effect here because some treatment effects are so large in one outcome that they dominate the effects from the other outcome, even when we put a low weight on the former (e.g., the Best12 probability of aerobic exercise in the disability outcome is much larger than that of low intensity diathermy in the pain outcome for LAREhom).
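The insensitivity to weights described above can be illustrated with a toy calculation. Equation (8) is not reproduced in this section, so the sketch below assumes that the combined score is a plain weighted sum of per-outcome effects on a comparable scale; that form, the treatment labels, and the effect values are all hypothetical. It also uses point values where the report works with posterior probabilities.

```python
# Hypothetical per-outcome effects versus no treatment (smaller is better),
# assumed to be on a comparable scale. Not values from Tables 3 or 4.
effects = {
    "A": (-3.0, -0.5),   # very large effect on outcome 1, modest on outcome 2
    "B": (-0.6, -1.0),   # best on outcome 2 alone
    "C": (-0.2, -0.1),
}

# The three weight sets (w1, w2) examined in the text.
weight_sets = [(0.5, 0.5), (0.8, 0.2), (0.2, 0.8)]


def combined_score(effect, weights):
    """Weighted sum of per-outcome effects (assumed form of the score)."""
    return sum(w * e for w, e in zip(weights, effect))


# Treatment with the smallest combined score under each weight set.
winners = [
    min(effects, key=lambda name: combined_score(effects[name], w))
    for w in weight_sets
]
```

In this toy setting treatment A wins under all three weight sets, including (0.2, 0.8), even though B is better on the second outcome, because A's effect on the first outcome is large enough to dominate.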

Sensitivity Analysis

Our CBRE2hom and ABRE2hom models yield a weakly positive correlation between the two outcomes under a noninformative Wishart prior on the covariance matrix of the random effects, which assumes zero correlation between outcomes and has γ = 2 degrees of freedom. As a sensitivity analysis, we consider three more informative Wishart priors: a between-outcome correlation of 0.5 with γ = 2 and γ = 4, and a between-outcome correlation of 0.9 with γ = 4. Note that a Wishart prior becomes less informative as γ decreases to 0.
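To get a feel for how the prior correlation and γ shape a 2 × 2 Wishart prior, one can simulate Wishart draws and inspect the correlation they imply. The sketch below is an illustration under assumptions: it uses a unit-variance scale matrix with off-diagonal ρ0, builds each draw as a sum of γ outer products of bivariate normal vectors (valid for integer γ), and ignores whether the report's software places the prior on the covariance matrix or on its inverse. The function name, seed, and number of draws are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def implied_correlations(rho0, df, n_draws=20000, seed=1):
    """Correlations implied by draws W ~ Wishart(S, df), S = [[1, rho0], [rho0, 1]].

    Each W is the sum of `df` outer products x x^T with x ~ N(0, S).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_draws):
        w11 = w12 = w22 = 0.0
        for _ in range(df):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x1 = z1
            x2 = rho0 * z1 + math.sqrt(1 - rho0 ** 2) * z2  # corr(x1, x2) = rho0
            w11 += x1 * x1
            w12 += x1 * x2
            w22 += x2 * x2
        out.append(w12 / math.sqrt(w11 * w22))
    return out


# The four prior settings mentioned in the text: (rho0, gamma).
settings = [(0.0, 2), (0.5, 2), (0.5, 4), (0.9, 4)]
summary = {}
for rho0, df in settings:
    corr = implied_correlations(rho0, df)
    summary[(rho0, df)] = (mean(corr), pstdev(corr))  # centre and spread
```

In this simulation the implied correlation is more dispersed for γ = 2 than for γ = 4 at the same ρ0, which matches the remark that smaller γ gives a less informative prior.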

Table 5 displays the results of our sensitivity analysis in terms of model fit (pD, D̄, and DIC) and the posterior estimates of the correlation between the two outcomes (ρ̂). Here, the degree of informativeness of the Wishart hyperprior increases from left to right. The ρ̂ values in the CBRE2hom models are affected more by the choice of Wishart prior, with ρ̂ close to 0.9 when ρ0 = 0.9 and γ = 4, while ABRE2hom gives a somewhat more robust ρ̂ of around 0.5 across the three informative priors. In CBRE2hom, pD decreases as we use a more informative prior, whereas ABRE2hom gives almost the same pD values across all informative priors. Regarding the treatment effect parameters, informative priors do not change the treatment ranking dramatically (proprioception exercise is the best treatment for both outcomes under both the CBRE2hom and ABRE2hom models in all informative prior cases), but they do yield smaller standard deviations for those parameters.

Table 5. Results from sensitivity analysis.

Results for Simulation Study

Tables 6 and 7 present the results of our simulation under ρAB = 0.6 and 0.0, respectively. For the CBRE2 and ABRE2 models, we used two different Wishart priors for the covariance matrices: a noninformative Wishart(diag(10, 10), 2) and a weakly informative Wishart(4R*, 4), where R* is the true covariance matrix. We report Pr(μ̂11 > μ̂21) in parentheses, which is interpreted as the probability of an incorrect decision when d21 = 1 or 2 but should be around 0.5 when d21 = 0, along with the simulated Type I error and power. Using the true covariance matrix in the prior distribution could be overly optimistic, but we adopt the truth to investigate how much power could be gained with informative priors.

Table 6. Simulation results when ρAB* = 0.6; Type I error, Power1, and Power2 in terms of d21; Pr(μ̂11 > μ̂21) is in parentheses.

Table 7. Simulation results when ρAB* = 0.0; Type I error, Power1, and Power2 in terms of d21; Pr(μ̂11 > μ̂21) is in parentheses.

In Table 6, all models work fairly well when there are no missing data (“complete”). For Type I error, the LAREhom model performs poorly under the MAR and MNAR mechanisms, with extreme Pr(μ̂11 > μ̂21) values very close to 0 (MAR) or 1 (MNAR). Power1 decreases under the MCAR mechanism, as expected given the loss of data, but our CBRE2 and ABRE2 models give slightly higher power than LAREhom. The LAREhom model gives extremely high Power1 under MAR, but too low a Power1 under MNAR. Under MNAR, the probability of an incorrect decision is 0.377 using LAREhom, while it is only 0.080 using CBRE2 and ABRE2. All models yield very high power when d21 = 2, except the LAREhom model under the MNAR mechanism. The fifth and sixth columns show that adopting weakly informative Wishart priors can improve power without severely damaging Type I error.
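The quantities reported in Tables 6 and 7 are proportions over simulated data sets. The sketch below shows one way such proportions could be tallied. It assumes that a replicate counts as a rejection when its 95% credible interval for d21 excludes 0; the report's decision rule is not stated in this section, so that rule, the function names, and the toy inputs are all hypothetical.

```python
def rejection_rate(intervals):
    """Share of replicates whose 95% interval for d21 excludes 0.

    This is the simulated Type I error when the true d21 is 0,
    and the simulated power when the true d21 is 1 or 2.
    """
    rejected = sum(1 for lo, hi in intervals if lo > 0 or hi < 0)
    return rejected / len(intervals)


def wrong_order_rate(mu11_hat, mu21_hat):
    """Share of replicates with mu11_hat > mu21_hat, an estimate of
    Pr(mu11_hat > mu21_hat) across simulated data sets."""
    wrong = sum(1 for a, b in zip(mu11_hat, mu21_hat) if a > b)
    return wrong / len(mu11_hat)


# Toy replicate summaries, for illustration only.
toy_intervals = [(-0.5, 0.7), (0.1, 1.4), (-0.2, 0.9), (0.3, 1.8)]
toy_mu11 = [0.2, 0.1, 0.5, 0.0]
toy_mu21 = [0.9, 1.2, 0.4, 1.1]

power = rejection_rate(toy_intervals)
wrong = wrong_order_rate(toy_mu11, toy_mu21)
```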

Table 7 shows that our methods have less benefit when the two outcomes are independent. In this case, the LAREhom model does not suffer as much on Type I error under the MAR and MNAR mechanisms, and its Power1 values are not extreme; it also gives slightly smaller Pr(μ̂11 > μ̂21) values when d21 = 1 under MNAR than our CBRE2 and ABRE2 models. This is because our methods do not borrow much strength across outcomes when the correlation is close to zero, as in this setting. Compared with Table 6, CBRE2 and ABRE2 produce somewhat smaller powers under severe missingness mechanisms than when the two outcomes were correlated.

Figure 6 exhibits density plots of the posterior medians of d21 from 1,000 simulated partially missing data sets under each of the three models with noninformative Wishart priors, when ρAB = 0.6 and d21 is 0, 1, or 2, under the MCAR, MAR, and MNAR mechanisms. When the missingness does not depend on the data (MCAR), the posterior medians of d21 are unbiased across all three models, though ABRE2 gives slightly smaller estimator variances, suggesting smaller mean squared error (MSE). On the other hand, the MAR and MNAR mechanisms lead to huge positive or negative biases with the LAREhom model, resulting in large Type I error and extreme Power1 values. This bias depends on the choice of coefficients in Equation (9); for example, if we alter (9) to logit(pi,mis) = −4 − 2i12 + i22 for MAR, LAREhom gives a Power1 of 0.087, while CBRE2 and ABRE2 give 0.37 and 0.311, respectively. No matter which rules drive the missingness, it is clear that LAREhom models produce larger bias than our models when the missingness does not occur at random and the two outcomes are correlated.
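The link between the missingness mechanism, the between-outcome correlation, and bias can be seen in a much simpler setting than the report's network simulation. The sketch below draws two standard normal outcomes with correlation ρ, deletes the second outcome through a logistic model, and reports the bias of the observed-only mean, which stands in for an analysis that ignores the other outcome. The logistic coefficients are hypothetical and are not those of Equation (9), and the unit-level setup is a simplification of the study-level missingness in the report.

```python
import math
import random
from statistics import mean


def observed_mean_bias(rho, mechanism, n=40000, seed=7):
    """Bias of the observed-only mean of outcome 2 (its true mean is 0)."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        y1 = rng.gauss(0, 1)
        y2 = rho * y1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        if mechanism == "MCAR":
            logit = -0.5                 # unrelated to the data
        elif mechanism == "MAR":
            logit = -0.5 + 1.5 * y1      # depends on the always-observed outcome
        elif mechanism == "MNAR":
            logit = -0.5 + 1.5 * y2      # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        p_missing = 1 / (1 + math.exp(-logit))
        if rng.random() >= p_missing:    # outcome 2 is observed
            observed.append(y2)
    return mean(observed)


# Bias for each mechanism under correlated and uncorrelated outcomes.
bias = {
    (mech, rho): observed_mean_bias(rho, mech)
    for mech in ("MCAR", "MAR", "MNAR")
    for rho in (0.6, 0.0)
}
```

In this toy setting the observed-only mean is roughly unbiased under MCAR, biased under MAR only when the outcomes are correlated, and biased under MNAR whether or not they are correlated, which mirrors the qualitative pattern reported for Figures 6 and 7.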

Figure 6 is a density plot exhibiting the bias of d21 from simulations when the true between-outcome correlation on the AB model's scale is 0.6, with LARE (solid line), CBRE2 (dashed line), and ABRE2 (dotted line), under the following scenarios: (a) MCAR, Type I error, (b) MCAR, Power1, (c) MCAR, Power2, (d) MAR, Type I error, (e) MAR, Power1, (f) MAR, Power2, (g) MNAR, Type I error, (h) MNAR, Power1, and (i) MNAR, Power2. Each panel has a gray vertical line at 0 indicating the null hypothesis and a black solid vertical line at 0, 1, or 2 indicating the truth for the Type I error, Power1, and Power2 cases, respectively. This figure is described further in the fourth paragraph of the Results for Simulation Study section as follows: when the missingness does not depend on the data (MCAR), the posterior medians of d21 are unbiased across all three models, though ABRE2 gives slightly smaller estimator variances, suggesting smaller mean squared error (MSE). On the other hand, the MAR and MNAR mechanisms lead to huge positive or negative biases with the LAREhom model, resulting in large Type I error and extreme Power1 values.

Figure 6

Density plot of 1,000 median posteriors of d21 from simulations when ρAB* = 0.6 under MCAR (first row), MAR (second row), and MNAR (third row) mechanisms under noninformative Wishart priors; (a), (d), (g) d21* = 0, (b), (e), (h) (more...)

Figure 7 displays the same density plots as Figure 6, but under ρAB = 0.0. All three models deliver unbiased estimates under MCAR and MAR, but give somewhat biased estimates under MNAR, although the magnitudes of the bias are similar across models. Our CBRE2 and ABRE2 models tend to give slightly larger estimator variances. Here, the missingness does not much affect the bias of the estimators in LAREhom, since the two outcomes are uncorrelated. Although our methods do not deliver strikingly better results than the existing LAREhom model in this idealized case, they do not surrender much in terms of Type I error and power, justifying their use across both dependent and independent scenarios.

Figure 7 is the same as Figure 6 but when the true between-outcome correlation in AB model's scale is 0.0. This figure is described further at the fifth paragraph in the Results for Simulation Study section as follows: all three models deliver unbiased estimates under MCAR and MAR, but give somewhat biased estimates under MNAR, although the magnitudes of bias are similar across models. Our CBRE2 and ABRE2 models tend to give slightly larger estimator variances. Here, the missingness does not much affect the bias of estimators in LAREhom with two uncorrelated outcomes.

Figure 7

Density plot of 1,000 median posteriors of d21 from simulations when ρAB* = 0.0 under MCAR (first row), MAR (second row), and MNAR (third row) mechanisms under noninformative Wishart priors; (a), (d), (g) d21* = 0, (b), (e), (h) (more...)
