NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Hong H, Carlin BP, Chu H, et al. A Bayesian Missing Data Framework for Multiple Continuous Outcome Mixed Treatment Comparisons [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Jan.

A Bayesian Missing Data Framework for Multiple Continuous Outcome Mixed Treatment Comparisons [Internet].


The main objective of this report has been to propose new Bayesian MTC approaches for multiple continuous outcomes, and compare them with previous hierarchical modeling methods. We considered unobserved arms to be missing data and handled them by borrowing information from the observed indirect relationships. We also combined multiple outcomes into one model by incorporating the correlation structures between them. Next, we developed arm-based (AB) models that estimate absolute effects of treatments, rather than relative effects. We illustrated our methods using the OA data, and used simulation to show that our models can outperform existing Lu and Ades-style models in terms of Type I error, power, and probability of incorrectly selecting the best treatment under various missing data mechanisms.

We fit six models to the OA data, with LAREhom, CBRE2hom, and ABRE2hom producing slightly smaller DIC values. The fixed effects model performs poorly because it can never fully capture variability across studies. In the random effects models, a homogeneous variance (or covariance matrix) assumption is quite reasonable because our data are so sparse that heterogeneous covariances may not be well estimated. Regarding the pain outcome, low intensity diathermy emerged as the best therapy in LA models, whereas proprioception exercise performed best under CB and ABRE2hom models, followed by strength exercise. However, there were no significant differences between most active therapies, due to the large associated standard deviations (e.g., Figure 3). Note that three studies reported diathermy intervention with only a short length of followup (0 to 5 weeks), so we can only see the short-term effect of diathermy here. By contrast, most studies for proprioception or strength exercises reported a followup period of 6 to 12 weeks. For the disability outcome, aerobic and proprioception exercises perform well across all three random effects models, though again significant differences were rare. Unfortunately, our OA data analysis did not show much impact of our methods compared with the existing methods due to sparseness of the data, although we have shown our methods give less biased estimates through simulation studies.

Our simulation study shows that ignoring missing data and correlations between outcomes can bias estimates, which in turn degrades hypothesis test performance when missingness of treatment arms depends on the observed (and even missing) data. Although our simulation setting is simple, this problem could be more severe for more complicated data structures. Also, CB models cannot capture the correct correlation in some settings due to their inherent constraints, while AB models can. For example, in our simulation setting, CB models cannot estimate ρCB if we set ρAB = 0.9 because this violates the positive definiteness of the CB covariance matrix. Even when the two outcomes are independent (ρAB = 0), where our methods perform almost equally to one another, they still outperform the existing LA methods in terms of Type I error, power, and Pr(μ̂11 > μ̂21). Generally, the AB models with weakly informative priors help to yield more reliable estimates, resulting in more power.

Regarding the missingness mechanism, we generally assume that the data are missing at random (MAR). The missing completely at random (MCAR) assumption might be valid but could be too strong in some cases. For example, under our simulated missingness mechanism (9), the probability that the first outcome is missing increases as the second-outcome response becomes higher under the first treatment and lower under the second treatment.
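The practical difference between MCAR and MAR can be illustrated with a small simulation. The sketch below is not mechanism (9) from this report; it is a deliberately simplified, hypothetical one-arm version in which two correlated outcomes are drawn per study and the first outcome is deleted either with a constant probability (MCAR) or with a probability that rises with the observed second outcome through a logistic function (MAR). The function name, the correlation `rho`, and the `slope` parameter are all assumptions made for illustration.

```python
import numpy as np

def complete_case_means(n=100_000, rho=0.6, slope=2.0, seed=0):
    """Compare complete-case means of outcome 1 under MCAR and MAR deletion.

    Hypothetical setup (not the report's mechanism (9)): both outcomes have
    true mean 0, unit variance, and correlation rho.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    y = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y1, y2 = y[:, 0], y[:, 1]

    # MCAR: outcome 1 is missing with constant probability 0.5.
    miss_mcar = rng.random(n) < 0.5

    # MAR: the probability that outcome 1 is missing depends only on the
    # always-observed outcome 2 (logistic in y2).
    p_mar = 1.0 / (1.0 + np.exp(-slope * y2))
    miss_mar = rng.random(n) < p_mar

    return {
        "full": float(y1.mean()),
        "mcar": float(y1[~miss_mcar].mean()),
        "mar": float(y1[~miss_mar].mean()),
    }

if __name__ == "__main__":
    for name, value in complete_case_means().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the complete-case mean under MCAR stays close to the true value of zero, whereas under MAR it is pulled downward because studies with high second-outcome values (and hence, by correlation, high first-outcome values) are preferentially dropped. A joint model that uses the observed second outcome is one way to recover that lost information.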

Our methods have several limitations. First, since we have only summary statistics for every study, there is the possibility of ecological fallacy. Second, all our models are fitted under the assumption of consistency. Although we do not follow the Lu-and-Ades consistency equation, measuring inconsistency between direct and indirect comparisons in MTCs while incorporating missingness and multiple outcomes is a topic for a future manuscript. Furthermore, we will try to distinguish the data-driven missingness mechanism by using this inconsistency information. Third, in our CB and AB random effects models, we assumed that either the between-outcome or between-treatment correlations were all zero a priori. However, such assumptions can be loosened by factorizing the random effects into two independent sources. For example, in the AB model, (6) can be rewritten as Δikl = μkl + vik + wil, where (vi1, … , viK)T ∼ MVN(0, DTrt), (wi1, … , wiL)T ∼ MVN(0, DOut), and vik and wil are independent. Here, DTrt and DOut are K × K and L × L unstructured covariance matrices implying correlation between treatments and between outcomes, respectively, where each covariance matrix has an inverse Wishart prior. In this approach, we must select these inverse Wishart priors carefully to ensure identifiability, and this is a subject of ongoing investigation. Fourth, we assumed that the within-study correlations are zero in the likelihood. However, Riley et al. discussed when we can estimate within-study correlation and thus produce estimates with smaller standard errors than in the independent setting for bivariate random effects meta-analysis [82]. Finally, we have discussed borrowing strength from the missingness, but this does not mean that our estimates always have narrower 95% credible intervals than those from the existing model. If there is not enough observed data, our methods could carry substantial uncertainty, resulting in wider 95% credible intervals.
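The factorization Δikl = μkl + vik + wil can be sketched numerically. Because vik and wil are independent, the covariance between Δikl and Δik'l' is DTrt[k, k'] + DOut[l, l']. The NumPy code below is only an illustration under assumed inputs: the function names and the example matrices are hypothetical, and no priors or model fitting are involved.

```python
import numpy as np

def simulate_effects(mu, d_trt, d_out, n_studies, seed=0):
    """Draw Delta[i, k, l] = mu[k, l] + v[i, k] + w[i, l] for n_studies studies."""
    rng = np.random.default_rng(seed)
    K, L = mu.shape
    v = rng.multivariate_normal(np.zeros(K), d_trt, size=n_studies)  # (n, K)
    w = rng.multivariate_normal(np.zeros(L), d_out, size=n_studies)  # (n, L)
    return mu[None, :, :] + v[:, :, None] + w[:, None, :]

def implied_covariance(d_trt, d_out):
    """Covariance of the flattened Delta_i (index k * L + l).

    Entry [(k, l), (k', l')] equals d_trt[k, k'] + d_out[l, l'].
    """
    K, L = d_trt.shape[0], d_out.shape[0]
    return np.kron(d_trt, np.ones((L, L))) + np.kron(np.ones((K, K)), d_out)

if __name__ == "__main__":
    # Hypothetical example: K = 3 treatments, L = 2 outcomes.
    mu = np.zeros((3, 2))
    d_trt = np.array([[1.0, 0.3, 0.2],
                      [0.3, 1.0, 0.4],
                      [0.2, 0.4, 1.0]])
    d_out = np.array([[0.5, 0.2],
                      [0.2, 0.5]])
    delta = simulate_effects(mu, d_trt, d_out, n_studies=200_000)
    empirical = np.cov(delta.reshape(len(delta), -1), rowvar=False)
    print(np.abs(empirical - implied_covariance(d_trt, d_out)).max())
```

One way to see why identifiability needs care (an observation about this sketch, not an argument taken from the report): adding a constant c to every entry of DTrt and subtracting the same c from every entry of DOut leaves the implied covariance unchanged, so the data alone cannot separate the two sources without additional constraints or informative priors.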

In standard meta-analysis with a continuous outcome, standardized mean differences (SMDs) are often calculated and used for analysis and inference [83]. However, we avoid using those quantities in our methods because they do not fully handle situations with multi-arm trials and differing baseline treatments across studies. For example, in a three-arm study, three SMD values can be calculated, but only by reusing the data, violating the Likelihood Principle. Also, it is not reasonable to combine SMD values that may have different control arms (or baseline treatments) across studies.
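The data reuse in a three-arm study can be made concrete with a short sketch. The code below computes the three pairwise SMDs (here taken as Cohen's d with a pooled standard deviation, which is one common definition and an assumption on our part) from hypothetical arm-level summaries. The arm labels and numbers are invented for illustration only.

```python
import math
from itertools import combinations

def smd(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def pairwise_smds(arms):
    """Return the SMD for every pair of arms; arms maps label -> (mean, sd, n)."""
    return {(a, b): smd(*arms[a], *arms[b]) for a, b in combinations(arms, 2)}

if __name__ == "__main__":
    # Hypothetical three-arm study: (mean, sd, sample size) per arm.
    arms = {"A": (50.0, 10.0, 30), "B": (45.0, 10.0, 30), "C": (40.0, 10.0, 30)}
    for pair, value in pairwise_smds(arms).items():
        print(pair, round(value, 3))
```

Each arm's summary enters two of the three contrasts, so the three SMDs are not independent pieces of evidence; with equal standard deviations, the A versus C value is simply the sum of the other two. Treating them as three separate observations would count the same data more than once, which an arm-level model avoids.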

Our data analysis also has some limitations. First, we assumed that patients in each intervention from each study had similar clinical characteristics, so we did not adjust our models for such baseline covariates (e.g., age, severity of OA, or comorbidities). Meta-regression [6] is usually applied to see associations between those sample covariates and treatment effects, but it does not detect the relationship well here because we have only aggregated information [84]. To see such relationships correctly, individual-level data should be incorporated. Second, we assumed a common covariance matrix across treatments in our CB and ABRE2 models. This might not be a valid assumption because differences in outcome correlations between treatments could exist. Third, we did not control for the effect of varying followup times but instead selected a frequently observed followup time for each treatment when studies reported outcomes from multiple followup times. Although we made an effort to have similar followup times within each treatment, not all studies had precisely the same followup time for a specific treatment. However, a majority of studies investigated only one followup time, and in any case our data were not intended to measure the effect of followup time. Also, the outcomes from different followup times are likely to be correlated because they are typically obtained from the same sample of patients; modeling this feature is beyond the scope of our present report. Lu et al. [85] suggest various models for MTCs at multiple followup times with a single binary outcome. Finally, we found that the baseline pain scores from the studies not reporting disability scores are slightly smaller than those from the studies that reported both outcomes. This could imply that the missingness depends on the observed information, implying an MAR mechanism.

Our simulation studies can be improved by including more features. For example, we might extend them to have more than two treatments with a more complicated evidence network so that inconsistency could be measured. In this report, we considered only 50 percent missingness in the first outcome, gave all studies the same sample size and assumed standard deviation, and selected the true d21 values somewhat arbitrarily. We could explore various missingness rates and patterns, with heterogeneity between studies in sample sizes and standard deviations. Also, we need to examine more d21 values rather than just 1 and 2.

Finally, our models can be applied when the MTC data have multiple outcomes (e.g., efficacy and safety outcomes) with possible correlations but not measured at multiple time points. We can reduce our model to handle a single continuous outcome. Also, our CB and AB models can be applied to single or multiple binary outcome settings by using a logit link function rather than a linear link function [86]. We can also extend our approaches to categorical outcomes. We are currently extending our methods to mixed types of outcomes (say, a binary safety outcome paired with a continuous efficacy outcome). Furthermore, we hope to extend our models to incorporate both aggregated and individual-level (i.e., patient-level) data, potentially permitting borrowing of strength from patient-level covariates to investigate how those personal clinical characteristics affect estimated treatment effects.
