Tables 3–5 display the results of all eight Bayesian models (four models each with two possible priors; six Bayesian models for the continence outcome) and two frequentist models (fixed effect and homogeneous random effect models) for all seven pharmacological treatments in terms of continence, UI improvement, and discontinuation due to AE, respectively. Shown are estimated log odds ratios between treatment and placebo (d_{k}) with standard errors in parentheses from both frequentist and Bayesian methods, and the probability of being among the two best treatments (Best12) from the Bayesian analysis. For the Bayesian analysis, we report posterior medians. Note that the orderings of drugs by d_{k} and by Best12 probability are similar in the Bayesian results. To find the best drug in terms of each outcome, we use d_{k} for frequentist methods and Best12 for Bayesian methods; NNT is also provided. The first part of all three tables provides Bayesian goodness-of-fit statistics. Table 4 reveals a decrease in the fit statistic between the fixed and random effects models; thus, introducing randomness among studies essentially forces improved model fit, although the variability in the random effects (σ) is quite small. The heterogeneous random effects model yields almost the same DIC for all outcomes, and the homogeneous random effects model generally offers the best compromise between fit and complexity (i.e., lowest DIC); neither the addition of w-factors nor heterogeneous variances yields practically significant improvements in DIC. Regarding the two priors, again we see no meaningful difference in DIC, though the shrinkage prior is slightly preferred for the discontinuation outcome (Table 5). As such, for Bayesian decisionmaking, we adopt the homogeneous random effects models with the shrinkage prior.
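For readers less familiar with DIC, the standard decomposition into a fit term and a complexity term can be written as follows. This is the usual textbook definition rather than anything specific to these tables; the symbols \bar{D} (posterior mean deviance) and D(\bar{\theta}) (deviance evaluated at the posterior mean of the parameters) are notation assumed here for illustration.

```latex
\begin{aligned}
p_D &= \bar{D} - D(\bar{\theta}), \\
\mathrm{DIC} &= \bar{D} + p_D .
\end{aligned}
```

Under this definition, two models with the same \bar{D} differ in DIC by exactly the difference in their p_D, so a model can achieve a lower DIC through reduced effective complexity alone.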

In Table 3, there are no substantial differences in DIC across the Bayesian models, which might be due to the small σ. The frequentist random effects model yields almost zero variability, resulting in the same d_{k} estimates for the fixed and random effects models. In the Bayesian homogeneous random effects models, the two priors give slightly different orderings of d_{k} (propiverine moves to first place under the Bayes2 prior), but the Best12 probabilities deliver the same ordering, as does the frequentist ranking based on d_{k}, though the Best12 probabilities do not differ significantly among drugs. Across all models, trospium is the best drug in terms of continence, suggesting the effect of trospium is dominant regardless of the presence of random effects or a shrinkage prior. Overall, trospium and propiverine appear to have a slight edge, with tolterodine appearing to be the worst drug for curing UI, given that it has the smallest Best12 probabilities and d_{k}. The rankings based on NNT are rather different, with propiverine emerging as a clear winner, followed by a three-way tie for third place. However, we caution that few of the differences between drugs are statistically significant, a subject to which we return in Table 6.
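A Best12 probability of the kind discussed above can be estimated from MCMC output by ranking the drugs within each posterior draw and counting how often each drug lands in the top two. The following is a minimal sketch under that assumption; the function name, the drug labels, and the synthetic draws are hypothetical and are not the report's code or data.

```python
import random


def best12_probabilities(draws):
    """Estimate P(drug is ranked first or second) from joint posterior draws.

    draws: list of dicts, one per MCMC iteration, mapping drug name to the
    sampled log odds ratio d_k versus placebo (larger = ranked higher).
    """
    counts = {drug: 0 for drug in draws[0]}
    for draw in draws:
        # Rank drugs within this single draw, largest d_k first.
        ranked = sorted(draw, key=draw.get, reverse=True)
        for drug in ranked[:2]:
            counts[drug] += 1
    return {drug: counts[drug] / len(draws) for drug in counts}


# Synthetic draws for three made-up drugs (illustration only).
rng = random.Random(0)
means = {"drug_a": 0.7, "drug_b": 0.5, "drug_c": 0.1}
draws = [{k: rng.gauss(m, 0.2) for k, m in means.items()} for _ in range(5000)]
probs = best12_probabilities(draws)
```

Because exactly two drugs are counted per draw, the probabilities sum to 2 across drugs. For a harmful outcome such as discontinuation due to AE, the same calculation applied to the largest d_{k} values would give a Worst12 probability.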

Table 4 displays the results from frequentist and Bayesian models with respect to the UI improvement outcome. Again, the frequentist random effects models give smaller σ estimates than the Bayesian models. In the Bayesian results, propiverine has a greater than 0.7 probability of being the first or second best. The runner-up here appears to be oxybutynin, which has the second highest probability of being among the top two. Tolterodine fares worst. The frequentist ranking based on d_{k} from the random effects model gives the same results, though the differences between drugs are not statistically significant. NNT is not reported when the treatment fails to differ significantly from placebo; this is why trospium has no NNT. The estimated w-factors in the inconsistency model are small (w_{137} = 0.00 and w_{147} = 0.20), and there is no strong evidence of inconsistency.

Table 5 shows the model comparisons with respect to the safety outcome, discontinuation due to AE. Since the outcome now has a negative meaning, “Worst12” is interpreted as the probability of being first or second worst. There is a roughly five unit decrease in DIC, resulting from a decrease in pD, between the Bayes1 and Bayes2 priors across all models. In this specific dataset, the shrinkage encouraged by Bayes2 implies lower model complexity. All w-factors are smaller than 0.1, implying minimal inconsistency between direct and indirect comparisons. Again, the estimated σ from the frequentist homogeneous random effects model is close to zero. In both frequentist and Bayesian analyses, oxybutynin is the worst drug, with the highest d_{k} and Worst12 probability in all models, followed by fesoterodine. Tolterodine has the smallest d_{k} and zero probability of being the first or second least safe drug, suggesting it is the safest among the seven treatments. Although the Worst12 probabilities do not differ significantly between drugs, the Bayes2 prior gives a slightly smaller standard deviation of Worst12 (see oxybutynin) than the Bayes1 prior. Here, smaller NNT values mean *less* safe; e.g., an NNT of 24 means that we would expect that one woman of each 24 enrolled would not tolerate treatment.
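To make the NNT interpretation concrete: an NNT of 24 corresponds to an absolute risk difference of 1/24, or roughly 0.042. The sketch below shows one common way to obtain an NNT from an odds ratio and an assumed placebo event rate. This section does not state how the report's NNT values were computed, so the formula choice, function names, and input values here are illustrative assumptions only.

```python
def risk_from_odds_ratio(odds_ratio, baseline_risk):
    """Event risk on treatment implied by an odds ratio and a placebo risk."""
    treated_odds = odds_ratio * baseline_risk / (1.0 - baseline_risk)
    return treated_odds / (1.0 + treated_odds)


def nnt(odds_ratio, baseline_risk):
    """Number needed to treat: reciprocal of the absolute risk difference."""
    diff = abs(risk_from_odds_ratio(odds_ratio, baseline_risk) - baseline_risk)
    return 1.0 / diff


# Risk difference implied by the NNT of 24 quoted in the text.
implied_difference = 1.0 / 24

# Hypothetical inputs (not taken from the tables): odds ratio 2.0, placebo risk 0.20.
example_nnt = nnt(2.0, 0.20)
```

With these hypothetical inputs the treated risk is 1/3, the risk difference is about 0.133, and the NNT is 7.5, which would usually be rounded up when reported.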

Figure 2 shows our findings graphically in terms of odds ratios with MCMC-computed 95 percent equal-tail credible intervals from Bayesian models, or 95 percent confidence intervals from frequentist models, for each outcome. We compare four models: the Bayes2 fixed effects model, the frequentist random effects model, and the Bayes1 and Bayes2 homogeneous random effects models. We mark the best drug with respect to each outcome with a triangle, and the worst drug with a square. For the continence outcome, all of the odds ratios are significantly greater than 1 (that is, all drugs are more effective than placebo), and trospium and propiverine have odds ratios close to 2, meaning that being treated with either of these leads to about two times greater odds of continence compared to being untreated. However, there appear to be no significant differences between drugs. Regarding the UI improvement outcome, the odds ratios of propiverine, oxybutynin, and solifenacin exceed 2, while tolterodine delivers the worst performance, though they are not significantly different. In the discontinuation outcome, tolterodine is the safest drug and oxybutynin performs worst. There are just two significant differences between drugs: tolterodine versus oxybutynin and tolterodine versus fesoterodine (their 95 percent intervals do not overlap). Note that propiverine has very wide intervals because there are only two studies for this drug, and the two studies do not agree on the direction of this drug's safety.
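Because the tables report log odds ratios d_{k} while the figure plots odds ratios, it can help to see the conversion explicitly: the no-effect value is 0 on the log scale and 1 on the odds ratio scale. The sketch below builds a Wald-type 95 percent confidence interval from a log odds ratio and its standard error, which is an assumption appropriate to the frequentist results only (the Bayesian intervals are equal-tail posterior quantiles). The function names and numeric inputs are hypothetical.

```python
import math


def odds_ratio_interval(log_or, se, z=1.96):
    """Odds ratio and Wald interval from a log odds ratio and standard error."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


def excludes_no_effect(lower, upper):
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0


# Hypothetical values: a log odds ratio near log(2) with standard error 0.2.
or_hat, lower, upper = odds_ratio_interval(0.69, 0.2)
differs_from_placebo = excludes_no_effect(lower, upper)
```

An odds ratio close to 2 therefore corresponds to a d_{k} of about 0.69, and an interval whose lower limit exceeds 1 indicates a significant benefit over placebo.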

Table 6 presents odds ratios and 95 percent credible or confidence intervals for all pairwise comparisons under both our Bayesian analyses (Bayes1 and Bayes2) and a frequentist analysis carried out with the homogeneous random effects model. Although most drugs are significantly effective compared to placebo for all outcomes, under the Bayes1 (noninformative) prior there is only one significant odds ratio between active drugs for the continence outcome (tolterodine vs. trospium), two for the UI improvement outcome (oxybutynin vs. tolterodine and propiverine vs. tolterodine), and three for the discontinuation due to AE outcome (tolterodine vs. fesoterodine, tolterodine vs. oxybutynin, and trospium vs. oxybutynin). The Bayes2 prior gives similar patterns of significance. The Bayesian analyses generally give wider 95 percent credible intervals than the frequentist method because the Bayesian approach incorporates all sources of uncertainty into the model. However, note that Bayes2 does sometimes find significance where the frequentist method does not; e.g., darifenacin versus fesoterodine for discontinuation due to AE.
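In a Bayesian analysis, a pairwise odds ratio between two active drugs can be obtained draw by draw as exp(d_{j} − d_{k}), with the 95 percent equal-tail credible interval read off the 2.5 and 97.5 percent quantiles of those values. The following is a minimal sketch of that calculation on synthetic draws; the function name and the simulated inputs are hypothetical and do not reproduce Table 6.

```python
import math
import random
import statistics


def pairwise_odds_ratio(d_j, d_k):
    """Posterior median and 95% equal-tail interval for exp(d_j - d_k).

    d_j and d_k must come from the same MCMC iterations, in the same order,
    so that posterior correlation between the two effects is preserved.
    """
    ratios = [math.exp(a - b) for a, b in zip(d_j, d_k)]
    cuts = statistics.quantiles(ratios, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.median(ratios), cuts[0], cuts[-1]


# Synthetic, independent draws for two made-up drugs (illustration only).
rng = random.Random(1)
d_j = [rng.gauss(1.0, 0.1) for _ in range(4000)]
d_k = [rng.gauss(0.0, 0.1) for _ in range(4000)]
median_or, lower, upper = pairwise_odds_ratio(d_j, d_k)
interval_excludes_one = lower > 1.0 or upper < 1.0
```

Working with paired draws rather than with the two marginal intervals matters here, since non-overlap of marginal intervals is a stricter criterion than the interval for the difference excluding zero.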

Figure 3 exhibits rankings according to two pairs of outcomes under the Bayes2 homogeneous random effects model.^{37} Drugs plotted at the upper right corner are considered the best in terms of both efficacy and safety. Panel (a) compares continence and discontinuation outcomes. While this display does not include standard errors (and thus the significance of the differences shown is difficult to judge), trospium emerges as most attractive since it is the best in terms of continence and also the third safest drug (although it fails to differ from placebo in terms of UI improvement under the Bayes2 homogeneous random effects model, it is very close to significance there and is significant in the other models). Panel (b) indicates solifenacin may offer the best compromise between the UI improvement and discontinuation outcomes. This drug delivers the third best outcome for UI improvement and a Worst12 probability of 0.026, which is fairly small even though its discontinuation ranking is rather high at 5. As such, these two drugs may be viewed (at least informally) as offering the best compromise between safety and efficacy in this investigation. In summary, while frequentist and Bayesian analyses produce broadly comparable odds ratios for safety and efficacy, the Bayesian method's ability to deliver the probability that any treatment is among the top two leads to more meaningful clinical interpretation.

## Publication Details

### Publisher

Agency for Healthcare Research and Quality (US), Rockville (MD)

### NLM Citation

Carlin BP, Hong H, Shamliyan TA, et al. Case Study Comparing Bayesian and Frequentist Approaches for Multiple Treatment Comparisons [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Mar. Results.