We performed a large-scale empirical comparison of univariate and multivariate meta-analysis using data from the Cochrane Library of Systematic Reviews, and complemented it with a simulation study. Overall, univariate and multivariate methods yield numerically similar summary estimates and confidence intervals for individual outcomes. This suggests that systematic review conclusions are not sensitive to this particular choice of methods. However, the confidence intervals of relative odds ratios between pairs of outcomes can differ substantially between univariate and multivariate meta-analysis.
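The reason relative odds ratios behave differently is the covariance term in the variance of a difference: Var(d1 - d2) = Var(d1) + Var(d2) - 2 Cov(d1, d2). Combining two separate univariate analyses implicitly sets Cov(d1, d2) to zero.

The sketch below shows how much that term alone can move a Wald interval. The function name and all numbers are hypothetical. The pooled log odds ratios are held fixed for illustration, although in a real joint analysis the point estimates would also shift. A positive covariance narrows the interval. A negative covariance, as expected for mutually exclusive outcomes, widens it.

```python
import math
from statistics import NormalDist

def log_ror_ci(d1, d2, var1, var2, cov12=0.0, level=0.95):
    """Wald confidence interval for the log relative odds ratio d1 - d2.

    d1, d2: pooled log odds ratios for outcomes 1 and 2 (hypothetical inputs)
    var1, var2: their variances
    cov12: their covariance; 0.0 mimics two separate univariate analyses
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = d1 - d2
    se = math.sqrt(var1 + var2 - 2.0 * cov12)  # Var(d1 - d2) includes -2*Cov
    return diff - z * se, diff + z * se

# Hypothetical pooled estimates, held fixed across the three cases
separate = log_ror_ci(0.40, 0.10, 0.04, 0.05)               # covariance ignored
positive = log_ror_ci(0.40, 0.10, 0.04, 0.05, cov12=0.03)   # positively correlated
negative = log_ror_ci(0.40, 0.10, 0.04, 0.05, cov12=-0.02)  # negatively correlated
for label, (lo, hi) in [("separate", separate), ("positive", positive), ("negative", negative)]:
    print(f"{label:>8}: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

The point estimate of the relative odds ratio is identical in all three cases. Only the interval width changes, which mirrors the pattern described above.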

It appears that, as long as we focus on summaries for individual outcomes (and the respective confidence intervals), the choice between univariate and multivariate meta-analysis has limited practical importance. This is supported by our simulation analyses. It is also congruent with the numerical results in the worked examples of several methodological papers introducing or reviewing methods for multivariate meta-analysis.2,4,5,7,8,11,13,48,49

It is unclear whether the same pattern would be observed in other examples or other types of data. This includes data in which within-study correlations of treatment effects cannot be extracted from the published data, although they might be obtained from external sources (e.g., by contacting authors). However, the underlying mechanics of the meta-analysis methods are the same across these settings, so similar observations are likely to hold for a wider range of examples.

It is not clear how often systematic reviewers face the methodological dilemma explored in this work. In our empirical evaluation, 29 of 1919 reviews (1.5 percent) with at least one binary meta-analysis had at least one pair of meta-analyses that fulfilled our eligibility criteria.

This proportion is probably an underestimate, for three reasons. First, we used outcomes exactly as defined by the Cochrane reviewers and made no effort to redefine them to facilitate joint meta-analysis. Cochrane reviews include only univariate meta-analysis. If reviewers routinely performed multivariate analyses, they might have considered a larger number of outcomes. Second, we limited our analyses to examples where counts of combinations of outcomes are exactly recoverable from the data used in univariate meta-analysis. Complete data on combinations of categorical outcomes may nonetheless be obtainable by contacting primary study authors, or even through care and perseverance when extracting data from published articles.50

Finally, a single reviewer judged the eligibility of each pair or triplet of meta-analyses from the Cochrane Library, without verification by a second reviewer. Nevertheless, the eligibility criteria concerning the total number of studies, the number of studies common to all outcomes and the minimum number of patients were applied programmatically, and thus consistently. The only judgment calls concerned the relationships between pairs or triplets of outcomes: mutually exclusive, one being a subset of the other, or some other relationship.
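The count-based checks described above can be sketched as a simple filter. Several details here are hypothetical placeholders rather than the criteria actually used:

- the data structure;
- reading "total number of studies" as a per-meta-analysis minimum;
- counting each common study's patients once, using the largest total reported across outcomes;
- all threshold values.

The judgment about how the outcomes relate is deliberately left out, because it was made by a reviewer rather than by code.

```python
from dataclasses import dataclass, field

@dataclass
class MetaAnalysis:
    """One binary meta-analysis: study id -> patients in that study (hypothetical structure)."""
    outcome: str
    patients: dict = field(default_factory=dict)

def passes_numeric_criteria(analyses, min_studies, min_common, min_patients):
    """Apply count-based eligibility checks to a pair or triplet of meta-analyses.

    Thresholds are supplied by the caller; the values in the example below are
    illustrative only. Outcome relationships (mutually exclusive, nested, other)
    are not checked here.
    """
    # Each meta-analysis must include enough studies on its own
    if any(len(ma.patients) < min_studies for ma in analyses):
        return False
    # Studies that contribute to every outcome in the set
    common = set.intersection(*(set(ma.patients) for ma in analyses))
    if len(common) < min_common:
        return False
    # Assumption: count each common study once, using its largest reported total
    total = sum(max(ma.patients[s] for ma in analyses) for s in common)
    return total >= min_patients

# Hypothetical example: two outcomes sharing studies S1 and S3 (320 patients)
a = MetaAnalysis("outcome_A", {"S1": 120, "S2": 80, "S3": 200})
b = MetaAnalysis("outcome_B", {"S1": 120, "S3": 200, "S4": 60})
print(passes_numeric_criteria([a, b], min_studies=3, min_common=2, min_patients=300))
```

Because these checks are deterministic, they give the consistency noted above. The relationship judgment would remain a separate, manual step.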

Our results and conclusions are limited by the decision to use multivariate outcomes that could be represented as a set of categories, either mutually exclusive or nested (one a subset of another). This choice was motivated by the desire to have known correlations among the multivariate outcomes. However, it rules out many common multivariate outcomes and design structures for which our findings may not hold. Common multivariate outcomes that we do not consider include:

- different biomarkers;
- repeated measurements of outcomes at different times;
- different adverse events;
- combinations of efficacy and safety endpoints;
- combinations of medical outcomes and quality-of-life measures;
- bivariate analysis of sensitivity and specificity in studies of diagnostic test accuracy.
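For categorical outcomes of this kind, the within-study covariance of two log odds ratios can be derived from the counts alone. The sketch below uses the delta method on multinomial proportions within each arm. This is our own derivation, which the text does not spell out, and the authors' computation may differ.

For mutually exclusive outcomes A and B with a and b events among n patients, Cov(logit A, logit B) = -n / ((n - a)(n - b)). For nested outcomes (every A event is also a B event), it is n / ((n - a) b). Arms are independent, so arm-level covariances add for the log odds ratios. The function names are hypothetical. Zero cells would need a continuity correction, which is not handled here.

```python
import math

def arm_logit_var(a, n):
    """Delta-method variance of the estimated log odds for a events among n patients."""
    return 1.0 / a + 1.0 / (n - a)

def arm_logit_cov(a, b, n, relation):
    """Delta-method covariance of the log odds of outcomes A and B in one arm.

    relation='exclusive': A and B cannot both occur (requires a + b <= n).
    relation='nested': every A event is also a B event (requires a <= b).
    """
    if relation == "exclusive":
        return -n / ((n - a) * (n - b))
    if relation == "nested":
        return n / ((n - a) * b)
    raise ValueError("relation must be 'exclusive' or 'nested'")

def log_or_cov(trt, ctl, relation):
    """trt, ctl: (a, b, n) tuples for the treatment and control arms.

    Returns the variances of the two log odds ratios, their covariance and
    their correlation. Independent arms mean arm-level terms simply add.
    """
    var_a = arm_logit_var(trt[0], trt[2]) + arm_logit_var(ctl[0], ctl[2])
    var_b = arm_logit_var(trt[1], trt[2]) + arm_logit_var(ctl[1], ctl[2])
    cov = (arm_logit_cov(*trt, relation=relation)
           + arm_logit_cov(*ctl, relation=relation))
    return var_a, var_b, cov, cov / math.sqrt(var_a * var_b)

# Hypothetical counts: (events A, events B, patients) per arm
print(log_or_cov((10, 20, 100), (15, 25, 100), "exclusive"))
print(log_or_cov((10, 20, 100), (15, 25, 100), "nested"))
```

As expected, mutually exclusive categories yield a negative within-study correlation and nested categories a positive one.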

Another key observation concerns the nature of the outcomes. The Cochrane reviews reported them as event counts at one or several points in time, but they are fundamentally time-to-event outcomes rather than counts. They could, or perhaps should, be analyzed by survival analysis with appropriate adjustment for censoring. Moreover, the presence of several distinct outcome categories suggests a competing risks analysis. Such analyses may be preferred when individual outcome times are available. In many cases, however, only summary counts are reported, and time-to-event analyses must be sacrificed in favor of the multinomial analyses used here.

Thus, our conclusions must be tempered by several limitations:

- the restricted set of problems considered;
- the lack of reporting of metrics appropriate for the analysis;
- the lack of complete individual patient data with which to carry out the ideal statistical analysis;
- the necessarily limited simulations, which, for instance, only consider two outcomes at a time.

Further study may uncover differences between univariate and multivariate analyses that we did not find. Of particular importance, broader conclusions could be drawn through analytical approaches, at least for models that use the normal approximation to model within-study variance.
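Models of this kind are usually written as a bivariate random-effects model, shown below as a starting point for such analytical comparisons. This is the standard formulation from the multivariate meta-analysis literature; the notation is chosen for this sketch rather than taken from the paper.

For tractability, the between-study covariance Σ is treated as known, although in practice it is estimated (e.g., by REML). Setting ρ_i = κ = 0 makes each W_i diagonal. Each component of the estimate then reduces to the univariate inverse-variance weighted mean with weights 1/(s_ij² + τ_j²). Any difference between the two approaches therefore enters only through the off-diagonal terms.

```latex
% Within-study level (normal approximation), study i = 1, ..., k
\mathbf{y}_i = \begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix}
  \sim N\!\left(\boldsymbol{\theta}_i,\ \mathbf{S}_i\right), \qquad
\mathbf{S}_i = \begin{pmatrix} s_{i1}^2 & \rho_i s_{i1} s_{i2} \\ \rho_i s_{i1} s_{i2} & s_{i2}^2 \end{pmatrix}

% Between-study level
\boldsymbol{\theta}_i \sim N\!\left(\boldsymbol{\mu},\ \boldsymbol{\Sigma}\right), \qquad
\boldsymbol{\Sigma} = \begin{pmatrix} \tau_1^2 & \kappa \tau_1 \tau_2 \\ \kappa \tau_1 \tau_2 & \tau_2^2 \end{pmatrix}

% Marginally y_i ~ N(mu, S_i + Sigma); for known Sigma the GLS estimate is
\hat{\boldsymbol{\mu}} = \Big(\sum_{i=1}^{k} \mathbf{W}_i\Big)^{-1} \sum_{i=1}^{k} \mathbf{W}_i \mathbf{y}_i,
\qquad \mathbf{W}_i = \left(\mathbf{S}_i + \boldsymbol{\Sigma}\right)^{-1},
\qquad \operatorname{Var}(\hat{\boldsymbol{\mu}}) = \Big(\sum_{i=1}^{k} \mathbf{W}_i\Big)^{-1}
```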

Suppose the patterns observed in this work are more broadly applicable, and one is not interested in linear combinations of treatment effects across outcomes (e.g., log relative odds ratios). It may then be argued that the choice between univariate (separate) and multivariate (joint) meta-analysis is of theoretical rather than practical interest.

So should one use separate or joint meta-analysis for sets of outcomes that can be approached with either method? In theory, the decision depends on the assumptions about the data that the researcher is prepared to make. Ideally, it should be made early in the analysis, before the data are examined. The key reason for using multivariate meta-analysis is that, through the correlations, it uses more information.

In the majority of the 45 applications, there is very little clinical or statistical difference in the results or conclusions, and this is itself an important finding in each case. In any single application, if the multivariate approach does not change the conclusions of a univariate approach, this increases the reliability of the findings. It also gives the clinician some reassurance that the findings are robust. An unchanged conclusion does not automatically render the multivariate result of no practical use (see also the discussion by Trikalinos and Olkin11).

Multivariate meta-analysis may also yield more precise or different results than univariate analysis when results for one of the jointly analyzable outcomes are preferentially not reported. Many systematic reviews do not analyze certain outcomes because many studies leave them unreported. The remaining studies may be considered too few to provide an accurate estimate. They may also be considered unrepresentative of the complete set because of outcome reporting bias, which arises when an outcome is left unreported because its estimated effect lacked statistical or biological significance. Multivariate models incorporate the correlations between outcomes, so they may provide information about the missing outcomes. They can then incorporate those outcomes effectively into the analysis by borrowing strength from the observed outcomes.51 In such cases, multivariate models may give more accurate and more precise estimates than univariate models.
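Borrowing of strength can be illustrated with a generalized least squares (GLS) fit of a bivariate model. The fit assumes, for simplicity, that the between-study covariance is known; in practice it would be estimated. Studies that report only outcome 1 still inform the estimate for outcome 2 through the off-diagonal covariance terms. Zeroing those terms recovers two separate inverse-variance (univariate) analyses. The function name, data and covariance values are hypothetical.

```python
import numpy as np

def gls_pool(studies, Sigma):
    """Multivariate GLS pooling with a known between-study covariance Sigma.

    studies: list of (y, S); y is a length-p array with np.nan for an
    unreported outcome, S is the p x p within-study covariance matrix.
    Returns the pooled mean vector and its covariance matrix.
    """
    p = Sigma.shape[0]
    info = np.zeros((p, p))
    score = np.zeros(p)
    for y, S in studies:
        obs = ~np.isnan(y)                 # outcomes this study reports
        X = np.eye(p)[obs]                 # maps pooled means to reported outcomes
        V = (S + Sigma)[np.ix_(obs, obs)]  # marginal covariance of reported outcomes
        W = np.linalg.inv(V)
        info += X.T @ W @ X
        score += X.T @ W @ y[obs]
    cov = np.linalg.inv(info)
    return cov @ score, cov

# Hypothetical data: studies 2 and 4 do not report outcome 2
studies = [
    (np.array([0.5, 0.3]),     np.array([[0.04, 0.02], [0.02, 0.05]])),
    (np.array([0.4, np.nan]),  np.array([[0.03, 0.015], [0.015, 0.04]])),
    (np.array([0.6, 0.45]),    np.array([[0.05, 0.025], [0.025, 0.06]])),
    (np.array([0.2, np.nan]),  np.array([[0.02, 0.01], [0.01, 0.03]])),
]
Sigma = np.array([[0.01, 0.008], [0.008, 0.01]])

mu_multi, cov_multi = gls_pool(studies, Sigma)
# Univariate: drop all covariances, so each outcome is pooled separately
uni = [(y, np.diag(np.diag(S))) for y, S in studies]
mu_uni, cov_uni = gls_pool(uni, np.diag(np.diag(Sigma)))
print("outcome 2, multivariate:", mu_multi[1], cov_multi[1, 1])
print("outcome 2, univariate:  ", mu_uni[1], cov_uni[1, 1])
```

Under this model the GLS estimator is the best linear unbiased estimator. Its variance for outcome 2 is therefore no larger than that of the univariate estimator, which ignores studies 2 and 4 for that outcome.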

We consider it commendable to conduct both univariate and multivariate meta-analyses as sensitivity analyses when possible. However, we are reluctant to recommend this practice as a minimum standard for systematic review and meta-analysis. A minimum standard implies that failing to follow the recommendation can lead to misleading conclusions and prove detrimental to decision making. In our opinion, our findings and those of others are compatible with the notion that using multivariate meta-analysis methods is good practice, but probably not a prerequisite for drawing valid conclusions in an applied meta-analysis setting.