Systematic reviewers often employ a “best evidence” approach to address the Key Questions in their reviews. What is meant by “best,” however, is often unclear. Clearly, some manner of evidence prioritization (i.e., prioritizing some studies over others) is employed by all systematic reviews. This prioritization can help ensure (but cannot guarantee) that the review's conclusions will stand the test of time.

The phrase “best evidence” was used by Slavin in a 1995 article to describe an “intelligent alternative” to meta-analysis.1 In contrast, this paper uses “best evidence” to refer to any strategy for prioritizing evidence, regardless of whether that evidence is combined quantitatively in a meta-analysis.

This paper encompasses different interpretations of the phrase “best evidence.” Some reviewers may interpret it to mean the “best available evidence,” and would therefore always include at least one study in the evidence base for a Key Question. Other reviewers may consider this approach too lenient, because the best available evidence may be too biased and potentially misleading; under this view, sometimes no studies should be included. These latter reviewers can be said to use a threshold interpretation of “best evidence.” Both interpretations fit within the larger framework of this paper.
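
To make the contrast concrete, the two interpretations can be pictured as different selection rules. The following is a minimal, hypothetical sketch in Python; the `Study` record, its 1-to-5 `risk_of_bias` rating, and the ceiling of 3 are invented for illustration and do not come from any EHC guidance.

```python
from dataclasses import dataclass

@dataclass
class Study:
    id: str
    risk_of_bias: int  # hypothetical rating: 1 (low) to 5 (high)

def best_available(studies: list[Study]) -> list[Study]:
    """'Best available' interpretation: always retain the least-biased
    study (or studies), however biased it may be."""
    if not studies:
        return []
    lowest = min(s.risk_of_bias for s in studies)
    return [s for s in studies if s.risk_of_bias == lowest]

def threshold_based(studies: list[Study], ceiling: int = 3) -> list[Study]:
    """'Threshold' interpretation: retain only studies at or below a
    pre-specified risk-of-bias ceiling; the result may be empty."""
    return [s for s in studies if s.risk_of_bias <= ceiling]
```

Under the first rule, the evidence base for a Key Question is never empty (provided any relevant study exists); under the second, it can be.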

Existing guidance from the Agency for Healthcare Research and Quality (AHRQ) Effective Health Care (EHC) Program addresses the notion of “best evidence” in at least two areas: the study inclusion criteria2 and the inclusion of nonrandomized studies of beneficial effects. Granted, the study inclusion criteria are not normally considered to define the “best evidence”; rather, they typically define the relevant evidence, and the “best evidence” is a subset of what is relevant. Nevertheless, inclusion criteria implicitly prioritize evidence, for example, by restricting inclusion to studies of a certain design or with a certain minimum number of participants. Studies failing the inclusion criteria receive zero priority. Thus, we place inclusion criteria within the relatively large network of decisions encompassing “best evidence.”
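
Viewed this way, inclusion criteria act as a binary prioritization: pass, or zero priority. In the hypothetical sketch below, the design whitelist and the minimum of 50 participants are invented placeholders, not EHC requirements.

```python
from dataclasses import dataclass

@dataclass
class CandidateStudy:
    design: str          # e.g., "RCT", "prospective cohort", "case series"
    n_participants: int

def meets_inclusion_criteria(study: CandidateStudy) -> bool:
    """Inclusion criteria as implicit prioritization: a design restriction
    plus a minimum sample size; studies that fail receive zero priority."""
    acceptable_designs = {"RCT", "prospective cohort"}  # hypothetical
    return study.design in acceptable_designs and study.n_participants >= 50
```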

The second relevant area of existing EHC guidance is the chapter on when to include nonrandomized studies of beneficial effects.3 This chapter acknowledges that for many topics in comparative effectiveness, the randomized evidence is insufficient to answer the Key Question. This insufficiency may be due to poor applicability, low precision, risk of bias (based on other problems with a study's design or conduct), or other factors. The insufficiency of randomized evidence necessitates the consideration of nonrandomized evidence, which may or may not lead to a conclusion but should at least be considered in the effort to reach one. This represents a specific example of a “best evidence” approach in which a reviewer may include nonrandomized evidence as long as its risk of bias is not too high. This approach involves a consideration of the results of randomized trials (i.e., their conclusiveness) when deciding whether to include nonrandomized evidence. However, this staged approach should be planned a priori to avoid the possible bias of trial results directly influencing study inclusion decisions.
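
One way to sketch this staged logic is shown below. The `is_sufficient` test and the risk-of-bias ceiling are hypothetical stand-ins; in practice, both would be fixed in the protocol before any results are seen.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Study:
    id: str
    risk_of_bias: int  # hypothetical rating: 1 (low) to 5 (high)

def staged_inclusion(
    randomized: list[Study],
    nonrandomized: list[Study],
    is_sufficient: Callable[[list[Study]], bool],  # pre-specified a priori
    ceiling: int = 3,                              # pre-specified a priori
) -> list[Study]:
    # Stage 1: begin with the randomized evidence alone.
    included = list(randomized)
    # Stage 2: only if that evidence is insufficient (e.g., poor
    # applicability or low precision) are nonrandomized studies added,
    # and only those whose risk of bias is not too high. Fixing the
    # sufficiency test and the ceiling a priori keeps trial results
    # from steering inclusion decisions post hoc.
    if not is_sufficient(included):
        included += [s for s in nonrandomized if s.risk_of_bias <= ceiling]
    return included
```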

Using randomization alone as a basis for prioritization is one example, and many other prioritization schemes are possible. For example, within a set of identified randomized trials, the variation in risk of bias can be considerable, and many systematic reviewers have subprioritized randomized trials in various ways. One approach is to include only blinded randomized trials, thereby employing a best-evidence approach at the level of the inclusion criteria (for examples, see references 4–7). Clearly, this is only possible if an evidence base contains many randomized studies and the reviewer has the luxury of excluding unblinded randomized trials. Conversely, in the absence of randomized trials, there can be considerable variability in the designs of nonrandomized studies, and a subprioritization of these can be easily justified (e.g., based on whether the authors matched groups at baseline). For examples of reviewers subprioritizing nonrandomized studies, see references 8–12.
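
Such schemes can be pictured as a tiering function. The ordering below simply mirrors the examples just given (blinded randomized trials first, then unblinded ones, then baseline-matched nonrandomized studies, then the rest); it is one hypothetical scheme among many, not a recommendation.

```python
from dataclasses import dataclass

@dataclass
class Study:
    randomized: bool
    blinded: bool = False
    matched_at_baseline: bool = False

def priority_tier(study: Study) -> int:
    """Lower tier number = higher priority; the ordering is invented
    purely to illustrate subprioritization."""
    if study.randomized:
        return 1 if study.blinded else 2
    return 3 if study.matched_at_baseline else 4
```

A review that includes only tier 1 is applying the blinded-trials-only criterion; one that extends down to tier 3 is subprioritizing nonrandomized designs by baseline matching.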

In addition to using the inclusion criteria as a vehicle for prioritizing evidence, many other approaches are possible. Some reviews may contain a set of studies that are included and tabled but not actually analyzed. Some may distinguish between qualitative and quantitative analysis and perform a meta-analysis on only the highest-priority subset of studies. Some may formulate the review conclusions, or rate the strength of evidence, based only on a higher priority subset. These activities, while very different in implementation, all serve to draw the reader's attention toward some studies and away from others; they are discussed in a later section of this report.
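
These different uses can be pictured as assigning roles to the included studies: for instance, tabling all of them while reserving the quantitative synthesis and the strength-of-evidence rating for the highest-priority tier. The `tier` attribute and the particular partition below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Study:
    id: str
    tier: int  # hypothetical priority tier; lower = higher priority

def assign_roles(included: list[Study]) -> dict[str, list[Study]]:
    """All included studies are tabled; in this particular scheme, only
    the top tier feeds the meta-analysis, the conclusions, and the
    strength-of-evidence rating."""
    if not included:
        return {"tabled": [], "analyzed": [], "rated": []}
    top = min(s.tier for s in included)
    top_tier = [s for s in included if s.tier == top]
    return {"tabled": list(included), "analyzed": top_tier, "rated": top_tier}
```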

Overall, evidence prioritization is a common and necessary practice in systematic reviews. However, the variety of dilemmas facing reviewers, some of which are unanticipated, has spawned innumerable approaches, with no organizing framework. This absence of guidance was the impetus behind this project.

Before we describe the objectives and methods of the project, we list three caveats:

  1. Different topics demand different approaches, and it is not the purpose of this document to recommend any single approach. Thus, we do not recommend some prioritization strategies over others.
  2. None of the strategies requires meta-analysis, and none precludes it. Thus, the framework is independent of how the results of different studies are considered together.
  3. Any of the strategies can potentially result in a judgment that the evidence is insufficient to answer the Key Question. Some strategies do consider the conclusiveness of the evidence when prioritizing evidence (such as the aforementioned EHC chapter on when to include nonrandomized studies), whereas others do not. None, however, can guarantee an answer to the Key Question.

Essentially, this paper addresses a reviewer's decisions about lowering the evidence threshold. Why might reviewers do this? How can it be done? When does one stop lowering the bar? The following sections flesh out answers to these questions and are intended to map out numerous options for systematic reviewers.