NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Institute of Medicine (US) Committee on Standards for Systematic Reviews of Comparative Effectiveness Research; Eden J, Levit L, Berg A, et al., editors. Finding What Works in Health Care: Standards for Systematic Reviews. Washington (DC): National Academies Press (US); 2011.


Finding What Works in Health Care: Standards for Systematic Reviews.


3 Standards for Finding and Assessing Individual Studies

Abstract: This chapter addresses the identification, screening, data collection, and appraisal of the individual studies that make up a systematic review’s (SR’s) body of evidence. The committee recommends six related standards. The search should be comprehensive and include both published and unpublished research. The potential for bias to enter the selection process is significant and well documented. Without appropriate measures to counter the biased reporting of primary evidence from clinical trials and observational studies, SRs will reflect and possibly exacerbate existing distortions in the biomedical literature. The review team should document the search process and keep track of the decisions that are made for each article. Quality assurance and control are critical during data collection and extraction because of the substantial potential for errors. At least two review team members, working independently, should screen and select studies and extract quantitative and other critical data from included studies. Each eligible study should be systematically appraised for risk of bias; relevance to the study’s populations, interventions, and outcomes measures; and fidelity of the implementation of the interventions.

The search for evidence and critical assessment of the individual studies identified are the core of a systematic review (SR). These SR steps require meticulous execution and documentation to minimize the risk of a biased synthesis of evidence. Current practice falls short of recommended guidance and thus results in a meaningful proportion of reviews that are of poor quality (Golder et al., 2008; Moher et al., 2007a; Yoshii et al., 2009). An extensive literature documents that many SRs provide scant, if any, documentation of their search and screening methods. SRs often fail to acknowledge or address the risk of reporting biases, neglect to appraise the quality of individual studies included in the review, and are subject to errors during data extraction and the meta-analysis (Cooper et al., 2006; Delaney et al., 2007; Edwards et al., 2002; Golder et al., 2008; Gøtzsche et al., 2007; Horton et al., 2010; Jones et al., 2005; Lundh et al., 2009; Moher et al., 2007a; Roundtree et al., 2008; Tramer et al., 1997). The conduct of the search for and selection of evidence may have serious implications for patients’ and clinicians’ decisions. An SR might lead to the wrong conclusions and, ultimately, the wrong clinical recommendations, if relevant data are missed, errors are uncorrected, or unreliable research is used (Dickersin, 1990; Dwan et al., 2008; Glanville et al., 2006; Gluud, 2006; Kirkham et al., 2010; Turner et al., 2008).

In this chapter, the committee recommends methodological standards for the steps involved in identifying and assessing the individual studies that make up an SR’s body of evidence: planning and conducting the search for studies, screening and selecting studies, managing data collection from eligible studies, and assessing the quality of individual studies. The committee focused on steps to minimize bias and to promote scientifically rigorous SRs based on evidence (when available), expert guidance, and thoughtful reasoning. The recommended standards set a high bar that will be challenging for many SR teams. However, the available evidence does not suggest that it is safe to cut corners if resources are limited. These best practices should be thoughtfully considered by anyone conducting an SR. It is especially important that the SR is transparent in reporting what methods were used and why.

Each standard consists of two parts: first, a brief statement describing the related SR step and, second, one or more elements of performance that are fundamental to carrying out the step. Box 3-1 lists all of the chapter’s recommended standards.


BOX 3-1

Recommended Standards for Finding and Assessing Individual Studies. Standard 3.1 Conduct a comprehensive systematic search for evidence Required elements:

Note that, as throughout this report, the chapter’s references to “expert guidance” refer to the published methodological advice of the Agency for Healthcare Research and Quality (AHRQ) Effective Health Care Program, the Centre for Reviews and Dissemination (CRD) (University of York), and the Cochrane Collaboration. Appendix E contains a detailed summary of expert guidance on this chapter’s topics.


When healthcare decision makers turn to SRs to learn the potential benefits and harms of alternative health care therapies, it is with the expectation that the SR will provide a complete picture of all that is known about an intervention. Research is relevant to individual decision making, whether it reveals benefits, harms, or lack of effectiveness of a health intervention. Thus, the overarching objective of the SR search for evidence is to identify all the studies (and all the relevant data from the studies) that may pertain to the research question and analytic framework. The task is a challenging one. Hundreds of thousands of research articles are indexed in bibliographic databases each year. Yet despite the enormous volume of published research, a substantial proportion of effectiveness data are never published or are not easy to access. For example, approximately 50 percent of studies appearing as conference abstracts are never fully published (Scherer et al., 2007), and some studies are not even reported as conference abstracts. Even when there are published reports of effectiveness studies, the studies often report only a subset of the relevant data. Furthermore, it is well documented that the data reported may not represent all the findings on an intervention’s effectiveness because of pervasive reporting bias in the biomedical literature. Moreover, crucial information from the studies is often difficult to locate because it is kept in researchers’ files, government agency records, or manufacturers’ proprietary records.

The following overview further describes the context for the SR search process: the nature of the reporting bias in the biomedical literature; key sources of information on comparative effectiveness; and expert guidance on how to plan and conduct the search. The committee’s related standards are presented at the end of the section.

Planning the Search

The search strategy should be an integral component of the research protocol1 that specifies procedures for finding the evidence directly relevant to the SR. Items described in the protocol include, but are not limited to, the study question; the criteria for a study’s inclusion in the review (including language and year of report, publication status, and study design restrictions, if any); the databases, journals, and other sources to be searched for evidence; and the search strategy (e.g., sequence of database thesaurus terms, text words, methods of handsearching).

Expertise in Searching

A librarian or other qualified information specialist with training or experience in conducting SRs should work with the SR team to design the search strategy to ensure appropriate translation of the research question into search concepts, correct choice of Boolean operators and line numbers, appropriate translation of the search strategy for each database, relevant subject headings, and appropriate application and spelling of terms (Sampson and McGowan, 2006). The Cochrane Collaboration includes an Information Retrieval Methods Group2 that provides a valuable resource for information specialists seeking a professional group with learning opportunities.

Expert guidance recommends that an experienced librarian or information specialist with training in SR search methods should also be involved in performing the search (CRD, 2009; Lefebvre et al., 2008; McGowan and Sampson, 2005; Relevo and Balshem, 2011). Navigating through the various sources of research data and publications is a complex task that requires experience with a wide range of bibliographic databases and electronic information sources, and substantial resources (CRD, 2009; Lefebvre et al., 2008; Relevo and Balshem, 2011).

Ensuring an Accurate Search

An analysis of SRs published in the Cochrane Database of Systematic Reviews found that 90.5 percent of the MEDLINE searches contained at least one search error (Sampson and McGowan, 2006). Errors included spelling errors, the omission of spelling variants and truncations, the use of incorrect Boolean operators and line numbers, inadequate translation of the search strategy for different databases, misuse of MeSH3 and free-text terms, unwarranted explosion of MeSH terms, and redundancy in search terms. Common sense suggests that these errors affect the accuracy and overall quality of SRs. AHRQ and CRD SR experts recommend peer review of the electronic search strategy to identify and prevent these errors from occurring (CRD, 2009; Relevo and Balshem, 2011). The peer reviewer should be independent from the review team in order to provide an unbiased and scientifically rigorous review, and should have expertise in information retrieval and SRs. In addition, the peer review process should take place prior to the search process, rather than in conjunction with the peer review of the final report, because the search process will provide the data that are synthesized and analyzed in the SR.

Sampson and colleagues (2009) recently surveyed individuals experienced in SR searching and identified aspects of the search process that experts agree are likely to have a large impact on the sensitivity and precision of a search: accurate translation of each research question into search concepts; correct choice of Boolean and proximity operators; absence of spelling errors; correct line numbers and combination of line numbers; accurate adaptation of the search strategy for each database; and inclusion of relevant subject headings. Then they developed practice guidelines for peer review of electronic search strategies. For example, to identify spelling errors in the search they recommended that long strings of terms be broken into discrete search statements in order to make null or misspelled terms more obvious and easier to detect. They also recommended cutting and pasting the search into a spell checker. As these guidelines and others are implemented, future research needs to be conducted to validate that peer review does improve the search quality.
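The recommendation to break long strings of terms into discrete search statements can be illustrated with a minimal sketch. It assumes a hypothetical `count_hits` function that returns the number of records a single term retrieves; the hit counts below are invented for illustration and do not come from any real database. In practice a misspelled term may still retrieve a few records, so a zero-hit check is only a first pass, not a substitute for peer review.

```python
from typing import Callable, Dict, List


def split_or_string(query: str) -> List[str]:
    """Break one long 'a OR b OR c' string into one search statement per term."""
    return [term.strip() for term in query.split(" OR ") if term.strip()]


def flag_null_terms(terms: List[str], count_hits: Callable[[str], int]) -> List[str]:
    """Return the terms that retrieve zero records, which often signals a typo."""
    return [term for term in terms if count_hits(term) == 0]


# Hypothetical hit counts standing in for a real database query.
EXAMPLE_HITS: Dict[str, int] = {
    "hypertension": 120000,
    "high blood pressure": 45000,
    "hypertention": 0,  # misspelled term, assumed to retrieve nothing
}

query = "hypertension OR high blood pressure OR hypertention"
terms = split_or_string(query)
null_terms = flag_null_terms(terms, lambda term: EXAMPLE_HITS.get(term, 0))
print(null_terms)
```

Run as one long OR string, the misspelled term would be hidden by the hits from the other terms; run as separate statements, it stands out.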

Reporting Bias

Reporting biases (Song et al., 2010), particularly publication bias (Dickersin, 1990; Hopewell et al., 2009a) and selective reporting of trial outcomes and analyses (Chan et al., 2004a, 2004b; Dwan et al., 2008; Gluud, 2006; Hopewell et al., 2008; Turner et al., 2008; Vedula et al., 2009), present the greatest obstacle to obtaining a complete collection of relevant information on the effectiveness of healthcare interventions. Reporting biases have been identified across many health fields and interventions, including treatment, prevention, and diagnosis. For example, McGauran and colleagues (2010) identified instances of reporting bias spanning 40 indications and 50 different pharmacological, surgical, diagnostic, and preventive interventions and selective reporting of study data as well as efforts by manufacturers to suppress publication. Furthermore, the potential for reporting bias exists across the entire research continuum—from before completion of the study (e.g., investigators’ decisions to register a trial or to report only a selection of trial outcomes), to reporting in conference abstracts, selection of a journal for submission, and submission of the manuscript to a journal or other resource, to editorial review and acceptance.

The following describes the various ways in which reporting of research findings may be biased. Table 3-1 provides definitions of the types of reporting biases.

TABLE 3-1. Types of Reporting Biases.



Publication Bias

The term publication bias refers to the likelihood that publication of research findings depends on the nature and direction of a study’s results. More than two decades of research have shown that positive findings are more likely to be published than null or negative results. At least four SRs have assessed the association between study results and publication of findings (Song et al., 2009). These investigations plus additional individual studies indicate a strong association between statistically significant or positive results and likelihood of publication (Dickersin and Chalmers, 2010).

Investigators (not journal editors) are believed to be the major reason for failure to publish research findings (Dickersin and Min, 1993; Dickersin et al., 1992). Studies examining the influence of editors on acceptance of submitted manuscripts have not found an association between results and publication (Dickersin et al., 2007; Lynch et al., 2007; Okike et al., 2008; Olson et al., 2002).

Selective Outcome Reporting Bias

To avert problems introduced by post hoc selection of study outcomes, a randomized controlled trial’s (RCT’s) primary outcome should be stated in the research protocol a priori, before the study begins (Kirkham et al., 2010). Statistical testing of the effect of an intervention on multiple possible outcomes in a study can lead to a greater probability of statistically significant results obtained by chance. When primary or other outcomes of a study are selected and reported post hoc (i.e., after statistical testing), the reader should be aware that the published results for the “primary outcome” may be only a subset of relevant findings, and may be selectively reported because they are statistically significant.
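The effect of testing multiple outcomes can be made concrete with a short calculation. The sketch below assumes that the tests are independent and that each uses the same significance threshold; real trial outcomes are usually correlated, so the figures are illustrative only.

```python
def chance_of_false_positive(n_outcomes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent tests is significant by chance.

    Assumes no true effect on any outcome and independent tests.
    """
    return 1.0 - (1.0 - alpha) ** n_outcomes


for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

Under these assumptions, a trial that tests 10 outcomes at the 0.05 level has roughly a 40 percent chance of at least one statistically significant result even when the intervention has no effect.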

Outcome reporting bias refers to the selective reporting of some outcomes but not others because of the nature and direction of the results. This can happen when investigators rely on hypothesis testing to prioritize research based on the statistical significance of an association. In the extreme, if only positive outcomes are selectively reported, we would not know that an intervention is ineffective for an important outcome, even if it had been tested frequently (Chan and Altman, 2005; Chan et al., 2004a,b; Dwan et al., 2008; Turner et al., 2008; Vedula et al., 2009).

Recent research on selective outcome reporting bias has focused on industry-funded trials, in part because internal company documents may be available, and in part because of evidence of biased reporting that favors the sponsors’ test interventions (Golder and Loke, 2008; Jorgensen et al., 2008; Lexchin et al., 2003; Nassir Ghaemi et al., 2008; Ross et al., 2009; Sismondo, 2008; Vedula et al., 2009).

Mathieu and colleagues (2009) found substantial evidence of selective outcome reporting. The researchers reviewed 323 RCTs with results published in high-impact journals in 2008. They found that only 147 had been registered before the end of the trial with the primary outcome specified. Of these 147, 46 (31 percent) were published with different primary outcomes than were registered, with 22 introducing a new primary outcome. In 23 of the 46 discrepancies, the influence of the discrepancy could not be determined. Among the remaining 23 discrepancies, 19 favored a statistically significant result (i.e., a new statistically significant primary outcome was introduced in the published article, or a nonsignificant primary outcome was omitted or not defined as primary in the published article).
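The proportions in the Mathieu study follow directly from the counts given above. The sketch below simply recomputes them; the variable names are illustrative and are not taken from the study.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts as reported in the text above.
trials_reviewed = 323
registered_with_primary_outcome = 147
discrepant_primary_outcome = 46
impact_undetermined = 23
favoring_significance = 19

# Discrepancies whose direction could be assessed.
assessable = discrepant_primary_outcome - impact_undetermined

print(f"Adequately registered: {percent(registered_with_primary_outcome, trials_reviewed):.0f}%")
print(f"Discrepant among registered: {percent(discrepant_primary_outcome, registered_with_primary_outcome):.0f}%")
print(f"Favoring significance among assessable: {percent(favoring_significance, assessable):.0f}%")
```

Fewer than half of the reviewed trials were adequately registered, and most of the discrepancies that could be assessed favored a statistically significant result.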

In a study of 110 trials published in high-impact journals between September 2006 and February 2007 and also registered in a trial registry, Ewart and colleagues found that in 34 cases (31 percent) the primary outcome had changed (10 by addition of a new primary outcome; 3 by promotion from a secondary outcome; 20 by deletion of a primary outcome; and 6 by demotion to a secondary outcome), and in 77 cases (70 percent) the secondary outcome had changed (54 by addition of a new secondary outcome; 5 by demotion from a primary outcome; 48 by deletion; 3 by promotion to a primary outcome) (Ewart et al., 2009).

Acquiring unpublished data from industry can be challenging. However, when available, unpublished data can change an SR’s conclusions about the benefits and harms of treatment. A review by Eyding and colleagues demonstrates both the challenge of acquiring all relevant data from a manufacturer and how acquisition of those data can change the conclusion of an SR (Eyding et al., 2010). In their SR, which included both published and unpublished data acquired from the drug manufacturer, Eyding and colleagues found that published data overestimated the benefit of the antidepressant reboxetine over placebo by up to 115 percent and over selective serotonin reuptake inhibitors (SSRIs) by up to 23 percent. The addition of unpublished data changed the superiority of reboxetine vs. placebo to a nonsignificant difference and the nonsignificant difference between reboxetine and SSRIs to inferiority for reboxetine. For rates of patients with adverse events and of withdrawals because of adverse events, inclusion of unpublished data changed the nonsignificant difference between reboxetine and placebo to inferiority of reboxetine, while for rates of withdrawals because of adverse events, inclusion of unpublished data changed the nonsignificant difference between reboxetine and fluoxetine to an inferiority of fluoxetine.

Although there are many studies documenting the problem of publication bias and selective outcome reporting bias, few studies have examined the effect of such bias on SR findings. One recent study by Kirkham and colleagues assessed the impact of outcome reporting bias in individual trials on 81 SRs published in 2006 and 2007 by Cochrane review groups (Kirkham et al., 2010). More than one third of the reviews (34 percent) included at least one RCT with suspected outcome reporting bias. The authors assessed the potential impact of the bias and found that meta-analyses omitting trials with presumed selective outcome reporting for the primary outcome could overestimate the treatment effect. They also concluded that trials should not be excluded from SRs simply because outcome data appear to be missing when in fact the missing data may be due to selective outcome reporting. The authors suggest that in such cases the trialists should be asked to provide the outcome data that were analyzed, but not reported.

Time-lag Bias

In an SR of the literature, Hopewell and her colleagues (2009a) found that trials with positive results (statistically significant in favor of the experimental arm) were published about a year sooner than trials with null or negative results (not statistically significant or statistically significant in favor of the control arm). This has implications for both systematic review teams and patients. If positive findings are more likely to be available during the search process, then SRs may provide a biased view of current knowledge. The limited evidence available implies that publication delays may be caused by the investigator rather than by journal editors (Dickersin et al., 2002b; Ioannidis et al., 1997, 1998).

Location Bias

The location of published research findings in journals with different ease of access or levels of indexing is also correlated with the nature and direction of results. For example, in a Cochrane methodology review, Hopewell and colleagues identified five studies that assessed the impact of including trials published in the grey literature in an SR (Hopewell et al., 2009a). The studies found that trials in the published literature tend to be larger and show an overall larger treatment effect than those trials found in the grey literature (primarily abstracts and unpublished data, such as data from trial registries, “file drawer data,” and data from individual trialists). The researchers suggest that, by excluding grey literature, an SR or meta-analysis is likely to artificially inflate the benefits of a health care intervention.

Language Bias

As in other types of reporting bias, language bias refers to the publication of research findings in certain languages, depending on the nature and direction of the findings. For example, some evidence shows that investigators in Germany may choose to publish their negative RCT findings in non-English language journals and their positive RCT findings in English-language journals (Egger and Zellweger-Zahner, 1997; Heres et al., 2004). However, there is no definitive evidence on the impact of excluding articles in languages other than English (LOE), nor is there evidence that non-English language articles are of lower quality (Moher et al., 1996); the differences observed appear to be minor (Moher et al., 2003).

Some studies suggest that, depending on clinical specialty or disease, excluding research in LOE may not bias SR findings (Egger et al., 2003; Gregoire et al., 1995; Moher et al., 2000, 2003; Morrison et al., 2009). In a recent SR, Morrison and colleagues examined the impact on estimates of treatment effect when RCTs published in LOE are excluded (Morrison et al., 2009).4 The researchers identified five eligible reports (describing three unique studies) that assessed the impact of excluding articles in LOE on the results of a meta-analysis. None of the five reports found major differences between English-only meta-analyses and meta-analyses that included trials in LOE (Egger et al., 2003; Jüni et al., 2002; Moher et al., 2000, 2003; Pham et al., 2005; Schulz et al., 1995).

Many SRs do not include articles in LOE, probably because of the time and cost involved in obtaining and translating them. The committee recommends that the SR team consider whether the topic of the review might require searching for studies not published in English.

Multiple (Duplicate) Publication Bias

Investigators sometimes publish the same findings multiple times, either overtly or what appears to be covertly. When two or more articles are identical, this constitutes plagiarism. When the articles are not identical, the systematic review team has difficulty discerning whether the articles are describing the findings from the same or different studies. von Elm and colleagues described four situations that may suggest duplicate publication; these include articles with the following features: (1) identical samples and outcomes; (2) identical samples and different outcomes; (3) samples that are larger or smaller, yet with identical outcomes; and (4) different samples and different outcomes (von Elm et al., 2004). The World Association of Medical Editors (WAME, 2010) and the International Committee of Medical Journal Editors (ICMJE, 2010) have condemned duplicate or multiple publication when there is no clear indication that the article has been published before.

Von Elm and colleagues (2004) identified 141 SRs in anesthesia and analgesia that included 56 studies that had been published two or more times. Little overlap occurred among authors on the duplicate publications, with no cross-referencing of the articles. Of the duplicates, 33 percent were funded by the pharmaceutical industry. Most of the duplicate articles (63 percent) were published in journal supplements soon after the “main” article. Positive results appear to be published more often in duplicate, which can lead to overestimates of a treatment effect if the data are double counted (Tramer et al., 1997).
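The way double counting can inflate a pooled estimate can be shown with a minimal sketch. It assumes a simple fixed-effect, inverse-variance weighted average, and the effect sizes and variances are invented for illustration; they are not drawn from any of the studies cited here.

```python
from typing import List, Tuple

Study = Tuple[float, float]  # (effect estimate, variance of the estimate)


def pooled_effect(studies: List[Study]) -> float:
    """Fixed-effect, inverse-variance weighted mean of the study effects."""
    weights = [1.0 / variance for _, variance in studies]
    weighted_sum = sum(w * effect for w, (effect, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


# Hypothetical data: three unique trials, the last with the most positive result.
unique_studies: List[Study] = [(0.10, 0.04), (0.20, 0.04), (0.50, 0.04)]

# The same evidence with the most positive trial counted twice.
with_duplicate: List[Study] = unique_studies + [(0.50, 0.04)]

print(round(pooled_effect(unique_studies), 3))
print(round(pooled_effect(with_duplicate), 3))
```

Because the duplicated trial is the one with the largest effect, counting it twice pulls the pooled estimate upward, which is the pattern of concern when positive results are more likely to be published in duplicate.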

Citation Bias

Searches of online databases of cited articles are one way to identify research that has been cited in the references of published articles. However, many studies show that, across a broad array of topics, authors tend to cite selectively only the positive results of other studies (omitting the negative or null findings) (Gøtzsche, 1987; Kjaergard and Als-Nielsen, 2002; Nieminen et al., 2007; Ravnskov, 1992, 1995; Schmidt and Gøtzsche, 2005). Selective pooling of results, that is, when the authors perform a meta-analysis of studies they have selected without a systematic search for all evidence, could be considered both a non-SR and a form of citation bias. Because a selective meta-analysis or pooling does not reflect the true state of research evidence, it is prone to selection bias and may even reflect what the authors want us to know, rather than the totality of knowledge.

Addressing Reporting Bias

Reporting bias clearly presents a fundamental obstacle to the scientific integrity of SRs on the effectiveness of healthcare interventions. However, at this juncture, important, unresolved questions remain on how to overcome the problem. No empirically based techniques have been developed that can predict which topics or research questions are most vulnerable to reporting bias. Nor can one determine when reporting bias will lead to an “incorrect” conclusion about the effectiveness of an intervention. Moreover, researchers have not yet developed a low-cost, effective approach to identifying a complete, unbiased literature for SRs of comparative effectiveness research (CER).

SR experts recommend a prespecified, systematic approach to the search for evidence that includes not only easy-to-access bibliographic databases, but also other information sources that contain grey literature, particularly trial data, and other unpublished reports. The search should be comprehensive and include both published and unpublished research. The evidence on reporting bias (described above) is persuasive. Without appropriate measures to counter the biased reporting of primary evidence from clinical trials and observational studies, SRs may only reflect—and could even exacerbate—existing distortions in the biomedical literature. The implications of developing clinical guidance from incomplete or biased knowledge may be serious (Moore, 1995; Thompson et al., 2008). Yet, many SRs fail to address the risk of bias during the search process.

Expert guidance also suggests that the SR team contact the researchers and sponsors of primary research to clarify unclear reports or to obtain unpublished data that are relevant to the SR. See Table 3-2 for key techniques and information sources recommended by AHRQ, CRD, and the Cochrane Collaboration. Appendix E provides further details on expert guidance.

TABLE 3-2. Expert Suggestions for Conducting the Search Process and Addressing Reporting Bias.



Key Information Sources

Despite the imperative to conduct an unbiased search, many SRs use abbreviated methods to search for the evidence, often because of resource limitations. A common error is to rely solely on a limited number of bibliographic databases. Large databases, such as MEDLINE and Embase (Box 3-2), are relatively easy to use, but they often lack research findings that are essential to answering questions of comparative effectiveness (CRD, 2009; Hopewell et al., 2009b; Lefebvre et al., 2008; Scherer et al., 2007; Song et al., 2010). The appropriate sources of information for an SR depend on the research question, analytic framework, patient outcomes of interest, study population, research design (e.g., trial data vs. observational data), likelihood of publication, authors, and other factors (Egger et al., 2003; Hartling et al., 2005; Helmer et al., 2001; Lemeshow et al., 2005). Relevant research findings may reside in large, well-known bibliographic databases, in subject-specific or regional databases, or in the grey literature.


BOX 3-2

Bibliographic Databases. Cochrane Central Register of Controlled Trials (CENTRAL)—A database of more than 500,000 records of controlled trials and other healthcare interventions including citations published in languages other than English and (more...)

The following summarizes the available evidence on the utility of key data sources—such as bibliographic databases, grey literature, trial registries, and authors or sponsors of relevant research—primarily for searching for results from RCTs. While considerable research has been done to date on finding relevant randomized trials (Dickersin et al., 1985; Dickersin et al., 1994; McKibbon et al., 2009; Royle and Milne, 2003; Royle and Waugh, 2003), less work has been done on methods for identifying qualitative (Flemming and Briggs, 2007) and observational data for a given topic (Booth 2006; Furlan et al., 2006; Kuper et al., 2006; Lemeshow et al., 2005). The few electronic search strategies that have been evaluated to identify studies of harms, for example, suggest that further methodological research is needed to find an efficient balance between sensitivity5 and precision in conducting electronic searches (Golder and Loke, 2009).
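The balance between sensitivity and precision can be illustrated with a minimal sketch. It assumes the usual information retrieval definitions: sensitivity is the share of all relevant records that the search retrieves, and precision is the share of retrieved records that are relevant. The record identifiers are hypothetical, and in a real review the full set of relevant records is rarely known in advance.

```python
from typing import Set, Tuple


def sensitivity_and_precision(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (sensitivity, precision) for one search against a known relevant set."""
    if not retrieved or not relevant:
        raise ValueError("both sets must be non-empty")
    relevant_retrieved = retrieved & relevant
    sensitivity = len(relevant_retrieved) / len(relevant)
    precision = len(relevant_retrieved) / len(retrieved)
    return sensitivity, precision


# Hypothetical example: 4 relevant records exist; the search returns 10 records,
# 3 of which are relevant.
relevant = {"r1", "r2", "r3", "r4"}
retrieved = {"r1", "r2", "r3", "x1", "x2", "x3", "x4", "x5", "x6", "x7"}

sensitivity, precision = sensitivity_and_precision(retrieved, relevant)
print(sensitivity, precision)
```

Broadening a search usually raises sensitivity at the cost of precision, which means more records to screen for each relevant study found.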

Less is known about the consequences of including studies missed in these searches. For example, one SR of the literature on search methods found that adverse effects information was included more frequently in unpublished sources, but also concluded that there was insufficient evidence to determine how including unpublished studies affects an SR’s pooled risk estimates of adverse effects (Golder and Loke, 2010). Nevertheless, one must assume that the consequences of missing relevant articles may be clinically significant especially if the search fails to identify data that might alter conclusions about the risks and benefits of an intervention.

Bibliographic Databases

Unfortunately, little empirical evidence is available to guide the development of an SR bibliographic search strategy. As a result, the researcher has to scrutinize a large volume of articles to identify the relatively small proportion that are relevant to the research question under consideration. At present, no one database or information source is sufficient to ensure an unbiased, balanced picture of what is known about the effectiveness, harms, and benefits of health interventions (Betran et al., 2005; Crumley et al., 2005; Royle et al., 2005; Tricco et al., 2008). Betran and colleagues, for example, assessed the utility of different databases for identifying studies for a World Health Organization (WHO) SR of maternal morbidity and mortality (Betran et al., 2005). After screening more than 64,000 different citations, they identified 2,093 potentially eligible studies. Several databases were sources of research not found elsewhere; 20 percent of citations were found only in MEDLINE, 7.4 percent in Embase, and 5.6 percent in LILACS and other topic specific databases.
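The kind of comparison Betran and colleagues report, in which some citations are found in only one database, can be sketched with simple set operations. The database names are those mentioned above, but the citation identifiers are hypothetical and the sketch assumes that duplicate records have already been matched to a common identifier.

```python
from typing import Dict, Set


def unique_to_each(sources: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each source, return the citations that no other source contains."""
    result: Dict[str, Set[str]] = {}
    for name, records in sources.items():
        others: Set[str] = set().union(
            *(recs for other, recs in sources.items() if other != name)
        )
        result[name] = records - others
    return result


# Hypothetical, already de-duplicated citation identifiers per database.
search_results = {
    "MEDLINE": {"c1", "c2", "c3", "c4"},
    "Embase": {"c3", "c4", "c5"},
    "LILACS": {"c4", "c6"},
}

only_in = unique_to_each(search_results)
for database, citations in only_in.items():
    print(database, sorted(citations))
```

Any database with a non-empty result here contributes citations that would be missed if it were dropped from the search.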

Specialized databases. Depending on the subject of the SR, specialized topical databases such as POPLINE and PsycINFO may provide research findings not available in other databases (Box 3-3). POPLINE is a specialized database of abstracts of scientific articles, reports, books, and unpublished reports in the field of population, family planning, and related health issues. PsycINFO, a database of psychological literature, contains journal articles, book chapters, books, technical reports, and dissertations related to behavioral health interventions.


BOX 3-3

Subject-Specific Databases. Campbell Collaboration Social, Psychological, Educational & Criminological Trials Register (C2-SPECTR)—A registry of more than 10,000 trials in education, social work and welfare, and criminal justice. The primary (more...)

Citation indexes. Scopus, Web of Science, and other citation indexes are valuable for finding cited reports from journals, trade publications, book series, and conference papers from the scientific, technical, medical, social sciences, and arts and humanities fields (Bakkalbasi et al., 2006; Chapman et al., 2010; Falagas et al., 2008; ISI Web of Knowledge, 2009; Kuper et al., 2006; Scopus, 2010). Searching the citations of previous SRs on the same topic could be particularly fruitful.

Grey literature. Grey literature includes trial registries (discussed below), conference abstracts, books, dissertations, monographs, and reports held by the Food and Drug Administration (FDA) and other government agencies, academics, business, and industry. Grey-literature databases, such as those described in Box 3-4, are important sources for technical or research reports, doctoral dissertations, conference papers, and other research.

BOX 3-4

Grey-Literature Databases. New York Academy of Medicine Grey Literature Report—A bimonthly publication of the New York Academy of Medicine Library that includes grey literature in health services research and selected public health topics. OAIster (more...)

Handsearching. Handsearching is when researchers manually examine, page by page, each article, abstract, editorial, letter to the editor, or other item in a journal to identify reports of RCTs or other relevant evidence (Hopewell et al., 2009b). No empirical research shows how an SR’s conclusions might be affected by adding trials identified through a handsearch. However, for some CER topics and circumstances, handsearching may be important (CRD, 2009; Hopewell et al., 2009a; Lefebvre et al., 2008; Relevo and Balshem, 2011). The first or only appearance of a trial report, for example, may be in the nonindexed portions of a journal.

Contributors to the Cochrane Collaboration have handsearched literally thousands of journals and conference abstracts to identify controlled clinical trials and studies that may be eligible for Cochrane reviews (Dickersin et al., 2002a). Using a publicly available resource, one can identify which journals, abstracts, and years have been or are being searched by going to the Cochrane Master List of Journals Being Searched.6 If a subject area has been well covered by Cochrane, then it is probably reasonable to forgo handsearching and to rely on the Cochrane Central Register of Controlled Trials (CENTRAL), which should contain the identified articles and abstracts. It is always advisable to check with the relevant Cochrane review group to confirm the journals/conference abstracts that have been searched and how they are indexed in CENTRAL. The CENTRAL database is available to all subscribers to the Cochrane Library. For example, if the search topic was eye trials, numerous years of journals and conference abstracts have been searched, and included citations have been MeSH coded if they were from a source not indexed on MEDLINE. Because of the comprehensive searching and indexing available for the eyes and vision field, one would not need to search beyond CENTRAL.

Clinical Trials Data

Clinical trials produce essential data for SRs on the therapeutic effectiveness and adverse effects of health care interventions. However, the findings for a substantial number of clinical trials are never published (Bennett and Jull, 2003; Hopewell et al., 2009b; MacLean et al., 2003; Mathieu et al., 2009; McAuley et al., 2000; Savoie et al., 2003; Turner et al., 2008). Thus, the search for trial data should include trial registries (ClinicalTrials.gov, Clinical Study Results, Current Controlled Trials, and WHO International Clinical Trials Registry), FDA medical and statistical review records (MacLean et al., 2003; Turner et al., 2008), conference abstracts (Hopewell et al., 2009b; McAuley et al., 2000), non-English literature, and outreach to investigators (CRD, 2009; Golder et al., 2010; Hopewell et al., 2009b; Lefebvre et al., 2008; Miller, 2010; O’Connor, 2009; Relevo and Balshem, 2011; Song et al., 2010).

Trial registries. Trial registries have the potential to address the effects of reporting bias if they provide complete data on both ongoing and completed trials (Boissel, 1993; Dickersin, 1988; Dickersin and Rennie, 2003; Hirsch, 2008; NLM, 2009; Ross et al., 2009; Savoie et al., 2003; Song et al., 2010; WHO, 2010; Wood, 2009). One can access a large proportion of international trials registries using the WHO International Clinical Trials Registry Platform (WHO, 2010). ClinicalTrials.gov is the most comprehensive public registry. It was established in 2000 by the National Library of Medicine as required by the FDA Modernization Act of 1997 (NLM, 2009). At its start, ClinicalTrials.gov had minimal utility for SRs because the required data were quite limited, industry compliance with the mandate was poor, and government enforcement of sponsors’ obligation to submit complete data was lax (Zarin, 2005). The International Committee of Medical Journal Editors (ICMJE), among others, spurred trial registration overall by requiring authors to enroll trials in a public trials registry at or before the beginning of patient enrollment as a precondition for publication in member journals (DeAngelis et al., 2004). The implementation of this policy is associated with a 73 percent increase in worldwide trial registrations at ClinicalTrials.gov for all intervention types (Zarin et al., 2005).

The FDA Amendments Act of 2007 significantly expanded the potential depth and breadth of the registry. The act mandates that sponsors of any ongoing clinical trial involving a drug, biological product, or device approved for marketing by the FDA not only register the trial,9 but also submit data on the trial’s research protocol and study results (including adverse events).10 As of October 2010, 2,300 results records are available. Much of the required data have not yet been submitted (Miller, 2010), and Congress has allowed sponsors to delay posting of results data until after the product is granted FDA approval. New regulations governing the scope and timing of results posting are pending (Wood, 2009).

Data gathered as part of the FDA approval process. The FDA requires sponsors to submit extensive data about efficacy and safety as part of the New Drug Application (NDA) process. FDA analysts (statisticians, physicians, pharmacologists, and chemists) examine and analyze these data.

Although the material submitted by the sponsor is confidential, under the Freedom of Information Act, the FDA is required to make its analysts’ reports public after redacting proprietary or sensitive information. Since 1998, selected, redacted copies of reports conducted by FDA analysts have been publicly available (see Drugs@FDA11). When available, these are useful for obtaining clinical trials data, especially when studies are not otherwise reported.12,13 For example, as part of an SR of complications from nonsteroidal anti-inflammatory drugs (NSAIDs), MacLean and colleagues identified trials using the FDA repository. They compared two groups of studies meeting inclusion criteria for the SR: published reports of trials and studies included in submissions to the FDA. They identified 20 published studies on the topic and 37 studies submitted to the FDA that met their inclusion criteria. Only one study was in both the published and FDA groups (i.e., only 1 of 37 studies submitted to the FDA was published) (MacLean et al., 2003). The authors found no meaningful differences in the information reported in the FDA report and the published report on sample size, gender distribution, indication for drug use, and components of study methodological quality. This indicated that, at least in this case, there is no reason to omit unpublished research from an SR for reasons of study quality.

Several studies have demonstrated that the FDA repository provides opportunities for finding out about unpublished trials, and that reporting biases exist such that unpublished studies are associated more often with negative findings. Lee and colleagues examined 909 trials supporting 90 approved drugs in FDA reviews, and found that 43 percent (394 of 909) were published 5 years post-approval and that positive results were associated with publication (Lee et al., 2008).

Rising and colleagues (2008) conducted a study of all efficacy trials found in approved NDAs for new molecular entities from 2001 to 2002 and all published clinical trials corresponding to trials within those NDAs. The authors found that trials in NDAs with favorable primary outcomes were nearly five times more likely to be published than trials with unfavorable primary outcomes. In addition, for those 99 cases in which conclusions were provided in both the NDA and the published paper, in 9 (9 percent) the conclusion was different in the NDA and the publication, and all changes favored the test drug. Published papers included more outcomes favoring the test drug than the NDAs. The authors also found that, excluding outcomes with unknown significance, 43 outcomes in the NDAs did not favor the test drug (35 were nonsignificant and 8 favored the comparator). Of these, 20 (47 percent) were not included in the published papers. Of the 23 that were published, 5 changed between the NDA-reported outcome and the published outcome, with 4 changed to favor the test drug in the published results.

Turner and his colleagues (2008) examined FDA submissions for 12 antidepressants, and identified 74 clinical trials, of which 31 percent had not been reported. The researchers compared FDA review data of each drug’s effects with the published trial data. They found that the published data suggested that 94 percent of the antidepressant trials were positive. In contrast, the FDA data indicated that only 51 percent of trials were positive. Moreover, when meta-analyses were conducted with and without the FDA data, the researchers found that the published reports overstated the effect size by 11 to 69 percent for the individual drugs. Overall, studies judged positive by the FDA were 12 times as likely to be published in a way that agreed with the FDA as studies not judged positive by the FDA.

FDA material can also be useful for detecting selective outcome reporting bias and selective analysis bias. For example, Turner and colleagues (2008) found that the conclusions for 11 of 57 published trials did not agree between the FDA review and the publication. In some cases, the journal publication reported different p values than the FDA report of the same study, reflecting preferential reporting of comparisons or analyses that had statistically significant p values.

The main limitation of the FDA files is that they may remain unavailable for several years after a drug is approved. Data on older drugs within a class are often missing. For example, of the 9 atypical antipsychotic drugs marketed in the United States in 2010, the FDA material is available for 7 of them. FDA reviews are not available for the 2 oldest drugs—clozapine (approved in 1989) and risperidone (approved in 1993) (McDonagh et al., 2010).

Contacting Authors and Study Sponsors for Missing Data

As noted earlier in the chapter, more than half of all trial findings may never be published (Hopewell et al., 2009b; Song et al., 2009). If a published report on a trial is available, key data are often missing. When published reports do not contain the information needed for the SR (e.g., for the assessment of bias, description of study characteristics), the SR team should contact the author to clarify and obtain missing data and to clear up any other uncertainties such as possible duplicate publication (CRD, 2009; Glasziou et al., 2008; Higgins and Deeks, 2008; Relevo and Balshem, 2011). Several studies have documented that collecting some, if not all, data needed for a meta-analysis is feasible by directly contacting the relevant author and Principal Investigators (Devereaux et al., 2004; Kelley et al., 2004; Kirkham et al., 2010; Song et al., 2010). For example, in a study assessing outcome reporting bias in Cochrane SRs, Kirkham and colleagues (2010) e-mailed the authors of the RCTs that were included in the SRs to clarify whether a trial measured the SR’s primary outcome. The researchers were able to obtain missing trial data from more than a third of the authors contacted (39 percent). Of these, 60 percent responded within a day and the remainder within 3 weeks.

Updating Searches

When patients, clinicians, clinical practice guideline (CPG) developers, and others look for SRs to guide their decisions, they hope to find the most current information available. However, in the Rising study described earlier, the researchers found that 23 percent of the efficacy trials submitted to the FDA for new molecular entities from 2001–2002 were still not published 5 years after FDA approval (Rising et al., 2008). Moher and colleagues (2007b) cite a compelling example—treatment of traumatic brain injury (TBI)—of how an updated SR can change beliefs about the risks and benefits of an intervention. Corticosteroids had been used routinely over three decades for TBI when a new clinical trial suggested that patients who had TBI and were treated with corticosteroids were at higher risk of death compared with placebo (CRASH Trial Collaborators, 2004). When Alderson and Roberts incorporated the new trial data in an update of an earlier SR on the topic, findings about mortality risk dramatically reversed—leading to the conclusion that steroids should no longer be routinely used in patients with TBI (Alderson and Roberts, 2005).

Two opportunities are available for updating the search and the SR. The first opportunity for updating is just before the review’s initial publication. Because a meaningful amount of time is likely to have elapsed since the initial search, SRs are at risk of being outdated even before they are finalized (Shojania et al., 2007). Among a cohort of SRs on the effectiveness of drugs, devices, or procedures published between 1995 and 2005 and indexed in the ACP Journal Club14 database, on average more than 1 year (61 weeks) elapsed between the final search and publication and 74 weeks elapsed between the final search and indexing in MEDLINE (when findings are more easily accessible) (Sampson et al., 2008). AHRQ requires Evidence-Based Practice Centers (EPCs) to update SR searches at the time of peer review.15 CRD and the Cochrane Collaboration recommend that the search be updated before the final analysis but do not specify an exact time period (CRD, 2009; Higgins et al., 2008).

The second opportunity for updating is post-publication, and occurs periodically over time, to ensure a review is kept up-to-date. In examining how often reviews need updating, Shojania and colleagues (2007) followed 100 meta-analyses, published between 1995 and 2005 and indexed in the ACP Journal Club, of the comparative effectiveness of drugs, devices, or procedures. Within 5.5 years, half of the reviews had new evidence that would have substantively changed conclusions about effectiveness, and within 2 years nearly 25 percent had such evidence.

Updating also provides an opportunity to identify and incorporate studies with negative findings that may have taken longer to be published than those with positive findings (Hopewell et al., 2009b) and larger scale confirmatory trials that can appear in publications after smaller trials (Song et al., 2010).

According to the Cochrane Handbook, an SR may be out-of-date under the following scenarios:

  • A change is needed in the research question or selection criteria for studies. For example, a new intervention (e.g., a newly marketed drug within a class) or a new outcome of the interventions may have been identified since the last update;
  • New studies are available;
  • Methods are out-of-date; or
  • Factual statements in the introduction and discussion sections of the review are not up-to-date.

Identifying reasons to change the research question and searching for new studies are the initial steps in updating. If the questions are still up-to-date, and searches do not identify relevant new studies, the SR can be considered up-to-date (Moher and Tsertsvadze, 2006). If new studies are identified, then their results must be incorporated into the existing SR.

A typical approach to updating is to consider the need to update the research question and conduct a new literature search every 2 years. Because some reviews become out-of-date sooner than this, several recent investigations have developed and tested strategies to identify SRs that need updating earlier (Barrowman et al., 2003; Garritty et al., 2009; Higgins et al., 2008; Louden et al., 2008; Sutton et al., 2009; Voisin et al., 2008). These strategies use the findings that some fields move faster than others; large studies are more likely to change conclusions than small ones; and both literature scans and consultation with experts can help identify the need for an update. In the best available study of an updating strategy, Shojania and colleagues sought signals that an update would be needed sooner rather than later after publication of an SR (Shojania et al., 2007). Fifty-seven percent of reviews had one or more of these signals for updating. Cardiovascular medicine, heterogeneity in the original review, and publication of a new trial larger than the previous largest trial were associated with shorter survival times, while inclusion of more than 13 studies in the original review was associated with increased time before an update was needed. In 23 cases the signal occurred within 2 years of publication. The median survival of a review without any signal that an update was needed was 5.5 years.


The committee recommends the following standards and elements of performance for identifying the body of evidence for an SR:

Standard 3.1—Conduct a comprehensive systematic search for evidence

Required elements:

3.1.1 Work with a librarian or other information specialist trained in performing systematic reviews to plan the search strategy
3.1.2 Design the search strategy to address each key research question
3.1.3 Use an independent librarian or other information specialist to peer review the search strategy
3.1.4 Search bibliographic databases
3.1.5 Search citation indexes
3.1.6 Search literature cited by eligible studies
3.1.7 Update the search at intervals appropriate to the pace of generation of new information for the research question being addressed
3.1.8 Search subject-specific databases if other databases are unlikely to provide all relevant evidence
3.1.9 Search regional bibliographic databases if other databases are unlikely to provide all relevant evidence

Standard 3.2—Take action to address reporting biases of research results

Required elements:

3.2.1 Search grey-literature databases, clinical trial registries, and other sources of unpublished information about studies
3.2.2 Invite researchers to clarify information related to study eligibility, study characteristics, and risk of bias
3.2.3 Invite all study sponsors to submit unpublished data, including unreported outcomes, for possible inclusion in the systematic review
3.2.4 Handsearch selected journals and conference abstracts
3.2.5 Conduct a web search
3.2.6 Search for studies reported in languages other than English if appropriate


In summary, little evidence directly addresses the influence of each search step on the final outcome of the SR (Tricco et al., 2008). Moreover, the SR team cannot judge in advance whether reporting bias will be a threat to any given review. However, evidence shows the risks of conducting a nonsystematic, incomplete search. Relying solely on mainstream databases and published reports may misinform clinical decisions. Thus, the search should include sources of unpublished data, including grey-literature databases, trial registries, and FDA submissions such as NDAs.

The search to identify a body of evidence on comparative effectiveness must be systematic, prespecified, and include an array of information sources that can provide both published and unpublished research data. The essence of CER and patient-centered health care is an accurate and fair accounting of the evidence in the research literature on the effectiveness and potential benefits and harms of health care interventions (IOM, 2008, 2009). Informed health care decision making by consumers, patients, clinicians, and others, demands unbiased and comprehensive information. Developers of clinical practice guidelines cannot produce sound advice without it.

SRs are most useful when they are up-to-date. Assuming a field is active, initial searches should be updated when the SR is finalized for publication, and studies ongoing at the time the review was undertaken should be checked for availability of results. In addition, noting ongoing trials (e.g., those identified by searching trials registries) is important to notify SR readers when new information can be expected in the future.

Some of the expert search methods that the committee endorses are resource intensive and time consuming. The committee is not suggesting an exhaustive search using all possible methods and all available sources of unpublished studies and grey literature. For each SR, the researcher must determine how best to identify a comprehensive and unbiased set of the relevant studies that might be included in the review. The review team should consider what information sources are appropriate given the topic of the review and review those sources. Conference abstracts and proceedings will rarely provide useful unpublished data but they may alert the reviewer to otherwise unpublished trials. In the case of drug studies, FDA reviews and trial registries are likely sources of unpublished data that, when included, may change an SR’s outcomes and conclusions from a review relying only on published data. Searches of these sources and requests to manufacturers should always be conducted. With the growing body of SRs being performed on behalf of state and federal agencies, those reviews should also be considered as a potential source of otherwise unpublished data and a search for such reports is also warranted. The increased burden on reviewers, particularly with regard to the inclusion of FDA reviews, will likely decrease over time as reviewers gain experience in using those sources and in more efficiently and effectively abstracting the relevant data. The protection against potential bias brought about by inclusion of these data sources makes the development of that expertise critical.

The search process is also likely to become less resource intensive as specialized databases of comprehensive article collections used in previous SRs are developed, or automated search and retrieval methods are tested and implemented.


Selecting which studies should be included in the SR is a multistep, labor-intensive process. EPC staff have estimated that the SR search, review of abstracts, and retrieval and review of selected full-text papers takes an average of 332 hours (Cohen et al., 2008). If the search is conducted appropriately, it is likely to yield hundreds, if not thousands, of potential studies (typically in the form of citations and abstracts). The next step, which is the focus of this section of the chapter, is to screen the collected studies to determine which ones are actually relevant to the research question under consideration.

The screening and selection process requires careful, sometimes subjective, judgments and meticulous documentation. Decisions on which studies are relevant to the research question and analytic framework are among the most significant judgments made during the course of an SR. If the study inclusion criteria are too narrow, critical data may be missed. If the inclusion criteria are too broad, irrelevant studies may overburden the process.

The following overview summarizes the available evidence on how to best screen, select, and document this critical phase of an SR. The focus is on unbiased selection of studies, inclusion of observational studies, and documentation of the process. The committee’s related standards are presented at the end of the section.

See Table 3-3 for steps recommended by AHRQ, CRD, and the Cochrane Collaboration for screening publications and extracting data from eligible studies. Appendix E provides additional details.

TABLE 3-3. Expert Suggestions for Screening Publications and Extracting Data from Eligible Studies.


Ensuring an Unbiased Selection of Studies

Use Prespecified Inclusion and Exclusion Criteria

Using prespecified inclusion and exclusion criteria to choose studies is the best way to minimize the risk of researcher biases influencing the ultimate results of the SR (CRD, 2009; Higgins and Deeks, 2008; Liberati et al., 2009; Silagy et al., 2002). The SR research protocol should make explicit which studies to include or exclude based on the patient population and patient outcomes of interest, the healthcare intervention and comparators, clinical settings (if relevant), and study designs (e.g., randomized vs. observational research) that are appropriate for the research question. Only studies that meet all of the inclusion criteria and none of the exclusion criteria should be included in the SR. Box 3-5 provides an example of selection criteria from a recent EPC research protocol for an SR of therapies for children with an autism spectrum disorder.

BOX 3-5

Study Selection Criteria for a Systematic Review of Therapies for Children with Autism Spectrum Disorders (ASD). Review questions: Among children ages 2–12 with ASD, what are the short- and long-term effects of available behavioral, educational, (more...)

Although little empirical evidence informs the development of the screening criteria, numerous studies have shown that, too often, SRs allow excessive subjectivity into the screening process (Cooper et al., 2006; Delaney et al., 2007; Dixon et al., 2005; Edwards et al., 2002; Linde and Willich, 2003; Lundh et al., 2009; Mrkobrada et al., 2008; Peinemann et al., 2008; Thompson et al., 2008). Mrkobrada and colleagues, for example, assessed the quality of all the nephrology-related SRs published in 2005 (Mrkobrada et al., 2008). Of the 90 SRs, 51 did not report efforts to minimize bias during the selection process, such as using prespecified inclusion criteria and having more than one person select eligible studies. An assessment of critical care meta-analyses published between 1994 and 2003 yielded similar findings. Delaney and colleagues (2007) examined 139 meta-analyses related to critical care medicine in journals or the Cochrane Database of Systematic Reviews. They found that a substantial proportion of the papers did not address potential biases in the selection of studies: 14 of the 36 Cochrane reviews (39 percent) and 69 of the 92 journal articles (75 percent).

Reviewing the full-text papers for all citations identified in the original search is time consuming and expensive. Expert guidance indicates that a two-stage approach to screening citations for inclusion in an SR is an acceptable way to minimize bias and produce quality work (CRD, 2009; Higgins and Deeks, 2008). The first step is to screen the titles and abstracts against the inclusion criteria. The second step is to screen the full-text papers passing the first screen. Selecting studies based solely on the titles and abstracts requires judgment and experience with the literature (Cooper et al., 2006; Dixon et al., 2005; Liberati et al., 2009).
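The two-stage approach described above can be sketched as a simple filter. This is an illustrative sketch under stated assumptions, not a procedure taken from the cited guidance: the `Citation` class, the criterion names, and the rule that a citation whose abstract is unclear on a criterion moves on to full-text review are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    cid: str
    # Screener judgments per criterion. For abstracts, None means the
    # abstract does not say enough to decide (hypothetical convention).
    abstract_flags: dict = field(default_factory=dict)
    fulltext_flags: dict = field(default_factory=dict)


def passes_abstract_screen(citation, criteria):
    # Stage 1: drop a citation only when the title/abstract clearly fails a
    # criterion; unclear items go forward (an assumption of this sketch).
    return all(citation.abstract_flags.get(c) is not False for c in criteria)


def passes_fulltext_screen(citation, criteria):
    # Stage 2: the full text must clearly satisfy every criterion.
    return all(citation.fulltext_flags.get(c) is True for c in criteria)


def two_stage_screen(citations, criteria):
    stage1 = [c for c in citations if passes_abstract_screen(c, criteria)]
    included = [c for c in stage1 if passes_fulltext_screen(c, criteria)]
    return stage1, included


criteria = ["population", "design"]  # hypothetical prespecified criteria
citations = [
    Citation("a", {"population": True, "design": True},
             {"population": True, "design": True}),
    Citation("b", {"population": False, "design": True}),
    Citation("c", {"population": None, "design": True},
             {"population": False, "design": True}),
]
stage1, included = two_stage_screen(citations, criteria)
```

In this example, citation "b" is dropped at the abstract stage, while "c" is retrieved in full text and only then excluded, which is the cost of passing unclear abstracts forward rather than discarding them.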

Minimize Subjectivity

Even when the selection criteria are prespecified and explicit, decisions on including particular studies can be subjective. AHRQ, CRD, and the Cochrane Collaboration recommend that more than one individual independently screens and selects studies in order to minimize bias and human error and to help ensure that the selection process is reproducible (Table 3-3) (CRD, 2009; Higgins and Deeks, 2008; Khan, 2001; Relevo and Balshem, 2011). Although doubling the number of screeners is costly, the committee agrees that the additional expense is justified because of the extent of errors and bias that occur when only one individual does the screening. Without two screeners, SRs may miss relevant data that might affect conclusions about the effectiveness of an intervention. Edwards and colleagues (2002), for example, found that using two reviewers may reduce the likelihood that relevant studies are discarded. The researchers increased the number of eligible trials by up to 32 percent (depending on the reviewer).

Experience, screener training, and pilot-testing of screening criteria are key to an accurate search and selection process. The Cochrane Collaboration recommends that screeners be trained by pilot testing the eligibility criteria on a sample of studies and assessing reliability (Higgins and Deeks, 2008), and certain Cochrane groups require that screeners take the Cochrane online training for handsearchers and pass a test on identification of clinical trials before they become involved (Cochrane Collaboration, 2010b).
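One way to assess reliability during a pilot test is to compare two screeners' independent decisions on the same sample of citations. The sketch below computes Cohen's kappa, a chance-corrected agreement statistic; the choice of kappa and the sample decisions are assumptions of this example, since the text above does not prescribe a particular reliability measure.

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two screeners' decisions.

    Both arguments are equal-length lists of labels, one label per citation,
    in the same citation order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(rater_a)
    # Proportion of citations on which the two screeners agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each screener's marginal rates.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )

    if expected == 1.0:
        # Both screeners used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pilot sample of 10 citations ("I" = include, "E" = exclude).
screener_1 = ["I", "I", "I", "E", "E", "E", "E", "E", "E", "E"]
screener_2 = ["I", "I", "E", "I", "E", "E", "E", "E", "E", "E"]
kappa = cohens_kappa(screener_1, screener_2)
```

Here the screeners agree on 8 of 10 citations, but because both exclude most citations, much of that agreement is expected by chance, so kappa is well below 0.8. A low value in a pilot would suggest that the eligibility criteria need clarification before full screening begins.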

Use Observational Studies, as Appropriate

In CER, observational studies should be considered complementary to RCTs (Dreyer and Garner, 2009; Perlin and Kupersmith, 2007). Both can provide useful information for decision makers. Observational studies are critical for evaluating the harms of interventions (Chou and Helfand, 2005). RCTs often lack prespecified hypotheses regarding harms; are not adequately powered to detect serious, but uncommon events (Vandenbroucke, 2004); or exclude patients who are more susceptible to adverse events (Rothwell, 2005). Well-conducted, observational evaluations of harms, particularly those based on large registries of patients seen in actual practice, can help to validate estimates of the severity and frequency of adverse events derived from RCTs, identify subgroups of patients at higher or lower susceptibility, and detect important harms not identified in RCTs (Chou et al., 2010).

The proper role of observational studies in evaluating the benefits of interventions is less clear. RCTs are the gold standard for determining efficacy and effectiveness. For this reason they are the preferred starting place for determining intervention effectiveness. Even if they are available, however, trials may not provide data on outcomes that are important to patients, clinicians, and developers of CPGs. When faced with treatment choices, decision makers want to know who is most likely to benefit from a treatment and what the potential tradeoffs are. Some trials are designed to fulfill regulatory requirements (e.g., for FDA approval) rather than to inform everyday treatment decisions and these studies may address narrow patient populations and intervention options. For example, study populations may not represent the population affected by the condition of interest; patients could be younger or not as ill (Norris et al., 2010). As a result, a trial may leave unanswered certain important questions about the treatment’s effects in different clinical settings and for different types of patients (Nallamothu et al., 2008).

Thus, although RCTs are subject to less bias, when the available RCTs do not examine how an intervention works in everyday practice or evaluate patient-important outcomes, observational studies may provide the evidence needed to address the SR team’s questions. Deciding to extend eligibility of study designs to observational studies represents a fundamental challenge because the suitability of observational studies for assessment of effectiveness depends heavily on a number of clinical and contextual factors. The likelihood of selection bias, recall bias, and other biases is so high in certain clinical situations that no observational study could address the question with an acceptable risk of bias (Norris et al., 2010).

An important note is that in CER, observational studies of benefits are intended to complement, rather than substitute for, RCTs. Most literature about observational studies of effectiveness has examined whether observational studies can be relied on to make judgments about effectiveness when there are no high-quality RCTs on the same research question (Concato et al., 2000; Deeks et al., 2003; Shikata et al., 2006). The committee did not find evidence to support a recommendation about substituting observational data in the absence of data from RCTs. Reasonable criteria for relying on observational studies in the absence of RCT data have been proposed (Glasziou et al., 2007), but little empiric data support these criteria.

The decision to include or exclude observational studies in an SR should be justifiable, explicit, and well documented (Atkins, 2007; Chambers et al., 2009; Chou et al., 2010; CRD, 2009; Goldsmith et al., 2007). Once this decision has been made, authors of SRs of CER should search for observational research, such as cohort and case-control studies, to supplement RCT findings. Less is known about searching for observational studies than for RCTs (Golder and Loke, 2009; Kuper et al., 2006; Wieland and Dickersin, 2005; Wilczynski et al., 2004). The SR team should work closely with a librarian with training and experience in this area and should consider peer review of the search strategy (Sampson et al., 2009).

Documenting the Screening and Selection Process

SRs rarely document the screening and selection process in a way that would allow anyone to either replicate it or appraise the appropriateness of the selected studies (Golder et al., 2008; Moher et al., 2007a). In light of the subjective nature of study selection and the large volume of possible citations, the importance of maintaining a detailed account of study selection cannot be overstated. Yet, years after reporting guidelines have been disseminated and updated, documentation remains inadequate in most published SRs (Liberati et al., 2009).

Clearly, the search, screening, and selection process is complex and highly technical. The effort required in keeping track of citations, search strategies, full-text articles, and study data is daunting. Experts recommend using reference management software, such as EndNote, RefWorks, or RevMan, to document the process and keep track of the decisions that are made for each article (Cochrane IMS, 2010; CRD, 2009; Elamin et al., 2009; Hernandez et al., 2008; Lefebvre et al., 2008; RefWorks, 2009; Relevo and Balshem, 2011; Thomson Reuters, 2010). Documentation should occur in real time—not retrospectively, but as the search, screening, and selection are carried out. This will help ensure accurate recordkeeping and adherence to protocol.

The SR final report should include a flow chart that shows the number of studies that remain after each stage of the selection process.16 Figure 3-1 provides an example of an annotated flow chart. The flow chart documents the number of records identified through electronic databases searched, whether additional records were identified through other sources, and the reasons for excluding articles. Maintaining a record of excluded as well as selected articles is important.
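The tallies that feed such a flow chart can be derived directly from a per-record decision log. The following is a minimal Python sketch under stated assumptions: the `Record` fields, the stage labels ("title_abstract", "full_text"), and the example exclusion reasons are all hypothetical and are not taken from the committee's standards or from any particular reference manager.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One citation and its disposition (all field names are hypothetical)."""
    record_id: str
    source: str                            # "database" or "other"
    is_duplicate: bool = False
    excluded_at: Optional[str] = None      # None, "title_abstract", or "full_text"
    exclusion_reason: Optional[str] = None


def flow_counts(records):
    """Tally how many records remain after each stage of selection."""
    unique = [r for r in records if not r.is_duplicate]
    # Records that survive title/abstract screening go on to full-text review.
    full_text = [r for r in unique if r.excluded_at != "title_abstract"]
    included = [r for r in full_text if r.excluded_at != "full_text"]
    reasons = Counter(
        r.exclusion_reason for r in full_text if r.excluded_at == "full_text"
    )
    return {
        "identified_database": sum(r.source == "database" for r in records),
        "identified_other": sum(r.source == "other" for r in records),
        "after_duplicates_removed": len(unique),
        "full_text_assessed": len(full_text),
        "full_text_excluded_by_reason": dict(reasons),
        "included": len(included),
    }


# Illustrative log entries only; not data from any real review.
records = [
    Record("r1", "database"),
    Record("r2", "database", is_duplicate=True),
    Record("r3", "database", excluded_at="title_abstract",
           exclusion_reason="not comparative"),
    Record("r4", "other", excluded_at="full_text",
           exclusion_reason="wrong population"),
    Record("r5", "other"),
]
counts = flow_counts(records)
```

Because every excluded record keeps its stage and reason, the same log supports both the flow chart and the record of excluded articles.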

FIGURE 3-1. Example of a flow chart. SOURCE: Gillen et al. (2010).


The committee recommends the following standards for screening and selecting studies for an SR:

Standard 3.3—Screen and select studies

Required elements:

3.3.1 Include or exclude studies based on the protocol’s prespecified criteria
3.3.2 Use observational studies in addition to randomized clinical trials to evaluate harms of interventions
3.3.3 Use two or more members of the review team, working independently, to screen and select studies
3.3.4 Train screeners using written documentation; test and retest screeners to improve accuracy and consistency
3.3.5 Use one of two strategies to select studies: (1) read all full-text articles identified in the search or (2) screen titles and abstracts of all articles and then read the full text of articles identified in initial screening
3.3.6 Taking account of the risk of bias, consider using observational studies to address gaps in the evidence from randomized clinical trials on the benefits of interventions

Standard 3.4—Document the search

Required elements:

3.4.1 Provide a line-by-line description of the search strategy, including the date of search for each database, web browser, etc.
3.4.2 Document the disposition of each report identified, including reasons for exclusion if appropriate


The primary purpose of CER is to generate reliable, scientific information to guide the real-world choices of patients, clinicians, developers of clinical practice guidelines, and others. The committee recommends the above standards and performance elements to address the pervasive problems of bias, errors, and inadequate documentation of the study selection process in SRs. While the evidence base for these standards is sparse, these common-sense standards draw from the expert guidance of AHRQ, CRD, and the Cochrane Collaboration. The recommended performance elements will help ensure scientific rigor and promote transparency—key committee criteria for judging possible SR standards.

The potential for bias to enter the selection process is significant and well documented. SR experts recommend a number of techniques and information sources that can help protect against an incomplete and biased collection of evidence. For example, the selection of studies to include in an SR should be prespecified in the research protocol. The research team must balance the imperative for a thorough search with constraints on time and resources. However, using only one screener does not sufficiently protect against a biased selection of studies. Experts agree that using two screeners can reduce error and subjectivity. Although the associated cost may be substantial, and representatives of several SR organizations did tell the committee and IOM staff that dual screening is too costly, the committee concludes that SRs may not be reliable without two screeners. A two-step process will save the time and expense of obtaining full-text articles until after initial screening of citations and abstracts.

Observational studies are important inputs for SRs of comparative effectiveness. The plan for using observational research should be clearly outlined in the protocol along with other selection criteria. Many CER questions cannot be fully answered without observational data on the potential harms, benefits, and long-term effects. In many instances, trial findings are not generalizable to individual patients. Neither experimental nor observational research should be used in an SR without strict methodological scrutiny.

Finally, detailed documentation of methods is essential to scientific inquiry. It is imperative in SRs. Study methods should be reported in sufficient detail so that searches can be replicated and appraised.


Many but not all SRs on the comparative effectiveness of health interventions include a quantitative synthesis (meta-analysis) of the findings of RCTs. Whether or not a quantitative or qualitative synthesis is planned, the assessment of what is known about an intervention’s effectiveness should begin with a clear and systematic description of the included studies (CRD, 2009; Deeks et al., 2008). This requires extracting both qualitative and quantitative data from each study, then summarizing the details on each study’s methods, participants, setting, context, interventions, outcomes, results, publications, and investigators. Data extraction refers to the process that researchers use to collect and transcribe the data from each individual study. Which data are extracted depends on the research question, types of data that are available, and whether meta-analysis is appropriate.17 Box 3-6 lists the types of data that are often collected.

BOX 3-6

Types of Data Extracted from Individual Studies. General Information Researcher performing data extraction

The first part of this chapter focused on key methodological judgments regarding the search for and selection of all relevant high-quality evidence pertinent to a research question. Data collection is just as integral to ensuring an accurate and fair accounting of what is known about the effectiveness of a health care intervention. Quality assurance and control are especially important because of the substantial potential for errors in data handling (Gøtzsche et al., 2007). The following section focuses on how standards can help minimize common mistakes during data extraction and concludes with the committee’s recommended standard and performance elements for managing data collection.

Preventing Errors

Data extraction errors are common and have been documented in numerous studies (Buscemi et al., 2006; Gøtzsche et al., 2007; Horton et al., 2010; Jones et al., 2005; Tramer et al., 1997). Gøtzsche and colleagues, for example, examined 27 meta-analyses published in 2004 on a variety of topics, including the effectiveness of acetaminophen for pain in patients with osteoarthritis, antidepressants for mood in trials with active placebos, physical and chemical methods to reduce asthma symptoms from house dust-mite allergens, and inhaled corticosteroids for asthma symptoms (Gøtzsche et al., 2007). The study focused on identifying the extent of errors in the meta-analyses that used a specific statistical technique (standardized mean difference). The researchers randomly selected two trials from each meta-analysis and extracted outcome data from each related trial report. They found numerous errors and were unable to replicate the results of more than a third of the 27 meta-analyses (37 percent). The studies had used the incorrect number of patients in calculations, incorrectly calculated means and standard deviations, and even got the direction of treatment effect wrong. The impact of the mistakes was not trivial; in some cases, correcting errors negated findings of effectiveness and, in other cases, actually reversed the direction of the measured effect.

In another study, Jones and colleagues (2005) found numerous errors in 42 reviews conducted by the Cochrane Cystic Fibrosis and Genetic Disorders Group. The researchers documented data extraction errors in 20 reviews (48 percent), errors in interpretation in 7 reviews (17 percent), and reporting errors in 18 reviews (43 percent). All the data-handling errors changed the summary results but, in contrast with the Gøtzsche study, the errors did not affect the overall conclusions.

Using Two Data Extractors

Data extraction is an understudied process. Little is known about how best to optimize accuracy and efficiency. One study found that SR experience appears to have little impact on error rates (Horton et al., 2010). In 2006, Horton and colleagues conducted a prospective cross-sectional study to assess whether experience improves accuracy. The researchers assigned data extractors to three different groups based on SR and data extraction experience. The most experienced group had more than 7 years of related experience. The least experienced group had less than 2 years of experience. Surprisingly, error rates were high regardless of experience, ranging from 28.3 percent to 31.2 percent.

The only known effective means of reducing data extraction errors is to have at least two individuals independently extract data (Buscemi et al., 2006). In a pilot study sponsored by AHRQ, Buscemi and colleagues compared the rate of errors that occurred when only one versus two individuals extracted the data from 30 RCTs on the efficacy and safety of melatonin for the management of sleep disorders (Buscemi et al., 2006). When only one reviewer extracted the data, a second reviewer checked the extracted data for accuracy and completeness. The two reviewers resolved discrepancies by mutual consensus. With two reviewers, each individual independently extracted the data, then resolved discrepancies through discussion or in consultation with a third party. Single extraction was faster, but resulted in 21.7 percent more mistakes.
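The comparison step in independent double extraction can be illustrated with a short Python sketch. It assumes each reviewer's extraction is stored as a dictionary keyed by a study label; the function name `find_discrepancies`, the field names, and the example values are hypothetical and do not come from the Buscemi study. The sketch only flags mismatches. Resolving them through discussion or a third party remains a human task.

```python
def find_discrepancies(extraction_a, extraction_b):
    """Return {study_id: {field: (value_a, value_b)}} for every mismatch.

    A field or study missing from one extraction is reported with None
    on that reviewer's side.
    """
    discrepancies = {}
    for study_id in sorted(set(extraction_a) | set(extraction_b)):
        a = extraction_a.get(study_id, {})
        b = extraction_b.get(study_id, {})
        diffs = {
            field: (a.get(field), b.get(field))
            for field in sorted(set(a) | set(b))
            if a.get(field) != b.get(field)
        }
        if diffs:
            discrepancies[study_id] = diffs
    return discrepancies


# Illustrative values only; not data from any real trial.
reviewer_1 = {
    "trial_A": {"n_treatment": 50, "n_control": 48, "mean_diff": -1.2},
    "trial_B": {"n_treatment": 30, "n_control": 30, "mean_diff": 0.4},
}
reviewer_2 = {
    "trial_A": {"n_treatment": 50, "n_control": 48, "mean_diff": 1.2},  # sign differs
    "trial_B": {"n_treatment": 30, "n_control": 30, "mean_diff": 0.4},
}
to_resolve = find_discrepancies(reviewer_1, reviewer_2)
```

In this example the only flagged item is the sign of the effect in `trial_A`, the kind of direction-of-effect error described earlier in this section.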

Experts recommend that two data extractors should be used whenever possible (CRD, 2009; Higgins and Deeks, 2008; Van de Voorde and Leonard, 2007). The Cochrane Collaboration advises that more than one person extract data from every study (Higgins and Deeks, 2008). CRD concurs but also suggests that, at a minimum, one individual could extract the data if a second individual independently checks for accuracy and completeness (CRD, 2009).

Addressing Duplicate Publication

Duplicate publication is another form of reporting bias with the potential to distort the findings of an SR. The ICMJE defines redundant (or duplicate) publication as publication of a paper that overlaps substantially with one already published in print or electronic media (ICMJE, 2010). When this occurs, perceptions of the safety and effectiveness of a treatment may be incorrect because it appears that the intervention was tested in more patients than in reality (Tramer et al., 1997). If meta-analyses double count data, the findings obviously will be incorrect.

There have been reports of redundant publication of effectiveness research since at least the 1980s (Arrivé et al., 2008; Bailey, 2002; Bankier et al., 2008; DeAngelis, 2004; Gøtzsche, 1989; Huston and Moher, 1996; Huth, 1986; Mojon-Azzi et al., 2004; Rosenthal et al., 2003; Schein and Paladugu, 2001). Tramer and colleagues, for example, searched for published findings of trials on the effectiveness of the antinausea drug ondansetron to determine the extent of redundant publications (Tramer et al., 1997). The researchers found that the most commonly duplicated RCT reports were those papers that showed the greatest benefit from ondansetron. Twenty-eight percent of patient data were duplicated. As a result, the drug’s effectiveness as an antiemetic was overestimated by 23 percent. Gøtzsche and colleagues reached similar conclusions in a study of controlled trials on the use of NSAIDs for rheumatoid arthritis (Gøtzsche, 1989).

Linking publications from the same study. Detecting multiple publications of the same data is difficult, particularly when the data are published in different places or at different times without proper attribution to previous or simultaneous publications (Song et al., 2010). The Cochrane Collaboration recommends electronically linking citations from the same studies so that they are not treated as separate studies and so that data from each study are included only once in the SR analyses.
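A minimal Python sketch of this linking idea follows. It assumes the review team has already assigned a shared `study_id` to every report of the same study (for example, from a trial registry number). That identifier, the field names, and the choice to take the largest reported sample size per study are illustrative assumptions, not part of the Cochrane guidance.

```python
from collections import defaultdict


def link_reports(reports):
    """Group citations under the study they report on."""
    studies = defaultdict(list)
    for report in reports:
        studies[report["study_id"]].append(report["citation"])
    return dict(studies)


def patients_counted_once(reports):
    """Sum participants per study rather than per publication.

    Assumption: the largest n reported for a study is its full sample.
    """
    per_study = {}
    for report in reports:
        sid = report["study_id"]
        per_study[sid] = max(per_study.get(sid, 0), report["n_randomized"])
    return sum(per_study.values())


# Illustrative records only; citations and sample sizes are made up.
reports = [
    {"study_id": "S1", "citation": "Report 1a", "n_randomized": 100},
    {"study_id": "S1", "citation": "Report 1b", "n_randomized": 100},
    {"study_id": "S2", "citation": "Report 2", "n_randomized": 60},
]
linked = link_reports(reports)
naive_total = sum(r["n_randomized"] for r in reports)   # double counts S1
linked_total = patients_counted_once(reports)
```

Here the naive total counts the 100 participants in study S1 twice (260 instead of 160), which is how an intervention can appear to have been tested in more patients than it actually was.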

Data Extraction Forms

Data extraction forms are common-sense tools for collecting and documenting the data that will be used in the SR analysis. Numerous formats have been developed, but there is no evidence to support any particular form. Elamin and colleagues (2009) surveyed expert systematic reviewers to describe their experiences with various data extraction tools including paper and pencil formats, spreadsheets, web-based surveys, electronic databases, and special web-based software. The respondents did not appear to favor one type of form over another, and the researchers concluded that no one tool is appropriate for all SRs. AHRQ, CRD, and the Cochrane Collaboration all recommend that the form be pilot-tested to help ensure that the appropriate data are collected (Table 3-3).


The committee recommends the following standard to promote accurate and reliable data extraction:

Standard 3.5—Manage data collection

Required elements:

3.5.1 At a minimum, use two or more researchers, working independently, to extract quantitative and other critical data from each study. For other types of data, one individual could extract the data while the second individual independently checks for accuracy and completeness. Establish a fair procedure for resolving discrepancies; do not simply give final decision-making power to the senior reviewer
3.5.2 Link publications from the same study to avoid including data from the same study more than once
3.5.3 Use standard data extraction forms developed for the specific systematic review
3.5.4 Pilot-test the data extraction forms and process


Quality assurance (e.g., double data extraction) and quality control (e.g., asking a third person to check the primary outcome data entered into the data system) are essential when data are extracted from individual studies from the collected body of evidence. Neither peer reviewers of the SR draft report nor journal editors can detect these kinds of errors. The committee recommends the above performance elements to maximize the scientific rigor of the SR. Consumers, patients, clinicians, and clinical practice guideline developers should not have to question the credibility or accuracy of SRs on the effectiveness of healthcare interventions. Using two researchers to extract data may be costly, but currently, there is no alternative way to ensure that the correct data are used in the synthesis of the collected body of evidence. The committee also recommends that the review team use a standard data extraction form to help minimize data entry errors. The particular circumstances of the SR, such as the complexity or unique data needs of the project, should guide the selection of the form.


If an SR is to be based on the best available evidence on the comparative effectiveness of interventions, it should include a systematic, critical assessment of the individual eligible studies. The SR should assess the strengths and limitations of the evidence so that decision makers can judge whether the data and results of the included studies are valid. Yet, an extensive literature documents that SRs across a wide range of clinical specialties often either fail to appraise or fail to report the appraisal of the individual studies included in the review (Delaney et al., 2007; Dixon et al., 2005; Lundh et al., 2009; Moher et al., 2007a; Moja et al., 2005; Mrkobrada et al., 2008; Roundtree et al., 2008). This includes SRs in general surgery (Dixon et al., 2005), critical care (Delaney et al., 2007), nephrology (Mrkobrada et al., 2008), pediatric oncology (Lundh et al., 2009), and rheumatology (Roundtree et al., 2008).

Methodological studies have demonstrated that problems in the design, conduct, and analysis of clinical studies lead to biased findings. Table 3-4 describes types of bias and some of the measures clinical researchers use to avoid them. The systematic reviewer examines whether the study incorporates these measures to protect against these biases and whether or not the measures were effective. For example, in considering selection bias, the reviewer would note whether the study uses random assignment of participants to treatments and concealment of allocation,18 because studies that employ these measures are less susceptible to selection bias than those that do not. The reviewer would also note whether there were baseline differences in the assembled groups, because the presence of such differences may indicate that potential flaws in the study design indeed resulted in observable bias.

TABLE 3-4. Types of Bias in Individual Studies.


This section of the chapter describes the concepts and related issues that are fundamental to assessing the individual studies in an SR. The committee’s related standards are presented at the end of the section.

Key Concepts

Internal Validity

An internally valid study is conducted in a manner that minimizes bias so that the results are likely due to a real effect of the intervention being tested. By examining features of each study’s design and conduct, systematic reviewers arrive at a judgment about the level of confidence one may place in each study, that is, the extent to which the study results can be believed. Assessing internal validity is concerned primarily (but not exclusively) with an examination of the risk of bias. When there are no or few flaws in the design, conduct, and reporting of a study, the results are more likely to be a true indicator of the effects of the compared treatments. When serious flaws are present, the results of a study are likely to be due to biases, rather than to real differences in the treatments that are compared.


The need to consider features of a study that might affect its relevance to decision makers is a key principle of CER. SRs use the terms “applicability,” “relevance,” “directness,” or “external validity” to capture this idea (Rothwell, 1995, 2005). In the context of SRs of CER, “applicability” has been defined as “the extent to which the effects observed in published studies are likely to reflect the expected results when a specific intervention is applied to the population of interest under ‘real-world’ conditions” (Atkins et al., 2010).

Because applicability is not an inherent characteristic of a study, it is not possible to devise a uniform system for assessing applicability of individual studies (Jüni et al., 2001). However, an SR can describe study characteristics that are likely to affect applicability. In the initial steps in the SR process, by consulting users and stakeholders, the review team should seek to understand the situations to which the findings of the review will be applied (see Chapter 2, Standards 2.3–2.5). The review team should then decide whether to incorporate relevance into the design of the inclusion criteria and into the protocol for extracting data from included studies.

For a particular review, the review team should develop a priori hypotheses about characteristics that are likely to be important and plan to include them when extracting data from studies (Green and Higgins, 2008). Across clinical topics, some study characteristics are likely to affect users’ perceptions of an individual study’s applicability in practice (Rothwell, 2006). These characteristics can be classified using the PICO(TS)19 framework and should be considered candidates for abstraction in most SRs of effectiveness (Table 3-5). Among RCTs of drug treatments, for example, some characteristics affecting the patients include whether eligibility criteria were narrow or broad, whether there was a run-in period in which some participants were excluded prior to randomization, and what the rates of outcomes were in the control or placebo group.

TABLE 3-5. Characteristics of Individual Studies That May Affect Applicability.


Fidelity and Quality of Interventions

Users of SRs often need detailed information about interventions and comparators to judge the relevance and validity of the results. Fidelity and quality refer to two dimensions of carrying out an intervention that should be documented to allow meaningful comparisons between studies.

The fidelity of an intervention refers to the extent to which the intervention has been delivered as planned (CRD, 2009). In the context of an SR, an assessment of fidelity requires a priori identification of these key features and abstraction of how they were implemented in each study. Frameworks to assess fidelity in individual studies exist, although there has been little experience of their use in SRs (Carroll et al., 2007; Glasgow, 2006; Glasgow et al., 1999).

Fidelity is particularly important for complex interventions. A complex intervention is usually defined as one that has multiple components. For example, a program intended to help people lose weight might include counseling about diet and exercise, access to peers, education, community events, and other components (Craig et al., 2008). Many behavioral interventions, as well as interventions in the organization of care, are complex. Individual studies may differ widely in how they implement these components. For example, among specialized clinic programs to reduce complications from anticoagulant therapy, decisions about dosing might be made by pharmacists, nurses, physicians, or a computerized algorithm.

Assessing the quality of the intervention is particularly important in reviews of interventions that require technical skill, such as surgical procedures or physical therapy, and in reviews of evolving technologies, such as new devices. The effectiveness and safety of such interventions may vary, depending on the skill of the practitioners, and may change rapidly as practitioners gain experience with them or as modifications are made to correct problems encountered in development.

Variation in the implementation of key elements or features of a complex intervention can influence their effectiveness. The features of a complex intervention may reflect how it is modified to accommodate different practice settings and patients’ circumstances (Cohen et al., 2008). In these circumstances it can be difficult to distinguish between an ineffective intervention and a failed implementation.

Risk of Bias in Individual Studies

The committee chose the term “risk of bias” to describe the focus of the assessment of individual studies and the term “quality” to describe the focus of the assessment of a body of evidence (the subject of Chapter 4). The risk of bias terminology has been used and evaluated for assessing individual RCTs for more than two decades. A similar tool for observational studies has yet to be developed and validated.

As alternatives to “risk of bias,” many systematic reviewers and organizations that develop practice guidelines use terms such as “study quality,” “methodological quality,” “study limitations,” or “internal validity” to describe the critical appraisal of individual studies. Indeed, reviewers may assign a quality score to a study based on criteria assumed to relate to a study’s internal and sometimes external validity. “Study quality” is a broader concept than risk of bias, however, and might include choice of outcome measures, statistical tests, intervention (i.e., dosing, frequency, and intensity of treatments), and reporting. The term “quality” also encompasses errors attributable to chance (e.g., because of inadequate sample size) or erroneous inference (e.g., incorrect interpretation of the study results) (Lohr and Carey, 1999).

Analysis at the level of a group or body of studies can often verify and quantify the direction and magnitude of bias caused by methodological problems.20 For an individual study, however, one cannot be certain how specific flaws have influenced the estimate of effect; that is, one cannot be certain about the presence, magnitude, and direction of the bias. For this reason, for individual studies, systematic reviewers assess the risk of bias rather than assert that a particular bias is present. A study with a high risk of bias is not credible and may overestimate or underestimate the true effect of the treatment under study. This judgment is based on methodologic research examining the relationship among study characteristics, such as the appropriate use of randomization, allocation concealment, or masking, in relation to estimation of the “true” effect. When an SR has a sufficient number of studies, the authors should attempt to verify and quantify the direction and magnitude of bias caused by methodological problems directly using meta-analysis methods.

In recent years, systematic review teams have moved away from scoring systems to assess the quality of individual studies toward a focus on the components of quality and risk of bias (Jüni, 1999). Quality scoring systems have not been validated. Studies assessed as excellent quality using one scoring method may be subsequently assessed as lower quality using another scoring method (Moher et al., 1996). Moreover, with an emphasis on risk of bias, the SR more appropriately assesses the quality of study design and conduct rather than the quality of reporting.

Risk of Bias in Randomized Controlled Trials

As a general rule, randomized trials have more protections against bias than observational studies and are less likely to produce biased or misleading results. Even among randomized trials, however, study design features influence the observed results. In the 1980s, for example, Chalmers and colleagues reviewed 145 RCTs of treatments for acute myocardial infarction to assess how blinding treatment assignment affected the results (Chalmers et al., 1981, 1983). Trials that allowed participants to know what treatment they were assigned had greater treatment effects than studies that masked treatment assignment. The effect of masking was dramatic: Statistically significant differences in case-fatality rates were reported in 24.4 percent of the trials that did not blind participants versus 8.8 percent of the RCTs that masked treatment assignment.

Methodological research conducted in the past 15 years has sought to identify additional features of controlled trials that make them more or less susceptible to bias. This research on the empiric evidence of bias forms the basis of current recommendations for assessing the risk of bias in SRs of RCTs. Much of this research takes the form of meta-epidemiological studies that examine the association of individual study characteristics and estimates of the magnitude of effect among trials included in a set of meta-analyses. In a review published in 1999, Moher and colleagues found strong, consistent empiric evidence of bias for three study design features: allocation concealment, double blinding, and type of randomized trial (Moher et al., 1999). In two separate reviews, allocation concealment and double blinding were shown to be associated with study findings. Pildal and colleagues showed that trials that are inadequately concealed and not double blinded are more likely to show a statistically significant treatment effect (Pildal et al., 2008). Yet Wood and colleagues showed that this effect may be confined to subjective, as opposed to objective, outcome measures and outcomes other than all-cause mortality (Wood et al., 2008).

Since 1999, other trial features, such as stopping early (Montori et al., 2005), handling of missing outcome data (Wood et al., 2004), trial size (Nüesch et al., 2010), and use of intention-to-treat analysis have been evaluated empirically. A study conducted by the Cochrane Back Pain Review Group found empiric evidence of bias for 11 study design features (van Tulder et al., 2009) (Box 3-7).

BOX 3-7

Cochrane Back Pain Group Criteria for Internal Validity of Randomized Trials of Back Pain. Was the method of randomization adequate? Was the treatment allocation concealed?

A recent reanalysis confirmed this finding in Moher and colleagues’ (1998) original dataset (effect sizes were smaller for trials that met the criterion for 10 of the 11 items) and in back pain trials (11 of 11 items), but not in trials included in a sample of EPC reports (Hempell et al., 2011). The influence of certain factors, such as allocation concealment, appears to vary depending on the clinical area (Balk et al., 2002) and the type of outcome measured (Wood et al., 2008).

The implication is that systematic review teams should always assess the details of each study’s design to determine how potential biases associated with the study design may have influenced the observed results, because ignoring the possibility could be hazardous (Light and Pillemer, 1984).

Risk of Bias in Observational Studies

In the 1970s and 1980s, several thorough scientific reviews of medical or educational interventions established that the positive results of uncontrolled or poorly controlled studies did not always hold up in well-controlled studies. The discrepancy was most dramatic when randomized trials were compared with observational studies of the same intervention (Chalmers, 1982; DerSimonian and Laird, 1986; Glass and Smith, 1979; Hoaglin et al., 1982; Miller et al., 1989; Wortman and Yeaton, 1983).

The likelihood and magnitude of bias are often greater in observational studies because they lack randomization and concealment of allocation. Even when feasible, many observational studies fail to use appropriate steps to address the risk of bias, such as publication of a detailed protocol and blinding of outcome assessors. For example, observational studies commonly report the outcomes of patients who choose treatments based on their own preferences and the advice of their provider. However, factors that influence treatment choices can also influence outcomes (e.g., sicker patients may tend to choose more extreme interventions); thus, such studies often fail to meet the goal of initially comparable groups. This type of bias, called selection bias, produces imbalances in factors associated with prognosis and the outcomes of interest. Although a variety of statistical methods can be used to attempt to reduce the impact of selection bias, there is no way that analysis can be used to correct for unknown factors that may be associated with prognosis. Thus, it is generally acknowledged that “adjustment” in the analysis cannot be viewed as a substitute for a study design that minimizes this bias.

While selection bias is a widely recognized concern, observational studies are also particularly subject to detection bias, performance bias, and information biases.

Tools for Assessing Study Design

Tools for assessing study design have been used for over two decades (Atkins et al., 2001; Coles, 2008; Cook et al., 1993; Frazier et al., 1987; Gartlehner et al., 2004; Lohr, 1998; Mulrow and Oxman, 1994). Although a large number of instruments or tools can be used to assess the quality of individual studies, they are all based on the principle that, whenever possible, clinical researchers conducting a comparative clinical study should use several strategies to avoid error and bias.

Instruments vary in clinical and methodological scope. For example, the Cochrane risk of bias tool (Box 3-8) pertains to randomized trials, whereas the U.S. Preventive Services Task Force (USPSTF) tool includes observational studies as well as randomized trials. Some instruments, such as the one in Box 3-7, are designed to be used in a specific clinical area. This instrument was validated in a set of trials related to back pain treatments (van Tulder et al., 2009).

BOX 3-8

Cochrane Risk of Bias Tool Domains

  • Sequence generation
  • Allocation concealment

Instruments also differ in whether they are domain based or goal based. The Cochrane Risk of Bias Tool is an example of a domain-based instrument in which the author assesses the risk of bias in each of five domains. Using detailed criteria for making each judgment, the author must answer a specific question for each domain with “Yes” (low risk of bias) or “No” (high risk of bias). Then, the author must make judgments about which domains are most important in the particular circumstances of the study, taking into account the likely direction and magnitude of the bias and empirical evidence that it is influential in similar studies. For example, in a study of mortality rates for severely ill patients taking different types of medications for heart disease, the investigators might decide that differential loss to follow-up among treatment groups is critical, but that lack of blinding of outcome assessors is not likely to be an important cause of bias (Wood et al., 2008).

Like other tools, the Cochrane tool includes an “other” category to take account of biases that arise from aspects of study design, conduct, and reporting in specific circumstances. Examples include carry-over effects in cross-over trials, recruitment bias in cluster-randomized trials, and biases introduced by trials stopped early for benefit (Bassler et al., 2010).
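A domain-based assessment of this kind can be recorded in a simple data structure. The following is a minimal sketch, not the Cochrane tool itself: the class names, the example domain labels, and the rule of flagging a study when any reviewer-designated key domain is judged "No" are all assumptions made for illustration. The set of key domains is supplied by the reviewers, reflecting the study-specific judgment described above.

```python
from dataclasses import dataclass
from enum import Enum


class Judgment(Enum):
    YES = "low risk of bias"
    NO = "high risk of bias"


@dataclass(frozen=True)
class DomainAssessment:
    domain: str          # label chosen by the review team (illustrative here)
    judgment: Judgment
    support: str         # quote or reasoning that justifies the judgment


@dataclass(frozen=True)
class StudyAssessment:
    study_id: str
    domains: tuple
    key_domains: frozenset   # domains the reviewers consider most important

    def high_risk_key_domains(self):
        """Key domains judged 'No' (high risk of bias), in recorded order."""
        return [
            d.domain for d in self.domains
            if d.domain in self.key_domains and d.judgment is Judgment.NO
        ]


# Hypothetical assessment mirroring the mortality example in the text:
# loss to follow-up is treated as critical, assessor blinding is not.
EXAMPLE = StudyAssessment(
    study_id="hypothetical-trial-01",
    domains=(
        DomainAssessment("sequence generation", Judgment.YES,
                         "computer-generated list"),
        DomainAssessment("allocation concealment", Judgment.YES,
                         "central allocation"),
        DomainAssessment("blinding of outcome assessors", Judgment.NO,
                         "open label; outcome is all-cause mortality"),
        DomainAssessment("loss to follow-up", Judgment.NO,
                         "dropout differed between treatment groups"),
    ),
    key_domains=frozenset(
        {"sequence generation", "allocation concealment", "loss to follow-up"}
    ),
)
```

Calling `EXAMPLE.high_risk_key_domains()` returns only the loss-to-follow-up domain, because unblinded assessment was recorded but not designated as key for a mortality outcome. Keeping the `support` text alongside each judgment preserves the documentation that makes the assessment transparent.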

Other instruments are goal based (criteria based). For example, in the USPSTF criteria (Box 3-9), the criterion “initial assembly of groups” refers to the Table 3-4 goal: “At inception, groups being compared [should be] similar in all respects other than the treatment they get.” This criterion is related to the first two domains in the Cochrane Risk of Bias tool (sequence generation and allocation concealment). However, instead of rating the study on these two domains, the review author using the USPSTF tool must integrate information about the method of allocating subjects (sequence generation and allocation concealment) with baseline information about the groups, and consider the magnitude and direction of bias, if any, in order to make a judgment about whether the goal of similar groups at inception of the study was met.

BOX 3-9

USPSTF Criteria for Grading the Internal Validity of Individual Studies (Randomized Controlled Trials [RCTs] and Cohort Studies). Initial assembly of comparable groups. For RCTs: Adequate randomization, including concealment and whether potential confounders ...

Although the existence and consequences of these biases are widely acknowledged, tools to assess the risk of bias in observational studies of comparative effectiveness are poorly developed (Deeks et al., 2003). There is no agreed-on set of critical elements for such a tool, and few data exist on how well these tools perform when used in the context of an SR (Sanderson et al., 2007). The lack of validated tools is a major limitation for judging how much confidence to put in the results of observational studies, particularly for beneficial effects.


The committee recommends the following standard and elements of performance for assessing individual studies.

Standard 3.6—Critically appraise each study

Required elements:

3.6.1 Systematically assess the risk of bias, using predefined criteria
3.6.2 Assess the relevance of the study’s populations, interventions, and outcome measures
3.6.3 Assess the fidelity of the implementation of interventions


SRs of CER should place a high value on highly applicable, highly reliable evidence about effectiveness (Helfand and Balshem, 2010). The standards draw from the expert guidance of AHRQ, CRD, and the Cochrane Collaboration. The recommended performance elements will help ensure scientific rigor and promote transparency, both key committee criteria for judging possible SR standards.

Many types of studies can be used to assess the effects of interventions. The first step in assessing the validity of a particular study is to consider whether its design is appropriate to the question(s) addressed in the review. Both components of “validity” (applicability and risk of bias) should be examined. For questions about effectiveness, when there are gaps in the evidence from RCTs, reviewers should consider whether observational studies could provide useful information. In many circumstances, however, observational study designs will not be suitable, either because the risk of bias is very high or because no observational studies are available that address the populations, comparisons, and outcomes not adequately covered by RCTs.

A well-designed, well-conducted RCT is the most reliable method to compare the effects of different interventions. Validated instruments to assess the risk of bias in RCTs are available. The committee does not recommend a specific tool or set of criteria for assessing risk of bias. Nevertheless, it is essential that at the outset of the SR—during the development of the research protocol—the review team choose and document its planned approach to critically appraising individual studies.21 The appraisal should then follow the prespecified approach. Any deviation from the planned approach should be clearly explained and documented in the final report.
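One way to make this requirement operational is to store the planned appraisal approach in the protocol and compare it with what was actually done. The sketch below is only an illustration under stated assumptions: the `AppraisalPlan` fields, the function names, and the example values are hypothetical and are not prescribed by the committee.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AppraisalPlan:
    """Appraisal approach as written in the protocol or as actually applied."""
    tool: str               # name of the chosen instrument (hypothetical values)
    criteria: tuple         # predefined criteria or domains to be assessed
    study_designs: tuple    # study designs the approach is applied to


def find_deviations(planned, actual):
    """Map each field that differs to its (planned, actual) pair."""
    return {
        f.name: (getattr(planned, f.name), getattr(actual, f.name))
        for f in fields(planned)
        if getattr(planned, f.name) != getattr(actual, f.name)
    }


def unexplained_deviations(deviations, explanations):
    """Names of deviations that have no non-empty explanation recorded."""
    return sorted(
        name for name in deviations
        if not explanations.get(name, "").strip()
    )


# Hypothetical example: the team added cohort studies after the protocol
# was finalized and must explain that change in the final report.
PLANNED = AppraisalPlan(
    tool="instrument A",
    criteria=("allocation", "blinding", "follow-up"),
    study_designs=("RCT",),
)
ACTUAL = AppraisalPlan(
    tool="instrument A",
    criteria=("allocation", "blinding", "follow-up"),
    study_designs=("RCT", "cohort"),
)
```

Here `find_deviations(PLANNED, ACTUAL)` reports the change in `study_designs`, and `unexplained_deviations` lists it until an explanation is recorded, which mirrors the expectation that any departure from the prespecified approach is explained and documented in the final report.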


  • AHRQ EHC (Agency for Healthcare Research and Quality, Effective Health Care Program). 2009. Therapies for children with autism spectrum disorders: Research protocol document. http://effectivehealthcare.cfm/search-for-guides-reviews-and-reports/?pageaction=displayproduct&productid=366#953 (accessed June 25, 2010).
  • Alderson, P., and I. Roberts. 2005. Corticosteroids for acute traumatic brain injury. Cochrane Database of Systematic Reviews 2005(1):CD000196. [PMC free article: PMC7043302] [PubMed: 15674869]
  • Anderson, G. L., M. Limacher, A. R. Assaf, T. Bassford, S. A. A. Beresford, H. Black, D. Bonds, R. Brunner, R. Brzyski, B. Caan, R. Chlebowski, D. Curb, M. Gass, J. Hays, G. Heiss, S. Hendrix, B. V. Howard, J. Hsia, A. Hubbell, R. Jackson, K. C. Johnson, H. Judd, J. M. Kotchen, L. Kuller, A. Z. LaCroix, D. Lane, R. D. Langer, N. Lasser, C. E. Lewis, J. Manson, K. Margolis, J. Ockene, M. J. O’Sullivan, L. Phillips, R. L. Prentice, C. Ritenbaugh, J. Robbins, J. E. Rossouw, G. Sarto, M. L. Stefanick, L. Van Horn, J. Wactawski-Wende, R. Wallace, S. Wassertheil-Smoller, and the Women’s Health Initiative Steering Committee. 2004. Effects of conjugated, equine estrogen in postmenopausal women with hysterectomy: The Women’s Health Initiative randomized controlled trial. JAMA 291(14):1701–1712. [PubMed: 15082697]
  • APA (American Psychological Association). 2010. PsycINFO. http://www/databases/psycinfo/index.aspx (accessed June 1, 2010).
  • Arrivé, L., M. Lewin, P. Dono, L. Monnier-Cholley, C. Hoeffel, and J. M. Tubiana. 2008. Redundant publication in the journal Radiology. Radiology 247(3):836–840. [PubMed: 18403625]
  • Atkins, D. 2007. Creating and synthesizing evidence with decision makers in mind: Integrating evidence from clinical trials and other study designs. Medical Care 45(10 Suppl 2):S16–S22. [PubMed: 17909376]
  • Atkins, D., R. Harris, C. D. Mulrow, H. Nelson, M. Pignone, S. Saha, and H. C. Sox. 2001. Workshops: New recommendations and reviews from the U.S. Preventive Services Task Force. Journal of General Internal Medicine 16(Suppl 1):11–16.
  • Atkins, D., S. Chang, G. Gartlehner, D. I. Buckley, E. P. Whitlock, E. Berliner, and D. Matchar. 2010. Assessing the applicability of studies when comparing medical interventions. In Methods guide for comparative effectiveness reviews, edited by Agency for Healthcare Research and Quality. http://www​.effectivehealthcare​​.cfm/search-for-guides-reviews-and-reports​/?productid=603&pageaction=displayproduct (accessed January 19, 2011).
  • Bailey, B. J. 2002. Duplicate publication in the field of otolaryngology–head and neck surgery. Otolaryngology–Head and Neck Surgery 126(3):211–216. [PubMed: 11956527]
  • Bakkalbasi, N., K. Bauer, J. Glover, and L. Wang. 2006. Three options for citation tracking: Google Scholar, Scopus and Web of Science. Biomedical Digital Libraries 3(1):7. [PMC free article: PMC1533854] [PubMed: 16805916]
  • Balk, E. M., P. A. L. Bonis, H. Moskowitz, C. H. Schmid, J. P. A. Ioannidis, C. Wang, and J. Lau. 2002. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 287(22):2973–2982. [PubMed: 12052127]
  • Bankier, A. A., D. Levine, R. G. Sheiman, M. H. Lev, and H. Y. Kressel. 2008. Redundant publications in Radiology: Shades of gray in a seemingly black-and-white issue. Radiology 247(3):605–607. [PubMed: 18487529]
  • Barrowman, N. J., M. Fang, M. Sampson, and D. Moher. 2003. Identifying null meta-analyses that are ripe for updating. BMC Medical Research Methodology 3(1):13. [PMC free article: PMC212708] [PubMed: 12877755]
  • Bassler, D., M. Briel, V. M. Montori, M. Lane, P. Glasziou, Q. Zhou, D. Heels-Ansdell, S. D. Walter, G. H. Guyatt, Stopit-Study Group, D. N. Flynn, M. B. Elamin, M. H. Murad, N. O. Abu Elnour, J. F. Lampropulos, A. Sood, R. J. Mullan, P. J. Erwin, C. R. Bankhead, R. Perera, C. Ruiz Culebro, J. J. You, S. M. Mulla, J. Kaur, K. A. Nerenberg, H. Schunemann, D. J. Cook, K. Lutz, C. M. Ribic, N. Vale, G. Malaga, E. A. Akl, I. Ferreira-Gonzalez, P. Alonso-Coello, G. Urrutia, R. Kunz, H. C. Bucher, A. J. Nordmann, H. Raatz, S. A. da Silva, F. Tuche, B. Strahm, B. Djulbegovic, N. K. Adhikari, E. J. Mills, F. Gwadry-Sridhar, H. Kirpalani, H. P. Soares, P. J. Karanicolas, K. E. Burns, P. O. Vandvik, F. Coto-Yglesias, P. P. Chrispim, and T. Ramsay. 2010. Stopping randomized trials early for benefit and estimation of treatment effects: Systematic review and meta-regression analysis. JAMA 303(12):1180–1187. [PubMed: 20332404]
  • Bennett, D. A., and A. Jull. 2003. FDA: Untapped source of unpublished trials. Lancet 361(9367):1402–1403. [PubMed: 12727389]
  • Betran, A., L. Say, M. Gulmezoglu, T. Allen, and L. Hampson. 2005. Effectiveness of different databases in identifying studies for systematic reviews: Experience from the WHO systematic review of maternal morbidity and mortality. BMC Medical Research Methodology 5(1):6. [PMC free article: PMC548692] [PubMed: 15679886]
  • BIREME (Latin American and Caribbean Center on Health Sciences). 2010. LILACS database. http://bvsmodelo.bvsalud.org/site/lilacs/I/ililacs.htm (accessed June 7, 2010).
  • Boissel, J. P. 1993. International Collaborative Group on Clinical Trial Registries: Position paper and consensus recommendations on clinical trial registries. Clinical Trials and Meta-analysis 28(4–5):255–266. [PubMed: 10146333]
  • Booth, A. 2006. “Brimful of STARLITE”: Toward standards for reporting literature searches. Journal of the Medical Library Association 94(4):421–429. [PMC free article: PMC1629442] [PubMed: 17082834]
  • Bravata, D. M., K. McDonald, A. Gienger, V. Sundaram, D. K. Owens, and M. A. Hlatky. 2007. Comparative effectiveness of percutaneous coronary interventions and coronary artery bypass grafting for coronary artery disease. Journal of General Internal Medicine 22(Suppl 1):47. [PubMed: 20704052]
  • Buscemi, N., L. Hartling, B. Vandermeer, L. Tjosvold, and T. P. Klassen. 2006. Single data extraction generated more errors than double data extraction in systematic reviews. Journal of Clinical Epidemiology 59(7):697–703. [PubMed: 16765272]
  • Campbell Collaboration. 2000. About the C2-SPECTR Database. http://geb9101​ (accessed June 1, 2010).
  • Carroll, C., M. Patterson, S. Wood, A. Booth, J. Rick, and S. Balain. 2007. A conceptual framework for implementation fidelity. Implementation Science 2(1):Article No. 40. [PMC free article: PMC2213686] [PubMed: 18053122]
  • Chalmers, T. C. 1982. The randomized controlled trial as a basis for therapeutic decisions. In The randomized clinical trial and therapeutic decisions, edited by J. M. Lachin, N. Tygstrup, and E. Juhl. New York: Marcel Dekker.
  • Chalmers, T. C., H. Smith, B. Blackburn, B. Silverman, B. Schroeder, D. Reitman, and A. Ambroz. 1981. A method for assessing the quality of a randomized control trial. Controlled Clinical Trials 2(1):31–49. [PubMed: 7261638]
  • Chalmers, T. C., P. Celano, H. S. Sacks, and H. Smith, Jr. 1983. Bias in treatment assignment in controlled clinical trials. New England Journal of Medicine 309(22):1358–1361. [PubMed: 6633598]
  • Chambers, D., M. Rodgers, and N. Woolacott. 2009. Not only randomized controlled trials, but also case series should be considered in systematic reviews of rapidly developing technologies. Journal of Clinical Epidemiology 62(12):1253–1260. [PubMed: 19349144]
  • Chan, A. W., and D. G. Altman. 2005. Identifying outcome reporting bias in randomised trials on PubMed: Review of publications and survey of authors. BMJ 330(7494):753. [PMC free article: PMC555875] [PubMed: 15681569]
  • Chan, A., K. Krleza-Jeric, I. Schmid, and D. G. Altman. 2004a. Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. Canadian Medical Association Journal 171(7):735–740. [PMC free article: PMC517858] [PubMed: 15451835]
  • Chan, A. W., A. Hrobjartsson, M. T. Haahr, P. C. Gøtzsche, and D. G. Altman. 2004b. Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA 291(20):2457–2465. [PubMed: 15161896]
  • Chapman, A. L., L. C. Morgan, and G. Gartlehner. 2010. Semi-automating the manual literature search for systematic reviews increases efficiency. Health Information and Libraries Journal 27(1):22–27. [PubMed: 20402801]
  • Chou, R., and M. Helfand. 2005. Challenges in systematic reviews that assess treatment harms. Annals of Internal Medicine 142(12):1090–1099. [PubMed: 15968034]
  • Chou, R., N. Aronson, D. Atkins, A. S. Ismaila, P. Santaguida, D. H. Smith, E. Whitlock, T. J. Wilt, and D. Moher. 2010. AHRQ series paper 4: Assessing harms when comparing medical interventions: AHRQ and the Effective Health Care Program. Journal of Clinical Epidemiology 63(5):502–512. [PubMed: 18823754]
  • Cochrane Collaboration. 2010a. Cochrane Central Register of Controlled Trials. http://onlinelibrary/cochrane_clcentral_articles_fs.html (accessed June 7, 2010).
  • Cochrane Collaboration. 2010b. Cochrane training. http://www (accessed January 29, 2011).
  • Cochrane IMS. 2010. About RevMan 5. http://ims​ (accessed November 11, 2010).
  • Cohen, D. J., B. F. Crabtree, R. S. Etz, B. A. Balasubramanian, K. E. Donahue, L. C. Leviton, E. C. Clark, N. F. Isaacson, K. C. Stange, and L. W. Green. 2008. Fidelity versus flexibility: Translating evidence-based research into practice. American Journal of Preventive Medicine 35(5, Supplement 1):S381–S389. [PubMed: 18929985]
  • Coles, B. 2008. Cochrane information retrieval methods group. About the Cochrane Collaboration (methods groups) 3: Article No. CE000145.
  • Concato, J., N. Shah, and R. I. Horwitz. 2000. Randomized, controlled trials, observational studies, and the hierarchy of research designs. New England Journal of Medicine 342(25):1887–1892. [PMC free article: PMC1557642] [PubMed: 10861325]
  • Cook, D. J., G. H. Guyatt, G. Ryan, J. Clifton, L. Buckingham, A. Willan, W. McIlroy, and A. D. Oxman. 1993. Should unpublished data be included in meta-analyses? Current convictions and controversies. JAMA 269(21):2749–2753. [PubMed: 8492400]
  • Cooper, M., W. Ungar, and S. Zlotkin. 2006. An assessment of inter-rater agreement of the literature filtering process in the development of evidence-based dietary guidelines. Public Health Nutrition 9(4):494–500. [PubMed: 16870022]
  • Craig, P., P. Dieppe, S. Macintyre, S. Michie, I. Nazareth, and M. Petticrew. 2008. Developing and evaluating complex interventions: The new Medical Research Council guidance. BMJ 337(7676):979–983. [PMC free article: PMC2769032] [PubMed: 18824488]
  • CRASH Trial Collaborators. 2004. Effect of intravenous corticosteroids on death within 14 days in 10,008 adults with clinically significant head injury (MRC CRASH trial): Randomised placebo-controlled trial. Lancet 364(9442):1321–1328. [PubMed: 15474134]
  • CRD (Centre for Reviews and Dissemination). 2009. Systematic reviews: CRD’s guidance for undertaking reviews in health care. York, UK: York Publishing Services, Ltd.
  • CRD. 2010. Database of Abstracts of Reviews of Effects (DARE). http://www​​.uk/crdweb/html/help.htm (accessed May 28, 2010).
  • Crumley, E. T., N. Wiebe, K. Cramer, T. P. Klassen, and L. Hartling. 2005. Which resources should be used to identify RCT/CCTs for systematic reviews: A systematic review. BMC Medical Research Methodology 5:24. [PMC free article: PMC1232852] [PubMed: 16092960]
  • Cummings, S. R., D. M. Black, D. E. Thompson, W. B. Applegate, E. Barrett-Connor, T. A. Musliner, L. Palermo, R. Prineas, S. M. Rubin, J. C. Scott, T. Vogt, R. Wallace, A. J. Yates, and A. Z. LaCroix. 1998. Effect of alendronate on risk of fracture in women with low bone density but without vertebral fractures: Results from the fracture intervention trial. JAMA 280(24):2077–2082. [PubMed: 9875874]
  • DeAngelis, C. D., J. M. Drazen, F. A. Frizelle, C. Haug, J. Hoey, R. Horton, S. Kotzin, C. Laine, A. Marusic, A. J. Overbeke, T. V. Schroeder, H. C. Sox, and M. B. Van der Weyden. 2004. Clinical trial registration: A statement from the International Committee of Medical Journal Editors. JAMA 292(11):1363–1364. [PubMed: 15355936]
  • Deeks, J. J., J. Dinnes, R. D’Amico, A. J. Sowden, C. Sakarovitch, F. Song, M. Petticrew, and D. G. Altman. 2003. Evaluating non-randomised intervention studies. Health Technology Assessment 7(27):1–173. [PubMed: 14499048]
  • Deeks, J., J. Higgins, and D. Altman, eds. 2008. Analysing data and undertaking meta-analyses. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: John Wiley & Sons.
  • Delaney, A., S. M. Bagshaw, A. Ferland, K. Laupland, B. Manns, and C. Doig. 2007. The quality of reports of critical care meta-analyses in the Cochrane Database of Systematic Reviews: An independent appraisal. Critical Care Medicine 35(2):589–594. [PubMed: 17205029]
  • DerSimonian, R., and N. Laird. 1986. Meta-analysis in clinical trials. Controlled Clinical Trials 7(3):177–188. [PubMed: 3802833]
  • Detke, M. J., C. G. Wiltse, C. H. Mallinckrodt, R. K. McNamara, M. A. Demitrack, and I. Bitter. 2004. Duloxetine in the acute and long-term treatment of major depressive disorder: A placebo- and paroxetine-controlled trial. European Neuropsychopharmacology 14(6):457–470. [PubMed: 15589385]
  • Devereaux, P. J., P. T. L. Choi, S. El-Dika, M. Bhandari, V. M. Montori, H. J. Schünemann, A. Garg, J. W. Busse, D. Heels-Ansdell, W. A. Ghali, B. J. Manns, and G. H. Guyatt. 2004. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. Journal of Clinical Epidemiology 57(12):1232–1236. [PubMed: 15617948]
  • Dhruva, S. S., and R. F. Redberg. 2008. Variations between clinical trial participants and Medicare beneficiaries in evidence used for Medicare national coverage decisions. Archives of Internal Medicine 168(2):136–140. [PubMed: 18227358]
  • Dickersin, K. 1988. Report from the panel on the case for registers of clinical trials at the eighth annual meeting of the Society for Clinical Trials. Controlled Clinical Trials 9(1):76–80. [PubMed: 3356154]
  • Dickersin, K. 1990. The existence of publication bias and risk factors for its occurrence. JAMA 263(10):1385–1389. [PubMed: 2406472]
  • Dickersin, K., and I. Chalmers. 2010. Recognising, investigating and dealing with incomplete and biased reporting of clinical research: From Francis Bacon to the World Health Organisation. http://www​ (accessed June 11, 2010). [PMC free article: PMC3241511] [PubMed: 22179297]
  • Dickersin, K., and Y. I. Min. 1993. NIH clinical trials and publication bias. Online Journal of Current Clinical Trials April 28:Document no. 50. [PubMed: 8306005]
  • Dickersin, K., and D. Rennie. 2003. Registering clinical trials. JAMA 290(4):516–523. [PubMed: 12876095]
  • Dickersin, K., P. Hewitt, L. Mutch, I. Chalmers, and T. C. Chalmers. 1985. Perusing the literature: Comparison of MEDLINE searching with a perinatal trials database. Controlled Clinical Trials 6(4):306–317. [PubMed: 3907973]
  • Dickersin, K., Y. I. Min, and C. L. Meinert. 1992. Factors influencing publication of research results: Follow-up of applications submitted to two institutional review boards. JAMA 267(3):374–378. [PubMed: 1727960]
  • Dickersin, K., R. Scherer, and C. Lefebvre. 1994. Identifying relevant studies for systematic reviews. BMJ 309(6964):1286–1291. [PMC free article: PMC2541778] [PubMed: 7718048]
  • Dickersin, K., E. Manheimer, S. Wieland, K. A. Robinson, C. Lefebvre, S. McDonald, and the Central Development Group. 2002a. Development of the Cochrane Collaboration’s Central Register of Controlled Clinical Trials. Evaluation and the Health Professions 25(1):38–64. [PubMed: 11868444]
  • Dickersin, K., C. M. Olson, D. Rennie, D. Cook, A. Flanagin, Q. Zhu, J. Reiling, and B. Pace. 2002b. Association between time interval to publication and statistical significance. JAMA 287(21):2829–2831. [PubMed: 12038925]
  • Dickersin, K., E. Ssemanda, C. Mansell, and D. Rennie. 2007. What do the JAMA editors say when they discuss manuscripts that they are considering for publication? Developing a schema for classifying the content of editorial discussion. BMC Medical Research Methodology 7: Article no. 44. [PMC free article: PMC2121101] [PubMed: 17894854]
  • Dixon, E., M. Hameed, F. Sutherland, D. J. Cook, and C. Doig. 2005. Evaluating meta-analyses in the general surgical literature: A critical appraisal. Annals of Surgery 241(3):450–459. [PMC free article: PMC1356983] [PubMed: 15729067]
  • Dreyer, N. A., and S. Garner. 2009. Registries for robust evidence. JAMA 302(7):790–791. [PubMed: 19690313]
  • Dwan, K., D. G. Altman, J. A. Arnaiz, J. Bloom, A. Chan, E. Cronin, E. Decullier, P. J. Easterbrook, E. Von Elm, C. Gamble, D. Ghersi, J. P. A. Ioannidis, J. Simes, and P. R. Williamson. 2008. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3(8):e3081. [PMC free article: PMC2518111] [PubMed: 18769481]
  • EBSCO Publishing. 2010. The CINAHL database. http://www​.ebscohost​.com/thisTopic.php?marketID​=1&topicID=53 (accessed June 1, 2010).
  • Edwards, P., M. Clarke, C. DiGuiseppi, S. Pratap, I. Roberts, and R. Wentz. 2002. Identification of randomized controlled trials in systematic reviews: Accuracy and reliability of screening records. Statistics in Medicine 21(11):1635–1640. [PubMed: 12111924]
  • Egger, M., and T. Zellweger-Zahner. 1997. Language bias in randomised controlled trials published in English and German. Lancet 350(9074):326. [PubMed: 9251637]
  • Egger, M., P. Jüni, C. Bartlett, F. Holenstein, and J. Sterne. 2003. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health Technology Assessment 7(1):1–76. [PubMed: 12583822]
  • Elamin, M. B., D. N. Flynn, D. Bassler, M. Briel, P. Alonso-Coello, P. J. Karanicolas, G. H. Guyatt, G. Malaga, T. A. Furukawa, R. Kunz, H. Schünemann, M. H. Murad, C. Barbui, A. Cipriani, and V. M. Montori. 2009. Choice of data extraction tools for systematic reviews depends on resources and review complexity. Journal of Clinical Epidemiology 62(5):506–510. [PubMed: 19348977]
  • Embase. 2010. What is Embase? http://www​.info.embase​.com/what-is-embase (accessed May 28, 2010).
  • Ewart, R., H. Lausen, and N. Millian. 2009. Undisclosed changes in outcomes in randomized controlled trials: An observational study. Annals of Family Medicine 7(6):542–546. [PMC free article: PMC2775624] [PubMed: 19901314]
  • Eyding, D., M. Lelgemann, U. Grouven, M. Härter, M. Kromp, T. Kaiser, M. F. Kerekes, M. Gerken, and B. Wieseler. 2010. Reboxetine for acute treatment of major depression: Systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ 341:c4737. [PMC free article: PMC2954275] [PubMed: 20940209]
  • Falagas, M. E., E. I. Pitsouni, G. A. Malietzis, and G. Pappas. 2008. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses. Journal of the Federation of American Societies for Experimental Biology 22(2):338–342. [PubMed: 17884971]
  • Ferreira-Gonzalez, I., J. W. Busse, D. Heels-Ansdell, V. M. Montori, E. A. Akl, D. M. Bryant, J. Alonso, R. Jaeschke, H. J. Schunemann, G. Permanyer-Miralda, A. Domingo-Salvany, and G. H. Guyatt. 2007. Problems with use of composite end points in cardiovascular trials: Systematic review of randomised controlled trials. BMJ 334(7597):786–788. [PMC free article: PMC1852019] [PubMed: 17403713]
  • Flemming, K., and M. Briggs. 2007. Electronic searching to locate qualitative research: Evaluation of three strategies. Journal of Advanced Nursing 57(1):95–100. [PubMed: 17184378]
  • Fletcher, C. V. 2007. Translating efficacy into effectiveness in antiretroviral therapy. Drugs 67(14):1969–1979. [PubMed: 17883282]
  • Frazier, L. M., C. D. Mulrow, and L. T. Alexander, Jr. 1987. Need for insulin therapy in type II diabetes mellitus: A randomized trial. Archives of Internal Medicine 147(6):1085–1089. [PubMed: 3296982]
  • Furlan, A. D., E. Irvin, and C. Bombardier. 2006. Limited search strategies were effective in finding relevant nonrandomized studies. Journal of Clinical Epidemiology 59(12):1303–1311. [PubMed: 17098573]
  • Garritty, C., A. C. Tricco, M. Sampson, A. Tsertsvadze, K. Shojania, M. P. Eccles, J. Grimshaw, and D. Moher. 2009. A framework for updating systematic reviews. In Updating systematic reviews: The policies and practices of health care organizations involved in evidence synthesis. Garritty, C. M.Sc. thesis. Toronto, ON: University of Toronto.
  • Gartlehner, G., S. West, K. N. Lohr, L. Kahwati, J. Johnson, R. Harris, L. Whitener, C. Voisin, and S. Sutton. 2004. Assessing the need to update prevention guidelines: A comparison of two methods. International Journal for Quality in Health Care 16(5):399–406. [PubMed: 15375101]
  • Gartlehner, G., R. A. Hansen, P. Thieda, A. M. DeVeaugh-Geiss, B. N. Gaynes, E. E. Krebs, L. J. Lux, L. C. Morgan, J. A. Shumate, L. G. Monroe, and K. N. Lohr. 2007. Comparative effectiveness of second-generation antidepressants in the pharmacologic treatment of adult depression. Rockville, MD: Agency for Healthcare Research and Quality. [PubMed: 20704050]
  • Gillen, S., T. Schuster, C. Meyer zum Büschenfelde, H. Friess, and J. Kleeff. 2010. Preoperative/neoadjuvant therapy in pancreatic cancer: A systematic review and meta-analysis of response and resection percentages. PLoS Med 7(4):e1000267. [PMC free article: PMC2857873] [PubMed: 20422030]
  • Glanville, J. M., C. Lefebvre, J. N. V. Miles, and J. Camosso-Stefinovic. 2006. How to identify randomized controlled trials in MEDLINE: Ten years on. Journal of the Medical Library Association 94(2):130–136. [PMC free article: PMC1435857] [PubMed: 16636704]
  • Glasgow, R. E. 2006. RE-AIMing research for application: Ways to improve evidence for family medicine. Journal of the American Board of Family Medicine 19(1):11–19. [PubMed: 16492000]
  • Glasgow, R. E., T. M. Vogt, and S. M. Boles. 1999. Evaluating the public health impact of health promotion interventions: The RE-AIM framework. American Journal of Public Health 89(9):1322–1327. [PMC free article: PMC1508772] [PubMed: 10474547]
  • Glass, G. V., and M. L. Smith. 1979. Meta-analysis of research on class size and achievement. Educational Evaluation and Policy Analysis 1(1):2–16.
  • Glasziou, P., I. Chalmers, M. Rawlins, and P. McCulloch. 2007. When are randomised trials unnecessary? Picking signal from noise. BMJ 334(7589):349–351. [PMC free article: PMC1800999] [PubMed: 17303884]
  • Glasziou, P., E. Meats, C. Heneghan, and S. Shepperd. 2008. What is missing from descriptions of treatment in trials and reviews? BMJ 336(7659):1472–1474. [PMC free article: PMC2440840] [PubMed: 18583680]
  • Gluud, L. L. 2006. Bias in clinical intervention research. American Journal of Epidemiology 163(6):493–501. [PubMed: 16443796]
  • Golder, S., and Y. K. Loke. 2008. Is there evidence for biased reporting of published adverse effects data in pharmaceutical industry-funded studies? British Journal of Clinical Pharmacology 66(6):767–773. [PMC free article: PMC2675760] [PubMed: 18754841]
  • Golder, S., and Y. Loke. 2009. Search strategies to identify information on adverse effects: A systematic review. Journal of the Medical Library Association 97(2):84–92. [PMC free article: PMC2670220] [PubMed: 19404498]
  • Golder, S., and Y. K. Loke. 2010. Sources of information on adverse effects: A systematic review. Health Information & Libraries Journal 27(3):176–190. [PubMed: 20712712]
  • Golder, S., Y. Loke, and H. M. McIntosh. 2008. Poor reporting and inadequate searches were apparent in systematic reviews of adverse effects. Journal of Clinical Epidemiology 61(5):440–448. [PubMed: 18394536]
  • Goldsmith, M. R., C. R. Bankhead, and J. Austoker. 2007. Synthesising quantitative and qualitative research in evidence-based patient information. Journal of Epidemiology & Community Health 61(3):262–270. [PMC free article: PMC2652927] [PubMed: 17325406]
  • Gøtzsche, P. C. 1987. Reference bias in reports of drug trials. BMJ (Clinical Research Ed.) 295(6599):654–656. [PMC free article: PMC1257776] [PubMed: 3117277]
  • Gøtzsche, P. C. 1989. Multiple publication of reports of drug trials. European Journal of Clinical Pharmacology 36(5):429–432. [PubMed: 2666138]
  • Gøtzsche, P. C., A. Hrobjartsson, K. Maric, and B. Tendal. 2007. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 298(4):430–437. [PubMed: 17652297]
  • Green, S., and J. P. T. Higgins. 2008. Preparing a Cochrane review. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: John Wiley & Sons.
  • Gregoire, G., F. Derderian, and J. Le Lorier. 1995. Selecting the language of the publications included in a meta-analysis: Is there a Tower of Babel bias? Journal of Clinical Epidemiology 48(1):159–163. [PubMed: 7853041]
  • Hansen, R. A., G. Gartlehner, D. Kaufer, K. N. Lohr, and T. Carey. 2006. Drug class review of Alzheimer’s drugs: Final report. http://www​​/reports/final.cfm (accessed November 12, 2010). [PubMed: 20480924]
  • Harris, R. P., M. Helfand, S. H. Woolf, K. N. Lohr, C. D. Mulrow, S. M. Teutsch, D. Atkins, and Methods Work Group, Third U.S. Preventive Services Task Force. 2001. Current methods of the U.S. Preventive Services Task Force: A review of the process. American Journal of Preventive Medicine 20(3 Suppl):21–35. [PubMed: 11306229]
  • Hartling, L., F. A. McAlister, B. H. Rowe, J. Ezekowitz, C. Friesen, and T. P. Klassen. 2005. Challenges in systematic reviews of therapeutic devices and procedures. Annals of Internal Medicine 142(12 Pt 2):1100–1111. [PubMed: 15968035]
  • Helfand, M., and H. Balshem. 2010. AHRQ series paper 2: Principles for developing guidance: AHRQ and the Effective Health Care Program. Journal of Clinical Epidemiology 63(5):484–490. [PubMed: 19716268]
  • Helmer, D., I. Savoie, C. Green, and A. Kazanjian. 2001. Evidence-based practice: Extending the search to find material for the systematic review. Bulletin of the Medical Library Association 89(4):346–352. [PMC free article: PMC57963] [PubMed: 11837256]
  • Hempell, S., M. Suttorp, J. Miles, Z. Wang, M. Maglione, S. Morton, B. Johnsen, D. Valentine, and P. Shekelle. 2011. Assessing the empirical evidence of associations between internal validity and effect sizes in randomized controlled trials. Evidence Report/Technology Assessment No. HHSA 290 2007 10062 I (prepared by the Southern California Evidence-based Practice Center under Contract No. 290-2007-10062-I), Rockville, MD: AHRQ. [PubMed: 21834174]
  • Heres, S., S. Wagenpfeil, J. Hamann, W. Kissling, and S. Leucht. 2004. Language bias in neuroscience: Is the Tower of Babel located in Germany? European Psychiatry 19(4):230–232. [PubMed: 15196606]
  • Hernandez, D. A., M. M. El-Masri, and C. A. Hernandez. 2008. Choosing and using citation and bibliographic database software (BDS). Diabetic Education 34(3):457–474. [PubMed: 18535319]
  • Higgins, J. P. T., and D. G. Altman. 2008. Assessing risk of bias in included studies. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: The Cochrane Collaboration.
  • Higgins, J. P. T., and J. J. Deeks. 2008. Selecting studies and collecting data. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: The Cochrane Collaboration.
  • Higgins, J. P. T., S. Green, and R. Scholten. 2008. Maintaining reviews: Updates, amendments and feedback. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: The Cochrane Collaboration.
  • Hirsch, L. 2008. Trial registration and results disclosure: Impact of U.S. legislation on sponsors, investigators, and medical journal editors. Current Medical Research and Opinion 24(6):1683–1689. [PubMed: 18462565]
  • Hoaglin, D. C., R. L. Light, B. McPeek, F. Mosteller, and M. A. Stoto. 1982. Data for decisions. Cambridge, MA: Abt Books.
  • Hopewell, S., L. Wolfenden, and M. Clarke. 2008. Reporting of adverse events in systematic reviews can be improved: Survey results. Journal of Clinical Epidemiology 61(6):597–602. [PubMed: 18411039]
  • Hopewell, S., K. Loudon, M. J. Clarke, et al. 2009a. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database of Systematic Reviews 1:MR000006.pub3. [PMC free article: PMC8276556] [PubMed: 19160345]
  • Hopewell, S., M. J. Clarke, C. Lefebvre, and R. W. Scherer. 2009b. Handsearching versus electronic searching to identify reports of randomized trials. Cochrane Database of Systematic Reviews 4:MR000001.pub2. [PMC free article: PMC7437388] [PubMed: 17443625]
  • Horton, J., B. Vandermeer, L. Hartling, L. Tjosvold, T. P. Klassen, and N. Buscemi. 2010. Systematic review data extraction: Cross-sectional study showed that experience did not increase accuracy. Journal of Clinical Epidemiology 63(3):289–298. [PubMed: 19683413]
  • Humphrey, L., B. K. S. Chan, S. Detlefsen, and M. Helfand. 2002. Screening for breast cancer. Edited by Oregon Health & Science University Evidence-based Practice Center under Contract No. 290-97-0018. Rockville, MD: Agency for Healthcare Research and Quality. [PubMed: 20722110]
  • Huston, P., and D. Moher. 1996. Redundancy, disaggregation, and the integrity of medical research. Lancet 347(9007):1024–1026. [PubMed: 8606568]
  • Huth, E. J. 1986. Irresponsible authorship and wasteful publication. Annals of Internal Medicine 104(2):257–259. [PubMed: 3946956]
  • ICMJE (International Committee of Medical Journal Editors). 2010. Uniform requirements for manuscripts submitted to biomedical journals: Writing and editing for biomedical publication. http://www​ (accessed July 8, 2010).
  • Ioannidis, J. P. A., J. C. Cappelleri, H. S. Sacks, and J. Lau. 1997. The relationship between study design, results, and reporting of randomized clinical trials of HIV infection. Controlled Clinical Trials 18(5):431–444. [PubMed: 9315426]
  • Ioannidis, J. 1998. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 279:281–286. [PubMed: 9450711]
  • IOM (Institute of Medicine). 2008. Knowing what works in health care: A roadmap for the nation. Edited by J. Eden, B. Wheatley, B. McNeil, and H. Sox. Washington, DC: The National Academies Press.
  • IOM. 2009. Initial national priorities for comparative effectiveness research. Washington, DC: The National Academies Press.
  • ISI Web of Knowledge. 2009. Web of science. http://images​.isiknowledge​.com/WOKRS49B3​/help/WOS/h_database.html (accessed May 28, 2010).
  • Jones, A. P., T. Remmington, P. R. Williamson, D. Ashby, and R. S. Smyth. 2005. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. Journal of Clinical Epidemiology 58(7):741–742. [PubMed: 15939227]
  • Jorgensen, A. W., K. L. Maric, B. Tendal, A. Faurschou, and P. C. Gotzsche. 2008. Industry-supported meta-analyses compared with meta-analyses with nonprofit or no support: Differences in methodological quality and conclusions. BMC Medical Research Methodology 8: Article no. 60. [PMC free article: PMC2553412] [PubMed: 18782430]
  • Jüni, P. 1999. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 282:1054–1060. [PubMed: 10493204]
  • Jüni, P., M. Egger, D. G. Altman, and G. D. Smith. 2001. Assessing the quality of randomised controlled trials. In Systematic review in health care: Meta-analysis in context, edited by M. Egger, G. D. Smith, and D. G. Altman. London, UK: BMJ Publishing Group.
  • Jüni, P., F. Holenstein, J. Sterne, C. Bartlett, and M. Egger. 2002. Direction and impact of language bias in meta-analyses of controlled trials: Empirical study. International Journal of Epidemiology 31(1):115–123. [PubMed: 11914306]
  • Kelley, G., K. Kelley, and Z. Vu Tran. 2004. Retrieval of missing data for meta-analysis: A practical example. International Journal of Technology Assessment in Health Care 20(3):296. [PMC free article: PMC2443825] [PubMed: 15446759]
  • Khan, K. S., and J. Kleijnen. 2001. Stage II Conducting the review: Phase 4 Selection of studies. In CRD Report No. 4, edited by K. S. Khan, G. ter Riet, H. Glanville, A. J. Sowden, and J. Kleijnen. York, U.K.: NHS Centre for Reviews and Dissemination, University of York.
  • Kirkham, J. J., K. M. Dwan, D. G. Altman, C. Gamble, S. Dodd, R. S. Smyth, and P. R. Williamson. 2010. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 340(7747):637–640. [PubMed: 20156912]
  • Kjaergard, L. L., and B. Als-Nielsen. 2002. Association between competing interests and authors’ conclusions: Epidemiological study of randomised clinical trials published in the BMJ. BMJ 325(7358):249. [PMC free article: PMC117638] [PubMed: 12153921]
  • Knowledge for Health. 2010. About POPLINE. http://www​ (accessed June 1, 2010).
  • Kuper, H., A. Nicholson, and H. Hemingway. 2006. Searching for observational studies: What does citation tracking add to PubMed? A case study in depression and coronary heart disease. BMC Medical Research Methodology 6:4. [PMC free article: PMC1403794] [PubMed: 16483366]
  • Lee, K., P. Bacchetti, and I. Sim. 2008. Publication of clinical trials supporting successful new drug applications: A literature analysis. PLoS Medicine 5(9):1348–1356. [PMC free article: PMC2553819] [PubMed: 18816163]
  • Lefebvre, C., E. Manheimer, and J. Glanville. 2008. Searching for studies. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, U.K.: The Cochrane Collaboration.
  • Lemeshow, A. R., R. E. Blum, J. A. Berlin, M. A. Stoto, and G. A. Colditz. 2005. Searching one or two databases was insufficient for meta-analysis of observational studies. Journal of Clinical Epidemiology 58(9):867–873. [PubMed: 16085190]
  • Lexchin, J., L. A. Bero, B. Djulbegovic, and O. Clark. 2003. Pharmaceutical industry sponsorship and research outcome and quality: Systematic review. BMJ 326(7400):1167–1170. [PMC free article: PMC156458] [PubMed: 12775614]
  • Li, J., Q. Zhang, M. Zhang, and M. Egger. 2007. Intravenous magnesium for acute myocardial infarction. Cochrane Database of Systematic Reviews 2:CD002755. [PMC free article: PMC8407081] [PubMed: 17443517]
  • Liberati, A., D. G. Altman, J. Tetzlaff, C. Mulrow, P. C. Gotzsche, J. P. A. Ioannidis, M. Clarke, P. J. Devereaux, J. Kleijnen, and D. Moher. 2009. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. Annals of Internal Medicine 151(4):W1–W30. [PubMed: 19622512]
  • Light, R. L., and D. Pillemer. 1984. Summing up: The science of reviewing research. Cambridge, MA: Harvard University Press.
  • Linde, K., and S. N. Willich. 2003. How objective are systematic reviews? Differences between reviews on complementary medicine. Journal of the Royal Society of Medicine 96(1):17–22. [PMC free article: PMC539366] [PubMed: 12519797]
  • Lohr, K. 1998. Grading articles and evidence: Issues and options. Research Triangle Park, NC: RTI-UNC Evidence-based Practice Center.
  • Lohr, K. N., and T. S. Carey. 1999. Assessing “best evidence”: Issues in grading the quality of studies for systematic reviews. Joint Commission Journal on Quality Improvement 25(9):470–479. [PubMed: 10481816]
  • Louden, K., S. Hopewell, M. Clarke, D. Moher, R. Scholten, A. Eisinga, and S. D. French. 2008. A decision tree and checklist to guide decisions on whether, and when, to update Cochrane reviews. In A decision tool for updating Cochrane reviews. Chichester, U.K.: The Cochrane Collaboration.
  • Lundh, A., S. L. Knijnenburg, A. W. Jorgensen, E. C. van Dalen, and L. C. M. Kremer. 2009. Quality of systematic reviews in pediatric oncology: A systematic review. Cancer Treatment Reviews 35(8):645–652. [PubMed: 19836897]
  • Lynch, J. R., M. R. A. Cunningham, W. J. Warme, D. C. Schaad, F. M. Wolf, and S. S. Leopold. 2007. Commercially funded and United States-based research is more likely to be published: Good-quality studies with negative outcomes are not. Journal of Bone and Joint Surgery (American Volume) 89A(5):1010–1018. [PubMed: 17473138]
  • MacLean, C. H., S. C. Morton, J. J. Ofman, E. A. Roth, P. G. Shekelle, and Center Southern California Evidence-Based Practice. 2003. How useful are unpublished data from the Food and Drug Administration in meta-analysis? Journal of Clinical Epidemiology 56(1):44–51. [PubMed: 12589869]
  • Mathieu, S., I. Boutron, D. Moher, D. G. Altman, and P. Ravaud. 2009. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA 302(9):977–984. [PubMed: 19724045]
  • McAuley, L., B. Pham, P. Tugwell, and D. Moher. 2000. Does the inclusion of grey literature influence estimates of intervention effectiveness reported in meta-analyses? Lancet 356(9237):1228–1231. [PubMed: 11072941]
  • McDonagh, M. S., K. Peterson, S. Carson, R. Fu, and S. Thakurta. 2010. Drug class review: Atypical antipsychotic drugs. Update 3. http://derp​​/final/AAP_final_report_update​%203_version%203_JUL_10.pdf (accessed November 4, 2010).
  • McGauran, N., B. Wieseler, J. Kreis, Y. Schuler, H. Kolsch, and T. Kaiser. 2010. Reporting bias in medical research: A narrative review. Trials 11:37. [PMC free article: PMC2867979] [PubMed: 20388211]
  • McGowan, J., and M. Sampson. 2005. Systematic reviews need systematic searchers. Journal of the Medical Library Association 93(1):74–80. [PMC free article: PMC545125] [PubMed: 15685278]
  • McKibbon, K. A., N. L. Wilczynski, R. B. Haynes, and T. Hedges. 2009. Retrieving randomized controlled trials from Medline: A comparison of 38 published search filters. Health Information and Libraries Journal 26(3):187–202. [PubMed: 19712211]
  • Miller, J. 2010. Registering clinical trial results: The next step. JAMA 303(8):773–774. [PubMed: 20179288]
  • Miller, J. N., G. A. Colditz, and F. Mosteller. 1989. How study design affects outcomes in comparisons of therapy II: Surgical. Statistics in Medicine 8(4):455–466. [PubMed: 2727469]
  • Moher, D., and A. Tsertsvadze. 2006. Systematic reviews: When is an update an update? Lancet 367(9514):881–883. [PubMed: 16546523]
  • Moher, D., P. Fortin, A. R. Jadad, P. Jüni, T. Klassen, J. LeLorier, A. Liberati, K. Linde, and A. Penna. 1996. Completeness of reporting of trials published in languages other than English: Implications for conduct and reporting of systematic reviews. Lancet 347(8998):363–366. [PubMed: 8598702]
  • Moher, D., B. Pham, A. Jones, D. J. Cook, A. R. Jadad, M. Moher, P. Tugwell, and T. P. Klassen. 1998. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 352(9128):609–613. [PubMed: 9746022]
  • Moher, D., D. J. Cook, S. Eastwood, I. Olkin, D. Rennie, and D. F. Stroup. 1999. Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Lancet 354(9193):1896–1900. [PubMed: 10584742]
  • Moher, D., B. Pham, T. P. Klassen, K. F. Schulz, J. A. Berlin, A. R. Jadad, and A. Liberati. 2000. What contributions do languages other than English make on the results of meta-analyses? Journal of Clinical Epidemiology 53(9):964–972. [PubMed: 11004423]
  • Moher, D., B. Pham, M. L. Lawson, and T. P. Klassen. 2003. The inclusion of reports of randomised trials published in languages other than English in systematic reviews. Health Technology Assessment 7(41):1–90. [PubMed: 14670218]
  • Moher, D., J. Tetzlaff, A. C. Tricco, M. Sampson, and D. G. Altman. 2007a. Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine 4(3):447–455. [PMC free article: PMC1831728] [PubMed: 17388659]
  • Moher, D., A. Tsertsvadze, A. C. Tricco, M. Eccles, J. Grimshaw, M. Sampson, and N. Barrowman. 2007b. A systematic review identified few methods and strategies describing when and how to update systematic reviews. Journal of Clinical Epidemiology 60(11):1095–1104. [PubMed: 17938050]
  • Moja, L., E. Telaro, R. D’Amico, I. Moschetti, L. Coe, and A. Liberati. 2005. Assessment of methodological quality of primary studies by systematic reviews: results of the metaquality cross sectional study. BMJ 330:1053. [PMC free article: PMC557223] [PubMed: 15817526]
  • Mojon-Azzi, S. M., X. Jiang, U. Wagner, and D. S. Mojon. 2004. Redundant publications in scientific ophthalmologic journals: The tip of the iceberg? Ophthalmology 111(5):863–866. [PubMed: 15121360]
  • Montori, V. M., P. J. Devereaux, N. K. Adhikari, K. E. A. Burns, C. H. Eggert, M. Briel, C. Lacchetti, T. W. Leung, E. Darling, D. M. Bryant, H. C. Bucher, H. J. Schünemann, M. O. Meade, D. J. Cook, P. J. Erwin, A. Sood, R. Sood, B. Lo, C. A. Thompson, Q. Zhou, E. Mills, and G. Guyatt. 2005. Randomized trials stopped early for benefit: A systematic review. JAMA 294(17):2203–2209. [PubMed: 16264162]
  • Moore, T. 1995. Deadly medicine: Why tens of thousands of heart patients died in America’s worst drug disaster. New York: Simon & Schuster.
  • Morrison, A., K. Moulton, M. Clark, J. Polisena, M. Fiander, M. Mierzwinski-Urban, S. Mensinkai, T. Clifford, and B. Hutton. 2009. English-language restriction when conducting systematic review-based meta-analyses: Systematic review of published studies. Ottawa, CA: Canadian Agency for Drugs and Technologies in Health.
  • Mrkobrada, M., H. Thiessen-Philbrook, R. B. Haynes, A. V. Iansavichus, F. Rehman, and A. X. Garg. 2008. Need for quality improvement in renal systematic reviews. Clinical Journal of the American Society of Nephrology 3(4):1102–1114. [PMC free article: PMC2440265] [PubMed: 18400967]
  • Mulrow, C. D., and A. D. Oxman. 1994. Cochrane Collaboration handbook, The Cochrane Library. Chichester, U.K.: The Cochrane Collaboration.
  • Nallamothu, B. K., R. A. Hayward, and E. R. Bates. 2008. Beyond the randomized clinical trial: The role of effectiveness studies in evaluating cardiovascular therapies. Circulation 118(12):1294–1303. [PubMed: 18794402]
  • Nassir Ghaemi, S., A. A. Shirzadi, and M. Filkowski. 2008. Publication bias and the pharmaceutical industry: The case of lamotrigine in bipolar disorder. Medscape Journal of Medicine 10(9):211. [PMC free article: PMC2580079] [PubMed: 19008973]
  • National Library of Medicine. 2008. MEDLINE fact sheet. http://www​​/pubs/factsheets/medline.html (accessed May 28, 2010).
  • New York Academy of Medicine. 2010. Grey literature report. http://www​​/pages/grey_literature_report (accessed June 2, 2010).
  • Nieminen, P., G. Rucker, J. Miettunen, J. Carpenter, and M. Schumacher. 2007. Statistically significant papers in psychiatry were cited more often than others. Journal of Clinical Epidemiology 60(9):939–946. [PubMed: 17689810]
  • NLM (National Library of Medicine). 2009. Fact sheet: http://www​​/pubs/factsheets/clintrial.html (accessed June 16, 2010).
  • Norris, S., D. Atkins, W. Bruening, S. Fox, E. Johnson, R. Kane, S. C. Morton, M. Oremus, M. Ospina, G. Randhawa, K. Schoelles, P. Shekelle, and M. Viswanathan. 2010. Selecting observational studies for comparing medical interventions. In Methods guide for comparative effectiveness reviews, edited by Agency for Healthcare Research and Quality. http://www​.effectivehealthcare​​.cfm/search-for-guides-reviews-and-reports​/?pageaction=displayProduct&productID=454 (accessed January 19, 2011).
  • Nüesch, E., S. Trelle, S. Reichenbach, A. W. S. Rutjes, B. Tschannen, D. G. Altman, M. Egger, and P. Jüni. 2010. Small study effects in meta-analyses of osteoarthritis trials: Meta-epidemiological study. BMJ 341(7766):241. [PMC free article: PMC2905513] [PubMed: 20639294]
  • O’Connor, A. B. 2009. The need for improved access to FDA reviews. JAMA 302(2):191–193. [PubMed: 19584349]
  • Okike, K., M. S. Kocher, C. T. Mehlman, J. D. Heckman, and M. Bhandari. 2008. Publication bias in orthopedic research: An analysis of scientific factors associated with publication in the Journal of Bone and Joint Surgery (American Volume). Journal of Bone and Joint Surgery (American Volume) 90A(3):595–601. [PubMed: 18310710]
  • Olson, C. M., D. Rennie, D. Cook, K. Dickersin, A. Flanagin, J. W. Hogan, Q. Zhu, J. Reiling, and B. Pace. 2002. Publication bias in editorial decision making. JAMA 287(21):2825–2828. [PubMed: 12038924]
  • Online Computer Library Center. 2010. The OAIster® database. http://www​ (accessed June 3, 2010).
  • OpenSIGLE. 2010. OpenSIGLE. http://opensigle​ (accessed June 2, 2010).
  • Peinemann, F., N. McGauran, S. Sauerland, and S. Lange. 2008. Disagreement in primary study selection between systematic reviews on negative pressure wound therapy. BMC Medical Research Methodology 8:41. [PMC free article: PMC2496910] [PubMed: 18582373]
  • Perlin, J. B., and J. Kupersmith. 2007. Information technology and the inferential gap. Health Affairs 26(2):W192–W194. [PubMed: 17259203]
  • Pham, B., T. P. Klassen, M. L. Lawson, and D. Moher. 2005. Language of publication restrictions in systematic reviews gave different results depending on whether the intervention was conventional or complementary. Journal of Clinical Epidemiology 58(8):769–776. [PubMed: 16086467]
  • Pildal, J., A. Hrobjartsson, K. J. Jorgensen, J. Hilden, D. G. Altman, and P. C. Gøtzsche. 2007. Impact of allocation concealment on conclusions drawn from meta-analyses of randomized trials. International Journal of Epidemiology 36(4):847–857. [PubMed: 17517809]
  • ProQuest. 2010. ProQuest dissertations & theses database. http://www​​/en-US/catalogs/databases​/detail/pqdt.shtml (accessed June 2, 2010).
  • Ravnskov, U. 1992. Cholesterol lowering trials in coronary heart disease: Frequency of citation and outcome. BMJ 305(6844):15–19. [PMC free article: PMC1882525] [PubMed: 1638188]
  • Ravnskov, U. 1995. Quotation bias in reviews of the diet–heart idea. Journal of Clinical Epidemiology 48(5):713–719. [PubMed: 7730926]
  • RefWorks. 2009. RefWorks. http://refworks​.com/content​/products/content.asp (accessed July 2, 2010).
  • Relevo, R., and H. Balshem. 2011. Finding evidence for comparing medical interventions. In Methods guide for comparative effectiveness reviews, edited by Agency for Healthcare Research and Quality. http://www​.effectivehealthcare​​.cfm/search-for-guides-reviews-and-reports​/?pageaction=displayProduct&productID=605 (accessed January 19, 2011).
  • Rising, K., P. Bacchetti, and L. Bero. 2008. Reporting bias in drug trials submitted to the Food and Drug Administration: Review of publication and presentation. PLoS Medicine 5(11):1561–1570. [PMC free article: PMC2586350] [PubMed: 19067477]
  • Rosenthal, E. L., J. L. Masdon, C. Buckman, and M. Hawn. 2003. Duplicate publications in the otolaryngology literature. Laryngoscope 113(5):772–774. [PubMed: 12792309]
  • Ross, J. S., G. K. Mulvey, E. M. Hines, S. E. Nissen, and H. M. Krumholz. 2009. Trial publication after registration in A cross-sectional analysis. PLoS Medicine 6(9):e1000144. [PMC free article: PMC2728480] [PubMed: 19901971]
  • Rothwell, P. M. 1995. Can overall results of clinical trials be applied to all patients? Lancet 345(8965):1616–1619. [PubMed: 7783541]
  • Rothwell, P. M. 2005. External validity of randomised controlled trials: To whom do the results of this trial apply? Lancet 365(9453):82–93. [PubMed: 15639683]
  • Rothwell, P. M. 2006. Factors that can affect the external validity of randomised controlled trials. PLOS Clinical Trials 1(1):e9. [PMC free article: PMC1488890] [PubMed: 16871331]
  • Roundtree, A. K., M. A. Kallen, M. A. Lopez-Olivo, B. Kimmel, B. Skidmore, Z. Ortiz, V. Cox, and M. E. Suarez-Almazor. 2008. Poor reporting of search strategy and conflict of interest in over 250 narrative and systematic reviews of two biologic agents in arthritis: A systematic review. Journal of Clinical Epidemiology 62(2):128–137. [PubMed: 19013763]
  • Royle, P., and R. Milne. 2003. Literature searching for randomized controlled trials used in Cochrane reviews: Rapid versus exhaustive searches. International Journal of Technology Assessment in Health Care 19(4):591–603. [PubMed: 15095765]
  • Royle, P., and N. Waugh. 2003. Literature searching for clinical and cost-effectiveness studies used in health technology assessment reports carried out for the National Institute for Clinical Excellence appraisal system. Health Technology Assessment 7(34):1–51. [PubMed: 14609481]
  • Royle, P., L. Bain, and N. Waugh. 2005. Systematic reviews of epidemiology in diabetes: Finding the evidence. BMC Medical Research Methodology 5(1):2. [PMC free article: PMC545080] [PubMed: 15638944]
  • Sampson, M., and J. McGowan. 2006. Errors in search strategies were identified by type and frequency. Journal of Clinical Epidemiology 59(10):1057.e1–1057.e9. [PubMed: 16980145]
  • Sampson, M., K. G. Shojania, C. Garritty, T. Horsley, M. Ocampo, and D. Moher. 2008. Systematic reviews can be produced and published faster. Journal of Clinical Epidemiology 61(6):531–536. [PubMed: 18471656]
  • Sampson, M., J. McGowan, E. Cogo, J. Grimshaw, D. Moher, and C. Lefebvre. 2009. An evidence-based practice guideline for the peer review of electronic search strategies. Journal of Clinical Epidemiology 62(9):944–952. [PubMed: 19230612]
  • Sanderson, S., I. D. Tatt, and J. P. Higgins. 2007. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: A systematic review and annotated bibliography. International Journal of Epidemiology 36(3):666–676. [PubMed: 17470488]
  • Savoie, I., D. Helmer, C. J. Green, and A. Kazanjian. 2003. Beyond MEDLINE: Reducing bias through extended systematic review search. International Journal of Technology Assessment in Health Care 19(1):168–178. [PubMed: 12701949]
  • Schein, M., and R. Paladugu. 2001. Redundant surgical publications: Tip of the iceberg? Surgery 129(6):655–661. [PubMed: 11391360]
  • Scherer, R. W., P. Langenberg, and E. Von Elm. 2007. Full publication of results initially presented in abstracts. Cochrane Database of Systematic Reviews 2:MR000005. [PubMed: 17443628]
  • Schmidt, L. M., and P. C. Gøtzsche. 2005. Of mites and men: Reference bias in narrative review articles: A systematic review. Journal of Family Practice 54(4):334–338. [PubMed: 15833223]
  • Schulz, K. F., L. Chalmers, R. J. Hayes, and D. G. Altman. 1995. Empirical evidence of bias: Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273(5):408–412. [PubMed: 7823387]
  • Scopus. 2010. Scopus in detail. http://info​​/scopus-in-detail/contentcoverage-guide/ (accessed May 28, 2010).
  • Shikata, S., T. Nakayama, Y. Noguchi, Y. Taji, and H. Yamagishi. 2006. Comparison of effects in randomized controlled trials with observational studies in digestive surgery. Annals of Surgery 244(5):668–676. [PMC free article: PMC1856609] [PubMed: 17060757]
  • Shojania, K. G., M. Sampson, M. T. Ansari, J. Ji, S. Doucette, and D. Moher. 2007. How quickly do systematic reviews go out of date? A survival analysis. Annals of Internal Medicine 147(4):224–233. [PubMed: 17638714]
  • Silagy, C. A., P. Middleton, and S. Hopewell. 2002. Publishing protocols of systematic reviews: Comparing what was done to what was planned. JAMA 287(21):2831–2834. [PubMed: 12038926]
  • Sismondo, S. 2008. Pharmaceutical company funding and its consequences: A qualitative systematic review. Contemporary Clinical Trials 29(2):109–113. [PubMed: 17919992]
  • Song, F., S. Parekh-Bhurke, L. Hooper, Y. K. Loke, J. J. Ryder, A. J. Sutton, C. B. Hing, and I. Harvey. 2009. Extent of publication bias in different categories of research cohorts: A meta-analysis of empirical studies. BMC Medical Research Methodology 9(1):79–93. [PMC free article: PMC2789098] [PubMed: 19941636]
  • Song, F., S. Parekh, L. Hooper, Y. K. Loke, J. Ryder, A. J. Sutton, C. Hing, C. S. Kwok, C. Pang, and I. Harvey. 2010. Dissemination and publication of research findings: An updated review of related biases. Health Technology Assessment 14(8):1–193. [PubMed: 20181324]
  • Sterne, J., M. Egger, and D. Moher, eds. 2008. Chapter 10: Addressing reporting biases. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, U.K.: The Cochrane Collaboration.
  • Sutton, A. J., S. Donegan, Y. Takwoingi, P. Garner, C. Gamble, and A. Donald. 2009. . Journal of Clinical Epidemiology 62(3):241–251. [PubMed: 18783919]
  • Thompson, R. L., E. V. Bandera, V. J. Burley, J. E. Cade, D. Forman, J. L. Freudenheim, D. Greenwood, D. R. Jacobs, Jr., R. V. Kalliecharan, L. H. Kushi, M. L. McCullough, L. M. Miles, D. F. Moore, J. A. Moreton, T. Rastogi, and M. J. Wiseman. 2008. Reproducibility of systematic literature reviews on food, nutrition, physical activity and endometrial cancer. Public Health Nutrition 11(10):1006–1014. [PubMed: 18053295]
  • Thomson Reuters. 2010. EndNote web information. http://endnote​.com/enwebinfo.asp (accessed July 2, 2010).
  • Tramer, M. R., D. J. Reynolds, R. A. Moore, and H. J. McQuay. 1997. Impact of covert duplicate publication on meta-analysis: A case study. BMJ 315(7109):635–640. [PMC free article: PMC2127450] [PubMed: 9310564]
  • Tricco, A. C., J. Tetzlaff, M. Sampson, D. Fergusson, E. Cogo, T. Horsley, and D. Moher. 2008. Few systematic reviews exist documenting the extent of bias: A systematic review. Journal of Clinical Epidemiology 61(5):422–434. [PubMed: 18394534]
  • Turner, E. H., A. M. Matthews, E. Linardatos, R. A. Tell, and R. Rosenthal. 2008. Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine 358(3):252–260. [PubMed: 18199864]
  • Van de Voorde, C., and C. Leonard. 2007. Search for evidence and critical appraisal. Brussels, Belgium: Belgian Health Care Knowledge Centre.
  • van Tulder, M. W., M. Suttorp, S. Morton, L. M. Bouter, and P. Shekelle. 2009. Empirical evidence of an association between internal validity and effect size in randomized controlled trials of low-back pain. Spine (Phila PA 1976) 34(16):1685–1692. [PubMed: 19770609]
  • Vandenbroucke, J. P. 2004. Benefits and harms of drug treatments: Observational studies and randomised trials should learn from each other. BMJ 329(7456):2–3. [PMC free article: PMC443425] [PubMed: 15231587]
  • Vedula, S. S., L. Bero, R. W. Scherer, and K. Dickersin. 2009. Outcome reporting in industry-sponsored trials of gabapentin for off-label use. New England Journal of Medicine 361(20):1963–1971. [PubMed: 19907043]
  • Voisin, C. E., C. de la Varre, L. Whitener, and G. Gartlehner. 2008. Strategies in assessing the need for updating evidence-based guidelines for six clinical topics: An exploration of two search methodologies. Health Information and Libraries Journal 25(3):198–207. [PubMed: 18796080]
  • von Elm, E., G. Poglia, B. Walder, and M. R. Tramer. 2004. Different patterns of duplicate publication: An analysis of articles used in systematic reviews. JAMA 291(8):974–980. [PubMed: 14982913]
  • Walker, C. F., K. Kordas, R. J. Stoltzfus, and R. E. Black. 2005. Interactive effects of iron and zinc on biochemical and functional outcomes in supplementation trials. American Journal of Clinical Nutrition 82(1):5–12. [PubMed: 16002793]
  • WAME (World Association of Medical Editors). 2010. Publication ethics policies for medical journals. http://www​​/publication-ethics-policies-for-medical-journals (accessed November 10, 2010).
  • Wennberg, D. E., F. L. Lucas, J. D. Birkmeyer, C. E. Bredenberg, and E. S. Fisher. 1998. Variation in carotid endarterectomy mortality in the Medicare population: Trial hospitals, volume, and patient characteristics. JAMA 279(16):1278–1281. [PubMed: 9565008]
  • Whitlock, E. P., E. A. O’Connor, S. B. Williams, T. L. Beil, and K. W. Lutz. 2008. Effectiveness of weight management programs in children and adolescents. Rockville, MD: AHRQ. [PMC free article: PMC4781137] [PubMed: 19408967]
  • WHO (World Health Organization). 2006. African Index Medicus. http://indexmedicus​ (accessed June 2, 2010).
  • WHO. 2010. International Clinical Trials Registry Platform. http://www​ (accessed June 17, 2010).
  • Wieland, S., and K. Dickersin. 2005. Selective exposure reporting and Medline indexing limited the search sensitivity for observational studies of the adverse effects of oral contraceptives. Journal of Clinical Epidemiology 58(6):560–567. [PubMed: 15878469]
  • Wilczynski, N. L., R. B. Haynes, A. Eady, B. Haynes, S. Marks, A. McKibbon, D. Morgan, C. Walker-Dilks, S. Walter, S. Werre, N. Wilczynski, and S. Wong. 2004. Developing optimal search strategies for detecting clinically sound prognostic studies in MEDLINE: An analytic survey. BMC Medicine 2:23. [PMC free article: PMC441418] [PubMed: 15189561]
  • Wilt, T. J. 2006. Comparison of endovascular and open surgical repairs for abdominal aortic aneurysm. Rockville, MD: AHRQ. [PMC free article: PMC4780951] [PubMed: 17764213]
  • Wood, A. J. J. 2009. Progress and deficiencies in the registration of clinical trials. New England Journal of Medicine 360(8):824–830. [PubMed: 19228628]
  • Wood, A. M., I. R. White, and S. G. Thompson. 2004. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clinical Trials 1(4):368–376. [PubMed: 16279275]
  • Wood, L., M. Egger, L. L. Gluud, K. F. Schulz, P. Jüni, D. G. Altman, C. Gluud, R. M. Martin, A. J. Wood, and J. A. Sterne. 2008. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: Meta-epidemiological study. BMJ 336(7644):601–605. [PMC free article: PMC2267990] [PubMed: 18316340]
  • Wortman, P. M., and W. H. Yeaton. 1983. Synthesis of results in controlled trials of coronary bypass graft surgery. In Evaluation studies review annual, edited by R. L. Light. Beverly Hills, CA: Sage.
  • Yoshii, A., D. A. Plaut, K. A. McGraw, M. J. Anderson, and K. E. Wellik. 2009. Analysis of the reporting of search strategies in Cochrane systematic reviews. Journal of the Medical Library Association 97(1):21–29. [PMC free article: PMC2605027] [PubMed: 19158999]
  • Zarin, D. A. 2005. Clinical trial registration. New England Journal of Medicine 352(15):1611. [PubMed: 15829551]
  • Zarin, D. A., T. Tse, and N. C. Ide. 2005. Trial registration at ClinicalTrials.gov between May and October 2005. New England Journal of Medicine 353(26):2779–2787. [PMC free article: PMC1568386] [PubMed: 16382064]



See Chapter 2 for the committee’s recommended standards for establishing the research protocol.


For more information on the Cochrane Information Retrieval Methods Group, go to http://irmg​


MeSH (Medical Subject Headings) is the National Library of Medicine’s controlled vocabulary thesaurus.


The Morrison study excluded complementary and alternative medicine interventions.


In literature searching, “sensitivity” is the proportion of relevant articles that are identified using a specific search strategy; “precision” refers to the proportion of articles identified by a search strategy that are relevant (CRD, 2009).
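
As a rough illustration of these two definitions, the following Python sketch computes both measures from sets of record identifiers. The function name `search_performance` and the example identifiers are hypothetical and chosen only for illustration; the sketch assumes that the full set of relevant records is known (for example, from a hand-searched reference set), which is rarely the case in practice.

```python
def search_performance(retrieved, relevant):
    """Return (sensitivity, precision) for one search strategy.

    retrieved: identifiers of records returned by the search
    relevant:  identifiers of all records known to be relevant
               (assumed here to come from a reference set)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records the search found
    # Sensitivity: share of all relevant records that were retrieved.
    sensitivity = len(hits) / len(relevant) if relevant else float("nan")
    # Precision: share of retrieved records that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else float("nan")
    return sensitivity, precision


# Hypothetical example: 4 relevant records exist; the search returns
# 10 records, 3 of which are relevant.
relevant = {"r1", "r2", "r3", "r4"}
retrieved = {"r1", "r2", "r3"} | {f"x{i}" for i in range(7)}
sensitivity, precision = search_performance(retrieved, relevant)
print(sensitivity, precision)  # 0.75 0.3
```

In this example the search finds three of the four relevant records (sensitivity 0.75), but only three of the ten records it returns are relevant (precision 0.3), which illustrates the usual trade-off between the two measures.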


Public Law 105-115 sec. 113.


Public Law 110-85.


Phase I trials are excluded.


Required data include demographic and baseline characteristics of the patients, the number of patients lost to follow-up, the number excluded from the analysis, and the primary and secondary outcomes measures (including a table of values with appropriate tests of the statistical significance of the values) (Miller, 2010).


NDA data were not easily accessed at the time of the MacLean study; the investigators had to collect the data through a Freedom of Information Act request.


The ACP Journal Club, once a stand-alone bimonthly journal, is now a monthly feature of the Annals of Internal Medicine. The club’s purpose is to feature structured abstracts (with commentaries from clinical experts) of the best original and review articles in internal medicine and other specialties. For more information go to www​


Personal communication, Stephanie Chang, Medical Officer, AHRQ (March 12, 2010).


See Chapter 5 for a complete review of SR reporting issues.


Qualitative and quantitative synthesis methods are the subject of Chapter 4.


Allocation concealment is a method used to prevent selection bias in clinical trials by concealing the allocation sequence from those assigning participants to intervention groups. Allocation concealment prevents researchers from (unconsciously or otherwise) influencing the intervention group to which each participant is assigned.


“PICOTS” is a commonly used mnemonic for guiding the formulation of an SR’s research question. The acronym refers to: Population, Intervention, Comparator, Outcomes, Timing, and Setting. Some systematic review teams use an abbreviated form such as PICO or PICOS.


Chapter 4 addresses the assessment of a body of evidence.


See Chapter 2, Standard 2.6 (Develop a systematic review protocol).

Copyright 2011 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK209517

