Appendix D: Methods used to assess quality of studies

Study quality was objectively assessed using predetermined criteria for internal validity, based on a combination of the US Preventive Services Task Force and the National Health Service Centre for Reviews and Dissemination criteria.1, 2

All included studies, regardless of design, were assessed for quality and assigned a rating of “good,” “fair,” or “poor.” Studies with a fatal flaw were rated poor quality. A fatal flaw is failure to meet combinations of criteria that, taken together, indicate the presence of bias; an example would be inadequate procedures for allocation concealment combined with important differences in prognostic factors at baseline. Studies that met all criteria were rated good quality; the remainder were rated fair quality. Because the fair-quality category was broad, studies with this rating varied in their strengths and weaknesses: the results of some fair-quality studies were likely to be valid, while others were only possibly valid. A poor-quality trial was not considered valid; its results were at least as likely to reflect flaws in the study design as a true difference between the compared drugs.

Criteria for assessing applicability (external validity) are also listed, although they were not used to determine study quality.

Systematic Reviews

1. Does the systematic review report a clear review question and clearly state inclusion and exclusion criteria for primary studies?

A good-quality review focuses on a well-defined question or set of questions, ideally framed in terms of the inclusion/exclusion criteria by which decisions are made about whether to include or exclude primary studies. These criteria should relate to four components: study design, indications (patient populations), interventions (drugs), and outcomes of interest. A good-quality review also includes details about the process of decision-making, that is, how many reviewers were involved, whether the studies were examined independently, and how disagreements between reviewers were resolved.

2. Is there evidence of a substantial effort to find all relevant research?

If details of electronic database searches and other identification strategies are given, the answer to this question is usually yes. Ideally, search terms, date restrictions, and language restrictions are presented. In addition, descriptions of hand-searches, attempts to identify unpublished material, and any contact with authors, industry, or research institutes should be provided. The appropriateness of the database(s) searched by the authors should also be considered. For example, if only MEDLINE is searched for a systematic review about health education, then it is unlikely that all relevant studies will be located.

3. Is the validity of included studies adequately assessed?

If the review systematically assesses the quality of primary studies, it should include an explanation of the basis for determining quality (for example, method of randomization, whether outcome assessment was blinded, whether analysis was on an intention-to-treat basis) and the process by which assessment is carried out (that is, how many reviewers are involved, whether the assessment is independent, and how discrepancies between reviewers are resolved). Authors may have used either a published checklist or scale or one that they designed specifically for their review.

4. Is sufficient detail of the individual studies presented?

The review should show that the included studies are suitable to answer the question posed and that a judgment on the appropriateness of the authors' conclusions can be made. If a paper includes a table giving information on the design and results of individual studies or includes a narrative description of the studies, this criterion is usually fulfilled. If relevant, the tables or text should include information on study design, sample size for each study group, patient characteristics, interventions, settings, outcome measures, follow-up, drop-out rate (withdrawals), effectiveness results, and adverse events.

5. Are the primary studies summarized appropriately?

The authors should attempt to synthesize the results from individual studies. In all cases, there should be a narrative summary of results, which may or may not be accompanied by a quantitative summary (meta-analysis).

For reviews that use a meta-analysis, heterogeneity between studies should be assessed using statistical techniques. If heterogeneity is present, the possible reasons (including chance) should be investigated. In addition, the individual evaluations should be weighted in some way (for example, according to sample size or according to the inverse of the variance) so that studies that are thought to provide the most reliable data have greater impact on the summary statistic.
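The inverse-variance weighting and heterogeneity assessment described above can be sketched in code. This is an illustration only, not part of the review methods; the study estimates and variances below are hypothetical.

```python
import math

def inverse_variance_pool(estimates, variances):
    """Fixed-effect pooled estimate: weight each study by 1/variance,
    so the most precise studies have the greatest impact on the summary."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate,
    # used to assess between-study heterogeneity (compare to a chi-square
    # distribution with k - 1 degrees of freedom)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    k = len(estimates)
    # I-squared: proportion of total variation attributable to heterogeneity
    # rather than chance (truncated at zero)
    i_squared = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, q, i_squared, pooled_se

# Hypothetical effect estimates (e.g., log odds ratios) and variances
# from three trials
pooled, q, i2, se = inverse_variance_pool([0.20, 0.35, 0.10], [0.04, 0.09, 0.02])
```

If Q is large relative to its degrees of freedom (or I-squared is high), the reviewer would investigate possible reasons for heterogeneity, including chance, before relying on the pooled summary.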

Controlled Trials

Assessment of Internal Validity

  1. Was the assignment to the treatment groups really random?
    • Adequate approaches to sequence generation:
      • Computer-generated random numbers
      • Random number tables
    • Inferior approaches to sequence generation:
      • Use of alternation, case record number, birth date, or day of week
    • Not reported
  2. Was the treatment allocation concealed?
    • Adequate approaches to concealment of randomization:
      • Centralized or pharmacy-controlled randomization
      • Serially-numbered identical containers
      • On-site computer-based system with a randomization sequence that is not readable until allocation
      • Other approaches that conceal the allocation sequence from clinicians and patients
    • Inferior approaches to concealment of randomization:
      • Use of alternation, case record number, birth date, or day of week
      • Open random numbers lists
      • Serially numbered envelopes (even sealed opaque envelopes can be subject to manipulation)
    • Not reported
  3. Were the groups similar at baseline in terms of prognostic factors?
  4. Were the eligibility criteria specified?
  5. Were outcome assessors blinded to treatment allocation?
  6. Was the care provider blinded?
  7. Was the patient kept unaware of the treatment received?
  8. Did the article include an intention-to-treat analysis or provide the data needed to calculate it (that is, number assigned to each group, number of subjects who finished in each group and their results)?
  9. Did the study maintain comparable groups?
  10. Did the article report attrition, crossovers, adherence, and contamination?
  11. Was there important differential loss to follow-up or overall high loss to follow-up? (The study should give numbers for each group.)
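Criteria 1 and 2 above concern how the allocation sequence was generated and concealed. As a hypothetical illustration (not part of the assessment criteria themselves), an adequate computer-generated approach such as permuted-block randomization can be sketched as:

```python
import random

def blocked_randomization(n_patients, block_size=4, arms=("A", "B"), seed=None):
    """Generate a permuted-block allocation sequence.

    Each block contains an equal number of assignments to each arm, shuffled
    within the block, so group sizes stay balanced throughout recruitment.
    In practice the sequence would be generated and held centrally (e.g.,
    pharmacy-controlled) so that allocation remains concealed from the
    clinicians recruiting patients.
    """
    assert block_size % len(arms) == 0, "block size must be a multiple of the number of arms"
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # computer-generated random ordering within the block
        sequence.extend(block)
    return sequence[:n_patients]

seq = blocked_randomization(12, block_size=4, seed=42)
```

By contrast, the inferior approaches listed above (alternation, case record number, birth date, day of week) produce predictable sequences, which is why they undermine both sequence generation and concealment.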

Assessment of External Validity (Generalizability)

  1. How similar is the population to the population to whom the intervention would be applied?
  2. How many patients were recruited?
  3. What were the exclusion criteria for recruitment? (Study should give number excluded at each step.)
  4. What was the funding source and role of funder in the study?
  5. Did the control group receive the standard of care?
  6. What was the length of follow-up? (Give numbers at each stage of attrition.)

Nonrandomized Studies

Assessment of Internal Validity

  1. Was the selection of patients for inclusion unbiased? (Was any group of patients systematically excluded?)
  2. Was there important differential loss to follow-up or overall high loss to follow-up? (Numbers should be given for each group.)
  3. Were the events investigated specified and defined?
  4. Was there a clear description of the techniques used to identify the events?
  5. Was there unbiased and accurate ascertainment of events (that is, by independent ascertainers using a validated ascertainment technique)?
  6. Were potential confounding variables and risk factors identified and examined using acceptable statistical techniques?
  7. Was the duration of follow-up reasonable for investigated events? (Did it meet the stated threshold?)

Assessment of External Validity

  1. Was the population described adequately?
  2. How similar was the population to the population to whom the intervention would be applied?
  3. How many patients were recruited?
  4. What were the exclusion criteria for recruitment? (The study should give numbers excluded at each step.)
  5. What was the funding source and role of funder in the study?


1. Anonymous. Undertaking systematic reviews of research on effectiveness: CRD's guidance for those carrying out or commissioning reviews. CRD Report. 2001(4).
2. Harris RP, Helfand M, Woolf SH. Current methods of the US Preventive Services Task Force: a review of the process. American Journal of Preventive Medicine. 2001;20(3 Suppl):21–35. [PubMed: 11306229]