Topic Development

The topic for this report was nominated in a public process. With input from technical experts, the Scientific Resource Center (SRC) for the Agency for Healthcare Research and Quality (AHRQ) Effective Health Care Program drafted the initial key questions and, after approval from AHRQ, posted them to a public Web site. The public was invited to comment on these questions. After reviewing the public commentary, the SRC drafted final key questions and submitted them to AHRQ for approval.

Search Strategy

We conducted a comprehensive search of the scientific literature to identify systematic reviews, randomized controlled trials, and nonrandomized comparative studies relevant to the key questions. Searches of electronic databases used the National Library of Medicine’s Medical Subject Headings (MeSH) keyword nomenclature developed for MEDLINE® and adapted for use in other databases. Searches included terms for drug interventions, hypertension, and study design, and were limited to studies published in English after 1988. The texts of the major search strategies are given in Appendix A. We also reviewed selected materials received from the SRC, the reference lists of relevant review articles, and citations identified by peer and public reviewers of the draft report. We did not undertake a systematic search for unpublished data.

To identify literature describing direct comparisons of angiotensin-converting enzyme inhibitors (ACEIs) versus angiotensin II receptor blockers (ARBs), we searched:

  • MEDLINE® (1966 to May Week 3 2006).
  • The Cochrane Central Register of Controlled Trials.
  • A register of systematic reviews underway in the Cochrane Hypertension Review Group.
  • Scientific information packets submitted through the SRC by AstraZeneca, Bristol-Myers Squibb, Kos, and Merck.

We conducted additional MEDLINE® searches for studies of ARBs versus other (non-ACEI) comparators and ACEIs versus other (non-ARB) comparators, for potential use if evidence from direct head-to-head trials proved insufficient for some or all of the outcomes of interest in this review. The search strategies used to identify this potentially relevant indirect comparator literature are included in Appendix A. The process used to screen this literature and evaluate its relevance is described in Appendix B.

Our searches identified a total of 1,185 citations. We imported all citations into an electronic database (ProCite® 4).

Study Selection

We developed criteria for inclusion and exclusion based on the patient populations, interventions, and outcome measures specified in the key questions. The abstract screening criteria we used (Appendix C) were designed to identify potentially relevant indirect comparator studies (ACEI versus non-ARB or placebo and ARB versus non-ACEI or placebo), as well as direct head-to-head comparator studies. We retrieved the full text of all potentially relevant abstracts for further review. In the case of direct comparator studies, we applied a second, more stringent set of criteria for inclusion and exclusion (Appendix C). Full-text screening of the indirect comparative literature proceeded along a separate track, which is described in Appendix B.

The remainder of this section describes in greater detail the criteria we used to screen the direct comparator literature.

Population and Condition of Interest

As specified in the key questions, this review focused on adult patients (age 18 years or older) with essential hypertension, as defined by study authors. We included studies with patients of mixed ages and mixed diagnoses only if results were reported separately for the relevant subgroups.

Interventions and Comparators of Interest

We included the ACEIs and ARBs listed in Table 1. In addition to straightforward comparisons of a single ACEI versus a single ARB, we also included “grouped” comparisons (e.g., a specific ARB versus “ACEIs” or unspecified “ARBs” versus unspecified “ACEIs”) and comparisons of an ACEI + drug X versus an ARB + drug X (e.g., losartan + hydrochlorothiazide [HCTZ] versus enalapril + HCTZ). We excluded comparisons of an ACEI + drug X versus an ARB + drug Y (e.g., enalapril + manidipine vs. irbesartan + HCTZ).

Table 1

Characteristics and labeled indications of ACEIs and ARBs evaluated in this report

We included studies whose treatment protocols permitted the addition of other antihypertensive medications during the trial if certain blood pressure targets were not met, provided the cointervention protocols were the same in both groups.

Outcomes of Interest

We considered a wide range of outcomes pertaining to the long-term benefits and harms of ACEIs versus ARBs. These are listed above in the section on “Scope and Key Questions.” In somewhat greater detail, and in order of relative priority, these outcomes were:

  • Blood pressure control (we preferred seated trough blood pressure, where reported).
  • Mortality (all-cause, cardiovascular disease-specific, and cerebrovascular disease-specific).
  • Morbidity (especially major cardiovascular events [MI, stroke] and measures of quality of life).
  • Safety (focusing on serious adverse event rates, overall adverse event rates, and withdrawals due to adverse events).
  • Specific adverse events (including, but not limited to, cough and angioedema).
  • Persistence/adherence.
  • Rate of use of a single antihypertensive for blood pressure control.
  • Other intermediate outcomes:
    • Lipid levels (high-density lipoprotein [HDL], low-density lipoprotein [LDL], total cholesterol [TC], and triglyceride [TG]).
    • Rates of progression to type 2 diabetes.
    • Markers of carbohydrate metabolism/diabetes control (glycated hemoglobin [HbA1c], insulin or other diabetes medication dosage, fasting plasma glucose, or aggregated measures of serial glucose measurements).
    • Measures of LV mass/function (left ventricular mass index [LVMI] and ejection fraction [LVEF]).
    • Measures of kidney disease (creatinine/glomerular filtration rate [GFR], proteinuria).

The key questions ask about the comparative long-term benefits and harms of ACEIs versus ARBs for treating essential hypertension, but do not define precisely what is meant by “long-term.” We initially interpreted this to mean 6 months or longer, but decided after the abstract screening to reduce this to 12 weeks or longer. We made this decision for two reasons: (1) the distribution of length of followup was highly skewed toward shorter duration, so that a longer threshold would have excluded nearly all head-to-head studies of ACEIs and ARBs; (2) a strong differential benefit or harm detected in a short-duration study could be important to identify, especially if similar effects were suggested, perhaps less strongly, by longer-term studies.

Types of Studies

We included comparative clinical studies of any design, including randomized controlled trials (RCTs), nonrandomized controlled clinical trials, retrospective and prospective cohort studies, and case-control studies.

We excluded studies with fewer than 20 total patients in the ACEI and ARB treatment arms.
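The screening thresholds described in this section can be expressed as a simple check. The sketch below is illustrative only: the `Study` record and its field names are hypothetical (they do not mirror the abstraction form in Appendix D), and it covers just three of the criteria stated above, namely followup of 12 weeks or longer, at least 20 total patients across the ACEI and ARB arms, and the same add-on drug in both arms.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_FOLLOWUP_WEEKS = 12  # "long-term" threshold adopted after abstract screening
MIN_TOTAL_PATIENTS = 20  # combined size of the ACEI and ARB arms


@dataclass
class Study:
    # Hypothetical fields, chosen for illustration only.
    followup_weeks: float
    n_acei: int
    n_arb: int
    acei_codrug: Optional[str] = None  # e.g., "HCTZ"; None if monotherapy
    arb_codrug: Optional[str] = None


def _normalize(drug: Optional[str]) -> Optional[str]:
    """Compare add-on drug names case-insensitively."""
    return drug.strip().lower() if drug else None


def exclusion_reasons(study: Study) -> List[str]:
    """Return the reasons a study fails screening (empty list = passes)."""
    reasons = []
    if study.followup_weeks < MIN_FOLLOWUP_WEEKS:
        reasons.append("followup shorter than 12 weeks")
    if study.n_acei + study.n_arb < MIN_TOTAL_PATIENTS:
        reasons.append("fewer than 20 total patients in ACEI and ARB arms")
    if _normalize(study.acei_codrug) != _normalize(study.arb_codrug):
        reasons.append("different add-on drugs in the two arms")
    return reasons
```

For example, a 12-week trial of losartan + HCTZ versus enalapril + HCTZ with 10 patients per arm would return an empty list, whereas an 8-week trial of enalapril + manidipine versus irbesartan + HCTZ would be flagged on both duration and add-on drug.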

Data Extraction

We developed a data abstraction form/evidence table template for abstracting data from the included studies (Appendix D) and used the same form for all study designs and to capture data relevant to all three key questions. Abstractors worked in pairs: the first abstracted the data, and the second over-read the article and the accompanying abstraction to check for accuracy and completeness. The completed evidence table is provided in Appendix E.

We extracted the following data from included trials: geographical location; funding source; study design; interventions (including dose, duration, dose titration protocol [if any], and cointerventions [if any]); population characteristics (including age, sex, race/ethnicity, baseline blood pressure, concurrent medications, and comorbidities); recruitment setting; inclusion and exclusion criteria; numbers screened, eligible, enrolled, and lost to followup; and results for each outcome.

Quality Assessment

We used predefined criteria to assess the quality of individual controlled trials and prospective or retrospective observational (cohort) studies. These criteria were adapted from those developed by the U.S. Preventive Services Task Force (USPSTF) and the Centre for Reviews and Dissemination (CRD).7,8

Individual studies were graded as “good,” “fair,” or “poor” in quality according to the following definitions:

A “good” study has the least bias and results are considered valid. A good study has a clear description of the population, setting, interventions, and comparison groups; uses a valid approach to allocate patients to alternative treatments; has a low dropout rate; and uses appropriate means to prevent bias, measure outcomes, and analyze and report results.

A “fair” study is susceptible to some bias, but probably not sufficient to invalidate the results. The study may be missing information, making it difficult to assess limitations and potential problems. As the fair-quality category is broad, studies with this rating vary in their strengths and weaknesses. The results of some fair-quality studies are possibly valid, while others are probably valid.

A “poor” rating indicates significant bias that may invalidate the results. These studies have serious errors in design, analysis, or reporting; have large amounts of missing information; or have discrepancies in reporting. The results of a poor-quality study are at least as likely to reflect flaws in the study design as to indicate true differences between the compared interventions.

If a study was rated as fair or poor, assessors were instructed to note important limitations on internal validity based on the USPSTF/CRD criteria, as adapted here:

  1. Initial assembly of comparable groups:
    • For RCTs: Adequate randomization, including concealment and whether potential confounders were distributed equally among groups.
    • For cohort studies: Consideration of potential confounders with either restriction or measurement for adjustment in the analysis; consideration of inception cohorts.
  2. Maintenance of comparable groups (includes attrition, crossovers, adherence, and contamination).
  3. Important differential loss to followup or overall high loss to followup.
  4. Measurements: Equal, reliable, and valid (includes masking of outcome assessment).
  5. Clear definition of interventions.
  6. All important outcomes considered.
  7. Analysis: Adjustment for potential confounders for cohort studies, or intention-to-treat analysis for RCTs.

Assessment of each study’s quality was made by a single rater and then evaluated by a second rater. Finally, quality assessments were reviewed across studies. Disagreements were resolved by consensus. Final quality assessments for individual studies are included in the evidence table (Appendix E).


Applicability

We did not provide a global rating of applicability (such as “high” or “low”) because applicability may differ substantially depending on the user of this report. Instead, we assessed the applicability of research studies by noting the most important potential limitations in a study’s applicability from among the list described by Rothwell.9 These criteria, slightly adapted by the SRC, are reproduced in Appendix F. Assessors were instructed to list the most important (up to three) limitations affecting applicability, if any, based on this list.

Throughout this report, we highlight effectiveness studies conducted in primary care or office-based settings that use less stringent eligibility criteria, assess health outcomes, and have longer followup periods than most efficacy studies. The results of effectiveness studies are more applicable to the spectrum of patients that will use a drug, have a test, or undergo a procedure than results from highly selected populations in efficacy studies.

Rating the Body of Evidence

We assessed the strength of the body of evidence for each key question using the GRADE framework.10 In rating the strength of evidence we considered the number of studies, the size of the studies, strength of study design, and the quality of individual studies. In addition, as part of the GRADE framework, we assessed the consistency across studies of the same design, consistency across different study designs, the magnitude of effect, and applicability. Finally, if applicable, we considered the likelihood of publication bias and (especially for observational studies) the potential influence of plausible confounders. We commented specifically when it was difficult or impossible to assess certain of these dimensions. The overall strength of a given body of evidence was rated qualitatively using the following four-level scale:

High – Further research is very unlikely to change our confidence in the estimate of effect.

Moderate – Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.

Low – Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.

Very low – Any estimate of effect is very uncertain.

Data Synthesis

Given that many studies did not have the statistical power to determine equivalence for the outcomes relevant to this review (which were often not the primary outcomes evaluated by study investigators), we considered pooling in an attempt to reduce the risk of type II error.

In evaluating groups of studies reporting the same or similar outcomes for potential data synthesis, we primarily considered clinical homogeneity. In this assessment, we tended to be inclusive of individual studies unless their populations were clearly dissimilar (e.g., when considering renal outcomes, we chose to exclude studies of patients with renal failure from the pooled analysis). We considered groups of studies to be suitable candidates for a quantitative synthesis when we were able to identify at least four reasonably clinically similar studies that assessed the same outcome (e.g., when considering effects on lipids, we chose not to pool, as the group included different lipid measures). While not proof of the validity of this approach, it is notable that there were no situations in which pooled estimates of relative efficacy regarding a particular outcome were contrary to the global impression of the reviewers.

When we calculated summary effect sizes, we stratified these by study design, separating RCTs from observational studies. We used Comprehensive Meta-analysis Version 2 (Borenstein M, Hedges L, Higgins J, Rothstein H. Comprehensive Meta-analysis Version 2, Biostat, Englewood NJ [2005]) to test for heterogeneity and to pool (while recognizing that the ability of statistical methods to detect heterogeneity is limited, particularly when the number of studies is small). In the presence of statistical heterogeneity, we evaluated likely explanatory clinical and methodological study characteristics to determine whether they could explain the heterogeneity observed. If, after this further scrutiny, studies appeared to be clinically and methodologically similar, we performed pooling even in the presence of statistical heterogeneity. Pooled estimates combining both study designs were also calculated in order to estimate confidence limits for an overall effect.
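The kind of heterogeneity test and pooled estimate described above can be sketched as follows. This is a minimal illustration, not a reproduction of what Comprehensive Meta-analysis computes internally: it assumes inverse-variance weighting, Cochran's Q with the derived I² statistic, and the DerSimonian-Laird estimator of between-study variance, none of which is specified in this report. The function name `pool_effects` and its return keys are hypothetical.

```python
import math
from typing import Dict, Sequence


def pool_effects(effects: Sequence[float],
                 variances: Sequence[float]) -> Dict[str, float]:
    """Pool per-study effects (e.g., log odds ratios) and test heterogeneity.

    Assumes inverse-variance weights and the DerSimonian-Laird estimator;
    both are illustrative choices, not taken from the report.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Fixed-effect (inverse-variance) estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and I^2 (percentage of variation beyond chance).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects estimate: each study's variance is inflated by tau^2.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    random_effect = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re

    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sum_w),
        "random": random_effect,
        "se_random": math.sqrt(1.0 / sum_w_re),
        "q": q,
        "df": float(df),
        "i2": i2,
        "tau2": tau2,
    }
```

For ratio measures such as odds ratios, the inputs would be log odds ratios and their variances, and the pooled values would be exponentiated for reporting. With few studies, Q has low power, which is the limitation noted above.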

When pooling was performed, we used the random-effects model for the primary analysis; in addition, we present summary estimates derived using the fixed-effect model as a sensitivity analysis. Furthermore, for count outcomes, we calculated a summary of the relative effect (odds ratio) and absolute effect (risk difference). When the results from statistical testing were similar, we present the outcome that we judged to be most clinically relevant. We also present the number-needed-to-treat (NNT) when effects are statistically significant. In calculating the NNT, we used either the inverse of the risk difference (when risk difference is presented as the pooling measure), or the inverse of an estimated difference based on an average control event rate and a relative measure of effect (when odds ratio is used as the measure for pooling).
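The two NNT calculations described above can be illustrated with a short sketch. The function names are hypothetical, and the conversion from an odds ratio uses the standard relationship between odds and risk applied to an assumed average control event rate; the report does not give the exact formula it used.

```python
def nnt_from_risk_difference(risk_difference: float) -> float:
    """NNT as the inverse of the absolute risk difference."""
    if risk_difference == 0:
        raise ValueError("NNT is undefined when the risk difference is zero")
    return 1.0 / abs(risk_difference)


def nnt_from_odds_ratio(odds_ratio: float, control_event_rate: float) -> float:
    """NNT from a pooled odds ratio and an average control event rate.

    The treated-group event rate is derived from the odds ratio (OR) and
    the control event rate (CER):
        EER = OR * CER / (1 - CER + OR * CER)
    and the NNT is the inverse of |CER - EER|.
    """
    cer = control_event_rate
    if not 0.0 < cer < 1.0:
        raise ValueError("control event rate must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    eer = odds_ratio * cer / (1.0 - cer + odds_ratio * cer)
    difference = abs(cer - eer)
    if difference == 0:
        raise ValueError("NNT is undefined when the odds ratio is 1")
    return 1.0 / difference


# Example: odds ratio 0.5 with a 20% control event rate.
# EER = 0.1 / 0.9, so the risk difference is 4/45 and the NNT is 11.25.
example_nnt = nnt_from_odds_ratio(0.5, 0.20)
```

The functions return unrounded values; by convention, an NNT is rounded up to the next whole number when reported.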

Given the dearth of studies of the same ACEI versus ARB comparison, and the presumed general similarity of each class, when studies were combined, pooling was performed without regard to the specific drug within the ACEI or ARB class. Also, we did not specifically consider study design in deciding whether to pool, but when we did pool, we stratified the analysis to examine differences between observational studies and randomized controlled trials, as described above.

In deciding whether to pool indirect comparison studies, we adopted a similar approach. However, given the more tenuous nature of indirect comparisons, we used specific quantitative criteria for pooling (see Appendix B).