Proton pump inhibitors decrease secretion of gastric acid. They act by blocking the last enzyme in the system that actively transports acid from gastric parietal cells into the gastrointestinal lumen, hydrogen–potassium adenosine triphosphatase, also known as the proton pump. Omeprazole, the first drug in this class, was introduced in 1989. Since then, 4 other proton pump inhibitors have been introduced: lansoprazole (1995), rabeprazole (1999), pantoprazole (2000), and esomeprazole (2001). In 2003 omeprazole became available over-the-counter in the United States. The formulation for the over-the-counter product is omeprazole magnesium, available in other countries as omeprazole multiple unit pellet system. Omeprazole is also available in combination with sodium bicarbonate (Zegerid). Table 1 provides an accounting of indications of different proton pump inhibitors.

Table 1

Proton pump inhibitors and their US Food and Drug Administration-approved indications

Proton pump inhibitors are mainly used to treat symptoms of gastroesophageal reflux disease and gastritis. Often, they are used only after therapy with histamine-2 (H2) receptor antagonists, commonly called H2 blockers, has been unsuccessful for symptoms of reflux. Proton pump inhibitors also are used to treat peptic ulcers (duodenal and gastric) and drug-induced ulcers, such as those associated with nonsteroidal anti-inflammatory drugs; the bacterium that causes ulcers, Helicobacter pylori, is eradicated by treatment with a proton pump inhibitor and antibiotics. Proton pump inhibitors also are used to promote healing of erosive esophagitis. Esophagitis can lead to scarring and narrowing of the esophagus (stricture) or to Barrett esophagus, which is a risk factor for esophageal cancer.

Evidence-based reviews usually emphasize health outcomes—events or conditions that patients can feel or experience. Heartburn, waking at night, acid regurgitation, and quality of life are health outcomes. But severity of symptoms is not a reliable indicator of esophagitis; patients without esophagitis can experience severe heartburn, and some patients who have esophagitis have no symptoms. Consequently, esophagitis is diagnosed by direct visualization via endoscopy. Esophagitis appears as a tear, break, or ulceration in the lining of the esophagus. When esophagitis has healed, the ulceration has been completely reepithelialized, as viewed during endoscopy. This endoscopically verified healing often is used as an intermediate outcome measure for esophagitis.

For ulcer disease, quick relief of symptoms is an important health outcome. But in the long run, the most important determinant of functional status and quality of life is prevention of recurrence of ulcers and their complications (bleeding, hospitalization, and death). Historically, studies of proton pump inhibitors for ulcer disease have been too short to address these outcomes directly. So instead, they report intermediate outcome measures. In the past the most common intermediate outcome measure was endoscopic healing, meaning that on endoscopy after treatment the ulcer is gone. But because ulcer disease tends to recur even when the initial ulcer has completely healed, endoscopic healing, while important as a predictor of relapse, is an imperfect indicator of long-term morbidity from ulcer disease. Since the discovery that Helicobacter pylori causes many peptic ulcers, eradication of Helicobacter pylori has emerged as a more important indicator of the long-term outcome of treatment. Long-term studies have shown that eradication reduces the risk of ulcers and ulcer complications for several years.

Purpose and Limitations of Systematic Reviews

Systematic reviews, also called evidence reviews, are the foundation of evidence-based practice. They focus on the strength and limits of evidence from studies about the effectiveness of a clinical intervention. Systematic reviews begin with careful formulation of research questions. The goal is to select questions that are important to patients and clinicians and then to examine how well the scientific literature answers those questions. Terms commonly used in systematic reviews, such as statistical terms, are provided in Appendix A and are defined as they apply to reports produced by the Drug Effectiveness Review Project.

Systematic reviews emphasize the patient’s perspective in the choice of outcome measures used to answer research questions. Studies that measure health outcomes (events or conditions that the patient can feel, such as fractures, functional status, and quality of life) are preferred over studies of intermediate outcomes (such as change in bone density). Reviews also emphasize measures that are easily interpreted in a clinical context. Specifically, measures of absolute risk or the probability of disease are preferred to measures such as relative risk. The difference in absolute risk between interventions depends on the number of events in each group, such that the difference (absolute risk reduction) is smaller when there are fewer events. In contrast, the difference in relative risk is fairly constant between groups with different baseline risk for the event, such that the difference (relative risk reduction) is similar across these groups. Relative risk reduction often appears more impressive than absolute risk reduction. Another useful measure is the number needed to treat (or harm). The number needed to treat is the number of patients who would need to be treated with an intervention for 1 additional patient to benefit (experience a positive outcome or avoid a negative outcome). The number needed to treat is calculated as the reciprocal of the absolute risk reduction.
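As a rough illustration of how these measures relate, the sketch below computes the absolute risk reduction, relative risk reduction, and number needed to treat for two hypothetical comparisons. The event counts are invented for illustration and are not drawn from any study in this review. The function name `risk_measures` is likewise an assumption, and the number needed to treat is rounded up to a whole patient, a common convention.

```python
from fractions import Fraction
from math import ceil


def risk_measures(events_control, n_control, events_treated, n_treated):
    """Return (ARR, RRR, NNT) for a two-arm comparison of an unwanted event."""
    risk_control = Fraction(events_control, n_control)
    risk_treated = Fraction(events_treated, n_treated)
    arr = risk_control - risk_treated  # absolute risk reduction
    if arr <= 0:
        # This sketch only covers a treatment that lowers the risk.
        raise ValueError("treated risk must be lower than control risk")
    rrr = arr / risk_control           # relative risk reduction
    nnt = ceil(1 / arr)                # number needed to treat, rounded up
    return arr, rrr, nnt


# Hypothetical counts, chosen for illustration only.
high_risk = risk_measures(20, 100, 10, 100)  # 20% vs 10% event rate
low_risk = risk_measures(2, 100, 1, 100)     # 2% vs 1% event rate

for label, (arr, rrr, nnt) in [("high baseline risk", high_risk),
                               ("low baseline risk", low_risk)]:
    print(f"{label}: ARR={float(arr):.2f}, RRR={float(rrr):.2f}, NNT={nnt}")
```

In both hypothetical groups the relative risk reduction is 50%, but the absolute risk reduction falls from 10 percentage points to 1 percentage point, so the number needed to treat rises from 10 to 100. This is why the same relative effect can look far more impressive than the absolute benefit when events are uncommon.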

Systematic reviews weigh the quality of the evidence, allowing a greater contribution from studies that meet high methodological standards and, thereby, reducing the likelihood of biased results. In general, for questions about the relative benefit of a drug, the results of well-executed randomized controlled trials are considered better evidence than results of cohort, case-control, and cross-sectional studies. In turn, these studies provide better evidence than uncontrolled trials and case series. For questions about tolerability and harms, observational study designs may provide important information that is not available from controlled trials. Within the hierarchy of observational studies, well-conducted cohort designs are preferred for assessing a common outcome. Case-control studies are preferred only when the outcome measure is rare and the study is well conducted.

Systematic reviews pay particular attention to whether results of efficacy studies can be generalized to broader applications. Efficacy studies provide the best information about how a drug performs in a controlled setting. These studies attempt to tightly control potential confounding factors and bias; however, for this reason the results of efficacy studies may not be applicable to many, and sometimes to most, patients seen in everyday practice. Most efficacy studies use strict eligibility criteria that may exclude patients based on their age, sex, adherence to treatment, or severity of illness. For many drug classes, including the antipsychotics, unstable or severely impaired patients are often excluded from trials. In addition, efficacy studies frequently exclude patients who have comorbid disease, meaning disease other than the one under study. Efficacy studies may also use dosing regimens and follow-up protocols that are impractical in typical practice settings. These studies often restrict options that are of value in actual practice, such as combination therapies and switching to other drugs. Efficacy studies also often examine the short-term effects of drugs that in practice are used for much longer periods. Finally, efficacy studies tend to assess effects by using objective measures that do not capture all of the benefits and harms of a drug or do not reflect the outcomes that are most important to patients and their families.

Systematic reviews highlight studies that reflect actual clinical effectiveness in unselected patients and community practice settings. Effectiveness studies conducted in primary care or office-based settings use less stringent eligibility criteria, more often assess health outcomes, and have longer follow-up periods than most efficacy studies. The results of effectiveness studies are more applicable to the “average” patient than results from the highly selected populations in efficacy studies. Examples of effectiveness outcomes include quality of life, frequency or duration of hospitalizations, social function, and the ability to work. These outcomes are more important to patients, family, and care providers than surrogate or intermediate measures, such as scores based on psychometric scales.

Efficacy and effectiveness studies overlap. For example, a study might use very narrow inclusion criteria like an efficacy study, but, like an effectiveness study, might examine flexible dosing regimens, have a long follow-up period, and measure quality of life and functional outcomes. For this report we sought evidence about outcomes that are important to patients and would normally be considered appropriate for an effectiveness study. However, many of the studies that reported these outcomes were short-term and used strict inclusion criteria to select eligible patients. For these reasons, it was neither possible nor desirable to exclude evidence based on these characteristics. Labeling a study as either an efficacy or an effectiveness study, although convenient, is of limited value; it is more useful to consider whether the patient population, interventions, time frame, and outcomes are relevant to one’s practice or to a particular patient.

Studies anywhere on the continuum from efficacy to effectiveness can be useful in comparing the clinical value of different drugs. Effectiveness studies are more applicable to practice, but efficacy studies are a useful scientific standard for determining whether characteristics of different drugs are related to their effects on disease. Systematic reviews thoroughly cover the efficacy data in order to ensure that decision makers can assess the scope, quality, and relevance of the available data. This thoroughness is not intended to obscure the fact that efficacy data, no matter how large the quantity, may have limited applicability to practice. Clinicians can judge the relevance of studies’ results to their practice and should note where there are gaps in the available scientific information.

Unfortunately, for many drugs there exist few or no effectiveness studies and many efficacy studies. Yet clinicians must decide on treatment for patients who would not have been included in controlled trials and for whom the effectiveness and tolerability of the different drugs are uncertain. Systematic reviews indicate whether or not there exists evidence that drugs differ in their effects in various subgroups of patients, but they do not attempt to set a standard for how results of controlled trials should be applied to patients who would not have been eligible for them. With or without an evidence report, these decisions must be informed by clinical judgment.

In the context of development of recommendations for clinical practice, systematic reviews are useful because they define the strengths and limits of the evidence, clarifying whether assertions about the value of an intervention are based on strong evidence from clinical studies. By themselves, they do not say what to do. Judgment, reasoning, and applying one’s values under conditions of uncertainty must also play a role in decision making. Users of an evidence report must also keep in mind that not proven does not mean proven not; that is, if the evidence supporting an assertion is insufficient, it does not mean the assertion is untrue. The quality of the evidence on effectiveness is a key component, but not the only component, in making decisions about clinical policy. Additional criteria include acceptability to physicians and patients, potential for unrecognized harm, applicability of the evidence to practice, and consideration of equity and justice.

Scope and Key Questions

The purpose of this review is to compare the benefits and harms of different proton pump inhibitors. The Oregon Evidence-based Practice Center wrote preliminary key questions, identifying the populations, interventions, and outcomes of interest, and, based on these, the eligibility criteria for studies. These were reviewed and revised by representatives of organizations participating in the Drug Effectiveness Review Project. The participating organizations of the Drug Effectiveness Review Project are responsible for ensuring that the scope of the review reflects the populations, drugs, and outcome measures of interest to both clinicians and patients. The participating organizations approved the following key questions to guide Update 5 of this review:

  1. What is the comparative effectiveness of different proton pump inhibitors in patients with symptoms of gastroesophageal reflux disease?
  2. What is the comparative effectiveness of different proton pump inhibitors in treating peptic ulcer and nonsteroidal anti-inflammatory drug-induced ulcer?
  3. What is the comparative effectiveness of different proton pump inhibitors in preventing ulcer in patients taking a nonsteroidal anti-inflammatory drug?
  4. What is the comparative effectiveness of different proton pump inhibitors in eradicating Helicobacter pylori infection?
  5. Is there evidence that a particular treatment strategy is more effective or safer than another (for example, stepping down to a lower dose, treatment as needed compared with daily treatment, high dose compared with standard dose, or switching to an H2 antagonist) for treatment longer than 8 weeks in patients with gastroesophageal reflux disease or ulcer?
  6. What are the comparative safety and adverse events of different proton pump inhibitors in patients being treated for symptoms of gastroesophageal reflux disease, peptic ulcer, and nonsteroidal anti-inflammatory drug-induced ulcer?
  7. Are there subgroups of patients based on demographics, other medications, or comorbidities (including nasogastric tubes and inability to swallow solid oral medication) for which a particular proton pump inhibitor or preparation is more effective or associated with fewer adverse effects?