Appendix D. Details of the NSAID/COX-2 inhibitor health economic model

The NSAID/COX-2 model investigates which treatment is cost effective for a person with osteoarthritis (OA) who is to be prescribed an oral NSAID or COX-2 inhibitor. The cost effectiveness of adding a gastroprotective agent (GPA) is also considered. This paper gives a detailed overview of the comparators investigated in the model, the relevant patient populations, the parameters, and the structure of the model itself. The results of the model are also presented and discussed.

Comparator treatments included in the model

This analysis compares oral analgesic and anti-inflammatory drugs for which there are sufficient data to allow reliable comparisons. The drugs are compared in terms of gastrointestinal (GI) and cardiovascular (CV) adverse events as well as effectiveness. The doses of NSAIDs and COX-2 inhibitors used in the key trials were deemed to be unusually high, so we adjusted the observed adverse event rates for lower doses more commonly used in practice. The different comparators are shown in Box 1. Following withdrawal of the licence for lumiracoxib, this product has been removed from the model.

Box 1. Treatment regimens.

It is assumed that treatment with standard NSAIDs or COX-2 inhibitors is stopped and patients switch to paracetamol after any serious GI or CV event including symptomatic ulcer, complicated GI bleeds, myocardial infarction (MI), stroke or heart failure. After serious GI events, patients are also assumed to continue to take a GPA for life. With minor GI symptoms (dyspepsia), patients are assumed to take GPA for a month and to continue with previous treatment. The model estimates results over a fixed treatment period, after which any patients who have not experienced serious adverse events are assumed to switch to paracetamol.

Sources of adverse event data

There is a very large amount of adverse event data for standard NSAIDs and COX-2 inhibitors. The mix of observational and clinical trial data, with differing trial designs, patient populations and outcome definitions, makes combining these data extremely difficult. Instead it was decided that for the base-case analysis, data from the largest recent RCTs for the key drugs would be used (CLASS (Medical Officer Review 2000; Silverstein and Faich 2000), MEDAL (Laine et al. 2006; Laine et al. 2007; PhVWP assessment report 2006b) and TARGET (Farkouh and Kirschner 2004; Schnitzer et al. 2004)). The Guideline Development Group were concerned that this meant discarding observational data, in which patient numbers are often far larger than in RCTs. To take this into account a secondary analysis using data from a selection of the most relevant observational studies was conducted. As is often the case with observational data, concerns remain about possible bias in the results.

A number of problems present themselves when using specific statistics from CLASS, MEDAL and TARGET. These are addressed in turn below, and the interventions used in these trials are shown in Table 1. Most important is that the studies included in the base-case analysis necessarily mean that only diclofenac, ibuprofen, naproxen, celecoxib, and etoricoxib can be compared, as shown in Box 1. The data in Figure 1 show that these are largely the most prescribed NSAIDs, although meloxicam and etodolac are also prescribed fairly often. Other NSAIDs (aceclofenac, acemetacin, azapropazone, dexibuprofen, dexketoprofen, diflunisal, fenbufen, fenoprofen, flurbiprofen, indometacin, ketoprofen, mefenamic acid, nabumetone, piroxicam, sulindac, tenoxicam, tiaprofenic acid) are prescribed rarely.

Table 1. Treatments used in CLASS, MEDAL and TARGET.

Figure 1. Number of prescriptions for NSAIDs, England 2006 (PPA 2007).

In 2006 COX-2 inhibitors were prescribed substantially less than standard NSAIDs, with celecoxib and etoricoxib making up the majority of COX-2 inhibitor prescriptions.

Given these data, it would have been ideal to include meloxicam and etodolac in the model, if not all NSAIDs that are currently prescribed. However, this is not possible due to a lack of good quality data showing the risk of the key adverse events included in the model. For the majority of the other drugs the BNF states that the drug is as effective as either naproxen, diclofenac, or ibuprofen, but with more side effects (Anon 2007a), suggesting it is reasonable to exclude these drugs. However, it would have been preferable to include those drugs which are prescribed fairly regularly, and which are not obviously worse with regard to side effects than the included NSAIDs. Specifically, this means meloxicam and etodolac.

Previous NICE guidance analysed the available evidence for meloxicam and etodolac for GI adverse events (Nice Appraisal Team 2000). The analysis found that both drugs were associated with a decrease in GI adverse events (including serious GI adverse events), to a similar extent as celecoxib. The current cost per 3 month period of treatment is £24.42 for etodolac 600 mg per day, and £12.66 for meloxicam 7.5 mg per day. This is slightly more than the standard NSAIDs (shown in Table 8), but substantially less than the COX-2 inhibitors. These data suggest that meloxicam 7.5 mg and etodolac 600 mg should not be discounted from use, but the lack of CV data for the drugs means that they cannot be included in the model and so are not named in recommendations based on the model findings.

Table 8. Drug costs (PPA, Drug Tariff, November 2007).

Also important to note is that topical NSAIDs and opioids are not included in the model, even though they may be regarded as substitutes to oral NSAIDs and COX-2 inhibitors. Data were too sparse to include these interventions in this model, and they are each dealt with in other sections of this guideline.


Dose is a key issue within the model. The doses given in key clinical trials are generally high for NSAIDs (but within licensed levels), while they are far above licensed levels for COX-2 inhibitors. Modelling such high doses has little meaning for clinical practice. Hence standard doses based on ADQs (explained below in Box 2) are primarily considered. The higher doses of NSAIDs found in clinical trials were also tested in sensitivity analyses, as they are sometimes given in practice; indeed, some Scottish data suggest that for diclofenac 150 mg rather than 100 mg might be the more relevant dose to consider (see Box 3) (University of Dundee 2004).

Box 2. ADQs. Average Daily Quantities (ADQ) is a measure of prescribing volume based upon prescribing behaviour in England. It represents the assumed average maintenance dose per day for a drug used for its main indication in adults. The ADQ is not a recommended ...

Box 3. MEMO prescribing data. Typically prescribing data does not allow the average dose per day prescribed or the average dose taken by the patient to be calculated. However the Scottish MEMO data allow the daily doses taken by patients while on prescription ...

Adverse events associated with NSAID and COX-2 inhibitor use are believed to be dose-related. Hence, adverse event rates found in clinical trials must be adjusted for the lower doses assumed in the model. Accurate data from which to estimate this relationship precisely are lacking, so the model uses an assumption similar to one previously made in the literature: if dose is reduced by 50%, adverse events are reduced by 25% (Bloor and Maynard 1996). Extensive sensitivity analysis was undertaken on this important assumption (see below).
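The dose adjustment can be sketched as follows. This is a minimal illustration only: it assumes a linear relationship in which the proportional fall in risk equals a "dose assumption" parameter (0.5 in the base case) multiplied by the proportional fall in dose. The function name and the example event probability are hypothetical and are not taken from the model.

```python
def adjust_risk_for_dose(trial_risk, trial_dose_mg, model_dose_mg, dose_assumption=0.5):
    """Scale a trial event probability to a lower modelled dose.

    Illustrative assumption: a proportional dose reduction d lowers the
    event risk by dose_assumption * d (so a 50% dose cut gives a 25% risk
    cut when dose_assumption is 0.5).
    """
    if trial_dose_mg <= 0:
        raise ValueError("trial dose must be positive")
    dose_reduction = 1.0 - model_dose_mg / trial_dose_mg
    return trial_risk * (1.0 - dose_assumption * dose_reduction)


# Hypothetical 3-month event probability of 0.02 observed at 150 mg,
# adjusted to a modelled dose of 75 mg (a 50% dose reduction).
adjusted = adjust_risk_for_dose(0.02, 150, 75)
```

Under this sketch a dose assumption of 0 means dose has no effect on risk, and a value of 1 means risk falls in direct proportion to dose.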

Patient populations

Each of the included studies (MEDAL, TARGET, CLASS) presents some results for specific sections of the patient population (eg non-aspirin users), but for none of the individual outcomes considered in the model do all the studies give the data required for these specific sections of the population. Therefore total study populations have been used. Important differences in the study populations are shown in Table 2 below:

Table 2. Study populations.

Despite these differences these studies have been used because they are the largest RCTs which consider GI and CV adverse events, and which include an NSAID comparator. Although MEDAL and CLASS included patients with RA, as well as OA, the GDG did not consider that this would bias the results. The MEDAL programme included more patients taking low-dose aspirin than TARGET or CLASS. This might be expected to reduce the absolute rates of CV adverse events observed in this trial, but to increase the observed rates of GI adverse events. Conversely, concurrent use of PPI by MEDAL patients might be expected to reduce observed rates of GI events. The net effect of these differences on baseline rates of GI and CV events is unclear. However, the proportions of patients taking aspirin or PPI were very similar across treatment arms, so relative risks obtained from MEDAL should still be comparable with those from TARGET and CLASS, despite the differences in the trial populations.

Subgroup analysis is undertaken for two age groups because good data exist which show older people to be at higher risk of GI adverse events. In line with the previous NICE technology appraisal the age groups considered are people aged under 65 and people aged 65 and over (Nice Appraisal Team 2000). Based on the literature, it is assumed that patients aged 65 or over have 2.96 times the probability of developing a symptomatic or complicated GI event (but not GI symptoms/dyspepsia) (Hippisley-Cox et al. 2005). Essentially, the analysis for the older age group is a proxy for all patients with an increased GI adverse event risk. Therefore the results of this analysis should be considered for any patient with any raised GI risk factor.

Within the model a patient becomes at high risk once they are aged 65 or over, or if they experience a serious GI event. Evidence in the literature suggests that patients who have had dyspepsia, symptomatic ulcer, or complicated ulcer are at higher risk of future complicated ulcers (Garcia Rodriguez and Hernandez-Diaz 2001). In the model it is assumed that these factors can be used to calculate higher risks of complicated ulcers following a symptomatic ulcer or a complicated ulcer. It is also assumed that the same factors apply for the future risk of symptomatic ulcers, following a symptomatic or complicated ulcer. Importantly, it is assumed that GI symptoms/dyspepsia does not increase the risk of future events because expert opinion suggests that this is a symptom rather than an event. We also assume that the risk of GI symptoms/dyspepsia is not affected by past GI events.

Table 3. Relative risk of serious GI events depending on history of serious GI events.

The model assumes that the risk of cardiovascular events increases with age based on recent UK incidence data (Hippisley-Cox et al. 2007) (see Table 4).

Table 4. Incidence of cardiovascular disease by age.

The model also includes an increased risk of future CV events immediately after an initial event, and in post CV event states (Anon 2006); see Table 5.

Table 5. Incidence of CV events depending on history.

In sensitivity analysis the results are also considered for patients with an increased risk of CV events.

Model structure

The model is in the form of a Markov model with a 3 month cycle length. The probability of moving between states is based on within-state decision trees which are informed by clinical evidence and expert opinion. The health states that make up the Markov model represent a range of possible adverse events.

The model seeks to compare the cost effectiveness of individual NSAIDs and COX-2 inhibitors for which sufficient adverse event data exist. Patients do not move between treatments in the model (apart from the addition of a PPI in some circumstances, and switching to paracetamol following serious adverse events or at the end of the treatment period). This is a simplifying assumption which keeps the model manageable. The model therefore considers first-line NSAID or COX-2 inhibitor treatment only.

The model can be split into two key components:

  • Markov model health states
  • Within state decision trees to determine type of adverse event (if any).

Markov model

Figure 2 shows a very simplified version of the model. The diagram shows the Markov model drawn for one treatment option (diclofenac). The structure of the model is the same for all treatment options. The possible health states considered in the model are as follows:

Figure 2. Markov model for one treatment option (Diclofenac 150 mg).

  • no complications
  • GI symptoms/dyspepsia
  • symptomatic ulcer
  • post-symptomatic ulcer
  • complicated GI bleed
  • post-complicated GI bleed
  • myocardial infarction (MI)
  • post MI
  • stroke
  • post stroke
  • heart failure (HF)
  • post HF
  • post treatment (given no serious adverse events during the treatment period)

The diagram illustrates (with red arrows) that a patient in the ‘No complications’ state can move to any of the initial adverse event states, or if the treatment period is set such that treatment will cease in the following period, the patient can move to the ‘Post treatment’ state. The same is true for the ‘GI symptoms/dyspepsia’ state (illustrated with blue arrows), since it is not classed as a serious adverse event, and carries no heightened risk of a further event for the patient.

The other adverse event states are treated differently. ‘Symptomatic ulcer’, ‘Complicated GI bleed’, ‘MI’, ‘Stroke’ and ‘HF’ are all considered serious adverse events which increase the short and long term probability of experiencing future adverse events. As such, each has an initial event state acting as a tunnel state leading to a post event state. This allows a short-term as well as a long-term impact on utility, costs and risks to be considered. In both the short term and the long term states the patient is treated with paracetamol (with a PPI in the ‘Symptomatic ulcer’ and ‘Complicated GI bleed’ states). Once in the post event state the patient remains there until death (illustrated for MI with pink arrows in the diagram), with an average utility score and cost allocated to the state based on the increased risk of future adverse events.

A patient can only have one GI or cardiovascular AE in each 3 month cycle. This may not be entirely realistic but is necessary to make the model workable, and is unlikely to change the results significantly.
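The cohort mechanics described above can be sketched for a reduced set of states. This is a minimal illustration, not the model itself: it keeps only four states (with MI as the single example of a tunnel state leading to a post event state), and every transition probability is a hypothetical placeholder rather than a model input.

```python
# All probabilities below are hypothetical placeholders, not model inputs.
STATES = ["no_complications", "mi", "post_mi", "dead"]

TRANSITIONS = {
    "no_complications": {"no_complications": 0.985, "mi": 0.005, "dead": 0.010},
    "mi": {"post_mi": 0.90, "dead": 0.10},       # tunnel state: occupied for one cycle only
    "post_mi": {"post_mi": 0.98, "dead": 0.02},  # patient stays here until death
    "dead": {"dead": 1.0},
}


def run_cohort(n_cycles):
    """Return the cohort distribution over states after each 3-month cycle."""
    cohort = {state: 0.0 for state in STATES}
    cohort["no_complications"] = 1.0
    trace = [cohort]
    for _ in range(n_cycles):
        next_cohort = {state: 0.0 for state in STATES}
        for state, share in cohort.items():
            for target, probability in TRANSITIONS[state].items():
                next_cohort[target] += share * probability
        cohort = next_cohort
        trace.append(cohort)
    return trace


trace = run_cohort(40)  # 40 cycles of 3 months = 10 years
```

Because each state has at most one event transition per cycle, the sketch also reflects the restriction that a patient can have only one adverse event in each 3 month cycle.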


The model has a lifetime duration, to allow additional costs to be accrued over a long time period following a serious adverse event. The duration of treatment is changeable, and can be adjusted between 3 months (one cycle length) and lifetime. This allows the cost effectiveness of giving the drugs for different time periods to be calculated. This is relevant as patients take the drugs for different amounts of time depending on what type of OA they have. They may also stop taking drugs after surgery. Surgery itself does not need to be included in the model since the drugs are not disease modifying and so will not affect when surgery occurs.

In the model, the treatment duration is designated and then when this period is up patients who have not had a serious AE (ie those who have had no complication or who have only experienced GI symptoms/dyspepsia) move to the ‘Post treatment – no AE during treatment’ state. All other patients are either already dead, or already only taking paracetamol (which we assume all patients continue to take for the remainder of their lifetime).

Depending on the treatment duration and the age of the patient cohort, the model then calculates the average costs and utility scores for each remaining cycle in the model (the number of future cycles will be based on life expectancy). This allows for future AEs that may occur, based on assumptions regarding the probabilities of those AEs occurring.


Once one serious adverse event has occurred, another cannot explicitly occur in the model. However the model allows for the fact that future adverse events may occur, and for the fact that the probability of these events is likely to be higher in patients who have already experienced these events. For example once a patient has arrived in the GI bleed state a decision tree taking into account possible future adverse events is used to estimate mean costs and utility per cycle.
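Averaging over a decision tree in this way amounts to a probability-weighted mean of the costs and utilities on each branch. The sketch below illustrates the arithmetic for a post event state; the branch labels, probabilities, costs and utilities are all hypothetical and are not taken from the model.

```python
def expected_cost_and_utility(branches):
    """Probability-weighted mean cost and utility over decision tree leaves.

    branches: list of (probability, cost, utility) tuples whose
    probabilities must sum to 1.
    """
    total_probability = sum(p for p, _, _ in branches)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    mean_cost = sum(p * cost for p, cost, _ in branches)
    mean_utility = sum(p * utility for p, _, utility in branches)
    return mean_cost, mean_utility


# Hypothetical per-cycle branches for a patient in a post-GI-bleed state.
post_bleed_branches = [
    (0.97, 10.0, 0.70),    # no further event
    (0.02, 2000.0, 0.50),  # repeat complicated GI bleed
    (0.01, 500.0, 0.60),   # symptomatic ulcer
]
mean_cost, mean_utility = expected_cost_and_utility(post_bleed_branches)
```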

Within state decision trees

The within state decision trees determine what type of AE is incurred by a patient (if any), so that costs and consequences can be accurately calculated. The probabilities within the trees differ depending on what treatment the patient is receiving. This reflects the clinical evidence from the literature. Probabilities also differ in post event states where there is often a heightened probability of a repeat event. The tree used here (shown in Figure 3) is similar but not identical to a tree built by the CCOHTA when assessing the cost effectiveness of celecoxib and rofecoxib (Maetzel et al. 2002).

Figure 3. Within state decision tree.

It is the within state decision tree which dictates where the patient will go in the following cycle. For example, whatever the initial health state of the patient, the decision tree is set up with the appropriate probabilities taking into account treatment, age, and the current health state, and the patient moves on according to these probabilities.

Model inputs


Quality Adjusted Life Year (QALY) scores are used as the key outcome of the model. However, there are few trials reporting utility outcomes, which would be needed to calculate QALY scores. There are two key areas for which utility data is important in the model. These are:

  • efficacy of the different treatments
  • comparative utility scores for the different adverse events

Evidence is very mixed as to whether COX-2 inhibitors offer better efficacy than standard NSAIDs. Indeed their major benefit is argued to be the reduced GI adverse event rates. However, there is evidence that both standard NSAIDs and COX-2 inhibitors offer better efficacy than paracetamol and placebo. We conducted a meta-analysis of total WOMAC scores from the evidence used in the systematic review undertaken for the osteoarthritis guideline, for the drugs included in our model. Comparisons between individual drugs and classes of drugs were made. The results are shown below, in Box 4.

Box 4. Meta-analysis results: total WOMAC score.

The results of the meta-analysis suggest that there is no significant difference in efficacy between COX-2 inhibitors and standard NSAIDs. There is significant heterogeneity in the meta-analysis, but a random effects model was used and we conclude that the evidence is mixed and does not suggest that one class of drugs is more efficacious than the other. For this reason the final comparison, showing standard NSAIDs and COX-2 inhibitors pooled versus placebo, is used in the model. The first comparison shown in Box 4 (paracetamol versus placebo) is used to emphasise that paracetamol is also associated with increased efficacy compared to placebo, but to a lesser extent than standard NSAIDs and COX-2 inhibitors.

To convert these WOMAC efficacy scores into utility gains associated with the different drugs the Transfer To Utility (TTU) technique was used (Barton et al. 2007).

Utility = 0.7526 + 0.0004(WOMAC) - 0.0001(WOMAC)^2

This method is not perfect, but allows more data to be used and is a reasonable way of estimating utility scores where little direct utility data exists. The TTU method is discussed more in Appendix C of this guideline. The utility scores for placebo, paracetamol, and NSAIDs/COX-2 inhibitors, estimated using the meta-analysis and the TTU technique are shown in Table 6, below.
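As an illustration, the TTU equation above can be evaluated directly. The sketch below implements only the quadratic as printed in this appendix; the function name and the two example WOMAC scores are hypothetical and are not the values behind Table 6.

```python
def womac_to_utility(womac_total):
    """Map a total WOMAC score to a utility score using the TTU equation."""
    return 0.7526 + 0.0004 * womac_total - 0.0001 * womac_total ** 2


# Hypothetical example: utility gain when the total WOMAC score
# improves (falls) from 50 to 40.
utility_before = womac_to_utility(50)
utility_after = womac_to_utility(40)
utility_gain = utility_after - utility_before
```

Because the squared term dominates at higher scores, the same absolute change in WOMAC produces a larger utility change for patients with worse baseline scores.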

Table 6. Estimated utility score for people with OA.

Two key points relating to efficacy estimates used in the model remain. Firstly, assuming equal efficacy between standard NSAIDs and COX-2 inhibitors may result in bias if they in fact are not equally efficacious. However without more evidence this assumption is reasonable. Also, the key comparison in this model is between standard NSAIDs and COX-2 inhibitors. Therefore, even if the TTU technique is not accepted, this should not affect the results of the model, as standard NSAIDs and COX-2 inhibitors are assumed to be the same here. The only comparisons that would be affected are standard NSAIDs or COX-2 inhibitors versus paracetamol or no treatment.

Secondly, we have assumed that different doses of the same drug are equally efficacious. For example diclofenac 100 mg is as efficacious as diclofenac 150 mg, and results in the same utility score (not taking into account adverse events). This is because there was no evidence on the differential effects of different doses, and resources to investigate this further were lacking. It is unclear whether there is an incremental effect of dose for the same NSAID; it may instead be a case of responder versus non-responder. However, if higher doses are in fact more efficacious, this assumption will bias against these higher doses. This is discussed more in the sensitivity analysis section of this appendix.

Comparative utility scores for the different adverse events are important in the model because the adverse events associated with the different drugs are the key drivers of health effects as well as costs. However, data for the utility scores required were sparse, largely because of the time periods considered. Often utility scores are reported for adverse events without being specific about the time periods the utility scores relate to. This is important because a utility score for a period after a MI will be very different one year after the event compared to one or two months after the event. Because of the structure of our model, we required utility scores for the 3 months immediately after the event, as well as for longer term, post 3 months time periods.

CCOHTA recently reported a survey to extract 3 month utility scores for experiencing dyspepsia, confirmed ulcer, and complicated GI bleed (medical and surgical) (Maetzel et al. 2002). These scores were used for our ‘GI symptoms/dyspepsia’, ‘Symptomatic ulcer’ and ‘Complicated GI bleed’ states. The only short term specific utility data found for CV adverse events was for stroke (Pickard et al. 2002). More data were found for longer term utility scores for MI and stroke (Anon 2006), and these were used to adjust the short term stroke utility score to calculate 3 month scores for MI. Ideally utility scores for the different states would all have been obtained from the same source, but this did not prove possible. The most important issues, however, are that the utility scores for the different events appear correct in relation to one another, and that the base short term utility values used (GI events and stroke) appear well obtained. This is the case (shown in Table 7).

Table 7. Adverse event utility weights.

Importantly, the guideline development group decided that heart failure events that occur due to NSAID or COX-2 inhibitor use are likely to be short term and relatively unserious. Therefore the 3 month utility used for heart failure is that found in the Harvard CE Registry (Anon 2001), and after 3 months the patient is assumed to revert to the OA-only utility score.

Utility scores in the years post treatment reflect average scores based on a decision tree of possible adverse events occurring in the future.

The utility weights for OA symptoms and adverse events (Table 6 and Table 7) are multiplied by age-specific utility scores for the general UK population taken from the Health Survey for England (Department of Health 1998).


Costs are taken from the literature and national unit costs (Anon 2006; Brown et al. 2006; Curtis and Netten 2006; Department of Health 2006). Costs for post-treatment states are based on the average of events that may occur in the future. Costs included are drug costs, doctor consultation costs, procedure costs etc, based on an NHS perspective.

Drug costs (shown in Table 8) were calculated from the most recently available Drug Tariff (November 2007).

For GI adverse events decision trees were used to estimate average costs for each event, based on assumptions made in the 2006 HTA paper on gastroprotection (Brown et al. 2006). Costs of each branch in the decision trees were calculated using HRG codes and average length of stay as given by Department of Health reference cost data (Department of Health 2006). GP contacts and outpatient visits were assumed and included, again based on data from the 2006 HTA paper on gastroprotection (Brown et al. 2006).

CV adverse events costs are based on HRG codes and average length of stay as given by DH reference cost data, as well as post-event costs used in the NICE Hypertension update guideline.

Post event costs were calculated using the decision trees for post event probabilities for GI events, and using post event costs taken from the NICE Hypertension update guideline for CV events. This assumes no follow-up management for GI events, other than continued prescribing of a PPI for patients after a symptomatic or complicated ulcer, whereas follow-up management is assumed for CV events, resulting in much higher post event costs.

Costs for the individual states are considered in more detail in Box 5.

Box 5. Adverse event costs. The proportions of patients following the different treatment paths after experiencing an adverse event are based on Brown et al (Brown et al. 2006). All patients with a GI adverse event are assumed to have a helicobacter test. No ...

The assumed average costs of each adverse event calculated are shown in Table 9.

Table 9. Adverse event costs.

Adverse events

Adverse events are of key importance in the model. Since standard NSAIDs and COX-2 inhibitors are assumed to be the same in terms of efficacy, the only areas in which they differ are drug costs and adverse event rates. Much of the existing economic modelling literature that compares standard NSAIDs and COX-2 inhibitors only includes GI adverse events, because the risk of CV adverse events has only been highlighted in recent times. Some studies include MI but no other CV events, whereas the data suggest that stroke and heart failure are also important events which seem to be affected by standard NSAID and COX-2 inhibitor use. Good data exist for a number of drugs relating to these adverse events, and as such these should be included in order to carry out a more complete economic analysis.

Despite this, some adverse events which are thought to be influenced by standard NSAID and COX-2 inhibitor use are not included in this model. For example, hypertension and oedema have been observed in clinical trials. However, the resource implications of these are substantially less than those of other CV events, and indeed these conditions would be expected to lead to more serious CV events, which are included in the model.

Given current data, we have included all the adverse events that we believed were important and for which sufficient data were available. These are:

  • GI symptoms/dyspepsia
  • Symptomatic ulcer
  • Complicated GI event (eg perforation, complicated ulcer or bleed)
  • MI
  • Stroke
  • Heart failure

As discussed previously, adverse event data is taken from CLASS, MEDAL and TARGET. TARGET data is taken from the published papers based on this trial (Farkouh and Kirschner 2004; Schnitzer et al. 2004). Additional data on symptomatic ulcers was found on the MHRA website (Novartis 2006). MEDAL data is taken from the published papers as well as the September 2006 MHRA assessment report of EDGE I, EDGE II and MEDAL (Laine et al. 2006; Laine et al. 2007; PhVWP assessment report 2006b). CLASS data is taken from the June 2000 FDA Medical Officer Review of Celebrex (celecoxib) (Medical Officer Review 2000), rather than the 6-month data presented in Silverstein et al 2000 (Silverstein and Faich 2000).

Note that for etoricoxib, data on the proportion of patients who discontinued treatment due to GI symptoms is used to calculate the relative risk of GI symptoms/dyspepsia for etoricoxib compared to diclofenac. This is because as yet data is not available for the actual number of patients who experienced GI symptoms in MEDAL.

A number of important issues related to adverse events are discussed in turn below:

Dose: adverse event relationship

As discussed previously, in order to make the model useful in the real world it was necessary to model realistic doses. In CLASS celecoxib is given at 4 times its recommended dose for OA (800 mg vs 200 mg). Most patients in the MEDAL programme took the recommended dose of etoricoxib (60 mg per day), but some patients took a higher dose of 90 mg per day (mean dose 78 mg), and the results for the outcomes required for the model were not split by dose. The doses of standard NSAIDs are also high in these trials, and so some adjustment to the results for these drugs was also made to bring them in line with ADQs.

This assumption is clearly extremely important, as the adverse event rates from the trials are being adjusted to arrive at an estimate for the realistic dose of the drug. Without this adjustment the results would have little meaning for clinical practice. This assumption may be seen to bias against the drugs which are given at closer to their recommended dose in the included trials if reducing dose does not in fact reduce adverse events. However expert consensus suggests that this is not the case, and extensive sensitivity analysis around this assumption allows the effects of altering it to be investigated.

The results presented in this document are the mean probabilistic sensitivity analysis (PSA) results, based on 1000 iterations of the model. In each of these iterations the dose assumption was allowed to vary between 0 and 1, with a mean of 0.5 and a beta distribution with alpha and beta values of 5. This is arbitrary but means that the distribution is bell shaped but fairly flat, allowing for a lot of variation in the parameter. Therefore, although the mean estimate of the parameter is 0.5, the model takes into account that the true value could be anywhere between 0 and 1, with figures closer to 0.5 slightly more likely than those further from 0.5, but still allowing the more peripheral values (close to 0, and close to 1) to occur fairly often.
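The sampling step for the dose assumption can be sketched as below, using the Beta(5, 5) distribution and 1000 iterations described above. This is an illustration only: the function name and the fixed random seed are assumptions, and the sketch draws the parameter without running the economic model itself.

```python
import random
import statistics


def sample_dose_assumption(n_iterations=1000, alpha=5.0, beta=5.0, seed=0):
    """Draw the dose assumption parameter once per PSA iteration.

    The seed is fixed only so that the illustration is reproducible.
    """
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(n_iterations)]


draws = sample_dose_assumption()
mean_draw = statistics.mean(draws)      # close to the distribution mean of 0.5
spread = statistics.stdev(draws)        # Beta(5, 5) has a standard deviation of about 0.15
```

In a full PSA each draw would be passed to the model alongside samples of the other uncertain parameters, and the reported results would be the means across iterations.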

In addition, we tested the impact of changes in the dose assumption through deterministic sensitivity analysis.

Chain of comparisons used in the model

This relates to how comparable relative risks for the different drugs' adverse event rates were estimated using CLASS, MEDAL and TARGET. Because diclofenac was used in CLASS and MEDAL, and ibuprofen was used in CLASS and TARGET, there is always a common comparator between two drugs, allowing a network of comparisons to be linked. This type of indirect comparison is not ideal, particularly because, as mentioned previously, there are some differences in the patient populations in the different trials, but in the absence of a trial directly comparing each and every relevant drug it is a reasonable and robust method to use (Bucher et al. 1997; Song et al. 2003). In fact, because of the drug linkages between the studies, each relative risk is calculated directly between two drugs, but this calculated relative risk is only an indirect comparison with respect to the other drugs.
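The general idea of an adjusted indirect comparison through a common comparator (Bucher et al. 1997) can be sketched as follows. This illustrates the generic method on the log relative risk scale and is not a reproduction of the model's calculations; the function name, relative risks and standard errors are hypothetical.

```python
import math


def indirect_relative_risk(rr_a_vs_c, se_log_a_vs_c, rr_b_vs_c, se_log_b_vs_c):
    """Adjusted indirect comparison of drug A versus drug B via comparator C.

    The log relative risks are subtracted and their variances added,
    which is why indirect estimates are less precise than direct ones.
    Returns (relative risk, lower 95% limit, upper 95% limit).
    """
    log_rr = math.log(rr_a_vs_c) - math.log(rr_b_vs_c)
    se = math.sqrt(se_log_a_vs_c ** 2 + se_log_b_vs_c ** 2)
    lower = math.exp(log_rr - 1.96 * se)
    upper = math.exp(log_rr + 1.96 * se)
    return math.exp(log_rr), lower, upper


# Hypothetical inputs: A vs C from one trial, B vs C from another.
rr, lower, upper = indirect_relative_risk(0.6, 0.20, 0.8, 0.15)
```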

The only link in the network where there is a choice of which comparison to use to calculate relative risks is for naproxen. This can either be calculated using a comparison with lumiracoxib, or using a comparison with ibuprofen, both within the TARGET trial. We chose to use the comparison with lumiracoxib, although the license for this drug has now been withdrawn, as the focus of the TARGET trial was to compare lumiracoxib with standard NSAIDs, rather than comparing individual standard NSAIDs. This also seems sensible because the sub-studies within TARGET (lumiracoxib vs ibuprofen and lumiracoxib vs naproxen) appear to have unequal patient characteristics. The network is the same for each AE except heart failure, where individual data are not given for naproxen and ibuprofen in TARGET. This is shown in Box 6.

Box 6

Chain of comparisons used to estimate adverse event risks. Diclofenac: Absolute event rate calculated using the diclofenac arm in MEDAL for all events except GI symptoms / dyspepsia (not available from MEDAL), for which the diclofenac arm in CLASS was (more...)

PPI use

Previous NICE guidance (NICE COX-2 inhibitors technology appraisal and dyspepsia guideline) has not suggested that gastroprotective agents should be co-prescribed with COX-2 inhibitors (Anon 2006; Nice Appraisal Team 2000). However, some evidence has been published suggesting that COX-2 inhibitors with a PPI may be a beneficial combination. Also, PPIs have recently become significantly cheaper due to coming off patent. This suggests that COX-2 inhibitor + PPI is a reasonable comparator.

Scheiman et al studied the effect of adding a PPI to a COX-2 inhibitor in a randomised controlled trial (Scheiman et al. 2006). Their results were unexpected, as they suggest that lower dose PPI use results in a much larger reduction in GI adverse events than higher dose PPI use. However, results for both arms of the trial suggest that adding a PPI to a COX-2 inhibitor results in fewer GI adverse events. Conservatively, it was decided to assume a low dose PPI cost, but to use the relative risk from Scheiman et al associated with the higher dose PPI. The same relative risk reduction was assumed for all COX-2 inhibitors (see Table 11). Corroborative evidence for the effectiveness of adding a PPI to a COX-2 inhibitor is provided by another RCT (Chan et al. 2007). This study found a significant reduction in recurrence after hospital admission for upper gastrointestinal bleeding when a PPI was added to celecoxib (0% vs 8.9% over 12 months, 95% CI for the risk difference 4.1% to 13.7%). We did not use this evidence in the analysis, however, as it relates to a higher risk population, and the 0% event rate in the PPI arm means that we could not calculate a relative risk, as required for the model.

Table 11. Relative risk of GI adverse events when adding a PPI, compared to no PPI.

For standard NSAIDs, concurrent use of a PPI is a more accepted intervention. A recent HTA studying gastroprotective agents was used to calculate the reduction in the risk of GI symptoms, symptomatic ulcer and complicated GI events associated with co-prescription of a PPI (Brown et al. 2006). This reduction in risk is assumed to be the same for each standard NSAID, as the HTA does not report results for individual NSAIDs.

We did not consider the use of alternative gastroprotective agents (H2 receptor antagonists) with either conventional NSAIDs or COX-2 inhibitors, since the clinical effectiveness evidence for H2RAs alongside these drugs is much weaker than that for PPIs, and there is now very little difference in cost between these classes of drugs.

Observational data

Observational data was not found for etoricoxib, and so celecoxib was the only COX-2 inhibitor included in this version of the model. The relative risk estimates drawn from RCT data were used.

We have assumed the dose in the observational studies to be the ADQ.

Table 12. Three-month transition probabilities from observational data – adjusted doses.

For ‘GI symptoms/dyspepsia’ and ‘Symptomatic ulcer’, the Hippisley-Cox case control study for uncomplicated GI events was used (Hippisley-Cox et al. 2005). For ‘Complicated GI bleed’ the same case control study for complicated GI events was used. This was chosen because it reported statistics for all the drugs we are comparing, which promotes consistency in the model. For the same reason, case control evidence from Andersohn et al was used for stroke (Andersohn et al. 2006). Evidence was more difficult to find for heart failure, but Mamdani reports HF admission relative risks for celecoxib and ‘NSAIDs’, which were mainly diclofenac (59%), ibuprofen (12%) and naproxen (17%) (Mamdani et al. 2004). For this event, the standard NSAIDs were therefore assumed to have the same relative risk. For MI, the most up to date observational meta-analysis was supplied to us by the MHRA (PhVWP assessment report 2006a). This meta-analysis presents relative risks for diclofenac, naproxen, and ibuprofen, but not celecoxib. Therefore estimates from Hernandez-Diaz et al were used for the celecoxib relative risk (Hernandez-Diaz et al. 2006).

Paracetamol was assumed to carry the same risk as placebo in the observational version of the model, except for GI symptoms / dyspepsia, where the same risk as ibuprofen was assumed. The effect of adding a PPI to the drugs was assumed to be the same as in the RCT version of the model, with relative risks as shown in Table 11.


Estimated costs and QALYs were discounted at 3.5% per year, as recommended in the NICE reference case.
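A minimal sketch of how discounting at 3.5% per year might be applied to a stream of costs or QALYs is given below. The three-month cycle length, the timing convention (each cycle discounted from its start) and the example stream are assumptions made for illustration, not details taken from the model.

```python
DISCOUNT_RATE = 0.035  # annual rate stated in the text (NICE reference case)
CYCLE_YEARS = 0.25     # assumption: three-month model cycles


def discount_factor(years, rate=DISCOUNT_RATE):
    """Present-value weight for a cost or QALY accrued `years` from now."""
    return 1.0 / (1.0 + rate) ** years


def present_value(per_cycle_values, cycle_years=CYCLE_YEARS, rate=DISCOUNT_RATE):
    """Sum a stream of per-cycle values, discounting each to time zero.

    Assumption: the value for cycle k is discounted from the start of cycle k.
    """
    return sum(v * discount_factor(k * cycle_years, rate)
               for k, v in enumerate(per_cycle_values))


# Hypothetical stream: 0.2 QALYs accrued in each of 8 three-month cycles
qaly_stream = [0.2] * 8
undiscounted = sum(qaly_stream)
discounted = present_value(qaly_stream)
print(f"undiscounted = {undiscounted:.4f}, discounted = {discounted:.4f}")
```

Over a three-month treatment period discounting has almost no effect; it matters more for the longer-term consequences of events such as stroke or MI.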

Sensitivity analysis

Probabilistic sensitivity analysis was undertaken. Beta distributions were used for adverse event incidence probabilities, the dose adjustment factor for adverse events and utilities. Log-normal distributions were used for PPI relative risks and utility multipliers. All other parameters were held constant.

Deterministic sensitivity analysis was also used to examine the impact of age and baseline GI and CV risks, duration of treatment, alternative doses and prices for the drugs, the assumed relationship between adverse effects and dose, and the source of estimates for effects on adverse events (MEDAL, TARGET and CLASS, or observational data).


Base-case analysis

The model results for 55 year old patients (low GI and CV risk) over three months of treatment are shown in Table 13.

Table 13. Model results: base-case analysis, 55 year old, 3 months of treatment.

One clear result is that addition of a PPI (omeprazole 20 mg) is cost effective. This can be seen in Figure 4, which shows that for all NSAIDs / COX-2 inhibitors addition of a PPI increases the estimated QALY gain at little or no additional cost to the health service (once savings from treating side effects are taken into account). This result is robust across all of the sensitivity analyses reported below, so for the rest of this results section we assume that all NSAIDs/COX-2 inhibitors would be prescribed with a PPI.

Figure 4. Mean treatment effects and costs: base-case analysis, 55 year old, 3 months of treatment.

Of the included NSAIDs/COX-2 inhibitors, celecoxib 200 mg is the most cost-effective option, with an incremental cost-effectiveness ratio (ICER) of around £9,500 per QALY gained (see Table 14). In patients who cannot take celecoxib due to contraindication or intolerance, but who wish to take a COX-2 selective agent, etoricoxib 60 mg is of borderline cost effectiveness (£25,800 per QALY). NICE recommends that its advisory bodies should usually apply a cost-effectiveness threshold in the region of £20,000 to £30,000 per QALY (National Institute for Health and Clinical Excellence 2008).
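For readers unfamiliar with the ICER, the short Python sketch below shows the calculation: the difference in cost between two options divided by the difference in QALYs, then compared with a threshold. The function names and the input values are hypothetical and are not the model's estimates.

```python
def icer(cost_new, qaly_new, cost_ref, qaly_ref):
    """Incremental cost per QALY gained for a new option vs a reference.

    Only defined here when the new option gains QALYs; a negative result
    means the new option is both cheaper and more effective.
    """
    d_cost = cost_new - cost_ref
    d_qaly = qaly_new - qaly_ref
    if d_qaly <= 0:
        raise ValueError("new option is not more effective than the reference")
    return d_cost / d_qaly


def is_cost_effective(icer_value, threshold):
    """True if the ICER is at or below the willingness-to-pay threshold."""
    return icer_value <= threshold


# Hypothetical values: new option costs 30 more and gains 0.002 QALYs
example_icer = icer(cost_new=130.0, qaly_new=0.702, cost_ref=100.0, qaly_ref=0.700)
print(f"ICER = {example_icer:,.0f} per QALY")
print("below 20,000 threshold:", is_cost_effective(example_icer, 20_000))
print("below 30,000 threshold:", is_cost_effective(example_icer, 30_000))
```

With QALY differences this small, modest changes in estimated cost or effect can move an ICER across the threshold, which is why the borderline results are interpreted cautiously.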

Table 14. Incremental cost-effectiveness results: base-case analysis.

For patients who are not able to take a COX-2 inhibitor, paracetamol is slightly cheaper than standard NSAIDs with a PPI. But, although it incurs fewer GI or CV events, paracetamol is not as effective at controlling the symptoms of osteoarthritis. Consequently, standard NSAIDs with a PPI do appear to be a cost-effective alternative to paracetamol in this patient group.

There is little difference between diclofenac, ibuprofen and naproxen in terms of relative cost effectiveness.

Duration of treatment

The results are very similar over a longer duration of treatment (see Table 14).

Raised GI risk

The pattern of results is slightly different for 65 year old patients, who are at higher risk of adverse events than the 55 year old cohort (relative risks of 2.96 and 1.94 for GI and CV events respectively) (Table 15 and Figure 5). Celecoxib 200 mg is still the most cost-effective option (with an incremental cost-effectiveness ratio of £10,300 per QALY compared with no treatment). However, etoricoxib 60 mg is not cost effective in these patients (£67,600 per QALY).

Table 15. Model results: base-case analysis, 65 year old, 3 months of treatment.

Figure 5. Mean treatment effects and costs: base-case analysis, 65 year old, 3 months of treatment.

The higher baseline risk of GI events in this group makes standard NSAIDs less effective, even when combined with a PPI. In fact, the model estimates that the QALY gain from improved control of OA symptoms is outweighed by the loss from NSAID-induced adverse events. Thus, paracetamol is the most cost-effective alternative to celecoxib + PPI in patients at high GI risk.

It should be emphasised that this difference is due to the assumed risk of adverse events, not age per se: the results for 55 year old patients with equivalent GI and CV risks are very similar to those of 65 year old patients.

Raised CV risk

Increased risk of CV adverse events when taking standard NSAIDs and COX-2 inhibitors has been brought to the public eye in recent times. The base case of the model considers patients with the characteristics of those in CLASS, MEDAL and TARGET, who generally do not display high CV risk factors. It is important to consider the model results for patients with heightened CV risk.

Increasing the relative risk of cardiovascular events reduces the effectiveness, and hence cost effectiveness, of all standard NSAIDs and COX-2 selective agents. The average risk for 55 year old patients in the model is 42 events per 10,000 person years (MIs, strokes or heart failure). At twice this risk, none of the standard NSAIDs are cost effective compared with paracetamol (Figure 6).

Figure 6. Mean costs and effects, 55 year old with twice the age-specific CV risk.

However, in our model celecoxib + PPI is still estimated to be cost effective for 55 year old patients at twice, or even four times, the cardiovascular risk for their age (Table 16). Sixty-five year old patients must have a relative risk of around 3 or 4 times the average for their age before celecoxib + PPI ceases to be cost effective.

Table 16. Incremental cost effectiveness of celecoxib + PPI: raised cardiovascular risk.

This analysis suggests that the cost effectiveness of celecoxib + PPI is not very sensitive to patients’ baseline cardiovascular risk. However, this result depends on the robustness of the CLASS data on adverse events, which we question below.


The results of the probabilistic sensitivity analysis are illustrated in Figure 7 and Figure 8, for 55 and 65 year old patients respectively. These cost effectiveness acceptability curves (CEACs) show the estimated probability that each option is the most cost-effective treatment (on the y-axis) as a function of the amount that we are willing to pay for a QALY (on the x-axis). They reinforce the conclusion that at a NICE cost-effectiveness threshold of around £20,000 to £30,000 per QALY and using the RCT data, celecoxib with PPI is the most cost-effective option for a range of patient groups. However, these graphs also illustrate the considerable uncertainty over the relative ranking of the other drugs.

Figure 7. Probability that each drug is the most cost-effective option as a function of willingness to pay per QALY: base-case analysis, 55 year old, 3 months of treatment.

Figure 8. Probability that each drug is the most cost-effective option as a function of willingness to pay per QALY: base-case analysis, 65 year old, 3 months of treatment.

In particular, note that although ibuprofen + PPI has a higher estimated probability of being the most cost-effective option compared with diclofenac + PPI, it has a lower expected net benefit. This apparently contradictory result is due to a skew in the estimated distribution of net benefits introduced by non-linearities in the model. It should not be taken to imply that ibuprofen + PPI is more cost effective than diclofenac + PPI.
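The Python sketch below illustrates how a point on a CEAC can be computed from PSA output, and how a skewed distribution of net benefits can make one option the most likely to be optimal while another has the higher expected net benefit. It assumes net benefit is calculated as willingness to pay multiplied by QALYs, minus cost. The option names and the four hand-made PSA iterations are hypothetical and chosen only to make the effect visible.

```python
def net_benefit(qalys, cost, wtp):
    """Net monetary benefit at a given willingness to pay per QALY."""
    return wtp * qalys - cost


def ceac_point(samples, wtp):
    """One point on the CEAC for each option, plus expected net benefit.

    samples: dict mapping option name -> list of (qalys, cost) tuples,
             one tuple per PSA iteration (same length for every option).
    Returns (prob_best, mean_nb), both dicts keyed by option name.
    """
    options = list(samples)
    n = len(samples[options[0]])
    wins = {o: 0 for o in options}
    totals = {o: 0.0 for o in options}
    for i in range(n):
        nb = {o: net_benefit(*samples[o][i], wtp) for o in options}
        wins[max(nb, key=nb.get)] += 1  # option with highest NB this iteration
        for o in options:
            totals[o] += nb[o]
    prob_best = {o: wins[o] / n for o in options}
    mean_nb = {o: totals[o] / n for o in options}
    return prob_best, mean_nb


# Hypothetical PSA output: option_a is usually slightly better, but has one
# iteration with a much worse result (a skewed net benefit distribution).
psa_samples = {
    "option_a": [(0.51, 10190.0)] * 3 + [(0.51, 10240.0)],
    "option_b": [(0.50, 9991.0)] * 3 + [(0.50, 10000.0)],
}
prob_best, mean_nb = ceac_point(psa_samples, wtp=20_000)
print("probability most cost effective:", prob_best)
print("expected net benefit:", mean_nb)
```

In this constructed example option_a is best in three of four iterations, yet option_b has the higher mean net benefit. Repeating the calculation over a range of willingness-to-pay values would trace out the full curve.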

In addition to this probabilistic sensitivity analysis, we conducted various deterministic analyses to examine the sensitivity of results to various other uncertainties. These other scenarios are discussed below.

Discussion and sensitivity analysis

Key drivers of the model results

Observational data

The results of the model using the observational adverse event data do differ from the base-case RCT data results (see Table 17 and Figure 9). However, some important results remain similar. In particular, both the RCT and the observational versions of the model show that it is cost effective to co-prescribe a PPI with a standard NSAID. One key difference is that the observational version of the model suggests that ibuprofen 1200 mg is the most cost-effective standard NSAID. Another is that celecoxib 200 mg + PPI is relatively less cost effective based on the observational data (£30,400 and £21,000 per QALY compared with ibuprofen + PPI for 55 and 65 year old patients respectively). This borderline cost effectiveness means that the results are sensitive to small increases in baseline CV or GI risks.

Table 17. Model results: observational data, 55 year old, 3 months of treatment.

Figure 9. Mean costs and effects: observational data, 55 year old, 3 months of treatment.

The fact that ibuprofen comes out as the most cost-effective standard NSAID in the observational version of the model is not a surprise. We have already shown that the standard NSAID results are very similar in the RCT version of the model, and the results probably should not preclude any of the included standard NSAIDs being prescribed (with a PPI). It is also well known that ibuprofen appears to be one of the safest standard NSAIDs in observational data. However, although these data draw upon much larger sample sizes than RCT data, bias is an important problem. Ibuprofen is likely to be used at a lower dose in the real world relative to other NSAIDs, with patients likely to be moved on to a higher dose of an alternative NSAID if something stronger is required. This introduces substantial dosing bias into the observational data, which casts doubt on the results of the observational version of the model.

Stroke risks

Celecoxib comes out very favourably in the RCT model results. This is particularly interesting because for serious GI events, against which COX-2 inhibitors are supposed to protect patients, celecoxib appears slightly worse than etoricoxib (see Table 10). However, celecoxib comes out particularly favourably for stroke. In CLASS, the rate of cerebrovascular disorders was 0.002 in the celecoxib 800 mg arm of the trial, and 0.005 in the diclofenac 150 mg arm. The rate of stroke was much more similar between COX-2 inhibitors and standard NSAIDs in TARGET and MEDAL. However, some care has to be taken with these results, because the stroke relative risks from these RCTs are based on very low numbers of events. Because stroke is the most expensive adverse event included in the model, and also the one that has the most detrimental effect on utility, it is a key driver of the model results.

Table 10. Three-month transition probabilities from RCT data – adjusted doses.

We tested the impact of stroke by setting the relative risks for the coxibs equal to those observed in MEDAL, which was the largest of the three RCTs and was also designed specifically to estimate cardiovascular event rates. The results of this analysis are summarised in Table 18. Under this scenario, neither celecoxib 200 mg + PPI nor etoricoxib 60 mg + PPI was cost effective. This shows the importance of uncertainty over stroke risks to the results of this analysis.

Table 18. Sensitivity analysis on cardiovascular risks for COX-2 inhibitors.

Etoricoxib 30 mg

The above results relate to a 60 mg daily dose of etoricoxib. A lower dose of 30 mg per day is now available in the UK. Compared with the mean dose of 78 mg in the MEDAL programme, this represents a 62% reduction in dose, which translates to a 31% reduction in observed event rates using our modelling assumptions.
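The arithmetic in this paragraph can be checked with the short Python sketch below. It assumes a linear relationship in which a given proportional change in dose produces half that proportional change in adverse event rates (a dose factor of 0.5, the mean value of the dose assumption). The function names and the example trial event rate are hypothetical.

```python
DOSE_FACTOR = 0.5  # mean value of the dose assumption used in the model


def dose_reduction(trial_dose_mg, target_dose_mg):
    """Proportional reduction in dose relative to the trial dose."""
    return (trial_dose_mg - target_dose_mg) / trial_dose_mg


def event_rate_reduction(trial_dose_mg, target_dose_mg, dose_factor=DOSE_FACTOR):
    """Proportional reduction in event rate, assuming a linear relationship."""
    return dose_factor * dose_reduction(trial_dose_mg, target_dose_mg)


def adjusted_event_rate(trial_rate, trial_dose_mg, target_dose_mg,
                        dose_factor=DOSE_FACTOR):
    """Trial event rate rescaled to the target dose."""
    reduction = event_rate_reduction(trial_dose_mg, target_dose_mg, dose_factor)
    return trial_rate * (1.0 - reduction)


# Etoricoxib: mean dose of 78 mg in MEDAL, 30 mg daily dose modelled here
dose_cut = dose_reduction(78, 30)
rate_cut = event_rate_reduction(78, 30)
print(f"dose reduction       = {dose_cut:.1%}")
print(f"event rate reduction = {rate_cut:.1%}")

# Hypothetical trial event rate of 0.010, for illustration only
print(f"adjusted rate        = {adjusted_event_rate(0.010, 78, 30):.5f}")
```

Setting `dose_factor` to 0 leaves the trial rates unchanged, and setting it to 1 scales event rates in direct proportion to dose, which corresponds to the two ends of the range explored in sensitivity analysis.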

In the base-case model, this is sufficient to make etoricoxib more cost effective, though still not as good as celecoxib at the current price of £13.99 per month (Table 19). Although etoricoxib 30 mg + PPI is a cost-effective alternative to standard NSAIDs for patients at low GI risk, celecoxib 200 mg + PPI is cost effective compared with etoricoxib 30 mg + PPI for these patients. For patients at high GI risk, celecoxib 200 mg + PPI is still the most cost-effective option, although etoricoxib 30 mg + PPI would be cost effective for patients who were suitable for a COX-2 inhibitor, but could not take celecoxib.

Table 19. Sensitivity analysis for etoricoxib 30 mg.

However, if we assume that all COX-2 inhibitors have the same stroke risks (based on the MEDAL results), then etoricoxib 30 mg would be the most cost effective of the included drugs (Table 20).

Table 20. Sensitivity analysis for etoricoxib 30 mg, MEDAL stroke risks for COX-2 inhibitors.

The NCC has also been advised of a forthcoming change in the NHS net price of etoricoxib 60 mg. However, applying the same assumptions as for all other included drugs, this higher dose would now be dominated in our model by etoricoxib 30 mg. This is because the lower dose is cheaper and would be expected to be associated with similar efficacy but fewer adverse effects than the higher dose. For this reason we did not re-run our analysis for etoricoxib 60 mg at the revised price.

Dose of NSAIDs

The dose of medication impacts on the model through the assumed relationship with adverse event rates. Since lower doses are assumed to be equally effective at controlling OA symptoms, but incur lower rates of GI and CV events, they will be more cost effective than higher doses of the same drug. However, the modelled doses of the standard NSAIDs do not necessarily reflect current practice. Some prescribing data suggests it may be more appropriate to consider a diclofenac dose of 150 mg per day, rather than 100 mg (University of Dundee 2004). MEMO data also shows that naproxen 1000 mg may be a more appropriate dose to consider, rather than 750 mg. Ibuprofen may also be prescribed at 2400 mg per day, rather than 1200 mg as assumed in the model. We tested these alternative doses in sensitivity analysis (Table 21). This shows that the relative cost effectiveness of the standard NSAIDs depends on the dose required to achieve a therapeutic response in an individual patient.

Table 21. Results of sensitivity analysis on daily dose of standard NSAIDs.

Heart failure

The estimates for heart failure risk may be controversial, as some clinical pharmacology studies have suggested that etoricoxib is likely to be worse for renal parameters such as systolic blood pressure than other NSAIDs (Medicines and Healthcare Products Regulatory Agency 2005). The explanation for this is that our RCT estimates are based only on CLASS, MEDAL and TARGET, to allow consistency in the estimates and also due to difficulties of pooling results from studies with different populations, study designs, and outcome definitions. Although etoricoxib did appear worse than diclofenac 150 mg for heart failure in MEDAL, the difference between celecoxib 800 mg and diclofenac 150 mg was even greater in CLASS, and was estimated to be fairly similar when the high dose of celecoxib was taken into account. Also, ibuprofen 2400 mg appears substantially worse than diclofenac 150 mg for heart failure in CLASS. Hence we end with heart failure estimates which are all quite similar for the COX-2 inhibitors, and with ibuprofen and naproxen both appearing worse than diclofenac. This is of particular importance because a lack of detailed data from the TARGET study means that we have had to assume that naproxen and ibuprofen have the same risk for heart failure. Based upon other CV risks, this may bias against naproxen.

It should be noted that the utility of heart failure is considered in the model in such a way that this event has less impact on the results than the other CV events (see Table 7). We re-ran the model assuming that all drugs incurred the same risk of heart failure (rates estimated from the MEDAL trial for diclofenac 100 mg). This made little difference to the overall results. However, this analysis did change the relative ranking of the standard NSAIDs for patients at low GI risk, making naproxen appear similarly cost effective to diclofenac and ibuprofen.


Another concern about the adverse event estimates may be that ibuprofen 1200 mg is estimated to have a substantially higher risk of MI than diclofenac 100 mg. This is due to the relatively high rate of MIs in the ibuprofen 2400 mg arm of the CLASS study. Although this risk is reduced by our dose:adverse event assumption, the risk associated with ibuprofen 1200 mg still appears high. Sensitivity analysis was therefore undertaken to test whether assuming that ibuprofen 1200 mg and naproxen 750 mg had the same risk of MI as diclofenac 100 mg affected the model results (Table 22). In this scenario, ibuprofen 1200 mg + PPI was more cost effective than the other standard NSAIDs.

Table 22. Sensitivity analysis on cardiovascular risks for standard NSAIDs.

Also, it may be surprising that naproxen does not come out more favourably in the model, considering the well documented evidence of a lower MI risk with the drug. However, although this appears to be the case, naproxen also appears substantially worse for serious GI events, and slightly worse for stroke, when compared to the other standard NSAIDs. This leads to less favourable overall results for naproxen.

Hip fracture

The model was re-run with an increased cost and decreased utility associated with PPI use, based on recent data linking PPI usage to hip fracture (Vestergaard et al. 2006; Yang et al. 2006), with the cost and utility estimates taken from the literature (Stevenson et al. 2007). This had very little effect on the model and did not change the results. Ideally, hip fracture would be incorporated into the model as a separate health state if further data confirm that it is related to PPI use.

Dose-adverse effect relationship

The overall cost-effectiveness results are not sensitive to the assumed relationship between dose and adverse effects. Celecoxib + PPI remains the most cost-effective option (with an ICER below £20,000 per QALY) when we assume that a 50% change in dose gives a 0% or 50% change in adverse events.

However, the estimated benefits of naproxen 750 mg, ibuprofen 1200 mg and diclofenac 100 mg are sensitive to the dose:adverse event relationship. A lower adjustment makes naproxen appear relatively more attractive.

These results reinforce the uncertainty over the ranking of the COX-2 inhibitors and standard NSAIDs.


We conducted a cost-effectiveness analysis, comparing standard NSAIDs and COX-2 inhibitors for which there was sufficient evidence to draw reliable conclusions: paracetamol 3000 mg, diclofenac 100 mg, naproxen 750 mg, ibuprofen 1200 mg, celecoxib 200 mg, etoricoxib 60 mg and 30 mg. We also tested the cost effectiveness of adding a gastroprotective agent to each of these NSAIDs / COX-2 inhibitors. It should be noted that we did not consider the cost effectiveness of other NSAIDs, meloxicam or etodolac, due to lack of suitable data.

The analysis was based on an assumption that the NSAIDs and COX-2 inhibitors are equally effective at controlling OA symptoms, but that they differ in terms of GI and CV risks. The adverse event risks were taken from three key studies: MEDAL, CLASS and TARGET. As the doses of both standard NSAIDs and COX-2 inhibitors were very high in these trials, we adjusted the observed rates to estimate the impact of more commonly-used and licensed doses. The effectiveness of NSAIDs / COX-2 inhibitors and paracetamol at controlling OA symptoms was estimated from a meta-analysis of RCTs. Given these assumptions, lower doses of a drug will always be more cost effective than a higher dose of the same drug. In practice, though, some individuals may require higher doses than we have assumed in order to achieve an adequate therapeutic response.

One clear result of our analysis is that it is cost effective to add a PPI (omeprazole 20 mg) to standard NSAIDs and COX-2 inhibitors. We did not test the relative cost effectiveness of other gastroprotective agents, because of the superior effectiveness evidence for PPIs, and the currently very low cost of omeprazole at this dose.

Given our assumptions and current drug costs, celecoxib 200 mg is the most cost effective of the included NSAIDs / COX-2 inhibitors. This result was not sensitive to the assumed duration of treatment (from 3 months to 2 years), or to the baseline risk of GI events in the population (55 years vs 65 years). It was also relatively insensitive to the baseline risk of CV events; only at very high levels of cardiovascular risk (approximately six times the average rate for a 55 year old) did celecoxib cease to be cost effective. Etoricoxib 30 mg would be a cost-effective alternative for patients who are suitable for a COX-2 inhibitor but cannot take celecoxib.

However, it is important to note substantial uncertainties over the relative rates of adverse events associated with the COX-2 inhibitors estimated from the MEDAL, TARGET and CLASS studies. In particular, the estimated risk of stroke for celecoxib from CLASS was surprisingly low. If this is an underestimate, then etoricoxib 30 mg could be more cost effective than celecoxib 200 mg.

Observational data implies a less attractive cost-effectiveness ratio for celecoxib (around £30,000 per QALY), though this estimate may be biased. There was no observational data for the other COX-2 inhibitors.

For patients who cannot, or do not wish to, take a COX-2 inhibitor, the relative cost effectiveness of paracetamol and standard NSAIDs depends on their individual risk profile, as well as the dose required to achieve an adequate therapeutic response. Recommendations are given in the full guideline.

The relative costs of diclofenac 100 mg, naproxen 750 mg and ibuprofen 1200 mg prescribed concurrently with a PPI are similar, and uncertainties over the relative incidence of adverse events with these drugs make it difficult to draw clear conclusions about their comparative cost effectiveness.

This analysis has highlighted the high level of uncertainty over the comparative cost effectiveness of different NSAIDs and COX-2 inhibitors. Changes in the best estimates of the rates of some adverse events could change the results. Given that adverse events are the key driver of the model, this is the area where research would be most desirable. It should be noted, though, that more data than were used in the model do exist, from both randomised and observational studies. For this guideline we were not able to combine all the available data to inform the model. If this were possible, it might reduce the need for additional research.