NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Dretzke J, Edlin R, Round J, et al. A Systematic Review and Economic Evaluation of the Use of Tumour Necrosis Factor-Alpha (TNF-α) Inhibitors, Adalimumab and Infliximab, for Crohn's Disease. Southampton (UK): NIHR Evaluation, Trials and Studies Coordinating Centre (UK); 2011 Feb. (Health Technology Assessment, No. 15.6.)

A Systematic Review and Economic Evaluation of the Use of Tumour Necrosis Factor-Alpha (TNF-α) Inhibitors, Adalimumab and Infliximab, for Crohn's Disease.

3 Assessment of clinical effectiveness

Methods for reviewing clinical effectiveness

Search strategy

The search strategy was designed to update that undertaken for the previous technology assessment of the clinical effectiveness and cost-effectiveness of infliximab in adults with moderate-to-severe CD5 and to encompass the new anti-TNF therapies identified for review. A search was undertaken to find existing good quality systematic reviews in order to document the evidence base to date. Searches for primary studies were restricted to RCTs. The following sources were searched for relevant primary studies:

  • bibliographic databases: Cochrane Library [Cochrane Central Register of Controlled Trials (CENTRAL)] 2007, Issue 2; MEDLINE (Ovid) 2000 to May/June 2007; MEDLINE In-Process & Other Non-Indexed Citations (Ovid) 4 June 2007 and 26 June 2007; EMBASE (Ovid) 2000 to May/June 2007. Searches were based on index and text words that encompass the condition: CD and the interventions: adalimumab, certolizumab pegol, infliximab and natalizumab. [Natalizumab and certolizumab pegol were originally part of this technology appraisal so were included in the searches. They were subsequently dropped from the report after completion of searches (see Protocol modification).] Where it was appropriate, a methodological ‘filter’ was applied to identify RCTs
  • EMEA, FDA and other relevant websites
  • citations of relevant studies
  • contact with experts
  • research registries of ongoing trials including National Research Register 2007, Issue 2, Current Controlled Trials and ClinicalTrials.gov
  • submissions from industry
  • hand search of conference abstracts in 2006 and 2007: British Society of Gastroenterology, Digestive Disease Week, United European Gastroenterology Meeting, European Crohn's and Colitis Organisation, Federation of Clinical Immunology Societies.

Searches were not limited by language. Full search strategies can be found in Appendix 3.

Inclusion and exclusion criteria

Only studies meeting the following inclusion criteria were included:

  • Study design: RCTs (study designs other than RCTs were excluded).
  • Population: adults (≥ 18 years) and children (6–17 years) with moderate-to-severe, active CD intolerant or resistant to conventional treatment; adults (≥ 18 years) with fistulising CD resistant to conventional treatment. ‘Moderate-to-severe’ disease includes patients with an average CDAI score of ≥ 220 or those who are described by trial authors as having moderate-to-severe disease.
  • Intervention: adalimumab or infliximab (any dosage/treatment regimen).
  • Comparator: conventional treatment without TNF-α inhibitors including no treatment, placebo, dietary intervention, drug treatment with aminosalicylates, methotrexate, corticosteroids (prednisolone, budesonide and hydrocortisone), azathioprine, metronidazole or surgical intervention. Adalimumab and infliximab compared with each other. Different dosage or treatment regimens of the same drug.
  • Outcomes: at least one of the following: overall survival, progression-free survival, HRQoL, disease activity (remission, response, relapse, changes in disease activity indices, number of fistulas for fistulising disease), need for surgery, hospitalisation rates and adverse effects of treatment.
  • Trials that looked at both induction and maintenance of remission were included.

Based on the above inclusion/exclusion criteria, study selection was made independently by two reviewers. Discrepancies were resolved by discussion, with involvement of a third reviewer when necessary. All discrepancies were resolved in this way.

Data extraction strategy

Information on study characteristics, study quality and results for each trial was extracted by one reviewer and checked by a second reviewer. Four reviewers were involved in data extraction. A standardised data extraction form was used, based on the form designed for the previous TAR on infliximab.5 The data extraction template can be found in Appendix 4. Where necessary the template was adapted to accommodate details relevant to a specific trial. Where required, information was extracted from graphs as follows (see Appendix 5): the graph was scanned into a Word document, overlaid with an appropriate template with graph gridlines, then printed and enlarged to A3 size, and the values were read off using the gridline template. To reduce error in this procedure, extracted information was checked by comparing graph readings with any available values in the report text and/or by redrawing the graph using the extracted data and comparing this with the original (see Appendix 5 for examples). Data extraction discrepancies were resolved by discussion, with involvement of a third reviewer when necessary. All discrepancies were resolved in this way.

Quality assessment strategy

Quality assessment was based on the published papers only, and note was taken that absence of a quality criterion may be due to lack of reporting rather than actual poor methodological quality. Authors were not contacted for further information. Quality assessment was descriptive; a quality scoring system was not used. The quality criteria assessed were based on guidelines suggested by the Cochrane Collaboration, inviting consideration of threats arising from selection, performance, attrition and detection biases. Individual checklist items were: randomisation, concealment, blinding, comparability of groups, follow-up of trial participants, handling of missing data [intention-to-treat (ITT) analysis], power calculation and selective reporting (see Appendix 4 for checklist). Study quality was assessed by one reviewer and checked by a second. Discrepancies were resolved by discussion, with involvement of a third reviewer when necessary. All discrepancies were resolved in this way.

Handling of manufacturer and other submissions

The main industry submissions (including appendices) were checked for additional relevant trials and additional clinical effectiveness data for included trials. Because editorial constraints meant the results available in published accounts of the trials were necessarily selective, information in the submitted Clinical Study Reports was sourced as required for purposes of balance and completeness. It was not possible to systematically review all such additional information submitted owing to the volume of the submissions [e.g. more than 38,000 pages for the Clinical Study Report of ACCENT (A Crohn's disease Clinical trial Evaluating infliximab in a New long-term Treatment regimen) I,2,3 and more than 5000 pages for the Clinical Study Report of Targan et al.,57 both included studies]. No references to specific sections of the Clinical Study Reports were made in the main industry submissions. [Please note that the Clinical Study Reports for the CLASSIC (CLinical assessment of Adalimumab Safety and efficacy Studied as Induction therapy in Crohn's disease), CHARM (Crohn's Trial of the Fully Human Antibody Adalimumab for Remission Maintenance) and GAIN (Gauging Adalimumab efficacy in Infliximab Nonresponders) RCTs that were received from the manufacturers of adalimumab started on section 4 and had no page numbers or tables of contents. Also, some of the appendices were missing, particularly ones referred to in the text as having all of the raw results in tables. Therefore it is unclear whether some pages are missing from the middle of these reports, and potentially the most useful appendices were not supplied.] For details on how the submitted economic models were assessed see Chapter 4, Critique of the submission on infliximab by Schering-Plough.

Analysis strategy

The clinical effectiveness section of this report mainly focuses on the results from RCTs and/or RCT trial arms in which the drugs were administered within the limits of their current respective licence indication (see Appendix 6). Results of trials are organised and reported in four categories:

  • induction trials in adult populations predominantly or wholly constituted of non-fistulising patients
  • maintenance trials in adult populations predominantly or wholly constituted of non-fistulising CD patients
  • trials in populations wholly constituted of patients with fistulising CD
  • trials in paediatric patients.

Results are reported within these four categories on a trial-by-trial basis, except for AEs and side effects, which were considered simultaneously across all included trials and both drugs. Most outcome results are presented in forest plots so as to provide an overview of the quantitative spread of effect sizes. These are accompanied by brief narrative commentary. In some instances outcome results are tabulated. Both placebo and intervention rates and both risk difference and risk ratio effect sizes are presented for most outcomes in Tabulation of included studies. The confidence intervals (CIs) quoted were not adjusted for repeated measures.

The clinical heterogeneity of trials, or the existence of only a single trial, precluded pooling of data in meta-analysis. Because there were no RCTs directly comparing the two drugs included in this technology appraisal, the feasibility of undertaking indirect comparison analysis was considered in depth in order to assess their relative effectiveness. However, indirect comparisons were not done because of the variation in placebo effect sizes in the RCTs (induction trials), the lack of similarity in the apparently common comparator (i.e. placebo arm maintenance trials), and the reporting of subgroup results only at follow-up (i.e. variously defined responders only) in many of the RCTs.

Protocol modification

The protocol originally encompassed assessment of the clinical effectiveness and cost-effectiveness of infliximab, adalimumab, certolizumab pegol and natalizumab within their licensed indications for moderate-to-severe CD. At the time of producing the protocol, certolizumab pegol and natalizumab were not licensed for CD, but imminent licensing was anticipated by the commissioners of this report (NICE). After the start of the review process it became clear that neither drug would achieve a licence within the time frame required for this technology assessment and consequently both drugs were dropped from the review. This occurred after completion of the search strategy. As of November 2010 these drugs remain unlicensed for CD.

Results

Quantity of research available

Eleven relevant trials were identified,2,3,45,46,57,58,62–67 some supported by multiple publications. Figure 1 details the trial identification process.

FIGURE 1. Study identification process.

At the time of writing of this report, 11 hard copies of ordered publications were still outstanding or not available; none of these are likely to contain new trial data (see Appendix 7 for details of publications).

Eleven RCTs were included in total.2,3,45,46,57,58,62–67 Seven trials meeting the inclusion criteria were identified through the main database searches.2,3,45,46,58,63,65,67 Two additional studies57,62 from the previous TAR on infliximab were included,5 as were two trials from 2007 which had been published after the search cut-off date.64,66

Searching through the main industry submissions from both manufacturers did not yield any additional RCTs. The search for conference abstracts yielded no further relevant trials. An abstract of the study by Hommes et al.61 was identified, which is referred to in Chapter 5, Other relevant factors. This study did not meet the criterion of a population of CD patients who are resistant or intolerant to conventional treatment.

The search for ongoing trials yielded four potentially relevant RCTs, all of adalimumab (see Appendix 8). All were at the recruitment stage (or not yet recruiting) at the time the information was verified by the respective manufacturers. Two were trials (induction and maintenance) of adalimumab in Japanese patients with moderate-to-severe CD. Two multicentre trials of adalimumab were in patients with moderate-to-severe ileocolonic CD and in children with moderate-to-severe CD respectively. Two ongoing trials of infliximab were identified, but did not meet the inclusion criteria as they compared either infliximab with infliximab plus methotrexate or infliximab with infliximab plus azathioprine. No ongoing trials of head-to-head comparisons of adalimumab and infliximab were identified. No preliminary reports of any of these ongoing trials were identified in the manufacturer submissions.

Tabulation of included studies

All of the included RCTs recruited patients having ‘moderate-to-severe CD’ defined according to CDAI scores of between 220 and 450, or 220 and 400; it is therefore likely that they do not reflect the intended licensed population of severe active CD (i.e. CDAI score of more than 300).

The included studies encompassed two trial designs, induction therapy and maintenance therapy, in any of three populations: adults predominantly or wholly non-fistulising, fistulising adults and children. Table 5 gives an overview of the included studies with reference to trial design and recruited patient population.

TABLE 5. Overview of the 11 included trials.

Of the 11 included RCTs,2,3,45,46,57,58,62–67 nine compared infliximab or adalimumab with placebo.2,3,57,58,62–67 Two RCTs compared different doses of infliximab only and these were both in children.45,46 Two RCTs of infliximab were in patients with fistulising disease.62,65 Both induction and maintenance trials were identified for both drugs. All RCTs were multicentre studies conducted mainly in North America and Europe. No RCTs of head-to-head comparisons of adalimumab and infliximab were identified. No RCTs of adalimumab in children were identified. Based on the information in the published papers, all RCTs were either industry sponsored or in part industry sponsored, had participants from industry involved in study design or manuscript writing, or had one or more authors with industry involvement.

In the induction trials, patients received short-duration anti-TNF or placebo to see if a favourable clinical response was induced. In the maintenance trials, all patients received short-term induction therapy with anti-TNF and then continued with longer term anti-TNF or placebo.

In the maintenance trials most published results reported only the follow-up of patients who initially responded to the induction therapy, and results for ‘non-responders’ were generally not provided.

The most widely reported outcomes were based on CDAI scores (see Appendix 1 for details). Although group mean or median CDAI scores were usually recorded at various times of follow-up, the variance of these scores was incompletely reported and trials emphasised binary outcome measures derived by dichotomising CDAI scores. Three such binary measures were used:

  • response 70: defined as a reduction of 70 or more in CDAI score relative to baseline
  • response 100: defined as a reduction of 100 or more in CDAI score relative to baseline
  • remission: defined as a CDAI score of less than 150.

The definitions of the binary measures given above were often qualified by stipulation of additional criteria usually including no requirement for a change in concomitant medication because of worsening clinical condition and no requirement for surgery.
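The three CDAI-based binary measures can be illustrated with a minimal sketch. The function name and the example scores are hypothetical, and the additional criteria that trials often stipulated (no change in concomitant medication, no surgery) are deliberately not modelled here.

```python
def classify_cdai(baseline, follow_up):
    """Derive the three binary outcomes from a baseline and a follow-up CDAI score.

    Illustrative only: trial-specific extra criteria are not modelled.
    """
    reduction = baseline - follow_up
    return {
        "response_70": reduction >= 70,    # reduction of 70 or more from baseline
        "response_100": reduction >= 100,  # reduction of 100 or more from baseline
        "remission": follow_up < 150,      # CDAI score of less than 150
    }


# Hypothetical patient: baseline CDAI 300, follow-up CDAI 225
print(classify_cdai(300, 225))
```

In this hypothetical case the patient counts as a response 70 only, which shows how the three measures can disagree for the same patient.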

This section describes the results about the effectiveness of the anti-TNF interventions. The results reviewed were taken mainly from publications. When judged necessary for purposes of completeness and balance, information in the unpublished industry trial reports was also sourced.

There are four sections in the clinical effectiveness results: induction treatment in adults (predominantly non-fistulising), maintenance in adults (predominantly non-fistulising), treatments in adult patients exclusively with fistulising CD, and paediatric CD (≤ 18 years old). Within each section infliximab is reported before adalimumab and the earliest trial publication date first. Each of the four sections is organised for each trial as follows:

  • description of intervention used in the trial and other unusual points about the trial design
  • report of outcomes, organised in the first two sections as A, response 70; B, response 100; C, remission; D, other outcomes; and E, other considerations, and in the last two sections as primary and secondary outcomes
  • quality assessment
  • summary for that trial (in box).

Adverse events and side effects are considered simultaneously across all included trials for both drugs at the end of the clinical effectiveness section (see Adverse events), just before the discussion of clinical effectiveness (see Discussion of results and assessment of effectiveness).

Induction trials in adult populations (wholly or predominantly non-fistulising)

Induction trials recruited patients who were not receiving anti-TNF therapy at the time of randomisation. Three trials were identified.57,63,64 One, Targan et al.,57 compared infliximab with placebo. A further publication, D'Haens et al.,68 reported on a subgroup from Targan et al.57 and so will not be further discussed. Two trials compared adalimumab with placebo (CLASSIC I63 and GAIN64). Apart from the subgroup study, the trials recruited patients who had initial CDAI scores between 220 and 450. The outcomes reported are summarised in Table 6 and trial details are summarised in Table 7.

TABLE 6. Outcomes measured in induction trials with mainly non-fistulising adult populations.

TABLE 7. Main study and population characteristics: induction trials in predominantly or wholly non-fistulising adult populations.

Targan et al., 199757 (infliximab)

This RCT had four arms.57 Patients were randomised to a single i.v. infusion of placebo (n = 25) or of infliximab at 5 mg/kg (n = 27), 10 mg/kg (n = 28) or 20 mg/kg (n = 28). Disease status (remission, response 70 and CDAI score) was monitored at baseline and at weeks 2 and 4 after infusion. The 4-week blinded phase was followed by an open-label phase with a further 12 weeks of follow-up. The primary outcome measure was defined as a response 70 at week 4 with no change in any concomitant medication.

A, response 70

Response 70 at week 4 was the primary outcome. Results for response 70 at weeks 2 and 4 are summarised in Figure 2. For response 70 at week 4 there was a statistically significant difference in favour of the infliximab groups (combined) compared with placebo (p < 0.001). The percentage of placebo patients achieving response 70 was ≤ 16% at both time points; for the infliximab groups at week 4 it was between 50% and 81%, depending on dose regimen. Point estimates of percentage response were associated with considerable uncertainty. The rate of response 70 at week 4 for the combined infliximab groups was 61% (95% CI 51% to 71%). At week 4 the risk difference (infliximab–placebo) was between 0.34 and 0.65, and risk ratio (infliximab/placebo) was between 3.1 and 5.1 depending on dose. Both risk difference and risk ratio at week 4 reached statistical significance in favour of intervention.
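As a rough check on how the two effect sizes relate to the reported percentages, the following sketch computes risk difference and risk ratio from response rates. It assumes a week 4 placebo rate of 16% (the text gives only "≤ 16%") and uses the lowest and highest infliximab rates quoted (50% and 81%); the function names are illustrative.

```python
def risk_difference(rate_intervention, rate_placebo):
    """Risk difference (intervention - placebo) from two proportions."""
    return rate_intervention - rate_placebo


def risk_ratio(rate_intervention, rate_placebo):
    """Risk ratio (intervention / placebo) from two proportions."""
    return rate_intervention / rate_placebo


placebo_rate = 0.16  # assumption: upper bound of the reported placebo rate
for infliximab_rate in (0.50, 0.81):  # lowest and highest reported rates at week 4
    rd = risk_difference(infliximab_rate, placebo_rate)
    rr = risk_ratio(infliximab_rate, placebo_rate)
    print(f"rate {infliximab_rate:.2f}: risk difference {rd:.2f}, risk ratio {rr:.2f}")
```

Under these assumptions the results fall at the ends of the ranges quoted in the text (0.34 to 0.65 for risk difference, roughly 3.1 to 5.1 for risk ratio).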

FIGURE 2. Response 70 rates in Targan et al. At week 4 risk difference p < 0.001, p = 0.0045, p < 0.001, for 5, 10 and 20 mg/kg dose regimens respectively. At week 4 risk ratio p < 0.001, p = 0.022, p < 0.004 for 5, 10 and 20 mg/kg …

Table 8 summarises the comparison between different dose regimens for response 70 at week 4. The low-dose regimen (5 mg/kg) appeared more effective than the 10 mg/kg regimen (p = 0.009). The difference between dose regimens for other comparisons did not reach statistical significance.

TABLE 8. Risk difference between dose regimens in response 70 at week 4 in Targan et al.

B, response 100

Response 100 was not reported.

C, remission

Figure 3 summarises remission rates. At 4 weeks, between 25% and 48% of patients in the infliximab groups were in remission, depending on dose, but only one placebo patient achieved remission.

FIGURE 3. Remission rates in Targan et al. At week 4 risk difference p < 0.001, p = 0.0206 and p = 0.0206, for 5, 10 and 20 mg/kg dose regimens respectively. At week 4 risk ratio p = 0.013, p = 0.076 and p = 0.076 for 5, 10 and 20 mg/kg dose regimens respectively.

There was a discrepancy between remission rates published in Targan et al.57 and rates presented in the manufacturer's submission. The latter for the 5 mg/kg group at week 4 were placebo rate 4% (1/24), infliximab rate 0% (0/24). These remission rates generate a negative risk difference (infliximab–placebo) at week 4 (−0.04). CIs for risk ratios (infliximab/placebo) in the manufacturer's submission were described as ‘unadjusted’, but were unexpectedly narrow compared with those calculated using standard software packages or using the standard error of ln (risk ratio) given by:69 $\left(\frac{1}{e_i} + \frac{1}{e_p} - \frac{1}{T_i} - \frac{1}{T_p}\right)^{0.5}$, where $e_i$ and $e_p$ are the number of patients with the outcome in the intervention and placebo arms respectively, and $T_i$ and $T_p$ are the total number of patients in the intervention and placebo arms respectively. (This discrepancy in CIs applies to CDAI-based binary risk ratios for all trials in the infliximab industry submission.)
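For orientation, the standard large-sample calculation of a risk ratio CI can be sketched as follows. It uses the usual standard error of ln(risk ratio), the square root of 1/e_i + 1/e_p − 1/T_i − 1/T_p, and a normal approximation with z = 1.96. The counts in the example (30/60 vs 10/50) are hypothetical and are not trial data; the formula is undefined when either arm has zero events, as in the 0/24 case above.

```python
import math


def risk_ratio_ci(e_i, t_i, e_p, t_p, z=1.96):
    """Risk ratio with a normal-approximation CI on the log scale.

    e_i, e_p: patients with the outcome (intervention, placebo); both must be > 0.
    t_i, t_p: total patients (intervention, placebo).
    """
    rr = (e_i / t_i) / (e_p / t_p)
    se_log_rr = math.sqrt(1 / e_i + 1 / e_p - 1 / t_i - 1 / t_p)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, se_log_rr, lower, upper


# Hypothetical counts, for illustration only
rr, se, lower, upper = risk_ratio_ci(30, 60, 10, 50)
print(f"RR = {rr:.2f}, SE(ln RR) = {se:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the standard error is driven mainly by the 1/e terms, small event counts give wide intervals, which is why unexpectedly narrow CIs in small trials merit scrutiny.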

Maintenance of initial response to single infusion. At week 4 there were 54/83 (65%) responders (response 70) to infliximab (combined dose groups); by 12 weeks (see E, open-label phase below) there were 34 responders (41%). At week 4, 27/83 (33%) patients given infliximab had gained remission and at 12 weeks 20 patients (24%) were in remission.

D, other outcomes

At week 4 favourable responses to treatment were reported for CDAI scores, for QoL scores (IBDQ), and for C-reactive protein (CRP) levels. The results reported are summarised in Table 9.

TABLE 9. Mean (standard deviation) values for CDAI, IBDQ and CRP concentrations at baseline and week 4.

Figure 4 shows the mean difference in IBDQ score (infliximab–placebo) at week 4. Mean difference reached statistical significance only for patients who received the low-dose regimen.

FIGURE 4. Mean IBDQ scores and mean difference at baseline and week 4 of Targan et al. NS, not significant.

E, other considerations – open-label phase

In the open-label phase of the trial, extending by at least 12 weeks from week 4, non-responder patients at week 4 were eligible for a 10 mg/kg infusion of infliximab. The distribution of this second infusion among the patient groups is summarised in Table 10. Of the original 25 placebo group patients, 19 non-responders received infliximab; 29 non-responder patients who had received a first dose of infliximab received the second dose. Table 10 lists the percentage of the patients (not responsive at week 4) in each group who subsequently achieved response 70 at follow-up weeks 4, 8 and 12 after the second infusion.

TABLE 10. Numbers of patients receiving second infusion in open-label phase of Targan et al.

Of patients unresponsive to the first dose of infliximab, 28% (8/29) responded by week 12 following the second dose, compared with 53% (10/19) of patients whose second infusion was their first exposure to active intervention. During this open-label phase there was a lack of a true placebo control group and the results therefore only suggest that some patients poorly responsive to an initial infusion may respond subsequently on receipt of further infusion. Whether a 10 mg/kg second dose represents the most appropriate dose regimen for this second-dose strategy is unknown.

Quality assessment (based on published report)

Randomisation, allocation concealment, and blinding (up to week 4) were all adequate. Baseline characteristics were similar between groups except for CRP levels and for the proportion of patients with ileal involvement. Placebo CRP level {mean 12.8 [standard deviation (SD) 13.9]} was substantially lower than that for the active intervention groups [mean (SD): 22.1 (23.6), 23.2 (34.2) and 22.4 (23.9) for the 5 mg/kg, 10 mg/kg and 20 mg/kg groups respectively]. The potential impact on results of the imbalanced CRP levels is difficult to determine. Follow-up appeared almost complete. The original study protocol did not specify the use of ITT analysis, but the publication stated that patients were analysed according to assignment. A power calculation was conducted; this assumed a 30% response in the placebo group, presumably reflecting the authors' assessment of placebo rates reported in other CD trials. The actual placebo response rate observed was about half this value (16%) and was low compared with other similar trials. The low placebo rate and imbalance of placebo CRP level may indicate an atypical placebo population, possibly stemming from the small sample size of the group (n = 25).

Targan et al., 1997.57 Summary of effectiveness evidence

A single i.v. infusion of infliximab (5, 10 or 20 mg/kg) was more effective than placebo at delivering a clinical response (a reduction of ≥ 70 points in CDAI score) at week 4 of follow-up (p < 0.005 for risk differences and p < 0.022 for risk ratios). Estimates of the percentage of patients responding to infliximab were associated with considerable uncertainty, and at 4 weeks ranged between 50% and 80% depending on dose. Of the dose regimens used, the lowest appeared to be the most effective, suggesting the possibility that the most appropriate dose could be less than the lowest used in the trial (5 mg/kg). A proportion of patients (~30%) not responsive at week 4 did respond subsequently when given a second dose of infliximab (10 mg/kg); although it is likely this ‘second-dose’ response required active intervention, this was not properly demonstrated because the trial lacked a true placebo comparator after week 4. The most effective dose regimen for a ‘second-dose’ response was uncertain. After week 4 nearly all trial participants had received active intervention, and inferences about the relation of outcomes to infliximab were obscured. The Targan et al.57 trial was completed more than a decade ago and no further induction trial of infliximab in this population has been conducted, so the uncertainties described above remain to be addressed.

CLASSIC I63 (adalimumab)

In this trial,63 patients (n = 299) were randomised to two subcutaneous injections 2 weeks apart of either placebo (n = 74) or adalimumab at dose regimens of 40 mg then 20 mg (n = 74), at 80 mg then 40 mg (n = 75), or at 160 mg then 80 mg (n = 76). Patients were excluded if they had previously received any anti-TNF treatment. At baseline 11% of patients had fistulas. Outcomes were monitored at weeks 1, 2 and 4 after the first injection. The primary outcome was defined as the proportion of patients in remission at week 4 in the two high-dose adalimumab groups versus the placebo group (tested using chi-squared test).

A, response 70

At week 4 for the less robust measure of a clinical improvement by more than 70 points in CDAI score from baseline (response 70), a statistically significant result was observed for both risk difference and risk ratio for all three dose regimens (results are summarised in Figure 5).

FIGURE 5. Rates of response 70 in CLASSIC I. At week 4 for risk difference p = 0.029, p = 0.005 and p = 0.004 for 40/20, 80/40 and 160/80 dose regimens. At week 4 for risk ratio p = 0.0357, p = 0.0088 and p = 0.0073 for 40/20, 80/40 and 160/80 dose regimens. LCI, …

B, response 100

At week 4 the risk difference for response 100 (intervention–placebo) reached statistical significance only for the highest dose regimen while risk ratio (intervention/placebo) reached statistical significance for the two higher dose regimen groups. The results for response 100 are summarised in Figure 6.

FIGURE 6. Rates of response 100 in CLASSIC I. At week 4 for risk difference p = 0.279, p = 0.060 and p = 0.0015 for 40/20, 80/40 and 160/80 dose regimens respectively. At week 4 for risk ratio p = 0.284, p = 0.0682 and p = 0.0036 for 40/20, 80/40 and 160/80 dose …

C, remission rates

Remission rates were the primary outcome in this RCT. There was a statistically significant difference in favour of the two high-dose adalimumab regimens (combined) relative to placebo in the proportion of patients in remission at week 4 (45/151 vs 9/74; p = 0.004). At week 4 the risk difference (intervention–placebo) and risk ratio (intervention/placebo) reached statistical significance only in the highest dose regimen group. Remission rates are summarised in Figure 7.
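The primary comparison can be approximately reproduced from the counts quoted above (45/151 vs 9/74). The sketch below assumes a Pearson chi-squared test on the 2 × 2 table without continuity correction; the trial publication may have used a different variant, so this is an illustration rather than a reconstruction of the authors' analysis.

```python
import math


def chi_squared_2x2(events_a, total_a, events_b, total_b):
    """Pearson chi-squared test (1 df, no continuity correction) for two proportions."""
    a, b = events_a, total_a - events_a  # group A: events, non-events
    c, d = events_b, total_b - events_b  # group B: events, non-events
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Remission at week 4: two high-dose adalimumab groups combined vs placebo
statistic, p_value = chi_squared_2x2(45, 151, 9, 74)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")
```

Under this assumption the p-value is close to the reported p = 0.004; applying a continuity correction would give a somewhat larger value.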

FIGURE 7. CLASSIC I remission rates. At week 4 risk difference p = 0.354, p = 0.057 and p = 0.0005 for 40/20, 80/40 and 160/80 dose regimens. At week 4 risk ratio p = 0.359, p = 0.0691 and p = 0.0021 for 40/20, 80/40 and 160/80 dose regimens respectively. LCI, …

For each of the three CDAI-based binary outcome measures there was an apparent linear dose response trend with greater effectiveness for higher dose.

D, other outcomes

At week 4, favourable responses to treatment were reported for CDAI scores, for QoL scores (IBDQ), and for CRP levels. The results reported are summarised in Table 11.

TABLE 11. Mean (SD) values for CDAI, IBDQ and CRP concentrations at baseline and week 4.

E, other considerations – subgroup analyses

Logistic regression failed to show a relationship between baseline CRP levels or concomitant immunosuppressive therapy and remission rates at week 4 with placebo or adalimumab.

For the small subgroup of patients with fistulas (11%), no significant differences were observed between placebo and intervention with regard to fistula improvement or remission.

Quality assessment (based on published report)

Randomisation, allocation concealment and blinding were adequate. Baseline characteristics were reasonably well balanced between groups. There were no losses to follow-up, and withdrawals were limited to 5%. Efficacy estimates appear to have been calculated using ITT analysis, but this was not stated explicitly. A power calculation was conducted; this assumed 20% and 45% remission rates in the placebo and intervention arms respectively (the observed placebo rate in the trial was about 12%). Last observation carried forward was used for analysis of IBDQ scores, but the amount of missing data was not stated.

CLASSIC I.63 Summary of effectiveness evidence

Two subcutaneous injections of adalimumab given 2 weeks apart at 40 mg then 20 mg, at 80 mg then 40 mg, or at 160 mg then 80 mg, were more effective than placebo at achieving remission (CDAI score < 150) at week 4 after the first injection (p = 0.004 for the two high-dose regimens combined vs placebo). The percentage of placebo-treated patients gaining remission at week 4 was ~12% compared with between ~18% and ~36% for adalimumab-treated patients depending on dose regimen received. Point estimates of response 70 rates, response 100 rates and remission rates were associated with considerable uncertainty, but for all three outcome measures a trend was evident for higher doses to be more effective. At week 4 of follow-up, risk differences (intervention–placebo) and risk ratios (intervention/placebo) for the highest dose regimen reached statistical significance in favour of adalimumab for all three outcomes. Subgroup analyses failed to identify any baseline characteristics associated with a better response to active intervention relative to placebo.

GAIN64 (adalimumab)

In this trial,64 325 patients were randomised to two subcutaneous injections 2 weeks apart of either placebo (n = 166) or adalimumab at a dose regimen of 160 mg then 80 mg (n = 159). To be included patients had to have been previously exposed to infliximab treatment and found to be intolerant (n = 190), unresponsive (n = 164), or intolerant and unresponsive (n = 40). The primary outcome was defined as the proportion of patients in remission at week 4 after the first injection.

A, response 70; B, response 100; and C, remission

The primary outcome was remission rates. The remission rate at week 4 was 7% in the placebo group and 21% in the adalimumab group (p < 0.001). This result and those for the secondary outcomes as reported are summarised in Table 12. The CDAI-based binary response outcome measures reported are summarised graphically in Figure 8. At weeks 2 and 4, risk differences (adalimumab–placebo) and risk ratios (adalimumab/placebo) were in favour of the intervention and reached statistical significance.

TABLE 12. Outcome measures reported in the GAIN trial.


FIGURE 8. Response 70, response 100 and remission rates reported in GAIN. At week 4 risk difference p = 0.001, p = 0.007 and p = 0.0002 for response 70, response 100 and remission respectively. At week 4 risk ratio p = 0.0014, p = 0.009 and p = 0.0006 for response (more...)

D, other outcomes

Results for these are also shown in Table 12. Mean CDAI scores reduced from baseline to a greater extent with adalimumab than with placebo (at week 4, p < 0.001 for mean change from baseline). At week 4 the improvements from baseline in IBDQ scores were 30 and 15 for the adalimumab and the placebo groups respectively. CRP levels at week 4 relative to baseline were more normalised in the intervention than in the placebo group. The change from baseline comparing adalimumab with placebo reached statistical significance in favour of adalimumab.

E, other considerations – subgroup analyses

The primary outcome (remission at week 4) was reported for subgroups of patients defined according to: previous response or intolerance to infliximab; receiving or not receiving immunosuppressive agents at baseline; receiving or not receiving corticosteroids at baseline; or having a negative or positive test for antibodies to infliximab. Risk difference was in favour of adalimumab relative to placebo for all subgroups.

A small proportion of patients (14%, n = 45) had draining fistulas or perianal fistulas at baseline. Rates of fistula improvement and remission were similar between placebo and adalimumab groups.

Quality assessment (based on published report)

Randomisation, allocation concealment and blinding were adequate. Baseline characteristics were well balanced between groups. There were no losses to follow-up, and withdrawals were limited to 4%. Efficacy estimates appear to have been calculated using ITT analysis for remission and response outcomes. For continuous variables such as IBDQ, last observation was carried forward; the number of missing data for IBDQ was small (eight patients). A power calculation was conducted; this assumed 20% and 35% remission rates in the placebo and intervention arms respectively (the observed rates at week 4 in the trial were 7% and 21% respectively).

GAIN.64 Summary of effectiveness evidence

Two subcutaneous injections of 160 mg and then 80 mg of adalimumab given 2 weeks apart were more effective than injections of placebo at achieving remission (CDAI score < 150) at week 4 after the first injection (p < 0.001). The percentage of placebo-treated patients gaining remission at week 4 was 7% (95% CI 4% to 12%) compared with 21% (95% CI 14% to 27%) for adalimumab-treated patients. At weeks 2 and 4 of follow-up, risk differences (intervention–placebo) and risk ratios (intervention/placebo) reached statistical significance in favour of adalimumab for remission, response 70 and response 100. A statistically significant difference in favour of adalimumab versus placebo was observed for change in IBDQ score at week 4 relative to baseline.

Pooling and indirect comparison

The two adalimumab trials differed with respect to their populations: CLASSIC I63 excluded patients if they had previously received any anti-TNF treatment, while the GAIN64 trial recruited only patients who had previously experienced infliximab treatment but had proved intolerant or unresponsive. Because of these clear population differences, results from the two trials were not pooled. The existence of only a single induction trial for infliximab in this population precluded pooling.

No head-to-head induction trial of infliximab versus adalimumab has been conducted. A possible approach to compare effectiveness of the two drugs is by indirect comparison using trials with a ‘common’ comparator (e.g. placebo). The Targan et al. population,57 in contrast to that in GAIN,64 was naive to anti-TNF therapy and therefore indirect comparison between these trials was not judged productive. The placebo rates for remission and response 70 in Targan et al.57 were low compared with those in the adalimumab trials and are indicative of likely differences between the potentially ‘common’ comparator groups possibly stemming from the very small sample size of the placebo group in the Targan et al.57 trial. Because of the likely difference in target placebo populations, indirect comparison was judged more likely to be misleading than informative. It is relevant that neither industry submission undertook an indirect comparison between these induction trials. One way clinical heterogeneity may be expressed is in different response rates in placebo groups. Although CDAI scores at baseline may be similar between trials, this could mask considerable clinical heterogeneity because CDAI is a summary score and patients can achieve the same score yet may have problems with quite different aspects of their disease.
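To make the idea of an indirect comparison through a common comparator concrete, the sketch below shows one widely used form, the adjusted indirect comparison on the log risk ratio scale (often attributed to Bucher and colleagues). It is not an analysis performed in this report, which judged such a comparison more likely to be misleading than informative. All counts are hypothetical and the function names are invented for illustration.

```python
import math


def log_rr_and_variance(events_trt, n_trt, events_ctl, n_ctl):
    """Log risk ratio (treatment/placebo) and its approximate variance."""
    log_rr = math.log((events_trt / n_trt) / (events_ctl / n_ctl))
    variance = 1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    return log_rr, variance


def indirect_risk_ratio(trial_a, trial_b, z=1.96):
    """Adjusted indirect comparison of drug A vs drug B via placebo.

    Each argument is (events_trt, n_trt, events_placebo, n_placebo).
    The method assumes the two placebo groups are exchangeable, which is
    exactly the assumption questioned in the text above.
    """
    log_a, var_a = log_rr_and_variance(*trial_a)
    log_b, var_b = log_rr_and_variance(*trial_b)
    log_indirect = log_a - log_b
    se = math.sqrt(var_a + var_b)  # variances add, so precision is lost
    ci = (math.exp(log_indirect - z * se), math.exp(log_indirect + z * se))
    return math.exp(log_indirect), ci


# Hypothetical counts, NOT taken from any trial in this review
drug_a_vs_placebo = (30, 100, 10, 100)
drug_b_vs_placebo = (20, 100, 10, 100)
indirect_rr, indirect_ci = indirect_risk_ratio(drug_a_vs_placebo, drug_b_vs_placebo)
print(f"Indirect RR = {indirect_rr:.2f} "
      f"(95% CI {indirect_ci[0]:.2f} to {indirect_ci[1]:.2f})")
```

Because the variances of the two trial-level estimates are summed, the indirect estimate is less precise than either direct estimate, and any difference between the placebo populations feeds directly into the result.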

Maintenance trials in adults (wholly or predominantly non-fistulising)

These are trials in which all patients receive short-term induction therapy with anti-TNF and then proceed to longer term treatment with either placebo or anti-TNF. The predominant aim of these trials was to investigate whether anti-TNF was superior to placebo in maintaining any favourable clinical response observed from induction therapy. As no true placebo comparator existed during the induction therapy it is not possible to determine how much of the favourable clinical response seen from induction was actually attributable to active intervention. This complicates interpretation of results.

Four trials were identified, two with infliximab [Rutgeerts et al.58 and ACCENT I (Hanauer et al.3 and Rutgeerts et al.2)] and two with adalimumab [CLASSIC II (Sandborn et al.66) and CHARM (Colombel et al.67)]. These studies were characterised by distinct differences in induction regimens.

The Rutgeerts et al.58 trial was an extension of the Targan et al.57 infliximab induction trial. Patients eligible had received variably one or two previous infusions of placebo or of infliximab at doses of 5, 10 or 20 mg/kg. Patients with a response 70 were then eligible for the trial. The induction regimen of participants in this trial was variable and not clearly defined, making it difficult to identify the precise target population involved.

Similarly to Rutgeerts et al.,58 the CLASSIC II66 trial was an extension of a previously conducted induction trial, namely the CLASSIC I63 study of adalimumab. Patients eligible for CLASSIC II were required to be in remission (CDAI < 150) at week 4 of CLASSIC I and also 4 weeks later. These patients may have received two subcutaneous injections 2 weeks apart of various doses of adalimumab (40 mg then 20 mg, 80 mg then 40 mg, or 160 mg then 80 mg) or of placebo.

The ACCENT I2,3 (infliximab) and CHARM67 (adalimumab) trials were free-standing maintenance trials with more straightforward induction regimens. In ACCENT I patients received a single induction infusion of 5 mg/kg infliximab. In CHARM patients received subcutaneous induction injections of 160 mg and then 80 mg of adalimumab 2 weeks apart.

The main study and population characteristics are shown in Table 13. The main outcome measures described in the published reports of the four trials are summarised in Table 14.

TABLE 13. Main study and population characteristics: maintenance trials in adults predominantly or wholly non-fistulising.


TABLE 14. Outcomes measured in maintenance trials with mainly non-fistulising adult populations.


Rutgeerts et al., 199958 (infliximab)

The Rutgeerts et al.58 trial was an extension of the Targan et al.57 infliximab induction trial and included 73 of the original 108 patients. Targan et al.57 consisted of a 4-week comparison between placebo and one dose of infliximab in three arms (5 mg/kg, 10 mg/kg or 20 mg/kg). This was followed after a maximum of 2 weeks by an open-label phase with 12 weeks of follow-up that started with the option of a 10 mg/kg dose of infliximab for week 4 non-responder patients. To be eligible to enrol in Rutgeerts et al.58 the Targan et al. week 4 responder patients needed to achieve a response 70 at week 8, and the week 4 non-responder patients needed to achieve a response 70 at week 8 after the open-label option of a 10 mg/kg infusion of infliximab. Four weeks after qualifying (week 8 after induction infliximab or 8 weeks after open-label infliximab) the eligible patients were randomised to i.v. infusion of placebo or 10 mg/kg infliximab (designated week 12 of maintenance phase) and a further three infusions at 8-week intervals (a total of four infusions after becoming eligible to participate; administered weeks 12, 20, 28 and 36). Follow-up continued to week 48.

The induction regimen in this study was variable between patients in duration and in exposure to infliximab. In consequence, induction was ill-defined and the distinction between the induction regimen and maintenance regimen was also unclear. The eligible patients could have received any of the following possible infusions of infliximab: no infliximab (placebo), one 5 mg/kg infusion, one 10 mg/kg infusion or one 20 mg/kg infusion; a second infusion of 10 mg/kg could be given (to any patients) at week 4 if there was no response. Four patients received no infliximab (placebo and no second 10 mg/kg dose as a response was achieved). How closely the trial induction phase corresponds to the licence indication is uncertain.

A, response 70

No primary outcome measure was identified. The response 70 results presented (summarised in Figure 9) referred to point prevalence at assessment time points and do not necessarily indicate maintenance of individual patient response. At week 8 > 90% of patients had a response 70 (CDAI reduced by > 70 points relative to baseline in Targan et al.57). At week 12 (randomisation week) this had diminished to about 75% and by week 48 had further diminished to 33% in the placebo group and 57% in the infliximab group (p = 0.038 for risk difference and p = 0.054 for risk ratio). Point estimates were associated with considerable uncertainty. The authors stated that of patients with response 70 at the last infusion (week 36), 62% of the infliximab group and 37% of the placebo group maintained their response for the 8 weeks to week 44 (p = 0.16).

FIGURE 9. Response 70 rates in Rutgeerts et al. At weeks 24 and 48 risk difference p = 0.094 and p = 0.038 respectively. At weeks 24 and 48 risk ratio p = 0.108 and p = 0.054 respectively. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, (more...)

B, response 100

This outcome was not reported.

C, remission

The point prevalence of remission at different follow-up weeks was reported (results are summarised in Figure 10). Point estimates were associated with considerable uncertainty. At randomisation (week 12) ~38% of patients were in remission in the infliximab group; this increased to ~60% during weeks 16–40. The corresponding values for the placebo group were ~44% (week 12) and ~35% (weeks 16–40). Risk difference (infliximab–placebo) and risk ratio (infliximab/placebo) just reached statistical significance (p < 0.05) at most time points for weeks 16–40.

FIGURE 10. Remission rates in Rutgeerts et al. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

D, other outcomes

Time to loss of response for patients achieving a response at ‘any time’ during follow-up after randomisation was reported. The criteria for loss of response were not explicit. Over 48 weeks it is possible for a patient to enter a response state on several occasions. The publication did not make clear which occasion(s) were used in the analysis, or whether and how double counting was avoided. The log rank test for difference between placebo and infliximab groups just failed to reach statistical significance (p = 0.057).

Median CDAI score, median IBDQ score and median CRP concentrations were reported, but range of values and statistical analyses for these outcomes were not presented. The results were in favour of infliximab relative to placebo with greater reduction in CDAI scores, larger increases in IBDQ scores and more ‘normalisation’ of CRP concentrations. The results published are summarised in Figure 11.

FIGURE 11. Median CDAI, IBDQ and CRP levels reported in Rutgeerts et al. Data taken from published graphs and redrawn. Where necessary the authors carried last observation forward.

Quality assessment (based on published report)

Randomisation, allocation concealment and blinding were adequate. Baseline values for those characteristics reported were evenly balanced, but values for CRP, which was not balanced in the original Targan et al. trial,57 were unclear. Analysis of response 70 and remission rates was by ITT; the results presented were point prevalence values at various follow-up times, and they therefore represent maintenance of response at the group level only and not maintenance by individual patients. For continuous outcomes, last observation was carried forward where necessary, but the number of missing data was not reported. No primary outcome was identified and no power calculation was described; the combined trials appear to have been powered only for the induction analysis of Targan et al.57 (at week 4 of that study). The maintenance part of the study was probably underpowered. About 33% of patients withdrew.

Rutgeerts et al., 1999.58 Summary of effectiveness evidence

The study recruited patients from among responders (CDAI score reduced by 70 points) following on from the Targan et al. trial57 and the resulting induction phase varied between patients in both duration and dose regimen. Subsequent maintenance treatment with infliximab (four infusions of 10 mg/kg at 8-week intervals) generated a greater proportion of patients with a response 70 and with remission than did treatment with placebo. Point prevalence estimates for these outcomes were associated with considerable uncertainty. The trial left unanswered how well a clinical response is sustained at the individual patient level.
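The distinction drawn above between point prevalence (a group-level measure) and maintenance of response by individual patients can be illustrated with a small sketch. The patient records below are entirely hypothetical and the function names are invented for illustration; the example only shows that a stable point prevalence across visits does not imply that the same patients are in remission at every visit.

```python
# Hypothetical records: one list per patient, one entry per visit,
# True meaning "in remission at that visit". NOT trial data.
records = {
    "patient_1": [True, True, True, True],
    "patient_2": [True, False, True, False],
    "patient_3": [False, True, False, True],
    "patient_4": [False, False, False, False],
}


def point_prevalence(records):
    """Proportion of patients in remission at each visit (group-level view)."""
    n_patients = len(records)
    n_visits = len(next(iter(records.values())))
    return [
        sum(status[visit] for status in records.values()) / n_patients
        for visit in range(n_visits)
    ]


def sustained_proportion(records):
    """Proportion of patients in remission at every visit (patient-level view)."""
    return sum(all(status) for status in records.values()) / len(records)


print("Point prevalence by visit:", point_prevalence(records))
print("Sustained at all visits:  ", sustained_proportion(records))
```

In this toy data set half of the patients are in remission at every visit, yet only one patient in four remains in remission throughout.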

ACCENT I2,3 (infliximab)

This was a free-standing maintenance trial (i.e. newly started).2,3 There were 580 eligible patients (CDAI range 220–400), of whom 573 received a single induction infusion of 5 mg/kg infliximab. Two weeks later patients were randomised to placebo, to 5 mg/kg infliximab at weeks 2 and 6 and then every 8 weeks to week 54, or to 5 mg/kg at weeks 2 and 6 and then 10 mg/kg infliximab every 8 weeks to week 54 (these groups are here termed 5 mg/kg and 10 mg/kg groups respectively).

At week 2 (randomisation week) patients were classified as responders (335/573, 58.5%) or non-responders (238/573, 41.5%) depending on whether they achieved a response 70 (a reduction of > 70 points in CDAI score at week 2 relative to baseline). At week 14 patients who initially responded but then worsened were eligible to cross over to treatment with increased dosage of infliximab; this crossover treatment for the placebo group was termed ‘episodic treatment’. The results for responders were published in 2002 (Hanauer et al.3); for most of these analyses, patients who crossed over to increased dosage after week 14 were considered treatment failures.

Effectiveness results published for responders only in 2002 (Hanauer et al.3) are reviewed below, and results for all patients, irrespective of responder status at week 2 and published in 2004 (Rutgeerts et al.2), are considered in the following section.

ACCENT I: results for responders

Of the 335 responders (58.5% of those who had received an induction dose of 5 mg/kg infliximab), 110 were randomised to placebo, 113 to the 5 mg/kg infliximab group and 112 to the 10 mg/kg infliximab group.

A, response 70

The published results for responders3 included graphical presentation of point prevalence of response 70 at weeks 30 and 54. These results are summarised in Table 15. A statistically significant difference in rates in favour of infliximab versus placebo was reported for both infliximab groups at weeks 30 and 54. The manufacturer's submission provided point prevalence rates for response 70 for all assessment visit weeks from 2 to 54. These results are summarised in Figure 12.

TABLE 15. Published response 70 rates for responders at weeks 30 and 54 in ACCENT I.


FIGURE 12. Response 70 rates for responders throughout follow-up in ACCENT I. CIC, commercial-in-confidence; LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

Point estimates were associated with appreciable uncertainty. Week 2 response rates of ~90% had diminished in all groups by week 54 to 15% in the placebo group and 38% and 47% in the 5 mg/kg and 10 mg/kg infliximab groups respectively. Risk differences (infliximab–placebo) remained fairly constant from week 14 onwards. Risk differences and risk ratios (infliximab/placebo) reached statistical significance in favour of infliximab at all visit times from week 10 to week 54. It is unclear why week 2 response rates were less than 100%; it is possible some patients with a 70-point CDAI reduction from baseline nevertheless required surgery or a change in concomitant medication for worsening of clinical condition. After week 2, decline of response occurred in both placebo and intervention groups, then after week 10 risk differences remained similar [e.g. for the 5 mg/kg arm, risk differences (infliximab–placebo) at weeks 10, 14, 22, 30, 38, 46 and 54 were 0.14, 0.23, 0.26, 0.24, 0.21, 0.23 and 0.23 respectively]. This suggested that most benefit of infliximab was delivered in the first 10–12 weeks of the trial.

B, response 100

This outcome was not reported.

C, remission rates

Remission was a coprimary outcome. The results published for remission at week 30 and week 54 are summarised in Table 16. For this outcome patients who worsened and crossed over to ‘episodic treatment’ (allowed from week 14 onwards) were counted as treatment failures (i.e. as no longer in remission). The results reported measured the point prevalence of remission for each group at week 30 and did not require maintenance of response from week 2 to week 30 at the patient level. A statistically significant greater proportion of patients were in remission at weeks 30 and 54 in the infliximab groups than in the placebo group. At week 30 the risk differences (infliximab–placebo) were 18% and 25% for the 5 mg/kg and 10 mg/kg groups respectively and the corresponding numbers needed to treat (NNT) (30 weeks) were 5.56 and 4. Note that these NNT estimates do not include non-responders who had been administered induction infliximab. The point prevalence of remission had diminished somewhat by week 54.
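The number needed to treat is the reciprocal of the absolute risk difference. The following minimal sketch, with a function name chosen for illustration, applies this to the week 30 risk differences of 18% and 25% quoted above. It uses the rounded risk differences as given in the text, so small discrepancies from values calculated on unrounded data are possible.

```python
def number_needed_to_treat(risk_difference):
    """NNT as the reciprocal of an absolute risk difference (a proportion)."""
    if risk_difference <= 0:
        raise ValueError("NNT is only defined here for a positive risk difference")
    return 1.0 / risk_difference


# Week 30 remission, responders only (risk differences as quoted in the text)
for label, rd in (("5 mg/kg", 0.18), ("10 mg/kg", 0.25)):
    print(f"{label}: NNT = {number_needed_to_treat(rd):.2f}")
```

These values apply to week 2 responders only; an NNT calculated over all patients given induction infliximab would be larger, because non-responders are excluded from the denominator here.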

TABLE 16. Remission rates for responders reported at weeks 30 and 54 in ACCENT I.


The unpublished Industry Trial Report for ACCENT I2,3 provided information regarding the maintenance of remission at the individual patient level for weeks 14–54. The percentages differed slightly from those in the published report, as indicated in Table 17.

TABLE 17. Patient level maintenance of remission reported in ACCENT I.


The manufacturer's submission and the Industry Trial Report provided commercial-in-confidence (CiC) point prevalence rates for remission for all assessment visits from weeks 2 to 54. These results are summarised in Figure 13.

FIGURE 13. Remission rates for responders throughout follow-up in ACCENT I. At week 30 risk difference p = 0.0027 and p < 0.0001 for 5 mg/kg and 10 mg/kg groups respectively. At week 30 risk ratio p = 0.0047 and p = 0.00025 for 5 mg/kg and 10 mg/kg groups (more...)

Point estimates were associated with appreciable uncertainty (CiC information has been removed). From week 10, remission rates diminished in all groups and risk difference (infliximab–placebo) diminished or remained fairly constant; risk differences and risk ratios (infliximab/placebo) reached statistical significance at all visit times from week 10 onwards. It is evident that loss of remission was continuous after weeks 6–10 of follow-up and that the advantage of intervention over placebo was mostly gained by about weeks 6–10, the phase of the study during which dose frequency was greatest. Thereafter decline of response was about the same for both placebo and intervention groups despite continued infliximab every 8 weeks in the treatment arms; for example, for the 5 mg/kg arm risk differences (infliximab–placebo) remained similar after week 14 as follows: at weeks 10, 14, 22, 30, 38, 46 and 54 risk differences were 0.15, 0.21, 0.20, 0.18, 0.15, 0.15 and 0.15 respectively.

D, other outcomes

The primary outcome in ACCENT I2,3 was identified as time to loss of response. (Note: a protocol amendment added the proportion of responder patients in remission at week 30 as a coprimary outcome, which has been reported above.) Loss of response was defined as a CDAI of ≥ 175, a CDAI increased by ≥ 35% and a CDAI increased by ≥ 70 points relative to the qualifying value for a response on at least two consecutive assessments, or requirement for change in medication or requirement for surgery. Assessments were scheduled at weeks 0, 2, 6, 10, 14 and then every 8 weeks to week 54. With this definition of loss of response, it is possible for an individual responder to no longer qualify as achieving a response 70 status, but counter-intuitively nevertheless to not have lost response. (For example, an individual with a CDAI of 221 at enrolment would qualify as a responder at week 2 with a CDAI score reduced by 71 points to 150. If this patient's CDAI subsequently rose to 170 he or she would no longer be in a response 70 but would nevertheless not have lost response because the increase in score from week 2 was < 70 points, < 35% of week 2 score and below a score of 175.) For this primary outcome, patients in the active intervention arms had significantly longer time to loss of response than patients given placebo (p = 0.0002, log rank test). The median times to loss of response are summarised in Table 18.
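The worked example in the preceding paragraph can be checked with a short sketch. The function names are hypothetical, and the code makes two simplifying assumptions: it evaluates a single assessment only (ignoring the requirement for at least two consecutive assessments and the medication or surgery criteria), and it treats response 70 as a reduction of at least 70 points from baseline.

```python
def in_response_70(baseline_cdai, current_cdai, threshold=70):
    """True if CDAI has fallen by at least `threshold` points from baseline.

    Assumption: an inclusive (>=) threshold is used for illustration.
    """
    return baseline_cdai - current_cdai >= threshold


def meets_cdai_loss_criteria(qualifying_cdai, current_cdai):
    """CDAI part of the loss-of-response definition at a single assessment.

    All three conditions must hold: score of at least 175, an increase of
    at least 70 points, and an increase of at least 35%, each relative to
    the qualifying (week 2) value.
    """
    increase = current_cdai - qualifying_cdai
    return (
        current_cdai >= 175
        and increase >= 70
        and increase >= 0.35 * qualifying_cdai
    )


baseline, week_2, later = 221, 150, 170  # values from the example in the text

print("Responder at week 2:      ", in_response_70(baseline, week_2))
print("Still in response 70:     ", in_response_70(baseline, later))
print("Meets CDAI loss criteria: ", meets_cdai_loss_criteria(week_2, later))
```

The patient in the example is therefore no longer in a response 70 state but has not met the CDAI criteria for loss of response, which is the counter-intuitive situation described above.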

TABLE 18. Median time to loss of response in responders in ACCENT I.


Published effectiveness results for responders included median CDAI scores and median IBDQ scores. These are summarised in Table 19. For missing values of CDAI and IBDQ, the nearest observation was carried forward. CDAI scores and IBDQ scores diminished and increased respectively to a greater extent in the infliximab groups than in the placebo group. The IQRs for median values during follow-up were not reported.

TABLE 19. Median CDAI and IBDQ scores for responders during follow-up in ACCENT I.


The manufacturer's submission provided information about QoL measures (SF-36). The SF-36 scores were reported separately for mental and physical components for weeks 30 and 54 of the trial, and mean improvement from baseline was reported. SDs of values were provided. The results are summarised in Table 20. Change from baseline for SF-36 physical component reached statistical significance in favour of infliximab at both weeks 30 and 54.

TABLE 20. SF-36 results reported for responders in ACCENT I.


Median daily steroid dose was reduced by week 14 in all groups and then remained constant. The reduction in the infliximab groups was greater than that for the placebo group. The odds ratio for discontinuation of steroid use (infliximab/placebo) at week 54 was 4.2 (95% CI 1.5 to 11.5).

E, other considerations – subgroup analysis of remission rate in severe CD patients

The manufacturer's submission for infliximab provided CiC information about the proportion of responder patients who initially had severe disease (defined as a baseline CDAI score > 300) and who achieved remission status during follow-up. Results presented referred to patients classified as having severe disease who were randomised to the 5 mg/kg infliximab group [n = 63/113 (56%)] and placebo group [n = 48/110 (44%)]. No information was provided regarding patients with severe disease among non-responders. The remission rates in placebo and 5 mg/kg infliximab arms and the risk difference for this subgroup of patients are shown in Figure 14. Remission rates were slightly poorer in this more severe CDAI group than for all responders, but a similar pattern was shown during follow-up, in that most of the advantage from the intervention was achieved with the first three doses (early phase). Thereafter, remission decayed at approximately similar rates in the two arms even though patients in the intervention arm received further doses of infliximab and risk differences decreased from week 14 onwards.

FIGURE 14. Remission rates, risk difference and risk ratio (severe disease responders ACCENT I). LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

ACCENT I2,3 (responders): quality assessment (based on published report of Hanauer et al.3)

Randomisation, allocation concealment and blinding were adequate. Baseline characteristics were only reported for all patients (i.e. for all responders and for all non-responders). It was therefore not possible to judge if baseline characteristics were evenly balanced between the three arms of responders that were analysed for effectiveness outcomes. Similarly the number of patients who withdrew was reported for all enrolled patients and it was not possible to determine how many responders discontinued their randomised treatment. Where necessary, the nearest or last observation was carried forward for continuous outcomes but the number of missing data was not reported. A power calculation was conducted and based on the primary outcome of loss of response. The definition of loss of response was complex and did not correspond to a failure to maintain a response 70 status, and its clinical meaning was difficult to gauge.

ACCENT I: results for all patients (Rutgeerts et al.2) (infliximab)

The results for all 573 patients who received an induction dose in ACCENT I2,3 were presented by Rutgeerts et al.2 in a paper published 2 years after the paper describing results for responders only. Separate results for non-responders have not been published. The 573 patients were 335 responders and 238 non-responders (defined according to whether a 70-point reduction in CDAI score was attained by week 2 after the induction infusion).

Randomisation at week 2 resulted in allocation of 188 patients to the placebo group, 192 to the 5 mg/kg group and 193 to the 10 mg/kg group.

The authors stated ‘the primary objective of the analysis was to examine the difference in efficacy between episodic and scheduled treatment strategies with infliximab under conditions that simulate clinical practice’. For this purpose the patients in the original placebo group were designated as receiving ‘episodic strategy’, and those in the infliximab groups as receiving a ‘5 mg/kg scheduled strategy’ and a ‘10 mg/kg scheduled strategy’ respectively. From week 14 onwards patients who had shown a response to infliximab therapy at any time but then worsened were eligible to cross over to ‘active episodic treatment as needed with infliximab 5, 10, and 15 mg/kg for patients originally assigned to episodic, 5 mg/kg scheduled, and 10 mg/kg scheduled treatment strategies respectively’. This description is confusing as it clearly states that active episodic treatment is given in both episodic and scheduled strategies, which renders a comparison of episodic and scheduled strategies problematic. The publication designates the start of episodic treatment to be week 14 (see Appendix 9 for patient flow through the trial).

ACCENT I.2,3 Summary of effectiveness evidence for responders

Of the 573 patients (with baseline CDAI 220–400), 58.5% (335) achieved response 70 2 weeks after a single induction infusion of 5 mg/kg infliximab. These patients were designated ‘responders’. It is unclear if the three trial arms of randomised responders were well balanced at baseline. Of responders, (CiC information has been removed)% were in remission (CDAI < 150) at week 2. This represented (CiC information has been removed)% of the original 573 patients. The proportion of responders with remission had declined by week 30 to 21% (95% CI 14% to 29%) for those who only received placebo after induction, to 39% (95% CI 30% to 48%) for those who received four infusions of 5 mg/kg infliximab (at weeks 2, 6, 14 and 22) and to 45% (95% CI 36% to 55%) for those who received four infusions consisting of 5 mg/kg at weeks 2 and 6 and 10 mg/kg at weeks 14 and 22. Risk differences (infliximab–placebo) and risk ratios (infliximab/placebo) for remission at week 30 reached statistical significance in favour of infliximab for both infliximab groups. By week 54 the percentage of patients in remission had diminished further in all three groups. Most of the advantage of intervention relative to placebo was achieved by weeks 10–14; thereafter risk differences remained fairly stable. A similar pattern of results was observed for response 70. Published information regarding maintenance of remission at the patient level (as distinct from group level) was meagre. Between weeks 14 and 54, 11% of placebo patients retained remission at all six study visits; the corresponding values were 25% and 33% respectively for 5 mg/kg and 10 mg/kg infliximab groups. Somewhat lower values of (CiC information has been removed)%, (CiC information has been removed)% and (CiC information has been removed)% respectively were quoted in the Industry Trial Report. Results favouring infliximab over placebo were reported for several other outcomes including median CDAI scores and median IBDQ scores. These measures required last or nearest observation carried forward in order to allow for missing data.

The treatment regimens received before week 14 in each of the randomised groups were as follows:

  • placebo/‘episodic group’: 5 mg/kg infliximab week 0, placebo weeks 2 and 6
  • 5 mg/kg group ‘scheduled strategy’: 5 mg/kg infliximab weeks 0, 2 and 6
  • 10 mg/kg group ‘scheduled strategy’: 5 mg/kg infliximab weeks 0, 2 and 6.

Treatment to week 14 was therefore similar for the two infliximab ‘scheduled strategy’ groups and was determined according to randomisation. From week 14, crossover to an increase in infliximab dosage was allowed in all three trial arms for patients whose CD worsened. The criteria for worsening were ‘an increase CDAI of ≥ 70 points from the qualifying score with a total score of at least 175, an increase in CDAI of 35% or more from baseline value, or the introduction of new treatment for active Crohn's disease’. From week 14 onwards it was possible for patients in different arms to be receiving identical infliximab treatment; for example, a placebo patient might cross over at week 14 to receive 5 mg/kg and this corresponds to treatment received by a 5 mg/kg ‘scheduled strategy’ patient who did not cross over. This complicates the interpretation of any comparisons between groups.
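The quoted worsening criteria can be written as a simple rule. The following is a minimal sketch, not the trial's own algorithm: the function and parameter names are hypothetical, and it assumes that the ‘qualifying score’ is the CDAI recorded when the patient qualified as a responder and that ‘baseline value’ is the week 0 CDAI.

```python
def accent_worsening(current_cdai, qualifying_cdai, baseline_cdai,
                     new_cd_treatment=False):
    """Return True if any of the three quoted worsening criteria is met."""
    # Criterion 1: rise of >= 70 points from the qualifying score
    # together with a total score of at least 175.
    rise_from_qualifying = (current_cdai - qualifying_cdai >= 70
                            and current_cdai >= 175)
    # Criterion 2: rise of 35% or more from baseline (cross-multiplied so that
    # whole-number scores are compared without floating-point rounding).
    rise_from_baseline = 100 * current_cdai >= 135 * baseline_cdai
    # Criterion 3: new treatment for active Crohn's disease was introduced.
    return rise_from_qualifying or rise_from_baseline or new_cd_treatment


# Invented scores for illustration only, not trial data.
print(accent_worsening(180, 100, 300))  # rise of 80 points to a score of 180
print(accent_worsening(160, 80, 300))   # rise of 80 points but score below 175
```

Under these assumptions a patient can satisfy the first criterion while still well below the week 0 score, which is one reason crossover was available to patients in every arm.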

A, response 70

No primary outcome was identified. Analyses were according to randomised group irrespective of crossover after week 14 to a different treatment regimen, and comparisons were drawn between the ‘episodic group’ and the two ‘scheduled strategy’ groups. The results for response 70 for all patients in ACCENT I are summarised in Figure 15. By week 14, statistically significant differences in CD status were evident between the placebo group and the intervention groups (p-values for risk differences and risk ratios are shown in Table 21). Risk differences and risk ratios for comparison between ‘episodic’ and ‘scheduled’ strategies after week 14 were in favour of ‘scheduled strategies’ but failed to reach statistical significance at most time points. Interpretation of these differences is problematic.

FIGURE 15. Response 70 rates for all patients in ACCENT I. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

TABLE 21. p-values for comparison of response 70 rates at week 14 for all patients in ACCENT I.

B, response 100

This outcome was not reported.

C, remission rates

Figure 16 summarises the published results for rates of remission at clinic visits to end of follow-up (week 54). Week 14 remission rates were greater in the two ‘scheduled treatment’ arms (37.5% in the 5 mg/kg group and 43% in the 10 mg/kg group) than in the ‘episodic’ group (25.5%). p-values for week 14 comparisons between placebo and intervention groups are shown in Table 22.

FIGURE 16. Remission rates for all patients in ACCENT I. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

TABLE 22. p-values for comparison of remission rates at week 14 for all patients in ACCENT I.

Treatment regimens up to week 14 were strictly pre-specified and designed to examine effectiveness for maintenance of the induced response. After week 14 treatment regimens became variable (termed ‘episodic’ by the authors). It is clear that by week 14 the CD status of patients in the placebo/'episodic' arm had departed from that of patients in the two ‘scheduled strategy’ arms; this means that at baseline (week 14) for the comparison of ‘episodic’ with ‘scheduled strategies’, the groups were imbalanced. Comparisons between ‘episodic’ and ‘scheduled’ strategies after week 14 are not randomised comparisons. For a randomised comparison of the two strategies patients should have been rerandomised at week 14. Such rerandomisation was precisely the study design adopted by Menter et al.70 when comparing continuous with intermittent treatment strategies with infliximab in psoriasis.

Risk differences and risk ratios for comparison between ‘episodic’ and ‘scheduled’ strategies after week 14 were in favour of ‘scheduled strategies’, but failed to reach statistical significance at nearly all time points. Interpretation of these differences is problematic because, as described above, the comparisons are not between properly randomised groups and because patients in all groups were allowed the option of ‘episodic’ treatment.

D, other outcomes

Median CDAI score and the proportion of patients with IBDQ score > 170 were reported and are summarised in Table 23 and presented graphically in Appendix 10. By week 14, statistically significant differences in CDAI median scores were evident between the placebo group and the intervention groups. Differences were less pronounced after week 14, especially for the placebo versus 5 mg/kg comparison. The percentage of patients with IBDQ score > 170 did not differ significantly between placebo and 5 mg/kg groups, but after week 14 favoured the 10 mg/kg group relative to placebo.

TABLE 23. CDAI and IBDQ results for all patients in ACCENT I.

The manufacturer's submission provided information regarding CD-related hospitalisation rates and rates for intra-abdominal surgery. These rates and the relative risk for the 5 mg/kg ‘scheduled maintenance’ group relative to the ‘episodic’ are summarised in Table 24. The results for mucosal healing observed for a small subgroup of patients (n = 58) at European study centres who underwent endoscopy examination are also tabulated. The interpretation of the comparisons is problematical for the reasons already described, in particular after week 14. The extent to which avoidance of hospitalisation and abdominal surgery might depend on the administration of active intervention is not measurable because no true control (placebo) group existed after that time.

TABLE 24. Endoscopy, hospitalisation and abdominal surgery results: all patients in ACCENT I.

Quality assessment of ACCENT I2,3 (all patients): (based on published report of Rutgeerts et al.2)

Randomisation, allocation concealment and blinding were adequate. Baseline characteristics at week 0 were well balanced. Where necessary, the nearest or last observation was carried forward for continuous outcomes but the amount of missing data was not reported. No power calculation was conducted for the analysis of all patients. The number of patients who withdrew was reported except for patients who crossed over to a 15 mg/kg dose regimen from a 10 mg/kg regimen. The proportion of patients who withdrew before the end of the trial was substantial.

Trial design, withdrawals, crossovers and validity of comparisons

It must be questionable whether the ‘episodic’ (placebo) arm did ‘simulate clinical practice’ as stated to be an objective of the study. Patients in this arm of the study received one dose of 5 mg/kg infliximab at week 0, followed by an interim period of > 3 months with no active infliximab therapy before the ‘episodic’ use of infliximab according to worsening disease (for patients ‘who had responded at any time to infliximab therapy’). There is little evidence to support the idea that this resembles clinical practice. The scheduled strategy is difficult to define as it did not follow a prescribed programme of treatment as might be anticipated by the term ‘scheduled strategy’, but encompassed ‘episodic’ treatment in the same manner as the ‘episodic’ arm.

Because of the large numbers of patients who withdrew from treatment and crossed over to dose escalations, the actual treatments received in the three different trial arms are difficult to define. Figure 17 summarises the progression of patients through the trial with respect to withdrawal from treatment and crossover to increased dose of infliximab.

FIGURE 17. Withdrawals and crossovers in ACCENT I. * (CiC information has been removed). R, randomisation. Note: Dropouts (DOs) and week 14 or later crossovers (XOs) allowed ‘as required’.

Over a period of 1 year, about a quarter of patients withdrew from treatment, and of those allocated active intervention at randomisation only about half completed the trial receiving the treatment regimen to which they had been allocated at randomisation.

The authors' stated primary objective ‘…was to examine the difference in efficacy between episodic and scheduled treatment strategies with infliximab’.2 They concluded that the scheduled treatment strategy was superior to episodic treatment. Unfortunately, the comparisons were compromised by strong biases introduced as a result of the study design. These biases are explained below:

  1. Crossover to increased infliximab was allowed for patients ‘who had responded at any time to infliximab therapy’ and subsequently worsened. In the placebo group (‘episodic strategy’), 78 of 188 patients (41%) were classified at week 2 as non-responders and received no further infliximab to week 14; these patients were unlikely to become responsive and therefore to qualify for crossover to active intervention. In contrast to this group the week 2 non-responders in the ‘scheduled strategy’ arms received additional doses of infliximab (5 mg/kg) at both weeks 2 and 6, boosting their opportunity to ‘respond at any time’ to infliximab. The greater opportunity to respond at any time in the ‘scheduled strategy’ arms represents a strong bias in their favour in any subsequent comparison with the episodic arm. Relative to the scheduled strategy this resulted in a substantial proportion of patients in the episodic arm being denied access to active therapy. This is reflected in the very large difference between arms in their exposure to infliximab stated to be 3 and 5 times greater in the two scheduled strategy arms than in the episodic arm.
  2. Episodic treatment was introduced at week 14 of the trial, but by this time the CD status of patients in the placebo ‘episodic’ arm was significantly inferior to that in the scheduled strategy arms in terms of several efficacy measures. This advantage for the scheduled strategy arms is reflected in increases not seen in the placebo group from week 2 in the response 70 rates and at weeks 6 and 10 in the remission rates. The result is a bias in favour of scheduled strategy for any comparison between strategies at times after week 14. Essentially, the compared arms were unbalanced at the start of the compared strategies (week 14).

ACCENT I.2,3 Summary of effectiveness evidence for all patients

Two infusions of 5 mg/kg infliximab at weeks 2 and 6 after a single induction infusion of 5 mg/kg were better than placebo infusions at generating remission and response 70. At week 14, risk differences (infliximab–placebo) and risk ratios (infliximab/placebo) were in favour of infliximab and reached statistical significance (p < 0.02 for remission, p < 0.01 for response 70).

At week 14, ‘episodic’ treatment was introduced and subsequent comparisons were made between the original placebo arm (designated ‘episodic treatment strategy’) and original infliximab arms (termed ‘scheduled treatment strategies’). Because of bias strongly in favour of scheduled strategy groups, the post 14-week comparisons were not valid estimates of the relative effectiveness of strategies. Biases identified arose from: (a) reduced opportunity for crossover to active therapy for patients in the episodic group compared with the scheduled groups; and (b) gross imbalance in disease status at the start of the strategies (week 14).

Difficulties in interpreting post 14-week comparisons between groups were compounded by the very high rate of withdrawal from treatment and the use of ‘episodic’ treatment in all three arms of the trial so that the distinction between episodic and scheduled strategies was obscured except for the fact that the original infliximab groups were allowed larger dosages of active intervention.

CLASSIC II66 (adalimumab)

The CLASSIC II66 trial was an extension of the previously conducted adalimumab induction trial, CLASSIC I,63 which had enrolled 299 patients. To be eligible for CLASSIC II, patients were required to be in remission (CDAI < 150) at week 4 of CLASSIC I and also 4 weeks later (equivalent to week 8 of CLASSIC I and designated week 4 of CLASSIC II). These patients may have received two subcutaneous injections 2 weeks apart of various doses of adalimumab (40 mg then 20 mg, 80 mg then 40 mg, or 160 mg then 80 mg) or two injections of placebo. Fifty-five eligible patients entered CLASSIC II; this means about 12 patients did not retain remission from weeks 4–8 of CLASSIC I or declined to participate. The 55 patients were randomised at week 4 of CLASSIC II to receive placebo (n = 18) or 40 mg of adalimumab every other week (e.o.w.) (n = 19) or 40 mg of adalimumab weekly (n = 18) from weeks 4 to 54. Thus CLASSIC II analysed only strong responders from the CLASSIC I trial.

For the purposes of the ‘primary efficacy analysis’, patients who had continued non-response defined as ‘a decrease in CDAI ≤ 70 points vs. Week-0 value in CLASSIC I’ were considered treatment failures and became eligible for open-label treatment. This means patients in remission at start of CLASSIC II became treatment failures if they ceased to qualify as response 70 responders relative to their baseline CDAI score in CLASSIC I. In addition, patients who flared during CLASSIC II follow-up were also counted as treatment failures and were eligible for open-label treatment. CD flare was defined as an increase of ≥ 70 points above the week 4 CLASSIC II value (which by definition was < 150) AND a CDAI score > 150 (no longer in remission). Thus a patient in remission at week 4 (CLASSIC II) with a CDAI score of 149 would need to move to a CDAI of at least 219 to be classified as having experienced flare. For this patient a score of 218 would not count as a flare but could count as treatment failure if his or her week 0 CLASSIC I CDAI score had been < 288 (for reference the mean baseline CDAI score at week 0 for 299 CLASSIC I patients was 298).
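The two routes to treatment failure described above can be checked against the worked example. This is a minimal sketch with hypothetical function names; counting a decrease of exactly 70 points as non-response follows the quoted ‘≤ 70’ wording and is an assumption about how the boundary was handled.

```python
REMISSION_CDAI = 150  # remission is defined in the text as CDAI < 150


def classic2_flare(current_cdai, week4_cdai):
    """Flare: rise of >= 70 points above the CLASSIC II week 4 value
    AND a CDAI score > 150."""
    return current_cdai - week4_cdai >= 70 and current_cdai > REMISSION_CDAI


def classic2_non_response(current_cdai, week0_cdai):
    """Continued non-response as quoted: a decrease of <= 70 points
    relative to the CLASSIC I week 0 value."""
    return week0_cdai - current_cdai <= 70


# Worked example from the text: CDAI of 149 at week 4 of CLASSIC II.
print(classic2_flare(219, 149))          # 70-point rise
print(classic2_flare(218, 149))          # 69-point rise
# A score of 218 against the mean CLASSIC I baseline of 298 (80-point decrease).
print(classic2_non_response(218, 298))
# The same score against an invented lower baseline of 280 (62-point decrease).
print(classic2_non_response(218, 280))
```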

A, response 70 and B, response 100

Response 100 and response 70 rates throughout follow-up were among the secondary outcome measures of efficacy. Results reported for responses 100 and 70 are summarised in Figure 18. The placebo rates were high for these less rigorous measures of effectiveness, and the risk differences (adalimumab–placebo) and risk ratios (adalimumab/placebo) failed to reach statistical significance at most time points.

FIGURE 18. Response 100 (upper panel) and response 70 (lower panel) rates in CLASSIC II. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

C, remission

The primary outcome was the proportion of patients in remission at week 56 in each arm of the randomised cohort. Remission throughout follow-up was among the secondary outcome measures of efficacy. For the primary outcome, 10 patients (18%) withdrew before week 56 (five from placebo and five from adalimumab). These were counted as remission failures for the primary analysis. Remission rates at week 56 are summarised in Table 25. Remission rates during the trial are summarised in Figure 19.

TABLE 25. Remission rates at week 56 in CLASSIC II (primary outcome).

FIGURE 19. Remission rates in CLASSIC II. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

Point estimates of remission rate during the trial were associated with considerable uncertainty, reflecting the small number of patients in the trial. The fact that rates rose and fell during follow-up indicated the values reported referred to point prevalence. Nearly half of patients in the placebo group were in remission at week 56 despite not receiving active intervention from 2 weeks prior to randomisation onwards. Risk differences (intervention–placebo) and risk ratios (intervention/placebo) were in favour of intervention at all follow-up times and reached statistical significance at several time points.

D, other outcomes

The results published for continuous measures are summarised in Table 26. These measures involved last observation carried forward to allow for missing values. The amount of missing values was not published but was available (CiC) in the unpublished Industry Trial Report. For week 56 changes in favour of adalimumab relative to placebo were reported for mean IBDQ and CDAI scores.
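Last observation carried forward replaces each missing measurement with the most recent value that was actually recorded. A minimal sketch follows, assuming missed visits are marked as `None`; the function name and the CDAI series are invented for illustration and are not trial data.

```python
def locf(values):
    """Replace each missing value (None) with the most recent observed value.

    Values missing before the first observation stay as None.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Invented CDAI series; None marks a missed visit or a visit after withdrawal.
series = [140, 132, None, 155, None, None]
print(locf(series))
```

Because the value recorded before withdrawal is carried to every later visit, the amount of missing data has a direct bearing on how the week 56 means should be read.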

TABLE 26. IBDQ scores, CDAI scores and CRP concentrations reported in CLASSIC II.

At the start of CLASSIC II,66 40% of patients were receiving systemic steroids or budesonide: seven of the placebo group, seven of the e.o.w. adalimumab group, and eight of the weekly adalimumab group. Using the last observation carried forward it was reported that by week 56 the number who had discontinued steroids was four in both the placebo and e.o.w. adalimumab groups, and seven in the weekly adalimumab group.

E, other considerations – open-label study

Most patients from CLASSIC I63 who did not qualify for CLASSIC II66 participated in an open-label study in parallel with CLASSIC II. The results reported were not randomised comparisons and are outwith the inclusion criteria for this report.

Quality assessment (based on published report)

Randomisation, allocation concealment and blinding were adequate. Baseline characteristics at week 0 were well balanced. The study was powered for the primary outcome (remission at week 4) of CLASSIC I, and no further power calculation was conducted for CLASSIC II. The number of patients who withdrew was reported; 5 of 18 placebo patients withdrew and 5 of 37 patients given adalimumab withdrew. There were 32 patients (58%) who completed to 56 weeks of double-blind follow-up. The last observation was carried forward as necessary for continuous outcomes, but the amount of missing data was not reported.

CLASSIC II.66 Summary of effectiveness evidence

The trial population (n = 55) was recruited from responders in the previous CLASSIC I63 adalimumab induction trial (n = 299). Only responders with a strong response (remission for at least a month) were selected; they had received various induction dose regimens.

Maintenance injections of 40 mg of adalimumab administered weekly or e.o.w. generated a statistically significantly greater proportion of patients in remission at week 56 than did placebo (frequency of administration not published). About half of the placebo group and 81% of those who received adalimumab were in remission at week 56. Point estimates of response rates were associated with considerable uncertainty due to the small size of the trial. There were no statistically significant differences in effectiveness between e.o.w. and weekly adalimumab regimens.

CHARM67 (adalimumab)

This was a free-standing maintenance trial (i.e. newly started).67 There were 854 enrolled patients (CDAI range 220–400), of whom 130 (15.2%) had fistulas at screening and baseline. An induction regimen consisting of an 80-mg injection of adalimumab at week 0 and a 40-mg injection 2 weeks later was followed by randomisation of 778 patients at week 4 to one of three arms as follows: placebo to week 56 (n = 261), 40 mg adalimumab e.o.w. to week 56 (n = 260) and 40 mg adalimumab weekly to week 56 (n = 257). There were 76 (8.9%) withdrawals prior to randomisation. Assessment visits were planned for weeks 0, 2, 4, 6, 8, 12, 16, 20, 26, 32, 40, 48, 56 and 60.

At week 4 patients were classified as responders or non-responders. Responders had to have a reduction of ≥ 70 CDAI points relative to baseline. Of the 854 patients given the induction regimen, 499 (58%) were categorised as responders and were the focus of the published effectiveness results. This population was different to that followed up in the other adalimumab maintenance trial, CLASSIC II,66 in that the latter were on average better responders, having achieved remission from induction. The numbers of responders randomised to the three trial arms of CHARM were 170 to placebo, 172 to adalimumab e.o.w. and 157 to adalimumab weekly.
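The patient flow described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the figures quoted in the text; the variable names are illustrative.

```python
enrolled = 854
with_fistulas = 130

# Numbers randomised at week 4, and week 4 responders, by trial arm.
randomised = {"placebo": 261, "adalimumab e.o.w.": 260, "adalimumab weekly": 257}
responders = {"placebo": 170, "adalimumab e.o.w.": 172, "adalimumab weekly": 157}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


total_randomised = sum(randomised.values())
withdrawn_before_randomisation = enrolled - total_randomised
total_responders = sum(responders.values())

print("randomised:", total_randomised)
print("withdrawn before randomisation:", withdrawn_before_randomisation,
      f"({pct(withdrawn_before_randomisation, enrolled)}%)")
print("responders:", total_responders, f"({pct(total_responders, enrolled)}%)")
print("fistulas at screening and baseline:", f"{pct(with_fistulas, enrolled)}%")
```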

The coprimary outcome measures were designated: the percentage of week 4 responders who achieved remission at weeks 26 and 56. Pre-specified secondary outcomes included (1) percentage achieving response 70 and response 100 at weeks 26 and 56; (2) change in IBDQ score from baseline at weeks 26 and 56; (3) percentage achieving clinical remission at weeks 26 and 56 who were able to discontinue corticosteroid use; (4) percentage achieving clinical remission at weeks 26 and 56 who were able to discontinue steroids for ≥ 90 days; (5) percentage of patients with fistula remission (closure of all fistulas that were draining at screening and baseline visits); and (6) median time in clinical remission among randomised responders achieving remission. Post hoc analyses examined subgroup responses and sustainability of response.

At or after week 12, patients with disease flare (an increase of ≥ 70 CDAI points from the score at week 4 and a CDAI score > 220) or sustained non-response (CDAI score not reduced by ≥ 70 points from week 0) were eligible to cross over to 40-mg adalimumab e.o.w. which could be escalated to 40-mg weekly for patients with continued non-response or recurrent flare. For the primary effectiveness outcome (responders), any patients who crossed over were counted as remission failures.
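The crossover rule and its effect on the primary analysis can be sketched as follows. This is an illustration under stated assumptions rather than the trial's procedure: the function names are hypothetical, and the remission threshold of CDAI < 150 is taken from the definition used elsewhere in this chapter.

```python
def charm_flare(current_cdai, week4_cdai):
    """Flare: rise of >= 70 points from the week 4 score AND a CDAI score > 220."""
    return current_cdai - week4_cdai >= 70 and current_cdai > 220


def charm_sustained_non_response(current_cdai, week0_cdai):
    """Sustained non-response: CDAI not reduced by >= 70 points from week 0."""
    return week0_cdai - current_cdai < 70


def eligible_for_crossover(week, current_cdai, week0_cdai, week4_cdai):
    """Crossover to open-label adalimumab was allowed at or after week 12."""
    if week < 12:
        return False
    return (charm_flare(current_cdai, week4_cdai)
            or charm_sustained_non_response(current_cdai, week0_cdai))


def counts_as_remission(current_cdai, crossed_over):
    """Primary outcome: any patient who crossed over is a remission failure."""
    return current_cdai < 150 and not crossed_over


# Invented scores for illustration only, not trial data.
print(eligible_for_crossover(12, 230, 300, 150))  # flare at week 12
print(eligible_for_crossover(8, 230, 300, 150))   # same scores before week 12
print(counts_as_remission(140, crossed_over=True))
```

The last call illustrates the conservative nature of the primary analysis: a patient with CDAI < 150 after crossover is still counted as a failure.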

A, response 70 and B, response 100

The published response 70 and response 100 rates at weeks 26 and 56 are summarised in Table 27. Differences from placebo reached statistical significance in favour of adalimumab for both dose regimens at both time points.

TABLE 27. Reported response 100 and response 70 rates in CHARM.

The unpublished Industry Trial Report for CHARM provided (CiC) values for response 70 at time points for all assessment visits. These are summarised in Figure 20. (CiC information has been removed.)

FIGURE 20. Response 70 rates among responders in CHARM. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

Similar CiC results were observed for response 100 and are summarised in Figure 21. From week 8 onwards (CiC information has been removed).

FIGURE 21. Rates of response 100 among responders in CHARM. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

C, remission

The primary outcome was the proportion of patients in remission at weeks 26 and 56. The results are summarised in Table 28. The difference between adalimumab groups and placebo reached statistical significance in favour of adalimumab for both dose regimens.

TABLE 28. Remission at weeks 26 and 56 in CHARM.

The secondary outcomes of remission rates for each follow-up visit to week 56 are summarised in Figure 22. Risk differences (adalimumab–placebo) and risk ratios (adalimumab/placebo) reached statistical significance in favour of adalimumab at all time points after week 6. Rates of remission in the adalimumab e.o.w. arm diminished through follow-up. From weeks 12–16 onwards, risk differences remained stable so that most benefit of the intervention appeared to be delivered in the first quarter of the trial. The rates reported were group point prevalence values and do not reflect maintenance of remission at the patient level. The difference in rates between the two adalimumab regimens at week 56 was not significant (risk difference p = 0.32, risk ratio p = 0.32).

FIGURE 22. Remission rates reported during follow-up in CHARM. At week 56 risk difference and risk ratio for both regimens of adalimumab versus placebo, p < 0.0001. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

Patient level maintenance of remission was published for weeks 26–56. In the adalimumab arms, 81% of patients in remission at week 26 sustained remission to week 56; this represented 114 patients and 22% of all those randomised to adalimumab. For patients randomised to placebo, 48% of those in remission at week 26 sustained remission to week 56. This represented 14 patients and 5% of all those randomised to placebo. The median time in clinical remission that started at any time was 127 days for the placebo group, 378 days for the adalimumab e.o.w. group and > 392 days for the adalimumab weekly group (p = 0.002 and p < 0.001 vs placebo respectively). Over 56 weeks it was possible for a patient to enter a remission state on several occasions. The publication did not make clear which occasion(s) were used in the analysis or whether and how double counting was avoided.

D, other outcomes

Published mean CDAI and IBDQ scores are summarised in Table 29. No variance information was provided. After week 12, CDAI and IBDQ scores for patients who crossed over to increased adalimumab doses were included in the calculation of group mean scores although this was not made explicit. Mean CDAI scores decreased and mean IBDQ scores increased, in both cases to a greater degree in the adalimumab groups than in the placebo group. Given that a true placebo group did not exist after week 12, the results thereafter are difficult to interpret. The last observation was carried forward; the proportion of patients evaluated at week 56 (CiC information has been removed).

TABLE 29. Group mean CDAI and IBDQ scores reported for responders in CHARM.

From week 8 the responder patients who were receiving steroids at baseline could begin reducing steroid use (presumably at the physician's discretion). This involved 66 placebo patients, 58 and 74 patients respectively in the adalimumab e.o.w. and adalimumab weekly groups. The percentage of these patients who were in remission at week 26 and who had discontinued steroids was 3% (2/66) in the placebo group, and 34% (20/58) and 30% (22/74) in the adalimumab e.o.w. and weekly groups respectively. Corresponding percentages at week 56 were 6%, 29% and 23% respectively. The percentage who were in remission at week 26 and who were steroid free for at least 90 days was 3% in the placebo group and 19% and 15% in the adalimumab e.o.w. and weekly groups respectively. Corresponding percentages at week 56 were 5%, 29% and 20% respectively.
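Risk differences and risk ratios of the kind reported throughout this chapter can be computed directly from such counts. The sketch below uses the week 26 counts for steroid-free remission in the placebo (2/66) and adalimumab e.o.w. (20/58) groups. The Wald interval for the risk difference and the log-scale interval for the risk ratio are assumptions; the method behind the report's own figures is not stated in this section, and both approximations are rough when one arm has only two events.

```python
import math


def risk_difference_and_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk difference (treatment - control) and risk ratio (treatment / control)
    with approximate 95% confidence intervals."""
    p_trt = events_trt / n_trt
    p_ctl = events_ctl / n_ctl

    # Risk difference with a Wald (normal approximation) interval.
    rd = p_trt - p_ctl
    se_rd = math.sqrt(p_trt * (1 - p_trt) / n_trt + p_ctl * (1 - p_ctl) / n_ctl)

    # Risk ratio with an interval calculated on the log scale.
    rr = p_trt / p_ctl
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)

    return {
        "rd": rd,
        "rd_ci": (rd - z * se_rd, rd + z * se_rd),
        "rr": rr,
        "rr_ci": (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
    }


result = risk_difference_and_ratio(20, 58, 2, 66)
print(f"RD = {result['rd']:.3f}, 95% CI {result['rd_ci'][0]:.3f} to {result['rd_ci'][1]:.3f}")
print(f"RR = {result['rr']:.2f}, 95% CI {result['rr_ci'][0]:.2f} to {result['rr_ci'][1]:.2f}")
```

The wide interval around the risk ratio reflects the very small number of events in the placebo group.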

Hospitalisation rates

Details on hospitalisation rates from the CHARM trial67 were reported in the industry submission, referenced to published abstracts by Wu et al.71 and by Feagan et al.72 The latter abstract reports the hospitalisation rates in the placebo arm and the combined adalimumab arms, which were 22.4% and 14.0% respectively. The 56-week actuarial CD-related hospital admission rates for the placebo and for the combined adalimumab arms were 13.9% and 5.9% respectively. A difference in relative risk was apparent at 2 weeks after randomisation, and placebo patients had 4.5 times the risk of hospitalisation at month 3 as adalimumab patients. Wu et al.71 used a Cox proportional hazard regression model and found that lower CDAI scores were associated with a decreased risk of hospitalisation and CD-related hospitalisation. Simulated 1-year rates indicated that a 70-point reduction on the CDAI throughout the follow-up period reduced all-cause hospitalisation risk by 28.3% and CD-related hospitalisation by 36.5% at year end. Further simulations indicated that remission was associated with a 43.7% decrease in the 1-year risk of all-cause hospitalisation and a 60.3% decrease in CD-related hospitalisation.

E, other considerations – subgroup analyses and crossover issues

Outcomes for patients with draining fistulas are included in the next section.

The manufacturer's submission to NICE provided weeks 26 and 56 results for placebo and e.o.w. adalimumab group patients who had severe disease at baseline (CDAI > 300). Results for all severe patients and for severe week 4 responders were provided allowing calculation of results for non-responders with severe CD (Table 30). There were 96 severe CD patients in both placebo and e.o.w. adalimumab groups. Rates of remission, response 70 and response 100 are summarised in Figure 23. Remission rates at week 56 in adalimumab and placebo arms were 35% and 10% respectively; higher rates were recorded for the less stringent response 70 and 100 outcomes. These rates were similar to those reported for all week 4 responders (within 5%; listed in Table 30). The rates in the e.o.w. arm for week 4 non-responders with severe CD were about half those for week 4 responders with severe CD.

TABLE 30. Response rates for severe CD patients in CHARM.

FIGURE 23. Response and remission rates for severe disease responders in CHARM. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

Other post hoc subgroup analyses

Several post hoc analyses explored the effectiveness of adalimumab among subgroups of patients defined according to various criteria including: baseline CRP level > or < 1 mg/dl; concomitant treatment with or without immunosuppressant medication; and previous experience of anti-TNF therapy or no previous experience. No statistically significant subgroup differences in adalimumab effectiveness were observed.

Premature withdrawal from treatment and crossover due to worsening disease

The published information about withdrawal from treatment and crossover to open-label therapy was difficult to disentangle. The Industry Trial Report provided fuller detail. Of 499 responders, 29% (144) withdrew prematurely: (CiC information has been removed)% of the placebo group [(CiC information has been removed)], (CiC information has been removed)% of the e.o.w. adalimumab group [(CiC information has been removed)] and (CiC information has been removed)% of the weekly adalimumab group [(CiC information has been removed)]. The Industry Trial Report stated the overall premature discontinuation rate among all patients was (CiC information has been removed)% [(CiC information has been removed)], with 79 of these occurring before randomisation. Among all 778 randomised patients, withdrawals during the randomised phase were (CiC information has been removed) [(CiC information has been removed)%] in the placebo group, (CiC information has been removed) [(CiC information has been removed)%] in the adalimumab e.o.w. group and (CiC information has been removed) [(CiC information has been removed)%] in the adalimumab weekly group, giving an overall rate of (CiC information has been removed)% [(CiC information has been removed)/(CiC information has been removed)], slightly (CiC information has been removed) than for responders only. Of patients randomised to adalimumab maintenance therapy, the rate of premature withdrawal was the (CiC information has been removed).

Crossover to open-label treatment after week 12 involved (CiC information has been removed) [(CiC information has been removed)%] of patients randomised to placebo, (CiC information has been removed) [(CiC information has been removed)%] of those randomised to adalimumab e.o.w. and (CiC information has been removed)/(CiC information has been removed) [(CiC information has been removed)] of those randomised to adalimumab weekly. These numbers represented patients experiencing worsening disease by flare or discontinued response. Transfer to open-label for patients in the weekly adalimumab group involved continuation of the same dose regimen (as crossover was described as ‘…switched to open-label treatment with 40-mg adalimumab eow… escalated to 40-mg weekly for those with continued non-response or recurrent flare’). After crossover ‘…continued non-response with open-label 40-mg weekly dosage resulted in withdrawal’;67 however, there was no published information about how long the state of flare or non-response was allowed to continue before withdrawal was implemented. The number of responder patients who crossed over to open-label was not published. The Industry Trial Report allowed calculation of crossovers and withdrawals among all randomised patients; this information is summarised in Figure 24.

FIGURE 24. Withdrawals from treatment and crossovers for flare or non-response in CHARM. *There was a discrepancy concerning one patient in the values for the adalimumab weekly group. (CiC information has been removed.)

Quality assessment (based on published report)

Randomisation, allocation concealment and blinding were adequate. Baseline characteristics were reported only for all patients, for all responders and for all non-responders, not for each of the trial arms. It was therefore not possible to judge if baseline characteristics were evenly balanced between the three arms of responders (placebo and e.o.w. or weekly adalimumab groups) that were analysed for effectiveness outcomes. The frequency of placebo injections was not documented. Information about patients who withdrew was reported. After week 12, patients with disease flare or non-response were allowed to cross over to the open-label treatment. It was difficult to determine how many responders and how many randomised patients in each group crossed over to open-label treatment. There was no statement defining how long after crossover flare or non-response was allowed to continue before withdrawal was implemented. Where necessary, the nearest or last observation was carried forward for continuous outcomes, but this was not stated explicitly and the number of missing data was not reported. A power calculation was conducted and based on the primary analysis of 4-week responders achieving remission at weeks 26 and 56.

The published text stated ‘…secondary efficacy analyses were conducted for all treated patients, including both randomised responder and randomised non-responder groups (all randomised patients who failed to achieve a clinical response at week 4)’.67 Although this might be technically correct, in the sense that analyses were conducted, it is misleading because the results of these analyses were not reported, with the single exception of data on healing of fistulas for a subgroup of patients with fistulas at baseline and screening.

CHARM.67 Summary of effectiveness evidence

Seven hundred and seventy-eight patients given induction injections of 80 mg and 40 mg of adalimumab separated by 2 weeks were randomised at week 4 to maintenance therapy with placebo or 40-mg adalimumab e.o.w. or weekly. Only results for responders were published. Responders were defined as patients who at week 4 had a CDAI score reduced by ≥ 70 points from baseline. At weeks 26 and 56 there were significantly more responder patients in remission in the e.o.w. and weekly adalimumab groups than the placebo group, 40% and 47% respectively versus 17% at week 26, and 36% and 41% respectively versus 12% at week 56 (p < 0.001 for adalimumab vs placebo). The risk difference (adalimumab–placebo) for remission reached statistical significance in favour of adalimumab from week 8 onwards and remained stable from about week 12 or 16 to the end of follow-up (week 56), indicating that most of the benefit from active intervention was delivered during the first quarter to third of the trial.

The proportion of responders (response 70) had diminished to < 50% in all groups by week 56. Response 70 rates diminished (CiC information has been removed) delivered during the first part of the trial. Premature withdrawal from randomised treatments (adalimumab and placebo) was (CiC information has been removed)%; withdrawal rate from active intervention (adalimumab) was (CiC information has been removed) responders (CiC information has been removed)% (CiC information has been removed) non-responders [(CiC information has been removed)%]. Amongst the whole trial population randomised to adalimumab maintenance therapy, (CiC information has been removed)% crossed over to open-label treatment due to flare or non-response. The distribution of crossovers between responders and non-responders was unclear.

Pooling and indirect comparisons

The two adalimumab trials, CLASSIC II66 and CHARM,67 differed fundamentally with respect to populations analysed for outcome results. CLASSIC II66 reported results for responders who had achieved remission, whereas the responders in CHARM67 had achieved only the less stringent response of a 70-point reduction in CDAI score. It would be inappropriate to combine the results from these two trials. It is relevant that the manufacturer's submission for adalimumab did not adopt a pooling approach. In a 2008 Cochrane review73 the authors stated ‘the two studies evaluating adalimumab were evaluated separately due to heterogeneity among the two trials (i.e. CLASSIC II and CHARM)’. Surprisingly, the results section of the review provided pooled results for remission (random effects model), and a further different pooled result (which may have been fixed effects) was presented in the discussion. On contacting the authors regarding these inconsistencies, we were informed that the review would be amended and the modified version, with no pooled results, is now available in the Cochrane Library.

The two infliximab trials, Rutgeerts et al.58 (extension from Targan et al.57) and ACCENT I,2,3 both employed a 10 mg/kg infliximab maintenance therapy arm and both reported results for responders based on a CDAI score reduced from baseline by ≥ 70 points. Therefore there is potential for pooling results. However, the pre-maintenance ‘induction’ phases of the two trials were very different, so the populations analysed for maintenance outcomes were likely to be quite different at the start of maintenance. Responders in ACCENT I2,3 were selected 2 weeks after a single exposure to a 5 mg/kg dose of infliximab. In contrast, responders in Rutgeerts et al.58 were selected between 8 and 12 weeks after their first exposure to infliximab and were required to have a response 70 lasting 4 weeks. A further considerable difference between the responders in the two trials was the degree of exposure to infliximab prior to their selection as responders; in ACCENT I2,3 responders were defined after a single 5 mg/kg exposure, whereas Rutgeerts et al.'s58 responders could have been exposed to any of the following: one 5 mg/kg, one 10 mg/kg, one 20 mg/kg, one 5 mg/kg and one 10 mg/kg, two 10 mg/kg, one 20 mg/kg and one 10 mg/kg, or no infliximab. The cumulative effect of these differences in the responder population (up to sixfold difference in exposure, different requirement in duration of response 70, and between four- and sixfold difference in duration of induction phase) is that the populations were unlikely to be sufficiently similar for the pooling of results to be informative.
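The 'up to sixfold' and 'four- to sixfold' figures quoted above follow from the regimens listed in this paragraph. A minimal sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
# Cumulative infliximab exposure (mg/kg) before selection as a responder.
# ACCENT I: a single 5 mg/kg infusion.
accent_i_exposure = 5

# Rutgeerts et al.: possible exposures as listed in the text.
# Each tuple holds the doses received; an empty tuple means no infliximab.
rutgeerts_regimens = [(5,), (10,), (20,), (5, 10), (10, 10), (20, 10), ()]
rutgeerts_exposures = [sum(doses) for doses in rutgeerts_regimens]

# Largest ratio of cumulative exposure between the two trials.
max_exposure_ratio = max(rutgeerts_exposures) / accent_i_exposure

# Time from first infusion to responder selection (weeks).
accent_i_weeks = 2
rutgeerts_weeks = (8, 12)
duration_ratios = [w / accent_i_weeks for w in rutgeerts_weeks]

print(max_exposure_ratio)  # 6.0
print(duration_ratios)     # [4.0, 6.0]
```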

Indirect comparison between the placebo-controlled maintenance trials, so as to gain an estimate of relative effectiveness of the two anti-TNF agents, was not undertaken for this non-fistulising adult population. Indirect comparison requires that trials for different interventions of interest share a common comparator arm (‘exchangeability’, see Glenny et al.74). For the maintenance trials, the differences between ‘placebo’ groups were numerous and not easily quantifiable; different induction drugs were administered on differing numbers of occasions for different periods of time, followed by selection of responders by differing criteria representing different proportions of the randomised populations. The basis of indirect comparison depends on strict comparability of the trial arms common to the compared trials (in this case the placebo arms). In these circumstances indirect comparison would be misleading and unjustified. It is noteworthy that neither of the manufacturers' submissions performed formal indirect comparison based on these trials.
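For reference, the standard adjusted indirect comparison through a common comparator (often attributed to Bucher and colleagues) can be written as below. This is a generic sketch and not a calculation performed in this report: A and B stand for the two anti-TNF agents, C for the shared comparator arm, and \hat{d} for an effect estimate on an additive scale such as a log risk ratio or a risk difference. The notation is introduced here for illustration only.

```latex
\hat{d}_{AB} = \hat{d}_{AC} - \hat{d}_{BC},
\qquad
\operatorname{Var}\!\left(\hat{d}_{AB}\right)
  = \operatorname{Var}\!\left(\hat{d}_{AC}\right)
  + \operatorname{Var}\!\left(\hat{d}_{BC}\right)
```

The subtraction cancels the comparator effect only if arm C is exchangeable between the trials, which is the condition judged not to hold for the placebo arms discussed above.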

Trials recruiting patients with fistulas

Two trials, Present et al.62 – an induction trial – and ACCENT II65 – a maintenance trial – compared infliximab with placebo for adults with fistulising CD. There were no trials of adalimumab that enrolled only from this patient group. In these two trials all patients had one or more fistulas at the time of randomisation, and the main outcome measures focused on the status of fistulas during follow-up. The outcomes measured are listed in Table 31 and the main trial characteristics summarised in Table 32. For reference purposes this section also includes fistula status results for the small subgroups of adult patients who had fistulas in other trials.

TABLE 31. Outcomes measured in trials of fistulising CD.

TABLE 32. Main study and population characteristics for trials in fistulising adult populations.

Present et al., 199962 (infliximab)

Present et al.62 was a small study that randomised 31 patients to placebo and 31 and 32 patients to 5 mg/kg and 10 mg/kg infliximab respectively, infused at weeks 0, 2 and 6. Follow-up extended to at least week 18, with assessment visits every 4 weeks from week 2 onwards.

Primary outcome

The primary outcome was a ≥ 50% reduction in the number of draining fistulas relative to baseline, assessed by physical examination and observed over at least two consecutive study visits at any time during the trial. Secondary outcomes included: complete absence of draining fistulas observed over at least 4 weeks (i.e. across at least two consecutive study visits) at any time during the study, time to beginning of response and duration of response. Changes in CDAI and PDAI scores were reported for some patients.

The results for the primary outcome (50% reduction in draining fistulas occurring at any time over at least two consecutive clinic visits) and for the secondary outcome of complete absence of draining fistulas over two consecutive clinic visits are summarised in Figure 25. For both these outcomes, infliximab at both dose regimens was more effective than placebo (p = 0.002 and p = 0.02 for the 5 mg/kg and 10 mg/kg regimens respectively). The point estimates for response rates were associated with substantial uncertainty because of the small group size; for the combined infliximab groups the response rate was 62% (95% CI 50% to 73%) compared with 26% (95% CI 14% to 43%) for the placebo group (p < 0.001). For those with a response, the median time to response was 6 weeks in the placebo group and 2 weeks in the infliximab groups (Table 33).

FIGURE 25. Rates and risk differences for a 50% reduction and absence of draining fistulas. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

TABLE 33. Time to onset of primary outcome.

Response and complete response

The median duration of response (defined as the maximum period during which the patient experienced a 50% reduction in draining fistulas) was approximately 3 months. For infliximab patients, 29/63 (46%; 95% CI 34% to 58%) experienced complete absence of draining fistulas for at least two consecutive clinic visits compared with 4/31 (13%; 95% CI 5% to 29%) of patients in the placebo group (p < 0.001).
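The report does not state how these confidence intervals were calculated. As a rough check, the Wilson score method (an assumption made here, not a method attributed to the authors) reproduces the quoted limits from the counts given above. The function name `wilson_ci` is hypothetical.

```python
import math

def wilson_ci(events, n, z=1.959964):
    """Wilson score interval for a binomial proportion (two-sided, ~95% by default)."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Counts taken from the text: complete absence of draining fistulas
# for at least two consecutive clinic visits.
for label, events, n in [("infliximab", 29, 63), ("placebo", 4, 31)]:
    lower, upper = wilson_ci(events, n)
    print(f"{label}: {events / n:.0%} (95% CI {lower:.0%} to {upper:.0%})")
```

Other interval methods (for example the simple Wald interval) give slightly different limits with groups this small, so agreement here is suggestive rather than conclusive.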

The median CDAI and PDAI scores reported for baseline and weeks 2 and 18 of follow-up are summarised in Table 34. By week 2, statistically significantly better (i.e. lower) scores were found for the infliximab groups than for the placebo group. The statistical significance of the difference between groups had weakened or disappeared by week 18. Not all patients contributed data for the analyses (i.e. this was not an ITT analysis).

TABLE 34. CDAI and PDAI scores reported in the Present et al. trial.

Quality assessment based on published report62

Randomisation and blinding were adequate and allocation concealment was likely to have been adequate. Baseline characteristics were generally well balanced although there was a greater proportion of patients in the infliximab groups that had undergone previous segmental resections than in the placebo group. Draining fistulas of < 3 months' duration were excluded from the primary analysis. However, the number or frequency of these fistulas was not reported, and it was unclear if these were also excluded from the secondary outcome of a complete absence of a draining fistula. Total follow-up time for the primary outcome was unclear. No power calculation was performed. Last observation was carried forward for CDAI and PDAI analyses, but the number of missing data was unclear. There were only six premature withdrawals from treatment, four from the placebo group and one patient from each of the infliximab groups.

Present et al., 1999.62 Summary of effectiveness evidence

Patients with one or more draining fistulas of more than 3 months' duration, and an unreported number of fistulas of < 3 months' duration, were randomised to placebo, 5 mg/kg infliximab or 10 mg/kg infliximab, given by i.v. infusion at weeks 0, 2 and 6. More patients in the infliximab groups than in the placebo group achieved the primary outcome, defined as a reduction of at least 50% in the number of draining fistulas of at least 3 months' duration present at baseline, lasting for at least two consecutive clinic visits. The percentage of patients responding to infliximab was 62% (95% CI 50% to 73%) compared with 26% (95% CI 14% to 43%) for the placebo group (p < 0.002). The median time to response was 2 weeks for the infliximab groups and 6 weeks for the placebo group. The duration of response was the same for both groups (median about 12 weeks).

More patients in the infliximab groups than in the placebo group achieved the secondary outcome of absence of draining fistula lasting for at least two consecutive clinic visits. The percentage of patients responding to infliximab for this outcome was 46% (95% CI 34% to 58%) compared with 13% (95% CI 5% to 29%) for the placebo group (p < 0.001).

ACCENT II65 (infliximab)

This was a maintenance trial that recruited 306 patients who had one or more fistulas of at least 3 months' standing.65 Of the 306 enrolled patients, 282 were assessed for ‘response’ at week 14 after infusions of 5 mg/kg infliximab at weeks 0, 2 and 6. ‘Responders’ were defined as those patients with at least 50% reduction in draining fistulas relative to baseline, observed at both weeks 10 and 14. Sixty-nine per cent (195/282) of assessed patients were classified as responders. Both responders and non-responders were randomised to placebo (96 responders, 43 non-responders) or to 5 mg/kg infliximab (99 responders, 44 non-responders), administered at weeks 14, 22, 30, 38 and 46. Assessment visits were scheduled at weeks 0, 2, 6, 10, 14, 22, 30, 38, 46 and 54. After week 22, patients losing response could cross over from placebo to 5 mg/kg infliximab, or from 5 mg/kg to 10 mg/kg infliximab. The fistula status outcome measures were:

  • loss of response – defined as a recrudescence of draining fistula, a change in therapy, a need for surgery, dropout due to lack of efficacy, or a worsening of luminal disease activity
  • response – defined as 50% reduction from baseline in draining fistula observed at consecutive visits 4 or more weeks apart
  • complete absence of draining fistula.

Primary outcome

The primary outcome was designated as time to loss of response in responders. The results are summarised in Figure 26.

FIGURE 26. Time to loss of response by responders in ACCENT II. Data taken from published graph and redrawn.

The median time to loss of response after randomisation was 14 weeks in the placebo group and > 40 weeks in the infliximab group (p < 0.001 by log rank test). In the infliximab group, 42% of responders lost response, and in the placebo group, 62% lost response. The main reasons for loss of response in the primary outcome were: change in treatment (38% of placebo, 25% of infliximab) or recrudescence of fistula (22% placebo, 16% infliximab).

Response and complete response

At 30 weeks, 33% and 64% of the placebo and infliximab groups respectively had a response (50% reduction in draining fistula from baseline for at least two consecutive visits), and at week 54 the corresponding percentages had diminished to 23% and 46% respectively (p = 0.001). The manufacturer's submission to NICE contained CiC information for additional weeks of follow-up. These are summarised in Figure 27. Prior to randomisation, except at week 2, the rates were about equal, as would be expected given that all responder patients received identical induction therapy up to week 14 and baseline characteristics were well balanced. At week 2, a surprising difference between groups was observed, with higher rates for the patients subsequently randomised to infliximab. At week 14, the placebo group did not receive infliximab. After randomisation at week 14, response rates diminished in both groups. From week 22 the risk difference (infliximab–placebo) reached statistical significance in favour of infliximab; after week 30 risk differences diminished, indicating that most benefit for maintenance of response from active intervention was delivered between weeks 14 and 30. By week 30 the intervention group had received two more infusions of infliximab than placebo patients.

FIGURE 27. Rates and risk differences for ≥ 50% reduction of draining fistulas in ACCENT II responders. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

Responders with loss of response during the post-randomisation phase were allowed to cross over after week 22 to an increased dose of infliximab. The renewed response rate in these crossover patients was reported as 25/41 (61%) in the placebo group (crossed over to 5 mg/kg dose) and 12/21 (57%) in the intervention group (crossed over to 10 mg/kg dose). However, Figure 1 of the ACCENT II65 published report shows 50 crossovers from placebo and 28 from 5 mg/kg infliximab.

The published report provided information about the rates of ‘complete response’ among responders.65 A complete response was defined as a complete absence of draining fistulas. Whereas the definition of a response required a ≥ 50% reduction in fistulas for at least 4 weeks, the definition of a ‘complete response’ specified no minimum duration. It was unclear, but likely, that this definition applied only to draining fistulas of at least 3 months' standing at baseline. The frequency of draining fistulas at baseline that were of less than 3 months' standing was not reported. The results for a complete response are summarised in Figure 28.

FIGURE 28. Rates, risk difference and risk ratio for complete response among responders in ACCENT II. LCI, lower confidence interval; RD, risk difference; RR, risk ratio; UCI, upper confidence interval.

At week 2, after only a single dose of infliximab, 66% (128/195) of patients already had a ‘complete response’. Unexpectedly, more patients who were subsequently randomised to infliximab had a complete response than those subsequently randomised to placebo (p = 0.014). By 14 weeks, 66% and 69% of responder patients who were randomised to placebo and infliximab respectively had a complete response. The rate of complete response in responders diminished in both groups after week 14. From week 22 the risk difference (infliximab–placebo) reached statistical significance in favour of infliximab and from week 30 remained stable, indicating that most benefit in maintenance of response from active intervention was delivered between weeks 14 and 30.

The rate of complete response among all enrolled patients at week 14 was reported to be 48% (147/306). This implies a complete response rate of 75% [147/(99 + 96)] among responders at week 14, a value which, according to Figure 2B in the ACCENT II65 published report, corresponds to week 10 rather than to week 14.
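The two percentages differ only in the denominator used. A minimal sketch of the calculation, assuming (as the bracketed expression does) that all 147 complete responders were among the 195 randomised responders; the variable names are illustrative:

```python
complete_responders = 147  # complete responses at week 14, as reported
enrolled = 306             # all enrolled patients
responders = 99 + 96       # randomised responders (infliximab + placebo)

rate_enrolled = complete_responders / enrolled
rate_responders = complete_responders / responders

print(f"{rate_enrolled:.0%}")    # 48%
print(f"{rate_responders:.0%}")  # 75%
```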

Hospitalisations and major surgery

The manufacturer's submission to NICE presented results for major surgery and for hospitalisation for all patients in ACCENT II65 whether or not they crossed over. For this purpose the placebo arm was termed ‘episodic treatment’ and the infliximab arm ‘scheduled treatment’. A 2.4-fold lower rate was reported for the scheduled treatment group. There were two important differences between these treatments. Firstly, patients in the ‘episodic’ arm experienced a 4-month mandatory withdrawal of active intervention (from weeks 6 to 22) not experienced by patients in scheduled treatment. Secondly, after week 22 the ‘episodic’ group patients were restricted to 5 mg/kg infliximab at episodes of worsening disease, whereas the ‘scheduled treatment’ group were able to receive 10 mg/kg. Restricted access to treatment (weeks 6–22) and restricted dosage represent biases likely to favour the ‘scheduled treatment’ group for any comparisons after week 6. Furthermore the ‘episodic treatment’ procedure was unlikely to reflect how an episodic strategy might be implemented in real world clinical practice, both with respect to the 4-month gap in active intervention and with regard to restriction of dose. Because of bias in the comparisons made and the probable dissimilarity between the trial episodic treatment and likely clinical practice, it was considered here that the hospitalisation rate for the ‘episodic’ treatment and the comparison with the scheduled treatment were very approximate guides.

The considerations described above also apply to the values reported for the percentages of patients requiring major surgery (13% and 2% in ‘episodic’ and scheduled treatment arms respectively).

Other outcomes reported for ACCENT II

The ACCENT II65 published report presented the median decrease from baseline in CDAI score at weeks 30 and 54 for all patients. Improvements in median CDAI were statistically significantly greater for the infliximab group (p = 0.004). Median increases from baseline in IBDQ scores at weeks 30 and 54 were also significantly greater for the infliximab group than for the placebo group. Baseline scores for all patients by group were not provided and baseline balance was therefore uncertain. In the case of missing values, the last observations were carried forward for the CDAI and IBDQ outcomes. The results are summarised in Table 35.

TABLE 35. Median CDAI and IBDQ changes ACCENT II.

Further results for the ACCENT II65 trial were presented in two separate papers.75,76 One reported a post hoc analysis of the subgroup of responder patients with rectovaginal fistulas (11 received placebo and 14 received the 5 mg/kg dose regimen of infliximab);75 the other paper performed a post hoc analysis on incidence of abscess development in patients responding to infliximab with closure of fistulas.76 The first of these papers was too underpowered for firm conclusions to be drawn. In ACCENT II,65 crossover to an increased dose of infliximab was allowed for all randomised groups (including placebo) from week 22 onwards; this resulted in the mean dose of infliximab in the placebo group (quoted as 20 mg/kg) being approximately half that of the intervention groups (quoted as 40 mg/kg). The post hoc analysis for abscess development compared these two groups and reported no statistically significant difference in rates (15% vs 19%; p = 0.526).

Quality assessment (based on published report)

Randomisation and blinding were adequate and allocation concealment likely to be so. Baseline characteristics for responders were well balanced between placebo and infliximab arms; however, for the all-patient comparisons between infliximab and placebo arms (e.g. of change in IBDQ and CDAI scores relative to baseline) it was not possible to ascertain if groups were balanced at baseline. There was a lack of clarity in the methods section so that it was difficult to determine if the sentence ‘…data for patients who crossed over from placebo to infliximab were censored before crossover occurred…’ referred to the survival analysis of loss of response. If it did, the reason for different handling of crossovers in the compared groups is difficult to interpret. The number of patients who withdrew prematurely was unclear except for discontinuation for AEs. No power calculation was undertaken. The last observation was carried forward as necessary for continuous outcomes, but the number of missing data was not reported.
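For readers unfamiliar with the imputation approach mentioned above, last observation carried forward can be sketched as follows. The function name and the scores are hypothetical and are not trial data; the published report does not describe its implementation.

```python
def locf(values):
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value. Gaps before the first observation
    remain None because there is nothing to carry forward."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical CDAI scores at successive visits; None marks a missed visit.
scores = [310, 250, None, 240, None, None]
print(locf(scores))  # [310, 250, 250, 240, 240, 240]
```

Because every missed visit inherits an earlier score, the analysis can look more complete than the data are, which is why the unreported amount of missing data matters.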

ACCENT II.65 Summary of effectiveness evidence

After induction infusions of 5 mg/kg infliximab at weeks 0, 2 and 6, 64% of enrolled patients were classified as responders. Responders were defined as patients experiencing at both weeks 10 and 14 a ≥ 50% reduction in the number of draining fistulas that were present at baseline of at least 3 months' standing.

After week 14, the median time to loss of response by responder patients was greater for patients randomised to continued infliximab treatment of 5 mg/kg at 8-week intervals than for those randomised to placebo (p < 0.001). More responder patients randomised to infliximab at week 14 experienced a response (closure of ≥ 50% of draining fistula for at least 4 weeks) than did responder patients randomised to placebo, and from week 22 the risk difference (infliximab–placebo) reached statistical significance in favour of infliximab. After week 14, response rates diminished in both groups. From week 30, risk differences diminished indicating that most benefit from infliximab was delivered between weeks 14 and 30.

Other trials reporting on subgroups of adults with fistulas

Two other trials reported on effectiveness of anti-TNF therapy for closure of fistulas – the GAIN induction trial of adalimumab64 and the CHARM maintenance trial of adalimumab.67 In the GAIN trial,64 at the end of follow-up (week 4) similar rates of fistula improvement were recorded for adalimumab and placebo groups (3/20 and 5/25 respectively). The CHARM trial67 reported a measure termed ‘fistula remission’ for the subgroup of trial patients who had fistula at screening and baseline. Fistula remission was defined as the percentage of patients with closure of all fistulas that were draining at screening and at baseline (separated by 2 weeks). Fistula remission was observed for 30% (21/70) and 13% (6/47) of combined adalimumab groups and placebo group respectively at week 26 and for 33% (23/70) and 13% (6/47) respectively at week 56.
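The CHARM subgroup counts above are sufficient to derive a risk difference and a risk ratio of the kind plotted in the figures of this chapter. The sketch below uses a Wald interval for the risk difference and a log-scale interval for the risk ratio; both are common textbook choices assumed here, not necessarily the methods used by the report's authors, and the function names are hypothetical.

```python
import math

def risk_difference(e1, n1, e0, n0, z=1.959964):
    """Risk difference (group 1 minus group 0) with a Wald 95% CI."""
    p1, p0 = e1 / n1, e0 / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, rd - z * se, rd + z * se

def risk_ratio(e1, n1, e0, n0, z=1.959964):
    """Risk ratio (group 1 over group 0) with a 95% CI computed on the log scale."""
    rr = (e1 / n1) / (e0 / n0)
    se_log = math.sqrt(1 / e1 - 1 / n1 + 1 / e0 - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# CHARM fistula remission at week 26: adalimumab 21/70, placebo 6/47.
rd, rd_lo, rd_hi = risk_difference(21, 70, 6, 47)
rr, rr_lo, rr_hi = risk_ratio(21, 70, 6, 47)
print(f"RD = {rd:.3f} (95% CI {rd_lo:.3f} to {rd_hi:.3f})")
print(f"RR = {rr:.2f} (95% CI {rr_lo:.2f} to {rr_hi:.2f})")
```

With subgroups this small the intervals are wide, and they should be read alongside the fact that the subgroup was defined by fistula status rather than by randomisation.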

Paediatric Crohn's disease trials

Patients in these trials were ≤ 18 years of age. Two trials, Baldassano et al.46 and REACH [A randomized, multicenter, open-label study to evaluate the safety and efficacy of anti-TNFα chimeric monoclonal antibody (infliximab, Remicade®) in pediatric subjects with moderate-to-severe Crohn's disease] (Hyams et al.45), looked at the effectiveness of different doses of infliximab in paediatric CD patients. There was no placebo arm in either trial. There were no trials of adalimumab in children. The outcomes measured are shown in Table 36 and the study characteristics are summarised in Table 37.

TABLE 36. Outcomes measured in trials of paediatric CD.

TABLE 37. Main study and population characteristics: paediatric trials.

Baldassano et al., 200346 (infliximab)

The small trial of Baldassano et al.46 examined whether a single dose of infliximab induced a response in paediatric patients. Patients were randomised to a 1 mg/kg (n = 6), 5 mg/kg (n = 7) or 10 mg/kg (n = 8) infusion. Patients were followed up to week 12. The primary outcomes were improvements from baseline in PCDAI and modified CDAI score. Other outcomes were the percentage of patients responding and the percentage in remission.

Table 38 shows the median percentage improvement in PCDAI score at various follow-up times relative to baseline. No clear pattern relating to follow-up time or dose regimen was apparent. To what extent improvement in scores resulted from infliximab treatment is impossible to determine because of lack of an appropriate placebo control group.

TABLE 38. Improvement in PCDAI score in Baldassano et al.

Response and remission results are summarised in Figure 29. All estimates were associated with great uncertainty due to the small number of participants. The proportion of patients in response approached 100% after 1 week in all groups and then tended to decline during follow-up. There was little difference between the groups. How much of the response was intervention dependent cannot be determined because of the lack of an appropriate placebo control group that did not receive infliximab. For remission, no clear pattern relating to dose or length of follow-up was apparent. Again, because of the lack of an inactive control it is impossible to determine the contribution of infliximab to the observed results.

FIGURE 29. Response and remission rates reported in Baldassano et al. (results as reported, not ITT). Response was defined as at least a 10-point reduction in PCDAI or at least a 70-point reduction in modified CDAI score; remission was defined as a PCDAI score < (more...)

Quality assessment (based on published report)

Randomisation and blinding were adequate and allocation concealment likely to be so. With such small numbers in each group it is not surprising that some baseline characteristics were imbalanced; notably, the 10 mg/kg group consisted almost exclusively of boys and the baseline CDAI score was substantially higher for the 1 mg/kg group than for the 5 mg/kg and 10 mg/kg groups. The number of patients completing the trial was reported to be 90%. No power calculation was done and analyses did not appear to be ITT.

Baldassano et al., 2003.46 Summary of effectiveness evidence

An induction infusion of 1, 5 or 10 mg/kg infliximab improved PCDAI scores relative to baseline. Induction increased the proportion of patients in response (40%–100% depending on dose and follow-up time) and in remission (0%–50% depending on dose and follow-up time). The study was underpowered, so these effectiveness estimates were associated with great uncertainty; no clear pattern was evident relating outcomes to dose regimen. The lack of a placebo control group renders interpretation of results problematic.

REACH45 (infliximab)

The REACH trial45 was called an ‘induction and maintenance’ study. Patients received induction doses of 5 mg/kg infliximab at weeks 0, 2 and 6. Responders were defined as those who reduced baseline PCDAI by at least 15 points and had a score of ≤ 30 at week 10. Responders (only) at week 10 were randomised to either five further doses of 5 mg/kg every 8 weeks delivered at weeks 14, 22, 30, 38 and 46 or three further doses delivered every 12 weeks at weeks 18, 30 and 42. Of 112 patients entering the induction phase, 103 were classified as responders and 99 were analysed. The lack of a placebo control group not receiving infliximab means that it is difficult to determine to what extent maintenance of response after induction was attributable to infliximab intervention. No primary outcome was identified. Response and remission results were reported for weeks 30 and 54 and weeks 10, 30 and 54 respectively. These are summarised in Figure 30. The differences between the two dose regimens for both response and remission at weeks 30 and 54 reached statistical significance (p < 0.05) in favour of the more frequent dose regimen.

FIGURE 30. Post-induction response and remission rates for responders in the REACH trial. Response defined as a decrease in PCDAI of ≥ 15 points from baseline and a total score ≤ 30. Remission defined as a PCDAI ≤ 10 points.

The REACH publication45 also reported changes from baseline (mean and SD) in PCDAI score, IMPACT III score (a QoL measure; scores range from 35 to 175 with higher scores representing better QoL) and daily corticosteroid use. Last observation was carried forward where values were missing. Information was provided for all ‘responders’ (i.e. the two trial arms combined) or separately for the two different treatment groups, at weeks 10, 30 and 54. The results are summarised in Table 39. The ‘all responders results’ do not represent a randomised comparison but rather a ‘before versus after treatment’ comparison for a subgroup of patients (responders) who were selected because they exhibited a favourable response. Given that a ‘no-treatment control group’ was not included in this trial, the analyses do not provide robust quantitative information about the effectiveness of infliximab for paediatric patients, and the favourable changes reported are difficult to interpret as an indeterminate proportion of the effects observed may have been infliximab independent.

TABLE 39. Changes from baseline in outcome measures reported for the REACH trial.

After week 10 responder patients were allowed to cross over to increased infliximab for worsening disease state. The increases in infliximab allowed included transfer from infusions every 12 weeks to every 8 weeks and increase in infusion dose from 5 mg/kg to 10 mg/kg. The proportion of patients who crossed over was 40%. The number of responder patients who withdrew prematurely was reported as 22 (21%), but it was unclear if this included withdrawals of patients who had crossed over to increased infliximab.

Quality assessment (based on published report)

Randomisation was adequate and allocation concealment likely to be so. This was an open-label study with no blinding.45 Baseline characteristics were well balanced except for steroid use. The number of patients withdrawing was reported, but it was not clear if this also included crossover patients who later withdrew. A power calculation was done and analyses were ITT.

REACH.45 Summary of effectiveness evidence

A 10-week induction phase with infusions of 5 mg/kg infliximab at weeks 0, 2 and 6 was followed by randomisation of responders at week 10 to further 5 mg/kg infusions every 8 or 12 weeks. At week 10, 88% of enrolled patients were classified as responders. Response rates for responders diminished to less than 50% by week 54. The difference between dose regimens reached statistical significance in favour of the 8-weekly infliximab dose regimen. About 40% of patients crossed over to increased infliximab because of worsening disease status. About 20% of patients withdrew from treatment prematurely.

Results in non-responders

Published results for maintenance trials focused on early responders (determined at week 2, or week 4 in the two large trials, ACCENT I2,3 and CHARM67). It is important to attempt to determine if such a subgroup analysis can be justified.

The question of whether results were published for non-responders is summarised in Table 40. Out of all of the maintenance trials, only two trials (ACCENT I2,3 and II65) published results including initial non-responders. Additional information was obtained from the industry submission for results in responders and non-responders from the CHARM67 trial (see Appendix 11). Table 40 also details whether non-responders were randomised.

TABLE 40. Results reported for non-responders (maintenance trials).

ACCENT I2,3 (infliximab)

Results for responders and non-responders were not published separately nor presented in the manufacturer's submission. However, by subtracting CiC information for responders from published information for all patients, it is possible in theory to calculate the response and remission rates for non-responders. Results for responders and for all patients were available in publications or CiC information in the industry submission for the following outcomes in the ACCENT I2,3 trial:

  1. Median CDAI scores at numerous follow-up times. These were published in separate papers for responders only and for all patients.3 No indication of variance was given, so robust analysis was not possible.
  2. IBDQ scores. These were recorded but reported differently in the two publications2,3 (median scores for responders and a dichotomised outcome for ‘all’ patients); this information cannot be used for estimation of non-responder results.3
  3. Remission and response 70 for responder patients at multiple follow-up times. The manufacturer's submission on infliximab provided CiC results for remission and response 70 for responder patients at multiple follow-up times for ACCENT I.2,3 Results for these outcomes for all patients were available in the public domain. It was possible to calculate the outcome for non-responder patients randomised to placebo or infliximab (5 mg/kg) by appropriate subtraction of responder rates from all-patient rates. Unfortunately, in practice this was meaningful for only the first 14 weeks of the trial because after week 14, patients who crossed over to increased infliximab dosage regimen on exacerbation of their disease contributed to the numbers achieving outcomes in the all-patient results but were discounted in the analyses for responders only.
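
The subtraction described in item 3 can be sketched as follows. This is a minimal illustration only: the function name and all counts are hypothetical placeholders, not ACCENT I data (the responder figures were CiC), and the sketch assumes that both sources count the same patients in the same way.

```python
def non_responder_rate(all_events, all_n, resp_events, resp_n):
    """Derive the event rate among initial non-responders by subtraction.

    all_events / all_n:   patients achieving the outcome / total, all patients in an arm
    resp_events / resp_n: the same counts restricted to initial responders
    """
    non_resp_n = all_n - resp_n
    non_resp_events = all_events - resp_events
    if non_resp_n <= 0:
        raise ValueError("no non-responders left after subtraction")
    if not 0 <= non_resp_events <= non_resp_n:
        # Arises when the two sources count patients differently,
        # e.g. crossover patients included in one but not the other.
        raise ValueError("inconsistent counts between sources")
    return non_resp_events / non_resp_n


# Hypothetical arm: 190 patients, of whom 110 were initial responders.
# 60 of all patients and 50 of the responders are in remission at a given week.
rate = non_responder_rate(all_events=60, all_n=190, resp_events=50, resp_n=110)
print(f"non-responder remission rate: {rate:.1%}")  # 12.5%
```

The consistency check only catches impossible results. Once crossover patients count towards the all-patient numerator but not the responder numerator, the subtraction can still return a plausible-looking but inflated non-responder rate, which is why the calculation was meaningful only up to week 14.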

The lack of complete long-term results, combined with the introduction of crossover to different treatments at week 14 of the trial, made it difficult to determine the rates of response of ‘non-responders’ in the ACCENT I2,3 trial, and makes these rates hard to interpret even where the limited available data allow their calculation. Appendix 11 provides the results calculated for non-responders in ACCENT I.2,3

ACCENT II65 (infliximab)

Limited results for responders and non-responders were reported separately for this trial65 that investigated patients with fistulas. The response rate among initial non-responders was 7/44 (16%) in the placebo group and 9/43 (21%) in the infliximab group (p = 0.6). A response was defined as a reduction of at least 50% from baseline in the number of draining fistulas at consecutive visits 4 or more weeks apart. The time point for this result was not stated and it is unclear whether these were patients who ever had a response during the 54-week trial. There are no details on whether these response rates were maintained. It is difficult to compare these results with those of the initial responders as the trial looked at the maintenance of response in initial responders rather than induction of response.
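
As a rough check on the reported comparison, the two proportions can be recomputed and tested. The publication's choice of test is not stated here, so the two-sided Fisher exact test below is an assumption made for illustration; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Probability of x events in row 1 given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


placebo_resp, placebo_n = 7, 44   # initial non-responders, placebo group
inflix_resp, inflix_n = 9, 43     # initial non-responders, infliximab group

p_value = fisher_exact_two_sided(
    placebo_resp, placebo_n - placebo_resp,
    inflix_resp, inflix_n - inflix_resp,
)
print(f"placebo {placebo_resp / placebo_n:.0%}, infliximab {inflix_resp / inflix_n:.0%}")
print(f"two-sided Fisher exact p = {p_value:.2f}")
```

With groups of this size, a difference of about five percentage points is well within what chance alone would produce.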

Adverse events

This section includes in-licence and non-licence trial results so as to include all relevant evidence. All studies reported AEs. There were six malignancies among 573 patients followed for 54 weeks in ACCENT I.2 The most serious AEs, and/or those thought potentially to be associated with anti-TNF therapy have been tabulated. In Table 41 trials are combined and the number of patients with selected AEs listed for treatment and placebo groups. Where there were several treatment groups, these have been combined. AEs occurring during induction or open-label periods of maintenance trials are listed separately according to availability of information (CHARM67 and CLASSIC II66). There were differences in how trialists reported or grouped together AEs (see notes to Table 42). Where an event was not reported it is possible this was because the event did not occur. Excluding trials from the total count where the event did not occur may lead to an overestimation of the frequency of an AE. Where patients experienced more than one type of AE within a category (e.g. infusion reactions), they will have been counted more than once.

TABLE 41. Percentage of patients with selected AEs (trials combined).

TABLE 42. Number of patients with selected AEs.

Adverse events leading to withdrawal included worsening of CD, infection or obstruction. Serious infections included sepsis, colitis, abscess and pneumonia. Injection site reactions included burning, rash, pain, bruising or irritation, while i.v. infusion reactions included pruritus, chest pain, flushing, dizziness, dyspnoea, injection site irritation and nausea. Very few deaths were reported.

Little difference was found between treatment and placebo groups for the selected AEs. The only cases of tuberculosis and lupus-like syndrome occurred in the treatment groups. AEs leading to withdrawal were slightly more frequent in the placebo groups and infusion reactions slightly more frequent in the treatment groups.

Table 42 lists AEs according to trial. It appears that for reporting of AEs, the placebo groups of the maintenance trials also included patients who crossed over to a treatment group. For ACCENT I,2,3 ACCENT II,65 CHARM67 and CLASSIC II,66 crossover was specified as an option for those patients who had a non-response or experienced a disease flare. There were no details regarding potential crossovers from placebo to treatment in Rutgeerts et al.58 (n = 73). Details on the number of crossovers from placebo groups are given in the Quality assessment sections and Appendix 12.

Crossover to treatment may have had an effect on the types and numbers of AEs reported in the placebo groups; for example an increase of those types of AEs associated with the treatment (e.g. infection) and/or an underestimate of AEs associated with no treatment (e.g. worsening of CD). It should be noted that in the maintenance trials, all patients (including those subsequently randomised to placebo) initially received the study drug during the induction phase; the effects of this may have carried over into the placebo phase of the RCT.

None of the maintenance trials reported AEs for patients according to whether they had ever or never received the treatment during the RCT phase of the study. As some of the AEs reported are very rare, it is possible that any differences between treatment and placebo groups are due to chance.

Development of antibodies

This section describes all included studies. Table 43 lists numbers of patients developing antibodies to anti-TNF agents, anti-nuclear antibodies and antibodies to double-stranded deoxyribonucleic acid (DNA). Most (10/11) studies reported the development of antibodies to the respective anti-TNF agent;2,3,45,46,57,58,62–66 four studies2,3,45,65,66 reported anti-nuclear antibodies and eight studies2,3,45,46,57,58,62,65,66 reported anti-double-stranded DNA antibodies.

TABLE 43. Antibodies to anti-TNF agent and DNA.

Five induction trials reported the proportion of patients with antibodies to an anti-TNF agent;46,57,62–64 these ranged from 0% to 6% (adalimumab: 0%, 1.3%; infliximab: 0%, 3.3%, 6%). All reported antibody development either for the intervention group only, or split by placebo and intervention group, except Present et al.,62 which reported antibodies for placebo and intervention group together. Targan et al.57 included patients from the post-RCT open-label extension. This was also the longest follow-up study among induction trials (16 weeks) and had the highest level of antibodies (6%).

Five maintenance trials reported antibodies to an anti-TNF agent;2,3,45,58,65,66 these ranged from 2.6% to 17% (infliximab: 2.9% to 17%; adalimumab: 2.6%). All patients were exposed to anti-TNF during induction. A patient's subsequent exposure was variable according to randomisation group and crossover to active intervention or escalated dosage regimen. Three studies reported antibodies for the intervention and placebo groups together (ACCENT II,65 Rutgeerts et al.58 and CLASSIC II66). The majority of patients in CLASSIC II66 came from the open-label cohort component of the study. The lowest antibody levels occurred in CLASSIC II66 (adalimumab); the other large adalimumab maintenance trial (CHARM67) did not measure antibodies.

Seven studies listed the proportion of inconclusive samples,2,3,45,57,58,62,64,65 which was generally high and ranged from 14% to ‘most’ patients. These samples had detectable concentrations of anti-TNF agent, which could interfere with the detection of antibodies to the anti-TNF agent in the immunoassay used, and would therefore not give a valid result. It is unclear whether the overall percentages of antibodies to the anti-TNF agent would have been different if they could have been measured in all patients.

As with the AEs described above, it should be noted that patients in the placebo groups of the maintenance trials would have all received the treatment as part of induction and may also have crossed over to a treatment group during later stages of the trials.

The proportions of anti-nuclear antibodies were variable: 25% in REACH45 (infliximab), 18%/46% [active treatment (anti-TNF) group (Rx)/placebo] in ACCENT II65 (infliximab), 35%/56% (Rx/placebo) in ACCENT I2,3 (infliximab) and 19% in CLASSIC II66 (adalimumab).

Antibodies to double-stranded DNA were measured in three infliximab induction trials46,57,62 (range 0%–13%) and four maintenance trials2,3,45,58,65 (range 4%–34%); only one adalimumab trial (CLASSIC II,66 19%) measured this parameter.

Given the proportion of missing data (inconclusive samples), the varying numbers of patients receiving treatment in different trials (those who crossed over) and the relatively small number of trials, it is not possible to conclude that one of the interventions is more or less likely to result in the development of antibodies to the anti-TNF agent. Whether different types of assays were used for the detection of antibodies, or whether there were differences in the number or frequency of assessments, either of which could have led to differences between studies or drugs, was not investigated.

Based on the results for all patients (responders and non-responders) from the ACCENT I trial,2 it appeared that scheduled treatment led to the formation of fewer antibodies to infliximab than ‘episodic’ treatment (28% in placebo/episodic treatment arm, 9% in 5 mg/kg scheduled arm and 6% in 10 mg/kg scheduled arm). It should be noted that the comparison between ‘episodic’ and scheduled treatment is not a randomised one (see Quality assessment of ACCENT I). Given that the ‘episodic’ group included patients who crossed over from the scheduled treatment groups and that 46% of total samples were inconclusive, it is unclear how robust these results are.

Safety issues; rare serious adverse events

Information extracted from the RCTs included for review of clinical effectiveness provides little long-term evidence about safety. Anti-TNF therapies have now been licensed for multiple indications and data about rare serious AEs have gradually accumulated. In this section the relevant safety issues following from these data are briefly reviewed.

Bongartz et al.77 meta-analysed rates of malignancy and of serious infection reported in placebo-controlled RCTs of infliximab and adalimumab in rheumatoid arthritis. Information in published papers and from the US FDA website was used for the analysis. Odds ratios (anti-TNF versus placebo) for malignancy and infection were 3.3 (95% CI 1.2 to 9.1) and 2.0 (95% CI 1.3 to 3.1) respectively, and the numbers needed to harm were 154 and 59 respectively over a treatment period of 3–12 months. Higher drug doses were associated with greater risk. Similar results were reported in a meta-analysis conducted by Shoor.78
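
A number needed to harm (NNH) is the reciprocal of the absolute risk increase, so the figures quoted above can be read back as absolute risks. The short sketch below does only that arithmetic; the function name is illustrative, and the calculation takes the reported NNH values at face value without reproducing the meta-analysis.

```python
def absolute_risk_increase(nnh):
    """Absolute risk increase implied by a number needed to harm (NNH = 1 / ARI)."""
    if nnh <= 0:
        raise ValueError("NNH must be positive")
    return 1.0 / nnh


# NNH values as quoted from the meta-analysis in rheumatoid arthritis
extra_per_1000 = {}
for outcome, nnh in [("malignancy", 154), ("serious infection", 59)]:
    extra_per_1000[outcome] = 1000 * absolute_risk_increase(nnh)
    print(f"{outcome}: NNH {nnh} -> about {extra_per_1000[outcome]:.1f} extra cases per 1000 treated")
# malignancy: NNH 154 -> about 6.5 extra cases per 1000 treated
# serious infection: NNH 59 -> about 16.9 extra cases per 1000 treated
```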

Tumour necrosis factor-α has an important role in the host immune response to Mycobacterium tuberculosis and in the immunopathology of tuberculosis.79 Patients to be treated with anti-TNF agents should be screened for tuberculosis before starting therapy and monitored for tuberculosis during therapy; those with latent tuberculosis should be appropriately treated prior to initiation of anti-TNF therapy. A 2007 publication (Raval et al.79) detailed 130 infliximab-associated cases of tuberculosis spontaneously reported to the US FDA between 1 November 2001 and 30 May 2006. In 45% of cases there was extrapulmonary disease. In a subset of 67 cases notified after the addition of a boxed tuberculosis warning to the medication labelling, no test had been performed in six instances and, of 47 tuberculin skin tests performed, 34 gave a negative result. The false-negative rate was unknown. These results emphasise the requirement for vigilance by physicians caring for patients treated or about to be treated with anti-TNF therapies.

Ramos-Casals et al.80 identified 233 cases of autoimmune disease apparently associated with anti-TNF therapies. Of these, 17 occurred in CD patients. Anti-TNF agents infliximab, adalimumab and etanercept were associated with various autoimmune manifestations including lupus, vasculitis and interstitial lung diseases. Overall incidence rates or rates for individual anti-TNF agents are unknown.

Elevated TNF-α is associated with heart failure and its level is correlated with severity of heart failure. Case reports (n = 47) reviewed by Kwon et al.81 indicate that anti-TNF therapy might trigger new-onset heart failure in a subset of patients and might exacerbate the condition of some patients. The SPCs carry a warning of this potential risk.

Treatment with monoclonal antibodies has been associated with potentially fatal induction of progressive multifocal leucoencephalopathy. A 2008 systematic review of primary data by Socal et al.82 identified 29 cases, most of which (n = 23) were associated with rituximab therapy, which depletes the B-cell population. The single instance associated with anti-TNF treatment was reported for a 74-year-old woman given etanercept for rheumatoid arthritis.83

Discussion of results and assessment of effectiveness

Patient heterogeneity

Patient heterogeneity may affect results across different trials. The inclusion criteria of the trials specified a CDAI score between 220 and 400 or 450. The inclusion of patients already at a CDAI level close to remission could have improved the remission rates found. However, if patients already had a low CDAI score, a reduction of 70 or 100 points would have been harder to achieve. The opposite would be more likely to be true for patients with very high initial CDAI scores. Therefore, it is unlikely that the initial wide spread of CDAI scores would have much impact on the results unless there were more patients at one end of the spread than the other. Mean CDAI scores at entry did not vary greatly between trials, so it appears unlikely that patient populations taken as a whole differed substantially between trials with respect to this parameter. Nevertheless populations probably did differ between trials as the placebo rates were heterogeneous. The corollary is that CDAI is not necessarily a reliable indicator of the seriousness of disease or of its likely progression. The CDAI score is a summary score and patients can achieve the same score yet have problems with very different aspects of their disease. Similarly, if a patient had a reduction of 70 points, that could be achieved in a variety of different ways. It is also uncertain whether a reduction of 70 or 100 points means the same in terms of reduction of disease severity for patients starting at different ends of the severity spectrum.

Cohort studies (e.g. Munkholm et al.25) demonstrate that most CD patients, at some time in their disease history, experience ‘highly active disease’ and that they cycle between highly active and quiescent periods of varying durations. Whether CD is severely debilitating for an individual depends to some extent on the frequency with which the episodes of highly active disease are repeated. Cohort studies show that this varies between patients. For these reasons a patient's CDAI score at a particular time, such as at recruitment into a trial, is not a good indicator for the likely duration of that level of disease activity or of the likely subsequent recurrence of highly active disease.

The licence indications for infliximab and adalimumab specify ‘severe’ CD but do not define how severe disease may be determined. It has been assumed that this is a CDAI score of ≥ 300. Trials have recruited patients having ‘moderate-to-severe CD’ defined according to CDAI scores of between 220 and 450, or 220 and 400; it is therefore unclear to what extent these populations fully reflect the intended licensed population.

Induction trials – placebo rates

CD is a chronic relapsing and remitting disease. Induction trials selected patients in relapse. On average, irrespective of treatment, relapsed patients will tend to improve, i.e. remit with time (their CDAI scores will reduce as they regress to the mean). This tendency would be reflected in relatively high rates of improvement in placebo groups in placebo-controlled trials and also in variation in these rates dependent on the relapse–remission cycling characteristics of each of the patients enrolled in the different trials.
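
The regression-to-the-mean argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any included trial: every parameter (long-run mean, spreads, entry threshold) is hypothetical, scores are drawn from normal distributions, and no treatment or placebo effect of any kind is applied.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

N_PATIENTS = 20_000
LONG_RUN_MEAN = 200    # hypothetical long-run average CDAI-like score
BETWEEN_SD = 40        # hypothetical spread of patients' own averages
WITHIN_SD = 80         # hypothetical visit-to-visit fluctuation
ENTRY_THRESHOLD = 220  # enrolment requires a score at or above this value

baseline_scores, followup_scores = [], []
for _ in range(N_PATIENTS):
    patient_mean = random.gauss(LONG_RUN_MEAN, BETWEEN_SD)
    baseline = random.gauss(patient_mean, WITHIN_SD)
    if baseline < ENTRY_THRESHOLD:
        continue  # not in relapse at screening, so not enrolled
    # No treatment effect: follow-up is another draw around the same patient mean
    followup = random.gauss(patient_mean, WITHIN_SD)
    baseline_scores.append(baseline)
    followup_scores.append(followup)

n_enrolled = len(baseline_scores)
mean_baseline = sum(baseline_scores) / n_enrolled
mean_followup = sum(followup_scores) / n_enrolled
drop_70 = sum(b - f >= 70 for b, f in zip(baseline_scores, followup_scores)) / n_enrolled

print(f"enrolled: {n_enrolled}")
print(f"mean baseline {mean_baseline:.0f}, mean follow-up {mean_followup:.0f}")
print(f"untreated patients with a 70-point drop: {drop_70:.0%}")
```

Because enrolment selects patients at a high point of a fluctuating score, the follow-up mean falls and a sizeable share of patients meet a 70-point reduction without any intervention. Changing the hypothetical fluctuation parameters changes that share, which mirrors the variation in placebo rates between trials.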

The rates of response (reduction in CDAI of 70 or 100 points) and of remission in the placebo arms of the included induction trials varied from trial to trial and in some trials reached high levels (see Appendix 13 for details). Except in the Targan et al.57 trial of infliximab, by week 4 one-third or more of placebo patients had already achieved the least stringent measure of improvement (response 70). Similarly, at least 20%–25% achieved response 100 by week 4. Varied and high rates of placebo response have previously been documented for many CD intervention trials (Su et al.84). For dichotomous outcomes, variable placebo rates can profoundly influence effect size values such as risk difference and risk ratio. Thus placebo and intervention rates in two trials of 10% and 20% respectively in one and 30% and 40% respectively in the other generate identical outcome measures for risk difference (0.1 or 10%) but considerably different measures for risk ratio (2.0 and 1.3 respectively). For this reason, both placebo and intervention rates and both risk difference and risk ratio effect sizes have been presented in this report for most outcomes in the results section. The CIs quoted were not adjusted for repeated measures.
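
The worked example in the paragraph above can be reproduced directly. The sketch below uses the same illustrative rates (10% versus 20% and 30% versus 40%); the trial labels and function names are placeholders.

```python
def risk_difference(p_treat, p_placebo):
    """Absolute difference in event rates (intervention minus placebo)."""
    return p_treat - p_placebo


def risk_ratio(p_treat, p_placebo):
    """Relative event rate (intervention divided by placebo)."""
    if p_placebo == 0:
        raise ValueError("risk ratio is undefined when the placebo rate is zero")
    return p_treat / p_placebo


# (placebo rate, intervention rate) for two illustrative trials
trials = {"trial A": (0.10, 0.20), "trial B": (0.30, 0.40)}

results = {}
for name, (placebo, treat) in trials.items():
    results[name] = (risk_difference(treat, placebo), risk_ratio(treat, placebo))
    rd, rr = results[name]
    print(f"{name}: RD = {rd:.2f}, RR = {rr:.1f}")
# trial A: RD = 0.10, RR = 2.0
# trial B: RD = 0.10, RR = 1.3
```

The same absolute benefit therefore looks much larger on the ratio scale when the placebo rate is low, which is why both measures are reported alongside the underlying rates.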

These high and varied placebo rates probably result from three influences: the tendency for CDAI scores to regress to the mean; a placebo effect; and possibly the effect of concomitant treatments allowed in the trials. The variation in placebo rates makes comparisons between trials problematic and indicates that CDAI scores alone are unlikely to be good prognostic indicators. Although recruited populations in the trials conformed to similar ranges, means or medians of CDAI score, they are likely to be clinically dissimilar.

Induction trials – effect sizes

By week 4 all induction trials, except for CLASSIC I63 at the lower dose level for adalimumab (80/40 mg at weeks 0 and 2), exhibited statistically significant effect sizes for anti-TNF relative to placebo for remission and response, irrespective of whether these were measured in terms of risk difference or risk ratio. The trial of infliximab by Targan et al.57 was remarkable in that the effect sizes observed were much greater than those seen in the other trials; placebo rates were notably lower in Targan et al.57 than in any other trials. Targan et al.57 was the earliest anti-TNF induction RCT and was a relatively small trial, so the point estimates of effectiveness were associated with more uncertainty than was the case for the larger induction trial of adalimumab (GAIN64). Since the publication of Targan et al.,57 no infliximab induction trial has been reported that can provide confirmatory evidence for the large effect size point estimates from the Targan et al.57 trial. The response 70 rate at 4 weeks in the intervention arm of Targan et al.57 was 81%. In the induction phase of the ACCENT I2,3 maintenance trial of infliximab, the response 70 rate at week 4 was considerably less at 59%. ACCENT I2,3 patients were administered the same dose at week 0 and patient baseline characteristics were similar to Targan et al.,57 e.g. very similar CDAI and IBDQ scores. The contribution of infliximab to the initial 59% response 70 rate in ACCENT I2,3 cannot be gauged because of lack of an appropriate control group.

The follow-up in the published adalimumab trial reports was to 4 weeks only, and there is no reliable evidence on the effectiveness of induction with adalimumab beyond this time period. Targan et al.57 provided data on infliximab for some patients up to 16 weeks (4 weeks' induction + 12 weeks' open label).

Maintenance trials – general comments on trial design

The maintenance trials conformed to what have been called ‘adaptive’ trial designs. The main features of such designs have been reviewed by Chang and Chow.85 The ACCENT I,2,3 CHARM67 and CLASSIC II66 trials had adaptive trial designs of the type described as ‘drop-the-loser’, in some cases with ‘adaptive treatment switching’.85 An inherent problem of the ‘drop-the-loser’ design is that groups that are dropped may contain valuable information regarding the response to the treatment under study. A further problem concerns how such studies should be powered: whether for the interim analysis at the point when ‘losers’ are dropped, or for the final analysis involving winners only. With treatment switching come problems of identifying the target population for the therapy of interest and of defining precisely the therapy provided. Treatment switching can also change the hypothesis being tested. Chang and Chow state ‘From a statistical point of view adaptations to trial and or statistical procedures could (i) introduce bias/variation to data collection, (ii) result in a shift in location and scale of the target population, (iii) lead to inconsistency between the hypothesis being tested and the corresponding statistical tests’.85 In summary, these trials are susceptible to difficulties of analysis and interpretation.

Maintenance trials in adult populations wholly or predominantly of non-fistulising patients

For each drug, one large maintenance trial has been published that employed within-licence treatment regimens: the CHARM67 trial (adalimumab) and the ACCENT I2,3 trial (infliximab). The interpretation of results from the maintenance trials was hampered by the nature of the trial designs, most of which allowed for scheduled crossovers into other treatment arms (or to ‘open-label treatment’). This led to a proportion of patients in the placebo arms of the trials receiving variable amounts of drug. In order to comply with an ITT analysis, these patients (and those who withdrew) were mainly counted as treatment failures for binary outcomes such as remission or response. Not all trials clearly defined the handling of missing data or data for patients who crossed over. Where there were missing continuous data, the last observation carried forward was used in ACCENT I2,3 but not in CHARM,67 the effect of which on results is unclear.
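
The two conventions for missing data mentioned above can be sketched as follows. This is a minimal illustration with hypothetical data and function names, not the trials' actual analysis code: binary outcomes treat withdrawn or crossed-over patients as failures, and continuous outcomes carry the last observation forward.

```python
def locf(values):
    """Last observation carried forward: replace None with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first value has been observed
    return filled


def itt_rate(outcomes):
    """ITT rate for a binary outcome: None (withdrew or crossed over) counts as failure."""
    return sum(1 for o in outcomes if o is True) / len(outcomes)


# Hypothetical CDAI-like scores at successive visits; None marks a missing visit
scores = [310, 250, None, None, 180]
print(locf(scores))  # [310, 250, 250, 250, 180]

# Hypothetical arm of 8 patients at one visit:
# True = in remission, False = not in remission, None = withdrew or crossed over
arm = [True, False, None, True, None, False, True, False]
print(itt_rate(arm))  # 0.375
```

Counting crossovers as failures keeps the denominator at all randomised patients, but it also means that a placebo patient who crossed over and then improved on the drug contributes nothing to either arm's success count.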

There were particular concerns over the ACCENT I trial (Rutgeerts et al.2 publication): its stated aim of comparing episodic with scheduled treatment is misleading because no patients were randomised to an episodic treatment arm. A proper comparison of two strategies was implemented by Breban et al.,86 who randomised patients with ankylosing spondylitis to induction with infliximab followed either by continuous treatment or by treatment adapted to symptom recurrence. Similarly, Menter et al.70 conducted an unbiased comparison of continuous and intermittent infliximab strategies for psoriasis by re-randomising, at the start of the compared strategies (week 14), patients who had initially been randomised (week 0) to different infliximab induction regimens. There were also uncertainties regarding the impact of the methods for handling missing data in the analysis including both responders and non-responders.

Responder/non-responder subgroups

The interpretation of the maintenance trials was further complicated by the fact that a subgroup of patients (responders) were selected for analysis or randomisation at varying time points after an induction period during which all patients received the study drug. For both of the large maintenance trials of within-licence treatment regimens (CHARM,67 adalimumab; ACCENT I,2,3 infliximab) the published effectiveness results all focused on the ‘responders’ subgroup. Separate results for non-responders were not reported (see Table 40), although both CHARM67 and ACCENT I2,3 randomised both responder and non-responder patients. The definition of responders differed somewhat between the two trials. Furthermore, the induction phases used in both trials differed with respect to duration and number of induction doses administered. The consequence of these considerations is that attempting any comparisons of effectiveness between the trials is very problematic. The proportion of patients categorised as responders in each of these trials was 64% for CHARM67 and 58% for ACCENT I.2,3

It is known from trials where results were also reported for (randomised) non-responders that initial non-responders can still respond later, so it is unclear which patients this subgroup of responders actually represents in clinical practice. It is possible that a subgroup of responders chosen at a different time point would have led to different results. There is no published evidence or information in the manufacturers' submissions to show that compared with non-responders, responders benefit more from the treatment (compared with placebo). The selection of responders at different time points in different trials also hampers any comparisons between the trials.

Reporting effectiveness results for a subgroup but not for all randomised patients (or not for all patients who commenced treatment) appears at odds with usual practice. For example, in placebo-controlled randomised trials of anti-TNF agents (infliximab, adalimumab and etanercept) for the treatment of rheumatoid arthritis, results for all patients have been analysed and presented.87 Dichotomising patients into responders and non-responders makes clinical sense only if a ‘response’ at the time of the dichotomisation is a good prognostic tool for identifying those patients most likely to benefit from maintenance of treatment. In order to find this out, the comparison of results for responders and non-responders is required, which, unfortunately, is the precise analysis that was not undertaken in these trials. Thus there is no evidence available to indicate that subgrouping patients in the ways described is a useful practice. The usefulness of the results reported for responders only is therefore questionable.

The ACCENT I2,3 trial dichotomised patients according to their response at 2 weeks after the induction infusion of infliximab. The decision to do this may have been derived from previous research. The 1997 induction study of Targan et al.57 provided data up to 4 weeks after a single infusion of infliximab at 5 mg/kg. This study reported that the mean CDAI score in the placebo group remained constant from weeks 2 to 4, while the risk differences (infliximab vs placebo) for remission (score < 150 points) and for a 70-point reduction in CDAI score increased from 0.37 to 0.44 and from 0.62 to 0.65 respectively. Placebo rates for these outcomes remained constant from weeks 2 to 4. Although the study was small and the point estimates were associated with considerable uncertainty, these results imply that some patients not responding at 2 weeks do in fact go on to respond at a later time. In ACCENT I2,3 (CiC information has been removed).

In the absence of appropriate analyses it appears that dichotomising patients as early as 2 weeks after a single infliximab infusion is probably premature and does not appear to be a clinically meaningful procedure.

In the CHARM trial67 the time chosen for categorisation into responders and non-responders (at week 4) was not based on efficacy data but was ‘based on pharmacokinetic model estimates for when maximal drug concentrations should be present’. CiC results were available for non-responder patients at weeks 12, 26 and 56, so that it was possible to calculate the response rates among all randomised patients. The risk difference and risk ratio results for remission, response 100 and response 70 are summarised in Figures 31 and 32 respectively. (CiC information has been removed.)
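The calculation of rates among all randomised patients amounts to adding responder and non-responder counts within each arm before computing the risk difference and risk ratio. The following is a minimal sketch of that arithmetic, assuming Wald-type 95% confidence intervals; the counts are hypothetical (the actual values are CiC and have been removed) and the function names are invented for illustration.

```python
import math

def risk_difference(events_t, n_t, events_c, n_c, z=1.96):
    """Risk difference (active minus placebo) with a Wald 95% CI."""
    p_t, p_c = events_t / n_t, events_c / n_c
    rd = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return rd, rd - z * se, rd + z * se

def risk_ratio(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio (active / placebo) with a 95% CI computed on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)

# HYPOTHETICAL (events, patients) per arm; not trial data.
responders = {"active": (40, 100), "placebo": (15, 100)}
non_responders = {"active": (10, 60), "placebo": (5, 60)}

def pool(arm):
    """Add responder and non-responder counts: all randomised patients in an arm."""
    e1, n1 = responders[arm]
    e2, n2 = non_responders[arm]
    return e1 + e2, n1 + n2

e_t, n_t = pool("active")
e_c, n_c = pool("placebo")
rd_all = risk_difference(e_t, n_t, e_c, n_c)
rr_all = risk_ratio(e_t, n_t, e_c, n_c)
print("RD (all randomised): %.3f (%.3f to %.3f)" % rd_all)
print("RR (all randomised): %.2f (%.2f to %.2f)" % rr_all)
```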

FIGURE 31 (CiC information has been removed.)

FIGURE 32 (CiC information has been removed.)

The two large maintenance trials (CHARM67 and ACCENT I2,3) provided evidence that for the subgroups of patients defined as ‘responders’, anti-TNF therapy was more beneficial than placebo with respect to the proportions of patients exhibiting remission or response (70 or 100). Rates at multiple follow-up times extending to week 56 for CHARM67 (remission rates) were reported in the published paper. For the ACCENT I2,3 trial, results for weeks 30 and 54 (response 70 and remission) only were published, but CiC information for multiple time points was provided. The higher rates of response for intervention versus placebo might lead to the conclusion that extended therapy over the prolonged follow-up is beneficial and/or necessary for maintenance of response. However, examination of all the available information indicates that nearly all benefit observed for intervention over placebo was generated early on and that risk differences thereafter remained relatively stable or decreased. These results imply that prolonged treatment after the initial benefit has been attained is uneconomical and, as anti-TNF agents are associated with significant health risks, may be clinically ill-advised. The dose regimens required to attain this early benefit are likely to be different for adalimumab and infliximab.

In the published CHARM67 results, the graph of ‘% patients maintaining remission’ against follow-up time (Figure 2B in the CHARM67 publication) shows remission rates that rise again after having fallen. This indicates that patients who achieved remission only at late follow-up times were counted as ‘maintaining remission’, and that the graph in fact represents the point prevalence of remission rather than the maintenance of individual patients in remission. If this is the case, a value reported for a late time point (e.g. 30 or 54 weeks) does not necessarily inform about maintenance of response during follow-up, because those registered as ‘in response’ may have achieved this status only shortly before the time point reported. It was unclear, but appeared possible, that point prevalence of response was the statistic reported in the ACCENT I2,3 published reports; however, (CiC information has been removed).
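The distinction between point prevalence and maintenance can be made concrete with a small worked example. The patient histories below are entirely hypothetical, the visit weeks are chosen only for illustration, and the function names are invented; the sketch simply contrasts the proportion in remission at a given visit with the proportion in remission at every visit up to that point.

```python
# Visit weeks are illustrative assumptions, not a trial schedule.
visits = [4, 12, 26, 56]

# One row per HYPOTHETICAL patient: remission status at each visit.
patients = [
    [True, True, True, True],     # in remission throughout
    [True, False, False, True],   # lost remission, regained it late
    [True, False, False, False],  # lost remission and did not regain it
    [False, False, True, True],   # first achieved remission late
]

def point_prevalence(statuses, visit_index):
    """Proportion of patients in remission at one visit, whatever their history."""
    return sum(row[visit_index] for row in statuses) / len(statuses)

def maintained_remission(statuses, visit_index):
    """Proportion in remission at every visit up to and including this one."""
    return sum(all(row[: visit_index + 1]) for row in statuses) / len(statuses)

last = len(visits) - 1
prevalence_last = point_prevalence(patients, last)
maintained_last = maintained_remission(patients, last)
print(f"week {visits[last]}: point prevalence {prevalence_last:.2f}, "
      f"continuously maintained {maintained_last:.2f}")
```

In this invented data set three of four patients are in remission at the final visit, but only one has been in remission continuously, which is the kind of discrepancy a point-prevalence graph would conceal.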

The most appropriate way to determine the ability of anti-TNF agents to maintain response in patients who were defined as responders is time-to-event analysis with statistical comparison using a log rank test. For ACCENT I,2,3 median time to treatment failure was 19 weeks and 38 weeks for placebo and 5 mg/kg infliximab groups respectively (p = 0.002); however, the definition of treatment failure used in this analysis was complex, did not correspond to a loss of response 70 status, and its clinical impact was difficult to gauge. The CHARM67 trial (adalimumab) reported median duration of remission for those responders who achieved remission starting at any time during follow-up. The median times were 127 days for the placebo group, 378 days for the 40 mg adalimumab e.o.w. group and > 392 days for the 40 mg weekly dosage regimen group.
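For readers unfamiliar with the approach, a time-to-event comparison of this kind can be sketched as follows. The example uses only the Python standard library, a textbook two-group log rank test and a Kaplan-Meier median; the follow-up times are hypothetical weeks to loss of response (with censoring at week 54) and are not drawn from ACCENT I or CHARM, and the function names are invented for illustration.

```python
import math

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log rank test; events are 1 (failure) or 0 (censored).

    Returns the chi-square statistic (1 degree of freedom) and its p value.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({u for u, e, _ in data if e == 1}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

def km_median(times, events):
    """Kaplan-Meier median: first event time at which survival is 0.5 or less."""
    surv = 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - failed / at_risk
        if surv <= 0.5 + 1e-12:  # tolerance guards against rounding error
            return t
    return None  # median not reached during follow-up

# HYPOTHETICAL weeks to loss of response; event 0 means censored at week 54.
placebo_t = [5, 9, 13, 17, 21, 30, 54, 54]
placebo_e = [1, 1, 1, 1, 1, 1, 0, 0]
active_t = [12, 20, 38, 45, 54, 54, 54, 54]
active_e = [1, 1, 1, 1, 0, 0, 0, 0]

chi2, p = logrank(placebo_t, placebo_e, active_t, active_e)
print("median (placebo):", km_median(placebo_t, placebo_e))
print("median (active):", km_median(active_t, active_e))
print(f"log rank chi-square = {chi2:.2f}, p = {p:.3f}")
```

The value of such an analysis depends on the event definition: it addresses maintenance only if the event is loss of the response status of interest in patients who held that status at the start of follow-up.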

Trials recruiting patients with draining fistula

One induction trial provided evidence that infliximab promotes fistula closure to a greater extent than placebo. The ACCENT II65 trial of infliximab maintenance treatment (IMT) focused on responders (69% of patients receiving induction doses). IMT promoted closure of fistulas to a statistically significantly greater extent than did placebo. There was evidence that a reduction of dose frequency from every 4 weeks to every 8 weeks was associated with poorer maintenance of fistula closure. Limited evidence from the CHARM67 trial suggested that adalimumab may also promote fistula closure.

It is possible that fistula closure may not necessarily be a desirable outcome as it may result in increased development of abscesses. A post hoc analysis of patients participating in the ACCENT II65 trial found no significant difference in abscess incidence between two groups receiving different mean dosages of infliximab. Interpretation of these results is problematic because results for the most appropriate comparison (placebo vs infliximab) were not available.

Trials recruiting paediatric patients

Two trials of infliximab, one induction and the other maintenance, reported on the treatment of paediatric patients. Unfortunately, in these trials all patients received infliximab, and no reliable inferences regarding the effectiveness of the intervention were possible because the spontaneous response rates in the population were unknown. The more frequent of the two dosage regimens used in the REACH45 trial (5 mg/kg every 8 or 12 weeks) resulted in statistically significantly greater rates of response and remission, a dose-response relationship that probably implies a beneficial effect of infliximab relative to placebo or standard treatment, but a placebo-controlled trial would have provided far stronger evidence of effectiveness.

Differences in effectiveness of anti-TNF agents; indirect comparisons

No head-to-head trials were found that compared the effectiveness of adalimumab and infliximab. However, the existence of placebo-controlled induction and maintenance trials for both drugs means that adjusted indirect comparisons of effectiveness were theoretically feasible using methods described by Glenny et al.74
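For orientation, one widely used form of adjusted indirect comparison works on the log scale: the indirect effect of drug A versus drug B is the difference between each drug's log risk ratio against the common comparator (placebo), and the variances add. The sketch below assumes this form (often attributed to Bucher and colleagues); whether it matches every detail of the methods cited is an assumption, the counts are hypothetical, and the function names are invented for illustration.

```python
import math

def log_rr_and_se(events_t, n_t, events_c, n_c):
    """Log risk ratio (active / placebo) and its standard error."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se

def adjusted_indirect(log_rr_a, se_a, log_rr_b, se_b, z=1.96):
    """Indirect risk ratio for A vs B through a common placebo comparator.

    Returns (risk ratio, lower 95% limit, upper 95% limit).
    """
    diff = log_rr_a - log_rr_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)  # variances of the two estimates add
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se)

# HYPOTHETICAL remission counts from two placebo-controlled trials.
lrr_a, se_a = log_rr_and_se(30, 100, 10, 100)  # drug A vs placebo
lrr_b, se_b = log_rr_and_se(24, 80, 12, 80)    # drug B vs placebo

rr_ab, lo_ab, hi_ab = adjusted_indirect(lrr_a, se_a, lrr_b, se_b)
print(f"indirect RR, A vs B: {rr_ab:.2f} ({lo_ab:.2f} to {hi_ab:.2f})")
```

Because the variances add, the indirect estimate is always less precise than either direct estimate, and it is valid only if the trials are similar enough for their placebo arms to act as a common comparator.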

The indirect comparison of trials was hampered in this case by a number of factors. One of these was differing placebo rates found for induction trials and unknown or uncertain placebo rates in maintenance trials (because all patients receive active intervention early in the trial). Patients with CD can experience spontaneous clinical improvement without treatment. Su et al.84 conducted a meta-analysis of CD trials looking at placebo rates for remission and response (based on the CDAI). The authors found substantial heterogeneity between placebo rates and found that these were in the main attributable to follow-up duration, number of follow-up visits and CDAI score at entry to the trial (see Appendix 13). Because of the variation in placebo rates in the induction trials, indirect comparison was not made.

Indirect comparison of effectiveness using maintenance trials was judged unlikely to deliver valid results. For responders, the placebo arms of compared trials were not truly comparable because the groups had received different anti-TNF induction drugs on differing numbers of occasions and for different periods of time; furthermore, the ‘responder’ groups consisted of different proportions of the randomised populations according to differing criteria. For results covering all patients, the placebo groups were again not truly similar between trials and, additionally, availability of results for all patients in the adalimumab trials was limited (see Appendix 11); furthermore, the permitted crossover of variable proportions of placebo group patients to active intervention at weeks 12 and 14 of the CHARM67 and ACCENT I2,3 trials would render analyses unreliable.

Adverse events

The large number of crossovers in the trials made the comparison of AE rates between treatment and placebo arms difficult, as many patients in the placebo arms will have received some study drug. In addition, the maintenance trials either gave an induction bolus of the drug at the start of the trial then randomised to treatment or placebo, or enrolled patients from a previous induction trial. Similarly, it is difficult to tell what the true rates for the development of antibodies are for each of the drugs, again due to crossovers and induction doses. It was not within the remit of this project to examine the test accuracy of antibody determination used in the trials. Increased risk of malignancies and of infection is evidenced from published information about anti-TNF agents used in all their indications. Vigilance with respect to tuberculosis and patients with potential or suspected heart failure should be mandatory.

Summary of effectiveness results

None of the included RCTs recruited only patients with severe CD; all included patients with moderate-to-severe CD.

  1. The general pattern of results is similar for the two drugs.
  2. There is a good initial, clinically significant, improvement for the majority of patients when given induction treatment with infliximab or adalimumab. The short duration induction trials demonstrated that the majority of CD patients with moderate-to-severe disease gained clinical benefit from a single i.v. infusion of infliximab (5 or 10 mg/kg) or two subcutaneous injections of adalimumab (80 mg and 40 mg or 160 mg and 80 mg) separated by 2 weeks. Published estimates for the proportion benefiting depended on the measure of clinical response employed and were associated with considerable uncertainty (e.g. depending on dose and trial, the lower to upper 95% confidence limits for remission at week 4 ranged from 13% to 66% for infliximab and from 16% to 47% for adalimumab). Obtaining a valid estimate of effectiveness for the two drugs and for their relative efficacy was hampered by the small number of trials, their small size, differences between the populations examined, uncertainty about the most appropriate induction regimen, and the imprecision of trial results.
  3. Although there exists a core of responders of indeterminate size who maintain an anti-TNF dependent response, in general the initial good response is not well maintained with extended treatment. This is evidenced in three ways:
    1. The percentage of patients in response (or with remission) fades away after the first weeks or so of maintenance therapy.
    2. Large numbers of patients drop out of treatment. In ACCENT I,2,3 34% of patients dropped out, some before dose escalation, some after; in the active treatment arms, 25% withdrew in ACCENT I2,3 and (CiC information has been removed)% in CHARM.67 Among responders in CHARM,67 about (CiC information has been removed)% withdrew from active treatment.
    3. Large numbers of patients required dose escalation and/or transfer to open label (CHARM67) because of worsening disease. In ACCENT I,2,3 68% in the 5 mg/kg arm required dose escalation, as did 49% of those in the higher dose arm. In CHARM67 about 30% in the adalimumab arms crossed over to escalated dose or open-label therapy.

These results indicate that during extended treatment an appreciable proportion of patients decide there is an unsatisfactory balance between the actual benefit of anti-TNF and its perceived benefits. The withdrawal rates in these trials are not similar to those for other monoclonal antibody interventions and contrast with the > 90% compliance over 52 weeks observed for i.v. weekly infused eculizumab.88 The high requirement for dose escalation reflects efforts to resuscitate a fading response; the continuing dropout rate after escalation shows that these efforts meet with limited success.

Conclusions

Evidence from at least one induction and one maintenance trial for each drug administered within the licensed dose regimen demonstrates that for selected patients, relative to SC, these anti-TNF agents (infliximab and adalimumab) deliver statistically significant benefits of disease remission and improvement based on CDAI binary measures. Remission, response 70 and response 100 rates measured in maintenance trials indicate that for ‘responders’ nearly all benefit is achieved in about the first 12 weeks of treatment. Thereafter risk differences (anti-TNF minus placebo) remain relatively stable. These results imply that a short burst of treatment is likely to be more clinically effective and cost-effective than prolonged treatment. After about 12 weeks, the likelihood that the intervention will be clinically effective and cost-effective will steadily diminish as treatment is extended, unless other favourable outcomes, additional to those based on CDAI measures, are delivered later than 10–12 weeks.

The recruitment of patients who may not have failed alternative treatments together with the selective reporting of outcomes for early responders in the maintenance trials means it is difficult to gauge the effectiveness of these drugs in maintaining favourable outcome among the whole patient population with moderate-to-severe CD who are resistant to other treatments. Because of inappropriate study designs, heterogeneity of patients, incomplete and/or selective reporting of outcomes and lack of head-to-head trials, no convincing objective evidence was available to indicate whether one drug was superior to another either in respect to effectiveness or to safety.

© 2011, Crown Copyright.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK100752