NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Samson DJ, Ratko TA, Rothenberg BM, et al. Comparative Effectiveness and Safety of Radiotherapy Treatments for Head and Neck Cancer [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2010 May. (Comparative Effectiveness Reviews, No. 20.)

Comparative Effectiveness and Safety of Radiotherapy Treatments for Head and Neck Cancer [Internet].

Methods

Topic Development

The topic of this report and preliminary key questions were developed through a process involving the public, the Scientific Resource Center (www.effectivehealthcare.ahrq.gov/aboutUS/contract.cfm) for the Effective Health Care program of the Agency for Healthcare Research and Quality (AHRQ), and various stakeholder groups. Additional study, patient, intervention, and eligibility criteria, as well as outcomes, were refined and agreed upon through discussions among the Blue Cross and Blue Shield Association Technology Evaluation Center Evidence-based Practice Center (BCBSA TEC EPC), the Technical Expert Panel (TEP) members, and our AHRQ Task Order Officer, and in response to comments received from the public.

Search Strategy

Electronic Databases

The following databases were searched for citations (the search strategy is provided in Appendix A). The search was not limited to English-language references; however, foreign-language references to single-arm studies were neither translated nor abstracted.

  • MEDLINE® (January 1, 1990, through September 28, 2009)
  • EMBASE® (January 1, 1990, through September 28, 2009)
  • Cochrane Controlled Trials Register (no date restriction)

Single-arm studies, which are not a main focus of this review, were selected from studies identified through the January 13, 2009 search result update. Comparative studies were identified through the latest search updates.

The TEP and individuals and organizations providing peer review were asked to inform the project team of any studies relevant to the key questions that were not included in the draft list of selected studies.

We examined the bibliographies of all retrieved articles for citations to any randomized, controlled trial or nonrandomized comparative study that was missed in the database searches. In addition, we searched abstracts for the past 5 years of meetings of the American Society of Therapeutic Radiation Oncology (ASTRO) and the American Society of Clinical Oncology (ASCO).

Search Screen

Search results were stored in a ProCite® database. The study selection process is outlined in Figure 1. Using the study selection criteria for screening titles and abstracts, a single reviewer marked each citation as (1) eligible for review as a full-text article, (2) ineligible for full-text review, or (3) uncertain. Citations marked as uncertain were reviewed by a second reviewer and resolved by consensus, with a third reviewer to be consulted if necessary. Using the final study selection criteria, full-text articles were reviewed in the same fashion to determine inclusion in the systematic review. Of 2,679 citations, 354 articles were retrieved and 108 were selected for inclusion (Figure 2). For each paper retrieved in full text but excluded from the review, the reason for exclusion was recorded in the ProCite® database (see Appendix B, Excluded Studies).
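The screening steps above can be sketched as a small decision routine, together with the flow arithmetic implied by the reported counts. This is an illustrative sketch only: the `Mark` enum and the `triage` and `flow` functions are hypothetical names, and the flow arithmetic assumes that all 354 retrieved articles came from the 2,679 screened citations.

```python
from enum import Enum
from typing import Optional


class Mark(Enum):
    """Possible title/abstract screening marks (hypothetical encoding)."""
    ELIGIBLE = "eligible for full-text review"
    INELIGIBLE = "ineligible for full-text review"
    UNCERTAIN = "uncertain"


def triage(first: Mark, consensus: Optional[Mark] = None,
           third: Optional[Mark] = None) -> Mark:
    """Resolve one citation following the tiered screening process."""
    # A definite mark from the single first reviewer stands as-is.
    if first is not Mark.UNCERTAIN:
        return first
    # Uncertain citations go to a second reviewer and are settled by consensus.
    if consensus is not None and consensus is not Mark.UNCERTAIN:
        return consensus
    # If consensus is not reached, a third reviewer is consulted.
    if third is None or third is Mark.UNCERTAIN:
        raise ValueError("citation unresolved: third reviewer decision needed")
    return third


def flow(identified: int, retrieved: int, included: int) -> dict:
    """Exclusion counts at each stage, assuming a single linear funnel."""
    return {
        "excluded_at_title_abstract": identified - retrieved,
        "excluded_at_full_text": retrieved - included,
        "included": included,
    }


print(flow(2679, 354, 108))
# {'excluded_at_title_abstract': 2325, 'excluded_at_full_text': 246, 'included': 108}
```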

Figure 1. Study selection process.

Figure 2. QUOROM flow diagram.

Study Selection

This Evidence Report takes a two-tiered approach to evidence of the comparative effectiveness and safety of four types of radiotherapy. The primary focus is on comparative studies of these techniques against each other or against 2DRT, which was commonly used before the diffusion of IMRT and 3DCRT. The secondary focus is on reviewing single-arm studies on any of the technologies of interest for potential hypothesis generation.

The diagram in Figure 1 describes how we proceeded through this comparative effectiveness review, from conducting the literature search to applying the selection criteria. The complexity of the diagram stems from two factors: first, the need to ensure that all relevant studies are included (hence the second review of excluded full-text articles, the review of bibliographies of abstracted articles, and the several search updates performed while the review was being prepared); and second, the need for complete and accurate abstraction of data from the included articles.

Further steps included data extraction and summary (see Data Extraction and Analysis, following), quality assessment (see Assessment of Study Quality, following), and finally evidence synthesis and interpretation. Assessment of the quality of the selected studies is an important part of how we conducted this review; however, interpreting the body of evidence for a particular class of interventions entailed more than quality assessment. Quality assessment informed the critical appraisal of the results and conclusions of each type of study, but quality ratings alone did not give a complete picture of the strength of the body of evidence.

Beyond quality ratings for each study, we explored the methodologic strengths and weaknesses of different study designs (randomized, controlled trials; nonrandomized comparative studies; and prospective or retrospective single-arm studies) to identify which can provide evidence on the efficacy and safety of the radiotherapy modalities and which can only generate hypotheses that require later confirmation. All of these activities contributed to interpreting the overall strength of the evidence and determining whether conclusions could be drawn with respect to the key questions.

Types of Studies

Studies were included for Key Question 1 and Key Question 2 if they were:

  • Randomized trials, nonrandomized comparative studies, or single-arm intervention studies that:
    • reported on an outcome of interest specifically among patients with head and neck cancer;
    • involved an intervention of interest, excluding noncomparative studies describing use of 2DRT (defined below) only;
    • reported results separately in individual patient groups according to radiation therapy modality received, except for proton beam therapy, where the results of photon and proton therapy may be combined; and
    • reported tumor control data compiled separately according to tumor site, or included a multivariable analysis that controlled for anatomic location and evaluated the impact of type of radiotherapy on tumor control outcomes.
  • Single-arm studies with 25 or more evaluable patients that adhered to all of the aforementioned criteria and provided descriptive information on tumor characteristics, particularly location and histology. Single-arm (noncomparative) studies of 2DRT were excluded because this radiotherapy technique is now little practiced. Studies had to use the same type of radiotherapy for the boost as for the planning treatment volume; 2DRT or electrons could be used in the lower neck.
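The single-arm inclusion rules above can be expressed as a predicate. The sketch below is a simplification under stated assumptions: the `SingleArmStudy` fields and the `eligible_single_arm` function are hypothetical, each criterion is reduced to a yes/no judgment already made by a reviewer, and the proton/photon exception is not modeled.

```python
from dataclasses import dataclass, replace

# Radiotherapy techniques of interest; 2DRT counts only within comparative studies.
TECHNIQUES = {"IMRT", "3DCRT", "PBT", "2DRT"}
MIN_EVALUABLE = 25  # minimum evaluable patients for a single-arm study


@dataclass(frozen=True)
class SingleArmStudy:
    """Hypothetical summary of one single-arm study, as judged by a reviewer."""
    technique: str                     # main radiotherapy technique
    boost_technique: str               # technique used for the boost
    lower_neck_technique: str          # technique used in the lower neck
    evaluable_patients: int
    reports_hnc_outcome: bool          # outcome of interest in head and neck cancer
    describes_site_and_histology: bool
    control_by_site_or_adjusted: bool  # tumor control by site, or site-adjusted model


def eligible_single_arm(s: SingleArmStudy) -> bool:
    # Noncomparative studies of 2DRT were excluded.
    if s.technique not in TECHNIQUES or s.technique == "2DRT":
        return False
    # The boost must use the same technique as the main treatment.
    if s.boost_technique != s.technique:
        return False
    # The lower neck may be treated with the same technique, 2DRT, or electrons.
    if s.lower_neck_technique not in {s.technique, "2DRT", "electrons"}:
        return False
    return (s.evaluable_patients >= MIN_EVALUABLE
            and s.reports_hnc_outcome
            and s.describes_site_and_histology
            and s.control_by_site_or_adjusted)


study = SingleArmStudy("IMRT", "IMRT", "electrons", 40, True, True, True)
print(eligible_single_arm(study))                                    # True
print(eligible_single_arm(replace(study, evaluable_patients=24)))    # False
print(eligible_single_arm(replace(study, boost_technique="3DCRT")))  # False
```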

The criteria allowing the use of a different type of therapy in the lower neck and the use of photons and protons combined were developed after the beginning of the project. These issues arose during the data abstraction process and were resolved with the assistance of the two members of the TEP who provided extended consultation.

Dose planning studies that did not report any outcome of interest were not included. While such studies may show apparently better dose distributions for IMRT or proton beam therapy over 3DCRT or 2DRT, this review emphasizes outcomes such as adverse events, quality of life, tumor control, and patient survival. Dose distribution is considered an intermediate outcome, which may be related to health outcomes, but by itself does not establish the comparative effectiveness of different radiotherapy techniques.

Studies were included for Key Question 3 if they met the selection criteria for Key Questions 1 and 2 and also:

  • presented treatment outcome data associated with different categories or levels of:
    • tumor characteristics,
    • tumor anatomic locations, or
    • patient characteristics (e.g., older versus younger).

Studies were included for Key Question 4 if they met the selection criteria for Key Questions 1 and 2 and also:

  • presented treatment outcome data associated with different categories or levels of:
    • user experience (years of experience with IMRT, number of patients treated with IMRT, formal training in IMRT),
    • target volume delineation (gross tumor volumes, clinical target volumes, planning target volumes, lymph node regions, organs at risk), or
    • dosimetric parameters (dose to targets, dose constraints for organs at risk).

Types of Participants

The populations of interest for all four Key Questions included patients with head and neck cancer. To define what constitutes head and neck cancer, we consulted clinical resources such as the National Cancer Institute’s Physician Data Query (PDQ) Cancer Information Summary (www.cancer.gov), the oncology textbook edited by DeVita, Hellman, and Rosenberg,8 and the National Comprehensive Cancer Network Clinical Practice Guidelines in Oncology.1 The consensus definition of head and neck cancer includes tumors of:

  • larynx;
  • pharynx (hypopharynx, oropharynx, and nasopharynx);
  • lip and oral cavity;
  • paranasal sinus and nasal cavity;
  • salivary gland; and
  • occult primary of the head and neck

The following tumors were excluded:

  • brain tumors;
  • skull base tumors;
  • uveal/choroidal melanoma, other ocular and eyelid tumors;
  • otologic tumors;
  • cutaneous tumors of the head and neck (including melanoma);
  • thyroid cancer;
  • parathyroid cancer;
  • esophageal cancer; and
  • tracheal tumors.

Tumor site was not necessarily defined as occurring in one anatomic location. For example, for purposes of data abstraction, “oral cavity” was considered as one site, although it technically involves multiple anatomic sites (e.g., buccal mucosa, the anterior two-thirds of the tongue, lips, etc.).

Treatment Setting

The original categories for therapeutic settings were refined after abstraction* to fit the mix of approaches used in the studies and to create meaningful categories for data synthesis. The final list follows:

  • Primary (definitive): radiotherapy only (no surgery, with or without chemotherapy)
  • Preoperative radiotherapy: radiotherapy before surgery (with or without chemotherapy)
  • Postoperative (adjuvant): radiotherapy after surgery (with or without chemotherapy)
  • Reirradiation: radiotherapy after earlier radiotherapy (other treatments irrelevant)

Chemotherapy regimens given in conjunction with radiotherapy could be described in the following ways:

  • Concurrent chemoradiotherapy: radiotherapy and chemotherapy at the same time (with or without surgery)
  • Post-radiotherapy (adjuvant) chemoradiotherapy: chemotherapy given after radiotherapy (with or without surgery)
  • Pre-radiotherapy (neoadjuvant) chemoradiotherapy: chemotherapy given before radiotherapy (with or without surgery)
  • Split chemoradiotherapy: chemotherapy given both before and after radiotherapy (with or without surgery)
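The setting and chemotherapy categories above amount to a simple classification of treatment sequence. The sketch below assumes a hypothetical encoding (a chronological list of event labels such as "rt", "surgery", and "chemo" for the course under study, plus a flag for earlier radiotherapy); the function names are illustrative, and giving concurrent chemotherapy precedence when chemotherapy is also given before or after radiotherapy is an assumption the categories do not settle.

```python
from typing import List, Optional


def radiotherapy_setting(events: List[str], prior_rt: bool = False) -> str:
    """Classify the radiotherapy setting.

    events: chronological labels for the course under study, each "rt",
    "surgery", or "chemo" (hypothetical encoding). prior_rt: True if the
    patient had radiotherapy before this course.
    """
    # Reirradiation takes precedence; other treatments are irrelevant.
    if prior_rt:
        return "reirradiation"
    if "rt" not in events:
        raise ValueError("no radiotherapy in the treatment sequence")
    if "surgery" not in events:
        return "primary"  # definitive radiotherapy, with or without chemotherapy
    # Compare the first radiotherapy event with the first surgery.
    if events.index("rt") < events.index("surgery"):
        return "preoperative"
    return "postoperative"


def chemo_timing(before_rt: bool, during_rt: bool, after_rt: bool) -> Optional[str]:
    """Label chemotherapy timing relative to radiotherapy (None if no chemotherapy)."""
    if during_rt:
        return "concurrent"  # assumed to take precedence over other timings
    if before_rt and after_rt:
        return "split"
    if before_rt:
        return "pre-radiotherapy (neoadjuvant)"
    if after_rt:
        return "post-radiotherapy (adjuvant)"
    return None


print(radiotherapy_setting(["surgery", "rt"]))  # postoperative
print(radiotherapy_setting(["chemo", "rt"]))    # primary
print(chemo_timing(before_rt=True, during_rt=False, after_rt=True))  # split
```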

Initial review of studies revealed a wide variety of treatment settings defined by radiotherapy techniques in relation to both surgery and chemotherapy. Studies addressing only primary radiotherapy without surgery or chemotherapy were quite rare, so we included studies that addressed a single setting other than primary radiotherapy as well as studies that addressed a group of patients receiving a mix of settings. Evidence is reviewed first among studies that addressed a single setting, then among studies that included mixed settings.

The relevant practice settings were:

  • hospitals and
  • outpatient radiotherapy facilities.

Subpopulations of interest included: age, race or ethnicity, sex, disease severity and duration, weight (body mass index), and prior treatments.

Types of Interventions

The interventions of interest were:

  • intensity-modulated radiotherapy (IMRT), defined as any treatment plan where intensity-modulated radiation beams and computerized inverse treatment planning are used;
  • three-dimensional conformal radiotherapy (3DCRT), defined as any treatment plan where CT-based treatment planning is used to delineate radiation beams and target volumes in three dimensions;
  • proton beam therapy (PBT), defined as any treatment plan where proton beam radiation is used; and
  • conventional two-dimensional radiotherapy (2DRT), defined as treatment planning where only 2D projection radiographs are used to delineate radiation beams and target volumes.

Studies were excluded when a mix of radiotherapy modalities was used, such as 2DRT plus IMRT boost or 3DCRT plus brachytherapy. Boost techniques were allowed if they were of the same modality as the main technique (e.g., IMRT with IMRT boost). Conventional 2DRT was addressed to the extent that comparative studies included groups of patients who received 2DRT; noncomparative studies of 2DRT were not sought. Data on other comparators, such as stereotactic radiosurgery or similar modalities, also were not sought.

Types of Outcomes

In general, outcomes should be standard, valid, reliable, and clinically meaningful. Primary (health) outcomes included:

  • radiation-induced toxicities;
  • adverse events, both acute and chronic normal tissue toxicity, such as
    • xerostomia,
    • dysphagia,
    • mucositis,
    • skin toxicity, and
    • osteoradionecrosis or bone toxicity;
  • effect on quality of life;
  • clinical effectiveness, including
    • local and locoregional control,
    • time to any recurrence (disease-free survival), and
    • patient (disease-specific and overall) survival.

Secondary (intermediate) outcomes included:

  • salivary flow and
  • probability of completing treatment according to protocol.

Health outcomes were given greatest emphasis. Health outcomes may be defined as those directly related to length of life, quality of life, function, symptoms, or harms. Intermediate outcomes, which may reflect physiologic processes, are important to the extent that they are related to health outcomes. The specific primary and secondary outcomes selected here were those for which more than five comparative studies provided data and for which clinical expert consensus indicated their importance.

Data Extraction and Analysis

Data Elements

The following data elements were abstracted from intervention studies, or recorded as not reported. Data elements to be abstracted were defined in consultation with the TEP. They included:

  • critical features of the study design:
    • patient inclusion/exclusion criteria
    • number of participants and flow of participants through steps of study
    • treatment allocation methods (including concealment)
    • use of blinding
  • patient characteristics, including:
    • age
    • sex
    • race/ethnicity
    • disease and stage
    • tumor histology
    • tumor size
    • disease duration
    • other prognostic characteristics (history of tobacco use, etc.)
  • treatment characteristics, including:
    • localization and staging methods
    • computerized treatment planning
    • radiation delivery source
    • regimen, schedule, dose, duration of treatment, fractionation, boosts
    • beam characteristics
    • immobilization and repositioning procedures
    • concurrent treatments and details
  • outcome assessment details:
    • identified primary outcome
    • secondary outcomes
    • response criteria
    • use of independent outcome assessor
    • follow-up frequency and duration
  • data analysis details:
    • statistical analyses (statistical test/estimation results)
      • test used
      • summary measures
      • sample variability measures
      • precision of estimate
      • p values
    • regression modeling techniques
      • model type
      • candidate predictors and methods for identifying candidates
      • univariate analysis results
      • selected predictors and methods for selecting predictors
      • testing of assumptions
      • inclusion of interaction terms
      • multivariable model results
      • discrimination or validation methods and results
      • calibration or “goodness-of-fit” results

The same abstraction tables were used for comparative and single-arm studies, although some elements did not apply to the latter (e.g., description of control group). A few studies were randomized on a treatment other than radiotherapy, e.g., type of chemotherapy. They were treated as single-arm studies for the purposes of this comparative effectiveness review.

Evidence Tables

Templates for evidence tables were created in Microsoft Excel® and Microsoft Word®. One reviewer performed primary data abstraction of all data elements into the evidence tables, and a second reviewer reviewed articles and evidence tables for accuracy. Disagreements were resolved by discussion, and if necessary, by consultation with a third reviewer. When small differences occurred in quantitative estimates of data from published figures, the values obtained by the two reviewers were averaged.
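Averaging two reviewers' readings of a published figure can be sketched as follows. The report does not define how small a difference had to be before averaging, so the `rel_tol` cut-off below is a hypothetical choice, as are the function name and the example readings; larger discrepancies are returned for discussion, mirroring the disagreement process described above.

```python
import math


def reconcile_reading(a: float, b: float, rel_tol: float = 0.05) -> float:
    """Average two reviewers' estimates read from the same published figure.

    rel_tol is a hypothetical threshold for a "small" difference, measured
    relative to the larger of the two values.
    """
    if not math.isclose(a, b, rel_tol=rel_tol):
        # Larger discrepancies go to discussion or a third reviewer instead.
        raise ValueError(f"readings {a} and {b} differ too much to average")
    return (a + b) / 2


# Hypothetical readings of a locoregional control rate (percent) from a curve
print(reconcile_reading(72.0, 74.0))  # 73.0
```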

Assessment of Study Quality

Definition of Ratings Based on Criteria

In consultation with the AHRQ Task Order Officer and TEP, the general approach to grading individual comparative studies developed by the U.S. Preventive Services Task Force9 (USPSTF) was applied to primary studies. The quality of the abstracted studies and the body of evidence was assessed by two independent reviewers. Discordant quality assessments were resolved with input from a third reviewer, if necessary.

The quality of studies was assessed on the basis of the following criteria:

  • Initial assembly of comparable groups: adequate randomization, including concealment and whether potential confounders (e.g., other concomitant care) were distributed equally among groups
  • Maintenance of comparable groups (includes attrition, crossovers, adherence, contamination)
  • Important differential loss to follow-up or overall high loss to follow-up
  • Measurements: equal, reliable, and valid (includes masking of outcome assessment)
  • Clear definition of interventions
  • All important outcomes considered
  • Analysis: adjustment for potential confounders, intention-to-treat analysis

The rating of intervention studies encompasses the three quality categories described here.

  • Good: Meets all criteria; comparable groups are assembled initially and maintained throughout the study (follow-up at least 80 percent); reliable and valid measurement instruments are used and applied equally to the groups; interventions are spelled out clearly; all important outcomes are considered; and appropriate attention is given to confounders in analysis. In addition, for randomized, controlled trials, intention-to-treat analysis is used.
  • Fair: Studies are graded “fair” if any or all of the following problems occur, without the fatal flaws noted in the “poor” category below: in general, comparable groups are assembled initially but some question remains whether some (although not major) differences occurred with follow-up; measurement instruments are acceptable (although not the best) and generally applied equally; some but not all important outcomes are considered; and some but not all potential confounders are accounted for. Intention-to-treat analysis is done for randomized, controlled trials.
  • Poor: Studies are graded “poor” if any of the following fatal flaws exists: groups assembled initially are not close to being comparable or are not maintained throughout the study; unreliable or invalid measurement instruments are used or are not applied equally among groups (including failure to mask outcome assessment); or key confounders are given little or no attention. For randomized, controlled trials, intention-to-treat analysis is lacking.
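The three rating categories can be read as a decision rule: any fatal flaw gives "poor", meeting every criterion (with at least 80 percent follow-up) gives "good", and anything in between gives "fair". The sketch below encodes that reading; the field names and the `quality_rating` function are hypothetical, and in practice each field is a reviewer judgment rather than a mechanical check.

```python
from dataclasses import dataclass


@dataclass
class Appraisal:
    """Hypothetical reviewer judgments for one intervention study."""
    randomized: bool
    groups_close_to_comparable: bool  # assembled and maintained roughly comparably
    measurements_acceptable: bool     # reliable, valid, applied equally, masked
    key_confounders_addressed: bool
    intention_to_treat: bool
    follow_up: float                  # fraction of patients followed (0 to 1)
    all_criteria_fully_met: bool      # every criterion met without reservation


def quality_rating(a: Appraisal) -> str:
    # Any fatal flaw, including no intention-to-treat analysis in an RCT, means "poor".
    fatal_flaw = (not a.groups_close_to_comparable
                  or not a.measurements_acceptable
                  or not a.key_confounders_addressed
                  or (a.randomized and not a.intention_to_treat))
    if fatal_flaw:
        return "poor"
    # "Good" requires all criteria and at least 80 percent follow-up.
    if a.all_criteria_fully_met and a.follow_up >= 0.80:
        return "good"
    return "fair"


print(quality_rating(Appraisal(True, True, True, True, True, 0.90, True)))   # good
print(quality_rating(Appraisal(True, True, True, True, True, 0.75, True)))   # fair
print(quality_rating(Appraisal(True, True, True, True, False, 0.90, True)))  # poor
```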

The quality of included nonrandomized comparative intervention studies was also assessed based on a selection of items proposed by Deeks et al.10 to inform the USPSTF approach, as follows:

  • Was sample definition and selection prospective or retrospective?
  • Were inclusion/exclusion criteria clearly described?
  • Were participants selected to be representative?
  • Was there an attempt to balance groups by design?
  • Were baseline prognostic characteristics clearly described and groups shown to be comparable?
  • Were interventions clearly specified?
  • Were participants in treatment groups recruited in the same time period?
  • Was there an attempt by investigators to allocate participants to treatment groups in an attempt to minimize bias?
  • Were concurrent/concomitant treatments clearly specified and given equally to treatment groups?
  • Were outcome measures clearly valid, reliable and equally applied to treatment groups?
  • Were outcome assessors blinded?
  • Was the length of follow-up adequate?
  • Was attrition below an overall high level (less than 20 percent)?
  • Was the difference in attrition between treatment groups below a high level (less than 15 percent)?
  • Did the analysis of outcome data incorporate a method for handling confounders such as statistical adjustment?
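The two attrition items above use fixed thresholds (overall attrition below 20 percent; difference between groups below 15 percent), which can be checked directly from enrollment and analysis counts. A minimal sketch; `attrition_check` is an illustrative name, and the group counts in the example are invented for illustration.

```python
def attrition_check(enrolled: dict, analyzed: dict) -> dict:
    """Overall and differential attrition against the review's thresholds.

    enrolled and analyzed map each treatment group to a patient count.
    """
    # Per-group attrition: fraction of enrolled patients not analyzed.
    rates = {g: 1 - analyzed[g] / enrolled[g] for g in enrolled}
    overall = 1 - sum(analyzed.values()) / sum(enrolled.values())
    differential = max(rates.values()) - min(rates.values())
    return {
        "overall": overall,
        "differential": differential,
        "overall_ok": overall < 0.20,
        "differential_ok": differential < 0.15,
    }


# Hypothetical counts: 10% attrition with IMRT, 30% with 3DCRT.
result = attrition_check({"IMRT": 100, "3DCRT": 80}, {"IMRT": 90, "3DCRT": 56})
print(result["overall_ok"], result["differential_ok"])  # True False
```

Here overall attrition (34 of 180 patients) passes, but the 20-point gap between groups does not.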

The quality of included single-arm intervention studies was assessed based on a set of study characteristics proposed by Carey and Boden11 (Table 1), as follows:

Table 1. Carey and Boden case series quality assessment tool.

  • Clearly defined question
  • Well-described study population
  • Well-described intervention
  • Use of validated outcome measures
  • Appropriate statistical analyses
  • Well-described results
  • Discussion and conclusion supported by data
  • Funding source acknowledged

The quality of included predictive studies was assessed based on an approach we applied to a systematic review of HER2 testing for breast cancer and other solid tumors.12

Table 2 shows the framework for evaluating how informative different designs and analytic strategies would be for predicting outcomes according to different categories or levels of predictive factors. The most informative scenario would be a trial in which randomized assignment to treatment groups was stratified by predictive factor level, or in which patients were randomized to receive treatment guided by the predictive factor or not.13 An adequately powered stratified randomization would allow valid inferences about treatment-by-predictive-factor interactions. Randomized trials generally are preferred because they make it possible to determine differences in the relative efficacy of two treatments, whereas single-arm studies can only assess the association between a predictive factor and outcomes after a single treatment regimen. Subgroup analyses in randomized trials should ideally assess the significance of treatment effect interactions. Prespecified subgroup analyses guard against the problems of data dredging.

Table 2. Hierarchy of study design and conduct for assessing prediction of outcome.

Post-hoc subgroup analyses may generate hypotheses but may not support strong inferences about differential effectiveness. Multivariable subgroup analyses in randomized trials may be useful if the subgroup variable introduces imbalances among variable-by-treatment combinations, particularly when only a subset of patients have tumor or serum specimens available. An alternative to multivariable subgroup analysis is cross-tabulation of results by treatment and predictive factor level. The weakness of this approach is its failure to control for imbalances in any important prognostic factors, particularly if the patients analyzed are a subset of those randomized. A formal test of interaction is preferred for any trial subgroup analysis. In single-arm (identically treated) studies, multivariable analyses may identify whether a variable is a significant independent predictor of treatment outcome while taking into account the separate influences of other predictors. The least informative situation would be a single-arm study that presents univariate comparisons of predictive factor groups.
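A formal test of interaction for two subgroups can be sketched by comparing the subgroup treatment effects on the log scale. The version below is one common approach for two independent subgroup estimates, not necessarily the method used in any reviewed trial; it assumes hazard ratios whose 95% confidence intervals are symmetric on the log scale, and the function names and example numbers are hypothetical.

```python
import math


def log_effect_and_se(ratio: float, lower: float, upper: float):
    """Log ratio and its standard error, recovered from a 95% confidence interval."""
    return math.log(ratio), (math.log(upper) - math.log(lower)) / (2 * 1.96)


def interaction_test(sub_a, sub_b):
    """z statistic and two-sided p value for the difference between two subgroups.

    Each argument is (ratio, lower, upper); the subgroups must be independent.
    """
    est_a, se_a = log_effect_and_se(*sub_a)
    est_b, se_b = log_effect_and_se(*sub_b)
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return z, p


# Hypothetical hazard ratios for one treatment comparison in two tumor-site subgroups
z, p = interaction_test((0.70, 0.50, 0.98), (0.95, 0.70, 1.29))
print(round(z, 2), round(p, 2))
```

In this invented example only the first subgroup's interval excludes 1, yet the interaction test is far from significant (|z| is about 1.3), which is why a difference in subgroup "significance" alone does not establish differential effectiveness.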

To assess the quality of predictive studies, we adapted the “Reporting Recommendations for Tumor Marker Prognostic Studies” (REMARK) statement.14 A checklist based on portions of REMARK and other sources15–22 was developed. Table 2 identifies good quality characteristics that we looked for in predictive studies, including: prospective design; prespecified hypotheses about relation of predictive factor to outcome; large, well-defined, representative study population; predictive factor measurement methods well-described; blinded assessment of predictive factor in relation to outcome; homogeneous treatment(s), either randomized or rule-based selection; low rate of missing data (15 percent or less); sufficiently long follow-up; well-described, well-conducted multivariable analysis of outcome.

Assessment of Applicability

Applicability of findings in this review was assessed within the EPICOT23 framework (Evidence, Population, Intervention, Comparison, Outcome, Timestamp). Selected studies were assessed for relevance to target populations, interventions of interest, and outcomes of interest.

Data Synthesis

Given that there are only three randomized trials involving the interventions of interest for treatment of head and neck cancer, and that these trials are clinically diverse, this evidence review did not incorporate formal data synthesis using meta-analysis. Rather, the synthesis emphasized comparative studies sorted by specific head-to-head comparisons of interventions, specific patient characteristics, specific outcomes, and status relative to the evidence hierarchy and study quality assessment. Greater consideration was given to studies that were more homogeneous in terms of treatment setting and tumor site.

Rating the Body of Evidence

The system used for rating the strength of the overall body of evidence was developed by AHRQ24 for the EPC Methods Guide, based on a system developed by the GRADE Working Group.25 This system explicitly addresses the following domains: risk of bias, consistency, directness, and precision. Grade of evidence strength is classified into the following four categories:

  • High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
  • Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
  • Low: Low confidence that the evidence reflects the true effect. Further research is likely to change our confidence in the estimate of effect and is likely to change the estimate.
  • Insufficient: Evidence is either unavailable or does not permit estimation of an effect.

If concerns arose with the body of evidence, additional domains would be addressed, such as strength of association, publication bias, coherence, dose-response relationship, and residual confounding.

Quality of Life and Symptom Measurement

Quality of life (QOL) and the impact of symptoms resulting from both the cancer itself and its therapy should be measured by instruments with established validity and reliability. Although results are frequently reported as mean change in the intervention arm compared with the control arm, this is not the preferred method of measuring outcomes. More informative is a comparison of response, that is, the proportion of patients achieving an improvement established as representing a minimum clinically important improvement.26
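The difference between reporting mean change and reporting response can be shown with a small example in which two arms have the same mean change but different proportions of responders. The minimum clinically important improvement (`mcid`) of 8 points, the scores, and the `summarize` function are all hypothetical; the sketch assumes higher scores mean better QOL unless told otherwise.

```python
def summarize(baseline, follow_up, mcid, higher_is_better=True):
    """Mean change and proportion of responders (improvement of at least mcid)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    sign = 1 if higher_is_better else -1
    # Change oriented so that a positive value is an improvement.
    changes = [sign * (f - b) for b, f in zip(baseline, follow_up)]
    mean_change = sum(changes) / len(changes)
    responders = sum(1 for c in changes if c >= mcid) / len(changes)
    return mean_change, responders


arm_a = summarize([50, 50, 50, 50], [60, 60, 60, 60], mcid=8)
arm_b = summarize([50, 50, 50, 50], [50, 50, 70, 70], mcid=8)
print(arm_a)  # (10.0, 1.0)
print(arm_b)  # (10.0, 0.5)
```

Both arms improve by 10 points on average, yet only half of the patients in the second arm reach the threshold.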

Three types of instruments may be used: generic QOL instruments, which measure overall well-being; disease-specific QOL instruments, which include items specific to the disease in question (e.g., swallowing and speaking in the case of head and neck cancer); and symptom-specific instruments, which focus on a particular symptom, such as xerostomia. Table 3 lists and briefly describes the instruments used in the articles reviewed in this comparative effectiveness report. It also indicates whether studies were found assessing their internal consistency (measured by Cronbach’s alpha), test-retest reliability, construct validity, criterion validity, and sensitivity to change. Internal consistency refers to whether responses to similar items are correlated; test-retest reliability, to how stable a person’s responses are if the instrument is readministered within a short period of time; construct validity, to the degree to which the instrument relates to the underlying concept to be measured (for example, a patient with more intense symptoms should score “worse” on a disease-specific QOL scale than a patient with less bothersome symptoms); and criterion validity, to the comparison of a scale with an existing, preferably well-validated scale.27 Using ad hoc instruments or instruments whose reliability and validity have not been thoroughly examined weakens confidence in the results. Apparent differences over time or between groups may be due to measurement issues rather than to variation in the underlying condition that the instrument is used to assess.
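Cronbach’s alpha, the internal-consistency statistic mentioned above, compares the variances of the individual items with the variance of the total score: for k items, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). A minimal sketch using only the Python standard library; the function name and the sample responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical 4-item xerostomia scale answered by 5 patients (1 = none, 5 = severe)
scores = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [4, 4, 5, 4],
    [5, 5, 4, 5],
]
print(round(cronbach_alpha(scores), 2))
```

Sample variances are used throughout; because the same estimator appears in numerator and denominator, using population variances instead would give the same alpha.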

Table 3. Summary of disease-specific quality-of-life instruments and symptom-specific instruments used in abstracted articles.

Peer Review and Public Commentary

As stated, a Technical Expert Panel (TEP) provided consultation for the comparative effectiveness review and reviewed the draft report. Two TEP members provided extended consultation, primarily for issues that needed to be addressed between the TEP meetings. The draft report was also posted to the Effective Health Care website (www.effectivehealthcare.ahrq.gov) for review by external reviewers, including invited clinical experts and stakeholders. Revisions were made to the draft report based on reviewers’ comments.

Footnotes

*

The original categories for therapeutic setting were definitive radiotherapy (primary, curative intent); postoperative (adjuvant); preoperative (neoadjuvant); chemoradiotherapy; postoperative chemoradiotherapy; metastatic; recurrent (reirradiation); and palliative.
