NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Seidenfeld J, Samson DJ, Rothenberg BM, et al. HER2 Testing to Manage Patients With Breast Cancer or Other Solid Tumors. Rockville (MD): Agency for Healthcare Research and Quality (US); 2008 Nov. (Evidence Reports/Technology Assessments, No. 172.)

  • This publication is provided for historical reference only and the information may be out of date.

HER2 Testing to Manage Patients With Breast Cancer or Other Solid Tumors.

3. Results and Conclusions

Narrative Review for Key Question 1

What is the evidence on concordance and discrepancy rates for methods (e.g., FISH, IHC, etc.) used to analyze HER2 status in breast tumor tissue?

HER2 assay results are influenced by multiple biologic, technical and performance factors. Since many aspects of HER2 assays have not been standardized until very recently, the effects of these disparate influences could not be isolated. This challenged the validity of using systematic review methods to compare available assay technologies. For that reason, we provide a narrative review of the following factors influencing HER2 test results and their use to classify patients: biologic processes, assay methods, and sources of variability.

Biologic Processes that Influence Cell Membrane Levels of HER2 Protein

Genes such as those in the epidermal growth factor (EGF) receptor family (HER1 through HER4) affect cellular function through the proteins they encode. The HER2 gene is expressed and HER2 protein is found in membranes of all breast and other epithelial cells, and cut-points between “normal” and “overexpressed” levels of HER2 protein are imprecise. Nevertheless, studies have associated increased amounts of HER2 protein in cell membranes with more aggressive behavior of breast and other epithelial cancers, and suggest that such increases may predict treatment outcomes (Slamon, Clark, Wong, et al., 1987; Esteva, Pusztai, Symmans, et al., 2000; Rowinsky, 2004; Hynes and Lane, 2005; Ettinger, 2006; Serrano-Olvera, Duenas-Gonzalez, Gallardo-Rincon, et al., 2006).

Expression of HER2 and similar genes is a sequential process that (in a simplified overview) includes the following steps: transcription of DNA to messenger RNA (mRNA); processing mRNA to mature, translatable messages; and translation of mature mRNA to synthesize the protein's amino acid sequence. For many proteins (including HER2), additional steps required to produce functional molecules include: post-translational modification (e.g., glycosylation), three-dimensional folding, assembly of multi-subunit proteins, and movement to the relevant cellular site or organelle (not necessarily in this sequence).

We will discuss each of the following biologic mechanisms that potentially may increase the amount of HER2 protein in cell membranes:

  A. Increased gene copy number (i.e., more than diploid amounts of HER2 DNA in cell nuclei), by:
     1. HER2 gene amplification, or
     2. Chromosome 17 polysomy;

  B. Elevated HER2 protein levels in cells with diploid amounts of HER2 DNA, by:
     1. Increased rate of HER2 gene expression; or
     2. Decreased degradation (increased stability) of HER2 mature message and/or protein.

Increased gene copy number

Gene amplification. In most HER2-positive cases, increased levels of HER2 protein in breast cancer cell membranes are attributable to an amplified HER2 gene (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; Slamon, Clark, Wong, et al., 1987). Gene amplification increases the copy number for a segment from one arm of a chromosome (Albertson, 2006; Myllykangas and Knuutila, 2006); amounts of the central portion (centromere) and the chromosome's other arm remain unaltered. The amplified DNA segment (amplicon) can include one or several genes. It can be organized as extrachromosomal elements, as repeated units at a single locus (which lengthens the affected chromosome arm), or repeats can be spread throughout the genome. Typically, all or most copies of the amplified gene(s) are expressed, and amounts of the excess protein increase nearly exponentially with gene copy number per cell (Szollosi, Balazs, Feurenstein, et al., 1995; Konecny, Pegram, Venkatesan, et al., 2006).

The HER2 gene has been mapped to the long arm of chromosome 17, at position 17q12 (Vanden Bempt, Drijkoningen, and De Wolf-Peeters, 2007; Jarvinen and Liu, 2006; Kauraniemi and Kallioniemi, 2006; Mano, Rosa, De Azambuja, et al., 2007). Amplicon size can vary, with from two to ten (or more) other amplified genes mapping to the region from 17q12 to 17q21. Although not relevant to assays used to classify HER2 status of patients with breast cancer, note that the gene coding for the enzyme topoisomerase II-α (TOPIIA, a target of the anthracyclines) also is located in this segment. Co-amplification of these genes may be more relevant to predict outcomes of therapy with an anthracycline regimen than amplification of the HER2 gene alone, since excess TOPIIA activity is a potential mechanism of anthracycline resistance (see “Results and Conclusions, Key Question 3”).

Chromosome 17 polysomy. HER2 gene copy number also may rise if cells have more than two copies of chromosome 17. Obviously, cells that have replicated their DNA but not yet divided have four rather than two copies of each chromosome, thus also of the HER2 gene. But some breast or other cancer cells may have extra copies of one or more whole chromosomes (termed polysomy), and may stably pass this characteristic to daughter cells. Cells with chromosome 17 polysomy have extra copies of the HER2 gene, although the ratio of HER2 copy number to centromere copy number is the same as in diploid cells unless HER2 also is amplified. However, it is uncertain whether chromosome 17 polysomy is associated with overexpression of the HER2 protein (Vanden Bempt, Drijkoningen, and De Wolf-Peeters, 2007; Beser, Tuzlali, Guzey, et al., 2007; Corzo, Bellosillo, Corominas, et al., 2007; Hyun, Lee, Kim, et al., 2008; Torrisi, Rotmensz, Bagnardi, et al., 2007; Downs-Kelly, Yoder, Stoler, et al., 2005; Ma, Lespagnard, Durbecq, et al., 2005).

Elevated HER2 protein in cells with diploid HER2 DNA. Although such cases are uncommon, clinical investigators have reported breast cancers with elevated HER2 protein levels in malignant diploid cells (i.e., cells lacking amplified HER2 genes or polysomy 17; e.g., Mass, Press, Anderson, et al., 2005; Vogel, Cobleigh, Tripathy, et al., 2002; Pauletti, Godolphin, Press, et al., 1996). This probably arises through increased expression of the HER2 gene, although decreased rates of degradation for either the mRNA or protein are at least theoretically possible. Increased expression may involve enhanced rates of transcription, message processing, translation, and/or post-translational modification (selectively for the HER2 gene). Detailed review of mechanisms that may increase rates of these processes is outside this report's scope.

It is uncertain whether tumors with increased membrane HER2 protein but diploid HER2 DNA respond differently to therapies (targeted to the HER2 protein, or to others) than do tumors with amplified HER2 DNA that increases HER2 protein. It is also unknown if the route to excess HER2 protein (i.e., whether from increased mRNA production, protein synthesis, or decreased degradation of either) affects tumor biology and aggressiveness or treatment outcomes. In vitro data suggest that increased membrane HER2 protein affects cell physiology, proliferation, and treatment responses in the same way, regardless of how the excess is produced (Pierce, Arnstein, DiMarco, et al., 1991).

Tissue Assays Routinely Used in Clinical Practice to Determine HER2 Status of Breast Tumors

In current clinical practice, assays used to classify breast cancer patients with respect to HER2 status detect either HER2 protein or HER2 DNA (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Research laboratories use assays for HER2 mRNA to study molecular mechanisms and biologic regulation. They are technically more difficult than protein and DNA assays, and measure less-stable molecules. Although real-time reverse transcription polymerase chain reaction (RT-PCR) methods recently were adapted to measure HER2 mRNA in fixed, paraffin-embedded tissues and compared with IHC and ISH assays (Capizzi, Gruppioni, Grigioni, et al., 2008), RT-PCR assays for HER2 mRNA are still uncommon in clinical management of patients with breast cancer and thus are not included in this review.

Each method used to determine HER2 status applies results of a quantitative or semiquantitative assay to assign a binary (“yes/no”) classification. Thus, test results with each assay can vary with different scoring systems and thresholds for positivity. As discussed in a following section (“Postanalytic Factors”), scoring and thresholds may depend on choice of reagents to detect, visualize, and quantitate analytes. Scoring systems and thresholds also have changed over time, with standardized approaches recommended quite recently (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Data are lacking to determine whether differences in treatment outcome as a function of HER2 status are affected by reclassifying patients with currently recommended scoring systems and thresholds.

Methods to detect/measure amount of HER2 protein. Immunohistochemistry (IHC) is the assay used most widely for classifying HER2 status of breast cancer patients, since it uses techniques and equipment long used by most clinical pathology laboratories for other proteins such as estrogen and progesterone receptors (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). The assay incubates thin slices of fixed tissue on a microscope slide with an antibody to HER2, washes off unbound antibody, then visualizes bound antibody. Because IHC preserves tissue architecture and cellular structure (morphology), it permits scoring to focus on antibody specifically bound to membranes of invasive breast cancer cells. IHC also permits permanent storage of stained slides if later re-evaluation is needed.

IHC scoring systems consider the proportion of antibody-stained invasive cancer cells and the intensity of staining, a partly subjective judgment. Besides the U.S. Food and Drug Administration (FDA)-approved IHC kits (HercepTest™ and PATHWAY™; see Table 2, Introduction), various antibodies to HER2 protein are commercially available as analyte-specific reagents that can be used for independently developed (so-called “home-brew”) assays (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003; Hicks and Kulkarni, 2008). Some are polyclonal, with a mix of antibody molecules that may recognize different binding sites (epitopes) on the HER2 protein. Others are monoclonal, homogeneous molecules that recognize a single epitope. These differences may lead to discrepant results with different antibody reagents (Press, Hung, Godolphin, et al., 1994). Other sources of variability in IHC results are discussed in the following section, “Sources of Variability in Classifying HER2 Status.”

Protein assays on homogenized tissue may use antibody to visualize HER2 after separating proteins in a solid matrix (Western blots), or quantitate HER2 by enzyme-linked immunosorbent assay (ELISA). These assays destroy the analyzed tissue samples. Additionally, tissue extracts may mix proteins from cytosol, membranes, and other organelles; and also from multiple cell types: normal breast, inflammatory cells, in situ tumor, and invasive cancer. HER2 levels of in situ breast tumor cells often are elevated, for uncertain reasons and with inadequately studied clinical implications (Allred, Clark, Tandon, et al., 1992; Hoque, Sneige, Sahin, et al., 2002; Collins and Schnitt, 2005). Guidelines stress avoiding areas of ductal carcinoma in situ when scoring assay results (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Nearly all clinical studies on HER2 protein assays to predict treatment outcomes used IHC on tissue slices rather than assays on tissue homogenates, and assigned HER2 status by amount of HER2 protein in membranes of invasive breast cancer cells.

Methods to detect/measure HER2 gene copy number or amount of HER2 DNA. In situ hybridization (ISH) is the most commonly used method to measure HER2 gene copy number in tissue samples from breast cancer patients (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; Ross, Fletcher, Linette, et al., 2003; Hicks and Kulkarni, 2008). It uses a labeled probe complementary to the DNA sequence of interest (here, a unique segment from the HER2 gene). Double-stranded DNA in cell nuclei of the fixed tissue sample is denatured so the probe can hybridize (bind) to its complementary sequence, then unbound probe is washed away. As with IHC, tissue preparation for ISH preserves tissue and cell morphology, and scoring focuses on invasive breast cancer cells.

The gene-specific probes are visualized in one of three ways: by fluorescence (FISH), a chromogenic reaction (CISH; uses digoxigenin), or silver deposition (SISH; uses dinitrophenol for enzymatic metallography). FISH requires a fluorescence microscope (more expensive and unavailable in some smaller pathology laboratories), while CISH and SISH use routine light (brightfield) microscopy. Three FDA-approved kits are available for HER2 testing by FISH (PathVysion®, Inform™, and HER2 FISH pharmDx™), while kits for CISH (SPoT-Light) and SISH (EnzMet GenePro™) are not yet approved (see Table 2, Introduction). Slides prepared for FISH testing lose fluorescence and thus cannot be stored for later review. In contrast, slides prepared for either CISH or SISH can be archived and re-evaluated. Additionally, it is sometimes difficult to identify invasive tumor cells with fluorescence microscopy. All three ISH methods require more time per sample than IHC for slide scoring. Because they were developed recently, fewer clinical studies used CISH or SISH than FISH to classify HER2 status of breast cancer patients.

In ISH assays, pathologists count fluorescent (FISH) or dark-colored (CISH, SISH) spots visible above the nucleus to measure HER2 gene copy number: two in diploid cells; more in cells with amplified HER2 or polysomy 17. Typically, one determines gene copy number for multiple invasive cancer cells on the slide, and averages results for the tissue sample. In some ISH assays, slides are hybridized simultaneously with two probes that fluoresce in, or show, different colors, to permit copy number measurement for both the HER2 gene and the chromosome 17 centromere (CEP17). With this approach, HER2 gene status is defined by the ratio of HER2 to CEP17 copy numbers: greater than 2 if amplified, but approximately 1 if unamplified, whether chromosome 17 polysomy is absent or present.
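The averaging and ratio calculation described above can be sketched in a few lines of Python. This is an illustration only, not a validated scoring tool: the function name `her2_cep17_ratio` and all per-cell signal counts are hypothetical, and the sketch assumes the ratio is taken as the mean HER2 signal count per cell divided by the mean CEP17 signal count per cell.

```python
from statistics import mean


def her2_cep17_ratio(her2_counts, cep17_counts):
    """Return (mean HER2 copies per cell, mean CEP17 copies per cell, ratio).

    Each list holds the signal count for one scored invasive cell;
    the two lists must describe the same cells in the same order.
    """
    if not her2_counts or len(her2_counts) != len(cep17_counts):
        raise ValueError("need one HER2 and one CEP17 count per scored cell")
    mean_her2 = mean(her2_counts)
    mean_cep17 = mean(cep17_counts)
    if mean_cep17 == 0:
        raise ValueError("no CEP17 signals were counted")
    return mean_her2, mean_cep17, mean_her2 / mean_cep17


# Hypothetical counts for four cells per sample (illustration only).
diploid = her2_cep17_ratio([2, 2, 2, 2], [2, 2, 2, 2])
polysomic = her2_cep17_ratio([4, 4, 4, 4], [4, 4, 4, 4])
amplified = her2_cep17_ratio([10, 12, 8, 10], [2, 2, 2, 2])
```

In this made-up example the polysomic sample has a raised mean HER2 copy number (4 per cell) but the same ratio as the diploid sample, whereas only the amplified sample has an elevated ratio. That is why single-probe and dual-probe assays can classify the same tumor differently.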

Early research studies extracted DNA from tissue homogenates and measured amounts of the HER2 gene by Southern or slot blots, or by quantitative polymerase chain reaction (PCR) assays. Southern blots first separate DNA molecules by their mobility in a matrix, while slot blots use the mixed extract. Each selectively visualizes the DNA sequence of interest by hybridizing to labeled probes as in ISH. PCR assays amplify (selectively replicate) DNA sequences of interest in vitro, detect them by fluorescent or other probes, and quantify the starting amount using standard curves. As with protein assays on tissue homogenates, these techniques dilute DNA from invasive cancer cells with DNA from surrounding normal tissues and inflammatory cells (Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). They also consume the samples they analyze. Southern and slot blots are less sensitive than PCR and require substantially larger amounts of DNA. Southern blot assays also are labor intensive and less widely available in clinical pathology labs. The remainder of this review focuses on IHC and ISH methods, the only HER2 assays with FDA-approved kits available for clinical use.

Sources of Variability in Classifying HER2 Status

Accurately determining HER2 status depends on proper performance of preanalytic, analytic, and postanalytic steps (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). Preanalytic steps are those involved in obtaining, preserving (fixing), and storing tissue samples prior to staining and analysis. Analytic steps prepare and stain fixed tissue samples with antibody to HER2 for IHC, or prepare and hybridize them to HER2 gene probe for ISH, then visualize tissue-bound antibody or probe. Postanalytic steps score test results, classify patients, and assure test quality, consistency, and reproducibility. Some processes for these steps are the same for IHC or ISH, but many differ.

Preanalytic: tissue processing and storage. HER2 tests can use tissue from core (incisional) biopsy or tumor excised for biopsy, lumpectomy, or mastectomy (Wolff, Hammond, Schwartz, et al., 2007a). Tissue sources can be the primary tumor or a lymph node or distant metastasis (Carlson, Moench, Hammond, et al., 2006). Although such discordances are uncommon, studies have reported differences in HER2 status between primary tumor and metastases (for references, see Carlson, Moench, Hammond, et al., 2006). Retesting HER2 status if metastases develop after a long disease-free or progression-free interval may be warranted, depending on where and how HER2 status of the primary tumor was determined.

Tissues are prepared and preserved for assays by slicing larger samples, fixing in a denaturing solution, and embedding fixed tissue for long-term storage (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Factors that may influence test results include: edge, retraction, or crush artifacts with some core needle biopsies; time from excision to slicing, and to fixation; type of and time in fixative; choice of embedding material; and conditions and duration of storage for fixed and embedded tissues.

Guidelines seeking to standardize methods were not published until recently (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007), although prior consensus conferences (cited in the guidelines) recommended many of the same methods. Importantly, the recommended preanalytic steps are identical for tissues to be tested by IHC or ISH; these are summarized in a following section (see Table 5 in “Current Guideline Recommendations”). Systematic reviews conducted for the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) and National Comprehensive Cancer Network (NCCN) guidelines (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006) reported data were lacking to evaluate effects of nonadherence on test results for some aspects of tissue processing. The published guidelines did not include evidence tables summarizing effects of nonadherence on test results for those aspects of tissue processing that have been evaluated comparatively.

Table 5. Summary of ASCO/CAP guideline recommendations (Reprinted with permission from the American Society of Clinical Oncology; Wolff, Hammond, Schwartz, et al., 2007a).

Notably, the literature review for this report showed that most studies reporting concordance and discordance rates of different IHC and ISH assays used archived samples, fixed and embedded elsewhere than the laboratory performing the HER2 assays. With exceptions, most publications did not report adequately on adherence to guideline or prior (consensus) recommendations for tissue processing.

Analytic: performing HER2 assays. Analytic steps for processing thin sections of fixed and embedded tissue cut onto glass slides differ for IHC and ISH assays (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Each begins by deparaffinizing thin tissue sections, but IHC assays use an antigen retrieval step that optimizes antibody binding to HER2 protein while ISH assays first unwind (denature) cells' double-stranded DNA so that the probe can hybridize to its complementary sequence. The temperature and duration of heating used to bake tissue sections on slides, as well as the conditions used for antigen retrieval, can introduce variability in IHC results. Each assay incubates slides with an analytic reagent (antibody for IHC; probe for ISH), removes unbound reagent in one or more washing steps, and incubates with other reactants to visualize bound analytic reagent. Some steps can be automated, which improves consistency and reproducibility if equipment is well-maintained and regularly calibrated. In addition to reagent choice (which antibody, for IHC; which DNA probe, for ISH), varying the conditions (temperatures, durations, etc.), solutions, and reactants used for each step can affect test results, as can poorly maintained or calibrated automated equipment.

While FDA-approved kits include protocols with optimized methods for each analytic step, guideline publications report that approximately half of surveyed laboratories did not adhere completely to protocol methods (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006). The guidelines stress the need to train and periodically assess the skills of staff conducting these assays, and that each run should include standardized positive and negative controls. They also emphasize that each laboratory offering HER2 testing services should validate its test results against a previously validated test, and that laboratories departing from protocol-specified methods with FDA-approved kits, and those using independently developed assays with analyte-specific reagents, should validate test results against established methods and develop their own standard protocols.

As with preanalytic steps, most published studies did not adequately report information needed to evaluate complete adherence with guideline or prior (consensus) recommendations on all analytic steps. Studies that used FDA-approved kits rarely commented on protocol adherence in the methods sections of their reports, and studies that used independently developed assays rarely described assay validation against approved kits.

Postanalytic factors. IHC scoring systems and positivity thresholds have changed over time, and these changes likely alter the proportion of patients classified as HER2 positive (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). Some studies on archived tissues classified tumors as HER2 positive if any invasive cells showed strong, complete membrane staining (e.g., Paik, Bryant, Park, et al., 1998; Houston, Plunkett, Barnes, et al., 1999; Paik, Bryant, Tan-Chiu, et al., 2000). Others classified samples as HER2 positive if 1 percent or more of invasive cells were stained (e.g., MacGrogan, Mauriac, Durand, et al., 1996; Elledge, Green, Ciocca, et al., 1998; Di Leo, Larsimont, Gancberg, et al., 2001); yet others, only if 50 percent or more were stained (e.g., Agrup, Stal, Olsen, et al., 2000; Berry, Muss, Thor, et al., 2000; Colozza, Sidoni, Mosconi, et al., 2005). Few studies adopted (or adapted) Allred's system (Harvey, Clark, Osborne, et al., 1999; developed for IHC assays of estrogen receptors), which rates the proportion of stained invasive cells (from 0 to 5) and the intensity of staining (from 0 to 3), then adds the two ratings for a final score between 0 and 8.
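The additive structure of Allred's system can be shown with a minimal Python sketch. The function name `allred_total` is hypothetical. The sketch takes the two sub-scores as inputs, because the criteria for assigning each proportion or intensity category are not given in this report and are not reproduced here.

```python
def allred_total(proportion_score, intensity_score):
    """Add a proportion score (0 to 5) and an intensity score (0 to 3).

    Returns a total between 0 and 8. How a pathologist assigns each
    sub-score is outside this sketch.
    """
    if proportion_score not in range(0, 6):
        raise ValueError("proportion score must be an integer from 0 to 5")
    if intensity_score not in range(0, 4):
        raise ValueError("intensity score must be an integer from 0 to 3")
    return proportion_score + intensity_score


# Hypothetical sub-scores for one slide (illustration only).
example_total = allred_total(proportion_score=3, intensity_score=2)
```

Because two different pairs of sub-scores can give the same total (for example, 5 + 0 and 3 + 2), a threshold on the total alone does not distinguish widespread faint staining from focal strong staining.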

The scale recommended in FDA-approved IHC kits (0 to 3+; developed for HercepTest™ but also used with PATHWAY™) requires membrane staining in 10 percent or more of invasive cells for scores greater than 0. The scale assigns positive scores by staining intensity and completeness of membrane staining: 1+ is faint or barely perceptible staining that is incompletely circumferential; 2+ is moderate intensity but complete circumferential staining; and 3+ is strong intensity and complete circumferential staining (www.dakousa.com/prod_downloadpackageinsert.pdf?objectid_105073003). However, some studies that used this scale defined HER2-positive cases as those scored 2+ or 3+, while others classified only those with a score of 3+ as HER2 positive. The ASCO/CAP guideline retains the original definitions for scores of 0 to 2+, but recommends scoring IHC 3+ only if more than 30 percent of invasive breast cancer cells show dark, homogeneous, circumferential membrane staining in a “chicken wire” pattern (Wolff, Hammond, Schwartz, et al., 2007a). Adequate data are lacking to compare accuracy or concordance for this wide variety of scoring systems and thresholds used to classify patients' HER2 status by IHC alone. However, in one recent study (Hameed, Chhieng, and Adams, 2007), three pathologists blinded to FISH results scored IHC-stained slides from 98 breast cancer cases separately using cut-offs of 10 percent, 30 percent, and 50 percent of stained cells to classify samples as HER2 positive. Specificity of IHC versus FISH was 82 percent, 86 percent, and 87 percent, respectively, for the three increasing cut-offs, while concordance rates of 3+ cases with FISH were 59 percent, 64 percent, and 65 percent.
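For readers less familiar with how specificity and concordance are derived, the following Python sketch computes them from a two-by-two table of IHC calls against FISH calls. The function name `agreement_metrics` and all counts are hypothetical; they are not data from Hameed, Chhieng, and Adams (2007) or from any other study. The sketch treats FISH as the comparator, which is an assumption, since no gold standard exists. The definitions used are the conventional ones and may differ from those a given study applied.

```python
def agreement_metrics(both_pos, ihc_pos_fish_neg, ihc_neg_fish_pos, both_neg):
    """Compare binary IHC calls with binary FISH calls (FISH as comparator)."""
    total = both_pos + ihc_pos_fish_neg + ihc_neg_fish_pos + both_neg
    fish_pos = both_pos + ihc_neg_fish_pos
    fish_neg = ihc_pos_fish_neg + both_neg
    if fish_pos == 0 or fish_neg == 0:
        raise ValueError("need at least one FISH-positive and one FISH-negative case")
    return {
        # Share of FISH-positive cases that IHC also called positive.
        "sensitivity": both_pos / fish_pos,
        # Share of FISH-negative cases that IHC also called negative.
        "specificity": both_neg / fish_neg,
        # Share of all cases on which the two assays agreed.
        "overall_concordance": (both_pos + both_neg) / total,
    }


# Hypothetical counts for 100 cases (illustration only).
result = agreement_metrics(
    both_pos=20, ihc_pos_fish_neg=10, ihc_neg_fish_pos=5, both_neg=65
)
```

Raising the IHC positivity cut-off moves cases from the IHC-positive row to the IHC-negative row. This tends to raise specificity and lower sensitivity, so comparisons across studies are meaningful only when the cut-off is reported.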

Scoring and categorizing results of ISH assays also varies (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). Guidelines stress that precision and accuracy depend on the number of cells counted and averaged, on accurately identifying and only counting invasive cells, and on counting invasive cells from two or more separate areas of each tumor on either the same or sequential slide(s) (Wolff, Hammond, Schwartz, et al., 2007a). With assays estimating gene copy number per cell without normalizing to a CEP17 probe, most published studies using FISH classified tissues averaging more than 4.0 copies per cell as HER2 positive (for references, see Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Laudadio, Quigley, Tubbs, et al., 2007). Most published studies using CISH scored samples HER2 positive if the average gene copy number per cell was greater than 5, although some followed the manufacturer's recommendation and defined low-level amplification as copy numbers between 6 and 10. In contrast to published studies with FISH or CISH, recent guidelines consider average scores greater than 6.0 as FISH positive, scores less than 4.0 as FISH negative, and scores between 4.0 and 6.0 as equivocal (ASCO/CAP) or borderline (NCCN). Most studies that normalized to CEP17 classified HER2 to CEP17 ratios greater than 2.0 as HER2 positive (for references, see Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Laudadio, Quigley, Tubbs, et al., 2007). The guidelines consider a HER2/CEP17 ratio greater than 2.2 as positive, a ratio less than 1.8 as negative, and ratios between 1.8 and 2.2 as equivocal (ASCO/CAP) or borderline (NCCN). As with IHC scoring and thresholds, data are lacking to evaluate consequences of the newer classification criteria on accuracy or concordance.
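The thresholds quoted above can be written out as a short Python sketch to show how the same measurement may be classified differently under older study conventions and under the guideline criteria. The function names are hypothetical, and the sketch assumes that values falling exactly on a boundary (1.8, 2.2, 4.0, or 6.0) belong to the equivocal category, since the text describes that category as lying "between" the two limits. It is an illustration, not a clinical decision tool.

```python
def classify_ratio_guideline(ratio):
    """Guideline categories for a HER2/CEP17 ratio, as quoted in the text."""
    if ratio > 2.2:
        return "positive"
    if ratio < 1.8:
        return "negative"
    return "equivocal"  # termed "borderline" in the NCCN guideline


def classify_copy_number_guideline(mean_copies_per_cell):
    """Guideline categories for mean HER2 copies per cell (no CEP17 probe)."""
    if mean_copies_per_cell > 6.0:
        return "positive"
    if mean_copies_per_cell < 4.0:
        return "negative"
    return "equivocal"  # termed "borderline" in the NCCN guideline


def classify_ratio_older_studies(ratio):
    """Binary rule used by most earlier studies that normalized to CEP17."""
    return "positive" if ratio > 2.0 else "negative"


# A hypothetical ratio of 2.1 is positive under the older rule
# but equivocal under the guideline criteria.
older_call = classify_ratio_older_studies(2.1)
guideline_call = classify_ratio_guideline(2.1)
```

Cases that change category in this way are the ones for which the consequences of the newer criteria on accuracy or concordance remain unevaluated.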

Guidelines and reviews caution that assigning HER2 status is partially subjective and potentially inconsistent because IHC and FISH scoring criteria are variably interpreted and applied by different raters (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). Expert panels and reviewers emphasize that image analysis methods, using digital microscopy and automated cellular imaging systems (e.g., Bloom and Harrington, 2004; McCabe, Dolled-Filhart, Camp, et al., 2005; Tubbs, Pettay, Swain, et al., 2006; Ciampa, Xu, Ayata, et al., 2006; Tawfik, Kimler, Davis, et al., 2006; Moeder, Giltnane, Harigopal, et al., 2007), can decrease inter-rater variability and thus improve scoring consistency, accuracy, and precision, particularly for IHC assays. However, this requires careful validation and periodic recalibration of automated systems against standardized positive, negative, and equivocal control samples. Nevertheless, a study testing agreement between pathologists reported that use of digital microscopy to score IHC improved concordance with FISH and also decreased inter-rater variability (Bloom and Harrington, 2004).

Postanalytic steps also include reporting elements that should be provided to clinicians ordering HER2 testing, as well as quality assurance procedures (laboratory accreditation and proficiency testing; competency assessment for pathologists). However, these issues are outside the scope of this report. Readers are referred to recommendations in current guidelines (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007).

Is There a “Best” Method to Determine HER2 Status from Breast Tumor Tissue?

Although many studies reported concordance and discrepancy rates for collections of breast tumor tissue tested for HER2 status by IHC with different antibodies, or by IHC and ISH assays, or by multiple ISH assays, current evidence does not suggest one HER2 assay is superior to all others (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). As described previously, preanalytic, analytic and postanalytic methods varied between studies, and all studies preceded guidelines for standardizing these methods. Additionally, data are lacking to fully evaluate effects of nonadherence with certain guideline recommendations on test results. Thus, it is difficult (perhaps impossible) to isolate effects of individual factors that contribute to discordance. As detailed above, these include differences in:

  • Fixing and embedding tissues, preparing and staining them for assays, or scoring and classifying test results;
  • Antibody binding, epitope stability, or antigen retrieval, when comparing different antibodies used for IHC;
  • The biologic mechanisms that can increase membrane HER2 protein, when comparing IHC assays versus ISH assays; or the sensitivity and specificity of diverse DNA probes and visualization techniques, when comparing different ISH methods.

Identifying one “best” HER2 test clearly requires better comparative data than are presently available, from studies that standardize key aspects of the preanalytic, analytic, and postanalytic steps in HER2 assay methods.

The lack of a gold standard to determine breast tumors' HER2 status also prevents agreement on one “best” HER2 assay. Furthermore, seeking a single gold standard may be unrealistic, since HER2 status is used in different ways. The optimal assay (or combination of assays) may differ for HER2 as a prognostic marker, as a marker to predict clinical benefit from trastuzumab, or as a marker to predict benefit from a chemotherapy drug class (e.g., an anthracycline or a taxane). For example, HER2 gene amplification may best predict tumor aggressiveness, and hence prognosis, while membrane density of HER2 protein may best predict trastuzumab binding to tumor cells and thus clinical response. Moreover, HER2 may be only a surrogate marker for other molecular alterations that more directly impact tumor cell sensitivity to certain chemotherapy drugs (e.g., anthracyclines).

Outcomes of well-designed and adequately powered comparative clinical trials with sufficient followup duration may be a gold standard to evaluate HER2 assays as predictors of treatment benefit. However, even the large randomized, controlled trials on adjuvant trastuzumab (Romond, Perez, Bryant, et al., 2005; Piccart-Gebhart, Procter, Leyland-Jones, et al., 2005; Slamon, Eiermann, Robert, et al., 2005; Joensuu, Kellokumpu-Lehtinen, Bono, et al., 2006) may not have adequately standardized preanalytic steps at local hospitals, did not test all patients with at least two assays, treated few patients with discordant results by different assays conducted in central laboratories, and presently lack sufficient followup to compare outcomes in subgroups of the main treatment arms (see “Results and Conclusions, Key Question 2”).

Current guidelines acknowledge present uncertainty, permit clinicians and laboratories to choose an initial HER2 assay method, and recommend confirming results with an alternative assay when initial tests are equivocal (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007).

Current Guideline Recommendations

Current guidelines recommend very similar algorithms for using well-validated IHC and ISH assays to classify breast cancer patients with respect to HER2 status (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). The algorithm shown in Figure 2 describes possible results, decision-making, and confirmatory testing when IHC is the initial test. All three guidelines agree that an IHC score of 3+ is definitively HER2 positive, a score of 0 or 1+ is definitively HER2 negative, and a score of 2+ is equivocal and requires ISH followup testing to determine HER2 status. In contrast to the other guidelines, the NCCN Task Force (Carlson, Moench, Hammond, et al., 2006) did not specify that an IHC 3+ score requires complete membrane staining in more than 30 percent of invasive cells. The ASCO/CAP expert panel recommended this change from FDA labeling (which requires staining in more than 10 percent of invasive cells), primarily to decrease the number of patients with false-positive results who might be given trastuzumab but are unlikely to benefit (Wolff, Hammond, Schwartz, et al., 2007a). This recommendation anticipates that true positives with equivocal IHC results will be correctly classified by followup ISH. However, data are currently lacking to test this hypothesis.
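The IHC branch of this algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a clinical tool: the function names, the string labels, and the dictionary of thresholds are introduced here for illustration, and the sketch takes the pathologist's score as given.

```python
# Percent of invasive cells with strong, complete membrane staining that must be
# EXCEEDED for a 3+ score, as described in the text. The dictionary keys are
# illustrative labels, not official names.
THRESHOLD_PERCENT = {"ASCO/CAP": 30.0, "FDA labeling": 10.0}


def classify_ihc(score: str) -> str:
    """Map an IHC score to a HER2 category (hypothetical helper)."""
    if score == "3+":
        return "positive"
    if score in ("0", "1+"):
        return "negative"
    if score == "2+":
        return "equivocal"  # requires ISH followup testing
    raise ValueError(f"unrecognized IHC score: {score!r}")


def strong_staining_qualifies_as_3plus(percent_cells: float, criterion: str) -> bool:
    """Check the staining-extent requirement for 3+ under a given criterion."""
    return percent_cells > THRESHOLD_PERCENT[criterion]
```

Under these assumptions, a tumor with strong, complete staining in 20 percent of invasive cells would meet the staining-extent requirement for 3+ under FDA labeling but not under the ASCO/CAP definition.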

Figure 2. Algorithm for immunohistochemistry (IHC). (Reprinted with permission from the American Society of Clinical Oncology; Wolff, Hammond, Schwartz, et al., 2007b).

Figure 3 provides a similar algorithm if FISH is the initial test. The guidelines suggest that well-validated alternatives (CISH or SISH, currently available in the U.S. only as independently developed assays) probably can replace FISH. The algorithm considers HER2 gene copy numbers from 4.0 to 6.0 or HER2/CEP17 ratios between 1.8 and 2.2 as equivocal ISH results. It recommends additional cell counting, retesting by a reference laboratory, or followup testing by IHC before classifying equivocal cases. The other guidelines agree with this recommendation (Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). No studies reviewed for this report followed this recommendation; thus, data are lacking to determine whether confirmatory followup testing on patients with equivocal ISH results improves the accuracy of HER2 status as a predictor for treatment outcomes.
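The ISH thresholds can be expressed as a short Python sketch. It assumes the equivocal ranges are inclusive at both ends and that values above or below them are classified positive or negative, respectively, consistent with the ASCO/CAP definitions; the function names and labels are hypothetical.

```python
def classify_ish_ratio(her2_cep17_ratio: float) -> str:
    """Classify a dual-probe ISH result by HER2/CEP17 ratio (hypothetical helper)."""
    if her2_cep17_ratio > 2.2:
        return "positive"
    if her2_cep17_ratio < 1.8:
        return "negative"
    return "equivocal"  # 1.8 to 2.2: count more cells, retest, or follow up with IHC


def classify_ish_copy_number(her2_copies_per_cell: float) -> str:
    """Classify an ISH result by average HER2 gene copy number per cell."""
    if her2_copies_per_cell > 6.0:
        return "positive"
    if her2_copies_per_cell < 4.0:
        return "negative"
    return "equivocal"  # 4.0 to 6.0
```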

Figure 3. HER2 testing algorithm when ISH is the initial test (Reprinted with permission from the American Society of Clinical Oncology; Wolff, Hammond, Schwartz, et al., 2007a).

Importantly, the guidelines' treatment recommendations are not identical for all patients whose assay results remain in the equivocal range after additional cells are counted, a different assay method is used, and/or testing is repeated on another tumor section. The recommendation depends on whether the patient would have been included in or excluded from key randomized, controlled trials. For example, patients with HER2/CEP17 ratios 2.0 or greater but less than 2.2 were included and randomized in the adjuvant trastuzumab trials. Therefore, the guidelines view current evidence as too weak to deny such patients adjuvant therapy that includes trastuzumab. In contrast, patients with HER2/CEP17 ratios 1.8 or greater but less than 2.0 were excluded from these trials, and the guidelines view current evidence as too weak to support including trastuzumab in their adjuvant therapy regimens. Figures 2 and 3 include information on trial eligibility of patients whose test results are equivocal by each HER2 assay.
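The two ratio bands described above can be made explicit in a brief Python sketch. The function name and return labels are assumptions made for illustration; ratios outside the two bands named in the text return `None` because the paragraph does not address them.

```python
from typing import Optional


def equivocal_ratio_trial_status(her2_cep17_ratio: float) -> Optional[str]:
    """Report whether a persistently equivocal HER2/CEP17 ratio fell inside or
    outside eligibility for the adjuvant trastuzumab trials (hypothetical helper)."""
    if 2.0 <= her2_cep17_ratio < 2.2:
        # Evidence viewed as too weak to deny trastuzumab-containing adjuvant therapy.
        return "included"
    if 1.8 <= her2_cep17_ratio < 2.0:
        # Evidence viewed as too weak to support adding trastuzumab.
        return "excluded"
    return None  # not covered by the two bands discussed in the text
```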

Interestingly, a recent study reported on 17 patients with breast core biopsy specimens showing invasive carcinoma and equivocal FISH results (HER2/CEP17 ratios between 1.8 and 2.2) (Striebel, Bhargava, Horbinski, et al., 2008). These patients were subsequently re-evaluated by IHC and FISH testing on resection specimens. For 10 of the 17 cases, equivocal results obtained with biopsy specimens were definitively resolved by retesting of resection specimens. Four patients were classified HER2 positive and treated with trastuzumab, while six were classified HER2 negative and managed without trastuzumab.

Other recommendations in the ASCO/CAP guideline focus on good laboratory practices for each preanalytic, analytic, and postanalytic step of IHC and ISH assays (Wolff, Hammond, Schwartz, et al., 2007a). They provide a more explicitly detailed set of recommendations than included in the other two guidelines (Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Table 5 reprints the summary of recommendations from the ASCO/CAP guideline. The remainder of this narrative review for Key Question 1 summarizes evidence published after these guidelines on the following four topics, and discusses unresolved issues and uncertainties:

  • Concordance and discordance of different assay methods
  • Discordance between central and local laboratory results
  • Validation and proficiency testing
  • Reports on polysomy 17

Evidence Reported Post-ASCO/CAP Guidelines on Concordance and Discrepancy of HER2 Assay Results

Concordance/discordance of different assay methods. Evidence reviewed by the ASCO/CAP expert panel (Appendixes C and G in Wolff, Hammond, Schwartz, et al., 2007a) led to consensus definitions for unequivocal IHC and ISH results. As shown in Figures 2 and 3, and in Table 5, the panel defined unequivocal HER2-positive results by IHC (i.e., 3+) as greater than 30 percent of invasive cells strongly stained in a homogeneous, circumferential “chicken-wire” pattern, and by ISH as HER2 gene copy number per cell greater than 6 or HER2/CEP17 ratio greater than 2.2. They defined unequivocal HER2-negative results by IHC as scores of 0 or 1+, and by ISH as HER2 gene copy number per cell less than 4.0 or HER2/CEP17 ratio less than 1.8. Equivocal results (defined as 2+ by IHC, HER2 gene copy number from 4.0 to 6.0, or HER2/CEP17 ratio from 1.8 to 2.2) probably imply low-level HER2 amplification and/or overexpression, and should not be considered discordant, whether results of followup testing are positive or negative. Some but not all of these samples may actually have an amplified HER2 gene, but require additional testing to define the patient's correct HER2 status. The ASCO/CAP expert panel found insufficient evidence to determine whether breast cancer patients with equivocal HER2 results benefit from HER2-targeted therapy, although as discussed above, some patients included in adjuvant trastuzumab trials fit this category (also see “Results and Conclusions, Key Question 2”).

For purposes of this review, discordant results are operationally defined as unequivocally positive results by one assay method and unequivocally negative results by a different assay method on sections from the same tumor, with both assays conducted using good laboratory practices, as recommended in the ASCO/CAP guideline (Wolff, Hammond, Schwartz, et al., 2007a). Presently, evidence is lacking to estimate discordance rates from studies that followed all ASCO/CAP recommendations on tissue preparation, testing practices, scoring systems, and thresholds to classify HER2 status of breast cancer patients. Therefore, in the following sections, we summarize evidence on discordance rates reported after the guideline was published by studies that used scoring systems and thresholds similar to those originally specified in U.S. Food and Drug Administration (FDA)-approved kits for IHC and ISH assays.
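This operational definition can be illustrated with a small Python sketch. The function name and category labels are hypothetical; the sketch assumes each assay result has already been classified as positive, negative, or equivocal, and that both assays were run on sections from the same tumor.

```python
VALID_RESULTS = {"positive", "negative", "equivocal"}


def compare_assay_results(result_a: str, result_b: str) -> str:
    """Compare classified results from two different assay methods (hypothetical helper)."""
    for result in (result_a, result_b):
        if result not in VALID_RESULTS:
            raise ValueError(f"unrecognized result: {result!r}")
    if "equivocal" in (result_a, result_b):
        # Equivocal results are not counted as discordant, whatever followup shows.
        return "equivocal"
    if result_a == result_b:
        return "concordant"
    # Unequivocally positive by one method, unequivocally negative by the other.
    return "discordant"
```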

Investigators from the National Surgical Adjuvant Breast and Bowel Project's (NSABP) central pathology laboratory and colleagues at NSABP-approved reference laboratories conducted IHC (HercepTest™) and FISH (PathVysion®) assays on formalin fixed, paraffin embedded tumor blocks (Paik, Kim, Jeong, et al., 2007; Paik, Kim, and Wolmark, 2008). They reported results with both assays for 1,787 of 2,043 patients enrolled in the NSABP B31 randomized, controlled trial on adjuvant therapy with versus without trastuzumab (Romond, Perez, Bryant, et al., 2005). Of these, they found FISH-negative, IHC 3+ discordant results in 31 cases (1.7 percent). They also reported FISH-positive, IHC 0, 1+, or 2+ results in another 125 cases (7 percent), but did not separately report the proportion of those who tested FISH positive and IHC 0 or 1+.

Central and reference laboratory results with both IHC (HercepTest™) and FISH (PathVysion®) assays also are available (Perez, Romond, Suman, et al., 2007) for 1,779 of the 2,535 patients registered in a similar randomized, controlled trial conducted by the North Central Cancer Treatment Group (NCCTG N9831; Romond, Perez, Bryant, et al., 2005). Investigators reported discordant IHC 3+, FISH-negative results in 53 cases (3 percent), and FISH-positive, IHC 0, 1+, or 2+ results in 218 cases (12.3 percent). Here again, separate results were not reported for the proportion who tested FISH positive and IHC 0 or 1+. Data presently are unavailable on IHC/ISH discordance rates from three other randomized, controlled trials of adjuvant trastuzumab (Piccart-Gebhart, Procter, Leyland-Jones, et al., 2005; Slamon, Eiermann, Robert, et al., 2005; Joensuu, Kellokumpu-Lehtinen, Bono, et al., 2006).

In a retrospective study, a Canadian central reference laboratory used HercepTest™ and three other HER2 antibody IHC assays to retest tumors from patients diagnosed with metastatic breast cancer between 1999 and 2002, and compared the IHC results with central lab FISH using PathVysion® (O'Malley, Thomson, Julian, et al., 2008). Among 505 patients initially classified HER2 positive by IHC in local labs and treated with trastuzumab for metastatic disease, concordance between central IHC and central FISH ranged from 88.9 percent to 90.9 percent, depending on the HER2 antibody used. Concordance between IHC and FISH was highest (92.2 percent) when all four HER2 antibody assays were used to test each sample, and tumors were classified IHC positive only if positive by 2 or more assays. In a sequential sample of 205 invasive breast tumors locally classified IHC negative, from patients diagnosed with metastasis, concordance of central IHC and central FISH ranged from 93.7 percent to 99 percent for individual antibody assays, and was 98.1 percent when tumors were classified IHC negative only if negative by 2 or more assays. However, this study did not report FISH/IHC discordance rates separately by IHC score.
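The "2 or more assays" rule used in this study can be sketched in Python as a simple count. The function name and labels are hypothetical. Because a 2-to-2 split among four assays would satisfy either rule, the sketch takes the label being tested as an explicit argument rather than returning a single consensus call.

```python
from typing import Sequence


def meets_consensus(assay_calls: Sequence[str], label: str, min_agree: int = 2) -> bool:
    """Return True if at least `min_agree` assays gave the call `label` (hypothetical helper)."""
    return sum(1 for call in assay_calls if call == label) >= min_agree
```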

A study from Greece that separately compared IHC results (using HercepTest™ and two other methods) from central and regional laboratories versus central FISH (PathVysion®) reported on 375 breast tumors tested centrally by IHC and FISH (Papadopoulos, Kouvatseas, Skarlos, et al., 2007). FISH-positive, IHC 0/1+ discordances were seen in six cases (1.6 percent; 11.5 percent of 52 IHC 0/1+ cases), while FISH-negative, IHC 3+ discordances were seen in three cases (0.8 percent; 9.4 percent of 32 IHC 3+ cases). Another study from three Greek hospitals compared IHC results (CB11 antibody) with FISH (PathVysion®) for 194 resected breast cancer patients, and also with CISH (SpoT-Light) for 159 of these patients (Kostopoulou, Vageli, Kaisaridou, et al., 2007). This study reported no FISH-positive cases and only one CISH-positive case among 94 IHC 0/1+ patients. Of 30 patients with IHC 3+ results, one (3.3 percent) was FISH negative and CISH negative.

A study from Germany on patients evaluated for inclusion in a trial of trastuzumab for metastatic breast cancer reported central IHC (HercepTest™) and FISH (PathVysion®) results for 289 patients (Hofmann, Stoss, Gaiser, et al., 2008). Investigators reported no FISH-positive cases among 100 patients scored IHC 0/1+, and nine FISH-negative but IHC 3+ cases (8.4 percent of 107 scored IHC positive; 3.1 percent of all patients evaluated).

A small study (n=55) compared two dual-probe (i.e., for HER2 and CEP17) FISH kits (PathVysion® and HER2 FISH pharmDx), a single-probe FISH kit (Inform; HER2 only) and the SpoT-Light CISH kit versus two IHC assays (HercepTest™ and an independently developed test) (Cayre, Mishellany, Lagarde, et al., 2007). Investigators reported results with each assay (and with different positivity thresholds for Inform and SpoT-Light) separately for each sample. Four of 55 (7.3 percent) cases tested IHC 3+ with HercepTest™ and ISH-negative by all assays (other than a threshold of more than four signals for Inform). Three of the four were scored less than 3+ by independently developed IHC. All cases scored FISH positive by two or more kits also were scored IHC 3+ by HercepTest™.

Another small study (n=54) used the HercepTest™ and PathVysion® kits on all samples (Kuo, Wang, Chang, et al., 2007). Three cases (5.6 percent) that tested FISH negative were scored 3+ by IHC. In contrast, no cases that tested FISH positive were scored IHC 0 or 1+.

A systematic review abstracted data from 17 studies (all published before the ASCO/CAP guideline; pooled N=8,419) on FISH/IHC concordance (Dendukuri, Khetani, McIsaac, et al., 2007). Selection criteria sought studies that included consecutive patient series or a random sample, reported agreement between IHC and FISH using standard thresholds, and used assays licensed in Canada to select patients for trastuzumab therapy. All studies used PathVysion® for FISH; 16 used HercepTest™ and one used PATHWAY™ for IHC. Ten combined results for patients scored IHC 0 or 1+, and separately for those scored IHC 2+ or 3+ (pooled N=4,641); seven reported results separately for each IHC score (pooled N=3,778). Using Bayesian meta-analysis, they estimated proportions of breast cancer patients with each of the four possible IHC scores and proportions with each IHC score with positive results by FISH. Table 6 summarizes estimated IHC/FISH discordance rates based on results of the Dendukuri and coworkers' meta-analysis.

Table 6. Estimated discordance rates from meta-analysis of 17 studies on IHC and FISH.

Three small studies (combined N=211) conducted outside North America compared results of different ISH methods. An Australian study on 49 breast cancer samples reported that each case (n=20) scored highly positive (greater than 10 signals/cell) by FISH, and seven of 10 cases scored low-positive (5–10 signals/cell) by FISH, also scored positive by CISH (Bilous, Morey, Armes, et al., 2006). Each sample scored IHC 3+ by HercepTest™ also tested CISH positive. A study from Germany reported agreement in 95 of 99 breast tumor samples tested by FISH (PathVysion®) and SISH, an overall concordance of 96 percent (Dietel, Ellis, Hofler, et al., 2007). Finally, a study from Poland compared FISH, CISH, and SISH on 63 breast tumor specimens selected for 2+ or 3+ staining by IHC (Sinczak-Kuta, Tomaszewska, Rudnicka-Sosin, et al., 2007). Investigators reported and interpreted multiple statistical tests (Pearson chi-square tests with p<0.01; gamma correlation coefficients of 0.89 to 0.96; Spearman rank correlation coefficients of 0.70 to 0.79; and Kappa coefficients of 0.38 to 0.58) for separate two-way comparisons of assay results (i.e., CISH versus FISH, FISH versus SISH, and SISH versus CISH) as evidence for good agreement between the methods, but did not report concordance or discordance rates. Larger studies are needed to estimate rates of concordance and discordance between FISH or IHC and newer ISH methods (CISH, SISH) more reliably. Furthermore, FDA-approved kits for CISH or SISH are not yet available.
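The difference between a raw concordance rate and a chance-corrected Kappa coefficient can be illustrated with a short Python sketch. The function name and the counts in the usage note are hypothetical and are not data from any study cited here; the formula is the standard Cohen's kappa for two binary classifications.

```python
from typing import Tuple


def concordance_and_kappa(both_pos: int, a_pos_b_neg: int,
                          a_neg_b_pos: int, both_neg: int) -> Tuple[float, float]:
    """Return (observed concordance, Cohen's kappa) for a 2x2 table of two assays."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_pos_b_neg) / n   # proportion positive by assay A
    b_pos = (both_pos + a_neg_b_pos) / n   # proportion positive by assay B
    # Agreement expected by chance if the two assays were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

With hypothetical counts of 40 cases positive by both methods, 50 negative by both, and 5 in each off-diagonal cell, the sketch gives an observed concordance of 0.90 and a Kappa of about 0.80. Kappa is lower than raw concordance because it discounts agreement expected by chance, which is one reason the two measures are not interchangeable.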

To summarize, evidence from seven studies and a meta-analysis reported after the ASCO/CAP guideline (Wolff, Hammond, Schwartz, et al., 2007a) suggests variable but perhaps non-negligible rates for FISH-negative, IHC 3+ discordance (albeit by the older definition of strong, complete membrane staining in greater than 10 percent of invasive cells), ranging from 0.5 percent to 7.3 percent of breast cancer cases. The meta-analysis also estimated that 0.6 percent (95 percent CI: 0.1–1.3 percent) of cases might be scored IHC 0 and FISH positive, while 1.8 percent (95 percent CI: 0.8–3.0 percent) of cases might be scored IHC 1+ and FISH positive. However, data are unavailable to estimate discordance rates for either group using the current ASCO/CAP definition of IHC 3+ (greater than 30 percent of invasive cells stained).
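The range quoted for FISH-negative, IHC 3+ discordance can be reproduced from the counts reported in the preceding paragraphs. The Python sketch below assumes that each rate uses as its denominator the number of cases tested by both assays in that study; this is an assumption about how the range was derived, and the variable names are illustrative.

```python
# (study, FISH-negative and IHC 3+ cases, cases tested by both assays),
# taken from the counts reported in the text above.
reported = [
    ("NSABP B31", 31, 1787),
    ("NCCTG N9831", 53, 1779),
    ("Papadopoulos et al.", 3, 375),
    ("Kostopoulou et al.", 1, 194),
    ("Hofmann et al.", 9, 289),
    ("Cayre et al.", 4, 55),
    ("Kuo et al.", 3, 54),
]


def percent(cases: int, total: int) -> float:
    """Discordant cases as a percentage of all cases tested by both assays."""
    return 100.0 * cases / total


rates = {name: round(percent(cases, total), 1) for name, cases, total in reported}
lowest, highest = min(rates.values()), max(rates.values())

for name, rate in rates.items():
    print(f"{name}: {rate} percent")
print(f"range: {lowest} to {highest} percent")
```

Under that assumption, the lowest rate comes from the Kostopoulou study (1 of 194 cases) and the highest from the Cayre study (4 of 55 cases), matching the range of 0.5 percent to 7.3 percent.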

Disagreement between central and local laboratory results. Evidence reviewed by the ASCO/CAP expert panel demonstrated disagreement between central and local laboratory HER2 test results in approximately 20 percent of cases (Wolff, Hammond, Schwartz, et al., 2007a). This included data from the first 104 patients registered for NSABP B31, showing disagreement in 18 percent of cases (Paik, Bryant, Tan-Chiu, et al., 2002), which resulted in a protocol amendment limiting HER2 testing to 23 approved laboratories. The evidence also included data from NCCTG N9831 showing agreement in 88.1 percent of 813 cases rated FISH positive, 81.6 percent of 1,063 cases scored IHC 3+ by HercepTest™, and 75.0 percent of 636 cases scored IHC 3+ by non-HercepTest™ assays (Perez, Suman, Davidson, et al., 2006). Finally, it included data from a community-based clinical study on trastuzumab for metastatic breast cancer showing 77 percent agreement on samples scored IHC 3+ by local laboratories, but only 26 percent agreement on samples locally scored IHC 2+ (Reddy, Reimann, Anderson, et al., 2006). Based on the available evidence, the panel recommended specific measures for assay validation, self-assessment, accreditation, and proficiency testing by laboratories conducting HER2 assays. In the following section, we summarize new evidence comparing local versus central laboratory results, published since the ASCO/CAP review. Although published after the ASCO/CAP guideline, these studies preceded the guideline and scored samples as originally recommended by manufacturers and FDA labeling.

Final data from NSABP B31 showed disagreement on HER2 status in 174 of 1,787 cases (9.7 percent) classified HER2 positive by local laboratories but HER2 negative by both FISH (PathVysion®) and IHC assays in central or reference laboratories (Paik, Kim, Jeong, et al., 2007; Paik, Kim, and Wolmark, 2008). Data presently are unavailable on rates of disagreement between local and central laboratories from three other randomized, controlled trials of adjuvant trastuzumab (Piccart-Gebhart, Procter, Leyland-Jones, et al., 2005; Slamon, Eiermann, Robert, et al., 2005; Joensuu, Kellokumpu-Lehtinen, Bono, et al., 2006).

A small study compared central and local laboratory IHC results on breast tumor samples initially scored IHC 2+ locally and found FISH positive after referral for central laboratory confirmation (Barrett, Magee, O'Toole, et al., 2007). Investigators reported that of 153 IHC 2+ cases referred to the central laboratory for FISH confirmation, 29 (19 percent) had amplified HER2 genes. With repeat IHC in 25 of the 29, the central laboratory scored 18 cases (72 percent) as IHC 3+ and agreed with the local laboratory score of IHC 2+ in only 7 cases (28 percent). Since the central laboratory did not repeat IHC testing for the 124 cases with nonamplified HER2 genes by FISH, the overall rate of agreement with local results cannot be determined.

A larger study compared IHC results in local (regional) and central laboratories (Papadopoulos, Kouvatseas, Skarlos, et al., 2007). Of 458 available samples, 369 were tested by IHC both regionally and centrally and scores agreed for 296 (80.2 percent). Disagreement was greatest among samples (n=11) scored IHC 3+ by regional laboratories (63 percent concordance). Concordance was better among those (n=20) scored IHC 0 or 1+ and those scored IHC 2+ (n=338) at regional laboratories (85 percent and 80 percent, respectively).

A central reference laboratory analyzed tumor specimens from 315 of 399 (79 percent) patients randomized to capecitabine with or without lapatinib, using both IHC (antibody not reported) and FISH (PathVysion®), seeking confirmation of local laboratory results that classified these patients HER2 positive thus eligible for this randomized, controlled trial (Cameron, Casey, Press, et al., 2008). Central testing found 241 of 315 (77 percent) HER2 positive, including 211 with IHC 3+ results and 30 with IHC 2+, FISH-positive results.

In the Canadian study cited previously, central laboratory testing of breast tumor tissue samples confirmed the IHC-positive status of 79.3 percent to 89.6 percent of 505 cases found IHC positive by local laboratory results (O'Malley, Thomson, Julian, et al., 2008). Among 205 cases found IHC negative by local labs, central IHC testing confirmed local results in 94.8 percent to 100 percent of cases. The concordance rates varied, depending on which of four IHC assays the central laboratory used.

To summarize, data reported after publication of the ASCO/CAP guideline (Wolff, Hammond, Schwartz, et al., 2007a) confirm the estimate of approximately 20 percent disagreement between local (or regional) and central laboratories with respect to HER2 assay results. Data are presently lacking to evaluate the effects of adherence to guideline recommendations for preanalytic, analytic, and postanalytic steps on rates of local/central disagreement.

Validation and proficiency testing. Since these issues are outside the scope of this evidence report, interested readers are referred to current guidelines for specific recommendations on best practices to validate assays and test laboratory proficiency (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Evidence reviewed by the expert panel included a summary of results from 2004 and 2005 surveys of laboratories participating in CAP-sponsored interlaboratory comparisons of IHC results, using tissue microarrays as the test material (Fitzgibbons, Murphy, Dorfman, et al., 2006). The key finding was that 97 of 102 laboratories (95 percent) in 2004 and 129 of 141 laboratories (91 percent) in 2005 correctly scored 90 percent or more of the test cases. In the following section, we briefly summarize evidence published after the ASCO/CAP guideline. Again, these studies scored samples as originally recommended by manufacturers and FDA labeling.

An international study compared five pathology reference centers (from the Netherlands, Canada, France, Belgium, and Germany) on assay scoring and HER2 status classification for separate samples tested by IHC (n=20) or by FISH (n=20) (Dowsett, Hanna, Kockx, et al., 2007). Agreement was uniform among centers on HER2 status classifications for all 20 IHC test cases, although some scoring differences were noted, and some equivocal cases (i.e., those scored IHC 2+) required FISH confirmation to determine HER2 status. Agreement was uniform among centers on 16 of 20 (80 percent) FISH test cases. Each of the other four cases was scored in the equivocal range (HER2/CEP17 ratio 1.7–2.3).

A similar international study (from the Netherlands, Australia, Canada, France, and Germany) compared results from five central laboratories on 211 breast cancer specimens tested by CISH, FISH, and IHC (van de Vijver, Bilous, Hanna, et al., 2007). Each central laboratory sent unstained sections from samples they tested to four other (“outside”) central laboratories. Investigators reported uniform agreement by CISH in the “outside” laboratories on 73 of 76 cases (96 percent) scored highly amplified (HER2/CEP17 greater than 4.0) by FISH in the initial laboratory. Similarly, “outside” CISH uniformly agreed with 94 of 100 (94 percent) cases initially scored as not amplified by FISH (HER2/CEP17 less than 2.0). Among 35 cases scored as equivocal by initial FISH testing (HER2/CEP17 2.0–4.0), 20 were scored as CISH positive and 15 were scored as CISH negative. Overall interlaboratory concordance was 95 percent for cases with normal HER2 gene copy number (1–5) and was 92 percent for cases with 6 or more copies of the HER2 gene.

A brief report by investigators from the Italian Network for Quality Assessment of Tumor Biomarkers (INQUAT) and the United Kingdom National External Quality Assessment Service (U.K. NEQAS) highlighted the importance of including both preanalytic and analytic steps in proficiency testing programs (Paradiso, Miller, Marubini, et al., 2007). The U.K. NEQAS program for HER2 testing focuses on preanalytic aspects of the IHC assay, while the INQUAT program focuses on intra- and interlaboratory variability in scoring a set of fixed and stained IHC slides. Twelve Italian laboratories participated in both quality control programs during 2003, and only one achieved high-quality performance in preanalytic processing steps and in intra- and interlaboratory reproducibility. Some laboratories that achieved high-quality performance in preanalytic steps did not score slides reproducibly, or vice versa. Three of the 12 laboratories did not perform adequately on either preanalytic or analytic steps.

A recent study covalently attached fixed and unfixed samples of synthetic HER peptide to glass microscope slides with unstained sections of invasive breast carcinomas (Vani, Sompuram, Fitzgibbons, et al., 2008). The peptide fragments were used as positive analyte controls on slides distributed to 192 laboratories participating in the CAP 2006 HER2-B proficiency testing survey. Stained slides were returned and centrally reviewed (n=109 laboratories), permitting participants to evaluate sources of variability in HER2 staining performance. Investigators reported suboptimal staining in 20 of 109 slides (18.3 percent). Of these, seven cases (35 percent of the 20 failures) were attributable to errors in the antigen retrieval step, four (20 percent) were attributable to problems with the antibody staining protocol, and nine (45 percent) had problems with both.

In summary, two studies published subsequent to the ASCO/CAP review (Wolff, Hammond, Schwartz, et al., 2007a) reported similar results on interlaboratory comparisons. Overall, the available evidence shows 90 percent or greater agreement between high-volume reference laboratories in North America, Europe, and Australia. Scoring differences between laboratories occur most often with cases of low-level amplification or low-level overexpression. Results reported before and after the ASCO/CAP review (and other guidelines) support considering such cases as equivocal results, with confirmatory testing needed to classify HER2 status. Collaborative data from Italy and the United Kingdom suggest that quality control programs must evaluate all steps (preanalytic, analytic, and postanalytic) in HER2 testing. Positive analyte controls confirmed that antigen retrieval and antibody staining are persistent sources of interlaboratory variability in IHC results.

Reports on polysomy 17. The ASCO/CAP expert panel (Wolff, Hammond, Schwartz, et al., 2007a) interpreted evidence from two studies (Downs-Kelly, Yoder, Stoller, et al., 2005; Ma, Lespagnard, Durbecq, et al., 2005) as not supporting an association of polysomy 17 (defined as three or more copies of CEP17) with HER2 protein or mRNA overexpression. However, one of these (Ma, Lespagnard, Durbecq, et al., 2005) reported increased HER2 protein (IHC 3+) in a subset of patients with polysomy 17 and HER2/CEP17 ratios less than 2. In the following section, we summarize evidence published subsequent to the ASCO/CAP guideline.

Nine studies have reported data on polysomy 17 and HER2 status of breast cancer patients since the ASCO/CAP review. Of these, seven have been published in full (Dal Lago, Durbecq, Desmedt, et al., 2006; Torrisi, Rotmensz, Bagnardi, et al., 2007; Corzo, Bellosillo, Corominas, et al., 2007; Beser, Tuzlali, Guzey, et al., 2007; Hyun, Lee, Kim, et al., 2008; Kostopoulou, Vageli, Kaisaridou, et al., 2007; Hofmann, Stoss, Gaiser, et al., 2008) and two were reported at meetings with slides or video available online (Kaufman, Broadwater, Lezon-Geyda, et al., 2007; Reinholz, Jenkins, Hillman, et al., 2007). Three studies reported no association of polysomy 17 with HER2 protein and/or mRNA overexpression (Dal Lago, Durbecq, Desmedt, et al., 2006; Torrisi, Rotmensz, Bagnardi, et al., 2007; Corzo, Bellosillo, Corominas, et al., 2007). In contrast, five other studies reported increased levels of HER2 protein in some cases with polysomy 17 and unamplified HER2 genes (Hyun, Lee, Kim, et al., 2008; Kaufman, Broadwater, Lezon-Geyda, et al., 2007; Reinholz, Jenkins, Hillman, et al., 2007; Kostopoulou, Vageli, Kaisaridou, et al., 2007; Hofmann, Stoss, Gaiser, et al., 2008). The ninth study did not report data on overexpression of HER2 protein or mRNA; this study reported chromosome 17 polysomy in two of 11 patients with HER2 gene amplification and in seven of 39 patients with unamplified HER2 genes (Beser, Tuzlali, Guzey, et al., 2007). In one study (Hofmann, Stoss, Gaiser, et al., 2008), seven of nine discordant IHC 3+/FISH-negative patients had chromosome 17 polysomy, and six of 26 patients with polysomy 17 responded to trastuzumab therapy for metastatic disease. However, all six responders were scored 3+ by IHC.

In contrast to conclusions of the ASCO/CAP review (Wolff, Hammond, Schwartz, et al., 2007a), evidence published subsequently reopens the question of whether chromosome 17 polysomy has implications for classifying patients' HER2 status. Five of eight new studies found polysomy 17 to be associated with protein (and/or mRNA) overexpression in at least some patients with nonamplified HER2 genes, while three of eight found no association.

Implications for Remainder of this Report

Discordances between IHC and FISH results might arise in one of three ways. They may be artifacts of one accurate and one inaccurate test. Alternatively, they may reflect a threshold issue, either related to the changes in threshold definitions over time, or an inherent problem of using a continuous measure to classify patients dichotomously. Finally, discordant test results might accurately reflect a small number of different patients with respect to the biologic mechanism that increases membrane levels of the HER2 protein. Present data could not tease apart the many factors reviewed here (preanalytic, analytic and postanalytic) that might have contributed to discordances in HER2 assay results. This clearly affects the interpretation of evidence on key questions that address use of “HER2 status” to predict treatment outcomes, even in nonbreast malignancies (Key Questions 2, 3, and 5). Furthermore, it also affects interpretation of evidence on the added clinical utility of serum measurements for patients with known tissue status, since this presumes accurate classification by tissue assays. Future studies reporting outcomes as a function of HER2 status should report separately on patients with concordant, equivocal, and discordant assay results.

Key Question 2

For patients who are not unequivocally HER2 positive, what is the evidence on outcomes of treatment targeting the HER2 molecule (trastuzumab, etc.), or on differences in outcomes of uniform chemotherapy or hormonal therapy regimens with versus without additional treatment targeting the HER2 molecule, in:

a) Breast cancer patients characterized by equivocal or discordant HER2 results from different tissue assay methods performed adequately; and

b) For those with HER2-negative breast cancer?

Study Selection

The search strategy for studies on HER2 testing in breast cancer yielded 3,218 citations. Initial review selected 74 citations potentially relevant to Key Question 2 for retrieval and review as full articles. We used the ASCO/CAP expert panel's definition (Wolff, Hammond, Schwartz, et al., 2007a) of equivocal HER2 assay results: IHC 2+, or HER2 gene copy number from 4.0 to 6.0 or HER2/CEP17 ratio from 1.8 to 2.2 if ISH is the first or only assay. We defined discordant results as unequivocally positive results by one assay method (i.e., IHC 3+, HER2 gene copy number greater than 6.0, or HER2/CEP17 ratio greater than 2.2) and unequivocally negative results by a different assay method on another tissue section from the same tumor. Four trials (eleven reports; see Table 7 and “Available Studies” for citations) met selection criteria for data abstraction and compared outcomes with versus without a drug targeting HER2, for breast cancer patients with equivocal, discordant, or unequivocally negative HER2 assay results. Three trials randomized patients to chemotherapy with versus without trastuzumab; the fourth randomized patients to chemotherapy with versus without lapatinib, a tyrosine kinase inhibitor active against HER1 and HER2. Trials and their results are summarized in Tables 7–9; detailed abstraction data can be found in Appendix Tables II-A–II-I*.
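The definitions above can be sketched as a small classification routine. This is a minimal illustration, not part of the review methods: the function names and status labels are hypothetical, the quoted ranges are assumed to be inclusive at both ends, and any equivocal call is simply reported as equivocal without modeling followup testing.

```python
from typing import Optional


def ihc_call(score: int) -> str:
    """Classify an IHC score: 3+ positive, 2+ equivocal, 0 or 1+ negative."""
    if score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2, or 3")
    return {3: "positive", 2: "equivocal"}.get(score, "negative")


def ish_call(ratio: Optional[float] = None,
             copy_number: Optional[float] = None) -> str:
    """Classify an ISH result from a HER2/CEP17 ratio or a HER2 gene copy number.

    Assumption: the quoted ranges (1.8 to 2.2; 4.0 to 6.0) are inclusive.
    """
    if ratio is not None:
        low, high, value = 1.8, 2.2, ratio
    elif copy_number is not None:
        low, high, value = 4.0, 6.0, copy_number
    else:
        raise ValueError("provide a ratio or a copy number")
    if value > high:
        return "positive"
    if value >= low:
        return "equivocal"
    return "negative"


def combined_status(ihc: str, ish: str) -> str:
    """Combine two calls made on separate sections of the same tumor."""
    if "equivocal" in (ihc, ish):
        return "equivocal"
    if ihc != ish:
        return "discordant"
    return f"concordant {ihc}"


print(combined_status(ihc_call(3), ish_call(ratio=1.2)))        # discordant
print(combined_status(ihc_call(2), ish_call(ratio=2.6)))        # equivocal
print(combined_status(ihc_call(0), ish_call(copy_number=3.0)))  # concordant negative
```

Keeping the per-assay calls separate from the comparison step mirrors the wording of the definitions, in which discordance is a property of a pair of unequivocal results rather than of either assay alone.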

Table 7. Summary study design, treatment, patient characteristics, KQ2.

Table 9. Summary tumor response, KQ2.

Available Studies and Reports

Table 7 includes two trials on adjuvant trastuzumab with data for Key Question 2 (NSABP B31 and NCCTG N9831). Each reported post-hoc analyses on interim results for small subgroups of resected breast cancer patients inadvertently randomized to chemotherapy with or without trastuzumab in trials seeking to randomize only HER2-positive patients. Similarly, a trial on chemotherapy with or without lapatinib for locally advanced or metastatic disease (EGF100151) also intended to randomize only HER2-positive patients (Cameron, Casey, Press, et al., 2008; Geyer, Forster, Lindquist, et al., 2006). In each of these trials, local laboratory HER2 testing initially classified all randomized patients as HER2 positive. However, central or reference laboratory retests subsequently identified small subsets as equivocal, discordant, or HER2 negative. Only one trial (CALGB 9840) intentionally randomized HER2-negative metastatic breast cancer patients (referred to as “HER2 non-overexpressors” by study authors), and directly tested whether adding trastuzumab to chemotherapy improved outcomes.

One trial on trastuzumab in adjuvant therapy (NSABP B31) reported data on post-hoc subgroup analyses in a brief published communication (Paik, Kim, and Wolmark, 2008). Another adjuvant trastuzumab trial (NCCTG N9831) compared local, central, and reference laboratory results of HER2 testing in a published article that did not report outcomes (Perez, Suman, Davidson, et al., 2006). Both trials reported subgroup outcomes in meeting abstracts, with slides available online (B31: Paik, Kim, Jeong, et al., 2007; N9831: Perez, Romond, Suman, et al., 2007, and Reinholz, Jenkins, Hillman, et al., 2007). A single, published report provided baseline characteristics and preliminary outcomes data for patients randomized to treatment arms common to B31 and N9831 (Romond, Perez, Bryant, et al., 2005). Data were reported in this publication on each trial separately and both trials combined.

Two trials on patients with advanced or metastatic disease published full reports with subgroup analyses (Seidman, Berry, Cirrincione, et al., 2008; Cameron, Casey, Press, et al., 2008). The EGF100151 trial on chemotherapy with or without lapatinib (Cameron, Casey, Press, et al., 2008) also published an earlier report (Geyer, Forster, Lindquist, et al., 2006), but without results of repeat HER2 testing by a central or reference laboratory or analyses relevant to Key Question 2. CALGB 9840 provides the only preplanned analysis relevant to this key question: a HER2-negative (i.e., non-overexpressor) subgroup was randomized to chemotherapy with or without trastuzumab within a larger trial studying an unrelated question (Seidman, Berry, Cirrincione, et al., 2004, 2008). CALGB 9840 is also the source of all patients in the subgroup analyzed post-hoc in CALGB 150002 (Kaufman, Broadwater, Lezon-Geyda, et al., 2007).

Treatments and Subgroups Compared

Adjuvant therapy. Two trials (NSABP B31, NCCTG N9831) investigated outcomes of adjuvant doxorubicin plus cyclophosphamide (AC; every three weeks for four cycles), followed by paclitaxel (P; every three weeks for four cycles), with versus without trastuzumab (+/-TRZ; weekly for 12 months, beginning concurrently with paclitaxel) in women with fully resected early breast cancer. Outcomes are as-yet unreported for a third arm of N9831, which began trastuzumab therapy after all eight cycles of chemotherapy (AC→P→TRZ). Both B31 and N9831 limited eligibility to HER2-positive patients, defined as FISH-positive/IHC unknown, IHC3+/FISH-unknown, or IHC2+/FISH-positive. Patients were initially evaluated by local laboratory testing, and randomized if classified HER2-positive by these results. They were subsequently re-evaluated by central laboratory testing, but continued with assigned treatments regardless of results. A planned interim analysis at two years' median followup (2.4 years for B31 patients; 1.5 years for N9831 patients) for all patients randomized to the treatment arms common to both trials pooled patients assigned to the control arms (n=1,679; AC→P) and those assigned to concurrent trastuzumab (n=1,672; AC→P+TRZ) (Romond, Perez, Bryant, et al., 2005). Trastuzumab significantly improved overall survival (OS) at four years: 91.4 percent versus 86.6 percent; hazard ratio (HR)=0.67; 95 percent CI: 0.48–0.93; p=0.015. The B31 (Paik, Kim, and Wolmark, 2008; Paik, Kim, Jeong, et al., 2007) and N9831 (Perez, Romond, Suman, et al., 2007 and Reinholz, Jenkins, Hillman, et al., 2007) results included here were unplanned, post-hoc analyses. They compared outcomes of adjuvant AC→(P+/-TRZ) in subgroups found HER2 discordant or negative by central lab results, using data collected for the pooled analysis of Romond, Perez, Bryant, et al. (2005) without longer followup.

Advanced/metastatic disease. A randomized, controlled trial (CALGB 9840) that studied paclitaxel in women receiving first- or second-line therapy for metastatic breast cancer reported outcomes at two meetings (Seidman, Berry, Cirrincione, et al., 2004; Kaufman, Broadwater, Lezon-Geyda, et al., 2007) and in a published article (Seidman, Berry, Cirrincione, et al., 2008). Primary randomization in this trial compared once-weekly to every-third-week paclitaxel dosing regimens. Testing for HER2 status began after enrolling the first 171 patients, and HER2-negative patients (termed “HER2 non-overexpressors” by study authors and defined as IHC 0 or 1+, or IHC 2+/FISH negative, by local laboratory tests) were also randomized to treatment with or without trastuzumab. Seidman, Berry, Cirrincione, et al. (2004, 2008) reported outcomes for this second randomization without separating results by paclitaxel treatment frequency. HER2-positive patients (by local laboratory tests) all received trastuzumab and are excluded from the analysis for Key Question 2.

For all patients randomized (n=735), CALGB 9840 investigators first reported that response rate and time to progression (TTP) were better with weekly paclitaxel than with every third week, although the difference in median OS (24 versus 16 months; HR=1.19, p=0.17) was not statistically significant (Seidman, Berry, Cirrincione, et al., 2004). As prespecified in the CALGB 9840 protocol, the final analysis (Seidman, Berry, Cirrincione, et al., 2008) comparing paclitaxel schedules pooled additional patients (n=158) randomized to the identical dose of paclitaxel every third week (all without trastuzumab) in another trial (CALGB 9342; Winer, Berry, Woolf et al., 2004) with those randomized to this schedule in CALGB 9840. In this combined analysis, weekly paclitaxel statistically significantly improved response rate (42 percent versus 29 percent; OR=1.75, p=0.0004), TTP (median, nine versus five months; HR=1.43, p<0.0001), and OS (median, 24 versus 12 months; HR=1.28, p=0.0092), when compared with treatment every third week. Data in Table 8 on HER2 non-overexpressors exclude patients from CALGB 9342.

Table 8. Summary time to event outcomes, KQ2.

A post-hoc analysis on HER2 non-overexpressors randomized to paclitaxel with versus without trastuzumab in CALGB 9840 compared outcomes for subsets found FISH negative by central laboratory testing who had or did not have chromosome 17 polysomy (CALGB 150002; Kaufman, Broadwater, Lezon-Geyda, et al., 2007). This analysis was not included in the published final report (Seidman, Berry, Cirrincione, et al., 2008). It also did not include patients from CALGB 9342, none of whom were randomized to paclitaxel with or without trastuzumab.

The EGF100151 trial randomized patients with locally advanced or metastatic breast cancer to capecitabine (1 g/m² twice daily for 14 days every three weeks) plus lapatinib (1.25 g daily) or to capecitabine alone (1.25 g/m² twice daily for 14 days every three weeks). Eligibility required: a T4 primary tumor and stage IIIB or IIIC disease, for those without distant metastasis; a history of progressive disease after one or more regimens that included an anthracycline, a taxane, and trastuzumab (given separately or in combinations); and local laboratory HER2 test results of IHC3+ or IHC2+/FISH positive. An interim analysis on 163 patients randomized to capecitabine plus lapatinib and 161 randomized to capecitabine monotherapy reported that median TTP was 8.4 months in the combination arm and 4.4 months in the capecitabine monotherapy arm (HR=0.49; 95 percent CI: 0.34–0.71, p<0.001) (Geyer, Forster, Lindquist, et al., 2006). A second report included more patients (n=198, capecitabine plus lapatinib; n=201, capecitabine monotherapy; Cameron, Casey, Press, et al., 2008). By intent-to-treat analysis, median TTP was 6.2 months in the combined arm and 4.3 months in the monotherapy arm (HR=0.57; 95 percent CI: 0.43–0.77, p<0.001). A second interim analysis for OS found 28 percent had died (median OS, 15.6 months) in the combined therapy arm and 32 percent had died (median OS, 15.3 months) in the capecitabine monotherapy arm (HR=0.78; 95 percent CI: 0.55–1.12; p=0.177); followup for survival continues. Central laboratory IHC and FISH retesting of samples from 300 (75 percent) of the 399 randomized in this trial identified small subgroups with HER2-discordant or -negative results (Table 7).

Study Quality

Only one of four included trials (CALGB 9840) stratified randomization by HER2 status, the most informative evidence level defined in this report's study design hierarchy (see Methods, Table 3). The others are post-hoc analyses of treatment effects in HER2-discordant or -negative subgroups from larger randomized, controlled trials. One trial on adjuvant trastuzumab (NSABP B31) and both trials on patients with metastatic or advanced disease (CALGB 9840 and EGF100151) included multivariate analyses. However, neither CALGB 9840 nor EGF100151 used multivariate analysis to adjust treatment outcomes in HER2 discordant or HER2 negative subgroups. Since these subgroups from each study are small and underpowered, and since results from three of four trials are interim analyses with limited followup, we did not assess study quality using the checklist derived from REMARK and other sources (see “Methods”).

Patient Characteristics

Adjuvant therapy. Patients from B31 and N9831 were initially randomized based on positive results of local lab testing, given their assigned regimen, and followed on these randomized, controlled trials. Those in subgroups included here subsequently were reclassified HER2 discordant or HER2 negative by central laboratory results. Baseline patient characteristics and prognostic factors (Table 7) were reported for all patients randomized to each treatment arm in each trial (Romond, Perez, Bryant, et al., 2005), including those classified as HER2 positive by both local and central laboratory results. At the level of initial randomization, baseline characteristics and prognostic factors of the groups treated with versus without trastuzumab were similar. However, data were not reported to separately compare baseline characteristics and prognostic factors by treatment arm for each subgroup of HER2-discordant or -negative patients (by central laboratory results).

Data are available from B31 for two HER2-discordant groups:

  • FISH positive/IHC 0, 1+, or 2+: n=56 +TRZ; n=69 -TRZ (data not reported separately for FISH-positive, IHC 0, 1+ subset)
  • FISH negative/IHC 3+: n=10 +TRZ; n=21 -TRZ;

and for two (partially overlapping) HER2-negative groups:

  • FISH negative/IHC 1+ or 2+: n=69 +TRZ; n=80 -TRZ
  • FISH negative/IHC 0, 1+, or 2+: n=82 +TRZ; n=92 -TRZ (13 and 12 patients per arm added to the 69 and 80 in the arms above).

Data are available from N9831 for two HER2-discordant groups:

  • FISH positive/IHC 0, 1+, or 2+: n=123 +TRZ; n=95 -TRZ (data not reported separately for FISH-positive, IHC 0, 1+ subset)
  • FISH negative/IHC 3+: n=23 +TRZ; n=30 -TRZ;

and for one HER2-negative group:

  • FISH negative/IHC 0, 1+, or 2+: n=59 +TRZ; n=44 -TRZ.

Advanced/metastatic disease. Patients in CALGB 9840 had metastatic disease undergoing first- or second-line therapy. All were randomized to weekly or every third week paclitaxel, and those who were HER2 negative (IHC 2+/FISH negative or IHC 0 or 1+) by local laboratory results were simultaneously randomized to receive (n=113) or not receive (n=115) trastuzumab. The analysis pooled outcomes in the HER2-negative arms for patients given paclitaxel weekly or every third week. Subsequent analyses (CALGB 150002) compared outcomes separately for subgroups from CALGB 9840 who were FISH negative by central laboratory results and had (+/-TRZ, n=19 each arm) or did not have (+TRZ, n=53; -TRZ, n=50) chromosome 17 polysomy.

Patients in EGF100151 had locally advanced or metastatic disease that progressed after one or more regimens with an anthracycline, a taxane, and trastuzumab (given separately or in combinations, as adjuvant therapy or for metastasis). Women (n=399) with local laboratory HER2 test results of IHC3+ or IHC2+/FISH positive were randomized to capecitabine with or without lapatinib. Baseline characteristics and prognostic factors of the groups treated with versus without lapatinib were similar. Subsequent central laboratory reanalysis by FISH and IHC of tumor samples from 300 patients (75 percent of all randomized) identified HER2 discordant or HER2 negative subgroups (Table 7). Data were not reported to separately compare baseline characteristics or prognostic factors by treatment arm for any of these subgroups.

Results, Key Question 2

Adjuvant AC→(P±TRZ). The only available data are from post-hoc subgroup analyses, without stratification for the subgroups' defining characteristics. Neither the B31 nor the N9831 analyses reported subgroup-specific comparisons of baseline characteristics or prognostic factors by treatment arm. Furthermore, one subgroup mixed results for a discordant subgroup (IHC 0, 1+, FISH positive) with results for initially equivocal but ultimately positive (IHC 2+ but amplified by FISH) patients. Finally, data are presently unavailable from studies that classified patients using assay thresholds consistent with current guidelines (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; see “Results and Conclusions, Key Question 1, Narrative Review”).

Neither trial reported median followup durations, or showed numbers per arm at risk over time, for the specific subgroups compared. In each subgroup from each treatment arm, failure events (e.g., death or relapse) occurred in less than 25 percent of patients (range: 5–23 percent) at the time of analysis. Therefore, length of followup was inadequate for reliable estimates of median event-free durations for any outcome reported. The interim analyses for all patients randomized in the larger trials that were sources of these subgroups (Romond, Perez, Bryant, et al., 2005) also lacked sufficient followup for reliable estimates of median overall survival or median disease-free survival (DFS).

For HER2-discordant patients who were FISH positive and IHC 0, 1+ or 2+ by central laboratory testing, between-arm differences in outcome were not statistically significant in either trial. In B31 (n=56 +TRZ; n=69 -TRZ), the HR for failure in analysis of DFS was 0.30 (95 percent CI: 0.08–1.07; p=0.064) and the HR for failure in analysis of recurrence-free interval (RFI) was 0.35 (95 percent CI: 0.10–1.28; p=0.11). In N9831 (n=123 +TRZ; n=95 -TRZ), the HR for failure in analysis of DFS was 0.98 (95 percent CI: 0.33–2.91; p=0.97).

Few patients were FISH negative and IHC 3+ by central laboratory results (from B31: n=10 +TRZ; n=21 -TRZ; from N9831: n=23 +TRZ; n=30 -TRZ). B31 reported that the HR for failure was 0.91 for both DFS and RFI (for each outcome, 95 percent CI: 0.08–10.0; p=0.94), and N9831 reported that the HR for failure was 0.61 (95 percent CI: 0.11–3.29; p=0.57). Neither between-arm subgroup comparison was statistically significant.
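As a rough consistency check on figures like these, a two-sided p-value can be approximated from a hazard ratio and its 95 percent confidence interval. The sketch below assumes the interval is symmetric on the log scale (a Wald-type interval) and uses a normal approximation; the source reports do not state how their intervals or p-values were computed, and the function name is hypothetical.

```python
import math


def approx_p_from_hr_ci(hr: float, lower: float, upper: float) -> float:
    """Approximate the two-sided p-value for HR = 1 from a 95 percent CI.

    Assumes a Wald-type interval: log(HR) +/- 1.96 * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)  # SE of log(HR)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area


p_b31 = approx_p_from_hr_ci(0.91, 0.08, 10.0)
p_n9831 = approx_p_from_hr_ci(0.61, 0.11, 3.29)
print(f"B31:   approximate p = {p_b31:.2f}")
print(f"N9831: approximate p = {p_n9831:.2f}")
```

Under these assumptions, the approximations agree with the reported p-values (0.94 and 0.57) to two decimal places, and the very wide intervals show how little information these small subgroups carry.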

Only B31 analyzed outcomes of patient subgroups that were HER2 negative by FISH but IHC 1+ or 2+ by central laboratory testing (n=69 +TRZ; n=80 -TRZ). Between-arm differences reported by Paik, Kim, Jeong, et al. (2007) were statistically significant for DFS (HR=0.30; 95 percent CI: 0.11–0.83; p=0.02) and RFI (HR=0.31; 95 percent CI: 0.10–0.95; p=0.041), and favored the subgroup given trastuzumab.

Both trials reported on patients who were FISH negative and IHC 0, 1+ or 2+ by central laboratory testing. In B31, this subgroup added FISH-negative/IHC 0 patients (13 and 12 per arm, respectively) to those in the FISH-negative/IHC 1+ or 2+ arms shown above (combined n=82 +TRZ; combined n=92 -TRZ). Between-arm differences were statistically significant for DFS (7 events, +TRZ; 20 events, -TRZ; HR=0.34; 95 percent CI: 0.14–0.80; p=0.014) and RFI (HR=0.36; 95 percent CI: 0.14–0.92; p=0.034), and again favored the subgroup given trastuzumab. One patient died in the trastuzumab arm, while 10 died in the control arm (HR=0.08; 95 percent CI: 0.01–0.64; p=0.017). In N9831 (n=59 +TRZ; n=44 -TRZ), the between-arm difference in DFS (HR=0.51; 95 percent CI: 0.21–1.2; p=0.13) was not statistically significant.

HER2 gene copy number and magnitude of benefit from trastuzumab. Additional unpublished subset analyses from the B31 trial presented at the June 2007 ASCO annual meeting (Paik, Kim, Jeong, et al., 2007), and similar analyses from the N9831 trial (Reinholz, Jenkins, Hillman, et al., 2007) and the HERA trial (McCaskill-Stevens, Proctor, Goodbrand, et al., 2007) presented at the December 2007 San Antonio Breast Cancer Symposium, investigated the hypothesis that higher HER2 gene copy numbers, or higher HER2/CEP17 FISH ratios, were associated with a larger magnitude of relative benefit from trastuzumab. Data from the N9831 and HERA trials showed that the hazard ratio for DFS did not grow more favorable to the trastuzumab arm as average FISH ratios increased from 2.0 to 15 or greater (N9831), or from 2 to greater than 8 (HERA). Additionally, investigators found that the HR for DFS did not become more favorable as average HER2 gene copy number per cell increased from 4 to greater than 18 (HERA), or from 2 to greater than 10 (B31).

Polysomy 17 and adjuvant trastuzumab. An unpublished post-hoc analysis of data from N9831 presented at the December 2007 San Antonio Breast Cancer Symposium evaluated whether polysomy 17 influenced effects of adjuvant trastuzumab (Reinholz, Jenkins, Hillman, et al., 2007). Investigators reported that among patients with amplified HER2 genes, trastuzumab increased DFS whether or not these patients had polysomy 17. Central lab results identified very few patients without HER2 overexpression by IHC or HER2 gene amplification by FISH, but with polysomy 17. DFS was lower (79 percent versus 83 percent at 3 years; 65 percent versus 75 percent at 5 years) among those given trastuzumab than among those not given trastuzumab, although the sample size was small and few events had occurred in either arm (6 of 24 given trastuzumab, 3 of 13 controls). Investigators also analyzed slightly larger patient subsets without HER2 overexpression by IHC, HER2 gene amplification by FISH, or polysomy 17. DFS was substantially higher (94 percent versus 77 percent at 3 years; 84 percent versus 55 percent at 5 years) among those given than among those not given trastuzumab. As in the subset with polysomy 17, few events had occurred in either arm in the subset without polysomy (4 of 34 given trastuzumab, 13 of 33 controls). Additionally, unpublished data from the NSABP B31 trial showed that polysomy 17 had no impact on prognosis or degree of benefit from trastuzumab (Dr. S. Paik; personal communication, May 2008).

HER2-negative patients with metastatic disease given P±TRZ for first- or second-line therapy. Patients found IHC 2+/FISH negative or IHC 0, 1+ by local laboratory results were randomized in CALGB 9840 to have or not have trastuzumab added to paclitaxel (n=113 +TRZ; n=115 -TRZ). Between-arm differences in OS (median: 21.6 versus 19.6 months, p=0.67), time to progression (TTP; median: 12 versus 6 months, p=0.088), and overall response rate (ORR; 35 percent versus 29 percent, p=0.32) were not statistically significant (Seidman, Berry, Cirrincione, et al., 2008).

CALGB 150002 reported that subgroups from CALGB 9840 found FISH negative by central laboratory results, and also found to have chromosome 17 polysomy (n=19 +TRZ; n=19 -TRZ), showed a statistically significant increase in ORR (63 percent versus 26 percent, p=0.048) among those given trastuzumab plus paclitaxel compared with those given paclitaxel alone (Kaufman, Broadwater, Lezon-Geyda, et al., 2007). In contrast, ORR did not differ between treatment arms (36 percent in each) for centrally FISH-negative patients without chromosome 17 polysomy. Despite the ORR difference, the centrally FISH-negative subgroup with polysomy 17 (+/-TRZ; n=19 each) showed no statistically significant differences between arms in either OS (p=0.538) or TTP (p=0.88).

HER2-negative patients with advanced or metastatic disease that progressed after an anthracycline, a taxane, and trastuzumab given capecitabine ± lapatinib. Few patients randomized to capecitabine with or without lapatinib in the EGF100151 trial were HER2 discordant (Table 7). Furthermore, outcomes were not reported separately for those found FISH positive but IHC negative by central laboratory testing (with lapatinib, n=15; without lapatinib, n=7), or those found FISH negative but IHC 3+ by central lab results (with lapatinib, n=1; without lapatinib, n=2). Investigators identified a total of 74 patients (23.5 percent of 315 tested in the central laboratory) whose local results were not confirmed by the central lab as meeting HER2 eligibility criteria of IHC 3+ or FISH positive/IHC2+ (Cameron, Casey, Press et al., 2008); distribution between treatment arms was not reported. In an exploratory Kaplan-Meier analysis, investigators found no statistically significant difference between arms (capecitabine with or without lapatinib) in PFS (HR=0.772; 95 percent CI: 0.386–1.543; p=0.46).

Conclusions and Discussion, Key Question 2

Adjuvant trastuzumab. Currently available evidence is inconclusive on outcomes of trastuzumab added to adjuvant chemotherapy for resected HER2-discordant or HER2-negative patients. Evidence on each subgroup may be used to generate hypotheses, but is too weak to test hypotheses, for the following reasons. All available evidence is from post-hoc analyses on subgroups not directly randomized or stratified by the HER2 subgroups of interest. Furthermore, available reports did not show direct comparisons of baseline characteristics and prognostic factors for the specific subgroups compared. Thus, it is uncertain whether the HER2-discordant or HER2-negative subgroups were balanced by treatment arm (i.e., with or without trastuzumab; although treatment arms appeared well-balanced across all patients randomized). Finally, the data used for the two adjuvant studies are from interim analyses, with inadequate followup to estimate median survival for all patients randomized, and inadequate information on median duration of followup in the specific subgroups compared. Thus, although these were large, well-designed and well-conducted randomized, controlled trials, since the overwhelming majority of patients they randomized were unequivocally HER2-positive, only poor quality evidence is presently available on outcomes of adjuvant trastuzumab in either HER2 discordant or HER2 negative patient subgroups.

Adjuvant trastuzumab in HER2-discordant patients. Evidence is unavailable to evaluate effects of trastuzumab specifically for HER2-discordant patients who are FISH positive but IHC negative (0, 1+) by central lab results. Analyses reported from each trial pooled outcomes for these patients with outcomes for those who tested FISH positive and IHC 2+. The latter subset (initially considered equivocal if tested first by IHC) was classified HER2 positive by each trial protocol, and is ultimately classified HER2 positive by algorithms in current guidelines. A more informative analysis limited to the discordant subgroup might compare outcomes with versus without trastuzumab using data pooled from B31 and N9831 on patients who were FISH positive but IHC 0 or 1+ by central lab tests. Results from a systematic review (see Table 6, Key Question 1) estimate this subgroup as 2.4 percent (95 percent CI: 1–4.3 percent) of all breast cancer patients (Dendukuri, Khetani, McIsaac, et al., 2007).

Sample size is insufficient for conclusions from HER2-discordant B31 (total n=31) and N9831 (total n=53) subgroups that tested FISH negative but IHC 3+ by central lab results. The proportion of FISH-negative, IHC 3+ patients is 2.2 percent across both trials (total randomized: 3,822). Results of the systematic review summarized in Table 6 (Key Question 1) estimate this subgroup as 1.2 percent (95 percent CI: 0.6–2.1 percent) of all breast cancer patients (Dendukuri, Khetani, McIsaac, et al., 2007). Although at least three other randomized trials investigated adjuvant trastuzumab, they confirmed eligibility by central or reference laboratory FISH tests before randomizing patients, and have not reported on either of the HER2 discordant subgroups of interest. Thus, large database or registry analyses may be the only source of better evidence on outcomes of adjuvant trastuzumab for the two HER2 discordant subgroups, which together comprise approximately 4 percent of all breast cancer patients.
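The proportions quoted in this paragraph can be reproduced with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and summing the two point estimates is taken to be how the approximate combined figure was reached.

```python
# FISH-negative / IHC 3+ subgroup sizes by central laboratory results.
b31_n = 10 + 21      # +TRZ and -TRZ arms in B31
n9831_n = 23 + 30    # +TRZ and -TRZ arms in N9831
total_randomized = 3822

share_in_trials = 100 * (b31_n + n9831_n) / total_randomized

# Point estimates (percent) from the systematic review cited in the text.
fish_pos_ihc_neg = 2.4   # FISH positive, IHC 0 or 1+
fish_neg_ihc_pos = 1.2   # FISH negative, IHC 3+
combined = fish_pos_ihc_neg + fish_neg_ihc_pos

print(f"{share_in_trials:.1f} percent of patients randomized in B31 and N9831")
print(f"{combined:.1f} percent of all breast cancer patients (both discordant subgroups)")
```

The first figure rounds to the 2.2 percent stated above, and the second, 3.6 percent, is the basis for "approximately 4 percent."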

Factors influencing discordant results. Discordant results may occur if one assay is correct and the other is in error because of preanalytic, analytic, or postanalytic factors (see Key Question 1). As with any assay, 100 percent accuracy cannot be expected even from the most careful and proficient laboratories. Proficiency testing and other quality control and quality assurance measures to minimize false-negative and false-positive results are recommended in current practice guidelines (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). However, concordance of different methods to classify an individual as HER2 positive or negative is at least partly independent from accuracy of performing a specific assay. Even with the most careful and highly accurate laboratory techniques, discordance in classification may occur between a method that detects gene amplification (FISH in these studies, but also true with CISH or SISH) and a method that detects protein overexpression (IHC in these studies, but also true with Western blots).

By current guidelines, clinicians may categorize identical discordant patients differently with respect to HER2 status, depending on the selection and sequence of tests they order. So, for example, FISH-positive and IHC 0 or 1+ patients (1 to 4 percent of cases; see Table 6, Key Question 1) would be classified HER2 positive if tested only by FISH, but would be classified HER2 negative if tested initially by IHC, since reflex FISH would not be performed. Conversely, FISH-negative and IHC 3+ patients (1 to 2 percent of cases; see Table 6) would be considered HER2 negative if tested only by FISH, but HER2 positive if tested initially by IHC. NSABP B31 and NCCTG N9831 report the frequency of these subsets based on careful central laboratory results for FISH and IHC assays, although results are pooled across some IHC scores (see “Results and Conclusions, Key Question 1”). However, these data do not permit assessment of the subset frequencies independent of tissue fixation artifacts that may have occurred at some local hospitals and laboratories, or the margin of error that might exist even in the most proficient laboratories. Nor can the clinical consequences of such discordances be assessed from the available evidence.
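The dependence of the final classification on test selection and sequence can be illustrated with a short sketch. It is a deliberate simplification and not the guideline algorithm itself: FISH is treated as a yes/no result (equivocal FISH values are ignored), reflex FISH is assumed to follow only an IHC 2+ score, and the function name is hypothetical.

```python
def status_by_strategy(ihc_score: int, fish_positive: bool, first_test: str) -> str:
    """Return the HER2 status reached under a simplified testing strategy.

    first_test="FISH": FISH alone decides.
    first_test="IHC":  IHC 3+ is positive, IHC 0 or 1+ is negative,
                       and only IHC 2+ triggers reflex FISH.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2, or 3")
    fish_status = "positive" if fish_positive else "negative"
    if first_test == "FISH":
        return fish_status
    if first_test == "IHC":
        if ihc_score == 3:
            return "positive"
        if ihc_score == 2:
            return fish_status  # reflex FISH resolves the equivocal score
        return "negative"       # IHC 0 or 1+: no reflex FISH performed
    raise ValueError("first_test must be 'FISH' or 'IHC'")


# The two discordant profiles discussed in the text.
profiles = {"FISH positive, IHC 1+": (1, True), "FISH negative, IHC 3+": (3, False)}
for label, (ihc, fish) in profiles.items():
    by_fish = status_by_strategy(ihc, fish, "FISH")
    by_ihc = status_by_strategy(ihc, fish, "IHC")
    print(f"{label}: FISH only -> {by_fish}; IHC first -> {by_ihc}")
```

For both discordant profiles the two strategies return opposite classifications, which is the point made in the paragraph above.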

Adjuvant trastuzumab in HER2-negative patients. Scant but intriguing evidence suggests the hypothesis that some patients currently classified as HER2 negative may benefit from adjuvant trastuzumab. Data reported from B31 showed significantly longer DFS and RFI in FISH-negative IHC ≤2+ patients given trastuzumab than in similar patients managed without trastuzumab, whether the analysis did or did not include those who were IHC 0. However, a similar analysis of data from N9831 did not show significant differences. Since both were interim analyses of trials in which fewer than 25 percent of subjects had reached a failure event, neither provides conclusive evidence as yet, and followup analyses from these trials will be of great interest. Blinded review of IHC and FISH scoring would also be useful for samples from these trials, and from other adjuvant trastuzumab trials that confirmed eligibility by central lab testing before randomizing each patient. Recent guidelines conclude that present evidence does not demonstrate improved outcomes with use of adjuvant trastuzumab for patients who would be classified HER2 negative by protocols of B31, N9831, and similar studies (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007).

Importantly, the B31 and N9831 subgroup analyses combine results for HER2-negative patients many now consider to be different: those with the so-called “triple-negative” subtype (i.e., negative for HER2, estrogen receptor, and progesterone receptor), and the luminal subtypes (luminal A or luminal B) that are negative for HER2 but positive for at least one of the hormone receptors. These subtypes were initially defined in studies using microarrays to subdivide breast cancer patients by gene expression patterns (for reviews, see Peppercorn, Perou, and Carey, 2008; Razzak, Lin, and Winer, 2008; Kang, Martel, and Harris, 2008). There is evidence that the triple negative and luminal subsets differ with respect to prognosis, chemotherapy response, and outcomes (Carey, Dees, Sawyer, et al., 2007; Liedtke, Mazouni, Hess, et al., 2008), and they clearly differ with respect to effects of endocrine therapy. Further complexity comes from reports that there is substantial but incomplete overlap between triple negative patients and those classified in the “basal-like” subset by gene expression arrays (Cheang, Voduc, Bajdik, et al., 2008). Notably, new phase III trials have recently opened (and others are planned) specifically for patients with triple negative or “basal-like” breast cancer (Kilburn, 2008). Results from these studies will likely be more conclusive than analyses that pool all HER2-negative patients to determine outcomes for subsets of HER2-negative breast cancer.

Adjuvant trastuzumab in HER2-equivocal patients. Among patients with initially equivocal HER2 test results by current clinical practice guidelines (those scored 2+ if IHC is first, or HER2 gene copy number from 4.0 to 6.0 or HER2/CEP17 ratio from 1.8 to 2.2 if ISH is first), ultimately, most are definitively categorized as HER2 positive or HER2 negative after guideline-recommended followup testing. Data are presently unavailable either to estimate effects of adjuvant trastuzumab on outcomes for the subset with initially equivocal results subsequently classified HER2 positive, or to demonstrate lack of benefit in those subsequently classified HER2 negative. For the minority who remain equivocal after followup testing, the guidelines' treatment recommendation depends on whether the patient would have been included or excluded from key randomized, controlled trials. For example, patients with HER2/CEP17 ratios 2.0 or greater but less than 2.2 were included and randomized in the adjuvant trastuzumab trials. Therefore, the guidelines consider current evidence insufficient to deny these patients trastuzumab with adjuvant chemotherapy. In contrast, patients with HER2/CEP17 ratios 1.8 or greater but less than 2.0 were excluded from these trials, and the guidelines consider current evidence insufficient to include trastuzumab in their adjuvant therapy regimens. Figures 2 and 3 (see Key Question 1) include information on trial eligibility of patients whose test results are equivocal by each HER2 assay.
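The guideline position for patients whose HER2/CEP17 ratio remains equivocal after followup testing can be sketched as a simple rule. This is an illustrative reading of the cut-points quoted above, not an implementation of the guideline itself: the function name and return strings are hypothetical, and a ratio of exactly 2.2 is treated as outside the ranges because the text describes the trial-included range as "less than 2.2".

```python
from typing import Optional


def position_if_still_equivocal(her2_cep17_ratio: float) -> Optional[str]:
    """Map a persistently equivocal HER2/CEP17 ratio to the guideline position.

    Returns None when the ratio falls outside the two ranges discussed
    in the text (1.8 to <2.0 and 2.0 to <2.2).
    """
    if 2.0 <= her2_cep17_ratio < 2.2:
        # These patients were included and randomized in the adjuvant trials.
        return "evidence insufficient to deny adjuvant trastuzumab"
    if 1.8 <= her2_cep17_ratio < 2.0:
        # These patients were excluded from the adjuvant trials.
        return "evidence insufficient to include adjuvant trastuzumab"
    return None


for ratio in (1.7, 1.9, 2.1, 2.5):
    print(ratio, "->", position_if_still_equivocal(ratio))
```

The asymmetry around 2.0 reflects trial eligibility rather than any biologic threshold: the recommendation follows whichever side of the trial inclusion criterion the patient would have fallen on.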

Advanced or metastatic disease. No data were reported on patients with advanced or metastatic disease and discordant results from IHC and ISH HER2 testing. Evidence is available from one trial (CALGB 9840; n=226) that randomized metastatic breast cancer patients who were HER2 negative by local laboratory testing to chemotherapy with or without trastuzumab (Seidman, Berry, Cirrincione, et al., 2008). Additionally, a small subset of advanced and metastatic patients randomized to chemotherapy with or without lapatinib in another trial (EGF100151; n=74) were found by central lab confirmatory testing not to meet protocol criteria for HER2 positivity (Cameron, Casey, Press, et al., 2008). Thus, one source of good quality evidence (CALGB 9840) and one source of moderate quality evidence (EGF100151) suggest that HER2-negative patients with advanced or metastatic disease do not benefit from treatments targeting the HER2 molecule. Additional evidence supporting this conclusion comes from an analysis of data pooled from three pivotal trials of trastuzumab for metastatic breast cancer. The analysis showed that among patients found IHC 2+ by the presently unavailable “clinical trial assay,” benefit from trastuzumab was limited to those subsequently shown to have amplified HER2 genes by FISH (Mass, Press, Anderson, et al., 2005).

CALGB 15002 investigators compared outcomes with versus without trastuzumab for a subgroup of FISH-negative patients who either had (n=38) or did not have (n=103) polysomy 17 (Kaufman, Broadwater, Lezon-Geyda, et al., 2007). Overall response rate was significantly higher with versus without trastuzumab for those with polysomy 17, but was identical with or without trastuzumab for those without polysomy 17. In contrast, the N9831 study on adjuvant therapy (Reinholz, Jenkins, Hillman, et al., 2007) reported no impact of polysomy 17 on benefit from trastuzumab, and unpublished data from a second study (NSABP B31; Dr. S. Paik, personal communication, May 2008) suggested the same finding. This might be due to different definitions of polysomy 17 for CALGB 15002 (average CEP17 copy number per cell greater than 2.2) and N9831 (more than 3 CEP17 signals in more than 30 percent of nuclei). It might also reflect differences between adjuvant therapy and treatment for metastatic disease with respect to polysomy 17 as a predictor of benefit from trastuzumab. Note also that studies reviewed for “Results and Conclusions, Key Question 1” report conflicting data on a possible association of polysomy 17 with overexpression of HER2 protein. Thus, presently available evidence leaves unanswered questions with respect to the utility of polysomy 17 to select patients for HER2-targeted therapy.
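To show how the two polysomy 17 definitions can disagree on the same specimen, the following minimal sketch applies both to per-nucleus CEP17 signal counts. The function names and the signal counts are hypothetical and chosen only for illustration; "more than 3 signals" is read here as a count of 4 or more.

```python
from typing import Sequence


def polysomy_calgb_15002(cep17_counts: Sequence[int]) -> bool:
    """Polysomy 17 if the average CEP17 copy number per cell exceeds 2.2."""
    return sum(cep17_counts) / len(cep17_counts) > 2.2


def polysomy_n9831(cep17_counts: Sequence[int]) -> bool:
    """Polysomy 17 if more than 30 percent of nuclei show more than 3 CEP17 signals."""
    nuclei_over_three = sum(1 for count in cep17_counts if count > 3)
    return nuclei_over_three / len(cep17_counts) > 0.30


# Hypothetical specimen: 20 nuclei, each with exactly 3 CEP17 signals.
uniform_three = [3] * 20
# Hypothetical specimen: half the nuclei with 4 signals, half with 2.
mixed = [4] * 10 + [2] * 10

print(polysomy_calgb_15002(uniform_three), polysomy_n9831(uniform_three))
print(polysomy_calgb_15002(mixed), polysomy_n9831(mixed))
```

The first specimen has a mean of 3.0 signals per cell and so meets the CALGB 15002 definition, but no nucleus exceeds 3 signals, so it does not meet the N9831 definition. The second specimen meets both.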

Key Question 3a

For breast cancer patients, what is the evidence on clinical benefits and harms of using HER2 assay results to guide selection of chemotherapy regimen?

Study Selection

The search strategy for studies on HER2 testing in breast cancer yielded 3,218 citations. Initial review of titles and abstracts selected 219 citations potentially relevant to Key Question 3 for retrieval and review as full articles. Of these, 161 were considered potentially relevant to Key Question 3a (HER2 status to guide choice of chemotherapy regimen) while 62 were considered potentially relevant to Key Question 3b (HER2 status to guide choice of hormonal therapy regimen). Four reports were considered for both question 3a and 3b.
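The citation counts above reconcile once the reports considered for both questions are counted only once. A short arithmetic check, using only the figures quoted in this paragraph (the variable names are illustrative):

```python
# Counts taken from the study selection paragraph above.
relevant_to_3a = 161
relevant_to_3b = 62
considered_for_both = 4

# Reports considered for both questions would otherwise be counted twice.
unique_full_text_reports = relevant_to_3a + relevant_to_3b - considered_for_both
print(unique_full_text_reports)
```

The result matches the 219 citations retrieved for full-article review.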

Twenty separate studies met selection criteria and were abstracted for Key Question 3a (Table 10; Appendix Table IIIa-A *). Eleven studies investigated adjuvant chemotherapy for resected early stage breast cancer, including nine randomized, controlled trials, an uncontrolled series, and the standard-dose control arm of a randomized, controlled trial of high-dose chemotherapy with autologous stem-cell support (HDC/AuSCS). Six studies investigated neoadjuvant (preoperative) chemotherapy for locally advanced breast cancer; one was a randomized, controlled trial and five were uncontrolled, single-arm series. Three studies investigated first- or second-line therapy for advanced or metastatic breast cancer. Two randomized, controlled trials compared different regimens; the third randomized, controlled trial compared different doses of one drug, but pooled arms for the analysis by HER2 status.

Table 10. Summary design, treatment, patient characteristics, KQ3a.


Available Studies

Eleven studies on postsurgical adjuvant chemotherapy. The available evidence included one retrospective analysis of an uncontrolled single-arm series (Yang, Klos, Zhou, et al., 2003), and ten randomized, controlled trials. However, for one of the randomized, controlled trials (Tanner, Isola, Wiklund, et al., 2006), one arm was excluded, since patients received HDC/AuSCS. Each randomized, controlled trial was designed to compare outcomes of treatment regimens in populations not selected or stratified for HER2 status, and most published earlier reports that compared patients, prognostic factors, and outcomes by treatment arm for all randomized patients. With only one exception (Martin, Pienkowski, Mackey, et al., 2005), reports from randomized, controlled trials included for Key Question 3a were secondary or correlative analyses on patient subgroups with archived tissue samples that permitted HER2 testing. The proportion of originally randomized patients included in the analyses by HER2 status ranged from 34 to 92 percent (see Table 10). A subset of trials compared baseline characteristics and known prognostic factors between the subgroups with known HER2 status and those with undetermined HER2 status, and a smaller subset also compared outcomes. None of these studies used trastuzumab for HER2-positive patients; studies addressing the use of trastuzumab are included in the discussion of Key Question 2.

Studies on the CMF regimen. The uncontrolled series (Yang, Klos, Zhou, et al., 2003; n=94) and one comparative randomized, controlled trial (Gusterson, Gelber, Goldhirsch, et al., 2003; n=2,504 randomized) studied the cyclophosphamide plus methotrexate plus fluorouracil (CMF) regimen. The Gusterson and co-workers trial separately randomized groups of node-negative and node-positive patients. Tissue blocks for determining HER2 status were unavailable for 515 (40 percent) of 1,275 randomized node-negative patients and for 483 (39 percent) of 1,229 randomized node-positive patients. Node-negative patients were randomized to one perioperative cycle of adjuvant CMF or to observation. Node-positive patients were randomized to multiple cycles of adjuvant CMF or to one perioperative cycle of adjuvant CMF. The relevance of these findings for current practice may be limited as taxane-based regimens have largely replaced CMF when anthracyclines are not used, particularly for hormone-receptor-negative patients.

Studies on anthracycline-based regimens. Four randomized, controlled trials (Moliterni, Menard, Valagussa, et al., 2003; Colozza, Sidoni, Mosconi, et al., 2005; Pritchard, Shepherd, O'Malley, et al., 2006; Knoop, Knudsen, Balslev, et al., 2005) compared CMF versus anthracycline-based regimens, and a fifth randomized, controlled trial compared an anthracycline-based regimen without autologous stem-cell support (AuSCS) versus a higher-dose regimen with AuSCS (Tanner, Isola, Wiklund, et al., 2006). Only the non-AuSCS arm of the Tanner and co-workers study met selection criteria for data abstraction. Moliterni, Menard, Valagussa, et al. (2003) compared CMF followed by doxorubicin (CMF→A) versus CMF alone, and included 92 percent of originally randomized patients. Colozza, Sidoni, Mosconi, et al. (2005) compared epirubicin (E) alone versus CMF, and included 76 percent of originally randomized patients. Pritchard, Shepherd, O'Malley, et al. (2006) and Knoop, Knudsen, Balslev, et al. (2005) compared cyclophosphamide plus epirubicin plus fluorouracil (CEF) versus CMF, although the Pritchard and co-workers study gave 6 cycles while the Knoop and co-workers study gave 9 cycles. Pritchard and co-workers included 89 percent of originally randomized patients while Knoop and co-workers included 79 percent. Tanner, Isola, Wiklund, et al. (2006) also gave 9 cycles of CEF in the non-AuSCS arm of their trial, although the doses administered were higher than those in the Pritchard and Knoop trials. Outcomes by HER2 status for 72 percent of those randomized to the non-AuSCS arm are considered a single-arm study in this review.

Two randomized, controlled trials with two reports each compared different doses (Dressler, Berry, Broadwater, et al., 2005; Thor, Berry, Budman, et al., 1998) or dose intensities and schedules (Del Mastro, Bruzzi, Nicolo, et al., 2005; Del Mastro, Bruzzi, Venturini, et al., 2004) for anthracycline-based regimens. The Dressler and co-workers study investigated interaction of HER2 status with dose in 524 patients from the Cancer and Leukemia Group B (CALGB) trial 8541. This trial randomized 1,549 patients to high-dose (600/60/600 mg/m2 every four weeks for 16 weeks), moderate-dose (400/40/400 mg/m2 every four weeks for 24 weeks) or low-dose (300/30/300 mg/m2 every four weeks for 16 weeks) regimens of cyclophosphamide, doxorubicin and fluorouracil (CAF) (Budman, Berry, Cirrincione, et al., 1998). Although earlier reports (Thor, Berry, Budman, et al., 1998; Muss, Thor, Berry, et al., 1994) included different proportions of randomized patients tested for HER2 status by IHC and/or PCR, Dressler and colleagues compared outcomes separately by assay method (IHC, FISH, or PCR) for HER2 status subgroups from each dose arm (n=524, 33.8 percent of originally randomized patients).

In the GONO-MIG-1 study, Del Mastro and colleagues (2004, 2005) randomized 1,214 patients to either six cycles of CEF every three weeks (FEC21) or up to nine cycles at the same dose (600/60/600 mg/m2) every two weeks (FEC14). The analysis by HER2 status included 731 (60 percent) of originally randomized patients.

Studies on regimens with a taxane. Two randomized, controlled trials investigated effects of HER2 status on outcomes of regimens with versus without a taxane (Hayes, Thor, Dressler, et al., 2007; Martin, Pienkowski, Mackey, et al., 2005). Hayes and colleagues (2007; CALGB trial 9344) randomized 3,121 patients to doxorubicin plus cyclophosphamide (AC) followed by paclitaxel or observation. The trial used a 3 × 2 factorial design to compare three doses of doxorubicin in AC, each followed or not by paclitaxel. Since outcomes were not statistically significantly different across doxorubicin doses, the analysis of outcomes with versus without paclitaxel by HER2 status pooled patients from all three doxorubicin doses. Two groups of 750 patients each were randomly selected for this correlative analysis, but tissue blocks were available and analyzed for only 1,322 (42 percent of those originally randomized).

Martin, Pienkowski, Mackey, et al. (2005) stratified patients (n=1,491) by number of involved axillary lymph nodes and randomized them to six three-week cycles of docetaxel plus doxorubicin plus cyclophosphamide (TAC) or fluorouracil plus doxorubicin plus cyclophosphamide (FAC). The preplanned analysis by HER2 status included 1,262 (85 percent) of originally randomized patients. Patients were not stratified by HER2 status. In the TAC group, 20.8 percent were HER2 positive and 15.4 percent lacked tumor specimens for measuring HER2; in the FAC group, 22 percent were HER2 positive and 15.3 percent lacked tumor specimens. The study does not report the distribution of other prognostic factors by treatment group and HER2 status combined, which would be useful in ensuring balance in this subset of trial patients with known HER2 status.
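The proportions of originally randomized patients included in the HER2 analyses can be recomputed from the counts quoted above for four of the adjuvant trials. This is only an arithmetic check on figures already given in the text; the dictionary labels are shorthand introduced here, and percentages are rounded to whole numbers.

```python
# (patients analyzed by HER2 status, patients originally randomized),
# as quoted in the text for each trial.
analyzed_counts = {
    "CALGB 8541 (CAF dose)": (524, 1549),
    "GONO-MIG-1 (FEC14 vs FEC21)": (731, 1214),
    "CALGB 9344 (AC with or without paclitaxel)": (1322, 3121),
    "TAC vs FAC": (1262, 1491),
}


def percent_included(analyzed: int, randomized: int) -> float:
    """Percentage of randomized patients included in the HER2 analysis."""
    return 100.0 * analyzed / randomized


summary = {
    trial: round(percent_included(analyzed, randomized))
    for trial, (analyzed, randomized) in analyzed_counts.items()
}
for trial, pct in summary.items():
    print(f"{trial}: {pct} percent")
```

The rounded values agree with the percentages reported in the text (33.8 percent for CALGB 8541 rounds to 34), and they show how widely tissue availability varied across trials.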

Evidence hierarchy. The first section of Table 11 categorizes available studies on HER2 status and outcomes of adjuvant chemotherapy according to the evidence hierarchy used in this evidence report (see “Methods”). No trials stratified patients by HER2 status or randomized patients to therapy guided or not guided by HER2 status, the highest category of evidence. Only one randomized, controlled trial, which compared TAC versus FAC, reported a preplanned multivariate subgroup analysis (Martin, Pienkowski, Mackey, et al., 2005). Eight randomized, controlled trials (one that compared CMF versus no or minimal CMF, four that compared CMF versus an anthracycline-based regimen, two that compared different doses or schedules of anthracycline-based regimens, and one that compared AC alone versus AC followed by paclitaxel) reported post-hoc multivariate subgroup analyses. Finally, single-arm data from two reports provided univariate analyses by HER2 status.

Table 11. Hierarchy of evidence, KQ3a.


Study quality assessment. The first section of Table 12 shows that, of nine studies that analyzed the relationship of HER2 status to outcome differences in previously completed randomized, controlled trials on adjuvant chemotherapy, each was prospectively designed; included a large, well-defined and representative study population; and treated patients in each study arm homogeneously, or used rule-based selection for non-study therapies. However, only two reports (Dressler, Berry, Broadwater, et al., 2005; Martin, Pienkowski, Mackey, et al., 2005) included a prespecified hypothesis on the relationship of HER2 status to differences between regimens in treatment outcome. Each study adequately described the assays and thresholds used to classify patients' HER2 status, but only five (Colozza, Sidoni, Mosconi, et al., 2005; Dressler, Berry, Broadwater, et al., 2005; Del Mastro, Bruzzi, Nicolo, et al., 2005; Tanner, Isola, Wiklund, et al., 2006; Hayes, Thor, Dressler, et al., 2007) reported that individuals who assessed HER2 status were blinded to patient and tumor factors and to treatment outcomes. Only three studies from randomized, controlled trials (Moliterni, Menard, Valagussa, et al., 2003; Pritchard, Shepherd, O'Malley, et al., 2006; Martin, Pienkowski, Mackey, et al., 2005) included ≥85 percent of originally randomized patients. However, a fourth (Hayes, Thor, Dressler, et al., 2007) randomly selected two large subsets (n=750 each) and separately analyzed more than 85 percent of patients in each. Six studies from randomized, controlled trials (Moliterni, Menard, Valagussa, et al., 2003; Colozza, Sidoni, Mosconi, et al., 2005; Pritchard, Shepherd, O'Malley, et al., 2006; Knoop, Knudsen, Balslev, et al., 2005; Dressler, Berry, Broadwater, et al., 2005; Hayes, Thor, Dressler, et al., 2007) had 9 or more years' median follow-up, but in only one of these (Moliterni, Menard, Valagussa, et al., 2003) was median follow-up approximately 15 years. Reporting of methodologic details for multivariate analyses was inadequate in all studies.

Table 12. Study quality ratings, KQ3a.


Six studies on preoperative neoadjuvant chemotherapy. Six studies, including one randomized, controlled trial and five uncontrolled series, compared outcomes by HER2 status for patients undergoing neoadjuvant (preoperative) chemotherapy. The randomized, controlled trial (Learn, Yeh, McNutt, et al., 2005) randomized patients (n=144) to one of three arms: doxorubicin plus cyclophosphamide (AC), AC plus docetaxel (AC+D), or AC followed by docetaxel after resection (AC→D). Analysis of pathologic outcomes at resection pooled patients from the AC and AC→D arms and compared these versus the AC+D arm. The secondary, unplanned analysis by HER2 status included 104 (72 percent) of originally randomized patients.

Two uncontrolled series, one prospective (n=232, Arriola, Moreno, Varela, et al., 2006) and the other retrospective (n=67, Park, Kim, Lim, et al., 2003), reported on patients given doxorubicin alone. One uncontrolled retrospective series (n=97, Zhang, Yang, Smith, et al., 2003) reported on patients given three to six cycles of fluorouracil plus doxorubicin plus cyclophosphamide (FAC). A similar uncontrolled, retrospective series (n=77, Tinari, Lattanzio, Natoli, et al., 2006) reported on patients given three to six cycles of fluorouracil plus epirubicin plus cyclophosphamide. Finally, one uncontrolled retrospective series (n=54, Tulbah, Ibrahim, Ezzat, et al., 2002) reported on patients given three or four cycles of paclitaxel plus cisplatin. Each series reported outcomes by HER2 status for all patients (n=232 for the Arriola and co-workers series; n<100 for each of the others).

Evidence hierarchy. As shown in Section 2 of Table 11, no studies on neoadjuvant chemotherapy reported either of the two highest evidence categories. The only study from a randomized, controlled trial on neoadjuvant chemotherapy (Learn, Yeh, McNutt, et al., 2005) reported a post-hoc multivariate subgroup analysis. Two series (Park, Kim, Lim, et al., 2003; Zhang, Yang, Smith, et al., 2003) reported post-hoc multivariate subgroup analyses, while three series reported univariate analyses only.

Study quality assessment. Section 2 of Table 12 shows that only two studies (one a randomized, controlled trial) on neoadjuvant chemotherapy were prospectively designed (Learn, Yeh, McNutt, et al., 2005; Arriola, Moreno, Varela, et al., 2006), and only one reported a prespecified hypothesis for the relationship of HER2 status to outcome of neoadjuvant chemotherapy (Arriola, Moreno, Varela, et al., 2006). Only one study (Arriola, Moreno, Varela, et al., 2006) included ≥100 patients. Four of six (Arriola, Moreno, Varela, et al., 2006; Park, Kim, Lim, et al., 2003; Tulbah, Ibrahim, Ezzat, et al., 2002; Tinari, Lattanzio, Natoli, et al., 2006; but not Learn, Yeh, McNutt, et al., 2005) adequately described the assays and thresholds used to classify patients' HER2 status, but only two (Tulbah, Ibrahim, Ezzat, et al., 2002; Tinari, Lattanzio, Natoli, et al., 2006) reported HER2 assays were scored by assessors blinded to patient and tumor characteristics and treatment outcomes. Patients in each study were treated homogeneously, and each series, but not the randomized, controlled trial (Learn, Yeh, McNutt, et al., 2005), reported on all enrolled patients. Follow-up was not an issue for any study on neoadjuvant therapy, since the outcome of interest was pathologic responses at resection. Reporting of methodologic details for multivariate analyses was inadequate in all studies.

Three studies on chemotherapy for advanced or metastatic breast cancer. Each was a secondary analysis from a randomized, controlled trial designed to compare outcomes of treatment regimens in populations not selected or stratified for HER2 status, and each published earlier reports comparing outcomes by treatment arm for all randomized patients. One randomized, controlled trial (n=474, Harris, Broadwater, Lin, et al., 2006; CALGB 9342) randomized patients with stage IV or inoperable disease undergoing first- or second-line therapy to three different doses of paclitaxel. The analysis of outcomes by HER2 status included 35 percent of originally randomized patients, and pooled data across all three doses. Thus, Harris and co-workers (2006) was considered a single-arm study in this systematic review.

A second randomized, controlled trial (n=326, Di Leo, Chan, Paesmans, et al., 2004) randomized patients to doxorubicin alone (A) or docetaxel alone (T). Eligibility required patients to have metastatic disease and to have failed prior CMF (either as adjuvant therapy or for metastasis), but no prior exposure to either of the randomized drug therapies. The analysis by HER2 status included 54 percent of originally randomized patients. The third randomized, controlled trial (n=516, Konecny, Thomssen, Luck, et al., 2004) randomized patients to first-line therapy for metastatic disease with either epirubicin plus cyclophosphamide (EC) or epirubicin plus paclitaxel (ET). Up to one prior hormonal therapy for metastasis was permitted, with patients stratified by prior hormonal therapy. The analysis by HER2 status included 53 percent of originally randomized patients.

Evidence hierarchy. As shown in Section 3 of Table 11, no studies on advanced or metastatic disease reported evidence of the two highest categories. Two randomized, controlled trials (Di Leo, Chan, Paesmans, et al., 2004; Konecny, Thomssen, Luck, et al., 2004) reported post-hoc multivariate subgroup analyses. The third study, a pooled analysis across trial treatment arms (Harris, Broadwater, Lin, et al., 2006), reported only a univariate analysis.

Study quality assessment. Section 3 of Table 12 shows that each of the three included studies on HER2 status as a predictor of chemotherapy outcomes for advanced or metastatic breast cancer was designed prospectively, but none reported a prespecified hypothesis for the effect of HER2 status on outcomes. Each study included a large, well defined, and representative study population, adequately described the HER2 assays and thresholds used to classify patients' HER2 status, and treated patients in each study arm homogeneously. Only two of three (Harris, Broadwater, Lin, et al., 2006; Di Leo, Chan, Paesmans, et al., 2004) reported blinding HER2 assessors to patient and tumor characteristics and to treatment outcomes. Each omitted 15 percent or more of enrolled patients from the analysis of outcomes by HER2 status, and each omitted key methodologic details on their multivariate analyses from the published reports. Long-term follow-up was available in only one study (Harris, Broadwater, Lin, et al., 2006), and one did not report the median duration of follow-up (Konecny, Thomssen, Luck, et al., 2004).

Patient Characteristics

Eleven studies on postsurgical adjuvant chemotherapy. Although all investigated adjuvant chemotherapy, the eleven studies varied with respect to their patient groups' distributions of baseline characteristics and risk factors for recurrent disease (Appendix Tables IIIa-B and IIIa-C *, Table 10). Only a subset of these studies compared the HER2 positive and negative subgroups for baseline characteristics and risk factors. Also, only a subset of the nine randomized, controlled trials compared patients included in the analysis by HER2 status with those excluded because tissue blocks were missing or unsuitable.

Studies on CMF. Of the two CMF studies, the retrospective series by Yang, Klos, Zhou, et al. (2003) pooled data for node-negative and node-positive patients, groups that Gusterson, Gelber, Goldhirsch, et al. (2003) randomized separately to different treatment arm pairs. Yang, Klos, Zhou, et al. (2003) only reported baseline characteristics and risk factors for all patients analyzed. Gusterson, Gelber, Goldhirsch, et al. (2003) compared HER2-positive versus HER2-negative patients separately for the node-positive and node-negative groups, but did not compare those with known HER2 status versus those lacking tissue blocks for HER2 assays. In node-negative patients, HER2 positivity was statistically significantly associated with larger tumor size, hormone-receptor negativity, and higher tumor grade. In node-positive patients, HER2 positivity was statistically significantly associated with menopausal status, hormone-receptor negativity, and higher tumor grade.

Studies on regimens with versus without an anthracycline. Three (Colozza, Sidoni, Mosconi, et al., 2005; Pritchard, Shepherd, O'Malley, et al., 2006; Tanner, Isola, Wiklund, et al., 2006) of five studies comparing adjuvant regimens with versus without an anthracycline compared baseline characteristics of HER2 positive and negative subgroups. Three (Colozza, Sidoni, Mosconi, et al., 2005; Knoop, Knudsen, Balslev, et al., 2005; Tanner, Isola, Wiklund, et al., 2006) explored whether subgroups tested for HER2 status were similar to the total study population or the subgroup not tested. Two trials (Moliterni, Menard, Valagussa, et al., 2003; Pritchard, Shepherd, O'Malley, et al., 2006) determined HER2 status on 92 percent or 89 percent, respectively, of the patients originally randomized and did not report comparisons to all or omitted patients. Each trial's full treatment arms were well balanced for baseline characteristics and prognostic factors.

Moliterni, Menard, Valagussa, et al. (2003) did not report data comparing baseline factors by HER2 status. All patients in this trial had one to three positive nodes, and approximately 65 percent had tumors smaller than 2.1 cm in diameter. Colozza, Sidoni, Mosconi, et al. (2005) reported that treatment arms were well balanced, whether comparing all patients randomized or only those tested for HER2 status. However, significantly more patients randomized to epirubicin than to CMF were HER2 positive (41 percent versus 28 percent, p=.03). Progesterone receptor positivity was the only factor statistically significantly associated with HER2 positivity. This trial included node-positive and node-negative patients (4 or more positive nodes in less than 25 percent), and approximately 45 percent with tumors 2 cm or smaller in diameter.

Pritchard, Shepherd, O'Malley, et al. (2006) reported baseline characteristics of patients tested for HER2 status were similar to those of all randomized patients, but did not show data for this comparison. They showed data comparing FISH-positive and FISH-negative subgroups; except for a shift toward younger age in the FISH-positive subgroup, there were no significant differences. Just over half the patients in this trial had T2 or T3 tumors, all had positive lymph nodes, with four or more positive nodes in 37 percent and 43 percent of the FISH-negative and FISH-positive groups, respectively. Knoop and co-workers (2005) reported that among all patients tested for HER2 status, treatment arms were well balanced for prognostic factors. However, they did not report comparing the HER2-positive versus HER2-negative patients, either by treatment arms or across treatments. Tumors were larger than 2 cm diameter in approximately 60 percent of patients, and approximately 30 percent had four or more positive nodes. Tanner, Isola, Wiklund, et al. (2006) reported (but did not show data) that baseline characteristics of all patients tested for HER2 status did not differ from those of the entire trial cohort. They showed that baseline characteristics were similar for HER2-tested subgroups from each arm. However, the AuSCS arm was excluded from this review, and data were not reported comparing baseline characteristics of HER2-positive versus HER2-negative patients from the FEC arm.

Studies on dose or dose intensity of anthracycline-based regimens. Studies from randomized, controlled trials that compared dose (Dressler, Berry, Broadwater, et al., 2005) or dose intensity (Del Mastro, Bruzzi, Nicolo, et al., 2005) of anthracycline-based regimens reported baseline characteristics and prognostic factors of patients with known HER2 status were similar to those of patients omitted from the analyses, since HER2 status was unknown. Dressler and co-workers (2005) did not report data comparing baseline characteristics or prognostic factors of HER2-positive versus HER2-negative patients. Del Mastro and co-workers (2005) found a greater proportion of HER2-positive than HER2-negative patients lacking expression of both estrogen and progesterone receptors (62 percent versus 32.5 percent). Other baseline characteristics and prognostic factors were similar between subgroups by HER2 status and between treatment arms.

Studies on regimens with versus without a taxane. One of two studies from randomized, controlled trials on regimens with versus without a taxane compared baseline characteristics and prognostic factors of patients with known HER2 status versus those of patients with unknown HER2 status. The trial comparing paclitaxel versus observation after AC (Hayes, Thor, Dressler, et al., 2007) showed similar baseline characteristics, prognostic factors and overall survival in the two subgroups they randomly selected and tested for HER2 status (n=643 and 679, respectively). These subgroups were also similar to all treated patients (n=3,121), and to all non-tested patients (n=1,799). Tumor diameter was 2 cm or smaller in approximately 35 percent, and approximately 54 percent had 4 or more positive nodes. The randomized, controlled trial that compared TAC versus FAC (Martin, Pienkowski, Mackey, et al., 2005) only compared patient characteristics and prognostic factors by treatment arm for all patients randomized. Neither study compared HER2-positive versus HER2-negative patients, either pooled across treatments or by treatment arm.

Six studies on preoperative neoadjuvant chemotherapy. The randomized, controlled trial on neoadjuvant therapy (Learn, Yeh, McNutt, et al., 2005) did not compare treatment arms or patient subgroups by HER2 status (neither known versus unknown nor positive versus negative) with respect to baseline characteristics or prognostic factors. This study only reported patient and tumor characteristics for all randomized patients.

Only one (Tulbah, Ibrahim, Ezzat, et al., 2002) of the five included series compared baseline characteristics and prognostic factors for HER2-positive and HER2-negative subgroups. Across all five studies, approximately 55 percent to 65 percent of included patients were positive for estrogen receptors, and 45 percent to 55 percent were positive for progesterone receptors. However, the study samples varied somewhat with respect to tumor size and number of positive nodes. The series reported by Arriola, Moreno, Varela, et al. (2006) included 30 percent T2 and 70 percent T3 tumors, with 60 percent of patients node negative and 40 percent N1. Most patients (91 percent) in the series reported by Park, Kim, Lim, et al. (2003) had tumors between 5 and 10 cm in diameter; nodal status was not reported. Zhang, Yang, Smith, et al. (2003) included a few patients (13 percent) with T1 tumors, and approximately 33 percent node-negative patients. Most patients in the Tulbah, Ibrahim, Ezzat, et al. (2002) series had T3 or larger tumors, and approximately 55 percent had N1 disease. These investigators reported generally well-balanced HER2-positive and HER2-negative subgroups. Finally, 75 percent of patients in the Tinari, Lattanzio, Natoli, et al. (2006) series had tumors with diameters between 2 and 5 cm; number of positive nodes was not reported.

Three studies on chemotherapy for advanced or metastatic breast cancer. Each of three included randomized, controlled trials reported that baseline characteristics and prognostic factors for the subgroup tested for HER2 status were similar to those of patients not tested. However, none compared HER2-positive versus HER2-negative subgroups, either separately by treatment arm or across arms.

Harris, Broadwater, Lin, et al. (2006) reported that the only statistically significant difference between patients tested for HER2 (and other biomarkers) and those not tested was a shorter disease-free interval among those tested (19 versus 31 months, p=.0003). Investigators attributed this difference to the discarding of tissue blocks after 10 years, so that patients with blocks remaining tended to have a shorter interval from diagnosis to metastasis. Hormone-receptor status (positive in 58 percent) and median number of metastatic sites (one) were the only prognostic factors reported among those tested for HER2 status. The analysis by HER2 status pooled patients across three trial arms randomized to different paclitaxel doses.

Di Leo, Chan, Paesmans, et al. (2004) showed the subgroups tested for HER2 status from each treatment arm were similar to each other and to the untested patients. Approximately half the included patients had three or more sites of disease, and more than three fourths had visceral involvement. They did not report hormone receptor status.

Konecny, Thomssen, Luck, et al. (2004) reported no statistically significant differences in baseline characteristics or prognostic factors between groups tested for HER2 and those not tested from each treatment arm compared separately. However, the HER2-positive and HER2-negative groups were not directly compared, either separately by treatment arm or pooled across arms.

Results, Key Question 3a

Eleven studies on postsurgical adjuvant chemotherapy

Studies on CMF. Both studies on CMF reported superior outcomes in HER2-negative compared with HER2-positive patients (see Tables 13 and 14). The Gusterson, Gelber, Goldhirsch, et al. (2003) trial used proportional hazards models to compare hazard ratios (HR) for disease-free survival (DFS) and overall survival (OS) after no cycles versus one cycle of CMF in node-negative patients; neither HR was statistically significant. They also compared multiple versus single cycles of CMF in node-positive patients. Results favored multiple cycles and were statistically significant for the HER2-negative subgroup, but were not statistically significant for HER2-positive patients:

  • OS, HER2- (n=406 multiple; n=200, one): HR=0.69, 95 percent CI: 0.52–0.92; p=.01
  • OS, HER2+ (n=85 multiple; n=55, one): HR=1.15, 95 percent CI: 0.62–1.54; p, NS
  • DFS, HER2- (n=406 multiple; n=200, one): HR=0.57, 95 percent CI: 0.46–0.72; p<.0001
  • DFS, HER2+ (n=85 multiple; n=55, one): HR=0.77, 95 percent CI: 0.51–1.16; p, NS
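
The subgroup results above can also be compared directly with a test for interaction. The sketch below is illustrative only and is not an analysis reported by Gusterson and colleagues. It assumes the two subgroups are independent, that each reported 95 percent CI is symmetric on the log scale, and that a normal approximation holds; the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def log_hr_and_se(hr, ci_low, ci_high):
    """Return log(HR) and its standard error recovered from a 95 percent CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(hr), se

def ratio_of_hazard_ratios(hr_a, ci_a, hr_b, ci_b):
    """Compare two independent subgroup HRs.

    Returns (ratio of HRs, CI low, CI high, two-sided p value).
    """
    log_a, se_a = log_hr_and_se(hr_a, *ci_a)
    log_b, se_b = log_hr_and_se(hr_b, *ci_b)
    diff = log_a - log_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return math.exp(diff), math.exp(diff - Z95 * se), math.exp(diff + Z95 * se), p

# DFS, multiple versus single cycle of CMF, values as listed above
ratio, low, high, p = ratio_of_hazard_ratios(0.77, (0.51, 1.16), 0.57, (0.46, 0.72))
print(f"ratio of HRs (HER2+ / HER2-) = {ratio:.2f}, "
      f"95% CI {low:.2f}-{high:.2f}, p = {p:.2f}")
```

Under these assumptions the interval for the ratio of hazard ratios includes 1.00. This illustrates why a statistically significant effect in one subgroup and a nonsignificant effect in the other do not by themselves demonstrate an interaction.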

Table 13. Summary time to event outcomes, KQ3a.

Table 14. Summary tumor response, KQ3a.

The Yang, Klos, Zhou, et al. (2003) uncontrolled series (n=94) reported that at 5 years, DFS in the HER2-negative subgroup was superior to DFS in the HER2-positive subgroup (n=60, 86 percent versus n=34, 53 percent; log rank p<.1; stratified log rank, p=.002 after adjustment for nodal status).

Studies on regimens with versus without an anthracycline. Only one (Pritchard, Shepherd, O'Malley, et al., 2006) of four included randomized, controlled trials comparing regimens with versus without an anthracycline reported superior outcomes with the anthracycline regimen that reached statistical significance for HER2-positive but not HER2-negative patients. Pritchard, Shepherd, O'Malley, et al. (2006) used multivariate analysis (MVA) to test for an interaction of comparative treatment effect with HER2 status. The study compared CEF versus CMF and reported the following results for OS and relapse-free survival (RFS):

  • OS, HER2- (n=237, CEF; n=228, CMF): HR=1.06, 95 percent CI: 0.83–1.44; p, NS
  • OS, HER2+ (n=75, CEF; n=88, CMF): HR=0.65, 95 percent CI: 0.42–1.02; p=.06
  • OS, treatment by HER2 interaction from MVA: HR=2.04, 95 percent CI: 1.14–3.65, p=.02
  • RFS, HER2- (n=237, CEF; n=228, CMF): HR=0.91, 95 percent CI: 0.71–1.18; p, NS
  • RFS, HER2+ (n=75, CEF; n=88, CMF): HR=0.52, 95 percent CI: 0.34–0.80; p=.003
  • RFS, treatment by HER2 interaction from MVA: HR=1.96, 95 percent CI: 1.15–3.65; p=.01

The other trials reported no statistically significant differences for any subgroups they compared. Moliterni, Menard, Valagussa, et al. (2003) compared CMF alone versus CMF followed by doxorubicin (CMF→A) in HER2-positive (n=50, CMF; n=45, CMF→A) and HER2-negative (n=208, CMF; n=203, CMF→A) subgroups. Confidence intervals spanned 1.00 and HRs were not statistically significant for either outcome (OS, RFS) in either subgroup. With Cox MVA, treatment by HER2 interaction terms were:

  • OS: HR=0.48, p=.052
  • RFS: HR=0.68, p, NS

Colozza, Sidoni, Mosconi, et al. (2005) compared CMF versus epirubicin alone (E), in HER2-positive (n=37, CMF; n=54, E) and HER2-negative (n=96, CMF; n=79, E) subgroups. Log rank analyses of Kaplan-Meier survival curves showed a statistically significant difference in OS at 8 years after CMF favoring HER2-negative over HER2-positive patients: (87.4 +/- 3.4) percent versus (67.6 +/- 7.7) percent, p=.024. All other subgroup comparisons were not statistically significant, and Cox MVA interaction terms for treatment effect by HER2 status also were not statistically significant.

Knoop, Knudsen, Balslev, et al. (2005) compared CMF versus CEF in HER2-positive (n=143, CMF; n=120, CEF) and HER2-negative (n=293, CMF; n=249, CEF) subgroups. For both OS and RFS, hazard ratios from Cox multivariate analyses (stratified by tumor grade, estrogen receptor and TOP2A status; and adjusted for tumor size, nodal and menopausal status) uniformly spanned 1.00 and were not statistically significant for either HER2-positive or HER2-negative subgroups.

The Tanner, Isola, Wiklund, et al. (2006) study showed separate Kaplan-Meier curves for HER2-positive (n=56) and HER2-negative (n=124) subgroups from the tailored FEC arm for both OS and RFS. However, they did not report statistical significance of differences between these HER2 status subgroups (although they reported statistical significance of differences between HER2 status subgroups treated by HDC/AuSCS versus subgroups treated with tailored FEC).

Studies on dose or dose intensity of anthracycline-based regimens. In one of two included studies, multivariate proportional hazards analysis showed statistically significant interaction of anthracycline-based regimen dose or dose-intensity with HER2 status to predict outcome. Dressler, Berry, Broadwater, et al. (2005) compared DFS after high-, moderate-, or low-dose CAF regimens in HER2-positive and HER2-negative subgroups. They reported separate MVAs using FISH, IHC, or PCR to classify patients' HER2 status. Results for DFS at five years comparing high-dose versus low-dose plus moderate-dose CAF subgroups were:

  • HER2/FISH (n=91, HER2+; n=433, HER2-): HR=0.822 (95 percent CI: 0.553–1.220)
  • HER2/IHC (n=127, HER2+; n=396, HER2-): HR=0.834 (95 percent CI: 0.590–1.181)
  • HER2/PCR (n=91, HER2+; n=400, HER2-): HR=0.732 (95 percent CI: 0.507–1.056)
  • HER2/FISH, interaction CAF dose by HER2: HR=0.919 (95 percent CI: 0.814–1.038); p=.033
  • HER2/IHC, interaction CAF dose by HER2: HR=0.418 (95 percent CI: 0.188–0.930); p=.0003
  • HER2/PCR, interaction CAF dose by HER2: HR=0.585 (95 percent CI: 0.253–1.352); p=.043

Investigators stated (but did not report HRs, CIs, or p values) that MVA yielded similar results for statistically significant interaction of CAF dose with HER2 status to predict OS.

Del Mastro, Bruzzi, Nicolo, et al. (2005) compared outcomes after identical doses of FEC administered every 14 days (FEC14) or every 21 days (FEC21). Multivariate proportional hazards analysis showed that interaction terms for HER2 status by randomly assigned treatment (dose intensity or treatment frequency) were not statistically significant for EFS (HR=0.53; p=.12) or OS (HR=0.646; p=.379). HER2 status (HER2-positive, n=103; HER2-negative, n=628) was a statistically significant predictor of EFS (HR=2.04, p=.005) and OS (HR=2.41, p=.006), while randomly assigned treatment (FEC14, n=370; FEC21, n=361) was not a statistically significant predictor of either outcome (EFS, HR=0.85, p=.335; OS, HR=0.72, p=.379).

Studies on regimens with versus without a taxane. One of two included studies reported statistically significant interaction of HER2 status with added paclitaxel to predict treatment outcome. Hayes, Thor, Dressler, et al. (2007) compared outcomes with versus without paclitaxel (following AC) in HER2-negative and HER2-positive subgroups, separately for each of two groups they randomly selected for HER2 testing. For each group, OS and DFS for HER2-positive patients given paclitaxel were superior to the same outcomes in HER2-positive patients not given paclitaxel. In contrast, OS and DFS for HER2-negative patients given paclitaxel appeared similar to the same outcomes for HER2-negative patients not given paclitaxel. They used Cox multivariate analyses, separately in each randomly selected group, and in the two groups combined, to test the statistical significance of an interaction term for HER2 positivity and paclitaxel treatment. Results for Group 2 and for Groups 1 and 2 pooled showed a statistically significant interaction favoring paclitaxel treatment in HER2-positive patients:

  • Group 1, n=643: recurrence, HR=0.63, p=.15; death, HR=0.61, p=.17
  • Group 2, n=679: recurrence, HR=0.52, p=.03; death, HR=0.52, p=.03
  • Groups 1+2, n=1,322: recurrence, HR=0.59, p=.01; death, HR=0.57, p=.01
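
The interaction terms above are listed with p values but without confidence intervals. As a rough illustration (not a result reported by Hayes and colleagues), an approximate interval can be back-calculated if one assumes each p value comes from a two-sided Wald test on the log hazard ratio. Because the p values are rounded, the output is only indicative. The helper name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_ci_from_p(hr, p, level=0.95):
    """Approximate CI for an HR from its two-sided p value (Wald test assumed)."""
    norm = NormalDist()
    z_obs = norm.inv_cdf(1 - p / 2)        # test statistic implied by p
    se = abs(math.log(hr)) / z_obs         # implied standard error of log(HR)
    z_crit = norm.inv_cdf(0.5 + level / 2)
    return (math.exp(math.log(hr) - z_crit * se),
            math.exp(math.log(hr) + z_crit * se))

# Interaction HRs for recurrence, values as listed above
for label, hr, p in [("Group 1", 0.63, 0.15),
                     ("Group 2", 0.52, 0.03),
                     ("Groups 1+2", 0.59, 0.01)]:
    low, high = approx_ci_from_p(hr, p)
    print(f"{label}: HR={hr}, approximate 95% CI {low:.2f}-{high:.2f}")
```

The approach only restates the information already carried by the HR and p value; it makes the imprecision of each interaction estimate easier to see.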

Hayes, Thor, Dressler, et al. (2007) also investigated whether patients' estrogen-receptor status modified the impact of HER2 status on outcomes of paclitaxel. The researchers reported results of an exploratory analysis suggesting that, among HER2-positive patients, paclitaxel improved DFS whether patients were estrogen-receptor negative or positive. However, among HER2-negative patients, paclitaxel apparently improved DFS for ER-negative patients but not for ER-positive patients. HER2-negative, ER-positive patients comprised more than 50 percent of the patients in this study. However, the authors caution that additional prospective studies are needed to validate this finding before clinical practice changes and HER2-negative, ER-positive patients are no longer offered taxanes.

Martin, Pienkowski, Mackey, et al. (2005) compared DFS in patients randomized to AC plus docetaxel (TAC, n=745; HER2 positive, 155; HER2 negative, 475; HER2 unknown, 115) versus AC plus fluorouracil (FAC, n=746; HER2 positive, 164; HER2 negative, 468; HER2 unknown, 114). Subgroup analyses using a Cox proportional hazards model adjusted for age, tumor size and other prognostic factors showed superior outcomes with TAC compared to FAC for all subgroups, including by known HER2 status. A test for interaction of HER2 status with treatment effect, using the ratio of hazard ratios, was not statistically significant (ratio of HRs=0.85; p=.41).

Six studies on preoperative neoadjuvant chemotherapy. The primary outcomes of interest for studies on neoadjuvant (preoperative) therapy are pathologic complete (pCR) and partial (PR) response rates, although clinical responses (cCR, cPR) also are considered. One randomized, controlled trial compared responses after neoadjuvant chemotherapy regimens (AC) with versus without added docetaxel (AC+D) (Learn, Yeh, McNutt, et al., 2005). Rates of cPR were similar with each regimen for HER2-positive (22 percent of each subgroup; AC, n=32; AC+D, n=9) and HER2-negative (24 percent of each subgroup; AC, n=37; AC+D, n=26) patients. Multivariate logistic regression analysis of overall clinical responses (ORR = cCR+cPR) showed a statistically significant increase with added docetaxel in HER2-negative patients (AC, ORR=51 percent; AC+D, ORR=81 percent; p<.05) but not in HER2-positive patients (AC, ORR=75 percent; AC+D, ORR=78 percent; p, NS). However, investigators did not report inclusion of an interaction term in their analysis.

Although two (Zhang, Yang, Smith, et al., 2003; Tulbah, Ibrahim, Ezzat, et al., 2002) of five uncontrolled series did report OS and/or DFS outcomes, these may have been influenced by postsurgical treatments that were not identical for all patients. Three of five series reported statistically significantly higher likelihood of response in the HER2-positive subgroups. Arriola, Moreno, Varela, et al. (2006) evaluated clinical and pathologic responses after preoperative treatment with doxorubicin alone. Although they did not report response rates for the HER2-positive (n=43) and HER2-negative (n=180) subgroups, a Mann-Whitney U test showed p=.03 for association of HER2 positivity with pCR. Park, Kim, Lim, et al. (2003) also investigated preoperative therapy with doxorubicin alone. They reported statistically significantly higher pCR (16 percent versus 0) and PR (71 percent versus 47 percent) in the HER2-positive (n=31) than the HER2-negative (n=36) subgroups, p=.013 by Fisher's exact test.

The study reported by Tinari, Lattanzio, Natoli, et al. (2006) compared marker assay results in paired core biopsy specimens (pre-chemotherapy) and resected tumors (post-chemotherapy), and focused primarily on changes induced by anthracycline-based neoadjuvant chemotherapy in HER2 and topoisomerase IIα (TopIIα) expression. However, they also used multivariate logistic regression analysis to compare pathologic tumor responses (TR, defined as either a pCR or minimal residual disease) in HER2 subgroups by core biopsy assays. Tinari and colleagues (2006) reported a 5.28-fold higher likelihood (95 percent CI: 1.57–19.6; p=.008) of achieving TR in HER2-positive than in HER2-negative patients.

Zhang, Yang, Smith, et al. (2003) investigated preoperative FAC in HER2-positive (n=28) and HER2-negative (n=69) patients. While overall clinical response rate was higher for the HER2-positive than the HER2-negative subgroup (CR+PR: 93 percent versus 78 percent), the risk ratio for response was not statistically significant (RR=1.2, 95 percent CI: 1.1–1.4, p=.14, Fisher's exact test). Overall pathologic response rates (pCR plus minimal residual disease, MRD) showed an even smaller difference between HER2-positive and HER2-negative subgroups that also was not statistically significant (18 percent versus 13 percent, RR=1.4, 95 percent CI: 0.54–3.67, p=.53, Fisher's exact test). Tulbah, Ibrahim, Ezzat, et al. (2002) investigated preoperative paclitaxel plus cisplatin in HER2-positive (n=21) and HER2-negative (n=31) subgroups. Pathologic complete response rates did not differ significantly between the groups (29 percent versus 23 percent; p=NS).

Three studies on chemotherapy for advanced or metastatic breast cancer. One of three studies did not compare different regimens and pooled data across arms randomized to different paclitaxel doses (Harris, Broadwater, Lin, et al., 2006); one compared monotherapy with doxorubicin (A) versus monotherapy with docetaxel (T) (Di Leo, Chan, Paesmans, et al., 2004); and one compared epirubicin plus cyclophosphamide (EC) versus epirubicin plus paclitaxel (ET) (Konecny, Thomssen, Luck, et al., 2004).

Harris, Broadwater, Lin, et al. (2006) used log rank analysis to compare Kaplan-Meier curves for OS between HER2-positive and HER2-negative patients, separately for test results by three different HER2 assays: CB11 IHC, the HercepTest™ IHC, and FISH. Differences between the curves were not statistically significant for any comparison. They also compared overall response rates (ORR=CR+PR) for subgroups defined by each HER2 assay. Results were statistically significant (HER2-positive, n=46, ORR=35 percent; HER2-negative, n=105, ORR=18 percent; p=.026) only with the HercepTest™ assay, and only when both 2+ and 3+ scores were considered HER2 positive.

Di Leo, Chan, Paesmans, et al. (2004) compared OS and time to progression (TTP) in patients randomized to A or T in HER2-positive (A, n=15; T, n=21) and HER2-negative (A, n=63; T, n=50) subgroups. There were no statistically significant differences between treatment arms for either outcome in either HER2 status subgroup. In contrast, ORR statistically significantly favored T over A in the HER2-positive subgroup (T, n=21, ORR=67 percent versus A, n=15, ORR=27 percent; OR=5.50, 95 percent CI: 1.28–23.69; p=.04). However, the difference was not statistically significant for the HER2-negative subgroup (T, n=50, ORR=40 percent versus A, n=63, ORR=35 percent; OR=1.24, 95 percent CI: 0.58–2.68; p=.70).
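
The odds ratios listed for Di Leo and colleagues can be approximately reproduced from the response rates. The sketch below assumes responder counts inferred from the rounded percentages (14 of 21 on T and 4 of 15 on A in the HER2-positive subgroup; 20 of 50 on T and 22 of 63 on A in the HER2-negative subgroup) and a Woolf (log-based) confidence interval. The source does not state which method the investigators used, and the function name is hypothetical.

```python
import math

def odds_ratio_woolf(resp_t, n_t, resp_a, n_a, z=1.96):
    """Odds ratio (T versus A) with a Woolf log-based 95 percent CI."""
    a, b = resp_t, n_t - resp_t   # responders, nonresponders on T
    c, d = resp_a, n_a - resp_a   # responders, nonresponders on A
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high

# Responder counts are assumptions inferred from the reported percentages
her2_pos = odds_ratio_woolf(14, 21, 4, 15)
her2_neg = odds_ratio_woolf(20, 50, 22, 63)
for label, (or_, low, high) in [("HER2+", her2_pos), ("HER2-", her2_neg)]:
    print(f"{label}: OR={or_:.2f}, 95% CI {low:.2f}-{high:.2f}")
```

With these assumed counts, the point estimates and intervals agree with the reported values to two decimal places. The wide interval for the HER2-positive subgroup reflects its small size (n=36).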

Konecny, Thomssen, Luck, et al. (2004) compared HER2-positive (EC, n=49; ET, n=48) and HER2-negative (EC, n=88; ET, n=90) subgroups randomized to EC or ET for OS and PFS. With the EC regimen, OS (median, 33.1 versus 16.4 months, log rank p=.01) and PFS (median, 10.4 versus 7.1 months, log rank p=.01) were significantly greater among HER2-positive than among HER2-negative patients. In each other comparison (OS or PFS; for the ET regimen by HER2 status, or for EC versus ET separately in subgroups by HER2 status) the difference was not statistically significant. Univariate chi square tests suggested each ORR difference was statistically significant (between all HER2-positive versus all HER2-negative patients, and separately by treatment arm and HER2 status subgroups; excluding those randomized to EC by HER2 subgroups). However, the interaction of treatment effect with HER2 status was not statistically significant (p=.256) by multivariate logistic regression.

Conclusions and Discussion, Key Question 3a

Across all three treatment settings (adjuvant, neoadjuvant, advanced/metastatic), currently available evidence comparing chemotherapy outcomes in HER2-positive and HER2-negative patient subgroups may be used to generate hypotheses, but is too weak to test hypotheses. Only one study (on adjuvant therapy; Martin, Pienkowski, Mackey, et al., 2005) is from a randomized, controlled trial that prespecified a multivariate subgroup analysis by HER2 status. Investigators reported that the interaction of assigned treatment (with versus without docetaxel) with HER2 status to predict outcome was not statistically significant (ratio of HRs=0.85; p=.41).

All other evidence is from post-hoc analyses on subgroups not directly randomized, selected, or stratified by HER2 status. All other reports from randomized, controlled trials were secondary or correlative analyses on patient subgroups with archived tissue samples available for HER2 testing. Many compared baseline characteristics and prognostic factors of patients with known versus unknown HER2 status, sometimes separately by treatment arm, but more often pooled across treatment arms. However, since few directly compared baseline characteristics and prognostic factors for HER2-positive and HER2-negative subgroups separately from each arm, it is uncertain whether these subgroups were well balanced. A minority of studies reported multivariate analyses that tested the statistical significance of interactions between treatment effects of different regimens and HER2 status.

Evidence on adjuvant CMF chemotherapy. Evidence from two studies (one randomized, controlled trial and one series) suggests HER2-positive patients may derive quantitatively smaller benefit from CMF (smaller improvements in OS and DFS) than experienced by HER2-negative patients. However, such evidence cannot prove that CMF provides no benefit to HER2-positive patients.

Evidence on adjuvant anthracycline therapy. An analysis from one of four randomized, controlled trials reports a statistically significant interaction between use of a regimen that includes an anthracycline and HER2 status as outcome predictors. Data from this study suggest HER2-positive patients (but not HER2-negative patients) experience a statistically significant improvement in outcome from inclusion of an anthracycline in their treatment regimen. Again, this does not prove that HER2-negative patients do not benefit from anthracycline therapy. Given the highly statistically significant result favoring anthracycline therapy for the large population of breast cancer patients included in the Early Breast Cancer Trialists' Collaborative Group (EBCTCG 2005) overview analysis, a more complete test of this hypothesis is needed before one can conclude that omitting anthracyclines from adjuvant chemotherapy regimens does not worsen outcome in HER2-negative patients. The absence of a statistically significant interaction in three other randomized, controlled trials is not informative, given the differences in specific treatment regimens, populations studied, and small numbers in the HER2-positive subgroups.

Two trials compared different doses or dose intensities (frequencies) of anthracycline-based regimens. One (Dressler, Berry, Broadwater, et al., 2005) reported a statistically significant interaction of CAF dose with HER2 status to predict treatment outcome, whether HER2 status was based on FISH, IHC, or PCR assays. Data from this study suggested the highest of three CAF doses (now considered by many oncologists the standard dose for all patients) improved outcomes for HER2-positive patients, but suggested no benefit from the highest dose for HER2-negative patients. In contrast, the interaction of dose intensity (frequency) with HER2 status to predict treatment outcome was not statistically significant in a second randomized, controlled trial (Del Mastro, Bruzzi, Nicolo, et al., 2005). Available data are too weak to conclude that HER2-positive patients clearly experience better outcomes with the higher-dose or dose-intensity anthracycline-based regimens.

Evidence on adding paclitaxel to adjuvant AC chemotherapy. A correlative analysis from one randomized, controlled trial (Hayes, Thor, Dressler, et al., 2007) provides evidence that adding paclitaxel after AC improves OS and DFS for HER2-positive patients, but may not improve these outcomes for HER2-negative patients. Here again, these strongly suggestive data are too weak by themselves to conclude that use of paclitaxel in adjuvant regimens is not beneficial in HER2-negative patients. Additionally, the only trial with a prespecified multivariate subgroup analysis (Martin, Pienkowski, Mackey, et al., 2005) reported that the interaction of concurrently added docetaxel with HER2 status was not statistically significant.

The potential interaction between HER2 status, estrogen receptor status, and progesterone receptor status as predictors of chemotherapy efficacy is receiving increasing attention. The Hayes, Thor, Dressler, et al. (2007) article is the only included study on chemotherapy for breast cancer that addresses this issue, although the analysis only includes HER2 status and ER status. In an exploratory analysis, the authors found that adding paclitaxel improved survival for all HER2-positive patients and for HER2-negative/ER-negative patients, but not for HER2-negative/ER-positive patients. As discussed in the Conclusions and Discussion for Chapter 2, many researchers are investigating breast cancer subtypes identified by different combinations of ER, PR, and HER2, including the so-called “triple-negative” subtype (i.e., negative for HER2, estrogen receptor, and progesterone receptor), and the luminal subtypes (luminal A or luminal B) that are negative for HER2 but positive for at least one of the hormone receptors. There is evidence that the triple negative and luminal subsets differ with respect to prognosis, chemotherapy response, and outcomes (Carey, Dees, Sawyer et al., 2007; Liedtke, Mazouni, Hess et al., 2008), and they clearly differ with respect to effects of endocrine therapy. New phase III trials for patients with triple negative or “basal-like” breast cancer (Kilburn, 2008) should provide more insight in the future.

Systematic reviews on adjuvant chemotherapy. Recent systematic reviews and meta-analyses on HER2 status to predict chemotherapy outcomes were reported by Gennari and colleagues (Gennari, Sormani, Pronzato, et al., 2008) and by Pritchard and colleagues (Pritchard, Messersmith, Elavathil, et al., 2008; Dhesy-Thind, Pritchard, Messersmith, et al., 2008). Gennari and co-workers (2008) pooled data from eight randomized trials that compared adjuvant regimens with versus without an anthracycline (four of which did not meet selection criteria for this review). Two (NSABP B11, Paik, Bryant, Park, et al., 1998; NSABP B15, Paik, Bryant, Tan-Chiu, et al., 2000) considered patients HER2-positive if membranes of any tumor cells showed antibody staining by IHC, a threshold for HER2 positivity inconsistent with the ASCO/CAP and NCCN guidelines. Substantial numbers of patients from these early (but otherwise well done) randomized, controlled trials may have been classified as HER2 positive who would now be classified as HER2 negative using the currently recommended thresholds. Thus, pooling data from these analyses with later analyses that used current IHC scoring criteria to classify patients may bias the outcome comparisons. We excluded a third study included by Gennari and colleagues (2008) since it was only published as an abstract, without slides available on the web (De Laurentiis, Caputo, Massarelli, et al., 2001). We excluded a fourth study they included (Di Leo, Gancberg, Larsimont, et al., 2002), since patients were not treated identically within each arm and patients with unknown hormone receptor status were given tamoxifen. We repeated the Gennari, Sormani, Pronzato, et al. (2008) meta-analysis using the same studies the authors included and obtained the same results. We then repeated the analysis including only the studies that met criteria for the current review, which meant excluding the four studies mentioned above. Removing these studies widened the confidence intervals, but did not alter the overall conclusions.

The systematic reviews and meta-analyses reported by Pritchard and colleagues (Pritchard, Messersmith, Elavathil, et al., 2008; Dhesy-Thind, Pritchard, Messersmith, et al., 2008) also included randomized, controlled trials that did not meet selection criteria for this review. In addition to the four discussed above, we excluded three trials on anthracycline-based regimens that were reported only as meeting abstracts but without slides, audio or video available on the web to provide full access to presented data (Petruzelka, Pribylova, Vedralova, et al., 2000; Vera, Albanell, Lirola, et al., 1999; Arnould, Fargeot, Bonneterre, et al., 2003; Bonneterre, Roche, Kerbrat, et al., 2003). We also excluded one fully published study in which patients were not treated identically within each arm (Di Leo, Larsimont, Gancberg, et al., 2001) and a second fully published study on high-dose chemotherapy with autologous stem-cell transplant that did not report data by HER2 status separately for the conventional-dose arm (Rodenhuis, Bontenbal, van Hoesel, et al., 2006).

The Gennari and co-workers (2008) meta-analysis reports statistically significant improvement in DFS (six trials included) and OS (seven trials included) of HER2-positive patients given an anthracycline compared to the same outcomes for HER2-positive patients not given an anthracycline (HR for relapse=0.71, 95 percent CI: 0.61–0.83; p<.001; HR for death=0.73, 95 percent CI: 0.62–0.85; p<.001). In contrast, including an anthracycline apparently did not statistically significantly improve DFS or OS for patients with HER2-negative disease (HR for relapse=1.00, 95 percent CI: 0.90–1.11; p=.75; HR for death=1.03, 95 percent CI: 0.92–1.16; p=.60). The meta-analysis reported by Pritchard and co-workers (2008) included the same six trials for DFS and the same seven trials for OS, and reported pooled results (hazard ratios, confidence intervals) identical to those reported by Gennari and co-workers (2008). These analyses support the need for more definitive tests of the hypothesis that the balance of potential benefit versus harm of anthracyclines in HER2-negative patients may not justify their use. Furthermore, as discussed in Key Question 2 and in this section, future analyses and new studies should probably subdivide the HER2-negative group, and analyze subsets who are triple-negative (or “basal-like”) separately from those who are positive for one or both hormone receptors (luminal A or B).

Pritchard, Messersmith, Elavathil, et al. (2008) also reported a meta-analysis on DFS that included three randomized, controlled trials comparing higher-dose or intensity versus lower-dose or intensity anthracycline regimens: two are included here (Dressler, Berry, Broadwater, et al., 2005; Del Mastro, Bruzzi, Nicolo, et al., 2005), and one we excluded (Di Leo, Larsimont, Gancberg, et al., 2001). They found significant improvement of DFS at higher doses for HER2-positive patients (HR=0.54; 95 percent CI: 0.38–0.79) but not for HER2-negative patients (HR=0.98; 95 percent CI: 0.78–1.22). However, a test for the interaction of anthracycline regimen dose or dose intensity with HER2 status to predict DFS was not statistically significant. Thus, present evidence is too weak to support conclusions about HER2 status as a sole predictor of differences in outcome between higher- and lower-dose anthracycline-based regimens. Longer-term data on potential toxicities (particularly decreased ejection fraction and congestive heart failure) of the higher doses are also needed.

Pritchard, Messersmith, Elavathil, et al. (2008) reported on a final meta-analysis that pooled results on DFS from two randomized, controlled trials on adjuvant therapy (Hayes, Thor, Dressler, et al., 2007; Martin, Pienkowski, Mackey, et al., 2005) and one on neoadjuvant therapy (Learn, Yeh, McNutt, et al., 2005) that compared taxane-containing versus non-taxane-containing regimens. While all three trials were included in this systematic review, the validity of pooling them for meta-analysis seems uncertain. Postsurgical therapy in the Learn, Yeh, McNutt, et al. (2005) trial may have affected DFS and may not have been uniform in all three arms. The meta-analytic results suggest the magnitude of benefit from including a taxane in the regimen may be greater for HER2-positive patients (HR=0.60; 95 percent CI: 0.46–0.78) than for HER2-negative patients (HR=0.83; 95 percent CI: 0.71–0.98). However, these results also show statistically significant evidence of benefit for each group from including a taxane in the regimen. Thus, the evidence is presently too weak to support conclusions on HER2 status as a sole predictor of whether or not any subgroup of breast cancer patients benefits from paclitaxel therapy.

These meta-analyses were thorough and used appropriate methodologies. The difference in the trials included in the meta-analyses versus the current systematic review is due to varying prespecified inclusion and exclusion criteria, which are a matter of opinion. The main concern regarding the meta-analyses is their relevance to current practice. The current ASCO/CAP guidelines recommend a different approach to measuring HER2 status than used in the trials incorporated into the meta-analyses, which is why we chose not to perform a formal meta-analysis. Whether and how the change in measurement of HER2 status alters the results of the trials and meta-analyses is unknown since necessary data are unavailable.

Evidence on neoadjuvant chemotherapy. Available evidence on whether HER2 status affects rates of complete pathologic response (pCR) to neoadjuvant chemotherapy is limited to four uncontrolled series (retrospective analysis in three). Although two of four reported statistically significantly higher pCR rates in HER2-positive than HER2-negative patients, these data are too weak to conclude that the regimens tested are of no benefit to HER2-negative patients. Furthermore, data are lacking to directly compare any neoadjuvant regimens. Since a number of trials have already compared different neoadjuvant therapies, correlative studies using archived tissue samples may be useful. However, it is also possible that conclusions on relative benefits of different regimens from studies in the adjuvant setting may generalize to the neoadjuvant setting.

Evidence on chemotherapy for advanced disease. Evidence also is limited on differences by HER2 status for outcomes of chemotherapy for advanced or metastatic disease. Three randomized, controlled trials investigated different treatments: one studied paclitaxel alone (at different doses), one studied an anthracycline alone versus a taxane alone, and one studied an anthracycline plus cyclophosphamide versus an anthracycline plus a taxane. Small patient groups limited statistical power.

In summary, although present evidence is suggestive, it is too weak to determine, in the adjuvant, neoadjuvant, or metastatic disease setting, whether a more favorable balance of benefit versus risk from chemotherapy can be achieved by selecting patients for anthracycline- or taxane-based regimens based on HER2 status.

Research needs. Future trials that compare adjuvant chemotherapy regimens with versus without an anthracycline, or with versus without a taxane, could determine HER2 status at the time of diagnosis, and stratify randomization by HER2 assay results. This approach might provide more definitive tests for the hypotheses that neither an anthracycline nor a taxane improves outcomes of HER2-negative patients. Another possibility is for the EBCTCG to collect individual patient data on HER2 status using current scoring thresholds from all trials that compared adjuvant regimens with versus without an anthracycline, or with versus without a taxane. If sufficient tumor samples are available, this might be a more efficient and more definitive approach for testing hypotheses on the interaction of HER2 status with assigned treatment to predict outcome. Future analyses should also obtain more complete information on estrogen and progesterone receptor status of all patients. This would enable investigators to further subdivide the HER2-negative subset, so that triple-negatives (or those with “basal-like” breast cancer if gene array data were obtained) can be analyzed separately from the luminal A and B subtypes.

Key Question 3b

For breast cancer patients, what is the evidence on clinical benefits and harms of using HER2 assay results to guide selection of hormonal therapy?

Study Selection

Of the 219 articles retrieved for Question 3, 66 were assessed for potential relevance to Question 3b. Only six articles met the selection criteria. The primary reasons for article exclusion were as follows: not reporting outcomes identified in selection criteria; not reporting outcomes by HER2 status; nonidentical treatment of patients; measurement of HER2 status inconsistent with current specialty society recommendations; lack of primary data; or inclusion of only HER2-positive patients, only HER2-negative patients, or fewer than 20 HER2-positive cases.

Two of the studies that did not meet the selection criteria were by Berry, Muss, Thor, et al. (2000) and by Ellis, Coop, Singh, et al. (2001). The first uses data from the CALGB 8541 trial, and data from this trial are included in the previous section on chemotherapy for breast cancer. It is excluded here because while the chemotherapy regimens were randomized across patients, the use of tamoxifen was not. Rather, tamoxifen was prescribed based on clinician preferences. Its use increased over time after recommendations for its use in ER-positive, postmenopausal women were released during the course of the trial and as the percentage of postmenopausal women recruited also rose. Although the study by Ellis, Coop, Singh, et al. (2001) on the neoadjuvant use of letrozole versus tamoxifen reportedly affected clinical practice, it is excluded from this systematic review for two reasons: it reported on clinical response (breast palpation) rather than the more definitive pathological response, and it used a broader definition of HER2 positivity (IHC scores of 2+ and 3+ were designated as positive, without any further evaluation of IHC 2+ scores using FISH).

Four of the six studies that met selection criteria investigated outcomes of tamoxifen, while two others compared an aromatase inhibitor (letrozole or anastrozole) to tamoxifen (Tables 15 and 16). No studies on selective estrogen receptor modulators met selection criteria. Five of the studies were secondary analyses by HER2 status of randomized, controlled trials, while the sixth was a prospective, uncontrolled series. One of the secondary analyses addressed neoadjuvant therapy; four focused on adjuvant therapy; and the uncontrolled series reported on metastatic disease. None of these studies used trastuzumab for HER2-positive patients; studies addressing the use of trastuzumab were reviewed in Chapter 2.

Table 15. Hierarchy of evidence, KQ3b.

Table 16. Summary study quality assessment, KQ3b.

The neoadjuvant study (von Minckwitz, Sinn, Raab, et al., 2007) was a secondary analysis of a randomized trial comparing a chemotherapy regimen (doxorubicin and docetaxel) with or without the addition of tamoxifen. The four secondary analyses of randomized trials of adjuvant therapy included comparisons of (1) letrozole versus tamoxifen (Rasmussen, Regan, Lykkesfeldt, et al., 2008; Mauriac, Keshaviah, Debled, et al., 2007); (2) anastrozole versus tamoxifen (Dowsett, Allread, Knox, et al., 2008); (3) tamoxifen plus radiotherapy versus radiotherapy alone (Knoop, Bentzen, Nielsen, et al., 2001); and (4) tamoxifen versus no tamoxifen following mastectomy or breast-conserving surgery plus radiotherapy (Ryden, Jirstrom, Bendahl, et al., 2005). The study of metastatic disease (Arpino, Green, Allred, et al., 2004) was a prospective, uncontrolled series of HER2-positive or HER2-negative patients given tamoxifen. Study hierarchy, quality assessment, summary descriptions, and results are summarized in Tables 15–19; detailed abstraction data can be found in Appendix Tables IIIb-A–IIIb-K*.

Table 17. Summary design, enrollment and treatment, KQ3b.

Table 18. Summary time to event outcomes, KQ3b.

Table 19. Summary tumor response and quality of life, KQ3b.

Patient Characteristics

Patients in the von Minckwitz, Sinn, Raab, et al. (2007) neoadjuvant trial had unilateral primary breast carcinoma at least 3 cm in largest diameter with no distant metastases or inflammatory disease. They comprised 194 of the 250 patients in the GEPARDO [German Preoperative Adriamycin-Docetaxel] trial. The average age was 48 years and 51 percent (control [Cx] group) to 57 percent (tamoxifen [TAM] group) were premenopausal. Forty-seven percent (Cx) to 53 percent (TAM) had clinically positive lymph nodes, and all had a Karnofsky score of at least 70 percent. For hormone-receptor status, 53 percent (TAM) to 59 percent (Cx) were ER-positive, while 35 percent (TAM) to 44 percent (Cx) were PR positive. HER2 status was measured centrally using IHC, and a HercepTest™ score of 3+ was considered positive. About 24 percent of the participants were HER2 positive.

Patients in the Rasmussen, Regan, Lykkesfeldt, et al. (2008) study comprised 3,533 of the 4,922 patients in the monotherapy arms of the BIG 1–98 trial. They were postmenopausal with early stage invasive cancer. The median age was around 60 years, and about 37 percent (HER2-negative patients) to 45 percent (HER2-positive patients) had tumors larger than 2 cm. Fewer than half had positive lymph nodes (42 percent for HER2-negative patients; 47 percent for HER2-positive patients). The median estrogen receptor level was 85 for HER2-positive patients and 90 for HER2-negative patients (p<0.0001), while the median progesterone receptor level was 10 in HER2-positive patients and 70 in HER2-negative patients (p<0.0001). HER2 positivity was defined as amplification by FISH or HercepTest™ 3+ by IHC (in 0.5 percent of patients with no FISH result). Seven percent of the patient population was HER2-positive.

Patients in the Dowsett, Allread, Knox, et al. (2008) study comprised 1,782 of the 5,880 patients in the monotherapy arms of the ATAC trial; most were from the United Kingdom. Sixty-seven percent of the patients had prior radiotherapy; 9 percent, prior chemotherapy; and 3 percent, tamoxifen prior to surgery. The median age was 63 years; and all of the women were postmenopausal. Sixty-seven percent had tumors that were no larger than 2 cm; 66 percent had negative lymph nodes; and all were hormone receptor positive (78 percent were PR+). HER2-positivity was defined by a score of 3+ on IHC or 2+ on IHC plus FISH amplification. Ten percent of the patients in the study were HER2-positive.

Patients in the Knoop, Bentzen, Nielsen, et al. (2001) adjuvant study were postmenopausal with a median age of 66 years. They had a high risk of recurrence, defined as having positive axillary lymph node(s), tumor larger than 5 cm diameter, or skin/deep fascial involvement. Sixty-six percent of the patients were estrogen-receptor (ER) positive, and 43 percent, progesterone-receptor (PR) positive.

In the original randomized, controlled trial, the Danish Breast Cancer Cooperative Group's 77c protocol, patients were randomized to receive tamoxifen three times daily for a year or to observation. All patients were also treated with mastectomy, lower axillary lymph node dissection, and radiotherapy. In the secondary analysis, data on HER2 status were available on a subset (n=1,515, 88 percent) of those in the original trial. Eighteen percent of these patients were HER2 positive by IHC, but approximately 11 percent had IHC results roughly comparable to a 3+ score by HercepTest™*. However, the proportions of HER2-positive patients differed between the arms of the trial: 8 percent of patients in the tamoxifen arm were HER2 positive, while 14 percent of those in the control arm were HER2 positive (p=0.001).

Patients in the Ryden, Jirstrom, Bendahl, et al. (2005) and Ryden, Landberg, Stal, et al. (2007) adjuvant trial had Stage II invasive cancer and included 470 of the 564 patients in the original trial. The median age was 45 years, and all were premenopausal or younger than 50 years old. The median tumor size ranged from 22 in the control group to 25 in the tamoxifen group. Both hormone-receptor-positive and hormone-receptor-negative patients were included. Fifty-four percent of patients in the tamoxifen group and 57 percent of patients in the control group were both ER positive and PR positive; 30 percent and 26 percent, respectively, were both ER negative and PR negative; the remainder were either ER negative/PR positive or ER positive/PR negative. Approximately 70 percent of the patients had positive lymph nodes. Patients were randomized to tamoxifen for two years versus no tamoxifen. Patients also underwent mastectomy or breast-conserving surgery plus radiotherapy. Fewer than 2 percent of patients, evenly distributed across arms in the original trial, received additional chemotherapy (n=8) or goserelin (n=1).

Data on HER2 status were available on 428 patients, or 76 percent of the original trial participants. The authors reported that baseline prognostic factors were similar in the groups with and without archived pathological specimens available for the secondary analysis. HER2 status was measured by FISH, using a cutoff of six signals/tumor cell (13 percent of patients were HER2 positive) and by IHC using a cutoff of 3+ on the HercepTest™ (15 percent were HER2 positive). The correlation between IHC 3+ and FISH amplification was r=0.82 (p<0.001); κ=0.84.
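For readers unfamiliar with the κ statistic quoted above, the following minimal sketch shows how Cohen's κ is computed from a 2×2 cross-classification of IHC and FISH results. The cell counts are hypothetical: they are not taken from the study and were chosen only so that the marginal positivity rates resemble those reported (about 15 percent by IHC and 13 percent by FISH among 428 patients). The function name is also an illustrative assumption.

```python
def cohen_kappa(both_pos, ihc_only, fish_only, both_neg):
    """Cohen's kappa for agreement between two binary assays."""
    n = both_pos + ihc_only + fish_only + both_neg
    # Observed agreement: cases where both assays give the same call.
    observed = (both_pos + both_neg) / n
    ihc_pos = both_pos + ihc_only
    fish_pos = both_pos + fish_only
    # Agreement expected by chance, from the marginal totals.
    expected = (ihc_pos * fish_pos + (n - ihc_pos) * (n - fish_pos)) / n**2
    return (observed - expected) / (1 - expected)

# HYPOTHETICAL counts for illustration only (not study data).
kappa = cohen_kappa(both_pos=52, ihc_only=12, fish_only=4, both_neg=360)
print(round(kappa, 2))
```

Because κ corrects observed agreement for the agreement expected by chance, it can be high even when, as here, most cases are negative on both assays.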

Patients with metastatic disease in the Arpino, Green, Allred, et al. (2004) single-arm study were drawn from the Southwest Oncology Group's (SWOG) protocol 8228 and ancillary study 9314. Approximately 60 percent of the patients were younger than 65 years old, and approximately 14 percent were premenopausal. All patients were ER positive; 78 percent of the HER2-positive and 96 percent of the HER2-negative patients were PR positive. Patients received tamoxifen twice daily as first-line therapy until disease progression.

Data on HER2 status were available on 136 patients, or about 39 percent of the original study participants. HER2 status was measured by FISH with a cutoff of HER2/CEP17 ratio of 2 or more (24 percent of patients were HER2 positive) and by IHC with a cutoff of complete membrane staining in 10 percent or more of tumor cells (21 percent of patients were HER2 positive), but only the FISH results were used in this analysis.

Outcomes Reported and Followup

The outcome for the neoadjuvant study (von Minckwitz, Sinn, Raab, et al., 2007) was pathological complete response, and surgery was performed within 14–28 days after chemotherapy was completed. In the two studies on the BIG 1–98 trial, Mauriac, Keshaviah, Debled, et al. (2007) assessed time to early tumor recurrence (TETR), defined as a recurrence within 2 years, which was also the median followup, while Rasmussen, Regan, Lykkesfeldt, et al. (2008) reported on disease-free survival with a median followup of 51 months. In the comparison of anastrozole versus tamoxifen from the ATAC trial, Dowsett, Allread, Knox, et al. (2008) examined time to recurrence; the duration of followup was unclear, possibly 68 months. The only outcome reported in the Knoop, Bentzen, Nielsen, et al. (2001) adjuvant study was disease-free survival (DFS); the duration of followup was not reported, but the tables included estimates of DFS at 10 years. The Ryden, Jirstrom, Bendahl, et al. (2005) adjuvant trial reported only recurrence-free survival (RFS) and had a median followup of 14 years for patients without a breast cancer event. The Arpino, Green, Allred, et al. (2004) uncontrolled study on metastatic disease reported overall response rates (ORR; sum of complete plus partial responses), time to failure (TTF), and overall survival (OS). “Nearly all” of the tumor blocks were more than 10 years old; some were more than 20 years old.

Results by Hierarchy Level, Study Quality Assessment

Randomization stratified on HER2/randomized to whether treatment was guided by HER2. No studies of this type were identified.

Randomized trial, prespecified multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, post-hoc multivariate subgroup analysis. Five of the six studies that met the selection criteria were post-hoc analyses of randomized controlled trials. The only neoadjuvant study compared pathological tumor response in patients receiving doxorubicin and docetaxel with or without tamoxifen (von Minckwitz, Sinn, Raab, et al., 2007). The pCR rate among ER-positive and HER2-positive patients was 0 percent for those receiving tamoxifen versus 9 percent for those not receiving it; among HER2-negative patients the corresponding numbers were 24 percent and 21 percent. The numbers were small, however. There were only 25 ER-positive and HER2-positive patients, with 1 pCR, while there were 61 ER-positive but HER2-negative patients, with 14 pCRs. In a multivariate logistic regression model including menopausal status, tumor size, grade, and nodal status, the odds ratio for HER2 was 3.66 (95 percent CI: 0.69–19.30, p=.126). Analysis of the interaction term between HER2 status and treatment group was not reported. Consequently, the study confirms that the prognosis is poorer in HER2-positive patients, but it does not indicate whether or not tamoxifen is more or less effective in HER2-positive versus HER2-negative patients.
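As a rough consistency check on the odds ratio reported above, the p value can be back-calculated from the confidence interval. The sketch below assumes the interval is a Wald-type interval built on the log scale (estimate ± 1.96 standard errors), which the study report does not state; the function name is hypothetical and the calculation is not part of the original analysis.

```python
import math

def p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate two-sided p value for a ratio measure (OR, HR, RR).

    Assumes the 95% CI is symmetric on the log scale (an assumption).
    """
    # Standard error of the log estimate, recovered from the CI width.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Odds ratio for HER2 and its 95 percent CI, as reported in the text.
p = p_from_ratio_ci(3.66, 0.69, 19.30)
print(round(p, 2))
```

Under these assumptions the back-calculated value is approximately 0.13, in line with the reported p=.126, so the interval and the p value appear mutually consistent.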

Two studies compared the use of an aromatase inhibitor versus tamoxifen. In secondary analyses of the BIG 1–98 trial, disease-free survival and time to early tumor recurrence were examined. Rasmussen, Regan, Lykkesfeldt, et al. (2008) reported a hazard ratio of letrozole versus tamoxifen among HER2-positive patients of 0.62 (95 percent CI: 0.37–1.03) and among HER2-negative patients of 0.72 (95 percent CI: 0.59–0.87). While the numerical values of the hazard ratios are similar, the result for HER2-negative patients is statistically significant, while that for HER2-positive patients is not. The number of HER2-positive patients is 239, much smaller than the 3,294 HER2-negative patients. Mauriac, Keshaviah, Debled, et al. (2007) report that the time to early tumor recurrence does not appear to be statistically significantly different by treatment group in either HER2-positive or HER2-negative patients, and the HER2 status/treatment group interaction term in a multivariate analysis is not statistically significant. Consequently, this study suggests that letrozole increases disease-free survival among HER2-negative patients relative to tamoxifen, but it does not provide evidence on a greater effect among HER2-positive patients.
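The point that one subgroup result being statistically significant and the other not does not by itself show a difference between subgroups can be illustrated with an approximate test comparing the two hazard ratios directly. The sketch below is not the analysis performed by the study authors; it assumes independent subgroups and Wald-type 95 percent intervals on the log scale, and all function names are hypothetical.

```python
import math

Z_CRIT = 1.959964  # normal quantile for a 95 percent CI

def log_se(lower, upper):
    """Standard error of a log hazard ratio, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_CRIT)

def interaction_p(hr_a, ci_a, hr_b, ci_b):
    """Two-sided p value for the difference between two subgroup log HRs.

    Assumes independent subgroups and Wald-type intervals (assumptions).
    """
    diff = math.log(hr_a) - math.log(hr_b)
    # Standard errors of independent estimates combine in quadrature.
    se = math.hypot(log_se(*ci_a), log_se(*ci_b))
    return math.erfc(abs(diff / se) / math.sqrt(2))

# Letrozole versus tamoxifen, DFS: HER2-positive then HER2-negative.
p_interaction = interaction_p(0.62, (0.37, 1.03), 0.72, (0.59, 0.87))
print(round(p_interaction, 2))
```

Under these assumptions the p value is well above 0.05, which is in line with the reading that the two hazard ratios do not differ demonstrably and that the wider interval in HER2-positive patients reflects the much smaller subgroup.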

In the secondary analysis of the ATAC trial, Dowsett, Allread, Knox, et al. (2008) compare the effect of anastrozole and tamoxifen by HER2 status. They examine time to treatment recurrence by HER2 status and report hazard ratios of HER2-negative versus HER2-positive patients of 2.25 (p=.0018) for anastrozole and 3.27 (p<.0001) for tamoxifen. These results demonstrate that HER2-positive patients have a poorer prognosis than HER2-negative patients but do not compare the effectiveness of each treatment within each HER2 group. The authors report that there is “no indication of a greater differential of anastrozole over tamoxifen in the HER-2-positive patients. However, there were only 44 events in the HER-2-positive group, so the CIs are wide.” No further details of the analysis are provided. In the multivariate analysis, no analysis of an interaction term between HER2 status and treatment group is reported.

Two studies compared patients treated with tamoxifen versus a control group; they both included a multivariate analysis. Table 20 summarizes results reported by Knoop, Bentzen, Nielsen, et al. (2001) from their secondary analysis on outcomes of adjuvant tamoxifen by HER2 status in hormone-receptor-positive patients. The results showed that patients who were HER2 negative or low HER2 positive had statistically significantly longer disease-free survival when they were treated with tamoxifen; the difference in survival (with versus without tamoxifen) was not statistically significant for patients who were high HER2 positive.

Table 20. Summary results for DFS in Knoop, Bentzen, Nielsen, et al. (2001).

A multivariate Cox model was constructed that included tumor size, proportion node positive, histologic grade, p53 value, EGFR, HER2, tamoxifen, and interactions between tamoxifen and p53, HER2, and EGFR. The coefficients for HER2 and for the interaction term for HER2 and tamoxifen were not statistically significant (specific p values and coefficients not reported for these variables). Node positive proportion (RR=1.011), grade (RR=1.103), p53 (RR=1.54), and tamoxifen (RR=0.73) were statistically significant at p<.01. In other words, after controlling for other variables, HER2 was not a statistically significant predictor for outcomes of treatment with tamoxifen in this study.

The results of the secondary analysis of the adjuvant trial by Ryden, Jirstrom, Bendahl, et al. (2005) are summarized in Table 21. All patients were ER positive. No result was statistically significant.

Table 21. Summary results for RFS in Ryden, Jirstrom, Bendahl, et al. (2005).

The authors also reported that among untreated patients, the difference in outcome between HER2-positive and HER2-negative patients (measured with either IHC or FISH; in both univariate and multivariate Cox proportional hazard models) was not statistically significant. In contrast, the marker VEGFR2 was a statistically significant predictor of outcome of tamoxifen treatment. In a univariate analysis among ER-positive/PR-positive patients with HER2 status measured using IHC, the duration of RFS was longer among tamoxifen-treated patients than controls in the HER2-negative subgroups (p=.03) but not among HER2-positive (p=.3) patients.

In a multivariate Cox model, the interaction term between treatment (tamoxifen versus control) and HER2 status was not statistically significant when the model was run for ER-positive patients (p=.4) or ER-positive/PR-positive patients (p=.3). The covariates in the model were not clearly listed but probably included age, tumor size, nodal status, Nottingham histologic grade, tamoxifen, and the interaction term.

Randomized trial, treatment by HER2 subgroup analysis. No studies of this type were identified.

Single-arm study, prespecified multivariate analysis. No studies of this type were identified.

Single-arm study, post-hoc multivariate analysis. The prospective but uncontrolled study on use of tamoxifen for metastatic disease by Arpino, Green, Allred, et al. (2004) compared outcomes for HER2-positive versus HER2-negative patients. ORR was 56 percent for HER2-negative patients and 47 percent for HER2-positive patients (χ² test, p=NS). Median TTF was 7 months for HER2-negative patients versus 5 months for HER2-positive patients (log rank p=.007). Finally, median OS was 31 months for HER2-negative patients versus 25 months for HER2-positive patients (log rank p=.07). While all of the patients were ER positive, median ER levels were lower in HER2-positive than in HER2-negative patients.

Multivariate, partially nonparametric Cox models for TTF and OS included menopausal status, disease-free interval, ER and PR levels, HER1 status, and HER2 status. HER2-positive status was not a statistically significant predictor of either TTF or overall survival. HER1 status, premenopausal status, and disease-free interval before recurrence were statistically significant predictors of TTF, while ER and PR levels and disease-free interval prior to recurrence were significant predictors of OS. The hazard ratios for HER2-positive versus HER2-negative subgroups were 1.15 (p=.54) for TTF and 0.99 (p=.97) for OS. Therefore, after controlling for other factors, this study provided no evidence of a difference in outcomes after treatment with tamoxifen between HER2-positive and HER2-negative patients.
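Because the hazard ratios above are reported with p values but without confidence intervals, an approximate interval can be recovered by inverting the p value. The sketch below assumes a Wald-type test on the log hazard ratio and treats the rounded p value as exact, so the result is only indicative; the function name is hypothetical. The approach is unreliable when the estimate is very close to 1 (as with the OS hazard ratio of 0.99), because the rounded p value then carries almost no information about the standard error.

```python
import math
from statistics import NormalDist

def ci_from_ratio_p(estimate, p_value, level=0.95):
    """Approximate CI for a ratio measure from its two-sided p value.

    Assumes a Wald test on the log scale (an assumption, not reported).
    """
    nd = NormalDist()
    log_est = math.log(estimate)
    # z statistic implied by the two-sided p value.
    z = nd.inv_cdf(1 - p_value / 2)
    se = abs(log_est) / z
    half_width = nd.inv_cdf(0.5 + level / 2) * se
    return math.exp(log_est - half_width), math.exp(log_est + half_width)

# TTF hazard ratio for HER2-positive versus HER2-negative patients.
lower, upper = ci_from_ratio_p(1.15, 0.54)
print(round(lower, 2), round(upper, 2))
```

Under these assumptions the implied interval for TTF spans 1, which agrees with the statement that HER2 status was not a statistically significant predictor in the multivariate model.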

Single-arm study, univariate analysis. No studies of this type were identified.

Conclusions, Key Question 3b

The evidence on use of HER2 status to predict outcomes of hormonal therapy is weak and inconclusive. Four studies reviewed here addressed use of tamoxifen in different breast cancer patient populations; two compared tamoxifen with aromatase inhibitors. Evidence is lacking from the most informative types of studies, namely trials in which randomization is stratified by HER2 status or in which patients are randomized to therapy that is or is not directed by HER2 results. Less-informative designs were used, including post-hoc multivariate analyses in five randomized trials and one post-hoc multivariate analysis in a single-arm study. In comparing tamoxifen with aromatase inhibitors in a secondary analysis of randomized, controlled trial results, the most persuasive finding would be a significant interaction term between HER2 status and treatment group, after controlling for other important prognostic factors.

In the two comparison studies included, one had a nonsignificant interaction term (suggesting that there is no differential in the impact of the two treatments based on a patient's HER2 status), and the other did not report an interaction term, although its authors included a qualitative statement that there was no evidence that one treatment was more effective than the other in HER2-positive patients. Some results suggest that tamoxifen may be more effective among HER2-negative patients, but a conclusion is undermined by the paucity of studies and inconsistent findings. Importantly, data demonstrating a difference in magnitude of benefit by HER2 status would not by themselves be sufficient to conclude there is no benefit in HER2-positive patients also positive for hormone receptors. Studying the differential impact of hormonal therapy by HER2 status is hindered by the inverse relationship between HER2 status and hormone receptor status, which leads to relatively small numbers of HR-positive and HER2-positive patients on which to base the results.

Key Question 4

What is the evidence that monitoring serum or plasma concentrations of HER2 extracellular domain in patients with HER2-positive breast cancer predicts response to therapy, or detects tumor progression or recurrence, and if so, what is the evidence that decisions based on serum or plasma HER2 assay results improve patient management and outcomes?

Study Selection

Studies were included for Key Question 4 if they were:

  • randomized trials, prospective single-arm studies, or retrospective series of identically treated patients; that
  • measure serum or plasma HER2 concentrations in breast cancer patients, either at baseline or at multiple time points; and either:
    a. associate baseline values or changes in HER2 concentration with one or more outcomes of interest (primary or secondary); or
    b. compare outcomes of treatment decisions based on assay results with outcomes of decisions made in absence of assay results.

Of 15 studies meeting selection criteria, five were randomized trials and 10 were single-arm designs. One of the randomized trials compared three different doses of a single selective estrogen receptor modulator, droloxifene (Yamauchi, O'Neill, Gelman, et al., 1997). Since the range of doses assessed in the trial did not produce different results, the data pooled across dosing groups are treated here as a single-arm design. Therefore, four randomized trials and 11 single-arm designs are presented in separate summary tables; detailed abstraction data can be found in Appendix Tables IV-A–IV-K*. All but one study meeting study selection criteria addressed subgroup analyses of baseline sHER2 measurements to predict outcomes after treatment. The study reported by Fornier, Seidman, Schwartz, et al. (2005) was the only one that focused on changes in serial measurements. No studies meeting selection criteria addressed whether serial sHER2 measurements confer lead time compared with other monitoring techniques.

Patient Characteristics

Randomized trials. Two of the four trials (Table 22) selected patients with metastatic breast cancer undergoing first-line systemic therapy. The comparisons in these two trials were paclitaxel with or without trastuzumab, and epirubicin with either paclitaxel or cyclophosphamide. The third trial included postmenopausal patients with locally advanced (stage IIIB), locoregionally recurrent, or metastatic breast cancer randomized to either letrozole or tamoxifen. The fourth trial selected patients with locally advanced or metastatic breast cancer given capecitabine with or without lapatinib as second-line treatment after progression following treatment with an anthracycline, a taxane, and trastuzumab. A total of 1,153 patients were included in these trials, with individual sample sizes ranging from 101 to 562.

Table 22. Randomized trials, design, treatment, patient characteristics, KQ4.

Two of the randomized trials selected patients for being positive on tissue (t) HER2 testing. Gasparini, Gion, Mariani, et al. (2007) selected patients with 2+ or 3+ scores on the IHC HercepTest™. Cameron, Casey, Press, et al. (2008) included patients who were 3+ on IHC or 2+ with a positive FISH result. Muller, Witzel, Luck, et al. (2004) performed tissue testing on only 29 of 103 patients and only nine patients had 3+ results by Dako-style scoring of an IHC assay using the CB11 mAb. No tHER2 results were reported for Lipton, Ali, Leitzel, et al. (2003).

Patient characteristics were reported in various ways. Only age was reported by all four studies. Baseline data in the two treatment groups in the Muller, Witzel, Luck, et al. (2004) trial were combined; median age was 48 years. In the Gasparini, Gion, Mariani, et al. (2007) and Cameron, Casey, Press, et al. (2008) trials, median ages by treatment group were in the low and mid-50s and in the Lipton, Ali, Leitzel, et al. (2003) study median ages were in the mid-60s.

The proportion of patients with three or more disease sites was 27 percent in the Gasparini, Gion, Mariani, et al. (2007) study, 49 percent in the Cameron, Casey, Press, et al. (2008) trial and 10 percent and 11 percent of the two treatment groups studied by Lipton, Ali, Leitzel, et al. (2003).

Gasparini, Gion, Mariani, et al. (2007) used the ECOG performance status scale, finding that 82 percent and 81 percent had the highest level (0). Cameron, Casey, Press, et al. (2008) reported that 62 percent and 59 percent were at ECOG level 0. Median Karnofsky Performance Scale values were 90 in both groups included by Lipton, Ali, Leitzel, et al. (2003).

In the study by Gasparini and co-workers, 37 percent were both estrogen- and progesterone-receptor positive, while the proportions for the two groups from Lipton and co-workers' study were 38 percent and 40 percent, respectively. Muller, Witzel, Luck, et al. (2004) only noted that 61 percent were estrogen-receptor positive. Cameron, Casey, Press, et al. (2008) reported the proportions of patients in the two groups who were positive on one or both receptors: 48 percent and 46 percent.

Single-Arm Designs. All 11 studies selected patients with metastatic breast cancer (Summary Table 23). The total number of patients across studies is 706; individual sample sizes ranged from 35 to 94. Treatments were first-line systemic therapy in six studies, second-line in one study, second- or third-line in one study, second-line or higher in one study and a mix of first- and second-line or higher in two studies. Regimens in six studies were taxane-based (two with anthracyclines, two with trastuzumab); one study combined trastuzumab with vinorelbine, one study used the aromatase inhibitor letrozole, one study used the selective estrogen receptor modulator droloxifene, and three studies used other chemotherapy regimens.

Table 23. Single-arm studies, design, enrollment and treatment, KQ4.

Two studies selected patients who were tHER2 3+ on IHC or positive on FISH. Five studies included mixed patient populations that were positive and negative on HER2 tissue testing (Colomer, Llombart-Cussac, Lloveras, et al., 2007; Colomer, Montero, Lluch, et al., 2000; Im, Kim, Lee, et al., 2005; Fornier, Seidman, Schwartz, et al., 2005; Sandri, Johansson, Colleoni, et al., 2004). The remaining four studies did not provide data on tissue HER2 testing (Yamauchi, O'Neill, Gelman, et al., 1997; Colomer, Llombart-Cussac, Lluch, et al., 2004; Luftner, Henschke, Flath, et al., 2004; Colomer, Llombart-Cussac, Tusquets, et al., 2006).

Regarding age, one study had a median age of 48 years and another had a median of 49 years. One study had 53 percent at age 64 or older, another had a median age of 64 years, and a third had mean ages in sHER2 positive and negative groups of 63 and 64 years. The other six studies had median ages in the 50s.

Nine studies gave the distribution of patients by number of disease sites and one study gave the number of involved organs (43 percent had three or more involved organs). In seven studies, the percentage of patients with three or more disease sites ranged from 18 percent to 43 percent; in another study all patients had two or fewer disease sites. Four studies provided average number of disease sites: the medians were two in two studies and three in two studies.

Four studies provided ECOG performance status data: the percentages in categories 0 or 1 (better performance status) were 75, 98, 98, and 88 percent. Two studies used the Karnofsky Performance Scale: in one study the mean value was 90 percent and in the other 83 percent were at 80 percent or 90 percent on the scale.

Seven studies gave baseline information on hormone receptor status, four of which reported the proportion of patients who were estrogen-receptor positive, ranging from 49 percent to 67.3 percent. One study gave the proportion progesterone positive (34 percent). Two studies gave percentages of different combinations of hormone receptor status: the proportions who were both estrogen and progesterone positive were 17 percent and 37 percent; the proportions who were either estrogen or progesterone positive were 34 percent and 18 percent.

Evidence Hierarchy and Quality Assessment

No studies conducted stratified randomization on sHER2 status or randomized patients to whether sHER2 guided treatment (Tables 24 and 25), and only one performed prespecified subgroup analyses (Gasparini, Gion, Mariani, et al., 2007). Three randomized trials reported results from post-hoc treatment by sHER2 subgroup analyses (Cameron, Casey, Press, et al., 2008; Muller, Witzel, Luck, et al., 2004; Lipton, Ali, Leitzel, et al., 2003). Two single-arm studies included multivariate analyses (Colomer, Montero, Lluch, et al., 2000; Yamauchi, O'Neill, Gelman, et al., 1997). Overall, the bulk of studies (7 of 13) belonged to the lowest category of the hierarchy.

Table 24. Hierarchy of evidence, KQ4.

Table 25. Study quality assessment, KQ4.

Results by Hierarchy Level

Multivariate analysis was performed in only three studies: one randomized trial (Gasparini, Gion, Mariani, et al., 2007) and two single-arm designs (Colomer, Montero, Lluch, et al., 2000; Yamauchi, O'Neill, Gelman, et al., 1997). Summary study descriptions and results are arrayed in Tables 26–29.

Table 26. Randomized trials, summary time to event outcomes, KQ4.

Table 27. Randomized trials, summary tumor response, KQ4.

Table 28. Single-arm studies, summary time to event outcomes, KQ4.

Table 29. Single-arm studies, summary tumor response, KQ4.

Randomization stratified on HER2/randomized to whether treatment was guided by HER2. No studies of this type were identified.

Randomized trial, prespecified multivariate subgroup analysis. The only trial that performed a prespecified multivariate subgroup analysis was Gasparini, Gion, Mariani, et al. (2007; n=123 patients given first-line treatment with paclitaxel with or without trastuzumab for metastatic breast cancer). One quality concern was uncertainty over whether sHER2 results were scored blindly to outcome. Also, this study addressed 11 predictor variables plus treatment interaction terms in logistic and Cox regression analyses; however, there appeared to be too few response and progression events to support models with so many variables. Thus, the study was not large enough for the type of modeling used. Overall, it is unclear whether the multivariate analysis was well-conducted: it is unclear how candidate variables were selected, what model-building strategy was used, whether assumptions were tested, whether the standard metastatic breast cancer prognostic factors were included in final models, and how continuous variables were categorized. In addition, the model did not appear to have been validated.

For time-to-progression, the Cox regression treatment by sHER2 interaction was nearly statistically significant (p=0.0538). Among patients with elevated sHER2 values, results significantly favored paclitaxel plus trastuzumab, while in those with normal sHER2, results nonsignificantly favored paclitaxel alone. Logistic regression analysis of overall response rate showed no significant treatment by sHER2 interaction (p=.6044); in both groups, combination treatment was favored, but not significantly.
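Subgroup results that point in different directions do not by themselves establish an interaction; a formal test compares the two effect estimates directly. The following is a minimal sketch of one common approach, a z-test on the difference between two independent subgroup log hazard ratios. It is not the Cox regression interaction term used by Gasparini and co-workers, and the function name and every numeric input are hypothetical placeholders rather than data from any study reviewed here.

```python
import math


def interaction_z_test(log_hr_a, se_a, log_hr_b, se_b):
    """Two-sided z-test for the difference between two independent
    subgroup log hazard ratios (illustrative helper, not from the report)."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        raise ValueError("at least one standard error must be positive")
    z = (log_hr_a - log_hr_b) / se_diff
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical inputs: treatment effect (log hazard ratio and its standard
# error) in a marker-elevated subgroup and in a marker-normal subgroup.
z, p = interaction_z_test(math.log(0.5), 0.30, math.log(1.2), 0.35)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value from such a test suggests the treatment effect differs between subgroups; a significant effect in one subgroup and a nonsignificant effect in the other does not, on its own, support that conclusion.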

Randomized trial, post-hoc multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, treatment by HER2 subgroup analysis. Among three randomized trials that described treatment by sHER2 subgroup analyses, Muller, Witzel, Luck, et al. (2004) reported on a subset of 101 patients with serum available (17 percent of the 597 patients randomized to epirubicin plus either paclitaxel [ET] or cyclophosphamide [EC]). This study was a retrospective analysis of a previously reported randomized trial. These authors found within the ET group a trend for worse overall survival for sHER2 positive patients (p=.092), but no significant difference between sHER2 groups receiving EC. Regarding progression-free survival, outcomes for the two treatments did not differ among the sHER2 negative, but results were significantly worse for EC among those sHER2 positive. For overall response rate, sHER2 groups did not differ among those receiving ET, but those getting EC had worse results when sHER2 was positive. No test for treatment by sHER2 interaction was reported.

These results should be viewed cautiously because the analyzed subset comprised less than 20 percent of those originally randomized and multivariate analysis was not used to adjust for any imbalances between treatments by sHER2 subgroups. Additionally, it is unclear whether sHER2 results were scored blindly with respect to outcome.
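The size of the analyzed subset relative to the randomized population can be checked directly from the counts reported above for Muller and co-workers (101 of 597). This is a minimal sketch; the helper name is an illustrative assumption.

```python
def percent_analyzed(n_analyzed: int, n_randomized: int) -> float:
    """Percentage of randomized patients included in a subset analysis."""
    if n_randomized <= 0 or not 0 <= n_analyzed <= n_randomized:
        raise ValueError("counts must satisfy 0 <= n_analyzed <= n_randomized, n_randomized > 0")
    return 100.0 * n_analyzed / n_randomized


# Counts as reported in the text for the serum subset analysis.
muller_pct = percent_analyzed(101, 597)
print(f"Analyzed subset: {muller_pct:.1f} percent of those randomized")
```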

Lipton, Ali, Leitzel, et al. (2003) addressed 562 postmenopausal women given either letrozole or tamoxifen as first-line therapy for advanced breast cancer. This retrospective analysis included 62 percent of all patients randomized in the trial; however, this is the only randomized trial that used blinded assessment of sHER2 in relation to outcome. Results were better in terms of time-to-progression and time to treatment failure for those receiving letrozole, regardless of sHER2 status. For overall response rate and rate of clinical benefit (overall response plus stable disease), letrozole was significantly better than tamoxifen for sHER2 negative patients, but not for those sHER2 positive. No tests of treatment by sHER2 interaction were reported.

Cameron, Casey, Press, et al. (2008) randomized 399 patients with tissue HER2 positive locally advanced or metastatic breast cancer to receive capecitabine with or without lapatinib. Exploratory analyses of the relation between sHER2 status and progression-free survival were conducted in 92 percent of those randomized. When sHER2 was divided into the highest quartile versus other quartiles, both sHER2 subgroups had significantly better progression-free survival when treated with capecitabine plus lapatinib compared to capecitabine alone. This study did not describe the sHER2 assay methods clearly, did not report that sHER2 was scored blind to outcome and used an uncommon threshold for sHER2 positivity. No test of treatment by sHER2 status interaction was reported.

Randomized trial results summary. The methodologic quality of these randomized trials is generally poor. Only one randomized trial was conducted with a prespecified plan to assess the relation of sHER2 to outcome. The same trial was the only one that conducted multivariate analyses; however, it appeared to have too few events to support the large number of predictor and interaction terms used, and the modeling techniques were poorly described overall. The other three trials performed retrospective treatment by sHER2 subgroup analyses of 17 percent, 62 percent and 93 percent of patients originally enrolled. Only one study used blinded assessment of sHER2 in relation to outcome.

These four randomized trials each addressed a different comparison of treatments. The only study that tested treatment by sHER2 status interactions found them to be nonsignificant for TTP and ORR in a comparison of paclitaxel with and without trastuzumab. A comparison of epirubicin either with paclitaxel or cyclophosphamide did not consistently find sHER2 to be related to different treatment outcomes (OS, PFS, ORR). A trial comparing letrozole and tamoxifen found sHER2 to be a more consistent predictor of treatment outcome for TTP and TTF, less so for ORR and clinical benefit. A trial of capecitabine with or without lapatinib found better PFS for those receiving combination treatment for both those in the highest quartile and lower quartiles of sHER2 values. Only the Gasparini, Gion, Mariani, et al. (2007) trial, which analyzed nearly all patients randomized, used multivariate methods, while the other two trials used univariate analyses of much smaller subsets of those randomized.

Single-arm study, multivariate analysis. Among three single-arm studies that conducted multivariate analysis, Colomer, Llombart-Cussac, Lloveras, et al. (2007) included 226 patients with metastatic breast cancer who received letrozole. The authors prespecified their interest in assessing the relation between sHER2 status and treatment outcomes; however, they provided inadequate detail in describing Cox regression methods such as selection of candidate variables, model-building strategy, testing of assumptions, forcing of standard prognostic variables and handling of continuous variables. It is unclear if sHER2 results were scored blind to outcomes, and validation of the final model was not mentioned. The multivariate analysis found sHER2 and ECOG performance status to be significant independent predictors of time to progression.

Colomer, Montero, Lluch, et al. (2000) included 55 patients with metastatic disease who were receiving first-line doxorubicin and paclitaxel. Of the 77 patients originally enrolled in this Phase II study, 75 percent had evaluable serum samples. The plan to assess the relation between sHER2 and outcome was prespecified in this study; however, the multivariate logistic and Cox regression techniques were poorly described. It is unclear how candidate variables were selected, what model-building strategy was used, whether assumptions were tested, whether final models included all standard prognostic variables and whether continuous variables were well handled. Furthermore, models did not appear to be validated and it is unclear if sHER2 was scored blindly to outcome. In the logistic regression of response, there were only 39 events, but six variables were entered into the multivariate model (more than the recommended maximum of one variable per 10 or more events). A similar problem existed for the Cox regression of response duration. These authors found elevated sHER2 to be significantly associated with poorer results on response duration and overall response rate, in both univariate and multivariate analyses.
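The events-per-variable concern can be made concrete with the counts reported above (39 events, six variables) and the threshold of at least 10 events per variable. This is a minimal sketch; the function names are illustrative assumptions, and the threshold is treated as a rule of thumb rather than a strict requirement.

```python
def events_per_variable(n_events: int, n_variables: int) -> float:
    """Number of outcome events available per candidate model variable."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_events / n_variables


def max_supported_variables(n_events: int, min_epv: int = 10) -> int:
    """Largest number of variables that keeps events per variable >= min_epv."""
    if min_epv <= 0:
        raise ValueError("min_epv must be positive")
    return n_events // min_epv


# Counts reported for the logistic regression of response.
epv = events_per_variable(39, 6)
limit = max_supported_variables(39)
print(f"Events per variable: {epv:.1f}; variables supported at 10 per variable: {limit}")
```

With these inputs, the model has fewer than 10 events per variable, which is the basis for the concern about overfitting.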

The study by Yamauchi, O'Neill, Gelman, et al. (1997) was originally a randomized comparison of three doses of droloxifene as first-line hormonal therapy. Of the 369 patients randomized, 94 were included in this retrospective analysis (25 percent). Logistic regression of overall response and Cox regression of time-to-progression and overall survival all used the stepwise model building strategy, a method with major weaknesses. The description of modeling methods was poor, lacking details on candidate variable selection, whether assumptions were tested, whether final models included standard prognostic variables and whether continuous variables were well handled. The article did not make clear whether sHER2 results were scored blindly to outcome. Dose was entered into the multivariate models but was not retained, suggesting that results were similar across doses, and the dose groups were pooled. After adjustment for other variables, this study found consistently worse results for sHER2 positive patients on time to progression, overall survival and overall response rate.

Single-arm study, univariate analysis. These studies reported on 55 patients or fewer. With the exception of the study by Esteva, Valero, Booser, et al. (2002), positive sHER2 results were associated with worse outcomes. The lack of multivariate analyses in these studies makes these findings of limited use for guiding treatment decisions. These studies could be described as exploratory, hypothesis-generating investigations that might inform future, more sophisticated studies.

Single-arm study results summary. This body of evidence is quite heterogeneous with respect to treatment regimens, outcomes assessed, and definitions of elevated sHER2. Only three of 11 studies conducted multivariate analyses, but the modeling methods were poorly described. Evidence from single-arm series more often shows that sHER2 status predicts outcomes among patients treated; however, there were several instances in which it was nonpredictive, and one study found better response among those with elevated sHER2, in conflict with all other studies.

Conclusions, Key Question 4

The evidence is weak on whether sHER2 predicts outcome after treatment with any regimen in any setting. Evidence primarily focused on first-line or second- and subsequent-line treatment of metastatic disease using a variety of regimens. Furthermore, these studies used different thresholds for a positive sHER2 result and varied on whether patient selection required positive tissue HER2 status. There were only four randomized trials and only one used multivariate analysis, while three single-arm studies performed multivariate analysis. Reporting on multivariate analyses lacked sufficient detail. Univariate analyses provide very limited information value, at best suggesting candidate variables for future multivariate analyses. These studies do not support clear conclusions for whether sHER2 predicts disease progression, treatment response, or outcomes of any specific treatment regimen.

Key Question 5

In patients with ovarian, lung, prostate, or head and neck cancers, what is the evidence that:

a. testing tumor tissue for HER2; or

b. monitoring serum or plasma concentrations of HER2;

either predicts response to therapy, or detects tumor progression or recurrence; and if so, what is the evidence that decisions based on HER2 assay results improve patient management and outcomes?

Study Selection

Studies were included for Key Question 5 if they were:

  • randomized trials, prospective single-arm studies, or retrospective series of identically treated patients; that
  • measured HER2 in tumor tissue, serum, or plasma from patients with ovarian, lung, prostate, or head and neck cancers, and either:
    a. associated HER2 status from tissue assays, or baseline values or changes in serum or plasma HER2 concentration, with one or more outcomes of interest (primary or secondary; see above); or
    b. compared outcomes of treatment decisions based on tumor HER2 status, or serum or plasma assay results, with outcomes of decisions made in absence of test results.

Part I. Lung Cancer

Overview. A total of 13 studies met study selection criteria (total N=1,500 patients). The study by Krug, Miller, Patel, et al. (2005) was originally a randomized comparison of trastuzumab plus either docetaxel or paclitaxel, but the two treatment arms were combined for analysis. Thus, the Krug and co-workers study is treated as a single-arm design and is presented with 12 other single-arm studies. Study hierarchy, quality assessment, summary descriptions, and results are summarized in Tables 30–34; detailed abstraction data for all parts of Key Question 5 can be found in Appendix Tables V-A–V-RR*.

Table 30. Hierarchy of evidence, KQ5, lung cancer.

Table 31. Study quality assessment, KQ5, lung cancer.

Table 32. Single-arm studies: summary design, treatment, patient characteristics, KQ5, lung cancer.

Table 33. Single-arm studies, summary time to event outcomes, KQ5, lung cancer.

Table 34. Single-arm studies, summary tumor response, KQ5, lung cancer.

Study populations. All studies were single-arm designs that included patients with non-small cell lung cancer (NSCLC). Of the 13 studies, 5 addressed the use of surgery without adjuvant treatments. Four of these studies included early stage (I or II) patients (Koukourakis, Giatromanolaki, Guddo, et al., 2000; Koukourakis, Giatromanolaki, O'Byrne, et al., 1999; Saad, Liu, Han, et al., 2004; Pelosi, Del Curto, Dell'Orto, et al., 2005) and the fifth study included a range of patients across stages I–IV (Pfeiffer, Clausen, Andersen, et al., 1996). The eight studies of systemic or multimodality treatments included patients with locally advanced, recurrent or late stage (III–IV) disease. Eight studies reported summary age data; all average age values (means or medians) were in the 50s and 60s. Five studies examined outcomes of treatment with gefitinib for advanced NSCLC (Cappuzzo, Ligorio, Janne, et al., 2007; Daniele, Macri, Schena, et al., 2007; Cappuzzo, Gregorc, Rossi, et al., 2003; Hirsch, Varella-Garcia, McCoy, et al., 2005; Cappuzzo, Varella-Garcia, Shigematsu, et al., 2005). Two studies gave combination chemotherapy regimens that included trastuzumab and a taxane to patients with advanced NSCLC (Krug, Miller, Patel, et al., 2005; Langer, Stephenson, Thor, et al., 2004). The remaining study offered multi-modality therapy (chemotherapy, surgery and radiotherapy) to patients with stage IIIA NSCLC (Graziano, Kern, Herndon, et al., 1998).

Results by hierarchy level, study quality assessment

Randomization stratified on HER2/randomized to whether treatment was guided by HER2. No studies of this type were identified.

Randomized trial, prespecified multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, post-hoc multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, treatment by HER2 subgroup analysis. No studies of this type were identified.

Single-arm study, prespecified multivariate analysis. No studies of this type were identified.

Single-arm study, post-hoc multivariate analysis. Of the 13 studies, four conducted multivariate analyses, none of which was prespecified. Two multivariate analyses addressed surgery for early stage NSCLC (Koukourakis, Giatromanolaki, O'Byrne, et al., 1999; Saad, Liu, Han, et al., 2004) and two addressed gefitinib for patients with advanced NSCLC (Cappuzzo, Varella-Garcia, Shigematsu, et al., 2005; Hirsch, Varella-Garcia, McCoy, et al., 2005). Among the surgical series, the retrospective series by Koukourakis and co-workers found that HER2 in univariate analysis was not associated with overall survival and was, therefore, not entered into a multivariate model. This multivariate analysis was generally poorly described. Details were lacking about how candidate variables were selected, how models were constructed, whether assumptions were tested, whether standard prognostic factors were included in final models and whether continuous variables were well handled. In addition, no mention was made of model validation. Saad and co-workers analyzed two separate retrospective groups of 50 surgical patients with adenocarcinoma (AC) and bronchioalveolar carcinoma (BAC) of the lung. For both subgroups, HER2 was a significant univariate and multivariate predictor of overall survival; however, very few details were provided for the multivariate analyses. The Saad, Liu, Han, et al. (2004) study is the only multivariate analysis that clearly used an assessor of HER2 results who was blinded to outcome.

Among prospective gefitinib series, Hirsch, Varella-Garcia, McCoy, et al. (2005) found that HER2 was not a significant univariate predictor of overall survival and was not entered in the multivariate model; HER2 was also not associated with response. These authors used a stepwise multivariate model selection procedure, a weak method, and otherwise provided few details about analytic techniques. Hirsch and co-workers also found that FISH-negative patients had a higher response rate than FISH-positive patients in univariate analysis. The Cappuzzo, Varella-Garcia, Shigematsu, et al. (2005) study of gefitinib reported a significant univariate association with overall response rate that nearly achieved statistical significance in multivariate analysis. Univariate analyses by Cappuzzo and co-workers (2005) of overall survival and time to progression appeared significant, but the article was flawed by discrepancies in the reporting of results and by poor reporting of multivariate analysis methods.

Single-arm study, univariate analysis. Nine single-arm studies conducted univariate analyses of the association between HER2 status and outcomes; five were prospective designs and four were retrospective. Three studies addressed use of surgery alone as treatment (Pelosi, Del Curto, Dell'Orto, et al., 2005; Koukourakis, Giatromanolaki, Guddo, et al., 2000; Pfeiffer, Clausen, Andersen, et al., 1996), three studies gave patients gefitinib (Cappuzzo, Ligorio, Janne, et al., 2007; Daniele, Macri, Schena, et al., 2007; Cappuzzo, Gregorc, Rossi, et al., 2003), two studies involved trastuzumab-based combination regimens (Krug, Miller, Patel, et al., 2005; Langer, Stephenson, Thor, et al., 2004) and one study used multimodality therapy entailing chemotherapy, surgery and radiotherapy (Graziano, Kern, Herndon, et al., 1998). The Krug, Miller, Patel, et al. (2005) study was originally a randomized trial of trastuzumab plus either docetaxel or paclitaxel. Since no difference in efficacy was seen between the two taxane groups, Krug and co-workers combined arms to assess the relation between HER2 and outcome; thus, this study is treated for purposes of this analysis as a single-arm design. None of these seven studies reported significant associations between HER2 and overall survival, disease-free survival, progression-free survival, time-to-progression or overall response rate.

Evidence summary-lung cancer. Overall, the evidence on the relation between HER2 and outcome for treatment of lung cancer is weak and heterogeneous. No randomized studies have analyzed whether there are HER2 by treatment effect interactions. Of 13 single-arm studies, only 4 were multivariate analyses. All four multivariate analyses were poorly described, and none were prespecified; thus, it is unclear if they were well-conducted. The two multivariate studies of surgery for early stage NSCLC found conflicting results (one study suggested HER2 is predictive; the other did not). Similar results were found in the two multivariate studies of gefitinib. Seven studies were univariate analyses of single-arm studies. Univariate analyses provide very limited information value, at best suggesting candidate variables for future multivariate analyses. Future research should place studies at higher levels in the evidence hierarchy. The body of evidence is not promising, with mixed results among post-hoc multivariate analyses and lack of significant findings among univariate studies.

Part II. Ovarian Cancer

Overview. A total of seven studies met study selection criteria (578 patients). The study by Camilleri-Broet, Hardy-Bessard, Le Tourneau, et al. (2004) was originally a randomized trial comparing cisplatin, epirubicin and one of two doses of cyclophosphamide. Efficacy results did not differ between cyclophosphamide dose groups, so results of the two arms were combined in this retrospective analysis. This study is, therefore, treated as a single-arm design using a retrospective (post-hoc) multivariate analysis. A randomized trial by Malamou-Mitsi, Crikoni, Timotheadou, et al. (2007) is similarly treated as a single-arm design, in which two groups treated with paclitaxel and platinum compounds are pooled in a retrospective multivariate analysis. The third multivariate analysis of a single-arm study was prespecified by Di Leo, Bajetta, Biganzoli, et al. (1995). The remaining four studies were single-arm studies that presented univariate analyses. Study hierarchy, quality assessment, summary descriptions, and results are arrayed in Tables 35–39.

Table 35. Hierarchy of evidence, KQ5, ovarian cancer.

Table 36. Study quality assessment, KQ5, ovarian cancer.

Table 37. Single-arm studies, design, enrollment and treatment, KQ5, ovarian cancer.

Table 38. Single-arm studies, summary time to event outcomes, KQ5, ovarian cancer.

Table 39. Single-arm studies, summary tumor response, KQ5, ovarian cancer.

Study populations. All seven studies included patients with ovarian cancer that was either advanced or relapsed/refractory. Five studies focused on chemotherapy regimens, including one study using cisplatin, epirubicin and cyclophosphamide (Camilleri-Broet, Hardy-Bessard, Le Tourneau, et al., 2004), one that used paclitaxel and a platinum compound (Malamou-Mitsi, Crikoni, Timotheadou, et al., 2007), one that used a platinum compound and cyclophosphamide (Hengstler, Lange, Kett, et al., 1999), one that used liposomal doxorubicin (Campos, Penson, Mays, et al., 2001) and one that combined mitoxantrone and ifosfamide (Di Leo, Bajetta, Biganzoli, et al., 1995). One study gave the hormonal agent letrozole (Bowman, Gabra, Langdon, et al., 2002) and one study offered patients trastuzumab (Bookman, Darcy, Clarke-Pearson, et al., 2003). Median age values of five studies reporting them were in the 50s in four studies and in the 60s in two studies. Distributions of tumor grade in five studies placed the majority of patients in moderately or poorly differentiated categories.

Results by hierarchy level, study quality assessment

Randomization stratified on HER2/randomized to whether treatment was guided by HER2. No studies of this type were identified.

Randomized trial, prespecified multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, post-hoc multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, treatment by HER2 subgroup analysis. No studies of this type were identified.

Single-arm study, prespecified multivariate analysis. The study by Di Leo, Bajetta, Biganzoli, et al. (1995) conducted the only prespecified multivariate analysis. In this Phase II study, 72 patients received mitoxantrone plus ifosfamide for persistent or relapsed ovarian cancer. Observers evaluating HER2 were blinded to outcome data. Cox regression and recursive partitioning were carried out, but the report provides few details about selection of candidate variables, model-building strategies, testing of assumptions, whether standard prognostic factors were included in final models and whether continuous variables were handled well. The article does not mention validation of models. The candidate variables that were tested included tumor imaging, tumor grade, residual tumor volume, number of disease sites, tumor responsiveness, p53 marker values and HER2 values. The only significant variable on univariate or multivariate analyses of time-to-treatment-failure and overall survival was clinically or radiologically detectable disease on study entry. HER2 was not predictive for these outcomes, nor did it predict response on univariate analysis.

Single-arm study, post-hoc multivariate analysis. Two studies of this type are available (Camilleri-Broet, Hardy-Bessard, Le Tourneau, et al., 2004; Malamou-Mitsi, Crikoni, Timotheadou, et al., 2007). Authors of the former study gave cisplatin, epirubicin and cyclophosphamide to 117 of 164 patients with advanced ovarian cancer. Analyses were performed in both a mixed group of marker data taken from primary and metastatic lesions and a subset (of unspecified size) with primary tumor specimens. Focusing on the primary tumor subset, both univariate and multivariate analyses found HER2 and the presence of ascites to be significant predictors of progression-free and overall survival. Multivariate analyses are poorly described in this article. The study by Malamou-Mitsi and colleagues (2007) entailed giving 95 patients with stage IIc-IV epithelial ovarian cancer paclitaxel plus carboplatin or alternating regimens of paclitaxel plus either carboplatin or cisplatin. In a retrospective multivariate analysis, standard prognostic variables were entered into the Cox regression models for overall survival and time-to-progression, but investigators used an inappropriate stepwise selection method for building the final model. Furthermore, it is unclear if validation was conducted. IHC HER2 was not found to be a significant predictor of either outcome.

Single-arm study, univariate analysis. Of four studies with sample sizes between 41 and 70 patients, three found significant relationships between HER2 and at least one outcome (Bookman, Darcy, Clarke-Pearson, et al., 2003; Bowman, Gabra, Langdon, et al., 2002; Hengstler, Lange, Kett, et al., 1999). The study by Bookman, Darcy, Clarke-Pearson, et al. (2003) addressed progression-free survival, overall survival, response and toxicity, finding a significant relation only between HER2 and cycle one trastuzumab toxicity. Hengstler, Lange, Kett, et al. (1999) found that HER2 results on an RNA PCR assay were significantly related to overall survival among 44 patients treated with a platinum compound plus cyclophosphamide. Bowman, Gabra, Langdon, et al. (2002) gave letrozole to 50 patients, finding that IHC HER2 results were related to CA125 progression. Campos, Penson, Mays, et al. (2001) showed that IHC HER2 status was not associated with CA125 response among 70 patients treated with liposomal doxorubicin.

Evidence summary-ovarian cancer. The evidence on the relation between HER2 results and outcome comes from six ovarian cancer studies that each addressed a different treatment. Three generally poorly reported multivariate analyses of single-arm series using different chemotherapy regimens are available. The prespecified analysis found HER2 not to be predictive, while among the two post-hoc analyses, it was a significant independent predictor in one and not in the other. No randomized studies are available to address potential treatment by HER2 interactions. Four univariate analyses, capable only of suggesting candidate variables for future multivariate analyses, showed mixed results with respect to whether HER2 is associated with outcome. Future research should place studies at higher levels in the hierarchy. This weak body of evidence does not support conclusions about whether HER2 predicts treatment outcomes.
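The evidence hierarchy used to organize these results can be represented as an ordered data structure. The sketch below lists the seven hierarchy levels in the order they are presented in this section and classifies the seven ovarian cancer studies as described in the overview and results above. The numeric ranks, class name and abbreviated study labels are illustrative assumptions, not part of the report's tables.

```python
from collections import Counter
from enum import IntEnum


class EvidenceLevel(IntEnum):
    """Hierarchy levels in the order presented; a lower value is a higher
    level of evidence (the numbering itself is an assumption)."""
    RANDOMIZED_STRATIFIED_OR_HER2_GUIDED = 1
    RCT_PRESPECIFIED_MULTIVARIATE = 2
    RCT_POSTHOC_MULTIVARIATE = 3
    RCT_TREATMENT_BY_HER2_SUBGROUP = 4
    SINGLE_ARM_PRESPECIFIED_MULTIVARIATE = 5
    SINGLE_ARM_POSTHOC_MULTIVARIATE = 6
    SINGLE_ARM_UNIVARIATE = 7


# Classification of the ovarian cancer studies as described in the text.
ovarian_studies = {
    "Di Leo 1995": EvidenceLevel.SINGLE_ARM_PRESPECIFIED_MULTIVARIATE,
    "Camilleri-Broet 2004": EvidenceLevel.SINGLE_ARM_POSTHOC_MULTIVARIATE,
    "Malamou-Mitsi 2007": EvidenceLevel.SINGLE_ARM_POSTHOC_MULTIVARIATE,
    "Bookman 2003": EvidenceLevel.SINGLE_ARM_UNIVARIATE,
    "Bowman 2002": EvidenceLevel.SINGLE_ARM_UNIVARIATE,
    "Hengstler 1999": EvidenceLevel.SINGLE_ARM_UNIVARIATE,
    "Campos 2001": EvidenceLevel.SINGLE_ARM_UNIVARIATE,
}

# Tally studies per level and find the highest level of evidence reached.
level_counts = Counter(ovarian_studies.values())
highest_level = min(ovarian_studies.values())

for level in EvidenceLevel:
    print(f"{level.name}: {level_counts.get(level, 0)}")
print(f"Highest level reached: {highest_level.name}")
```

Laying the studies out this way makes it easy to see that no ovarian cancer study reached any of the randomized-trial levels of the hierarchy.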

Part III. Prostate Cancer

Overview/study populations. Only four studies met selection criteria (total N=147). One study focused on neoadjuvant therapy for high-risk nonmetastatic prostate cancer (Prayer-Galetti, Sacco, Pagano, et al., 2007). Two studies addressed hormonal therapy for advanced prostate cancer (Nishio, Yamada, Kokubo, et al., 2006; Arai, Yoshiki, Yoshida, et al., 1997) and one study managed patients with stage A1 disease expectantly (Fox, Persad, Coleman, et al., 1994). Three studies used tissue HER2 testing (Prayer-Galetti, Sacco, Pagano, et al., 2007; Nishio, Yamada, Kokubo, et al., 2006; Fox, Persad, Coleman, et al., 1994) and the fourth study used serum testing (Arai, Yoshiki, Yoshida, et al., 1997). Study hierarchy, quality assessment, summary descriptions, and results are arrayed in Tables 40-43.

Table 40. Hierarchy of evidence, KQ5, prostate cancer.

Table 41. Study quality assessment, KQ5, prostate cancer.

Table 42. Single-arm studies, design, enrollment and treatment, KQ5, prostate cancer.

Table 43. Single-arm studies, summary time to event outcomes, KQ5, prostate cancer.

Results by hierarchy level, study quality assessment

Randomization stratified on HER2/randomized to whether treatment was guided by HER2. No studies of this type were identified.

Randomized trial, prespecified multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, post-hoc multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, treatment by HER2 subgroup analysis. No studies of this type were identified.

Single-arm study, prespecified multivariate analysis. No studies of this type were identified.

Single-arm study, post-hoc multivariate analysis. No studies of this type were identified.

Single-arm study, univariate analysis. All four studies involved univariate analyses; three were retrospective case series and one was a prospective phase II study. The phase II study (Prayer-Galetti, Sacco, Pagano, et al., 2007) selected 22 patients with high-risk nonmetastatic prostate cancer, giving chemohormonal therapy prior to surgery. IHC HER2 status was not found to be related to either pathologic response or disease-free survival. Nishio, Yamada, Kokubo, et al. (2006) included 47 patients treated with maximal androgen blockade for advanced disease manifested by bone metastases. Tissue IHC HER2 was found to be associated with disease-specific survival and prostate-specific antigen (PSA) relapse-free survival. Arai, Yoshiki, Yoshida, et al. (1997) selected 33 patients with advanced (stage D2) disease treated with antiandrogen monotherapy and found that serum HER2 was associated with progression-free survival. Fox, Persad, Coleman, et al. (1994) reported on 45 patients with stage A1 disease treated expectantly and observed an association between IHC HER2 and overall survival.

Evidence summary-prostate cancer. This small body of evidence is too weak to show whether HER2 predicts outcomes after treatment for prostate cancer. No randomized studies or multivariate analyses of single-arm studies are available. The only studies meeting selection criteria were three small retrospective case series and one phase II study, all using univariate analyses. Two studies found tissue IHC HER2 to predict outcomes, one study found IHC HER2 did not predict outcomes, and one study found serum HER2 to be predictive. These exploratory studies would need to be confirmed by large studies higher in the evidence hierarchy.

Part IV. Head and Neck Cancer

Overview/study populations. Two studies met selection criteria (total N=113). One study examined surgery alone for patients with malignant salivary tumors (Nagler, Kerner, Ben-Eliezer, et al., 2003). The other study (Khan, King, Smith, et al., 2002) gave surgery and external beam radiotherapy (EBRT) to patients with squamous cell carcinoma of the oral cavity or oropharynx. Study hierarchy, quality assessment, summary descriptions, and results are arrayed in Tables 44-47.

Table 44. Hierarchy of evidence, KQ5, head and neck cancer.

Table 45. Study quality assessment, KQ5, head and neck cancer.

Table 46. Single-arm studies, design, enrollment and treatment, KQ5, head and neck cancer.

Table 47. Single-arm studies, time to event outcomes, KQ5, head and neck cancer.

Results by hierarchy level, study quality assessment

Randomization stratified on HER2/randomized to whether treatment was guided by HER2. No studies of this type were identified.

Randomized trial, prespecified multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, post-hoc multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, treatment by HER2 subgroup analysis. No studies of this type were identified.

Single-arm study, prespecified multivariate analysis. No studies of this type were identified.

Single-arm study, post-hoc multivariate analysis. No studies of this type were identified.

Single-arm study, univariate analysis. The study of surgery for malignant salivary gland tumors (Nagler, Kerner, Ben-Eliezer, et al., 2003) found HER2 to be a significant predictor of overall survival. In contrast, the study by Khan, King, Smith, et al. (2002) of surgery plus external-beam radiation therapy for squamous cell carcinoma of the oral cavity and oropharynx found that IHC HER2 was not a significant predictor of disease-free survival or overall survival. Khan and co-workers also reported that FISH HER2 was not significantly associated with overall survival.

Evidence summary-head and neck cancer. The evidence on whether HER2 predicts outcomes after treatment for head and neck cancer is weak. No randomized studies or single-arm designs using multivariate analyses met study selection criteria. The two available studies were univariate analyses of single-arm studies. Additional studies at higher levels in the evidence hierarchy are needed.

Conclusions, Key Question 5

This systematic review found only weak evidence on how well serum or tissue HER2 testing predicts outcomes after treatment for malignancies in any of these sites: lung, ovary, head and neck, or prostate. Overall, the evidence is heterogeneous with respect to treatment regimens and thresholds for positive HER2 test results. Of 22 studies reviewed for the four types of malignancies, none were randomized trials that could have analyzed treatment by HER2 interactions. Six multivariate analyses in single-arm designs were performed, all of which were poorly described, so it is unclear whether they were well conducted. Data from these exploratory analyses did not consistently find that HER2 status predicts treatment results. Univariate analyses provide very limited information, at best suggesting candidate variables for future multivariate analyses.

Footnotes

*

Appendixes cited in this report are provided electronically at http://www.ahrq.gov/downloads/pub/evidence/pdf/her2/her2.pdf.

*

The description of the criteria used to designate HER2 status is unclear, but it appears that cases were considered positive if at least 50% of the cancer cells had membrane staining (roughly comparable to HercepTest™ 3+ score). Cases with membrane staining of less than 50% of the tumor cells were designated low HER2 positive. Overall HER2-positive proportions include high and low positive scores, but in the analysis, patients were grouped into negative/low positive findings vs. high positive findings.
