Chapter 172: HER2 Testing to Manage Patients With Breast Cancer or Other Solid Tumors

Prepared for:

Agency for Healthcare Research and Quality

U.S. Department of Health and Human Services

540 Gaither Road

Rockville, MD 20850

www.ahrq.gov

Contract No. 290-02-0026

Prepared by:

Blue Cross and Blue Shield Association

Technology Evaluation Center

Evidence-based Practice Center

Chicago, Illinois

Investigators

Jerome Seidenfeld, Ph.D.

David J. Samson, M.S.

Barbara M. Rothenberg, Ph.D.

Claudia J. Bonnell, B.S.N., M.L.S.

Kathleen M. Ziegler, Pharm.D.

Naomi Aronson, Ph.D.

AHRQ Publication No. 09-E001

November 2008

This report is based on research conducted by the Blue Cross and Blue Shield Association Technology Evaluation Center Evidence-based Practice Center (EPC) under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-02-0026). The findings and conclusions in this document are those of the author(s), who are responsible for its content, and do not necessarily represent the views of AHRQ. No statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.

The information in this report is intended to help clinicians, employers, policymakers, and others make informed decisions about the provision of health care services. This report is intended as a reference and not as a substitute for clinical judgment.

This report may be used, in whole or in part, as the basis for the development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.

This document is in the public domain and may be used and reprinted without permission except those copyrighted materials noted for which further reproduction is prohibited without the specific permission of copyright holders.

Suggested Citation:

Seidenfeld J, Samson DJ, Rothenberg BM, Bonnell CJ, Ziegler KM, Aronson N. HER2 Testing to Manage Patients With Breast Cancer or Other Solid Tumors. Evidence Report/Technology Assessment No. 172. (Prepared by Blue Cross and Blue Shield Association Technology Evaluation Center Evidence-based Practice Center, under Contract No. 290-02-0026.) AHRQ Publication No. 09-E001. Rockville, MD: Agency for Healthcare Research and Quality. November 2008.

No investigators have any affiliations or financial involvement (e.g., employment, consultancies, honoraria, stock options, expert testimony, grants or patents received or pending, or royalties) that conflict with the material presented in this report.

Preface

The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.

To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.

AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.

We welcome comments on this evidence report. They may be sent by mail to the Task Order Officer named below at: Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850, or by e-mail to .

Carolyn M. Clancy, M.D., Director, Agency for Healthcare Research and Quality

Jean Slutsky, P.A., M.S.P.H., Director, Center for Outcomes and Evidence, Agency for Healthcare Research and Quality

Beth A. Collins Sharp, Ph.D., R.N., Director, EPC Program, Agency for Healthcare Research and Quality

Gurvaneet Randhawa, M.D., M.P.H., EPC Program Task Order Officer, Agency for Healthcare Research and Quality

Acknowledgments

The research team would like to acknowledge the efforts of Maxine A. Gere, M.S., for general project management and editorial assistance; Elizabeth De La Garza and Joyce Gonzalez for administrative support; Ariel Katz, M.D., M.P.H., for study selection and data abstraction; Thomas Ratko, Ph.D., for fact-checking; and Gurvaneet Randhawa, M.D., M.P.H., for advice as our Task Order Officer.

Structured Abstract

Objectives: Systematic review of trastuzumab outcomes among breast cancer patients who have negative, equivocal, or discordant HER2 assay results; use of HER2 assay results to predict outcomes of chemotherapy or hormonal therapy regimen for breast cancer; use of serum HER2 to monitor treatment response or disease progression in breast cancer patients; and use of HER2 testing to manage patients with lung, ovarian, prostate, or head and neck tumors. Also, narrative review of concordance of HER2 assays.

Data Sources: We abstracted data from: three articles plus one conference abstract on negative, equivocal, or discordant HER2 results; 26 studies on selection of chemotherapy or hormonal therapy; 15 studies on serum HER2; and 26 studies on ovarian, lung, prostate, or head and neck tumors. Foreign-language studies were included.

Review Methods: We sought randomized trials or single-arm series (prospective or retrospective) of identically treated patients that presented relevant outcome data associated with HER2 status.

Results: HER2 assay results are influenced by multiple biologic, technical, and performance factors. Many aspects of HER2 assays were standardized only recently, so inconsistencies confound the literature comparing different methods. The evidence is weak on outcomes of trastuzumab added to chemotherapy for HER2-equivocal, -discordant, or -negative patients. Evidence comparing chemotherapy outcomes in HER2-positive and HER2-negative patient subgroups may generate hypotheses, but is too weak to test hypotheses. Only a rigorous test can resolve whether HER2-positive patients (but not HER2-negative patients) benefit from an anthracycline regimen. Evidence is available only from uncontrolled series on whether HER2 status predicts complete pathologic response to neoadjuvant chemotherapy. Evidence also is weak regarding differences by HER2 status for outcomes of chemotherapy for advanced or metastatic disease, with most studies lacking statistical power. Data from studies of tamoxifen and aromatase inhibitors suggest that future studies should examine whether HER2 status predicts response to specific hormonal therapies among estrogen-receptor-positive patients. The evidence is weak on whether serum HER2 predicts outcome after treatment with any regimens in any setting, as is the evidence on use of serum or tissue HER2 testing for malignancies of lung, ovary, head and neck, or prostate.

Conclusions: Overall, few studies directly investigated the key questions of this systematic review. Going forward, cancer therapy trial protocols should incorporate elements to facilitate robust analyses of the use of HER2 status and other biomarkers for managing treatment.

Executive Summary

The human epidermal growth factor receptor-2 (HER2) gene is amplified and the HER2 protein overexpressed in approximately 18–20 percent of breast cancer cases. Amplification or overexpression of HER2 is associated with poor prognosis. Evidence from randomized trials demonstrates that adding trastuzumab, a therapeutic monoclonal antibody that targets HER2, to adjuvant chemotherapy regimens for HER2-positive breast cancer improves survival. HER2 also is overexpressed in other epithelial malignancies such as ovarian, thyroid, lung, salivary gland/head and neck, stomach, colon, and prostate cancers.

This report is a systematic review of the evidence on other applications of HER2 testing to the management of cancer patients including: potential for response to trastuzumab among breast cancer patients who have negative, equivocal, or discordant HER2 assay results; use of HER2 assay results to guide selection of breast cancer treatments other than trastuzumab (i.e., chemotherapy regimen or hormonal therapy regimen); the use of serum HER2 to monitor treatment response or disease progression in breast cancer patients; and use of HER2 testing to manage patients with ovarian, lung, prostate, or head and neck tumors. The concordance and discrepancy of HER2 measurement methods are discussed in a narrative review.

Methods

The review methods were defined prospectively in a written protocol. A technical expert panel provided consultation. The draft report was also reviewed by other experts and stakeholders.

A narrative review was conducted on Key Question 1, which addressed concordance and discrepancy among HER2 assays in breast cancer. HER2 assay results are influenced by multiple biologic, technical, and performance factors. Since many aspects of HER2 assays were standardized only recently, we could not isolate effects of these disparate influences on assay results and patient classification. This challenged the validity of using systematic review methods to compare available assay technologies.

For Key Questions 2–5, we sought randomized trials or single-arm series (prospective or retrospective) of identically treated patients that presented relevant outcome data associated with HER2 status. Primary outcomes were: overall survival (OS); disease-free survival (DFS); progression-free survival (PFS); time to failure (TTF) or progression; quality of life; palliation of symptoms; and treatment-related adverse effects.

Our search had no language restrictions and used these electronic databases:

  • MEDLINE® (through February 2007)

  • EMBASE® (through February 2007)

  • Cochrane Controlled Trials Register (through February 2007)

The searches were updated in April 2008, using the Cochrane clinical trial filter.

Additional sources were the past two years of conference proceedings of the American Association for Clinical Chemistry (AACC), American Society of Clinical Oncology (ASCO), College of American Pathologists (CAP), and the San Antonio Breast Cancer Symposium (SABCS).

Of 6,337 citations, 666 articles were retrieved and 70 were selected for inclusion:

  • Three articles plus one abstract on use of trastuzumab among HER2-negative or -discordant breast cancer patients;

  • 26 articles on chemotherapy or hormonal therapy for breast cancer patients;

  • 15 articles on plasma or serum HER2 in patients treated for breast cancer; and

  • 26 articles on serum or tissue HER2 in patients with lung cancer, ovarian cancer, head and neck cancer, and prostate cancer.

A single reviewer screened citations for article retrieval; citations judged as “uncertain” were reviewed by a second reviewer. The same procedure was used to select articles for inclusion in the review. A single reviewer performed data abstraction and a second reviewed the evidence tables for accuracy. However, study quality was appraised by dual independent review. All disagreements were resolved by consensus.

The quality of predictive studies was assessed using the general approach described in the “Reporting Recommendations for Tumor Marker Prognostic Studies” (REMARK) statement (McShane, Altman, Sauerbrei, et al., 2005). In addition, we used a hierarchical framework for evaluating how informative different designs and analytic strategies would be to predictions of outcomes according to HER2 status. Most informative is a trial that randomizes patients to receive treatment guided by HER2 results or not; or, alternatively, a trial that stratifies randomized assignment to treatment groups by HER2 status (Conley and Taube, 2004). Other types of studies, in decreasing order of information value, include: randomized trials using prespecified multivariate subgroup analyses, randomized trials using post-hoc multivariate subgroup analyses, randomized trials presenting HER2 by treatment subgroup analyses, single-arm studies using prespecified multivariate analyses, single-arm studies using post-hoc multivariate analyses, and single-arm studies using univariate analyses.
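
The ordering of study designs described above can be written down as a simple ranked data structure. The following is only an illustrative sketch: the class name `DesignRank`, its member names, and the helper `sort_by_information_value` are hypothetical labels introduced here, and the numeric ranks merely encode the order given in the text (1 is most informative).

```python
from enum import IntEnum


class DesignRank(IntEnum):
    """Hypothetical encoding of the hierarchy in the text; lower = more informative."""
    RCT_HER2_GUIDED_OR_STRATIFIED = 1       # randomized to HER2-guided care, or stratified by HER2
    RCT_PRESPECIFIED_MULTIVARIATE = 2       # prespecified multivariate subgroup analysis
    RCT_POSTHOC_MULTIVARIATE = 3            # post-hoc multivariate subgroup analysis
    RCT_HER2_BY_TREATMENT_SUBGROUP = 4      # HER2 by treatment subgroup analysis
    SINGLE_ARM_PRESPECIFIED_MULTIVARIATE = 5
    SINGLE_ARM_POSTHOC_MULTIVARIATE = 6
    SINGLE_ARM_UNIVARIATE = 7


def sort_by_information_value(studies):
    """Sort (label, DesignRank) pairs from most to least informative."""
    return sorted(studies, key=lambda study: study[1])


# Usage with made-up study labels (not studies from this report).
example = [
    ("Study A", DesignRank.SINGLE_ARM_UNIVARIATE),
    ("Study B", DesignRank.RCT_POSTHOC_MULTIVARIATE),
    ("Study C", DesignRank.RCT_PRESPECIFIED_MULTIVARIATE),
]
ordered = sort_by_information_value(example)
```

An integer-backed enumeration is used only so that designs can be compared and sorted directly; it does not imply that the gaps between levels are equal in size.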

Results

Key Question 1: Concordance and Discrepancy of HER2 Methods

HER2 assay results are influenced by multiple biologic, technical and performance factors. Since many aspects of HER2 assays were standardized only recently, these disparate influences confound the existing literature that compares results of different methods. Discordances between immunohistochemistry (IHC) and fluorescent in situ hybridization (FISH) results might arise in one of three ways. They may be artifacts of one accurate and one inaccurate test or of two inaccurate tests, as preanalytic, analytic, and postanalytic practices can vary among laboratories within a study, as well as among studies. Interobserver variability can play a role. Alternatively, discordances may reflect a threshold issue, either related to changes in threshold definitions over time, or an inherent problem of using a continuous measure to classify patients dichotomously. Finally, discordant test results might accurately reflect a variation among patients with respect to the biologic mechanisms that can increase membrane levels of the HER2 protein. This clearly affects the interpretation of evidence on the use of “HER2 status” to predict treatment or disease outcomes, which presumes accurate classification by tissue assays.

Notably, there is no recognized gold standard to determine the HER2 status of tumor tissue, which also precludes consensus on one “best” HER2 assay. Recent guidelines acknowledge present uncertainty, permit clinicians and laboratories to choose an initial well-validated and properly performed HER2 assay method, and recommend confirming results with an alternative assay when initial tests are equivocal. The ASCO/CAP expert panel (Wolff, Hammond, Schwartz, et al., 2007a) defines equivocal HER2 assay results as IHC 2+, or HER2 gene copy number from 4.0 to 6.0, or HER2/CEP17 ratio from 1.8 to 2.2, if ISH is the first or only assay.
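
The equivocal ranges quoted above can be expressed as a small check. This is a minimal sketch, not an implementation of the guideline: the function name `is_equivocal` and its parameter names are hypothetical, and treating the range endpoints as inclusive is an assumption, since the text says only "from 4.0 to 6.0" and "from 1.8 to 2.2". The sketch deliberately does not label results outside these ranges as positive or negative.

```python
from typing import Optional


def is_equivocal(ihc_score: Optional[str] = None,
                 gene_copy_number: Optional[float] = None,
                 her2_cep17_ratio: Optional[float] = None) -> bool:
    """Return True if any supplied result falls in an equivocal range.

    Ranges follow the definition quoted in the text; inclusive endpoints
    are an assumption made for this sketch.
    """
    if ihc_score is not None and ihc_score == "2+":
        return True
    if gene_copy_number is not None and 4.0 <= gene_copy_number <= 6.0:
        return True
    if her2_cep17_ratio is not None and 1.8 <= her2_cep17_ratio <= 2.2:
        return True
    return False


# Usage with made-up assay values.
flag_ihc = is_equivocal(ihc_score="2+")
flag_ratio = is_equivocal(her2_cep17_ratio=2.5)
```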

Key Question 2: HER2-Negative or -Discrepant Breast Cancer

Currently available evidence on outcomes of trastuzumab added to chemotherapy for most HER2-equivocal, -discordant, or -negative patients may generate hypotheses, but is too weak to test hypotheses. Most of this evidence is from post-hoc analyses on subgroups not directly randomized or stratified by HER2 status. Scant but intriguing evidence suggests the hypothesis that some patients currently classified as HER2 negative may benefit from adjuvant trastuzumab. Data reported from a post-hoc subgroup analysis of one adjuvant trial (NSABP B31) showed significantly longer DFS and relapse-free interval (RFI) in FISH-negative IHC ≤2+ patients given trastuzumab than in patients managed without trastuzumab, whether the analysis did or did not include those who were IHC 0. However, analysis of data from another similar adjuvant trial (NCCTG N9831) found no significant differences. Both were interim analyses of trials in which fewer than 25 percent of subjects had reached a failure event. Followup analyses from these trials will be of interest.

CALGB 9840 investigators also analyzed a subgroup of metastatic FISH-negative patients that either had (n=38) or did not have (n=103) polysomy 17; overall response rate (ORR) was significantly higher with versus without trastuzumab for those with polysomy 17, but was identical with or without trastuzumab for those without polysomy 17. However, a study in the adjuvant setting (Reinholz, Jenkins, Hillman, et al., 2007) reports no impact of polysomy 17 on benefit from trastuzumab. Additionally, other studies report conflicting data on association of polysomy 17 with overexpression of HER2 protein.

Key Question 3: Breast Cancer Patients Receiving Chemotherapy (3a) or Hormonal Therapy (3b)

For Question 3a, across all three treatment settings (adjuvant, neoadjuvant, or advanced/metastatic), currently available evidence comparing chemotherapy outcomes in HER2-positive and HER2-negative patient subgroups may generate hypotheses, but is too weak to test hypotheses. In the only study that prespecified multivariate subgroup analysis by HER2 status, interaction of assigned adjuvant treatment (with or without paclitaxel) with HER2 status to predict outcome was not statistically significant (ratio of hazard ratios [HRs]=0.85; p=.41). All other evidence is from post-hoc analyses on subgroups not directly randomized, selected, or stratified by HER2 status, and used data from secondary or correlative analysis on patient subgroups with archived tissue samples. It is uncertain whether these subgroups were well balanced. No studies for Question 3a used trastuzumab for HER2-positive patients.
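
For readers unfamiliar with the "ratio of hazard ratios" used to express a treatment-by-marker interaction, the following sketch shows one common way such a ratio and an approximate confidence interval can be computed from two subgroup hazard ratios. It assumes the two subgroups are independent and that standard errors of the log hazard ratios are available; the function name, the inputs, and the choice of which subgroup is the numerator are all assumptions made here, and the numbers in the usage lines are made up rather than taken from the cited trial.

```python
import math


def ratio_of_hazard_ratios(hr_a, se_log_hr_a, hr_b, se_log_hr_b, z=1.96):
    """Return (ratio, lower, upper) for HR_a / HR_b.

    Works on the log scale and assumes the two subgroup estimates are
    independent, so their variances add. z=1.96 gives an approximate
    95 percent interval.
    """
    log_ratio = math.log(hr_a) - math.log(hr_b)
    se = math.sqrt(se_log_hr_a ** 2 + se_log_hr_b ** 2)
    lower = math.exp(log_ratio - z * se)
    upper = math.exp(log_ratio + z * se)
    return math.exp(log_ratio), lower, upper


# Usage with hypothetical subgroup estimates (not data from this report).
ratio, lower, upper = ratio_of_hazard_ratios(0.5, 0.2, 1.0, 0.2)
```

A ratio near 1 with an interval that spans 1 would indicate no detectable difference in treatment effect between the subgroups, which is how a nonsignificant interaction such as the one reported above is usually read.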

Available evidence focuses on three types of adjuvant chemotherapy: cyclophosphamide plus methotrexate plus fluorouracil (CMF), regimens with an anthracycline, and paclitaxel after or with doxorubicin (Adriamycin®) plus cyclophosphamide (AC). Evidence from two studies (one randomized, controlled trial and one series) suggests HER2-positive patients may benefit less from CMF (smaller improvements in OS and DFS) than HER2-negative patients. Only one of four randomized, controlled trials reports a statistically significant interaction that suggests HER2-positive patients (but not HER2-negative patients) benefit from including an anthracycline in their treatment regimen. Given the highly statistically significant result favoring anthracycline therapy for the entire population (N=14,000) of breast cancer patients included in the Early Breast Cancer Trialists' Collaborative Group (EBCTCG 2005) patient-level meta-analysis, a rigorous test of this hypothesis is necessary before one can conclude that omitting anthracyclines from adjuvant chemotherapy regimens would not worsen outcome for HER2-negative patients.

Two trials compared different doses or frequencies of anthracycline-based regimens. One reported statistically significant interaction of cyclophosphamide, doxorubicin, and fluorouracil (CAF) dose with HER2 status to predict treatment outcome, but the second showed no relationship. One study found that adding paclitaxel after AC improves OS and DFS for HER2-positive patients, but may not improve these outcomes for HER2-negative patients. In contrast, the only randomized, controlled trial with a prespecified multivariate subgroup analysis found no difference by HER2 status in outcomes of concurrently added paclitaxel. Thus, for each of the adjuvant chemotherapy regimens compared, available evidence is too weak to rule out the possibility that HER2-negative patients may benefit from using the added drug or higher dose.

Evidence on whether HER2 status predicts pathologic complete response (pCR) to neoadjuvant chemotherapy is limited to four uncontrolled series (retrospective analysis in three). Data are lacking to directly compare any neoadjuvant regimens. There is also limited evidence on differences by HER2 status for outcomes of chemotherapy for advanced or metastatic disease, with most studies lacking statistical power.

For Question 3b, four studies addressed use of tamoxifen in various breast cancer patient populations, and two compared tamoxifen with aromatase inhibitors. None of these studies included trastuzumab. No trials stratified randomization by HER2 status or randomized patients to therapy directed by HER2 results or not. Less informative designs were used, including post-hoc multivariate analyses in five randomized trials and one post-hoc multivariate analysis in a single-arm study. Data are too weak to reach new conclusions about differences between subgroups based on HER2 status in effects of specific hormone therapies for patients who are hormone-receptor positive.

Key Question 4: Plasma or Serum HER2 (sHER2) in Patients Treated for Breast Cancer

Of 13 included studies, three were randomized trials and 11 were single-arm designs. The evidence is weak on whether sHER2 predicts outcome after treatment with any regimens in any setting. Evidence primarily focused on first-line or second- and subsequent-line treatment of metastatic disease using a variety of regimens. Studies used different thresholds for a positive sHER2 result and varied on whether patient selection required positive tissue HER2 status. One randomized and two single-arm studies performed multivariate analysis, although reporting lacked sufficient detail. Univariate analyses provide very limited information value, suggesting candidate variables for future multivariate analyses. Overall, the evidence is too weak to assess whether sHER2 predicts disease progression, treatment response, or outcomes of any specific treatment regimen.

Key Question 5: Serum or Tissue HER2 Testing in Malignancies of Lung, Ovary, Head and Neck, or Prostate

With respect to use of serum or tissue HER2 testing for malignancies of lung, ovary, head and neck, or prostate, the evidence is quite weak. Studies were heterogeneous regarding treatment regimens and thresholds for positive HER2 test results. Of 22 studies addressed for the four types of malignancies, there were no randomized trials that could have analyzed HER2 by treatment effect interactions. Six multivariate analyses in single-arm designs were performed, all of which were poorly described; it is unclear if they were well conducted. Data from these exploratory analyses did not consistently find that HER2 status predicts treatment results. Univariate analyses provide very limited information value, at best suggesting candidate variables for future multivariate analyses.

Discussion and Future Research

Overall, few trials directly investigated the key questions of this systematic review. Going forward, cancer therapy trial protocols should incorporate elements to facilitate robust analyses of the potential of HER2 to improve treatment management. These elements include:

  • Detailed reporting of how HER2 status was ascertained.

  • Stratified randomization by HER2 status or prospectively specified HER2 subgroup analysis of outcomes.

  • Detailed recording of relevant data and archiving of tissue samples for all participants, made accessible to other researchers, to permit future subgroup analyses of outcomes by HER2 status.

The rationale is strongest for breast cancer therapy trials, as many therapeutic agents, classes, and regimens have been and will be tested. This approach can be generalized to other tumors, to promising biomarkers other than HER2, and to serial collection of serum samples for sHER2 levels. Maximizing data collection in trials planned for other purposes offers an opportunity to screen for potential applications of HER2 and other biomarkers.

For Key Question 2, potential for response to trastuzumab among breast cancer patients who have equivocal, discordant, or negative HER2 assay results, evidence is scant but intriguing. Whether other markers might predict response to trastuzumab for these subgroups could be explored using tissue samples from completed trials.

For Key Question 3, the most compelling question is whether anthracyclines benefit HER2-negative patients. A pragmatic approach for future research is to use individual patient data from the Early Breast Cancer Trialists' Collaborative Group (EBCTCG) meta-analysis, which compared survival with anthracyclines versus CMF in 14,000 patients. However, this approach may be limited by availability of sufficient tumor samples. Also of interest is evidence to clarify whether aromatase inhibitors are more effective than tamoxifen in HER2-positive patients.

For Key Questions 4 and 5, evidence does not support conclusions about use of serum HER2 for any treatment setting within breast cancer or about any use of serum or tissue HER2 for cancer of the lung, ovary, head and neck, or prostate. Future exploratory studies in these areas using preserved or prospectively collected specimens should be designed with attention to study quality concerns.

Conclusions

Since many technical and performance aspects of HER2 assays were not standardized until very recently, differences in preanalytic, analytic, and postanalytic practices confound the existing literature. Available evidence supports hypothesis generation but is too weak to test hypotheses. Scant but intriguing evidence suggests the hypothesis that some patients currently classified as HER2 negative may benefit from adjuvant trastuzumab. Future research should focus on biomarkers that might select such patients. Evidence suggests HER2-positive, but not HER2-negative, patients may benefit from chemotherapy regimens with an anthracycline; but rigorous testing of this hypothesis is necessary. Also worth additional testing is the hypothesis that aromatase inhibitors may be more beneficial than tamoxifen for HER2-positive, hormone-receptor-positive breast cancer patients. Overall, few trials directly investigate the key questions of this systematic review.

Going forward, cancer therapy trial protocols should incorporate elements to facilitate robust analyses of the use of HER2 status and other biomarkers for managing treatment. Given the human and financial cost of cancer therapy trials, the limited resources available, and the long duration of followup needed to assess outcomes, particularly for early stage or slowly growing cancers, it is imperative that tumor tissue blocks be collected, optimally fixed, saved, and made available for correlative tumor marker studies from all randomized patients. Agreement to share blocks with investigators should be made a condition for institutions seeking to participate in cooperative group trials.

Chapter 1. Introduction

Table 1

Estimated new cases and deaths in the U.S. in 2008 for epithelial cancers (of which varying proportions overexpress HER2) (Jemal, Siegel, Ward, et al., 2008)

Cancer Type                 Estimated New Cases    Estimated Deaths
Breast cancer (female)      182,460                40,480
Ovarian cancer              21,650                 15,520
Thyroid cancer              37,340                 1,590
Lung cancer                 215,020                161,840
Head and neck
  • oral cavity/pharynx     35,310                 7,590
  • larynx                  12,250                 3,670
Stomach                     21,500                 10,880
Colon                       108,070                49,960
Prostate                    186,320                28,860

The human epidermal growth factor (EGF) receptor-2 (HER2; also referred to as HER2/neu and as ERBB2) gene, located at position 17q12 on chromosome 17, is amplified (i.e., gene copy number greater than 2) and/or the HER2 protein is overexpressed (i.e., the cell membrane has an excess of HER2 protein molecules compared with normal cells) in approximately 18 to 20 percent of breast cancer cases (Owens, Horten, and Da Silva, 2004; Yaziji, Goldstein, Barry, et al., 2004; Wolff, Hammond, Schwartz, et al., 2007a; Slamon, Clark, Wong, et al., 1987; Hanna, O'Malley, Barnes, et al., 2007). Amplification and/or overexpression of HER2 have been associated with increased tumor aggressiveness and poor prognosis. The HER2 gene is one of four (HER1 through HER4) in the EGF receptor gene family; each codes for a membrane-spanning protein that can form homodimers and heterodimers and functions in signal transduction. All but HER2 bind a ligand (EGF or another) outside the cell, and all but HER3 have enzymatic activity that phosphorylates tyrosine residues in proteins (i.e., tyrosine kinase activity) and that is activated by ligand binding. Ligand-activated tyrosine kinase initially phosphorylates tyrosine residues of the receptor's intracellular domain, and subsequently can phosphorylate tyrosine residues of other intracellular proteins. HER2 also is overexpressed in varying proportions of other epithelial malignancies such as ovarian, thyroid, lung, salivary gland/head and neck, stomach, colon, and prostate cancers (Baselga and Mendelsohn, 1994; Blank, Chang, and Muggia, 2005; Gross, Jos, and Agus, 2004). Table 1 lists the estimated new cases and deaths in the U.S. for these cancers in 2008.

Implications of Accurately Determining HER2 Status

Laboratory assays for the HER2 gene and protein in tumor tissue are used to determine the HER2 status of patients with breast cancer (positive if either HER2 gene amplification or HER2 protein overexpression is present; negative if neither is present). As outlined in guideline recommendations for HER2 testing in breast cancer from the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP; Wolff, Hammond, Schwartz, et al., 2007a), and in a report from a task force of the National Comprehensive Cancer Network (NCCN; Carlson, Moench, Hammond, et al., 2006), information regarding a patient's HER2 status can contribute to treatment and other patient management decisions in several ways. HER2 overexpression has been associated with clinical outcomes in patients with breast cancer (Press, Pike, Chazin, et al., 1993; Press, Bernstein, Thomas, et al., 1997; Yamauchi, Stearns, and Hayes, 2001). Because HER2 positivity is associated with a worse prognosis in patients with newly diagnosed breast cancer who do not receive systemic adjuvant chemotherapy, HER2 status may be incorporated along with other prognostic factors into decision making regarding such therapy (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006).

HER2 positivity also appears to be associated with relative, but not absolute, resistance to certain endocrine therapies (e.g., tamoxifen; less so for aromatase inhibitors) and lower benefit from nonanthracycline, nontaxane-containing chemotherapy regimens (Konecny, Pauletti, Pegram, et al., 2003; Ellis, Coop, Singh, et al., 2001; Menard, Valagussa, Pilotti, et al., 2001). HER2 status is also used to determine whether a patient is eligible to receive biologic therapy specifically targeted to HER2 activity, e.g., trastuzumab (Herceptin®, Genentech, San Francisco, CA) or lapatinib (Tykerb®, GlaxoSmithKline, Research Triangle Park, NC).

Additionally, therapies have been developed that specifically target the HER2 protein (Dinh, de Azambuja, Piccart-Gebhart, et al., 2007; Pal and Pegram, 2007; Viani, Afonso, Stefano, et al., 2007; Lin and Rugo, 2007). Evidence from multiple randomized trials demonstrates that trastuzumab, a therapeutic monoclonal antibody that targets HER2, decreases the risk of recurrence and mortality when added to adjuvant chemotherapy regimens for resected HER2-positive breast cancer. A recent meta-analysis (five trials; pooled N=9,117) reported an odds ratio (OR) for mortality with versus without trastuzumab of 0.52 (95 percent CI: 0.44–0.62; p<0.00001), while the OR for recurrence was 0.53 (95 percent CI: 0.46–0.60; p<0.00001) (Viani, Afonso, Stefano, et al., 2007). In patients with metastatic HER2-positive breast cancer, trastuzumab alone or with chemotherapy increases time to disease progression and improves survival. Thus, there is increased emphasis on accurately determining the HER2 status of patients with newly diagnosed or recurrent breast cancer.
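As a rough consistency check on the pooled mortality estimate quoted above, the standard error of the log odds ratio can be recovered from the reported 95 percent confidence interval. The sketch below assumes a symmetric Wald interval on the log scale, which may not be the method the meta-analysis used; the function name is illustrative only.

```python
import math

def log_or_summary(or_point, ci_low, ci_high, z_crit=1.96):
    """Recover SE of log(OR), a Wald z statistic, and a two-sided p value.

    Assumption: the reported interval is a symmetric Wald interval
    on the log odds ratio scale.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_point) / se
    # Two-sided p value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Mortality with versus without trastuzumab, as reported in the text
se, z, p = log_or_summary(0.52, 0.44, 0.62)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p below 0.00001: {p < 0.00001}")
```

Under this assumption the back-calculated p value falls below the reported threshold of 0.00001, so the point estimate, interval, and p value are mutually consistent.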

Table 2

HER2 assays used in tissue specimens and serum: clinical trials, clinical practice, and under development (adapted with permission from the American Society of Clinical Oncology; Wolff, Hammond, Schwartz, et al., 2007a and including information from Carlson, Moench, Hammond, et al., 2006)
A. IHC Assays: measure HER2 protein overexpression in tissue
Assay | Mfr | Methodology | Scoring Criteria | FDA Status
Clinical Trials Assay | Developed by independent laboratory | CB11 and 4D5 MAb | 0 and 1+ negative, 2+ weakly positive, 3+ strongly positive | Research assay used in trials of trastuzumab in metastatic breast cancer
HercepTest™ | DAKO* | A0485 polyclonal antibody | Weakly positive (2+): weak to moderate complete membrane staining in >10% of tumor cells; strongly positive (3+): strong complete membrane staining in >10% of tumor cells* | U.S. Food and Drug Administration (FDA) approved as an aid in the assessment of patients for whom Herceptin™ (trastuzumab) treatment is being considered
PATHWAY™ | Ventana† | CB11 MAb | Positive (2+): weak complete staining of the membrane, >10% of cancer cells; positive (3+): intense complete staining of the membrane, >10% of cancer cells† | FDA approved as an aid in the assessment of patients for whom Herceptin™ (trastuzumab) treatment is being considered
B. In-Situ Hybridization (ISH) Assays: measure HER2 gene amplification in tissue
PathVysion® HER2 DNA Probe Kit (FISH) | Abbott‡ | Hybridization of fluorescent DNA probes to HER2 gene (orange) and chromosome 17 centromere (green) | HER2 amplification: HER2/CEP17 ratio ≥2 on average for 60 cells; results at or near the cut off point (1.8–2.2) should be interpreted with caution (Persons, Tubbs, Cooley, et al., 2006; Dal Lago, Durbecq, Desmedt, et al., 2006) | FDA approved as an aid in the assessment of patients for whom Herceptin™ (trastuzumab) treatment is being considered
INFORM HER2/neu Probe (FISH) | Ventana§ | Hybridization of biotin-labeled DNA probe to HER2 gene and fluorescently labeled avidin | HER2 amplification: average of >6 HER2 gene copies/nucleus; an average of >4.0 <6.0 gene copies/nucleus for 60 cells described as equivocal in one publication (Dal Lago, Durbecq, Desmedt, et al., 2006; Vera-Roman and Rubio-Martinez, 2004) | FDA approved as an adjunct to existing clinical and pathologic information currently used as prognostic indicators in the risk stratification of breast cancer in patients with a primary, invasive, localized, node-negative tumor
HER2 FISH pharmDx™ Kit | Dako▿ | Hybridization of fluorescent DNA probes to HER2 gene (red) and PNA probes to chromosome 17 centromere (CEN-17; green) | Count 20 nuclei per tissue specimen, when possible from distinct tumor areas. Specimens with a HER2/CEN-17 ratio ≥2 should be considered HER2 gene amplified (Kallioniemi, Kallioniemi, Kurisu, et al., 1992; Ellis, Dowsett, Bartlett, et al., 2000; Hanna, 2001; Tsuda, Akiyama, Terasaki, et al., 2001). Results at or near the cut-off (1.8–2.2) should be interpreted with caution. If the ratio is borderline (1.8–2.2), count an additional 20 nuclei and recalculate the ratio for the 40 nuclei | FDA approved as an adjunct to clinicopathologic information currently used for estimating prognosis in stage II, node-positive breast cancer patients and as an aid in assessment of patients being considered for Herceptin™ (trastuzumab) treatment
SPoT-Light (CISH) | Invitrogen/Zymed¶ | Hybridization of digoxigenin-labeled DNA probe to HER2 gene; detection via mouse antidigoxigenin antibody followed by antimouse-peroxidase | High HER2 amplification defined as >10 dots, or large clusters, (low if >5 dots to 10 dots, or small clusters) or mixture of multiple dots and large clusters of the HER2 gene present per nucleus in >50% tumor cells (Hanna and Kwok, 2006) | DNA probe kit not available in the U.S.
EnzMet GenePro (SISH) | Ventana | Hybridization of dinitrophenol-labeled DNA probe to HER2 gene; detection via peroxidase-labeled multimer followed by enzyme metallography | Amplification defined as six or more dots, or large clusters of dots, in 30% or more of invasive tumor cells (Downs-Kelly, Pettay, Hicks, et al., 2005) | DNA probe kit not available in the U.S.
C. HER2 Extracellular Domain (ECD) Assays: detect HER2 ECD in serum
Immuno 1®/ADVIA Centaur® | Bayer | Enzyme immunoassay (EIA); primary MAbs NB-3 and TA-1 (one is labeled with fluorescein and the other is either linked to an enzyme or a chemiluminogenic molecule) specific for the ECD of HER2 added to sera; detection via binding of immunocomplex to antifluorescein antibodies in the solid phase, followed by addition of substrate in case of Immuno 1 assay | Elevated ECD concentrations often defined as >15 ng/mL (Payne, Allard, Anderson-Mauser, et al., 2000; Esteva, Cheli, Fritsche, et al., 2005) | FDA approval for followup and monitoring patients with metastatic breast cancer only

CISH: chromogenic in situ hybridization; ECD: extracellular domain; IHC: immunohistochemistry; FISH: fluorescent in situ hybridization; MAb: monoclonal antibody; Mfr: manufacturer; SISH: silver enhanced in situ hybridization.
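The ratio-based scoring rule listed in Table 2 for the PathVysion and pharmDx kits can be illustrated with a short sketch. The cutoffs (ratio of 2 or more for amplification; 1.8–2.2 flagged for cautious interpretation) are taken from the table. The function and label names are hypothetical, and the sketch does not reproduce the manufacturers' full counting instructions.

```python
def classify_fish_ratio(her2_signals, cep17_signals):
    """Classify a HER2/CEP17 signal ratio using the Table 2 cutoffs.

    her2_signals, cep17_signals: total signal counts over the scored nuclei.
    Returns (ratio, label, borderline_flag).
    """
    if cep17_signals <= 0:
        raise ValueError("CEP17 signal count must be positive")
    ratio = her2_signals / cep17_signals
    label = "amplified" if ratio >= 2.0 else "not amplified"
    # Results at or near the cutoff (1.8 to 2.2) call for cautious interpretation
    borderline = 1.8 <= ratio <= 2.2
    return ratio, label, borderline

# Hypothetical signal counts for three specimens
for her2, cep17 in [(150, 120), (250, 120), (380, 120)]:
    ratio, label, borderline = classify_fish_ratio(her2, cep17)
    note = " (borderline: interpret with caution)" if borderline else ""
    print(f"ratio {ratio:.2f}: {label}{note}")
```

Returning the borderline flag separately from the label keeps the two pieces of information distinct, since a ratio just above 2 is both amplified by the cutoff and in the range where recounting additional nuclei is advised.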

Several assays are available to measure or detect HER2 in tissue specimens (Table 2): immunohistochemistry (IHC) assays measure overexpression of the protein coded for by the HER2 gene, whereas in-situ hybridization assays based on fluorescence (FISH), chromogenic (CISH), or silver-enhanced (SISH) detection measure gene amplification. Additionally, these and other methods (e.g., mRNA assays) can detect or measure HER2 in circulating tumor cells (Meng, Tripathy, Shete, et al., 2004; Apostolaki, Perraki, Pallis, et al., 2007). There is also a serum-based enzyme-linked immunosorbent assay (ELISA; Immuno 1®/ADVIA Centaur®, Bayer) that measures circulating levels of the extracellular domain of HER2 (Carlson, Moench, Hammond, et al., 2006; Harris, Fritsche, Mennel, et al., 2007); however, the tissue-based assays are most commonly used to establish a patient's tumor HER2 status.

Key Questions for this Systematic Review

This systematic review will address five key questions regarding HER2 testing to manage patients with breast cancer or other solid tumors:

  1. What is the evidence on concordance and discrepancy rates for methods (e.g., FISH, IHC, etc.) used to analyze HER2 status in breast tumor tissue?

  2. For patients who are not unequivocally HER2 positive, what is the evidence on outcomes of treatment targeting the HER2 molecule (trastuzumab, etc.), or on differences in outcomes of a common chemotherapy or hormonal therapy regimen with versus without additional treatment targeting the HER2 molecule, in:

     a) Breast cancer patients characterized by discrepant HER2 results from different tissue assay methods performed adequately; and

     b) For those with HER2-negative breast cancer?

  3. For breast cancer patients, what is the evidence on clinical benefits and harms of using HER2 assay results to guide selection of:

     a) Chemotherapy regimen; or

     b) Hormonal therapy?

  4. What is the evidence that monitoring serum or plasma concentrations of HER2 extracellular domain in patients with HER2-positive breast cancer predicts response to therapy, or detects tumor progression or recurrence, and if so, what is the evidence that decisions based on serum or plasma HER2 assay results improve patient management and outcomes?

  5. In patients with ovarian, lung, prostate, or head and neck cancers, what is the evidence that:

     a) Testing tumor tissue for HER2; or

     b) Monitoring serum or plasma concentrations of HER2;

     either predicts response to therapy, or detects tumor progression or recurrence; and if so, what is the evidence that decisions based on HER2 assay results improve patient management and outcomes?

The first Key Question will be dealt with via a narrative review of the recent ASCO/CAP guidelines and evidence published subsequently.

Chapter 2. Methods

This report reviews and synthesizes available evidence on outcomes of using HER2 test results to manage patients with breast cancer or other solid tumors. Five Key Questions are addressed (see “Introduction”). After extensive consideration, we concluded that, because a myriad of technical, biologic, and performance matters influence HER2 diagnostic performance, these variables could not be adequately captured in a systematic review. Thus, Key Question 1 will be addressed by a narrative review and Key Questions 2 through 5 will be addressed by systematic review.

This chapter describes the search strategies used to identify literature; criteria and methods used for selecting eligible articles; methods for data abstraction; methods for quality assessment; and, finally, the process for technical expert advice and peer review.

The methods of this review are generally applicable to all Key Questions except Key Question 1. However, as noted, there were variations in specific aspects of the methods as necessary to satisfy requirements of each question.

Peer Review

A technical expert panel provided consultation for the systematic review and reviewed the draft report. The draft report was also reviewed by 12 external reviewers, including invited clinical experts and stakeholders (Appendix D *). Revisions were made to the draft report based on reviewers' comments.

Study Selection Criteria

Types of Participants

For Key Questions 1–4, populations of interest are patients with breast cancer, with separate analyses for early stage patients receiving adjuvant therapy and those undergoing treatment for metastatic disease.

For Key Question 5, populations of interest are patients with cancers of the lung, ovary, prostate, and head and neck.

Types of Outcomes

In general, outcomes should be standard, valid, reliable, and clinically meaningful. Two types of outcomes are relevant to Key Question 1:

  • Diagnostic accuracy (e.g., analytic sensitivity, specificity, reliability, etc.); and

  • Concordance between assay methods.

Multiple levels of outcomes will be addressed for Key Questions 2 through 5:

  • Lead time for detection of progression, recurrence or metastasis.

  • Patient management decisions, which may be altered by test results;

  • Primary (health) outcomes, which may be affected through management changes guided by test results, such as:

    • Duration of survival, disease-free survival, progression-free survival, and/or time to failure or progression.

    • Quality of life.

    • Palliation of measurable symptoms.

    • Treatment-related adverse effects.

  • Secondary (intermediate) outcomes include:

    • Objective clinical response rates (complete and partial responses; separately and summed).

    • Pathologic complete response rates in patients undergoing neoadjuvant therapy followed by surgery.

    • Response durations.

Health outcomes will be given greatest emphasis. However, it will likely be necessary to construct causal pathways to connect assay results to health outcomes through patient management decisions.

Types of Interventions

The interventions of interest for Key Questions 1, 2, 3, and 5 are tissue assays to evaluate tumor HER2 status by:

  • Immunohistochemistry;

  • Fluorescence in-situ hybridization;

  • Chromogenic in-situ hybridization;

  • Polymerase chain reaction; or

  • Other methods.

The interventions of interest for Key Question 4, and also of interest for parts of Key Question 5, are assays to measure serum concentration of the HER2 extracellular domain.

Practice Settings

Interventions relevant to Key Questions 1–5 are used in the following settings:

  • Pathology and laboratory medicine.

  • Hospitals.

  • Outpatient surgery facilities.

  • Office-based practices.

Types of Studies

Following are study selection criteria specific to each key question.

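Key Question 1. What is the evidence on concordance and discrepancy rates for methods (e.g., FISH, IHC, etc.) used to analyze HER2 status in breast tumor tissue?

HER2 assay results are influenced by multiple biologic, technical, and performance factors. Since many aspects of HER2 assays were not standardized until very recently, we could not isolate effects of these disparate influences on assay results and patient classification.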

This challenged the validity of using systematic review methods to compare available assay technologies. For that reason, we provide a narrative review of the following factors influencing HER2 test results and their use to classify patients: biologic processes, assay methods, and sources of variability.

Key Question 2. For patients who are not unequivocally HER2-positive, what is the evidence on outcomes of treatment targeting the HER2 molecule (trastuzumab, etc.), or on differences in outcomes of a common chemotherapy or hormonal therapy regimen with versus without additional treatment targeting the HER2 molecule, in:

  a) Breast cancer patients characterized by discrepant HER2 results from different tissue assay methods performed adequately; and

  b) For those with HER2-negative breast cancer?

Inclusion criteria

  • Randomized trials, or non-randomized studies (prospective or retrospective) on patients given a uniform chemotherapy regimen or hormonal treatment; that

  • Directly compare outcomes of treatment with versus without trastuzumab (or other HER2-targeted therapy); and also

  • Compare outcomes separately for one or more groups whose HER2 assay results are:

    a) equivocal, or discordant by IHC and ISH, with results separately reported for IHC 2+ and 3+ cases (IHC 0 and 1+ cases may be pooled); or

    b) unequivocally negative by both IHC and ISH.
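The patient groups named in these inclusion criteria can be sketched as a simple mapping from paired IHC and ISH results. The mapping is an illustrative assumption rather than a definition used by this review: IHC 2+ is treated as equivocal regardless of the ISH result, and a case is called discordant when IHC 3+ is paired with a nonamplified ISH result or IHC 0/1+ with an amplified one. Individual studies may draw these boundaries differently, and all names are hypothetical.

```python
def her2_group(ihc_score, ish_amplified):
    """Assign a case to an analysis group from IHC score (0-3) and ISH result.

    Assumed grouping for illustration only; not the review's definition.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2, or 3")
    if ihc_score == 2:
        return "equivocal (IHC 2+)"
    if ihc_score == 3:
        return "concordant positive" if ish_amplified else "discordant (IHC 3+)"
    # IHC 0 and 1+ cases may be pooled
    return "discordant (IHC 0/1+)" if ish_amplified else "unequivocally negative"

for ihc, amplified in [(0, False), (1, True), (2, False), (3, False), (3, True)]:
    print(f"IHC {ihc}+, ISH amplified={amplified}: {her2_group(ihc, amplified)}")
```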

Key Question 3. For breast cancer patients, what is the evidence on clinical benefits and harms of using HER2 assay results to guide selection of:

  a) Chemotherapy regimen; or

  b) Hormonal therapy?

Inclusion criteria

  • Randomized trials, prospective or retrospective studies on identically treated patients, including:

    • Identical hormonal therapy for all patients in studies on chemotherapy; and

    • Identical chemotherapy for all patients in studies on hormonal therapy; or

    • Separate reporting on identically treated groups.

  • Report outcomes of a breast cancer treatment regimen separately by HER2 status;

  • Report outcomes separately for patients undergoing treatment in the neoadjuvant, adjuvant or advanced (recurrent, refractory, or metastatic) settings;

  • Report:

    • Pathologic response (i.e. objective tumor regression) rates for studies on neoadjuvant therapy;

    • Disease-free, relapse-free, recurrence-free or progression-free survival for studies on adjuvant therapy; and

    • Progression-free or overall survival for advanced disease.

  • Define HER2 positivity consistently with the algorithm recommended in the ASCO/CAP guideline; and

  • Include at least 20 HER2-positive patients.

Separate evidence tables and analyses will focus on:

  • Treatment setting (neoadjuvant, adjuvant or for advanced disease);

  • Chemotherapy regimens (e.g., anthracycline-based regimens, or a taxane); and

  • Hormonal therapies (e.g., tamoxifen versus aromatase inhibitors).

Key Question 4. What is the evidence that monitoring serum or plasma concentrations of HER2 extracellular domain in patients with HER2-positive breast cancer predicts response to therapy, or detects tumor progression or recurrence, and if so, what is the evidence that decisions based on serum or plasma HER2 assay results improve patient management and outcomes?

Inclusion criteria

  • Randomized trials, prospective single-arm studies, or retrospective series of identically treated patients; that

  • Measure serum or plasma HER2 concentrations in breast cancer patients, either at baseline or at multiple time points; and either:

    • Associate baseline values or changes in HER2 concentration with one or more outcomes of interest (primary or secondary); or

    • Compare outcomes of treatment decisions based on assay results with outcomes of decisions made in absence of assay results.

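Key Question 5. In patients with ovarian, lung, prostate, or head and neck cancers, what is the evidence that testing tumor tissue for HER2, or monitoring serum or plasma concentrations of HER2, either predicts response to therapy or detects tumor progression or recurrence; and if so, what is the evidence that decisions based on HER2 assay results improve patient management and outcomes?

Inclusion criteria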

  • Randomized trials, prospective single-arm studies, or retrospective series of identically treated patients; that

  • Measure HER2 in tumor tissue, serum, or plasma from patients with ovarian, lung, prostate, or head and neck cancers, and either:

    • Associate HER2 status from tissue assays, or baseline values or changes in serum or plasma HER2 concentration, with one or more outcomes of interest (primary or secondary; see above); or

    • Compare outcomes of treatment decisions based on tumor HER2 status, or serum or plasma assay results, with outcomes of decisions made in absence of test results.

Search Strategy and Review

Search Strategy

Electronic databases. The following databases were searched for citations. The full search strategy is displayed in Appendix A *. The search was not limited to English-language references; however, foreign-language references without abstracts were disregarded.

The MEDLINE® search was performed through 2/23/07. The EMBASE® search was performed through 2/23/07. The Cochrane Controlled Clinical Trials Register search was performed through 2/23/07. Search updates limited by the Cochrane clinical trial filter were performed for all 3 databases on 4/25/08.

Additional sources of evidence. The Technical Expert Panel and individuals and organizations providing peer review were asked to inform the project team of any studies relevant to the key questions that were not included in the draft list of selected studies.

We also examined the bibliographies of all retrieved articles for citations to any relevant study that was missed in the database searches. In addition, we sought studies published in conference proceedings and abstracts from the American Association for Clinical Chemistry (AACC), American Society of Clinical Oncology (ASCO), College of American Pathologists (CAP), and the San Antonio Breast Cancer Symposium (SABCS) over the past two years.

Search Screen


   Figure 1. QUOROM Diagram

Search results were stored in a ProCite® database. Using the study selection criteria for screening titles and abstracts, a single reviewer marked each citation as either: 1) eligible for review as full-text articles; 2) ineligible for full-text review; or 3) uncertain. Citations marked as uncertain were reviewed by a second reviewer and resolved by consensus opinion, with a third reviewer to be consulted if necessary. Using the final study selection criteria, review of full-text articles was conducted in the same fashion to determine inclusion in the systematic review. Of 6,337 citations, 666 articles were retrieved and 70 selected for inclusion (Figure 1). Records of the reason for exclusion for each paper retrieved in full-text, but excluded from the review, were kept in the ProCite® database (see Appendix B, Excluded Studies).
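The screening counts reported above can be turned into stage-by-stage yields with a few lines of arithmetic. The sketch uses only the three totals given in the text (6,337 citations, 666 retrieved, 70 included); the function and key names are illustrative, and the breakdown shown in Figure 1 may be more detailed.

```python
def screening_yield(citations, retrieved, included):
    """Return the proportion of records passing each screening stage."""
    return {
        "retrieved_of_citations": retrieved / citations,
        "included_of_retrieved": included / retrieved,
        "included_of_citations": included / citations,
    }

# Totals reported for this review
yields = screening_yield(citations=6337, retrieved=666, included=70)
for stage, proportion in yields.items():
    print(f"{stage}: {proportion:.1%}")
```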

Data Extraction and Analysis

Data Elements

The data elements below were abstracted, or recorded as not reported, from included studies. Data elements to be abstracted were defined in consultation with the Technical Expert Panel.

Data elements from intervention studies (randomized, controlled trials, prospective single-arm studies, and retrospective consecutive series of identically treated patients) were:

  • Critical features of the study design (for example, patient inclusion/exclusion criteria, number of subjects, use of blinding)

  • Patient characteristics, including:

    • Age

    • Gender

    • Race/ethnicity

    • Disease and stage

    • Disease duration

    • Performance status

    • Other prognostic characteristics (e.g., estrogen or progesterone receptor status)

  • HER2 assay techniques (tissue versus serum, IHC, FISH, PCR, ELISA, scoring methods, cutoffs);

  • Treatment protocols (for example, regimen, dose, frequency, duration)

  • Patient monitoring procedures (for example, followup duration and frequency, outcome assessment methods) and

  • The specified key outcomes and data analysis methods (including techniques for assessing associations between HER2 findings and outcomes and methods for assessing treatment effect interactions)

Evidence Tables

Templates for evidence tables were created in Microsoft Excel® and Microsoft Word®. One reviewer performed primary data abstraction of all data elements into the evidence tables, and a second reviewer reviewed articles and evidence tables for accuracy. Disagreements were resolved by discussion, and if necessary, by consultation with a third reviewer. When small differences occurred in quantitative estimates of data from published figures, the values obtained by the two reviewers were averaged.

Assessment of Study Quality

For this systematic review we constructed a hierarchy of evidence quality for studies assessing HER2 status in predicting outcome. As addressed below, the continuum ranged from more informative specially designed randomized trials to less informative single-arm studies using univariate analyses. In addition to the hierarchy of evidence, we adapted acknowledged frameworks for evaluating the quality of prognostic or predictive studies. For assessing the quality of randomized trials, the general approach to grading evidence developed by the U.S. Preventive Services Task Force (Harris, Helfand, Woolf, et al., 2001) was applied. To assess the quality of predictive studies, we adapted the “Reporting Recommendations for Tumor Marker Prognostic Studies” (REMARK) statement (McShane, Altman, Sauerbrei, et al., 2005). The quality of included prospective, single-arm intervention studies and retrospective consecutive series of identically treated patients was assessed based on a set of study characteristics proposed by Carey and Boden (2003). The quality of the abstracted studies was assessed by two independent reviewers. Discordant quality assessments were resolved with input from a third reviewer, if necessary.

Evidence Hierarchy

Table 3

Hierarchy of study design and conduct for assessing HER2 status prediction of outcome
Continuum: more informative (top) to less informative (bottom)
Randomized trial, randomization stratified on HER2 status OR patients randomized to HER2-guided treatment or non-HER2-guided treatment
Randomized trial, prespecified multivariate subgroup analysis
Randomized trial, post-hoc multivariate subgroup analysis
Randomized trial, treatment by HER2 subgroup analysis
Single-arm study, prespecified multivariate analysis
Single-arm study, post-hoc multivariate analysis
Single-arm study, univariate analysis

Table 3 shows the framework for evaluating how informative different designs and analytic strategies would be for predicting outcomes according to HER2 status. The most informative scenario would be a trial in which randomized assignment to treatment groups is stratified by HER2 status, or in which patients are randomized to receive treatment guided by HER2 results or not (Conley and Taube, 2004). An adequately powered stratified randomization would allow valid inferences about treatment by HER2 interactions. Randomized trials generally are preferred because they make it possible to determine differences in the relative efficacy of two treatments, whereas single-arm studies can only assess the association between HER2 status and outcomes after a single treatment regimen. Subgroup analyses in randomized trials should ideally assess the significance of treatment effect interactions. Prespecified subgroup analyses guard against the problems of data dredging.

Post-hoc subgroup analyses may generate hypotheses, but may not support strong inferences about differential effectiveness. Multivariate subgroup analyses in randomized trials may be useful if the subgroup variable introduces imbalances between different variable-by-treatment combinations, particularly when only a subset of patients have tumor or serum specimens available. An alternative to multivariate subgroup analysis is cross tabulation of results by treatment and HER2 level. The weakness of this approach is failure to control for imbalances in any important prognostic factors, particularly if the patients analyzed are a subset of those randomized. A formal test of interaction is preferred for any trial subgroup analysis. In single-arm (identically treated) studies, multivariate analyses may identify whether a variable is a significant independent predictor of treatment outcome while taking into account the separate influences of other predictors. The least informative situation would be a single-arm study that presents univariate comparisons of HER2 groups.
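The ordering in Table 3 can be expressed as a ranked sequence so that included studies are sorted from more to less informative. This is a minimal sketch: the shortened design labels and the study identifiers are hypothetical, and the ranks simply follow the row order of the table.

```python
# Table 3 designs, ordered from more informative (rank 1) to less informative (rank 7)
EVIDENCE_HIERARCHY = (
    "randomized, stratified on HER2 or HER2-guided vs non-guided",
    "randomized, prespecified multivariate subgroup analysis",
    "randomized, post-hoc multivariate subgroup analysis",
    "randomized, treatment by HER2 subgroup analysis",
    "single-arm, prespecified multivariate analysis",
    "single-arm, post-hoc multivariate analysis",
    "single-arm, univariate analysis",
)

def hierarchy_rank(design):
    """Return the 1-based rank of a design label (1 = most informative)."""
    return EVIDENCE_HIERARCHY.index(design) + 1

def sort_by_informativeness(studies):
    """Sort (study_id, design) pairs from more to less informative."""
    return sorted(studies, key=lambda study: hierarchy_rank(study[1]))

# Hypothetical studies for illustration
studies = [
    ("Study A", "single-arm, univariate analysis"),
    ("Study B", "randomized, post-hoc multivariate subgroup analysis"),
    ("Study C", "single-arm, prespecified multivariate analysis"),
]
for study_id, design in sort_by_informativeness(studies):
    print(hierarchy_rank(design), study_id, design)
```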

Assessment of Study Quality

Table 4

Interpretation rules for assessing quality of predictive studies
Quality Criterion | Rule
Prospective design | Applies to original study design, whether predictive aspect was part of original focus or not.
Prespecified hypotheses about relation of marker to outcome | Article must clearly state that investigation of relation of marker to outcome was prespecified primary or secondary objective of study. Must be coded no if original study design is retrospective. Retrospective analysis of originally prospective design is not a prespecified analysis (e.g., use of banked specimens).
Large, well-defined, representative study population | At least 100 participants and must have at least 10 events (not participants) per candidate predictor variable.
Marker assay methods well-described | Details or references available for detailed assay protocol including reagents or kits used, quality control procedures, reproducibility assessments, quantitation methods, scoring and reporting.
Blinded assessment of marker in relation to outcome | Were individuals assessing assay results blinded to outcomes?
Homogeneous treatment(s), either randomized or rule-based selection | All patients within a study arm must be given the same treatment regimen (no differences in type and number of modalities). Exceptions made for members of a class within a modality or combinations that have been shown to have comparable efficacy. Heterogeneity of treatment regimens allowable up to 5% of patient population.
Low rate of missing data (≤15%) | Refers to number of participants originally enrolled.
Sufficiently long followup | Depends on natural history of disease for patient population defined by stage and other prognostic factors.
Well-described, well-conducted multivariate analysis of outcome: |
 1) clear candidate variable selection | Methods for selecting candidate variables should be clearly described.
 2) clear, appropriate model-building guidelines | Model building strategies should be based on previous evidence of predictive factors, not on arbitrary univariate significance levels or stepwise procedures.
 3) assumptions tested | Mention should be made, for example, that the proportional hazards assumption of the Cox regression was tested.
 4) standard prognostic variables included | A final model should include standard prognostic/predictive variables regardless of significance in univariate analysis.
 5) continuous variables well handled | Arbitrary cutoffs should be avoided, optimal cutoffs should be clearly explained, multiple analytic methods explored including keeping variable continuous and more than 2 categories.
 6) validation | Was a validation procedure mentioned?

As stated, to assess the quality of predictive studies, we adapted the REMARK statement (McShane, Altman, Sauerbrei, et al., 2005). A checklist based on portions of REMARK and other sources (Gould Rothberg and Bracken, 2006; Altman and Riley, 2005; Altman, 2001a, 2001b; Altman and Lyman, 1998; Brocklehurst and French, 1998; Altman, Lausen, Sauerbrei, et al., 1994; Simon and Altman, 1994) was developed. Table 4 identifies good quality characteristics that we looked for in predictive studies, including: prospective design; prespecified hypotheses about relation of marker to outcome; large, well-defined, representative study population; marker assay methods well-described; blinded assessment of marker in relation to outcome; homogeneous treatment(s), either randomized or rule-based selection; low rate of missing data (≤15 percent); sufficiently long followup; and well-described, well-conducted multivariate analysis of outcome. Decision rules for evaluating each quality item are described in the table.
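Three of the Table 4 rules are numeric and can be checked mechanically: study size with events per candidate predictor, the missing-data rate, and the allowable treatment heterogeneity. The sketch below is one possible reading of those rules; it assumes all three proportions are taken against the number of participants originally enrolled, and the parameter names and example counts are hypothetical.

```python
def check_numeric_criteria(n_enrolled, n_events, n_candidate_predictors,
                           n_missing, n_heterogeneous_treatment):
    """Apply the numeric Table 4 thresholds to one study.

    Assumption: proportions use participants originally enrolled as denominator.
    """
    events_per_variable = n_events / n_candidate_predictors
    return {
        # At least 100 participants and at least 10 events per candidate predictor
        "large_population": n_enrolled >= 100 and events_per_variable >= 10,
        # Missing data no greater than 15 percent
        "low_missing_data": n_missing / n_enrolled <= 0.15,
        # Treatment heterogeneity allowable up to 5 percent of patients
        "homogeneous_treatment": n_heterogeneous_treatment / n_enrolled <= 0.05,
    }

# Hypothetical study: 240 enrolled, 75 events, 6 candidate predictors
result = check_numeric_criteria(240, 75, 6, n_missing=30,
                                n_heterogeneous_treatment=18)
for criterion, met in result.items():
    print(f"{criterion}: {'met' if met else 'not met'}")
```

The remaining criteria in Table 4 (for example, blinded assessment or adequacy of followup) depend on reviewer judgment and are not reducible to thresholds of this kind.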

For assessing the quality of randomized trials, the general approach to grading evidence developed by the U.S. Preventive Services Task Force (Harris, Helfand, Woolf, et al., 2001) was applied.

  • a

    The quality of randomized, controlled trials will be assessed on the basis of the following criteria:

    • Initial assembly of comparable groups: adequate randomization, including concealment and whether potential confounders (e.g., other concomitant care) were distributed equally among groups.

    • Maintenance of comparable groups (includes attrition, crossovers, adherence, contamination).

    • Important differential loss to followup or overall high loss to followup.

    • Measurements: equal, reliable, and valid (includes masking of outcome assessment).

    • Clear definition of interventions.

    • All important outcomes considered.

    • Analysis: Adjustment for potential confounders, intention-to-treat analysis.

    Definition of ratings based on above criteria:

    • The rating of intervention studies encompasses the three quality categories described here.

    • Good: Meets all criteria: Comparable groups are assembled initially and maintained throughout the study (followup at least 80 percent); reliable and valid measurement instruments are used and applied equally to the groups; interventions are spelled out clearly; all important outcomes are considered; and appropriate attention is given to confounders in analysis. In addition, for randomized, controlled trials, intention to treat analysis is used.

    • Fair: Studies will be graded “fair” if any or all of the following problems occur, without the fatal flaws noted in the “poor” category below: In general, comparable groups are assembled initially but some question remains whether some (although not major) differences occurred with followup; measurement instruments are acceptable (although not the best) and generally applied equally; some but not all important outcomes are considered; and some but not all potential confounders are accounted for. Intention to treat analysis is done for randomized, controlled trials.

    • Poor: Studies will be graded “poor” if any of the following fatal flaws exists: Groups assembled initially are not close to being comparable or maintained throughout the study; unreliable or invalid measurement instruments are used or not applied at all equally among groups (including not masking outcome assessment); and key confounders are given little or no attention. For randomized, controlled trials, intention to treat analysis is lacking.

  b. The quality of included prospective single-arm intervention studies and retrospective consecutive series of identically treated patients was assessed based on a set of study characteristics proposed by Carey and Boden (2003), as follows:

    • Clearly defined question.

    • Well-described study population.

    • Well-described intervention.

    • Use of validated outcome measures.

    • Appropriate statistical analyses.

    • Well-described results.

    • Discussion and conclusion supported by data.

    • Funding source acknowledged.

Chapter 3. Results and Conclusions

Narrative Review for Key Question 1

What is the evidence on concordance and discrepancy rates for methods (e.g., FISH, IHC, etc.) used to analyze HER2 status in breast tumor tissue?

HER2 assay results are influenced by multiple biologic, technical and performance factors. Since many aspects of HER2 assays have not been standardized until very recently, the effects of these disparate influences could not be isolated. This challenged the validity of using systematic review methods to compare available assay technologies. For that reason, we provide a narrative review of the following factors influencing HER2 test results and their use to classify patients: biologic processes, assay methods, and sources of variability.

Biologic Processes that Influence Cell Membrane Levels of HER2 Protein

Genes such as those in the epidermal growth factor (EGF) receptor family (HER1 through HER4) affect cellular function through the proteins they encode. The HER2 gene is expressed and HER2 protein is found in membranes of all breast and other epithelial cells, and cut-points between “normal” and “overexpressed” levels of HER2 protein are imprecise. Nevertheless, studies have associated increased amounts of HER2 protein in cell membranes with more aggressive behavior of breast and other epithelial cancers, and these increased amounts may predict treatment outcomes (Slamon, Clark, Wong, et al., 1987; Esteva, Pusztai, Symmans, et al., 2000; Rowinsky, 2004; Hynes and Lane, 2005; Ettinger, 2006; Serrano-Olvera, Duenas-Gonzalez, Gallardo-Rincon, et al., 2006).

Expression of HER2 and similar genes is a sequential process that (in a simplified overview) includes the following steps: transcription of DNA to messenger RNA (mRNA); processing mRNA to mature, translatable messages; and translation of mature mRNA to synthesize the protein's amino acid sequence. For many proteins (including HER2), additional steps required to produce functional molecules include: post-translational modification (e.g., glycosylation), three-dimensional folding, assembly of multi-subunit proteins, and movement to the relevant cellular site or organelle (not necessarily in this sequence).

We will discuss each of the following biologic mechanisms that potentially may increase the amount of HER2 protein in cell membranes:

  A. Increased gene copy number (i.e., more than diploid amounts of HER2 DNA in cell nuclei), by:

    1. HER2 gene amplification, or

    2. Chromosome 17 polysomy;

  B. Elevated HER2 protein levels in cells with diploid amounts of HER2 DNA, by:

    1. Increased rate of HER2 gene expression; or

    2. Decreased degradation (increased stability) of HER2 mature message and/or protein.

Increased gene copy number

Gene amplification. In most HER2-positive cases, increased levels of HER2 protein in breast cancer cell membranes are attributable to an amplified HER2 gene (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; Slamon, Clark, Wong, et al., 1987). Gene amplification increases the copy number for a segment from one arm of a chromosome (Albertson, 2006; Myllykangas and Knuutila, 2006); amounts of the central portion (centromere) and the chromosome's other arm remain unaltered. The amplified DNA segment (amplicon) can include one or several genes. It can be organized as extrachromosomal elements, as repeated units at a single locus (which lengthens the affected chromosome arm), or repeats can be spread throughout the genome. Typically, all or most copies of the amplified gene(s) are expressed, and amounts of the excess protein increase nearly exponentially with gene copy number per cell (Szollosi, Balazs, Feurenstein, et al., 1995; Konecny, Pegram, Venkatesan, et al., 2006).

The HER2 gene has been mapped to the long arm of chromosome 17, at position 17q12 (Vanden Bempt, Drijkoningen, and De Wolf-Peeters, 2007; Jarvinen and Liu, 2006; Kauraniemi and Kallioniemi, 2006; Mano, Rosa, De Azambuja, et al., 2007). Amplicon size can vary, with two to ten (or more) other amplified genes mapping to the region from 17q12 to 17q21. Although not relevant to assays used to classify HER2 status of patients with breast cancer, note that the gene coding for the enzyme topoisomerase II-α (TOPIIA, a target of the anthracyclines) also is located in this segment. Co-amplification of these genes may be more relevant to predict outcomes of therapy with an anthracycline regimen than amplification of the HER2 gene alone, since excess TOPIIA activity is a potential mechanism of anthracycline resistance (see “Results and Conclusions, Key Question 3”).

Chromosome 17 polysomy. HER2 gene copy number also may rise if cells have more than two copies of chromosome 17. Obviously, cells that have replicated their DNA but not yet divided have four rather than two copies of each chromosome, thus also of the HER2 gene. But some breast or other cancer cells may have extra copies of one or more whole chromosomes (termed polysomy), and may stably pass this characteristic to daughter cells. Cells with chromosome 17 polysomy have extra copies of the HER2 gene, although the ratio of HER2 copy number to centromere copy number is the same as in diploid cells unless HER2 also is amplified. However, it is uncertain whether chromosome 17 polysomy is associated with overexpression of the HER2 protein (Vanden Bempt, Drijkoningen, and De Wolf-Peeters, 2007; Beser, Tuzlali, Guzey, et al., 2007; Corzo, Bellosillo, Corominas, et al., 2007; Hyun, Lee, Kim, et al., 2008; Torrisi, Rotmensz, Bagnardi, et al., 2007; Downs-Kelly, Yoder, Stoler, et al., 2005; Ma, Lespagnard, Durbecq, et al., 2005).

Elevated HER2 protein in cells with diploid HER2 DNA. Although such cases are uncommon, clinical investigators have reported breast cancers with elevated HER2 protein levels in malignant diploid cells (i.e., cells lacking amplified HER2 genes or polysomy 17; e.g., Mass, Press, Anderson, et al., 2005; Vogel, Cobleigh, Tripathy, et al., 2002; Pauletti, Godolphin, Press, et al., 1996). This probably arises through increased expression of the HER2 gene, although decreased rates of degradation for either the mRNA or protein are at least theoretically possible. Increased expression may involve enhanced rates of transcription, message processing, translation, and/or post-translational modification (selectively for the HER2 gene). Detailed review of mechanisms that may increase rates of these processes is outside this report's scope.

It is uncertain whether tumors with increased membrane HER2 protein but diploid HER2 DNA respond differently to therapies (targeted to the HER2 protein, or to others) than do tumors with amplified HER2 DNA that increases HER2 protein. It is also unknown if the route to excess HER2 protein (i.e., whether from increased mRNA production, protein synthesis, or decreased degradation of either) affects tumor biology and aggressiveness or treatment outcomes. In vitro data suggest that increased membrane HER2 protein affects cell physiology, proliferation, and treatment responses in the same way, regardless of how the excess is produced (Pierce, Arnstein, DiMarco, et al., 1991).

Tissue Assays Routinely Used in Clinical Practice to Determine HER2 Status of Breast Tumors

In current clinical practice, assays used to classify breast cancer patients with respect to HER2 status detect either HER2 protein or HER2 DNA (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Research laboratories use assays for HER2 mRNA to study molecular mechanisms and biologic regulation. These mRNA assays are technically more difficult than protein and DNA assays, and measure less-stable molecules. Although real-time reverse transcription polymerase chain reaction (RT-PCR) methods recently were adapted to measure HER2 mRNA in fixed, paraffin-embedded tissues and compared with IHC and ISH assays (Capizzi, Gruppioni, Grigioni, et al., 2008), RT-PCR assays for HER2 mRNA are still uncommon in clinical management of patients with breast cancer and thus are not included in this review.

Each method used to determine HER2 status applies results of a quantitative or semiquantitative assay to assign a binary (“yes/no”) classification. Thus, test results with each assay can vary with different scoring systems and thresholds for positivity. As discussed in a following section (“Postanalytic Factors”), scoring and thresholds may depend on choice of reagents to detect, visualize, and quantitate analytes. Scoring systems and thresholds also have changed over time, with standardized approaches recommended quite recently (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Data are lacking to determine whether differences in treatment outcome as a function of HER2 status are affected by reclassifying patients with currently recommended scoring systems and thresholds.

Methods to detect/measure amount of HER2 protein. Immunohistochemistry (IHC) is the assay used most widely for classifying HER2 status of breast cancer patients, since it uses techniques and equipment long used by most clinical pathology laboratories for other proteins such as estrogen and progesterone receptors (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). The assay incubates thin slices of fixed tissue on a microscope slide with an antibody to HER2, washes off unbound antibody, then visualizes bound antibody. Because IHC preserves tissue architecture and cellular structure (morphology), it permits scoring to focus on antibody specifically bound to membranes of invasive breast cancer cells. IHC also permits permanent storage of stained slides if later re-evaluation is needed.

IHC scoring systems consider the proportion of antibody-stained invasive cancer cells and the intensity of staining, a partly subjective judgment. Besides the U.S. Food and Drug Administration (FDA)-approved IHC kits (HercepTest™ and PATHWAY™; see Table 2, Introduction), various antibodies to HER2 protein are commercially available as analyte-specific reagents that can be used for independently developed (so-called “home-brew”) assays (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003; Hicks and Kulkarni, 2008). Some are polyclonal, with a mix of antibody molecules that may recognize different binding sites (epitopes) on the HER2 protein. Others are monoclonal, homogeneous molecules that recognize a single epitope. These differences may lead to discrepant results with different antibody reagents (Press, Hung, Godolphin, et al., 1994). Other sources of variability in IHC results are discussed in the following section, “Sources of Variability in Classifying HER2 Status.”

Protein assays on homogenized tissue may use antibody to visualize HER2 after separating proteins in a solid matrix (Western blots), or quantitate HER2 by enzyme-linked immunosorbent assay (ELISA). These assays destroy the analyzed tissue samples. Additionally, tissue extracts may mix proteins from cytosol, membranes, and other organelles; and also from multiple cell types: normal breast, inflammatory cells, in situ tumor, and invasive cancer. HER2 levels of in situ breast tumor cells often are elevated, for uncertain reasons and with inadequately studied clinical implications (Allred, Clark, Tandon, et al., 1992; Hoque, Sneige, Sahin, et al., 2002; Collins and Schnitt, 2005). Guidelines stress avoiding areas of ductal carcinoma in situ when scoring assay results (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Nearly all clinical studies on HER2 protein assays to predict treatment outcomes used IHC on tissue slices rather than assays on tissue homogenates, and assigned HER2 status by amount of HER2 protein in membranes of invasive breast cancer cells.

Methods to detect/measure HER2 gene copy number or amount of HER2 DNA. In situ hybridization (ISH) is the most commonly used method to measure HER2 gene copy number in tissue samples from breast cancer patients (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; Ross, Fletcher, Linette, et al., 2003; Hicks and Kulkarni, 2008). It uses a labeled probe complementary to the DNA sequence of interest (here, a unique segment from the HER2 gene). Double-stranded DNA in cell nuclei of the fixed tissue sample is denatured so the probe can hybridize (bind) to its complementary sequence, then unbound probe is washed away. As with IHC, tissue preparation for ISH preserves tissue and cell morphology, and scoring focuses on invasive breast cancer cells.

The gene-specific probes are visualized in one of three ways: by fluorescence (FISH), a chromogenic reaction (CISH; uses digoxigenin), or silver deposition (SISH; uses dinitrophenol for enzymatic metallography). FISH requires a fluorescence microscope (more expensive and unavailable in some smaller pathology laboratories), while CISH and SISH use routine light (brightfield) microscopy. Three FDA-approved kits are available for HER2 testing by FISH (PathVysion®, Inform™, and HER2 FISH pharmDx™), while kits for CISH (SPoT-Light) and SISH (EnzMet GenePro™) are not yet approved (see Table 2, Introduction). Slides prepared for FISH testing lose fluorescence and thus cannot be stored for later review. In contrast, slides prepared for either CISH or SISH can be archived and re-evaluated. Additionally, it is sometimes difficult to identify invasive tumor cells with fluorescence microscopy. All three ISH methods require more time per sample than IHC for slide scoring. Because they were developed recently, fewer clinical studies used CISH or SISH than FISH to classify HER2 status of breast cancer patients.

In ISH assays, pathologists count fluorescent (FISH) or dark-colored (CISH, SISH) spots visible above the nucleus to measure HER2 gene copy number: two in diploid cells; more in cells with amplified HER2 or polysomy 17. Typically, one determines gene copy number for multiple invasive cancer cells on the slide, and averages results for the tissue sample. In some ISH assays, slides are hybridized simultaneously with two probes that fluoresce in or show different colors, to permit copy number measurement for the HER2 gene and chromosome 17 centromere (CEP17). With this approach, HER2 gene status is defined by the ratio of HER2 to CEP17 copy numbers: greater than 2 if amplified, but approximately 1 if unamplified, whether chromosome 17 polysomy is absent or present.
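The averaging and ratio calculation described above can be sketched in a few lines of Python. This is a minimal illustration only, not a validated scoring tool: the function name `her2_cep17_ratio` and all signal counts are hypothetical and were chosen solely to contrast diploid, polysomic, and amplified patterns.

```python
from statistics import mean


def her2_cep17_ratio(her2_signals, cep17_signals):
    """Return (mean HER2 copies, mean CEP17 copies, HER2/CEP17 ratio).

    Each list holds the signal count for one scored invasive-cell nucleus;
    the two lists must describe the same nuclei in the same order.
    """
    if not her2_signals or len(her2_signals) != len(cep17_signals):
        raise ValueError("need paired, non-empty signal counts")
    mean_her2 = mean(her2_signals)
    mean_cep17 = mean(cep17_signals)
    if mean_cep17 == 0:
        raise ValueError("no CEP17 signals counted")
    return mean_her2, mean_cep17, mean_her2 / mean_cep17


# Hypothetical counts, for illustration only (real assays score many more nuclei).
diploid = her2_cep17_ratio([2, 2, 2, 2], [2, 2, 2, 2])
polysomic = her2_cep17_ratio([4, 4, 3, 5], [4, 4, 3, 5])
amplified = her2_cep17_ratio([10, 12, 8, 10], [2, 2, 2, 2])
```

In this sketch the polysomic example has more HER2 copies per nucleus than the diploid example but the same HER2/CEP17 ratio, whereas the amplified example has a markedly higher ratio.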

Early research studies extracted DNA from tissue homogenates and measured amounts of the HER2 gene by Southern or slot blots, or by quantitative polymerase chain reaction (PCR) assays. Southern blots first separate DNA molecules by their mobility in a matrix, while slot blots use the mixed extract. Each selectively visualizes the DNA sequence of interest by hybridizing to labeled probes as in ISH. PCR assays amplify (selectively replicate) DNA sequences of interest in vitro, detect them by fluorescent or other probes, and quantify the starting amount using standard curves. As with protein assays on tissue homogenates, these techniques dilute DNA from invasive cancer cells with DNA from surrounding normal tissues and inflammatory cells (Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). They also consume the samples they analyze. Southern and slot blots are less sensitive than PCR and require substantially larger amounts of DNA. Southern blot assays also are labor intensive and less widely available in clinical pathology labs. The remainder of this review focuses on IHC and ISH methods, the only HER2 assays with FDA-approved kits available for clinical use.

Sources of Variability in Classifying HER2 Status

Accurately determining HER2 status depends on proper performance of preanalytic, analytic, and postanalytic steps (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). Preanalytic steps are those involved in obtaining, preserving (fixing), and storing tissue samples prior to staining and analysis. Analytic steps prepare and stain fixed tissue samples with antibody to HER2 for IHC, or prepare and hybridize them to HER2 gene probe for ISH, then visualize tissue-bound antibody or probe. Postanalytic steps score test results, classify patients, and assure test quality, consistency, and reproducibility. Some processes for these steps are the same for IHC or ISH, but many differ.

Preanalytic: tissue processing and storage. HER2 tests can use tissue from core (incisional) biopsy or tumor excised for biopsy, lumpectomy, or mastectomy (Wolff, Hammond, Schwartz, et al., 2007a). Tissue sources can be the primary tumor or a lymph node or distant metastasis (Carlson, Moench, Hammond, et al., 2006). Although uncommon, discordances in HER2 status between primary tumor and metastases have been reported (for references, see Carlson, Moench, Hammond, et al., 2006). Retesting HER2 status if metastases develop after a long disease-free or progression-free interval may be warranted, depending on where and how HER2 status of the primary tumor was determined.

Tissues are prepared and preserved for assays by slicing larger samples, fixing in a denaturing solution, and embedding fixed tissue for long-term storage (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Factors that may influence test results include: edge, retraction, or crush artifacts with some core needle biopsies; time from excision to slicing, and to fixation; type of and time in fixative; choice of embedding material; and conditions and duration of storage for fixed and embedded tissues.

Table 5

Summary of ASCO/CAP guideline recommendations (Reprinted with permission from the American Society of Clinical Oncology; Wolff, Hammond, Schwartz, et al., 2007a)
Recommendation
Optimal algorithm for HER2 testing
Positive for HER2 is either IHC HER2 3+ (defined as uniform intense membrane staining of >30% of invasive tumor cells) or FISH amplified (ratio of HER2 to CEP17 of > 2.2 or average HER2 gene copy number > six signals/nucleus for those test systems without an internal control probe)
Equivocal for HER2 is defined as either IHC 2+ or FISH ratio of 1.8–2.2 or average HER2 gene copy number four to six signals/nucleus for test systems without an internal control probe
Negative for HER2 is defined as either IHC 0–1+ or FISH ratio of < 1.8 or average HER2 gene copy number of < four signals/nucleus for test systems without an internal control probe
These definitions depend on laboratory documentation of the following:
  1. Proof of initial testing validation in which positive and negative HER2 categories are 95% concordant with alternative validated method or same validated method for HER2

  2. Ongoing internal QA procedures

  3. Participation in external proficiency testing

  4. Current accreditation by valid accrediting agency

Optimal FISH testing requirements
Fixation for fewer than 6 hours or longer than 48 hours is not recommended
Test is rejected and repeated if
  • Controls are not as expected

  • Observer cannot find and count at least two areas of invasive tumor

  • >25% of signals are unscorable due to weak signals

  • >10% of signals occur over cytoplasm

  • Nuclear resolution is poor

  • Autofluorescence is strong

Interpretation done by counting at least 20 cells; a pathologist must confirm that counting involved invasive tumor
Sample is subjected to increased counting and/or repeated if equivocal; report must include guideline-detailed elements
Optimal IHC testing requirements
Fixation for fewer than 6 hours or longer than 48 hours is not recommended
Test is rejected and repeated or tested by FISH if
  • Controls are not as expected

  • Artifacts involve most of sample

  • Sample has strong membrane staining of normal breast ducts (internal controls)

Interpretation follows guideline recommendation
  • Positive HER2 result requires homogeneous, dark circumferential (chicken wire) pattern in >30% of invasive tumor

  • Interpreters have method to maintain consistency and competency

Sample is subjected to confirmatory FISH testing if equivocal based on initial results
Report must include guideline-detailed elements
Optimal tissue handling requirements
Time from tissue acquisition to fixation should be as short as possible; samples for HER2 testing are fixed in neutral buffered formalin for 6–48 hours; samples should be sliced at 5–10 mm intervals after appropriate gross inspection and margins designation and placed in sufficient volume of neutral buffered formalin
Sections should ideally not be used for HER2 testing if cut >6 weeks earlier; this may vary with primary fixation or storage conditions
Time to fixation and duration of fixation if available should be recorded for each sample
Optimal internal validation procedure
Validation of test must be done before test is offered
Initial test validation requires 25–100 samples tested by alternative validated method in the same laboratory or by validated method in another laboratory
Proof of initial testing validation in which positive and negative HER2 categories are 95% concordant with alternative validated method or same validated method for HER2
Ongoing validation should be done biannually
Optimal internal QA procedures
Initial test validation
Ongoing quality control and equipment maintenance
Initial and ongoing laboratory personnel training and competency assessment
Use of standardized operating procedures including routine use of control materials
Revalidation of procedure if changed
Ongoing competency assessment and education of pathologists
Optimal external proficiency assessment
Participation in external proficiency testing program with at least two testing events (mailings)/year
Satisfactory performance requires at least 90% correct responses on graded challenges for either test
  • Unsatisfactory performance will require laboratory to respond according to accreditation agency program requirements

Optimal laboratory accreditation
Onsite inspection every other year with annual requirement for self-inspection
  • Reviews laboratory validation, procedures, QA results and processes, results and reports

  • Unsatisfactory performance results in suspension of laboratory testing for HER2 for that method

Abbreviations: HER2, human epidermal growth factor receptor 2; IHC, immunohistochemistry; FISH, fluorescent in situ hybridization; QA, quality assurance.
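The three-category definitions in the first row of Table 5 can be expressed as a short Python sketch. It is a simplified reading of the table under stated assumptions, not an implementation of the full guideline: the function names (`classify_ihc`, `classify_fish`, `reflex_status`) are hypothetical, the IHC score is assumed to have been assigned already by a pathologist, and none of the validation, rejection, or reporting requirements are modeled.

```python
def classify_ihc(score):
    """Map an IHC score (0, 1, 2, or 3) to a HER2 category per Table 5."""
    if score == 3:
        return "positive"
    if score == 2:
        return "equivocal"
    if score in (0, 1):
        return "negative"
    raise ValueError("IHC score must be 0, 1, 2, or 3")


def classify_fish(ratio=None, mean_copies=None):
    """Classify a FISH result per Table 5.

    Uses the HER2/CEP17 ratio when given; otherwise uses the average HER2
    gene copy number per nucleus (test systems without a control probe).
    """
    if ratio is not None:
        if ratio > 2.2:
            return "positive"
        if ratio < 1.8:
            return "negative"
        return "equivocal"  # ratio of 1.8 to 2.2
    if mean_copies is not None:
        if mean_copies > 6:
            return "positive"
        if mean_copies < 4:
            return "negative"
        return "equivocal"  # four to six signals per nucleus
    raise ValueError("provide ratio or mean_copies")


def reflex_status(ihc_score, fish_ratio=None):
    """Assumed simplification: an equivocal IHC result is resolved by FISH
    when a FISH ratio is available; otherwise the IHC category stands."""
    ihc_category = classify_ihc(ihc_score)
    if ihc_category == "equivocal" and fish_ratio is not None:
        return classify_fish(ratio=fish_ratio)
    return ihc_category
```

For example, `reflex_status(2, fish_ratio=2.5)` returns `"positive"`, while `reflex_status(2)` with no FISH result remains `"equivocal"`.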

Guidelines seeking to standardize methods were not published until recently (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007), although prior consensus conferences (cited in the guidelines) recommended many of the same methods. Importantly, the recommended preanalytic steps are identical for tissues to be tested by IHC or ISH; these are summarized in a following section (see Table 5 in “Current Guideline Recommendations”). Systematic reviews conducted for the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) and National Comprehensive Cancer Network (NCCN) guidelines (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006) reported data were lacking to evaluate effects of nonadherence on test results for some aspects of tissue processing. The published guidelines did not include evidence tables summarizing effects of nonadherence on test results for those aspects of tissue processing that have been evaluated comparatively.

Notably, the literature review for this report showed that most studies reporting concordance and discordance rates of different IHC and ISH assays used archived samples, fixed and embedded elsewhere than the laboratory performing the HER2 assays. With exceptions, most publications did not report adequately on adherence to guideline or prior (consensus) recommendations for tissue processing.

Analytic: performing HER2 assays. Analytic steps for processing thin sections of fixed and embedded tissue cut onto glass slides differ for IHC and ISH assays (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Each begins by deparaffinizing thin tissue sections, but IHC assays use an antigen retrieval step that optimizes antibody binding to HER2 protein while ISH assays first unwind (denature) cells' double-stranded DNA so that the probe can hybridize to its complementary sequence. The temperature and duration of heating used to bake tissue sections on slides, as well as the conditions used for antigen retrieval, can introduce variability in IHC results. Each assay incubates slides with an analytic reagent (antibody for IHC; probe for ISH), removes unbound reagent in one or more washing steps, and incubates with other reactants to visualize bound analytic reagent. Some steps can be automated, which improves consistency and reproducibility if equipment is well-maintained and regularly calibrated. In addition to reagent choice (which antibody, for IHC; which DNA probe, for ISH), varying the conditions (temperatures, durations, etc.), solutions, and reactants used for each step can affect test results, as can poorly maintained or calibrated automated equipment.

While FDA-approved kits include protocols with optimized methods for each analytic step, guideline publications report that approximately half of surveyed laboratories did not adhere completely to protocol methods (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006). The guidelines stress the need to train and periodically assess the skills of staff conducting these assays, and that each run should include standardized positive and negative controls. They also emphasize that each laboratory offering HER2 testing services should validate its test results against a previously validated test, and that laboratories departing from protocol-specified methods with FDA-approved kits, and those using independently developed assays with analyte-specific reagents, should validate test results against established methods and develop their own standard protocols.

As with preanalytic steps, most published studies did not adequately report information needed to evaluate complete adherence with guideline or prior (consensus) recommendations on all analytic steps. Studies that used FDA-approved kits rarely commented on protocol adherence in the methods sections of their reports, and studies that used independently developed assays rarely described assay validation against approved kits.

Postanalytic factors. IHC scoring systems and positivity thresholds have changed over time, and these changes likely alter the proportion of patients classified as HER2 positive (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). Some studies on archived tissues classified tumors as HER2 positive if any invasive cells showed strong, complete membrane staining (e.g., Paik, Bryant, Park, et al., 1998; Houston, Plunkett, Barnes, et al., 1999; Paik, Bryant, Tan-Chiu, et al., 2000). Others classified samples as HER2 positive if 1 percent or more of invasive cells were stained (e.g., MacGrogan, Mauriac, Durand, et al., 1996; Elledge, Green, Ciocca, et al., 1998; Di Leo, Larsimont, Gancberg, et al., 2001); yet others, only if 50 percent or more were stained (e.g., Agrup, Stal, Olsen, et al., 2000; Berry, Muss, Thor, et al., 2000; Colozza, Sidoni, Mosconi, et al., 2005). Few studies adopted (or adapted) Allred's system (Harvey, Clark, Osborne, et al., 1999; developed for IHC assays of estrogen receptors), which rates the proportion of stained invasive cells (from 0 to 5) and the intensity of staining (from 0 to 3), then adds the two for a final score between 0 and 8.
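To make the effect of these differing thresholds concrete, the following Python sketch applies the three cut-offs mentioned above to a single hypothetical sample and also shows the additive structure of the Allred score. The names `is_positive`, `allred_score`, and `CUTOFFS` are illustrative, the 20 percent staining fraction is invented, and the sketch simplifies by treating "stained" as whatever staining criterion each study applied.

```python
# Cut-offs summarized in the text; None stands for "any stained invasive cells".
CUTOFFS = {
    "any stained cells": None,
    "1 percent or more": 0.01,
    "50 percent or more": 0.50,
}


def is_positive(fraction_stained, cutoff):
    """Return True if the stained fraction meets the given study cut-off."""
    if cutoff is None:
        return fraction_stained > 0
    return fraction_stained >= cutoff


def allred_score(proportion_score, intensity_score):
    """Sum of proportion score (0 to 5) and intensity score (0 to 3)."""
    if not 0 <= proportion_score <= 5:
        raise ValueError("proportion score must be between 0 and 5")
    if not 0 <= intensity_score <= 3:
        raise ValueError("intensity score must be between 0 and 3")
    return proportion_score + intensity_score


# Hypothetical sample with 20 percent of invasive cells stained.
calls = {label: is_positive(0.20, cutoff) for label, cutoff in CUTOFFS.items()}
```

Here the same hypothetical sample is called HER2 positive under the first two cut-offs and HER2 negative under the third, which is the kind of reclassification the paragraph above describes.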

The scale recommended in FDA-approved IHC kits (0 to 3+; developed for HercepTest™ but also used with PATHWAY™) requires membrane staining in 10 percent or more of invasive cells for scores greater than 0. The scale assigns positive scores by staining intensity and totality of membrane staining: 1+ is faint or barely perceptible staining that is incompletely circumferential; 2+ is moderate intensity but complete circumferential staining; and 3+ is strong intensity and complete circumferential staining (www.dakousa.com/prod_downloadpackageinsert.pdf?objectid_105073003). However, some studies that used this scale defined HER2-positive cases as those scored 2+ or 3+, while others classified only those with a score of 3+ as HER2 positive. The ASCO/CAP guideline retains the original definitions for scores of 0 to 2+, but recommends scoring IHC 3+ only if more than 30 percent of invasive breast cancer cells show dark, homogeneous, circumferential membrane staining in a “chicken wire” pattern (Wolff, Hammond, Schwartz, et al., 2007a). Adequate data are lacking to compare accuracy or concordance for this wide variety of scoring systems and thresholds used to classify patients' HER2 status by IHC alone. However, in one recent study (Hameed, Chhieng, and Adams, 2007), three pathologists blinded to FISH results scored IHC-stained slides from 98 breast cancer cases separately using cut-offs of 10 percent, 30 percent, and 50 percent of stained cells to classify samples as HER2+. Specificity of IHC versus FISH was 82 percent, 86 percent, and 87 percent, respectively, for the three increasing cut-offs, while concordance rates of 3+ cases with FISH were 59 percent, 64 percent, and 65 percent.
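Measures such as specificity and concordance with FISH are derived from a two-by-two comparison of IHC and FISH calls. The Python sketch below shows one common way to compute them, treating FISH as the reference test. The function name `agreement_metrics` and the counts are hypothetical and are not taken from Hameed, Chhieng, and Adams (2007) or any other cited study; "positive concordance" is assumed here to mean the fraction of IHC-positive cases that are also FISH positive.

```python
def agreement_metrics(both_pos, ihc_pos_fish_neg, ihc_neg_fish_pos, both_neg):
    """Compute agreement measures for IHC against FISH as the reference test."""
    total = both_pos + ihc_pos_fish_neg + ihc_neg_fish_pos + both_neg
    fish_pos = both_pos + ihc_neg_fish_pos
    fish_neg = ihc_pos_fish_neg + both_neg
    ihc_pos = both_pos + ihc_pos_fish_neg
    if min(total, fish_pos, fish_neg, ihc_pos) == 0:
        raise ValueError("each margin of the table must be non-zero")
    return {
        "overall_concordance": (both_pos + both_neg) / total,
        "sensitivity": both_pos / fish_pos,
        "specificity": both_neg / fish_neg,
        # Fraction of IHC-positive cases confirmed by FISH (assumed definition).
        "positive_concordance": both_pos / ihc_pos,
    }


# Invented counts for illustration only.
example = agreement_metrics(
    both_pos=20, ihc_pos_fish_neg=10, ihc_neg_fish_pos=5, both_neg=65
)
```

Raising the staining cut-off used to call a sample IHC positive moves cases between the cells of this table, which is why each of these measures can shift with the cut-off.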

Scoring and categorizing results of ISH assays also vary (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). Guidelines stress that precision and accuracy depend on the number of cells counted and averaged, on accurately identifying and counting only invasive cells, and on counting invasive cells from two or more separate areas of each tumor on either the same or sequential slide(s) (Wolff, Hammond, Schwartz, et al., 2007a). With assays estimating gene copy number per cell without normalizing to a CEP17 probe, most published studies using FISH classified tissues averaging more than 4.0 copies per cell as HER2 positive (for references, see Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Laudadio, Quigley, Tubbs, et al., 2007). Most published studies using CISH scored samples HER2 positive if the average gene copy number per cell was greater than 5, although some followed the manufacturer's recommendation and defined low-level amplification as copy numbers between 6 and 10. In contrast to published studies with FISH or CISH, recent guidelines consider average scores greater than 6.0 as FISH positive, scores less than 4.0 as FISH negative, and scores between 4.0 and 6.0 as equivocal (ASCO/CAP) or borderline (NCCN). Most studies that normalized to CEP17 classified HER2 to CEP17 ratios greater than 2.0 as HER2 positive (for references, see Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Laudadio, Quigley, Tubbs, et al., 2007). The guidelines consider a HER2/CEP17 ratio greater than 2.2 as positive, a ratio less than 1.8 as negative, and ratios between 1.8 and 2.2 as equivocal (ASCO/CAP) or borderline (NCCN). As with IHC scoring and thresholds, data are lacking to evaluate consequences of the newer classification criteria on accuracy or concordance.
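
The guideline thresholds described above can be expressed as a simple decision rule. The following Python sketch is illustrative only and is not part of any guideline: the function names `classify_ish` and `classify_ratio_by_study_threshold` are hypothetical, and giving the HER2/CEP17 ratio precedence when both measurements are supplied is an assumption made here for simplicity. It shows how a ratio of 2.1 would be called positive under the threshold used by most published studies but equivocal under the guidelines.

```python
from typing import Optional


def classify_ish(copies_per_cell: Optional[float] = None,
                 her2_cep17_ratio: Optional[float] = None) -> str:
    """Classify an ISH result with the guideline thresholds quoted in the text."""
    if her2_cep17_ratio is not None:
        # Dual-probe assay: HER2 signals normalized to CEP17.
        # Assumption: the ratio takes precedence if both values are given.
        if her2_cep17_ratio > 2.2:
            return "positive"
        if her2_cep17_ratio < 1.8:
            return "negative"
        return "equivocal"
    if copies_per_cell is not None:
        # Single-probe assay: average HER2 gene copies per cell.
        if copies_per_cell > 6.0:
            return "positive"
        if copies_per_cell < 4.0:
            return "negative"
        return "equivocal"
    raise ValueError("Provide a copy number or a HER2/CEP17 ratio.")


def classify_ratio_by_study_threshold(her2_cep17_ratio: float) -> str:
    """Threshold used by most published studies: ratio greater than 2.0 is positive."""
    return "positive" if her2_cep17_ratio > 2.0 else "negative"


if __name__ == "__main__":
    for ratio in (1.5, 2.1, 2.5):
        print(ratio, classify_ratio_by_study_threshold(ratio),
              classify_ish(her2_cep17_ratio=ratio))
```

Under these assumptions, only results that fall between the older single cut-off and the newer paired cut-offs change category, which is the group for which the text notes that comparative data are lacking.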

Guidelines and reviews caution that assigning HER2 status is partially subjective and potentially inconsistent because IHC and FISH scoring criteria are variably interpreted and applied by different raters (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Hanna, O'Malley, Barnes, et al., 2007; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). Expert panels and reviewers emphasize that image analysis methods, using digital microscopy and automated cellular imaging systems (e.g., Bloom and Harrington, 2004; McCabe, Dolled-Filhart, Camp, et al., 2005; Tubbs, Pettay, Swain, et al., 2006; Ciampa, Xu, Ayata, et al., 2006; Tawfik, Kimler, Davis, et al., 2006; Moeder, Giltnane, Harigopal, et al., 2007), can decrease inter-rater variability and thus improve scoring consistency, accuracy, and precision, particularly for IHC assays. However, this requires careful validation and periodic recalibration of automated systems against standardized positive, negative, and equivocal control samples. Nevertheless, a study testing agreement between pathologists reported that use of digital microscopy to score IHC improved concordance with FISH and also decreased inter-rater variability (Bloom and Harrington, 2004).

Postanalytic steps also include reporting elements that should be provided to clinicians ordering HER2 testing, as well as quality assurance procedures (laboratory accreditation and proficiency testing; competency assessment for pathologists). However, these issues are outside the scope of this report. Readers are referred to recommendations in current guidelines (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007).

Is There a “Best” Method to Determine HER2 Status from Breast Tumor Tissue?

Although many studies reported concordance and discrepancy rates for collections of breast tumor tissue tested for HER2 status by IHC with different antibodies, or by IHC and ISH assays, or by multiple ISH assays, current evidence does not suggest one HER2 assay is superior to all others (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hicks and Kulkarni, 2008; Laudadio, Quigley, Tubbs, et al., 2007; Ross, Fletcher, Linette, et al., 2003). As described previously, preanalytic, analytic and postanalytic methods varied between studies, and all studies preceded guidelines for standardizing these methods. Additionally, data are lacking to fully evaluate effects of nonadherence with certain guideline recommendations on test results. Thus, it is difficult (perhaps impossible) to isolate effects of individual factors that contribute to discordance. As detailed above, these include differences in:

  • Fixing and embedding tissues, preparing and staining them for assays, or scoring and classifying test results;

  • Antibody binding, epitope stability, or antigen retrieval, when comparing different antibodies used for IHC;

  • Biologic mechanisms that can increase membrane HER2 protein, when comparing IHC assays versus ISH assays; or sensitivity and specificity of diverse DNA probes and visualization techniques, when comparing different ISH methods.

Identifying one “best” HER2 test clearly requires better comparative data than presently available, with assays that standardized key aspects of preanalytic, analytic, and postanalytic steps in HER2 assay methods.

The lack of a gold standard to determine breast tumors' HER2 status also prevents agreement on one “best” HER2 assay. Furthermore, seeking a single gold standard may be unrealistic, since HER2 status is used in different ways. The optimal assay (or combination of assays) may differ for HER2 as a prognostic marker, as a marker to predict clinical benefit from trastuzumab, or as a marker to predict benefit from a chemotherapy drug class (e.g., an anthracycline or a taxane). For example, HER2 gene amplification may best predict tumor aggressiveness and hence prognosis, while membrane density of HER2 protein may best predict trastuzumab binding to tumor cells and thus clinical response. In addition, HER2 may only be a surrogate marker for other molecular alterations that more directly affect tumor cell sensitivity to certain chemotherapy drugs (e.g., anthracyclines).

Outcomes of well-designed and adequately powered comparative clinical trials with sufficient followup duration may be a gold standard to evaluate HER2 assays as predictors of treatment benefit. However, even the large randomized, controlled trials on adjuvant trastuzumab (Romond, Perez, Bryant, et al., 2005; Piccart-Gebhart, Procter, Leyland-Jones, et al., 2005; Slamon, Eiermann, Robert, et al., 2005; Joensuu, Kellokumpu-Lehtinen, Bono, et al., 2006) may not have adequately standardized preanalytic steps at local hospitals, did not test all patients with at least two assays, treated few patients with discordant results by different assays conducted in central laboratories, and presently lack sufficient followup to compare outcomes in subgroups of the main treatment arms (see “Results and Conclusions, Key Question 2”).

Current guidelines acknowledge present uncertainty, permit clinicians and laboratories to choose an initial HER2 assay method, and recommend confirming results with an alternative assay when initial tests are equivocal (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007).

Current Guideline Recommendations

   Figure 2. Algorithm for immunohistochemistry (IHC). (Reprinted with permission from the American Society of Clinical Oncology; Wolff, Hammond, Schwartz, et al., 2007b)

Current guidelines recommend very similar algorithms for using well-validated IHC and ISH assays to classify breast cancer patients with respect to HER2 status (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). The algorithm shown in Figure 2 describes possible results, decision-making, and confirmatory testing when IHC is the initial test. All three guidelines agree that an IHC score of 3+ is definitively HER2 positive, a score of 0 or 1+ is definitively HER2 negative, and a score of 2+ is equivocal and requires ISH followup testing to determine HER2 status. In contrast to the other guidelines, the NCCN Task Force (Carlson, Moench, Hammond, et al., 2006) did not specify that an IHC 3+ score requires complete membrane staining in more than 30 percent of invasive cells. The ASCO/CAP expert panel recommended this change from FDA labeling (which requires staining in more than 10 percent of invasive cells), primarily to decrease the number of patients with false-positive results who might be given trastuzumab but are unlikely to benefit (Wolff, Hammond, Schwartz, et al., 2007a). This recommendation anticipates that true positives with equivocal IHC results will be correctly classified by followup ISH. However, data are currently lacking to test this hypothesis.
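
As a rough illustration of the IHC-first pathway on which the three guidelines agree, the following Python sketch maps an IHC score to a HER2 status and defers to ISH followup for a 2+ score. The function name `her2_status_from_ihc` and the status strings are hypothetical, and the sketch assumes the IHC score has already been assigned by a pathologist; it does not model the staining criteria themselves.

```python
from typing import Optional


def her2_status_from_ihc(ihc_score: str,
                         ish_followup: Optional[str] = None) -> str:
    """Map an IHC score to HER2 status, using ISH followup for equivocal cases.

    ish_followup is the result of confirmatory ISH testing, if performed:
    "positive", "negative", or "equivocal".
    """
    if ihc_score == "3+":
        return "positive"
    if ihc_score in ("0", "1+"):
        return "negative"
    if ihc_score == "2+":
        # Equivocal by IHC: status depends on ISH followup testing.
        if ish_followup is None:
            return "equivocal (ISH followup required)"
        return ish_followup
    raise ValueError(f"Unrecognized IHC score: {ihc_score!r}")


if __name__ == "__main__":
    print(her2_status_from_ihc("3+"))
    print(her2_status_from_ihc("2+"))
    print(her2_status_from_ihc("2+", ish_followup="positive"))
```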

   Figure 3. HER2 testing algorithm when ISH is the initial test (Reprinted with permission from the American Society of Clinical Oncology; Wolff, Hammond, Schwartz, et al., 2007a)

Figure 3 provides a similar algorithm if FISH is the initial test. The guidelines suggest that well-validated alternatives (CISH or SISH, currently available in the U.S. only as independently developed assays) probably can replace FISH. The algorithm considers HER2 gene copy numbers from 4.0 to 6.0 or HER2/CEP17 ratios between 1.8 and 2.2 as equivocal ISH results. It recommends additional cell counting, retesting by a reference laboratory, or followup testing by IHC before classifying equivocal cases. The other guidelines agree with this recommendation (Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). No studies reviewed for this report followed this recommendation; thus, data are lacking to determine whether confirmatory followup testing on patients with equivocal ISH results improves the accuracy of HER2 status as a predictor for treatment outcomes.

Importantly, the guidelines' treatment recommendations are not identical for all patients whose assay results remain in the equivocal range after additional cells are counted, a different assay method is used, and/or testing is repeated on another tumor section. The recommendation depends on whether the patient would have been included in or excluded from key randomized, controlled trials. For example, patients with HER2/CEP17 ratios 2.0 or greater but less than 2.2 were included and randomized in the adjuvant trastuzumab trials. Therefore, the guidelines view current evidence as too weak to deny such patients adjuvant therapy that includes trastuzumab. In contrast, patients with HER2/CEP17 ratios 1.8 or greater but less than 2.0 were excluded from these trials, and the guidelines view current evidence as too weak to support including trastuzumab in their adjuvant therapy regimens. Figures 2 and 3 include information on trial eligibility of patients whose test results are equivocal by each HER2 assay.
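
The distinction drawn above for persistently equivocal HER2/CEP17 ratios can be sketched as follows. This Python fragment is illustrative only: the function name `adjuvant_trial_eligibility` and the returned strings are hypothetical, the half-open intervals follow the wording of the preceding paragraph, and the sketch is not a treatment recommendation.

```python
def adjuvant_trial_eligibility(her2_cep17_ratio: float) -> str:
    """Report whether an equivocal ratio fell inside adjuvant trial eligibility."""
    if her2_cep17_ratio < 1.8 or her2_cep17_ratio >= 2.2:
        # Outside the two ranges discussed in the text.
        return "outside the equivocal ranges described"
    if her2_cep17_ratio >= 2.0:
        # Ratios 2.0 or greater but less than 2.2 were randomized in the trials.
        return "included in adjuvant trastuzumab trials"
    # Ratios 1.8 or greater but less than 2.0 were excluded from the trials.
    return "excluded from adjuvant trastuzumab trials"


if __name__ == "__main__":
    for ratio in (1.9, 2.1):
        print(ratio, adjuvant_trial_eligibility(ratio))
```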

Interestingly, a recent study reported on 17 patients with breast core biopsy specimens showing invasive carcinoma and equivocal FISH results (HER2/CEP17 ratios between 1.8 and 2.2) (Striebel, Bhargava, Horbinski, et al., 2008). These patients were subsequently re-evaluated by IHC and FISH testing on resection specimens. For 10 of the 17 cases, equivocal results obtained with biopsy specimens were definitively resolved by retesting of resection specimens. Four patients were classified HER2 positive and treated with trastuzumab, while six were classified HER2 negative and managed without trastuzumab.

Other recommendations in the ASCO/CAP guideline focus on good laboratory practices for each preanalytic, analytic, and postanalytic step of IHC and ISH assays (Wolff, Hammond, Schwartz, et al., 2007a). They provide a more explicitly detailed set of recommendations than included in the other two guidelines (Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Table 5 reprints the summary of recommendations from the ASCO/CAP guideline. The remainder of this narrative review for Key Question 1 summarizes evidence published after these guidelines on the following four topics, and discusses unresolved issues and uncertainties:

  • Concordance and discordance of different assay methods

  • Discordance between central and local laboratory results

  • Validation and proficiency testing

  • Reports on polysomy 17

Evidence Reported Post-ASCO/CAP Guidelines on Concordance and Discrepancy of HER2 Assay Results

Concordance/discordance of different assay methods. Evidence reviewed by the ASCO/CAP expert panel (Appendixes C and G in Wolff, Hammond, Schwartz, et al., 2007a) led to consensus definitions for unequivocal IHC and ISH results. As shown in Figures 2 and 3, and in Table 5, the panel defined unequivocal HER2-positive results by IHC (i.e., 3+) as greater than 30 percent of invasive cells strongly stained in a homogeneous, circumferential “chicken-wire” pattern, and by ISH as HER2 gene copy number per cell greater than 6 or HER2/CEP17 ratio greater than 2.2. They defined unequivocal HER2-negative results by IHC as scores of 0 or 1+, and by ISH as HER2 gene copy number per cell less than 4.0 or HER2/CEP17 ratio less than 1.8. Equivocal results (defined as 2+ by IHC, HER2 gene copy number from 4.0 to 6.0, or HER2/CEP17 ratio from 1.8 to 2.2) probably imply low-level HER2 amplification and/or overexpression, and should not be considered discordant, whether results of followup testing are positive or negative. Some but not all of these samples may actually have an amplified HER2 gene, but require additional testing to define the patient's correct HER2 status. The ASCO/CAP expert panel found insufficient evidence to determine whether breast cancer patients with equivocal HER2 results benefit from HER2-targeted therapy, although as discussed above, some patients included in adjuvant trastuzumab trials fit this category (also see “Results and Conclusions, Key Question 2”).

For purposes of this review, discordant results are operationally defined as unequivocally positive results by one assay method and unequivocally negative results by a different assay method on sections from the same tumor, with both assays conducted using good laboratory practices, as recommended in the ASCO/CAP guideline (Wolff, Hammond, Schwartz, et al., 2007a). Presently, evidence is lacking to estimate discordance rates from studies that followed all ASCO/CAP recommendations on tissue preparation, testing practices, scoring systems, and thresholds to classify HER2 status of breast cancer patients. Therefore, in the following sections, we summarize evidence on discordance rates reported after the guideline was published by studies that used scoring systems and thresholds similar to those originally specified in U.S. Food and Drug Administration (FDA)-approved kits for IHC and ISH assays.
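
The operational definition above can be stated compactly in code. In this minimal Python sketch, the function name `is_discordant` and the status labels are hypothetical; the sketch assumes each assay result has already been reduced to "positive", "negative", or "equivocal" and does not check whether good laboratory practices were followed.

```python
UNEQUIVOCAL = {"positive", "negative"}


def is_discordant(result_a: str, result_b: str) -> bool:
    """True only when two assays give opposite, unequivocal results."""
    return (result_a in UNEQUIVOCAL
            and result_b in UNEQUIVOCAL
            and result_a != result_b)


if __name__ == "__main__":
    # IHC 3+ (positive) with a FISH-negative result is discordant.
    print(is_discordant("positive", "negative"))
    # IHC 2+ (equivocal) is not discordant, whatever the followup result.
    print(is_discordant("equivocal", "positive"))
```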

Investigators from the National Surgical Adjuvant Breast and Bowel Project's (NSABP) central pathology laboratory and colleagues at NSABP-approved reference laboratories conducted IHC (HercepTest™) and FISH (PathVysion®) assays on formalin fixed, paraffin embedded tumor blocks (Paik, Kim, Jeong, et al., 2007; Paik, Kim, and Wolmark, 2008). They reported results with both assays for 1,787 of 2,043 patients enrolled in the NSABP B31 randomized, controlled trial on adjuvant therapy with versus without trastuzumab (Romond, Perez, Bryant, et al., 2005). Of these, they found FISH-negative, IHC 3+ discordant results in 31 cases (1.7 percent). They also reported FISH-positive, IHC 0, 1+, or 2+ results in another 125 cases (7 percent), but did not separately report the proportion of those who tested FISH positive and IHC 0 or 1+.

Central and reference laboratory results with both IHC (HercepTest™) and FISH (PathVysion®) assays also are available (Perez, Romond, Suman, et al., 2007) for 1,779 of the 2,535 patients registered in a similar randomized, controlled trial conducted by the North Central Cancer Treatment Group (NCCTG N9831; Romond, Perez, Bryant, et al., 2005). Investigators reported discordant IHC 3+, FISH-negative results in 53 cases (3 percent), and FISH-positive, IHC 0, 1+, or 2+ results in 218 cases (12.3 percent). Here again, separate results were not reported for the proportion who tested FISH positive and IHC 0 or 1+. Data presently are unavailable on IHC/ISH discordance rates from three other randomized, controlled trials of adjuvant trastuzumab (Piccart-Gebhart, Procter, Leyland-Jones, et al., 2005; Slamon, Eiermann, Robert, et al., 2005; Joensuu, Kellokumpu-Lehtinen, Bono, et al., 2006).

In a retrospective study, a Canadian central reference laboratory used HercepTest™ and three other HER2 antibody IHC assays to retest tumors from patients diagnosed with metastatic breast cancer between 1999 and 2002, and compared the IHC results with central lab FISH using PathVysion® (O'Malley, Thomson, Julian, et al., 2008). Among 505 patients initially classified HER2 positive by IHC in local labs and treated with trastuzumab for metastatic disease, concordance between central IHC and central FISH ranged from 88.9 percent to 90.9 percent, depending on the HER2 antibody used. Concordance between IHC and FISH was highest (92.2 percent) when all four HER2 antibody assays were used to test each sample, and tumors were only classified IHC positive if positive by 2 or more assays. In a sequential sample of 205 invasive breast tumors locally classified IHC negative, from patients diagnosed with metastasis, concordance of central IHC and central FISH ranged from 93.7 percent to 99 percent for individual antibody assays, and was 98.1 percent if tumors were only classified IHC negative if negative by 2 or more assays. However, this study did not report FISH/IHC discordance rates separately by IHC score.
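
The multi-antibody rule described above (classify a tumor IHC positive only if it is positive by 2 or more assays) amounts to a simple vote count. The Python sketch below is an illustration under that reading; the function name `consensus_ihc_positive` and the example inputs are hypothetical and are not data from the study.

```python
from typing import Sequence


def consensus_ihc_positive(assay_positive: Sequence[bool],
                           min_positive: int = 2) -> bool:
    """Call a tumor IHC positive if at least min_positive assays are positive."""
    return sum(bool(result) for result in assay_positive) >= min_positive


if __name__ == "__main__":
    # Hypothetical results from four antibody assays on one tumor.
    print(consensus_ihc_positive([True, True, False, False]))
    print(consensus_ihc_positive([True, False, False, False]))
```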

A study from Greece that separately compared IHC results (using HercepTest™ and two other methods) from central and regional laboratories versus central FISH (PathVysion®) reported on 375 breast tumors tested centrally by IHC and FISH (Papadopoulos, Kouvatseas, Skarlos, et al., 2007). FISH-positive, IHC 0/1+ discordances were seen in six cases (1.6 percent; 11.5 percent of 52 IHC 0/1+ cases), while FISH-negative, IHC 3+ discordances were seen in three cases (0.8 percent; 9.4 percent of 32 IHC 3+ cases). Another study from three Greek hospitals compared IHC results (CB11 antibody) with FISH (PathVysion®) for 194 resected breast cancer patients, and also with CISH (SpoT-Light) for 159 of these patients (Kostopoulou, Vageli, Kaisaridou, et al., 2007). This study reported no FISH-positive cases and only one CISH-positive case among 94 IHC 0/1+ patients. Of 30 patients with IHC 3+ results, one (3.3 percent) was FISH negative and CISH negative.

A study from Germany on patients evaluated for inclusion in a trial of trastuzumab for metastatic breast cancer reported central IHC (HercepTest™) and FISH (PathVysion®) results for 289 patients (Hofmann, Stoss, Gaiser, et al., 2008). Investigators reported no FISH-positive cases among 100 patients scored IHC 0/1+, and nine FISH-negative but IHC 3+ cases (8.4 percent of 107 scored IHC positive; 3.1 percent of all patients evaluated).

A small study (n=55) compared two dual-probe (i.e., for HER2 and CEP17) FISH kits (PathVysion® and HER2 FISH pharmDx), a single-probe FISH kit (Inform; HER2 only) and the SpoT-Light CISH kit versus two IHC assays (HercepTest™ and an independently developed test) (Cayre, Mishellany, Lagarde, et al., 2007). Investigators reported results with each assay (and with different positivity thresholds for Inform and SpoT-Light) separately for each sample. Four of 55 (7.3 percent) cases tested IHC 3+ with HercepTest™ and ISH-negative by all assays (other than a threshold of more than four signals for Inform). Three of the four were scored less than 3+ by independently developed IHC. All cases scored FISH positive by two or more kits also were scored IHC 3+ by HercepTest™.

Another small study (n=54) used the HercepTest™ and PathVysion® kits on all samples (Kuo, Wang, Chang, et al., 2007). Three cases (5.6 percent) that tested FISH negative were scored 3+ by IHC. In contrast, no cases that tested FISH positive were scored IHC 0 or 1+.

Table 6. Estimated discordance rates from meta-analysis of 17 studies on IHC and FISH

| IHC score | Median % of patients | 95% credible interval | Expected # per 1,000 screened by IHC | 95% credible interval | % discordant by FISH (a) | 95% credible interval | Expected # of discordances by FISH per 1,000 screened by IHC | 95% credible interval |
|---|---|---|---|---|---|---|---|---|
| 0 | 36.1 | 4.4–64.3 | 362 | 44–642 | 1.6 | 0.9–2.8 | 6 | 1–13 |
| 1+ | 35.5 | 7.4–67.4 | 355 | 74–674 | 4.9 | 2.6–17.9 | 18 | 8–30 |
| 2+ | 12.0 | 3.5–21.4 | 120 | 35–214 | NA (b) | NA (b) | NA (b) | NA (b) |
| 3+ | 16.2 | 10.7–22.9 | 162 | 107–230 | 7.6 | 3.8–12.9 | 12 | 6–21 |

(a) Percentages shown are of the expected number of patients with the IHC score listed in the left column.

(b) NA = not applicable, since IHC 2+ is considered an equivocal result, thus defined as not discordant regardless of subsequent FISH result.

A systematic review abstracted data from 17 studies (all published before the ASCO/CAP guideline; pooled N=8,419) on FISH/IHC concordance (Dendukuri, Khetani, McIsaac, et al., 2007). Selection criteria sought studies that included consecutive patient series or a random sample, reported agreement between IHC and FISH using standard thresholds, and used assays licensed in Canada to select patients for trastuzumab therapy. All studies used PathVysion® for FISH; 16 used HercepTest™ and one used PATHWAY™ for IHC. Ten studies combined results for patients scored IHC 0 or 1+, and separately for those scored IHC 2+ or 3+ (pooled N=4,641); seven reported results separately for each IHC score (pooled N=3,778). Using Bayesian meta-analysis, the reviewers estimated the proportions of breast cancer patients with each of the four possible IHC scores and, for each IHC score, the proportion with positive results by FISH. Table 6 summarizes estimated IHC/FISH discordance rates based on results of the meta-analysis by Dendukuri and coworkers.
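
The relationship between the columns of Table 6 can be checked with simple arithmetic. The Python sketch below multiplies the expected number of patients per 1,000 screened by the median percent discordant for each unequivocal IHC score. This is only an approximation: the tabulated values are posterior estimates from a Bayesian model, so the product of two medians need not reproduce them exactly (for IHC 1+, the product is roughly 17, versus 18 in the table). The names `TABLE_6` and `approx_discordances_per_1000` are hypothetical.

```python
# (expected patients per 1,000 screened, median % discordant by FISH),
# copied from Table 6 for the unequivocal IHC scores.
TABLE_6 = {
    "0": (362, 1.6),
    "1+": (355, 4.9),
    "3+": (162, 7.6),
}


def approx_discordances_per_1000(ihc_score: str) -> float:
    """Approximate discordances per 1,000 screened as count x percent / 100."""
    expected_patients, pct_discordant = TABLE_6[ihc_score]
    return expected_patients * pct_discordant / 100.0


if __name__ == "__main__":
    for score in TABLE_6:
        print(score, round(approx_discordances_per_1000(score), 1))
```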

Three small studies (combined N=211) conducted outside North America compared results of different ISH methods. An Australian study on 49 breast cancer samples reported that all 20 cases scored highly positive (greater than 10 signals/cell) by FISH, and seven of 10 cases scored low-positive (5–10 signals/cell) by FISH, were also scored positive by CISH (Bilous, Morey, Armes, et al., 2006). Each sample scored IHC 3+ by HercepTest™ also tested CISH positive. A study from Germany reported agreement in 95 of 99 breast tumor samples tested by FISH (PathVysion®) and SISH, an overall concordance of 96 percent (Dietel, Ellis, Hofler, et al., 2007). Finally, a study from Poland compared FISH, CISH, and SISH on 63 breast tumor specimens selected for 2+ or 3+ staining by IHC (Sinczak-Kuta, Tomaszewska, Rudnicka-Sosin, et al., 2007). Investigators reported multiple statistical tests (Pearson chi-square tests with p<0.01; gamma correlation coefficients of 0.89 to 0.96; Spearman rank correlation coefficients of 0.70 to 0.79; and Kappa coefficients of 0.38 to 0.58) for separate two-way comparisons of assay results (i.e., CISH versus FISH, FISH versus SISH, and SISH versus CISH) and interpreted them as evidence for good agreement between the methods, but did not report concordance or discordance rates. Larger studies are needed to estimate more reliably the rates of concordance and discordance between FISH or IHC and newer ISH methods (CISH, SISH). Furthermore, FDA-approved kits for CISH or SISH are not yet available.
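
Because correlation and Kappa coefficients are not interchangeable with concordance rates, it may help to see how overall concordance and Cohen's Kappa are each derived from the same two-by-two table of paired results. The Python sketch below uses hypothetical counts, not data from any study cited here, and the function name `concordance_and_kappa` is likewise hypothetical. The underlying cell counts were not reported for the studies above, so their Kappa values cannot be recomputed this way.

```python
from typing import Tuple


def concordance_and_kappa(both_pos: int, a_pos_b_neg: int,
                          a_neg_b_pos: int, both_neg: int) -> Tuple[float, float]:
    """Return (overall concordance, Cohen's Kappa) for two binary assays."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    if n == 0:
        raise ValueError("No paired results supplied.")
    observed = (both_pos + both_neg) / n
    # Marginal proportions positive by each assay.
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n
    # Agreement expected by chance if the two assays were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1.")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical paired results for 100 tumors.
    concordance, kappa = concordance_and_kappa(40, 5, 5, 50)
    print(f"concordance={concordance:.2f}, kappa={kappa:.2f}")
```

Kappa discounts the agreement expected by chance, so it is always at or below the raw concordance rate when chance agreement is nonzero; the two measures answer different questions and neither substitutes for the other.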

To summarize, evidence from seven studies and a meta-analysis reported after the ASCO/CAP guideline (Wolff, Hammond, Schwartz, et al., 2007a) suggests variable but perhaps non-negligible rates for FISH-negative, IHC 3+ discordance (albeit by the older definition of strong, complete membrane staining in greater than 10 percent of invasive cells), ranging from 0.5 percent to 7.3 percent of breast cancer cases. The meta-analysis also estimated that 0.6 percent (95 percent CI: 0.1–1.3 percent) of cases might be scored IHC 0 and FISH positive, while 1.8 percent (95 percent CI: 0.8–3.0 percent) of cases might be scored IHC 1+ and FISH positive. However, data are unavailable to estimate discordance rates for either group using the current ASCO/CAP definition of IHC 3+ (greater than 30 percent of invasive cells stained).
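
The range quoted above can be reproduced from the counts reported for the individual studies. The Python sketch below recomputes the FISH-negative, IHC 3+ discordance rate as a percentage of all cases tested in each study. One denominator is an assumption: for the study by Kostopoulou and colleagues, the sketch uses all 194 patients tested by IHC and FISH. The names `STUDIES` and `discordance_pct` are hypothetical.

```python
# (study, FISH-negative/IHC 3+ cases, all cases tested), from the text above.
STUDIES = [
    ("NSABP B31", 31, 1787),
    ("NCCTG N9831", 53, 1779),
    ("Papadopoulos et al.", 3, 375),
    ("Kostopoulou et al.", 1, 194),  # denominator of 194 is an assumption
    ("Hofmann et al.", 9, 289),
    ("Cayre et al.", 4, 55),
    ("Kuo et al.", 3, 54),
]


def discordance_pct(discordant: int, total: int) -> float:
    """Discordant cases as a percentage of all cases tested."""
    return 100.0 * discordant / total


if __name__ == "__main__":
    rates = {name: discordance_pct(d, n) for name, d, n in STUDIES}
    for name, rate in rates.items():
        print(f"{name}: {rate:.1f} percent")
    print(f"range: {min(rates.values()):.1f} to {max(rates.values()):.1f} percent")
```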

Disagreement between central and local laboratory results. Evidence reviewed by the ASCO/CAP expert panel demonstrated disagreement between central and local laboratory HER2 test results in approximately 20 percent of cases (Wolff, Hammond, Schwartz, et al., 2007a). This included data from the first 104 patients registered for NSABP B31, showing disagreement in 18 percent of cases (Paik, Bryant, Tan-Chiu, et al., 2002), which resulted in a protocol amendment limiting HER2 testing to 23 approved laboratories. The evidence also included data from NCCTG N9831 showing agreement in 88.1 percent of 813 cases rated FISH positive, 81.6 percent of 1,063 cases scored IHC 3+ by HercepTest™, and 75.0 percent of 636 cases scored IHC 3+ by non-HercepTest™ assays (Perez, Suman, Davidson, et al., 2006). Finally, it included data from a community-based clinical study on trastuzumab for metastatic breast cancer showing 77 percent agreement on samples scored IHC 3+ by local laboratories, but only 26 percent agreement on samples locally scored IHC 2+ (Reddy, Reimann, Anderson, et al., 2006). Based on the available evidence, the panel recommended specific measures for assay validation, self-assessment, accreditation, and proficiency testing by laboratories conducting HER2 assays. In the following section, we summarize new evidence comparing local versus central laboratory results, published since the ASCO/CAP review. Although these studies were published after the ASCO/CAP guideline, their testing was conducted before it, and samples were scored as originally recommended by manufacturers and FDA labeling.

Final data from NSABP B31 showed disagreement on HER2 status in 174 of 1,787 cases (9.7 percent) classified HER2 positive by local laboratories but HER2 negative by both FISH (PathVysion®) and IHC assays in central or reference laboratories (Paik, Kim, Jeong, et al., 2007; Paik, Kim, and Wolmark, 2008). Data presently are unavailable on rates of disagreement between local and central laboratories from three other randomized, controlled trials of adjuvant trastuzumab (Piccart-Gebhart, Procter, Leyland-Jones, et al., 2005; Slamon, Eiermann, Robert, et al., 2005; Joensuu, Kellokumpu-Lehtinen, Bono, et al., 2006).

A small study compared central and local laboratory IHC results on breast tumor samples initially scored IHC 2+ locally and found FISH positive after referral for central laboratory confirmation (Barrett, Magee, O'Toole, et al., 2007). Investigators reported that of 153 IHC 2+ cases referred to the central laboratory for FISH confirmation, 29 (19 percent) had amplified HER2 genes. With repeat IHC in 25 of the 29, the central laboratory scored 18 cases (72 percent) as IHC 3+ and agreed with the local laboratory score of IHC 2+ in only 7 cases (28 percent). Since the central laboratory did not repeat IHC testing for the 124 cases with nonamplified HER2 genes by FISH, the overall rate of agreement with local results cannot be determined.

A larger study compared IHC results in local (regional) and central laboratories (Papadopoulos, Kouvatseas, Skarlos, et al., 2007). Of 458 available samples, 369 were tested by IHC both regionally and centrally and scores agreed for 296 (80.2 percent). Disagreement was greatest among samples (n=11) scored IHC 3+ by regional laboratories (63 percent concordance). Concordance was better among those (n=20) scored IHC 0 or 1+ and those scored IHC 2+ (n=338) at regional laboratories (85 percent and 80 percent, respectively).

A central reference laboratory analyzed tumor specimens from 315 of 399 (79 percent) patients randomized to capecitabine with or without lapatinib, using both IHC (antibody not reported) and FISH (PathVysion®), seeking confirmation of local laboratory results that classified these patients as HER2 positive and thus eligible for this randomized, controlled trial (Cameron, Casey, Press, et al., 2008). Central testing found 241 of 315 (77 percent) HER2 positive, including 211 with IHC 3+ results and 30 with IHC 2+, FISH-positive results.

In the Canadian study cited previously, central laboratory testing of breast tumor tissue samples confirmed the IHC-positive status of 79.3 percent to 89.6 percent of 505 cases found IHC positive by local laboratory results (O'Malley, Thomson, Julian, et al., 2008). Among 205 cases found IHC negative by local labs, central IHC testing confirmed local results in 94.8 percent to 100 percent of cases. The concordance rates varied, depending on which of four IHC assays the central laboratory used.

To summarize, data reported after publication of the ASCO/CAP guideline (Wolff, Hammond, Schwartz, et al., 2007a) confirm the estimate of approximately 20 percent disagreement between local (or regional) and central laboratories with respect to HER2 assay results. Data are presently lacking to evaluate the effects of adherence to guideline recommendations for preanalytic, analytic, and postanalytic steps on rates of local/central disagreement.

Validation and proficiency testing. Since these issues are outside the scope of this evidence report, interested readers are referred to current guidelines for specific recommendations on best practices to validate assays and test laboratory proficiency (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). Evidence reviewed by the expert panel included a summary of results from 2004 and 2005 surveys of laboratories participating in CAP-sponsored interlaboratory comparisons of IHC results, using tissue microarrays as the test material (Fitzgibbons, Murphy, Dorfman, et al., 2006). The key finding was that 97 of 102 laboratories (95 percent) in 2004 and 129 of 141 laboratories (91 percent) in 2005 correctly scored 90 percent or more of the test cases. In the following section, we briefly summarize evidence published after the ASCO/CAP guideline. Again, these studies scored samples as originally recommended by manufacturers and FDA labeling.

An international study compared five pathology reference centers (from the Netherlands, Canada, France, Belgium, and Germany) on assay scoring and HER2 status classification for separate samples tested by IHC (n=20) or by FISH (n=20) (Dowsett, Hanna, Kockx, et al., 2007). Agreement was uniform among centers on HER2 status classifications for all 20 IHC test cases, although some scoring differences were noted, and some equivocal cases (i.e., those scored IHC 2+) required FISH confirmation to determine HER2 status. Agreement was uniform among centers on 16 of 20 (80 percent) FISH test cases. Each of the other four cases was scored in the equivocal range (HER2/CEP17 ratio 1.7–2.3).

A similar international study (from the Netherlands, Australia, Canada, France, and Germany) compared results from five central laboratories on 211 breast cancer specimens tested by CISH, FISH and IHC (van de Vijver, Bilous, Hanna, et al., 2007). Each central laboratory sent unstained sections from samples they tested to four other (“outside”) central laboratories. Investigators reported uniform agreement by CISH in the “outside” laboratories on 73 of 76 cases (96 percent) scored highly amplified (HER2/CEP17 greater than 4.0) by FISH in the initial laboratory. Similarly, “outside” CISH uniformly agreed with 94 of 100 (94 percent) cases initially scored as not amplified by FISH (HER2/CEP17 less than 2.0). Among 35 cases scored as equivocal by initial FISH testing (HER2/CEP17 2.0–4.0), 20 were scored as CISH positive and 15 were scored as CISH negative. Overall interlaboratory concordance was 95 percent for cases with normal HER2 gene copy number (1–5) and was 92 percent for cases with 6 or more copies of the HER2 gene.

A brief report by investigators from the Italian Network for Quality Assessment of Tumor Biomarkers (INQUAT) and the United Kingdom National External Quality Assessment Service (U.K. NEQAS) highlighted the importance of including both preanalytic and analytic steps in proficiency testing programs (Paradiso, Miller, Marubini, et al., 2007). The U.K. NEQAS program for HER2 testing focuses on preanalytic aspects of the IHC assay, while the INQUAT program focuses on intra- and interlaboratory variability in scoring a set of fixed and stained IHC slides. Twelve Italian laboratories participated in both quality control programs during 2003, and only one achieved high-quality performance in preanalytic processing steps and in intra- and interlaboratory reproducibility. Some laboratories that achieved high-quality performance in preanalytic steps did not score slides reproducibly, or vice versa. Three of the 12 laboratories did not perform adequately on either preanalytic or analytic steps.

A recent study covalently attached fixed and unfixed samples of synthetic HER peptide to glass microscope slides with unstained sections of invasive breast carcinomas (Vani, Sompuram, Fitzgibbons, et al., 2008). The peptide fragments were used as positive analyte controls on slides distributed to 192 laboratories participating in the CAP 2006 HER2-B proficiency testing survey. Stained slides were returned and centrally reviewed (n=109 laboratories), permitting participants to evaluate sources of variability in HER2 staining performance. Investigators reported suboptimal staining in 20 of 109 slides (18.3 percent). Of these, seven cases (35 percent of the 20 failures) were attributable to errors in the antigen retrieval step, four (20 percent) were attributable to problems with the antibody staining protocol, and nine (45 percent) had problems with both.

In summary, two studies published subsequent to the ASCO/CAP review (Wolff, Hammond, Schwartz, et al., 2007a) reported similar results on interlaboratory comparisons. Overall, the available evidence shows 90 percent or greater agreement between high-volume reference laboratories in North America, Europe, and Australia. Scoring differences between laboratories occur most often with cases of low-level amplification or low-level overexpression. Results reported before and after the ASCO/CAP review (and other guidelines) support considering such cases as equivocal results, with confirmatory testing needed to classify HER2 status. Collaborative data from Italy and the United Kingdom suggest that quality control programs must evaluate all steps (preanalytic, analytic, and postanalytic) in HER2 testing. Positive analyte controls confirmed that antigen retrieval and antibody staining are persistent sources of interlaboratory variability in IHC results.

Reports on polysomy 17. The ASCO/CAP expert panel (Wolff, Hammond, Schwartz, et al., 2007a) interpreted evidence from two studies (Downs-Kelly, Yoder, Stoller, et al., 2005; Ma, Lespagnard, Durbecq, et al., 2005) as not supporting an association of polysomy 17 (defined as three or more copies of CEP 17) with HER2 protein or mRNA overexpression. However, one of these (Ma, Lespagnard, Durbecq, et al., 2005) reported increased HER2 protein (IHC 3+) in a subset of patients with polysomy 17 and HER2/CEP 17 ratios less than 2. In the following section, we summarize evidence published subsequent to the ASCO/CAP guideline.

Nine studies have reported data on polysomy 17 and HER2 status of breast cancer patients since the ASCO/CAP review. Of these, seven have been published in full (Dal Lago, Durbecq, Desmedt, et al., 2006; Torrisi, Rotmensz, Bagnardi, et al., 2007; Corzo, Bellosillo, Corominas, et al., 2007; Beser, Tuzlali, Guzey, et al., 2007; Hyun, Lee, Kim, et al., 2008; Kostopoulou, Vageli, Kaisaridou, et al., 2007; Hofmann, Stoss, Gaiser, et al., 2008) and two were reported at meetings with slides or video available on line (Kaufman, Broadwater, Lezon-Geyda, et al., 2007; Reinholz, Jenkins, Hillman, et al., 2007). Three studies reported no association of polysomy 17 with HER2 protein and/or mRNA overexpression (Dal Lago, Durbecq, Desmedt, et al., 2006; Torrisi, Rotmensz, Bagnardi, et al., 2007; Corzo, Bellosillo, Corominas, et al., 2007). In contrast, five other studies reported increased levels of HER2 protein in some cases with polysomy 17 and unamplified HER2 genes (Hyun, Lee, Kim, et al., 2008; Kaufman, Broadwater, Lezon-Geyda, et al., 2007; Reinholz, Jenkins, Hillman, et al., 2007; Kostopoulou, Vageli, Kaisaridou, et al., 2007; Hofmann, Stoss, Gaiser, et al., 2008). The ninth study did not report data on overexpression of HER2 protein or mRNA; this study reported chromosome 17 polysomy in two of 11 patients with HER2 gene amplification and in seven of 39 patients with unamplified HER2 genes (Beser, Tuzlali, Guzey, et al., 2007). In one study (Hofmann, Stoss, Gaiser, et al., 2008), seven of nine discordant IHC 3+/FISH-negative patients had chromosome 17 polysomy, and six of 26 patients with polysomy 17 responded to trastuzumab therapy for metastatic disease. However, all six responders were scored 3+ by IHC.

In contrast to conclusions of the ASCO/CAP review (Wolff, Hammond, Schwartz, et al., 2007a), evidence published subsequently reopens the question of whether chromosome 17 polysomy has implications for classifying patients' HER2 status. Five of eight new studies found polysomy 17 to be associated with protein (and/or mRNA) overexpression in at least some patients with nonamplified HER2 genes, while three of eight found no association.

Implications for Remainder of this Report

Discordances between IHC and FISH results might arise in one of three ways. They may be artifacts of one accurate and one inaccurate test. Alternatively, they may reflect a threshold issue, either related to the changes in threshold definitions over time, or an inherent problem of using a continuous measure to classify patients dichotomously. Finally, discordant test results might accurately reflect a small number of different patients with respect to the biologic mechanism that increases membrane levels of the HER2 protein. Present data could not tease apart the many factors reviewed here (preanalytic, analytic and postanalytic) that might have contributed to discordances in HER2 assay results. This clearly affects the interpretation of evidence on key questions that address use of “HER2 status” to predict treatment outcomes, even in nonbreast malignancies (Key Questions 2, 3, and 5). Furthermore, it also affects interpretation of evidence on the added clinical utility of serum measurements for patients with known tissue status, since this presumes accurate classification by tissue assays. Future studies reporting outcomes as a function of HER2 status should report separately on patients with concordant, equivocal, and discordant assay results.

Key Question 2

For patients who are not unequivocally HER2 positive, what is the evidence on outcomes of treatment targeting the HER2 molecule (trastuzumab, etc.), or on differences in outcomes of uniform chemotherapy or hormonal therapy regimens with versus without additional treatment targeting the HER2 molecule, in:

  • a) Breast cancer patients characterized by equivocal or discordant HER2 results from different tissue assay methods performed adequately; and

  • b) For those with HER2-negative breast cancer?

Study Selection

Table 7

Summary study design, treatment, patient characteristics, KQ2

Study / Treatments Compared | Age or Menopause Status | Disease Extent | ER+ | PR+ | n by HER2 subgroup

Adjuvant treatment for resected early breast cancer

NSABP B31 (Paik et al., 2007; Paik et al., 2008; Romond et al., 2005)
Tx: AC → (P+TRZ) (n=1,019 randomized) | ≥50 years: 48.4% | >2 cm: 61.4%; >3 + nodes: 42.6% | 51.9% | 39.0% | FISH+ IHC-: 56; FISH- IHC3+: 10; FISH- IHC1,2+: 69; FISH- IHC-: 82
Cx: AC → P (n=1,024 randomized) | ≥50 years: 48.4% | >2 cm: 57.1%; >3 + nodes: 43.3% | 52.8% | 41.4% | FISH+ IHC-: 69; FISH- IHC3+: 21; FISH- IHC1,2+: 80; FISH- IHC-: 92

NCCTG N9831 (Perez et al., 2007; Reinholz et al., 2007; Perez et al., 2006; Romond et al., 2005)
Tx: AC → (P+TRZ) (n=884 randomized) | ≥50 years: 50.4% | >2 cm: 61.5%; >3 + nodes: 39.1% | 51.2% | 39.4% | FISH+ IHC-: 123; FISH- IHC3+: 23; FISH- IHC-: 59
Cx: AC → P (n=895 randomized) | ≥50 years: 48.9% | >2 cm: 58.7%; >3 + nodes: 39.1% | 52.8% | 41.3% | FISH+ IHC-: 95; FISH- IHC3+: 30; FISH- IHC-: 44

First- or second-line treatment for advanced breast cancer

CALGB 9840 (Seidman et al., 2004, 2008)
Tx: P (q wk vs. q3wk)+TRZ (n=115 randomized) | 75% post | ≥3 metastatic sites: 15% | 55% | NR | 113
Cx: P (q wk vs. q3wk) (n=113 randomized) | 84% post | ≥3 metastatic sites: 11% | 49% | NR | 115

CALGB 150002 (from 9840) (Kaufman et al., 2007)
Tx: P (q wk vs. q3wk)+TRZ (n=115 randomized) | 75% post | ≥3 metastatic sites: 15% | 55% | NR | central FISH-, polysomy +: 19; central FISH-, polysomy -: 53
Cx: P (q wk vs. q3wk) (n=113 randomized) | 84% post | ≥3 metastatic sites: 11% | 49% | NR | central FISH-, polysomy +: 19; central FISH-, polysomy -: 50

EGF100151 (Cameron et al., 2008; Geyer et al., 2006)
Tx: capecitabine (2 g/m2 days 1–14 q 3wk) + lapatinib (1.25 g q day) (n=198 randomized) | Median 54 yrs; range 26–80 yrs | ≥3 metastatic sites: 49% | ER+ &/or PR+: 48% | | FISH+ IHC-: 15; FISH- IHC3+: 1; FISH- IHC1,2+: 14; FISH- IHC-: 23
Cx: capecitabine alone (2.5 g/m2 days 1–14 q 3wk) (n=201 randomized) | Median 51 yrs; range 28–83 yrs | ≥3 metastatic sites: 48% | ER+ &/or PR+: 46% | | 721421 [cell boundaries not recoverable]

Abbreviations: AC: Adriamycin [doxorubicin]/cyclophosphamide; Cx: control; ER+: estrogen-receptor positive; IHC: immunohistochemistry; FISH: fluorescent in situ hybridization; mos: months; PR+: progesterone-receptor positive; P: paclitaxel; q wk: every week; q3wk: every 3 weeks; TRZ: trastuzumab; Tx: treatment; yrs: years.

Table 9

Summary tumor response, KQ2

Tumor Response (%)

Study | Grp | N | CR | PR | OR (CR+PR; with 95% CI where reported) | SD | PD | NE | Test | p | Comments
CALGB 9840 (Seidman et al., 2004, 2008) | +TRZ | 112 | | | 38% (29%–48%) | | | | multivariate logistic regression | 0.28 | odds ratio=1.35 (0.78–2.34)
CALGB 9840 | -TRZ | 114 | | | 32% (23%–41%) | | | | | |
CALGB 150002 (from 9840) (Kaufman et al., 2007) | +TRZ | 19 | | | 63% | | | | ??? | 0.048 | FISH-/polysomy+
CALGB 150002 | -TRZ | 19 | | | 26% | | | | | |
CALGB 150002 | +TRZ | 53 | | | 36% | | | | ??? | NS | FISH-/polysomy-
CALGB 150002 | -TRZ | 50 | | | 36% | | | | | |

Abbreviations: CR: complete response; Grp: group; NE: not evaluable; NS: not significant; OR: overall response; PD: progressive disease; PR: partial response; SD: stable disease; TRZ: trastuzumab.

Table II-A

Design, Enrollment and Treatment

Study | Design | Therapeutic Setting | n, Enrolled (n per group) | n, Evaluated | n, withdrawn or lost to F/U | Treatment Regimen (Agents)

HER2 Discrepant
Paik et al. 2007; Kim et al, in preparation; Romond et al. 2005 | RCT NSABP-B31 | adjuvant therapy | 2043 (1024, 1019) | 1829 w tumor blocks; 1795 w baseline and F/U data | 248 | AC→ (P ± trastuzumab)
Perez et al. 2007; Perez et al. 2006; Romond et al. 2005 | RCT NCCTG N9831 | adjuvant therapy | 1842 | 1779 (895, 884) | 63 | AC→ (P ± trastuzumab)

HER2 Negative
Seidman et al. 2004 | RCT CALGB 9840 | inoperable or metastatic disease, stratified by 1st or 2nd line therapy | 735 | 228 (HER2-) (113, 115) | 0 (507 HER2+ or UNK given TRZ) | 4 arm trial: P (weekly vs. q3w) stratified by HER2 status; HER2- randomized to ± TRZ, all HER2+ given TRZ
Kaufman et al. 2007 | RCT CALGB 150002 | metastatic, 1st or 2nd line; companion study on CALGB 9840 pts | 585 | 303 (samples available for central testing) | 282 | 4 arm trial: P (weekly vs. q3w) stratified by HER2 status; HER2- randomized to ± TRZ, all HER2+ given TRZ

Table II-I

Adverse Events

Toxicity Type | Study | Severity or Grade | Results: F/U (mo) | Grp1 n | Grp1 % | Grp2 n | Grp2 %

HER2 Discrepant (IHC 2+/FISH+)
Treatment-related mortality
Nausea
Vomiting
Anorexia
Lethargy
Neurosensory
Hearing loss
Cardiac ischemia
Diminished LVEF
Arrhythmias
Bronchopulmonary
Dermatologic
Kidney
Anemia
Thrombocytopenia
Leukopenia or neutropenia
Infection
Other

HER2 Negative
Treatment-related mortality
Nausea
Vomiting
Anorexia
Lethargy
Neurosensory
Hearing loss
Cardiac ischemia
Diminished LVEF
Arrhythmias
Bronchopulmonary
Dermatologic
Kidney
Anemia
Thrombocytopenia
Leukopenia or neutropenia
Infection
Other

The search strategy for studies on HER2 testing in breast cancer yielded 3,218 citations. Initial review selected 74 citations potentially relevant to Key Question 2 for retrieval and review as full articles. We used the ASCO/CAP expert panel's definition (Wolff, Hammond, Schwartz, et al., 2007a) of equivocal HER2 assay results: IHC 2+, or HER2 gene copy number from 4.0 to 6.0 or HER2/CEP17 ratio from 1.8 to 2.2 if ISH is the first or only assay. We defined discordant results as unequivocally positive results by one assay method (i.e., IHC 3+, HER2 gene copy number greater than 6.0, or HER2/CEP17 ratio greater than 2.2) and unequivocally negative results by a different assay method on another tissue section from the same tumor. Four trials (eleven reports; see Table 7 and “Available Studies” for citations) met selection criteria for data abstraction and compared outcomes with versus without a drug targeting HER2, for breast cancer patients with equivocal, discordant, or unequivocally negative HER2 assay results. Three trials randomized patients to chemotherapy with versus without trastuzumab; the fourth randomized patients to chemotherapy with versus without lapatinib, a tyrosine kinase inhibitor active against HER1 and HER2. Trials and their results are summarized in Tables 7–9; detailed abstraction data can be found in Appendix Tables II-A–II-I*.
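The definitions of equivocal and discordant results given above can be sketched in code. This is a minimal Python illustration, not the procedure used by the report or by any trial laboratory. The positive and equivocal cutoffs are those stated in the paragraph above; the negative categories (IHC 0 or 1+, HER2/CEP17 ratio below 1.8, gene copy number below 4.0) are assumed here as the complement of those ranges, and all function names are hypothetical.

```python
def classify_ihc(score: int) -> str:
    """IHC 3+ is positive, 2+ is equivocal; 0 or 1+ is assumed negative."""
    if score == 3:
        return "positive"
    if score == 2:
        return "equivocal"
    if score in (0, 1):
        return "negative"
    raise ValueError(f"IHC score must be 0-3, got {score}")


def classify_ish_ratio(ratio: float) -> str:
    """HER2/CEP17 ratio: >2.2 positive, 1.8-2.2 equivocal, <1.8 assumed negative."""
    if ratio > 2.2:
        return "positive"
    if ratio >= 1.8:
        return "equivocal"
    return "negative"


def classify_ish_copies(copies: float) -> str:
    """HER2 gene copy number: >6.0 positive, 4.0-6.0 equivocal, <4.0 assumed negative."""
    if copies > 6.0:
        return "positive"
    if copies >= 4.0:
        return "equivocal"
    return "negative"


def compare_assays(first: str, second: str) -> str:
    """Label a pair of results from two different assay methods on the same tumor."""
    if "equivocal" in (first, second):
        return "equivocal"
    if first == second:
        return f"concordant {first}"
    return "discordant"  # one unequivocally positive, one unequivocally negative


# Example: IHC 3+ with an unamplified FISH ratio is a discordant pair.
print(compare_assays(classify_ihc(3), classify_ish_ratio(1.2)))
```

The sketch treats any pair containing an equivocal result as equivocal; the trials summarized below instead resolved some such pairs with the second assay (for example, IHC 2+/FISH positive was treated as HER2 positive for eligibility).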

Available Studies and Reports

Table 7 includes two trials on adjuvant trastuzumab with data for Key Question 2 (NSABP B31 and NCCTG N9831). Each reported post-hoc analyses on interim results for small subgroups of resected breast cancer patients inadvertently randomized to chemotherapy with or without trastuzumab in trials seeking to randomize only HER2-positive patients. Similarly, a trial on chemotherapy with or without lapatinib for locally advanced or metastatic disease (EGF100151) also intended to randomize only HER2-positive patients (Cameron, Casey, Press, et al., 2008; Geyer, Forster, Lindquist, et al., 2006). In each of these trials, local laboratory HER2 testing initially classified all randomized patients as HER2 positive. However, central or reference laboratory retests subsequently identified small subsets as equivocal, discordant, or HER2 negative. Only one trial (CALGB 9840) intentionally randomized HER2-negative metastatic breast cancer patients (referred to as “HER2 non-overexpressors” by study authors), and directly tested whether adding trastuzumab to chemotherapy improved outcomes.

One trial on trastuzumab in adjuvant therapy (NSABP B31) reported data on post-hoc subgroup analyses in a brief published communication (Paik, Kim, and Wolmark, 2008). Another adjuvant trastuzumab trial (NCCTG N9831) compared local, central, and reference laboratory results of HER2 testing in a published article that did not report outcomes (Perez, Suman, Davidson, et al., 2006). Both trials reported subgroup outcomes in meeting abstracts, with slides available online (B31: Paik, Kim, Jeong, et al., 2007; N9831: Perez, Romond, Suman, et al., 2007, and Reinholz, Jenkins, Hillman, et al., 2007). A single, published report provided baseline characteristics and preliminary outcomes data for patients randomized to treatment arms common to B31 and N9831 (Romond, Perez, Bryant, et al., 2005). Data were reported in this publication on each trial separately and both trials combined.

Two trials on patients with advanced or metastatic disease published full reports with subgroup analyses (Seidman, Berry, Cirrincione, et al., 2008; Cameron, Casey, Press, et al., 2008). The EGF100151 trial on chemotherapy with or without lapatinib (Cameron, Casey, Press et al., 2008) also published an earlier report (Geyer, Forster, Lindquist et al., 2006), but without results of repeat HER2 testing by a central or reference laboratory or analyses relevant to Key Question 2. CALGB 9840, the only preplanned analysis relevant to this key question, is on a HER2-negative (i.e., non-overexpressor) subgroup randomized to chemotherapy with or without trastuzumab within a larger trial studying an unrelated question (Seidman, Berry, Cirrincione, et al., 2004, 2008). CALGB 9840 also is the source of all patients in the subgroup analyzed post-hoc in CALGB 150002 (Kaufman, Broadwater, Lezon-Geyda, et al., 2007).

Treatments and Subgroups Compared

Adjuvant therapy. Two trials (NSABP B31, NCCTG N9831) investigated outcomes of adjuvant doxorubicin plus cyclophosphamide (AC; every three weeks for four cycles), followed by paclitaxel (P; every three weeks for four cycles), with versus without trastuzumab (+/-TRZ; weekly for 12 months, beginning concurrently with paclitaxel) in women with fully resected early breast cancer. Outcomes are as-yet unreported for a third arm of N9831, which began trastuzumab therapy after all eight cycles of chemotherapy (AC→P→TRZ). Both B31 and N9831 limited eligibility to HER2-positive patients, defined as FISH-positive/IHC unknown, IHC3+/FISH-unknown, or IHC2+/FISH-positive. Patients were initially evaluated by local laboratory testing, and randomized if classified HER2-positive by these results. They were subsequently re-evaluated by central laboratory testing, but continued with assigned treatments regardless of results. A planned interim analysis at two years' median followup (2.4 years for B31 patients; 1.5 years for N9831 patients) pooled all patients randomized to the treatment arms common to both trials: those assigned to the control arms (n=1,679; AC→P) and those assigned to concurrent trastuzumab (n=1,672; AC→P+TRZ) (Romond, Perez, Bryant, et al., 2005). Trastuzumab significantly improved overall survival (OS) at four years: 91.4 percent versus 86.6 percent; hazard ratio (HR)=0.67; 95 percent CI: 0.48–0.93; p=0.015. The B31 (Paik, Kim, and Wolmark, 2008; Paik, Kim, Jeong, et al., 2007) and N9831 (Perez, Romond, Suman, et al., 2007 and Reinholz, Jenkins, Hillman et al., 2007) results included here were unplanned, post-hoc analyses. They compared outcomes of adjuvant AC→(P+/-TRZ) in subgroups found HER2 discordant or negative by central lab results, using data collected for the pooled analysis of Romond, Perez, Bryant, et al. (2005) without longer followup.
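As a rough consistency check on the pooled result quoted above, a reported hazard ratio and its 95 percent confidence interval can be converted to an approximate standard error, z statistic, and two-sided p value on the log scale. The Python sketch below assumes a normal (Wald) approximation; the trial's own p value presumably came from its survival model or a log-rank test, so only approximate agreement should be expected. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_from_hr_ci(hr, lower, upper, level=0.95):
    """Approximate SE of log(HR), z statistic, and two-sided p from a reported CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Pooled OS result reported by Romond et al. (2005): HR=0.67, 95% CI 0.48-0.93.
se, z, p = wald_from_hr_ci(0.67, 0.48, 0.93)
print(f"SE(log HR)={se:.3f}, z={z:.2f}, approximate p={p:.3f}")
```

The approximate p value lands near 0.02, in line with the reported p=0.015. The same check can be applied to the subgroup hazard ratios reported later in this section, whose much wider intervals correspond to much larger standard errors.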

Advanced/metastatic disease. A randomized, controlled trial (CALGB 9840) that studied paclitaxel in women receiving first- or second-line therapy for metastatic breast cancer reported outcomes at two meetings (Seidman, Berry, Cirrincione, et al., 2004; Kaufman, Broadwater, Lezon-Geyda, et al., 2007) and in a published article (Seidman, Berry, Cirrincione, et al., 2008). Primary randomization in this trial compared once-weekly to every-third-week paclitaxel dosing regimens. Testing for HER2 status began after enrolling the first 171 patients, and HER2-negative patients (termed “HER2 non-overexpressors” by study authors and defined as 0 or 1+ or IHC 2+/FISH negative by local laboratory tests) were also randomized to treatment with or without trastuzumab. Seidman, Berry, Cirrincione, et al. (2004, 2008) reported outcomes for this second randomization without separating results by paclitaxel treatment frequency. HER2-positive patients (by local laboratory tests) all received trastuzumab and are excluded from the analysis for Key Question 2.

Table 8

Summary time to event outcomes, KQ2

Time to Event Outcomes

Study | Outcome | Grp | N | Med (mos) | Estimates in the 1 yr to 5 yr columns (in source order) | Test | p | HR (95% CI)

HER2 Discordant (all data on adjuvant AC→P +/- TRZ)

FISH+ IHC 0, 1+, or 2+ by central lab:
NSABP B-31 (a) | DFS | Tx | 56 | | | Cox prop hazards | 0.064 | 0.30 (0.08–1.07)
NSABP B-31 (a) | DFS | Cx | 69 | | | | |
NCCTG N9831 | DFS | Tx | 123 | | | ??? | 0.97 | 0.98 (0.33–2.91)
NCCTG N9831 | DFS | Cx | 95 | | | | |

FISH- IHC 3+ by central lab:
NSABP B-31 (a) | DFS | Tx | 10 | | | Cox prop hazards | 0.94 | 0.91 (0.08–10)
NSABP B-31 (a) | DFS | Cx | 21 | | | | |
NCCTG N9831 | DFS | Tx | 23 | | | ??? | 0.57 | 0.61 (0.11–3.29)
NCCTG N9831 | DFS | Cx | 30 | | | | |

HER2 Negative

adjuvant AC→P +/- TRZ: FISH- IHC 1+, 2+ by central lab:
NSABP B-31 (a) | DFS | Tx | 69 | | ~98%, ~95%, ~90%, ~90%, ~86% | Cox prop hazards | 0.02 | 0.30 (0.11–0.83)
NSABP B-31 (a) | DFS | Cx | 80 | | ~90%, ~79%, ~75%, ~70%, ~62% | | |

adjuvant AC→P +/- TRZ: FISH- IHC 0, 1+, or 2+
NSABP B-31 (a) | DFS | Tx | 82 | | ~97%, ~90%, ~87%, ~87%, ~84% | Cox prop hazards | 0.014 | 0.34 (0.14–0.80)
NSABP B-31 (a) | DFS | Cx | 92 | | ~92%, ~80%, ~76%, ~72%, ~65% | | |
NCCTG N9831 | DFS | Tx | 59 | | 90.2%, 81.2% | ??? | 0.13 | 0.51 (0.21–1.2)
NCCTG N9831 | DFS | Cx | 44 | | 82.6%, 60.9% | | |

P +/- TRZ as 1st or 2nd line therapy for metastatic disease
CALGB 9840 IHC2+/FISH- or IHC 0, 1+ | OS | Tx | 113 | 21.6 | ~75%, ~40%, ~25%, 20% | K-M analysis | 0.65 |
CALGB 9840 IHC2+/FISH- or IHC 0, 1+ | OS | Cx | 115 | 21.6 | ~70%, ~40%, ~25%, 20% | | |
CALGB 9840 IHC2+/FISH- or IHC 0, 1+ | TTP | Tx | 113 | 6.5 | ~30%, ~13%, ~7%, ~5% | K-M analysis | 0.28 |
CALGB 9840 IHC2+/FISH- or IHC 0, 1+ | TTP | Cx | 115 | 5.5 | ~25%, ~12%, ~12%, ~4% | | |
CALGB 150002 (from 9840) central FISH-, polysomy 17 | OS | Tx | 19 | ~30 | ~90%, ~65%, ~30% | ??? | 0.538 |
CALGB 150002 (from 9840) central FISH-, polysomy 17 | OS | Cx | 19 | ~23 | ~69%, ~48%, ~30% | | |

capecitabine +/- lapatinib for advanced or metastatic disease progressing after an anthracycline, a taxane, and trastuzumab
EGF100151 (Cameron et al., 2008) | PFS | Tx | | | | K-M analysis | 0.46 | 0.77 (0.39–1.54)
EGF100151 (Cameron et al., 2008) | PFS | Cx | | | | | |
(sample includes 74 patients not centrally confirmed to meet protocol HER2 eligibility criteria)

(a) Subgroup analyses reported from NSABP B31 adjusted each Cox proportional hazards model used to estimate HR for included patients' ER and nodal status; subgroup analyses from NCCTG N9831 are unadjusted.

Abbreviations: AC: Adriamycin [doxorubicin]/cyclophosphamide; CI: confidence interval; Cx: control; DFS: disease-free survival; HR: hazard ratio; IHC: immunohistochemistry; FISH: fluorescent in situ hybridization; K-M: Kaplan-Meier; Med: median; mos: months; OS: overall survival; P: paclitaxel; prop: proportional; q wk: every week; q3wk: every 3 weeks; TRZ: trastuzumab; TTP: time to progression; Tx: treatment; yr: year(s)

For all patients randomized (n=735), CALGB 9840 investigators first reported that response rate and time to progression (TTP) were better with weekly paclitaxel than with every third week, although the difference in median OS (24 versus 16 months; HR=1.19, p=0.17) was not statistically significant (Seidman, Berry, Cirrincione, et al., 2004). As prespecified in the CALGB 9840 protocol, the final analysis (Seidman, Berry, Cirrincione, et al., 2008) comparing paclitaxel schedules pooled additional patients (n=158) randomized to the identical dose of paclitaxel every third week (all without trastuzumab) in another trial (CALGB 9342; Winer, Berry, Woolf et al., 2004) with those randomized to this schedule in CALGB 9840. In this combined analysis, weekly paclitaxel statistically significantly improved response rate (42 percent versus 29 percent; OR=1.75, p=0.0004), TTP (median, nine versus five months; HR=1.43, p<0.0001), and OS (median, 24 versus 12 months; HR=1.28, p=0.0092), when compared with treatment every third week. Data in Table 8 on HER2 non-overexpressors exclude patients from CALGB 9342.

A post-hoc analysis on HER2 non-overexpressors randomized to paclitaxel with versus without trastuzumab in CALGB 9840 compared outcomes for subsets found FISH negative by central laboratory testing who had or did not have chromosome 17 polysomy (CALGB 150002; Kaufman, Broadwater, Lezon-Geyda, et al., 2007). This analysis was not included in the published final report (Seidman, Berry, Cirrincione, et al., 2008). It also did not include patients from CALGB 9342, none of whom were randomized to paclitaxel with or without trastuzumab.

The EGF100151 trial randomized patients with locally advanced or metastatic breast cancer to capecitabine (1 g/m2 twice daily for 14 days every three weeks) plus lapatinib (1.25 g daily) or to capecitabine alone (1.25 g/m2 twice daily for 14 days every three weeks). Eligibility required: a T4 primary tumor and stage IIIB or IIIC disease, for those without distant metastasis; a history of progressive disease after one or more regimens that included an anthracycline, a taxane, and trastuzumab (given separately or in combinations); and local laboratory HER2 test results of IHC3+ or IHC2+/FISH positive. An interim analysis on 163 patients randomized to capecitabine plus lapatinib and 161 randomized to capecitabine monotherapy reported median TTP was 8.4 months in the combination arm and 4.4 months in the capecitabine monotherapy arm (HR=0.49; 95 percent CI: 0.34–0.71, p<0.001) (Geyer, Forster, Lindquist, et al., 2006). A second report included more patients (n=198, capecitabine plus lapatinib; n=201, capecitabine monotherapy; Cameron, Casey, Press, et al., 2008). By intent-to-treat analysis, median TTP was 6.2 months in the combined arm and 4.3 months in the monotherapy arm (HR=0.57; 95 percent CI: 0.43–0.77, p<0.001). A second interim analysis for OS found 28 percent had died (median OS, 15.6 months) in the combined therapy arm and 32 percent had died (median OS, 15.3 months) in the capecitabine monotherapy arm (HR=0.78; 95 percent CI: 0.55–1.12; p=0.177); followup for survival continues. Central laboratory IHC and FISH retesting of samples from 300 (75 percent) of the 399 randomized in this trial identified small subgroups with HER2-discordant or -negative results (Table 7).

Study Quality

Only one of four included trials (CALGB 9840) stratified randomization by HER2 status, the most informative evidence level defined in this report's study design hierarchy (see Methods, Table 3). The others are post-hoc analyses of treatment effects in HER2-discordant or -negative subgroups from larger randomized, controlled trials. One trial on adjuvant trastuzumab (NSABP B31) and both trials on patients with metastatic or advanced disease (CALGB 9840 and EGF100151) included multivariate analyses. However, neither CALGB 9840 nor EGF100151 used multivariate analysis to adjust treatment outcomes in HER2 discordant or HER2 negative subgroups. Since these subgroups from each study are small and underpowered, and since results from three of four trials are interim analyses with limited followup, we did not assess study quality using the checklist derived from REMARK and other sources (see “Methods”).
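To give a sense of scale for "small and underpowered," the sketch below applies Schoenfeld's approximation for the number of events needed to detect a given hazard ratio in a two-arm comparison with equal allocation. The formula, the 80 percent power and two-sided alpha of 0.05, and the function name are assumptions chosen for illustration; they are not taken from the trial protocols or from this report's methods.

```python
import math
from statistics import NormalDist


def events_required(hr, alpha=0.05, power=0.80):
    """Schoenfeld approximation: events needed for a 1:1 two-arm comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2)


# Illustrative target hazard ratios (hypothetical values, not from the trials).
for target_hr in (0.50, 0.67, 0.75):
    print(f"HR={target_hr}: about {events_required(target_hr)} events")
```

For comparison, the largest HER2-negative subgroup from B31 (FISH negative/IHC 0, 1+, or 2+) had 27 DFS events across both arms at the time of analysis, fewer than this approximation suggests would be needed even for a hazard ratio as extreme as 0.50.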

Patient Characteristics

Adjuvant therapy. Patients from B31 and N9831 were initially randomized based on positive results of local lab testing, given their assigned regimen, and followed on these randomized, controlled trials. Those in subgroups included here subsequently were reclassified HER2 discordant or HER2 negative by central laboratory results. Baseline patient characteristics and prognostic factors (Table 7) were reported for all patients randomized to each treatment arm in each trial (Romond, Perez, Bryant, et al., 2005), including those classified as HER2 positive by both local and central laboratory results. At the level of initial randomization, baseline characteristics and prognostic factors of the groups treated with versus without trastuzumab were similar. However, data were not reported to separately compare baseline characteristics and prognostic factors by treatment arm for each subgroup of HER2-discordant or -negative patients (by central laboratory results).

Data are available from B31 for two HER2-discordant groups:

  • FISH positive/IHC 0, 1+, or 2+: n=56 +TRZ; n=69 -TRZ (data not reported separately for FISH-positive, IHC 0, 1+ subset)

  • FISH negative/IHC 3+: n=10 +TRZ; n=21 -TRZ;

and for two (partially overlapping) HER2-negative groups:

  • FISH negative/IHC 1+ or 2+: n=69 +TRZ; n=80 -TRZ

  • FISH negative/IHC 0, 1+, or 2+: n=82 +TRZ; n=92 -TRZ (13 and 12 patients per arm added to the 69 and 80 in the arms above).

Data are available from N9831 for two HER2-discordant groups:

  • FISH positive/IHC 0, 1+, or 2+: n=123 +TRZ; n=95 -TRZ (data not reported separately for FISH-positive, IHC 0, 1+ subset)

  • FISH negative/IHC 3+: n= 23 +TRZ; n=30 -TRZ;

and for one HER2-negative group:

  • FISH negative/IHC 0, 1+, or 2+: n=59 +TRZ; n=44 -TRZ.

Advanced/metastatic disease. Patients in CALGB 9840 had metastatic disease undergoing first- or second-line therapy. All were randomized to weekly or every third week paclitaxel, and those who were HER2 negative (IHC 2+/FISH negative or IHC 0 or 1+) by local laboratory results were simultaneously randomized to receive (n=113) or not receive (n=115) trastuzumab. The analysis pooled outcomes in the HER2-negative arms for patients given paclitaxel weekly or every third week. Subsequent analyses (CALGB 150002) compared outcomes separately for subgroups from CALGB 9840 who were FISH negative by central laboratory results and had (+/-TRZ, n=19 each arm) or did not have (+TRZ, n=53; -TRZ, n=50) chromosome 17 polysomy.

Patients in EGF100151 had locally advanced or metastatic disease that progressed after one or more regimens with an anthracycline, a taxane, and trastuzumab (given separately or in combinations, as adjuvant therapy or for metastasis). Women (n=399) with local laboratory HER2 test results of IHC3+ or IHC2+/FISH positive were randomized to capecitabine with or without lapatinib. Baseline characteristics and prognostic factors of the groups treated with versus without lapatinib were similar. Subsequent central laboratory reanalysis by FISH and IHC of tumor samples from 300 patients (75 percent of all randomized) identified HER2 discordant or HER2 negative subgroups (Table 7). Data were not reported to separately compare baseline characteristics or prognostic factors by treatment arm for any of these subgroups.

Results, Key Question 2

Adjuvant AC→(P±TRZ). The only available data are from post-hoc subgroup analyses, without stratification for the subgroups' defining characteristics. Neither the B31 nor the N9831 analyses reported subgroup-specific comparisons of baseline characteristics or prognostic factors by treatment arm. Furthermore, one subgroup mixed results for a discordant subgroup (IHC 0, 1+, FISH positive) with results for initially equivocal but ultimately positive (IHC 2+ but amplified by FISH) patients. Finally, data are presently unavailable from studies that classified patients using assay thresholds consistent with current guidelines (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007; see “Results and Conclusions, Key Question 1, Narrative Review”).

Neither trial reported median followup durations, or showed numbers per arm at risk over time, for the specific subgroups compared. In each subgroup from each treatment arm, failure events (e.g., death or relapse) occurred in less than 25 percent of patients (range: 5–23 percent) at the time of analysis. Therefore, length of followup was inadequate for reliable estimates of median event-free durations for any outcome reported. The interim analyses for all patients randomized in the larger trials that were sources of these subgroups (Romond, Perez, Bryant, et al., 2005) also lacked sufficient followup for reliable estimates of median overall survival or median disease-free survival (DFS).
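The reason a median cannot be estimated when fewer than half of patients have had an event can be illustrated with a small Kaplan-Meier calculation. The Python sketch below uses entirely hypothetical data (20 patients, 4 events, the rest censored at 24 months) and a hand-written estimator; it does not reproduce any trial analysis.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time (1=event, 0=censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


def median_time(curve):
    """First time the estimated survival is at or below 0.5, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical subgroup: 4 events among 20 patients, 16 censored at 24 months.
times = [6, 9, 14, 20] + [24] * 16
events = [1, 1, 1, 1] + [0] * 16
curve = kaplan_meier(times, events)
print(curve[-1], median_time(curve))  # survival stays above 0.5, so no median
```

With events in only 20 percent of patients, the estimated survival curve never reaches 50 percent, so `median_time` returns `None`; this mirrors the situation described above for the trial subgroups.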

For HER2 discrepant patients who were FISH positive and IHC 0, 1+ or 2+ by central laboratory testing, between-arm differences in outcome were not statistically significant in either trial. In B31 (n=56 +TRZ; n=69 -TRZ), the HR for failure in analysis of DFS was 0.30 (95 percent CI: 0.08–1.07; p=0.064) and the HR for failure in analysis of recurrence-free interval (RFI) was 0.35 (95 percent CI: 0.10–1.28; p=0.11). In N9831 (n=123 +TRZ; n=95 -TRZ), the HR for failure in analysis of DFS was 0.98 (95 percent CI: 0.33–2.91; p=0.97).

Few patients were FISH negative and IHC 3+ by central laboratory results (from B31: n=10 +TRZ; n=21 -TRZ; from N9831: n=23 +TRZ; n=30 -TRZ). B31 reported HR for failure was 0.91 for both DFS and RFI (for each outcome, 95 percent CI: 0.08–10.0; p=0.94), and N9831 reported hazard ratio for failure was 0.61 (95 percent CI: 0.11–3.29; p=0.57). Each between-arm subgroup comparison was not statistically significant.

Only B31 analyzed outcomes of patient subgroups that were HER2 negative by FISH but IHC 1+ or 2+ by central laboratory testing (n=69 +TRZ; n=80 -TRZ). Between-arm differences reported by Paik, Kim, Jeong et al. (2007) were statistically significant for DFS (HR=0.30; 95 percent CI: 0.11–0.83; p=0.02) and RFI (HR=0.31; 95 percent CI: 0.10–0.95; p=0.041), and favored the subgroup given trastuzumab.

Both trials reported on patients who were FISH negative and IHC 0, 1+ or 2+ by central laboratory testing. In B31, this subgroup added FISH-negative/IHC 0 patients (13 and 12 per arm, respectively) to those in the FISH-negative/IHC 1+ or 2+ arms shown above (combined n=82 +TRZ; combined n=92 -TRZ). Between-arm differences were statistically significant for DFS (7 events, +TRZ, 20 events, -TRZ; HR=0.34; 95 percent CI: 0.14–0.80; p=0.014) and RFI (HR=0.36; 95 percent CI: 0.14–0.92; p=0.034), and again favored the subgroup given trastuzumab. One patient died in the trastuzumab arm, while 10 died in the control arm (HR=0.08; 95 percent CI: 0.01–0.64, p=0.017). In N9831 (n=59 +TRZ, n=44 -TRZ), the between-arm difference in DFS (HR=0.51; 95 percent CI: 0.21–1.2; p=0.13) was not statistically significant.

HER2 gene copy number and magnitude of benefit from trastuzumab. Additional unpublished subset analyses from the B31 trial presented at the June 2007 ASCO annual meeting (Paik, Kim, Jeong, et al., 2007), and similar analyses from the N9831 trial (Reinholz, Jenkins, Hillman, et al., 2007) and the HERA trial (McCaskill-Stevens, Proctor, Goodbrand, et al., 2007) presented at the December, 2007 San Antonio Breast Cancer Symposium, investigated the hypothesis that higher HER2 gene copy numbers, or higher HER2/CEP17 FISH ratios, were associated with a larger magnitude of relative benefit from trastuzumab. Data from the N9831 and HERA trials showed that the hazard ratio for DFS did not grow more favorable to the trastuzumab arm as average FISH ratios increased from 2.0 to 15 or greater (N9831), or from 2 to greater than 8 (HERA). Additionally, investigators found the HR for DFS did not increase as average HER2 gene copy number per cell increased from 4 to greater than 18 (HERA), or from 2 to greater than 10 (B31).

Polysomy 17 and adjuvant trastuzumab. An unpublished post-hoc analysis of data from N9831 presented at the December 2007 San Antonio Breast Cancer Symposium evaluated whether polysomy 17 influenced effects of adjuvant trastuzumab (Reinholz, Jenkins, Hillman, et al., 2007). Investigators reported that among patients with amplified HER2 genes, trastuzumab increased DFS whether or not these patients had polysomy 17. Central lab results identified very few patients without HER2 overexpression by IHC or HER2 gene amplification by FISH, but with polysomy 17. DFS was lower (79 percent versus 83 percent at 3 years; 65 percent versus 75 percent at 5 years) among those given trastuzumab than among those not given trastuzumab, although the sample size was small and few events had occurred in either arm (6 of 24 given trastuzumab, 3 of 13 controls). Investigators also analyzed slightly larger patient subsets without HER2 overexpression by IHC, HER2 gene amplification by FISH, or polysomy 17. DFS was substantially higher (94 percent versus 77 percent at 3 years; 84 percent versus 55 percent at 5 years) among those given than among those not given trastuzumab. As in the subset with polysomy 17, few events had occurred in either arm in the subset without polysomy (4 of 34 given trastuzumab, 13 of 33 controls). Additionally, unpublished data from the NSABP B31 trial showed no impact of polysomy 17 on prognosis or degree of benefit from trastuzumab (Dr. S. Paik; personal communication, May 2008).

HER2-negative patients with metastatic disease given P±TRZ for first- or second-line therapy. Patients found IHC 2+/FISH negative or IHC 0, 1+ by local laboratory results were randomized in CALGB 9840 to have or not have trastuzumab added to paclitaxel (n=113 +TRZ; n=115 -TRZ). Between-arm differences in OS (median: 21.6 versus 19.6 months, p=0.67), time to progression (TTP; median: 12 versus 6 months, p=0.088), and overall response rate (ORR; 35 percent versus 29 percent, p=0.32) were not statistically significant (Seidman, Berry, Cirrincione, et al., 2008).

CALGB 150002 reported that subgroups from CALGB 9840 found FISH negative by central laboratory results, and also found to have chromosome 17 polysomy (n=19 +TRZ; n=19 -TRZ), showed a statistically significant increase in ORR (63 percent versus 26 percent, p=0.048) among those given trastuzumab plus paclitaxel compared with those given paclitaxel alone (Kaufman, Broadwater, Lezon-Geyda, et al., 2007). In contrast, ORR did not differ between treatment arms (36 percent in each) for centrally FISH-negative patients without chromosome 17 polysomy. Despite the ORR difference, the centrally FISH-negative subgroup with polysomy 17 (n=19 per arm) showed no statistically significant between-arm differences in either OS (p=0.538) or TTP (p=0.88).

HER2-negative patients with advanced or metastatic disease that progressed after an anthracycline, a taxane, and trastuzumab given capecitabine ± lapatinib. Few patients randomized to capecitabine with or without lapatinib in the EGF100151 trial were HER2 discordant (Table 7). Furthermore, outcomes were not reported separately for those found FISH positive but IHC negative by central laboratory testing (with lapatinib, n=15; without lapatinib, n=7), or those found FISH negative but IHC 3+ by central lab results (with lapatinib, n=1; without lapatinib, n=2). Investigators identified a total of 74 patients (23.5 percent of 315 tested in the central laboratory) whose local results were not confirmed by the central lab as meeting HER2 eligibility criteria of IHC 3+ or FISH positive/IHC 2+ (Cameron, Casey, Press, et al., 2008); distribution between treatment arms was not reported. In an exploratory Kaplan-Meier analysis, investigators found no statistically significant difference between arms (capecitabine with or without lapatinib) in PFS (HR=0.772; 95 percent CI: 0.386–1.543; p=0.46).

Conclusions and Discussion, Key Question 2

Adjuvant trastuzumab. Currently available evidence is inconclusive on outcomes of trastuzumab added to adjuvant chemotherapy for resected HER2-discordant or HER2-negative patients. Evidence on each subgroup may be used to generate hypotheses, but is too weak to test hypotheses, for the following reasons. All available evidence is from post-hoc analyses on subgroups not directly randomized or stratified by the HER2 subgroups of interest. Furthermore, available reports did not show direct comparisons of baseline characteristics and prognostic factors for the specific subgroups compared. Thus, it is uncertain whether the HER2-discordant or HER2-negative subgroups were balanced by treatment arm (i.e., with or without trastuzumab; although treatment arms appeared well-balanced across all patients randomized). Finally, the data used for the two adjuvant studies are from interim analyses, with inadequate followup to estimate median survival for all patients randomized, and inadequate information on median duration of followup in the specific subgroups compared. Thus, although these were large, well-designed and well-conducted randomized, controlled trials, since the overwhelming majority of patients they randomized were unequivocally HER2-positive, only poor quality evidence is presently available on outcomes of adjuvant trastuzumab in either HER2 discordant or HER2 negative patient subgroups.

Adjuvant trastuzumab in HER2-discordant patients. Evidence is unavailable to evaluate effects of trastuzumab specifically for HER2-discordant patients who are FISH positive but IHC negative (0, 1+) by central lab results. Analyses reported from each trial pooled outcomes for these patients with outcomes for those who tested FISH positive and IHC 2+. The latter subset (initially considered equivocal if tested first by IHC) was classified HER2 positive by each trial protocol, and is ultimately classified HER2 positive by algorithms in current guidelines. A more informative analysis limited to the discordant subgroup might compare outcomes with versus without trastuzumab using data pooled from B31 and N9831 on patients who were FISH positive but IHC 0 or 1+ by central lab tests. Results from a systematic review (see Table 6, Key Question 1) estimate this subgroup as 2.4 percent (95 percent CI: 1–4.3 percent) of all breast cancer patients (Dendukuri, Khetani, McIsaac, et al., 2007).

Sample size is insufficient for conclusions from HER2-discordant B31 (total n=31) and N9831 (total n=53) subgroups that tested FISH negative but IHC 3+ by central lab results. The proportion of FISH-negative, IHC 3+ patients is 2.2 percent across both trials (total randomized: 3,822). Results of the systematic review summarized in Table 6 (Key Question 1) estimate this subgroup as 1.2 percent (95 percent CI: 0.6–2.1 percent) of all breast cancer patients (Dendukuri, Khetani, McIsaac, et al., 2007). Although at least three other randomized trials investigated adjuvant trastuzumab, they confirmed eligibility by central or reference laboratory FISH tests before randomizing patients, and have not reported on either of the HER2 discordant subgroups of interest. Thus, large database or registry analyses may be the only source of better evidence on outcomes of adjuvant trastuzumab for the two HER2 discordant subgroups, which together comprise approximately 4 percent of all breast cancer patients.
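
The proportions quoted above can be reproduced with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and summing the two systematic-review point estimates is an assumption about how the "approximately 4 percent" figure was obtained.

```python
# FISH-negative, IHC 3+ subgroup sizes by central lab results (from the text).
b31_subgroup = 31
n9831_subgroup = 53
total_randomized = 3822  # both trials combined

trial_proportion_pct = 100 * (b31_subgroup + n9831_subgroup) / total_randomized
print(f"{trial_proportion_pct:.1f} percent of randomized patients")

# Systematic-review point estimates (percent of all breast cancer patients).
fish_pos_ihc_neg_pct = 2.4   # FISH positive, IHC 0 or 1+
fish_neg_ihc_3_pct = 1.2     # FISH negative, IHC 3+

# Assumption: the combined figure is the sum of the two point estimates.
combined_discordant_pct = fish_pos_ihc_neg_pct + fish_neg_ihc_3_pct
print(f"{combined_discordant_pct:.1f} percent combined")
```

The first result rounds to the 2.2 percent cited for the trials, and the second rounds to approximately 4 percent.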

Factors influencing discordant results. Discordant results may occur if one assay is correct and the other in error, due to preanalytic, analytic, or postanalytic factors (see Key Question 1). As with any assay, 100 percent accuracy cannot be expected even from the most careful and proficient laboratories. Proficiency testing and other quality control and quality assurance measures to minimize false-negative and false-positive results are recommended in current practice guidelines (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007). However, concordance of different methods to classify an individual as HER2 positive or negative is at least partly independent from accuracy of performing a specific assay. Even with the most careful and highly accurate laboratory techniques, discordance in classification may occur between a method that detects gene amplification (FISH in these studies, but also true with CISH or SISH) and a method that detects protein overexpression (IHC in these studies, but also true with Western blots).

By current guidelines, clinicians may categorize identical discordant patients differently with respect to HER2 status, depending on the selection and sequence of tests they order. So, for example, FISH-positive and IHC 0 or 1+ patients (1 to 4 percent of cases; see Table 6, Key Question 1) would be classified HER2 positive if tested only by FISH, but would be classified HER2 negative if tested initially by IHC, since reflex FISH would not be performed. Conversely, FISH-negative and IHC 3+ patients (1 to 2 percent of cases; see Table 6) would be considered HER2 negative if tested only by FISH, but HER2 positive if tested initially by IHC. NSABP B31 and NCCTG N9831 report the frequency of these subsets based on careful central laboratory results for FISH and IHC assays, although results are pooled across some IHC scores (see “Results and Conclusions, Key Question 1”). However, these data do not permit assessment of the subset frequencies independent of tissue fixation artifacts that may have occurred at some local hospitals and laboratories, or the margin of error that might exist even in the most proficient laboratories. Nor can the clinical consequences of such discordances be assessed from the available evidence.
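
The dependence of HER2 classification on test sequence, as described above, can be illustrated with a minimal sketch. It is a deliberate simplification and not a rendering of any guideline algorithm. It assumes that IHC 2+ always triggers reflex FISH, treats FISH as a simple positive or negative result (ignoring equivocal FISH values), and uses hypothetical function names.

```python
def classify_fish_only(fish_positive: bool) -> str:
    """HER2 status when FISH is the only test ordered."""
    return "HER2 positive" if fish_positive else "HER2 negative"


def classify_ihc_first(ihc_score: int, fish_positive: bool) -> str:
    """HER2 status when IHC is ordered first, with reflex FISH only for IHC 2+."""
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2, or 3")
    if ihc_score == 3:
        return "HER2 positive"  # no reflex FISH is performed
    if ihc_score <= 1:
        return "HER2 negative"  # no reflex FISH is performed
    return classify_fish_only(fish_positive)  # IHC 2+: reflex FISH decides


# The two discordant patterns discussed in the text.
discordant_cases = [
    ("FISH positive, IHC 1+", 1, True),
    ("FISH negative, IHC 3+", 3, False),
]
for label, ihc, fish in discordant_cases:
    print(label,
          "| FISH only:", classify_fish_only(fish),
          "| IHC first:", classify_ihc_first(ihc, fish))
```

In both discordant patterns, the two testing sequences assign the same patient to opposite HER2 categories.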

Adjuvant trastuzumab in HER2-negative patients. Scant but intriguing evidence suggests the hypothesis that some patients currently classified as HER2 negative may benefit from adjuvant trastuzumab. Data reported from B31 showed significantly longer DFS and RFI in FISH-negative IHC ≤2+ patients given trastuzumab than in similar patients managed without trastuzumab, whether the analysis did or did not include those who were IHC 0. However, a similar analysis of data from N9831 did not show significant differences. Since both were interim analyses of trials in which fewer than 25 percent of subjects had reached a failure event, neither provides conclusive evidence as yet, and followup analyses from these trials will be of great interest. Blinded review of IHC and FISH scoring would also be useful for samples from these trials, and from other adjuvant trastuzumab trials that confirmed eligibility by central lab testing before randomizing each patient. Recent guidelines conclude that present evidence does not demonstrate improved outcomes with use of adjuvant trastuzumab for patients who would be classified HER2 negative by protocols of B31, N9831, and similar studies (Wolff, Hammond, Schwartz, et al., 2007a; Carlson, Moench, Hammond, et al., 2006; Hanna, O'Malley, Barnes, et al., 2007).

Importantly, the B31 and N9831 subgroup analyses combine results for HER2-negative patients many now consider to be different: those with the so-called “triple-negative” subtype (i.e., negative for HER2, estrogen receptor, and progesterone receptor), and the luminal subtypes (luminal A or luminal B) that are negative for HER2 but positive for at least one of the hormone receptors. These subtypes were initially defined in studies using microarrays to subdivide breast cancer patients by gene expression patterns (for reviews, see Peppercorn, Perou, and Carey, 2008; Razzak, Lin, and Winer, 2008; Kang, Martel, and Harris, 2008). There is evidence that the triple negative and luminal subsets differ with respect to prognosis, chemotherapy response, and outcomes (Carey, Dees, Sawyer, et al., 2007; Liedtke, Mazouni, Hess, et al., 2008), and they clearly differ with respect to effects of endocrine therapy. Further complexity comes from reports that there is substantial but incomplete overlap between triple negative patients and those classified in the “basal-like” subset by gene expression arrays (Cheang, Voduc, Bajdik, et al., 2008). Notably, new phase III trials have recently opened (and others are planned) specifically for patients with triple negative or “basal-like” breast cancer (Kilburn, 2008). Results from these studies will likely be more conclusive than analyses that pool all HER2-negative patients to determine outcomes for subsets of HER2-negative breast cancer.

Adjuvant trastuzumab in HER2-equivocal patients. Among patients with initially equivocal HER2 test results by current clinical practice guidelines (those scored 2+ if IHC is first, or HER2 gene copy number from 4.0 to 6.0 or HER2/CEP17 ratio from 1.8 to 2.2 if ISH is first), ultimately, most are definitively categorized as HER2 positive or HER2 negative after guideline-recommended followup testing. Data are presently unavailable either to estimate effects of adjuvant trastuzumab on outcomes for the subset with initially equivocal results subsequently classified HER2 positive, or to demonstrate lack of benefit in those subsequently classified HER2 negative. For the minority who remain equivocal after followup testing, the guidelines' treatment recommendation depends on whether the patient would have been included or excluded from key randomized, controlled trials. For example, patients with HER2/CEP17 ratios 2.0 or greater but less than 2.2 were included and randomized in the adjuvant trastuzumab trials. Therefore, the guidelines consider current evidence insufficient to deny these patients trastuzumab with adjuvant chemotherapy. In contrast, patients with HER2/CEP17 ratios 1.8 or greater but less than 2.0 were excluded from these trials, and the guidelines consider current evidence insufficient to include trastuzumab in their adjuvant therapy regimens. Figures 2 and 3 (see Key Question 1) include information on trial eligibility of patients whose test results are equivocal by each HER2 assay.
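
The ratio bands described above can be summarized in a short sketch. It covers only the HER2/CEP17 ratio case and uses the cut points stated in the text (1.8, 2.0, and 2.2). The function name, the returned labels, and the handling of a ratio of exactly 1.8 or 2.2 are assumptions made for illustration, not guideline language.

```python
def ratio_band(her2_cep17_ratio: float) -> str:
    """Place a HER2/CEP17 FISH ratio into the bands discussed in the text."""
    if her2_cep17_ratio < 0:
        raise ValueError("ratio cannot be negative")
    if her2_cep17_ratio < 1.8:
        return "below the equivocal range"
    if her2_cep17_ratio < 2.0:
        # 1.8 or greater but less than 2.0: excluded from the adjuvant trials
        return "equivocal; excluded from adjuvant trastuzumab trials"
    if her2_cep17_ratio < 2.2:
        # 2.0 or greater but less than 2.2: included and randomized
        return "equivocal; included in adjuvant trastuzumab trials"
    # Assumption: exactly 2.2 is grouped with higher ratios in this sketch.
    return "at or above the upper bound of the equivocal range"


for ratio in (1.5, 1.9, 2.1, 2.5):
    print(ratio, "->", ratio_band(ratio))
```

The two middle bands correspond to the patients for whom the guidelines' treatment recommendation follows trial eligibility.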

Advanced or metastatic disease. No data were reported on patients with advanced or metastatic disease and discordant results from IHC and ISH HER2 testing. Evidence is available from one trial (CALGB 9840; n=226) that randomized metastatic breast cancer patients who were HER2 negative by local laboratory testing to chemotherapy with or without trastuzumab (Seidman, Berry, Cirrincione, et al., 2008). Additionally, a small subset of advanced and metastatic patients randomized to chemotherapy with or without lapatinib in another trial (EGF100151; n=74) were found by central lab confirmatory testing not to meet protocol criteria for HER2 positivity (Cameron, Casey, Press, et al., 2008). Thus, one source of good quality evidence (CALGB 9840) and one source of moderate quality evidence (EGF100151) suggest that HER2-negative patients with advanced or metastatic disease do not benefit from treatments targeting the HER2 molecule. Additional evidence supporting this conclusion comes from an analysis of data pooled from three pivotal trials of trastuzumab for metastatic breast cancer. The analysis showed that among patients found IHC 2+ by the presently unavailable “clinical trial assay,” benefit from trastuzumab was limited to those subsequently shown to have amplified HER2 genes by FISH (Mass, Press, Anderson et al., 2005).

CALGB 150002 investigators compared outcomes with versus without trastuzumab for subgroups of FISH-negative patients who either had (n=38) or did not have (n=103) polysomy 17 (Kaufman, Broadwater, Lezon-Geyda, et al., 2007). Overall response rate was significantly higher with versus without trastuzumab for those with polysomy 17, but was identical with or without trastuzumab for those without polysomy 17. In contrast, the N9831 study on adjuvant therapy (Reinholz, Jenkins, Hillman, et al., 2007) reported no impact of polysomy 17 on benefit from trastuzumab, and unpublished data from a second study (NSABP B31; Dr. S. Paik, personal communication, May 2008) suggested the same finding. The discrepancy might be due to different definitions of polysomy 17 for CALGB 150002 (average CEP17 copy number per cell greater than 2.2) and N9831 (more than 3 CEP17 signals in more than 30 percent of nuclei). It might also reflect differences between adjuvant therapy and treatment for metastatic disease with respect to polysomy 17 as a predictor of benefit from trastuzumab. Note also that studies reviewed for “Results and Conclusions, Key Question 1” report conflicting data on a possible association of polysomy 17 with overexpression of HER2 protein. Thus, presently available evidence leaves unanswered questions with respect to the utility of polysomy 17 to select patients for HER2-targeted therapy.

Key Question 3a

For breast cancer patients, what is the evidence on clinical benefits and harms of using HER2 assay results to guide selection of chemotherapy regimen?

Study Selection

The search strategy for studies on HER2 testing in breast cancer yielded 3,218 citations. Initial review of titles and abstracts selected 219 citations potentially relevant to Key Question 3 for retrieval and review as full articles. Of these, 161 were considered potentially relevant to Key Question 3a (HER2 status to guide choice of chemotherapy regimen) while 62 were considered potentially relevant to Key Question 3b (HER2 status to guide choice of hormonal therapy regimen). Four reports were considered for both question 3a and 3b.
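
The citation counts above are internally consistent once the overlap between the two subquestions is taken into account. The following short check uses only the numbers in the text; the variable names are illustrative.

```python
relevant_to_3a = 161      # potentially relevant to Key Question 3a
relevant_to_3b = 62       # potentially relevant to Key Question 3b
considered_for_both = 4   # reports counted under both 3a and 3b

# Inclusion-exclusion: subtract the overlap so shared reports count once.
unique_citations = relevant_to_3a + relevant_to_3b - considered_for_both
print(unique_citations)  # 219
```

The result matches the 219 citations retrieved for full-article review.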

Table 10

Summary design, treatment, patient characteristics, KQ3a
Study/DesignTreatmentsAge or Menopause StatusExtent of Disease(% of pts analyzed by HER2 status)
ER+PR+HER2+HER2-
Adjuvant chemotherapy for resected early breast cancer
Yang et al., 2003 series cyclophosphamide + methotrexate + fluorouracil (CMF; n=94)≥50 yr:≥3 cm: 67%NRNRIHC only:36%64%
52.1%N+: 62%
Gusterson et al., 2003; stratified RCT perioperative CMF (one cycle)post: 47% of n=760 of 1275 N- patients randomized >2 cm: 53%, HER2+of HER2+:IHC only:12.8%87.2%
40%, HER2-36%24%
no adjuvant therapyof HER2-:IHC only:20.8%79.2%
100% N0 51%38%
Multiple cycles of CMFpost: 45% of n=746 of 1229 N+ patients randomizedT size, NR; 100%of HER2+:IHC only:17.3%82.7%
node+; ≥4 nodes +:32%22%
perioperative CMF (one cycle)49%, HER2+of HER2-:
43%, HER2- 59%45%IHC only:21.6%78.4%
Moliterni et al., 2003; RCT 8 cycles CMF + 4 cycles doxorubicin (CMF → A; n=248 of 277 randomized)≥52 yr:~65%, <2.1 cmonly reported for all randomized to each armIHC only:18.1%81.9%
67% 100% N1
12 cycles of CMF alone (n=258 of 275 randomized)≥52 yr:IHC only:19.4%80.6%
69%
Colozza et al., 2005; RCT epirubicin(E), weekly for 4 months (n=133 of 166 randomized)>50 yr:≤2 cm: 46%55%63%IHC only:40.6%59.4%
51% 1–3 N+: 52%
6 cycles CMF (n=133 of 174 randomized)>50 yr:≤2 cm: 45%56%63%IHC only:27.8%72.2%
56%1–3 N+: 59%
Pritchard et al. 2006; RCT 6 cycles of CEF (n=312 of 351 randomized) 100% pre FISH:posneg62% NR by FISH: 24.0% 76.0%
6 cycles of CMF (n=316 of 359 randomized)T252%49%
100% pre1–3 N+57%63%56%NRby FISH:27.8%72.2%
Knoop et al., 2005; RCT 9 cycles of CEF (n=352 of 480 randomized)post: 31.5%T≥2.1 cm: 60.7%25%NRIHC 3+ or
1–3 N+: 29.5% FISH+ 32.5% 67.5%
9 cycles of CMF (n=421 of 500 randomized)post: 30.2%T≥2.1 cm: 57.6%27%NRIHC 3+ or
1–3 N+: 33.3% FISH+32.8%67.2%
Dressler et al., 2005, Thor et al., 1998; 3-arm RCT (CALGB 8541) 4 cycles high-dose CAF (n=179 of 519 randomized)a (A=doxorubicin)mn, 50.1 yrsmn T size, 2.91 cm68%54%FISH+17.3%82.7%
42.5% pre mn # N+, 4.51 IHC+24.8% 75.2%
6 cycles moderate-dose CAF (n=167 of 513 randomized)amn, 51.4 yrsmn T size, 2.88 cm71%65%FISH+20.7%79.4%
38.3% pre mn # N+, 4.43 IHC+25.7% 74.3%
4 cycles low-dose CAF (n=178 of 518 randomized)amn, 50.4 yrsmn T size, 3.07 cm66%58%FISH+18.8%81.2%
41.1% premn # N+, 4.92 IHC+22.9%77.1%
Del Mastro et al. 2004, 2005; RCT (GONO-MIG-1) up to 9 cycles FEC14 regimen (q2wk; n=370 of ~607 randomized)IHC 3+
T1: 47% N+: 62%CB1150 (13.5%)320
median, 54 yrsT2: 46% N-: 38%54%(86.5%)
6 cycles FEC21 regimen (q3wk; n=361 of ~607 randomized)range, 25–70T3–4: 5%42%IHC 3+
T? 1%CB1153 (14.7%)308
(85.3%)
Tanner et al., 2006; control arm from RCT 9 cycles of FEC (n=180 of 251 randomized; n=211 from HDC/AuSCS arm excluded)≥50 yr:HER2:posnegonly reported pooled data for both study armsCISH
42% of all testedT:2–5cm60%52%only:31.1%68.9%
5–9 N+41%47%
≥10 N+59%53%
Hayes et al., 2007; RCT (randomly selected 2 groups of 750 ea) 4 cycles AC → paclitaxel (n=1,570 randomized)post: 38%Grp1Grp157%not reported
Grp2
T>2cm66%64%NR
4 cycles AC → observation (n=1551 randomized)post: 38%1–3 N+48%46%Grp262%not reported
4–9 N+40%43%NR
Martin et al., 2005b; RCT 6 cycles DAC (n=630 with known HER2 status of 745 randomized) (D=docetaxel)median, 49 yrsT1: 40% 1–3N+: 63%155 (24.6%)475
range, 26–70T2: 52% ≥4N+: 37%ER+ &/or PR+: 76%(75.4%)
pre, 56% T3: 8%
6 cycles FAC (n=632 with known HER2 status of 746 randomized)median, 49 yrsT1: 43% 1–3N+: 62%164 (26.0%)468
range, 23–70T2: 51% ≥4N+: 38%ER+ &/or PR+: 76%(74.0%)
pre, 55%T3: 6%
Neoadjuvant (preoperative) chemotherapy for locally advanced breast cancer
Learn et al., 2005c; 3-arm RCT 4 cycles AC ± D (concurrent or after resection) (n=104 of 144 randomized)mean, 48 yrsT ≤ 2 cm: 28% N0:61%only reported data for n=121 with biopsy specimensTAB 250 (n=104 classified)
median, 47 yrsT 2–5 cm:47% N1:39%IHC+41 (39%)63 (61%)
range, 27–73T >5 cm: 25% N2: 0
Arriola et al., 2006; series 4 cycles of doxorubicin followed by surgery (n=232)mean, 47 yrsT3: 70%67%52%IHC + FISH
N1: 40% then CISH18%82%
Park et al., 2003; series 4 cycles of doxorubicin followed by surgery (n=67)≥50 yrs, 18%5–10 cm91%
>10 cm9%46%NRCISH only:46%54%
N statusNR
Zhang et al., 2003; series 3–6 cycles of FAC followed by surgery (n=97)T253%
≥50 yrs, 44%≥T334%65%56%IHC 3+
N-33%or FISH+28%72%
N+67%
Tulbah et al., 2002; series 3–4 cycles of paclitaxel + cisplatin followed by surgery (n=54)HER2+ HER2-HER2+of HER2+:
≤50 91% 84%HER2-55%50%IHC 3+41%59%
pre 91% 78%≥T386%78%
N036%28%of HER2-:
N155%56%50%34%
N29%16%
Tinari et al., 2006; series median 4 (range, 3–6) cycles FEC, q3wk followed by surgery (n=77)median, 46 yrsT 2–5 cm: 75%62%IHC 3+
range, 25–74T >5 cm: 25% 45%or 2+ & FISH+20 (26%)57 (74%)
First- or second-line chemotherapy for advanced or metastatic breast cancer
Harris et al., 2006; RCT paclitaxel (n=165 of 474 randomized to 3 dose arms, but pooled for HER2 analysis)median: 54.9 yr# metastatic sites:ER+ &/or PR+:FISH26%74%
median, 158%CB1120%80%
Hercep. 3+21%79%
Di Leo et al., 2004; RCT doxorubicin (A; n=91 of 165 randomized)54 yr≥3 sites:46%NRIHC+ ≥1% & FISH+:
visceral: 79% 16% 69%
docetaxel (T; n=85 of 161 randomized)51 yr≥3 sites:51%NRIHC+ ≥1% & FISH+:
visceral:76%25%59%
Konecny et al., 2004; RCT epirubicin + cyclophosphamide (EC; n=137 of 254 randomized)mean: 55 yr1–2 sites:57%52.6%48.9%FISH only36%64%
(31–74) ≥3 sites: 42%
epirubicin + paclitaxel (ET; n=138 of 262 randomized)mean: 55 yr1–2 sites:53%60.9%49.3%FISH only35%65%
(29–75)≥3 sites:42%
a. Data on eligible patients randomized to each arm are from Budman, Berry, Cirrincione, et al., 1998.

b. Except for HER2 status, data shown compare all patients randomized to TAC versus all patients randomized to FAC.

c. Except for ER, PR and HER2 status, data shown pool evaluable patients (n=142) randomized to AC, AC+D, or AC→adjuvant D.

Abbreviations: Please refer to the text or list of abbreviations at the end of the report for definition of specific chemotherapy regimens/agents.

Grp: group; IHC: immunohistochemistry; FISH: fluorescence in situ hybridization; mn: mean; q wk: every week; q3wk: every 3 weeks.

Table IIIa-A

Design, Enrollment and Treatment
| Study | Design | Therapeutic Setting | n, Enrolled (Randomized) | n, Evaluated | n, Withdrawn (Lost to F/U) | Treatment Regimen (Agents) |
|---|---|---|---|---|---|---|
| Adjuvant Chemotherapy | | | | | | |
| Yang et al. 2003; rec. # 8840 | single arm retrospective series | adjuvant therapy post mastectomy | 94 (identically treated; 13 of 107 in series not given adj. chemo) | 94 (outcomes reported separately) | 0 | cyclophosphamide + methotrexate + fluorouracil (CMF) |
| Gusterson et al. 2003; rec. # 43690 | RCT; separate randomization by nodal status | adjuvant therapy: none versus one cycle peri-op versus prolonged | 1275 node-neg; 1229 node-pos | 760 node-neg; 746 node-pos | 515 node-neg; 483 node-pos (no samples) | node-neg: peri-op CMF versus no adj therapy; node-pos: peri-op versus continuous CMF |
| Moliterni et al. 2003; rec. # 10210 | RCT; retrospective analysis by HER2 status | adjuvant therapy post mastectomy or quadrantectomy with axillary dissect. (1–3 nodes+) | 552 | 506 | 46 (HER2 status unknown) | CMF alone (12 cycles) versus CMF for 8 cycles then doxorubicin for 4 cycles (CMF→A) |
| Colozza et al. 2005; rec. # 3820 | RCT; retrospective analysis by HER2 status | post-operative adjuvant therapy; node- if ER/PR neg or node+ with ≤9 nodes involved | 348 | 266 | 82 (no tumor samples) | CMF for 6 cycles versus epirubicin weekly for 4 months |
| Pritchard et al. 2006; rec. # 1760 | RCT; retrospective analysis by HER2 status | adjuvant therapy post mastectomy or lumpectomy with axillary dissection; all node+ | 710 | 634 (by IHC); 628 (by FISH) | 71 (no tumor samples); 5 (IHC & FISH failed) | CMF (Cx) versus CEF (Tx); each given for 6 cycles; no endocrine therapy after adjuvant chemoTx |
| Knoop et al. 2005; rec. # 3450 | RCT (2 × 2); retrospective analysis by HER2 status | adjuvant therapy post mastectomy or lumpectomy with axillary dissection | 1,195 (980 Danes eligible) | 773 (805 tested for HER2 status) | CMF: 79 of 500; CEF: 128 of 480 | CMF (Cx) versus CEF (Tx); each given for 9 cycles ± pamidronate, daily for 4 years; no adjuvant tamoxifen |
| Dressler et al. 2005, rec. # 4280; Thor et al. 1998, rec. # 40880; CALGB trial 8541 & lab companion study 8869 | 3-arm RCT; retrospective analysis by HER2 status | adjuvant therapy post mastectomy or lumpectomy with axillary dissection; all node+ | 1,549 (in CALGB 8541) | 524 (of 993 in CALGB 8869) | 1,025 (556 not in 8869 study + 469 not in Dressler et al.) | 4 cycles high dose CAF (600/60/600 mg/m2) q4wk versus 6 cycles moderate dose CAF (400/40/400 mg/m2) q4wk versus 4 cycles low dose CAF (300/30/300 mg/m2) q4wk; similar proportions in each arm given 5 years of twice daily tamoxifen (41%, 40%, 34%) for ER+, post-menopausal disease |
| Del Mastro et al. 2004, 2005; rec. # 48020; GONO-MIG-1 trial | RCT; retrospective analysis by HER2 status | adjuvant therapy for node- high-risk or node+ patients | 1,214 | 731 | 483 (specimens unavailable for HER2 testing) | 6 cycles FEC21 regimen q3wk versus up to 9 cycles FEC14 regimen q2wk (same drug doses in each regimen; ER+ & PR+ patients in each arm received tamoxifen qd for 5 years) |
| Tanner et al. 2006; rec. # 1820 | STD-dose arm of RCT; retrospective analysis by HER2 status | adjuvant therapy post mastectomy or lumpectomy with axillary dissection | 525 (251 to STD-dose arm) | 391 (180 for STD-dose arm) | 274 (71 from STD-dose arm; no samples) | FEC (9 cycles; individualized doses based on hematological toxicity) versus HDC/AuSCS using CTCb after 3–4 cycles of FEC (did not abstract data from HDC/AuSCS arm); loco-regional RTx + 5 years of tamoxifen for all patients |
| Hayes et al. 2007; rec. # 47610; CALGB 9344 | subset from 3 × 2 RCT; retrospective analysis by HER2 status | adjuvant therapy for node+ patients after surgery with negative margins | 1500 (2 groups, 750 each, randomly selected from 3121 in RCT) | 1322 | 178 (no tumor specimens; 1621 RCT patients not analyzed by HER2 status) | 4 cycles of AC (randomized to 1 of 3 doxorubicin doses) followed by 4 cycles of paclitaxel or observation (a second; separately reported doxorubicin dose did not change outcomes) |
| Martin et al. 2005; rec. # 47650 | RCT; pre-planned subgroups; 2nd interim analysis of ongoing trial | adjuvant therapy for node+ patients after surgery with negative margins | 1491 | 1262 | 229 (no tumor specimens) | 6 cycles (3 wks each) of docetaxel + doxorubicin + cyclophosphamide (DAC) versus fluorouracil + doxorubicin + cyclophosphamide (FAC); equal proportion (ER or PR)+ patients, each arm took qd tamoxifen for 5 years |
| Neoadjuvant (Pre-operative) Chemotherapy | | | | | | |
| Learn et al. 2005; rec. # 47640 | 3 arm RCT; retrospective analysis by HER2 status | pre-operative chemotherapy for operable breast cancer (T1–3, N0–1, M0) | 144 | 104 | 40 (no tumor specimen, 23; HER2 status unknown, 17) | 4 cycles AC ± docetaxel (D) q3wk, followed by surgery; 3rd arm given AC + post-surgery D (pooled with AC alone controls for analysis by HER2 status); all patients given 5 yrs of TAM qd |
| Arriola et al. 2006; rec. # 950 | prospective single-arm series | primary chemotherapy for T2–3 N0–1 operable breast cancer | 232 | 232 | 0 | doxorubicin (75 mg/m2) 4 cycles, q3wk, then lumpectomy or mastectomy + 3-level axillary dissect. |
| Park et al. 2003; rec. # 9960 | retrospective single-arm series | pre-operative chemotherapy for locally-advanced disease | 67 | 67 | 0 | doxorubicin (50 mg/m2) 4 cycles, q3wk, prior to breast conservation or mastectomy |
| Zhang et al. 2003; rec. # 9820 | retrospective single-arm series | pre-operative chemotherapy for operable breast cancer | 97 | 97 | 0 | FAC q3wk (6 cycles for 7 patients, 5 cycles for 1, 4 cycles for 81, and 3 cycles for 8) |
| Tulbah et al. 2002; rec. # 11560 | retrospective single-arm series | pre-operative chemotherapy for locally-advanced, non-inflammatory breast cancer | 54 | 54 | 0 | paclitaxel + cisplatin, q3wk, for 3 or 4 cycles |
| Tinari et al. 2006; rec. # 2300 | retrospective single-arm series | pre-operative chemotherapy for operable breast cancer | 77 (selected; 16 ineligible of 93 consecutive) | 77 | 0 | FEC q3wk (median 4 cycles; range 3–6 cycles) |
| Chemotherapy for Advanced or Metastatic Disease | | | | | | |
| Harris et al. 2006 [1]; rec. # 390, no data on no. of sites, 1994-? | RCT/RET; CALGB 9342 | Advanced (Stage IV or inoperable); first or second line Tx. No concurrent hormonal therapy | 474 | 165 (of n=175 w adequate tumor blocks; n=10, all bio-marker tests unsuccessful) | 299 (n=273, no blocks; n=26, blocks inadequate); similar characteristics & outcomes, w/wo blocks, except DFS | Paclitaxel; compared 3 doses (175, 210, or 250 mg/m2 q3wk to failure, i.e., progression or intolerable toxicity), but data combined for this analysis |
| Di Leo et al. 2004; rec. # 5970; 29 of 41 sites in original trial, 7/94–1/97 [2] | Phase III RCT (not blinded); TAX 303 trial; secondary analysis | Metastatic disease; first or second line therapy; prior CMF required (adj or for mets); prior anthracyclines or taxanes excluded | 326 | 176 | 150 (n=74, Grp 1; n=76, Grp 2) | Grp 1: doxorubicin (75 mg/m2) (A; n=91) vs Grp 2: docetaxel (100 mg/m2) (T; n=85) every 3 wks; max 7 cycles absent progression or toxicity. No stat sig differences between populations with versus without specimens for HER2 analysis. |
| Konecny et al. 2004; rec. # 6740; ~71 sites, Germany, 10/96–12/99 | RCT; secondary analysis | Metastatic; no prior chemo for metastatic disease, no metastasis to CNS or to bone only. Stratified by 0 vs 1 prior hormonal Tx for metastatic disease. | 579 enrolled; 516 eligible were randomized & treated | 275 | 241 (n=219, no block; n=17, technically inadequate; n=5, no invasive cancer); no SS diffs between pts w/wo known HER2 status. | Grp 1: epirubicin (60 mg/m2) and cyclophosphamide (600 mg/m2) (EC, n=137); Grp 2: epirubicin (60 mg/m2) and paclitaxel (175 mg/m2) (ET, n=138). Chemo given q3 wks for max of 10 cycles; median=6 cycles. |
[1] Some data from: Winer EP, Berry DA, Woolf S, Duggan D, Kornblith A, Harris LN, Michaelson RA, Kirshner JA, Fleming GF, Perry MC, Graham ML, Sharp SA, Keresztes R, Henderson IC, Hudis C, Muss H, Norton L. Failure of higher-dose paclitaxel to improve outcome in patients with metastatic breast cancer: cancer and leukemia group B trial 9342. J Clin Oncol 2004;22(11):2061–8.

[2] Some data from: Chan S, Friedrichs K, Noel D et al. Prospective randomized trial of docetaxel versus doxorubicin in patients with metastatic breast cancer. J Clin Oncol 1999;17(8):2341–54.

Twenty separate studies met selection criteria and were abstracted for Key Question 3a (Table 10; Appendix Table IIIa-A). Eleven studies investigated adjuvant chemotherapy for resected early stage breast cancer, including nine randomized, controlled trials, an uncontrolled series, and the standard-dose control arm of a randomized, controlled trial of high-dose chemotherapy with autologous stem-cell support (HDC/AuSCS). Six studies investigated neoadjuvant (preoperative) chemotherapy for locally advanced breast cancer; one was a randomized, controlled trial and five were uncontrolled, single-arm series. Three studies investigated first- or second-line therapy for advanced or metastatic breast cancer. Two randomized, controlled trials compared different regimens; the third randomized, controlled trial compared different doses of one drug, but pooled arms for the analysis by HER2 status.

Available Studies

Eleven studies on postsurgical adjuvant chemotherapy. The available evidence included one retrospective analysis of an uncontrolled single-arm series (Yang, Klos, Zhou, et al., 2003), and ten randomized, controlled trials. However, one arm of one randomized, controlled trial (Tanner, Isola, Wiklund, et al., 2006) was excluded because patients in that arm received HDC/AuSCS. Each randomized, controlled trial was designed to compare outcomes of treatment regimens in populations not selected or stratified for HER2 status, and most published earlier reports that compared patients, prognostic factors, and outcomes by treatment arm for all randomized patients. With only one exception (Martin, Pienkowski, Mackey, et al., 2005), reports from randomized, controlled trials included for Key Question 3a were secondary or correlative analyses on patient subgroups with archived tissue samples that permitted HER2 testing. The proportion of originally randomized patients included in the analyses by HER2 status ranged from 34 to 92 percent (see Table 10). A subset of trials compared baseline characteristics and known prognostic factors between the subgroups with known HER2 status and those with undetermined HER2 status, and a smaller subset also compared outcomes. None of these studies used trastuzumab for HER2-positive patients; studies addressing the use of trastuzumab are included in the discussion of Key Question 2.

Studies on the CMF regimen. The uncontrolled series (Yang, Klos, Zhou, et al., 2003; n=94) and one comparative randomized, controlled trial (Gusterson, Gelber, Goldhirsch, et al., 2003; n=2,504 randomized) studied the cyclophosphamide plus methotrexate plus fluorouracil (CMF) regimen. The Gusterson and co-workers trial separately randomized groups of node-negative and node-positive patients. Tissue blocks for determining HER2 status were unavailable for 515 (40 percent) of 1,275 randomized node-negative patients and for 483 (39 percent) of 1,229 randomized node-positive patients. Node-negative patients were randomized to one perioperative cycle of adjuvant CMF or to observation. Node-positive patients were randomized to multiple cycles of adjuvant CMF or to one perioperative cycle of adjuvant CMF. The relevance of these findings for current practice may be limited as taxane-based regimens have largely replaced CMF when anthracyclines are not used, particularly for hormone-receptor-negative patients.

Studies on anthracycline-based regimens. Four randomized, controlled trials (Moliterni, Menard, Valagussa, et al., 2003; Colozza, Sidoni, Mosconi, et al., 2005; Pritchard, Shepherd, O'Malley, et al., 2006; Knoop, Knudsen, Balslev, et al., 2005) compared CMF versus anthracycline-based regimens, and a fifth randomized, controlled trial compared an anthracycline-based regimen without autologous stem-cell support (AuSCS) versus a higher-dose regimen with AuSCS (Tanner, Isola, Wiklund, et al., 2006). Only the non-AuSCS arm of the Tanner and co-workers study met selection criteria for data abstraction. Moliterni, Menard, Valagussa, et al. (2003) compared CMF followed by doxorubicin (CMF→A) versus CMF alone, and included 92 percent of originally randomized patients. Colozza, Sidoni, Mosconi, et al. (2005) compared epirubicin (E) alone versus CMF, and included 76 percent of originally randomized patients. Pritchard, Shepherd, O'Malley, et al. (2006) and Knoop, Knudsen, Balslev, et al. (2005) compared cyclophosphamide plus epirubicin plus fluorouracil (CEF) versus CMF, although the Pritchard and co-workers study gave 6 cycles while the Knoop and co-workers study gave 9 cycles. Pritchard and co-workers included 89 percent of originally randomized patients while Knoop and co-workers included 79 percent. Tanner, Isola, Wiklund, et al. (2006) also gave 9 cycles of CEF in the non-AuSCS arm of their trial, although the doses administered were higher than those in the Pritchard and Knoop trials. Outcomes by HER2 status for 72 percent of those randomized to the non-AuSCS arm are considered a single-arm study in this review.

Two randomized, controlled trials with two reports each compared different doses (Dressler, Berry, Broadwater, et al., 2005; Thor, Berry, Budman, et al., 1998) or dose intensities and schedules (Del Mastro, Bruzzi, Nicolo, et al., 2005; Del Mastro, Bruzzi, Venturini, et al., 2004) for anthracycline-based regimens. The Dressler and co-workers study investigated interaction of HER2 status with dose in 524 patients from the Cancer and Leukemia Group B (CALGB) trial 8541. This trial randomized 1,549 patients to high-dose (600/60/600 mg/m2 every four weeks for 16 weeks), moderate-dose (400/40/400 mg/m2 every four weeks for 24 weeks) or low-dose (300/30/300 mg/m2 every four weeks for 16 weeks) regimens of cyclophosphamide, doxorubicin and fluorouracil (CAF) (Budman, Berry, Cirrincione, et al., 1998). Although earlier reports (Thor, Berry, Budman, et al., 1998; Muss, Thor, Berry, et al., 1994) included different proportions of randomized patients tested for HER2 status by IHC and/or PCR, Dressler and colleagues compared outcomes separately by assay method (IHC, FISH, or PCR) for HER2 status subgroups from each dose arm (n=524, 33.8 percent of originally randomized patients).

In the GONO-MIG-1 study, Del Mastro and colleagues (2004, 2005) randomized 1,214 patients to either six cycles of CEF every three weeks (FEC21) or up to nine cycles at the same dose (600/60/600 mg/m2) every two weeks (FEC14). The analysis by HER2 status included 731 (60 percent) of originally randomized patients.

Studies on regimens with a taxane. Two randomized, controlled trials investigated effects of HER2 status on outcomes of regimens with versus without a taxane (Hayes, Thor, Dressler, et al., 2007; Martin, Pienkowski, Mackey, et al., 2005). Hayes and colleagues (2007; CALGB trial 9344) randomized 3,121 patients to doxorubicin plus cyclophosphamide (AC) followed by paclitaxel or observation. The trial used a 3 × 2 factorial design to compare three doses of doxorubicin in AC, each followed or not by paclitaxel. Since outcomes were not statistically significantly different across doxorubicin doses, the analysis of outcomes with versus without paclitaxel by HER2 status pooled patients from all three doxorubicin doses. Two groups of 750 patients each were randomly selected for this correlative analysis, but tissue blocks were available and analyzed for only 1,322 (42 percent of those originally randomized).

Martin, Pienkowski, Mackey, et al. (2005) stratified patients (n=1,491) by number of involved axillary lymph nodes and randomized them to six three-week cycles of docetaxel plus doxorubicin plus cyclophosphamide (TAC) or fluorouracil plus doxorubicin plus cyclophosphamide (FAC). The preplanned analysis by HER2 status included 1,262 (85 percent) of originally randomized patients. Patients were not stratified by HER2 status. In the TAC group, 20.8 percent were HER2 positive and 15.4 percent lacked tumor specimens for measuring HER2; in the FAC group, 22 percent were HER2 positive and 15.3 percent lacked tumor specimens. The study does not report the distribution of other prognostic factors by treatment group and HER2 status combined; such data would help confirm that the treatment groups were balanced within the subset of trial patients with known HER2 status.
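
The inclusion proportions quoted above can be rechecked from the patient counts given in the text. The following is a minimal sketch, not part of the original report; the dictionary name, the study labels, and the helper function are illustrative assumptions, and only the counts themselves come from the paragraphs above.

```python
# Counts as stated in the text above:
# (patients analyzed by HER2 status, patients originally randomized).
# Labels are shorthand for this sketch, not identifiers used by the report.
ANALYZED_AND_RANDOMIZED = {
    "Gusterson 2003, node-negative": (1275 - 515, 1275),  # 515 lacked tissue blocks
    "Gusterson 2003, node-positive": (1229 - 483, 1229),  # 483 lacked tissue blocks
    "Dressler 2005 (CALGB 8541)": (524, 1549),
    "Del Mastro 2004, 2005 (GONO-MIG-1)": (731, 1214),
    "Hayes 2007 (CALGB 9344)": (1322, 3121),
    "Martin 2005": (1262, 1491),
}


def percent_included(analyzed: int, randomized: int) -> float:
    """Share of randomized patients included in the HER2 analysis, in percent."""
    if randomized <= 0:
        raise ValueError("randomized must be positive")
    return 100.0 * analyzed / randomized


if __name__ == "__main__":
    for study, (analyzed, randomized) in ANALYZED_AND_RANDOMIZED.items():
        share = percent_included(analyzed, randomized)
        print(f"{study}: {analyzed}/{randomized} = {share:.1f}%")
```

Rounded to whole percentages, these values agree with the figures quoted in the text (for example, 524 of 1,549 for the Dressler analysis and 1,262 of 1,491 for the Martin analysis).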

Table 11

Hierarchy of evidence, KQ3a
| Level of Evidence | Study | n | Setting | Treatments | Outcome | Results |
|---|---|---|---|---|---|---|
| Adjuvant chemotherapy for resected early breast cancer | | | | | | |
| HER2 stratified or HER2-guided RCT | | | | | | |
| RCT prespecified MV SGA | Martin 2005 | 1262 | adjuvant | TAC vs. FAC | DFS | Cox regression treatment by FISH HER2 interaction, p=NS; FISH+ TAC > FAC, FISH- TAC > FAC |
| RCT post-hoc MV SGA | Gusterson 2003 | 1506 | adjuvant | LN-: no tx vs. CMF; LN+: periop CMF vs. prolonged CMF | OS | LN- adjusted Cox regression IHC HER2+, tx < cx p=NS; LN- adjusted Cox regression IHC HER2-, tx ≈ cx p=NS; LN+ adjusted Cox regression IHC HER2+, tx < cx p=NS; LN+ adjusted Cox regression IHC HER2-, tx > cx p<0.05 |
| | | | | | DFS | LN- adjusted Cox regression IHC HER2+, tx < cx p=NS; LN- adjusted Cox regression IHC HER2-, tx > cx p=NS; LN+ adjusted Cox regression IHC HER2+, tx > cx p=NS; LN+ adjusted Cox regression IHC HER2-, tx > cx p<0.05 |
| | Moliterni 2003 | 506 | adjuvant | CMF→A vs. CMF | OS | Cox regression treatment by IHC HER2 interaction p=0.052; HER2+ tx > cx p=NS, HER2- tx < cx p=NS |
| | | | | | RFS | Cox regression treatment by IHC HER2 interaction p=NS; HER2+ tx > cx p=NS, HER2- tx < cx p=NS |
| | Colozza 2005 | 266 | adjuvant | CMF vs. epirub | OS | Cox regression treatment by IHC HER2 interaction p=NS; cx HER2+ < HER2- p=0.024, tx HER2+ < HER2- p=NS |
| | | | | | RFS | Cox regression treatment by IHC HER2 interaction p=NS; cx HER2+ HER2- p=NS, tx HER2+ < HER2- p=NS |
| | Pritchard 2006 | 628 | adjuvant | CMF vs. CEF | OS | Cox regression treatment by FISH HER2 interaction p=0.02; HER2+ tx > cx p=0.06, HER2- tx ≈ cx p=NS |
| | | | | | RFS | Cox regression treatment by FISH HER2 interaction p=0.02; HER2+ tx > cx p=0.003, HER2- tx ≈ cx p=NS |
| | Knoop 2005 | 805 | adjuvant | CMF vs. CEF | OS | Cox regression HER2+ tx > cx p=0.09, HER2- tx > cx p=0.23 |
| | | | | | RFS | Cox regression HER2+ tx > cx p=0.10, HER2- tx > cx p=0.10 |
| | Dressler 2005 | 521 | adjuvant | CAF: high vs. moderate vs. low dose | DFS | Cox regression FISH HER2 by CAF dose interaction, p=0.033; Cox regression IHC HER2 by CAF dose interaction, p=0.0003; Cox regression PCR HER2 by CAF dose interaction, p=0.043; FISH+/PCR+/IHC+ high > moderate ≈ low dose; FISH-/PCR-/IHC- high ≈ moderate ≈ low dose |
| | Del Mastro 2004 | 731 | adjuvant | FEC q2wk vs. q3wk | DFS | Cox regression IHC HER2 by Tx schedule interaction, p=0.12; FEC q2wk HER2+ ≈ HER2-, FEC q3wk HER2+ < HER2- |
| | | | | | OS | Cox regression IHC HER2 by Tx schedule interaction, p=0.38; FEC q2wk HER2+ ≈ HER2-, FEC q3wk HER2+ < HER2- |
| | Hayes 2007 | 1500 | adjuvant | AC vs. AC→P | OS | Cox regression treatment by FISH HER2 interaction p=0.01; HER2+ tx > cx, HER2- tx ≈ cx |
| | | | | | DFS | Cox regression treatment by FISH HER2 interaction p=0.01; HER2+ tx > cx, HER2- tx ≈ cx |
| RCT treatment by HER2 SGA | | | | | | |
| 1-arm prespecified MV analysis | | | | | | |
| 1-arm post-hoc MV analysis | | | | | | |
| 1-arm UV analysis | Yang 2003 | 94 | adjuvant | CMF | DFS | IHC HER2+ ↓ vs. HER2- p=0.002 |
| | Tanner 2006 | 180 | adjuvant | FEC | OS | CISH HER2+ < HER2- but no statistical tests described |
| | | | | | RFS | CISH HER2+ < HER2- but no statistical tests described |
| Neoadjuvant (preoperative) chemotherapy for locally advanced breast cancer | | | | | | |
| HER2 stratified or HER2-guided RCT | | | | | | |
| RCT prespecified MV SGA | | | | | | |
| RCT post-hoc MV SGA | Learn 2005 | 104 | neoadjuvant | AC vs. AC+D | pCR | IHC HER2+, AC vs. AC+D, p=NS; IHC HER2-, AC vs. AC+D, p=NS |
| | | | | | cORR (CR+PR) | IHC HER2+, AC vs. AC+D, p=NS; IHC HER2-, AC vs. AC+D, p<0.05 |
| RCT treatment by HER2 SGA | | | | | | |
| 1-arm prespecified MV analysis | | | | | | |
| 1-arm post-hoc MV analysis | Park 2003 | 67 | neoadjuvant | doxorub | pResp | CISH HER2+ > HER2- p=0.013 |
| | Zhang 2003 | 97 | neoadjuvant | FAC | DFS | CISH HER2+ ≈ HER2- p=NS |
| | | | | | ORR | IHC HER2+ > HER2- p=NS |
| | | | | | pResp | IHC HER2+ > HER2- p=NS |
| 1-arm UV analysis | Arriola 2006 | 229 | neoadjuvant | doxorub | pResp | CISH HER2+ > HER2- p=0.03 |
| | Tulbah 2002 | 52 | neoadjuvant | paclit+cispl | pResp | IHC HER2+ ≈ HER2- p=NS |
| | | | | | OS | IHC HER2+ (3+) ≈ HER2- p=NS |
| | | | | | OS | IHC HER2+ (2+/3+) ≈ HER2- p=0.051 |
| | | | | | DFS | IHC HER2+ (3+) ≈ HER2- p=NS |
| | | | | | DFS | IHC HER2+ (2+/3+) ≈ HER2- p=0.09 |
| | Tinari 2006 | 77 | neoadjuvant | FEC | pResp (pCR+MRD) | IHC 3+ or IHC2+/FISH+ HER2+ vs. HER2-, p=0.008 |
| First- or second-line chemotherapy for advanced or metastatic breast cancer | | | | | | |
| HER2 stratified or HER2-guided RCT | | | | | | |
| RCT prespecified MV SGA | | | | | | |
| RCT post-hoc MV SGA | Di Leo 2004 | 149 | metastatic | doxorub vs. docetax | OS | Cox regression treatment by IHC HER2 interaction p=.10; IHC/FISH HER2+ tx < cx p=NS, HER2- tx > cx p=.07 |
| | | | | | TTP | Cox regression treatment by IHC HER2 interaction p=NS; IHC/FISH HER2+ tx > cx p=NS, HER2- tx > cx p=NS |
| | | | | | Resp | logistic regression treatment by IHC HER2 interaction p=.01; IHC/FISH HER2+ tx > cx p=.04, HER2- tx > cx p=NS, HER2? tx ≈ cx p=NS |
| | Konecny 2004 | 275 | metastatic | epirub+cyclophosph vs. epirub+paclitaxel | OS | Cox regression treatment by IHC HER2 interaction p=NS; IFISH HER2+ tx > cx p=.059, HER2- tx ≈ cx p=NS |
| | | | | | PFS | Cox regression treatment by IHC HER2 interaction p=.109; IFISH HER2+ tx > cx p=.062, HER2- tx ≈ cx p=NS |
| | | | | | ORR | logistic regression treatment by IHC HER2 interaction p=NS; IFISH HER2+ tx > cx p=.005, HER2- tx > cx p=.046 |
| RCT treatment by HER2 SGA | | | | | | |
| 1-arm prespecified MV analysis | | | | | | |
| 1-arm post-hoc MV analysis | | | | | | |
| 1-arm UV analysis | Harris 2006 | 156 | metastatic | paclitaxel | OS | IHC CB11 HER2+ < HER2- p=NS |
| | | | | | OS | FISH HER2+ < HER2- p=NS |
| | | | | | OS | IHC HercepTest HER2+ ≈ HER2- p=NS |
| | | | | | ORR | IHC CB11 HER2+ ≈ HER2- p=NS |
| | | | | | ORR | FISH HER2+ ≈ HER2- p=NS |
| | | | | | ORR | IHC HercepTest HER2+ > HER2- p=.026 |

Abbreviations: Please refer to the text or list of abbreviations at the end of the report for definition of specific chemotherapy regimens/agents. cx: control; DFS: disease-free survival; HR: hazard ratio; MV: multivariate; ORR: overall response rate; OS: overall survival; q2wk: every 2 weeks; q3wk: every 3 weeks; RCT: randomized, controlled trial; RFS: recurrence-free survival; SGA: subgroup analysis; TTP: time to progression; tx: treatment; UV: univariate analysis.

Evidence hierarchy. The first section of Table 11 categorizes available studies on HER2 status and outcomes of adjuvant chemotherapy according to the evidence hierarchy used in this evidence report (see “Methods”). No trials met the highest category of evidence, which requires that patients be stratified by HER2 status or randomized to therapy guided or not guided by HER2 status. Only one randomized, controlled trial, which compared TAC versus FAC, reported a preplanned multivariate subgroup analysis (Martin, Pienkowski, Mackey, et al., 2005). Eight randomized, controlled trials reported post-hoc multivariate subgroup analyses: one compared CMF versus no or minimal CMF, four compared CMF versus an anthracycline-based regimen, two compared different doses or schedules of anthracycline-based regimens, and one compared AC alone versus AC followed by paclitaxel. Finally, single-arm data from two reports provided univariate analyses by HER2 status.
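
Many results in Table 11 are reported as a Cox regression "treatment by HER2 interaction." As a generic illustration only, such a model can be written as below. The notation is an assumption of this sketch (T = 1 for the experimental arm, H = 1 for HER2-positive tumors, z for the other covariates), and the individual studies may have specified their models differently.

```latex
h(t \mid T, H, \mathbf{z})
  = h_0(t)\,\exp\!\bigl(\beta_T T + \beta_H H + \beta_{TH}\, T H
      + \boldsymbol{\gamma}^{\top}\mathbf{z}\bigr)

% Treatment hazard ratio within each HER2 subgroup:
\mathrm{HR}_{\text{HER2-}} = \exp(\beta_T), \qquad
\mathrm{HR}_{\text{HER2+}} = \exp(\beta_T + \beta_{TH})

% Test of interaction:
H_0 : \beta_{TH} = 0
```

Under this formulation, the interaction p-value addresses whether the treatment effect differs between HER2-positive and HER2-negative patients, which is a separate question from whether the treatment effect is statistically significant within either subgroup.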

Table 12

Study quality ratings, KQ3a
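Columns MV 1) to MV 6) rate the criterion "Well-described, well-conducted multivariate analysis of outcome": 1) clear candidate variable selection, 2) clear, appropriate model-building guidelines, 3) assumptions tested, 4) standard prognostic variables included, 5) continuous variables well handled, 6) validation.

| Study | Prospective design | Prespecified hypotheses about relation of marker to outcome | Large, well-defined, representative study population | Marker assay methods well-described | Blinded assessment of marker in relation to outcome | Homogeneous treatment(s), either randomized or rule-based selection | Low rate of missing data (≤ 15%) | Sufficiently long follow-up | MV 1) | MV 2) | MV 3) | MV 4) | MV 5) | MV 6) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adjuvant Chemotherapy | | | | | | | | | | | | | | |
| Yang et al., 2003 | N | N | N | Y | ? | Y | Y | ? | NA | NA | NA | NA | NA | NA |
| Gusterson et al., 2003 | Y | N | Y | Y | ? | Y | N | med: 6 yrs | ? | ? | ? | Y | ? | N |
| Moliterni et al., 2003 | Y | N | Y | Y | ? | Y | Y | med: 14.8 yrs | ? | ? | Y | Y | ? | Y |
| Colozza et al., 2005 | Y | N | Y | Y | Y | Y | N | min 8 yrs | ? | N | ? | Y | ? | N |
| Pritchard et al., 2006 | Y | N | Y | Y | ? | Y | Y | med: 10 yrs | ? | ? | ? | Y | ? | N |
| Knoop et al., 2005 | Y | N | Y | Y | ? | Y | N | med: 10 yrs | ? | N | Y | ? | ? | N |
| Dressler et al., 2005; Thor et al., 1998 | Y | Y | Y | Y | Y | Y | N | med: 9 yrs | Y | ? | ? | Y | ? | Y |
| Del Mastro et al., 2004, 2005 | Y | N | Y | Y | Y | Y | N | med: 6.7 yrs | Y | N | ? | N | ? | N |
| Tanner et al., 2006 | Y | N | Y | Y | Y | Y | N | ? | NA | NA | NA | NA | NA | NA |
| Hayes et al., 2007 | Y | N | Y | Y | Y | Y | N | med: ~10 yrs | Y | Y | ? | Y | ? | Y |
| Martin et al., 2005 | Y | Y | Y | N | ? | Y | Y | med: 4.6 yrs | Y | Y | ? | Y | ? | N |
| Neoadjuvant (Preoperative) Chemotherapy | | | | | | | | | | | | | | |
| Learn et al., 2005 | Y | N | N | N | ? | Y | N | pCR at resection | ? | ? | NA | ? | ? | N |
| Arriola et al., 2006 | Y | Y | Y | Y | ? | Y | Y | pCR at resection | ? | N | NA | N | ? | N |
| Park et al., 2003 | N | N | N | Y | ? | Y | Y | pCR at resection | NA | NA | NA | NA | NA | NA |
| Zhang et al., 2003 | N | N | N | N | ? | Y | Y | pCR at resection | NA | NA | NA | NA | NA | NA |
| Tulbah et al., 2002 | N | N | N | Y | Y | Y | Y | pCR at resection | NA | NA | NA | NA | NA | NA |
| Tinari et al., 2006 | N | N | N | Y | Y | Y | Y | pCR at resection | ? | ? | NA | Y | ? | ? |
| Chemotherapy for Advanced or Metastatic Disease | | | | | | | | | | | | | | |
| Harris et al., 2006 | Y | N | Y | Y | Y | Y | N | med: 8.3 yrs | ? | ? | ? | Y | ? | N |
| Di Leo et al., 2004 | Y | N | Y | Y | Y | Y | N | med: 23 months | ? | N | ? | N | ? | N |
| Konecny et al., 2004 | Y | N | Y | Y | ? | Y | N | ? | ? | N | Y | ? | ? | N |
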
Study quality assessment. The first section of Table 12 shows that, of nine studies that analyzed the relationship of HER2 status to outcome differences in previously completed randomized, controlled trials on adjuvant chemotherapy, each was prospectively designed; included a large, well-defined and representative study population; and treated patients in each study arm homogeneously, or used rule-based selection for non-study therapies. However, only two reports (Dressler, Berry, Broadwater, et al., 2005; Martin, Pienkowski, Mackey, et al., 2005) included a prespecified hypothesis on the relationship of HER2 status to differences between regimens in treatment outcome. Each study adequately described the assays and thresholds used to classify patients' HER2 status, but only five (Colozza, Sidoni, Mosconi, et al., 2005; Dressler, Berry, Broadwater, et al., 2005; Del Mastro, Bruzzi, Nicolo, et al., 2005; Tanner, Isola, Wiklund, et al., 2006; Hayes, Thor, Dressler, et al., 2007) reported that individuals who assessed HER2 status were blinded to patient and tumor factors and to treatment outcomes. Only three studies from randomized, controlled trials (Moliterni, Menard, Valagussa, et al., 2003; Pritchard, Shepherd, O'Malley, et al., 2006; Martin, Pienkowski, Mackey, et al., 2005) included ≥85 percent of originally randomized patients. However, a fourth (Hayes, Thor, Dressler, et al., 2007) randomly selected two large subsets (n=750 each) and separately analyzed more than 85 percent of patients in each. Six studies from randomized, controlled trials (Moliterni, Menard, Valagussa, et al., 2003; Colozza, Sidoni, Mosconi, et al., 2005; Pritchard, Shepherd, O'Malley, et al., 2006; Knoop, Knudsen, Balslev, et al., 2005; Dressler, Berry, Broadwater, et al., 2005; Hayes, Thor, Dressler, et al., 2007) had 9 or more years' median follow-up, but in only one of these (Moliterni, Menard, Valagussa, et al., 2003) was median follow-up ~15 years. Reporting of methodologic details for multivariate analyses was inadequate in all studies.
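
The "low rate of missing data (≤ 15%)" criterion in Table 12 can be illustrated with the inclusion percentages quoted in this section. This is a hedged sketch rather than the reviewers' actual procedure: the names below are assumptions, the percentages are the rounded values stated in the text, and the Hayes subset sizes (643 and 679 of 750) are taken from the group sizes listed for that study in Table IIIa-B.

```python
# Percent of originally randomized patients included in each HER2 analysis,
# using the rounded values quoted in the text of this section.
PERCENT_INCLUDED = {
    "Gusterson 2003, node-negative": 100 - 40,  # 40 percent lacked tissue blocks
    "Gusterson 2003, node-positive": 100 - 39,  # 39 percent lacked tissue blocks
    "Moliterni 2003": 92,
    "Colozza 2005": 76,
    "Pritchard 2006": 89,
    "Knoop 2005": 79,
    "Dressler 2005": 33.8,
    "Del Mastro 2004, 2005": 60,
    "Hayes 2007": 42,
    "Martin 2005": 85,
}

MAX_MISSING_PERCENT = 15  # Table 12 criterion: low rate of missing data (<= 15%)


def meets_missing_data_criterion(percent_included: float) -> bool:
    """True when no more than MAX_MISSING_PERCENT of patients are missing."""
    return (100 - percent_included) <= MAX_MISSING_PERCENT


# Studies that satisfy the criterion on the full randomized population.
passing = sorted(
    study for study, pct in PERCENT_INCLUDED.items()
    if meets_missing_data_criterion(pct)
)

# Hayes 2007: two randomly selected subsets of 750 patients, analyzed separately.
HAYES_SUBSETS = {"Grp1": (643, 750), "Grp2": (679, 750)}
hayes_subsets_ok = all(
    meets_missing_data_criterion(100 * analyzed / selected)
    for analyzed, selected in HAYES_SUBSETS.values()
)

if __name__ == "__main__":
    print("Meet criterion on all randomized patients:", passing)
    print("Hayes subsets each meet criterion:", hayes_subsets_ok)
```

The sketch applies the threshold to the rounded percentages as reported. Computed from the raw counts, the Martin analysis (1,262 of 1,491) comes to about 84.6 percent, so its classification depends on that rounding.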

Six studies on preoperative neoadjuvant chemotherapy. Six studies, including one randomized, controlled trial and five uncontrolled series, compared outcomes by HER2 status for patients undergoing neoadjuvant (preoperative) chemotherapy. The randomized, controlled trial (Learn, Yeh, McNutt, et al., 2005) randomized patients (n=144) to one of three arms: doxorubicin plus cyclophosphamide (AC), AC plus docetaxel (AC+D), or AC followed by docetaxel after resection (AC→D). Analysis of pathologic outcomes at resection pooled patients from the AC and AC→D arms and compared these versus the AC+D arm. The secondary, unplanned analysis by HER2 status included 104 (72 percent) of originally randomized patients.

Two uncontrolled series, one prospective (n=232; Arriola, Moreno, Varela, et al., 2006) and the other retrospective (n=67; Park, Kim, Lim, et al., 2003), reported on patients given doxorubicin alone. One uncontrolled retrospective series (n=97; Zhang, Yang, Smith, et al., 2003) reported on patients given three to six cycles of fluorouracil plus doxorubicin plus cyclophosphamide (FAC). A similar uncontrolled, retrospective series (n=77; Tinari, Lattanzio, Natoli, et al., 2006) reported on patients given three to six cycles of fluorouracil plus epirubicin plus cyclophosphamide (FEC). Finally, one uncontrolled retrospective series (n=54; Tulbah, Ibrahim, Ezzat, et al., 2002) reported on patients given three or four cycles of paclitaxel plus cisplatin. Each series reported outcomes by HER2 status for all patients (n=232 for the Arriola and co-workers series; n<100 for each of the others).

Evidence hierarchy. As shown in Section 2 of Table 11, no studies on neoadjuvant chemotherapy provided evidence in either of the two highest categories. The only study from a randomized, controlled trial on neoadjuvant chemotherapy (Learn, Yeh, McNutt, et al., 2005) reported a post-hoc multivariate subgroup analysis. Two series (Park, Kim, Lim, et al., 2003; Zhang, Yang, Smith, et al., 2003) reported post-hoc multivariate subgroup analyses, while three series reported univariate analyses only.

Study quality assessment. Section 2 of Table 12 shows that only two studies (one a randomized, controlled trial) on neoadjuvant chemotherapy were prospectively designed (Learn, Yeh, McNutt, et al., 2005; Arriola, Moreno, Varela, et al., 2006), and only one reported a prespecified hypothesis for the relationship of HER2 status to outcome of neoadjuvant chemotherapy (Arriola, Moreno, Varela, et al., 2006). Only one study (Arriola, Moreno, Varela, et al., 2006) included ≥100 patients. Four of six (Arriola, Moreno, Varela, et al., 2006; Park, Kim, Lim, et al., 2003; Tulbah, Ibrahim, Ezzat, et al., 2002; Tinari, Lattanzio, Natoli, et al., 2006; but not Learn, Yeh, McNutt, et al., 2005) adequately described the assays and thresholds used to classify patients' HER2 status, but only two (Tulbah, Ibrahim, Ezzat, et al., 2002; Tinari, Lattanzio, Natoli, et al., 2006) reported HER2 assays were scored by assessors blinded to patient and tumor characteristics and treatment outcomes. Patients in each study were treated homogeneously, and each series, but not the randomized, controlled trial (Learn, Yeh, McNutt, et al., 2005), reported on all enrolled patients. Follow-up was not an issue for any study on neoadjuvant therapy, since the outcome of interest was pathologic responses at resection. Reporting of methodologic details for multivariate analyses was inadequate in all studies.

Three studies on chemotherapy for advanced or metastatic breast cancer. Each was a secondary analysis from a randomized, controlled trial designed to compare outcomes of treatment regimens in populations not selected or stratified for HER2 status, and each published earlier reports comparing outcomes by treatment arm for all randomized patients. One randomized, controlled trial (n=474, Harris, Broadwater, Lin, et al., 2006; CALGB 9342) randomized patients with stage IV or inoperable disease undergoing first- or second-line therapy to three different doses of paclitaxel. The analysis of outcomes by HER2 status included 35 percent of originally randomized patients, and pooled data across all three doses. Thus, Harris and co-workers (2006) was considered a single-arm study in this systematic review.

A second randomized, controlled trial (n=326; Di Leo, Chan, Paesmans, et al., 2004) randomized patients to doxorubicin alone (A) or docetaxel alone (T). Eligibility required patients to have metastatic disease, to have failed prior CMF (either as adjuvant therapy or for metastasis), and to have had no prior exposure to either of the randomized drug therapies. The analysis by HER2 status included 54 percent of originally randomized patients. The third randomized, controlled trial (n=516; Konecny, Thomssen, Luck, et al., 2004) randomized patients to first-line therapy for metastatic disease with either epirubicin plus cyclophosphamide (EC) or epirubicin plus paclitaxel (ET). Up to one prior hormonal therapy for metastasis was permitted, with patients stratified by prior hormonal therapy. The analysis by HER2 status included 53 percent of originally randomized patients.

Evidence hierarchy. As shown in Section 3 of Table 11, no studies on advanced or metastatic disease provided evidence in either of the two highest categories. Two randomized, controlled trials (Di Leo, Chan, Paesmans, et al., 2004; Konecny, Thomssen, Luck, et al., 2004) reported post-hoc multivariate subgroup analyses. The third study, a pooled analysis across trial treatment arms (Harris, Broadwater, Lin, et al., 2006), reported only a univariate analysis.

Study quality assessment. Section 3 of Table 12 shows that each of the three included studies on HER2 status as a predictor of chemotherapy outcomes for advanced or metastatic breast cancer was designed prospectively, but none reported a prespecified hypothesis for the effect of HER2 status on outcomes. Each study included a large, well-defined, and representative study population, adequately described the HER2 assays and thresholds used to classify patients' HER2 status, and treated patients in each study arm homogeneously. Only two of three (Harris, Broadwater, Lin, et al., 2006; Di Leo, Chan, Paesmans, et al., 2004) reported blinding HER2 assessors to patient and tumor characteristics and to treatment outcomes. Each omitted 15 percent or more of enrolled patients from the analysis of outcomes by HER2 status, and each omitted key methodologic details on their multivariate analyses from the published reports. Long-term follow-up was available in only one study (Harris, Broadwater, Lin, et al., 2006), and one did not report the median duration of follow-up (Konecny, Thomssen, Luck, et al., 2004).

Patient Characteristics

Table IIIa-B

Patient Characteristics
StudyAgeRace (%)Disease StageDisease StagePerformance StatusHormone Receptor Status
Adjuvant Chemotherapy
Yang et al. 2003 rec. # 8840 (of n=107 tested for expression of various markers) mnGrp1Grp2Scalenot reported
md51.9 yrsBIT <3cm31 (33%)ER+not reported
rng33–77 yrsWIIaT 3–5cm39 (41%)PR+
sdHIIbnot reportedT3 >5 cm24 (26%)
<50 yrs45 (47.9%)A100%IIIaN-41 (38%)
≥50 yrs49 (52.1%)OIIIbN+66 (62%)
IV
Gusterson et al. 2003; rec. # 43690 760 node- pts randomized to periop CMF vs no adj. Tx HER2+HER2-Grp1Grp2Grp1Grp2HER2+HER2-Scalenot reportedHER2+HER2-
mnBIMn TER+36%51%
mdWnot reportedIIaT sizeER-41%32%
rngHIIbnot reported≤2cm41%57%unk23%17%
sdAIIIa>2cm53%40%PR+24%37.5%
menopausal status:OIIIbunk6%3%PR-50%37.5%
pre-52.5%53%IVN0100%100%unk26%25%
post-47.5%47%
Gusterson et al. 2003; rec. # 43690 746 node+ pts randomized to perioperative vs prolonged CMF HER2+HER2-Grp1Grp2Grp1Grp2HER2+HER2-Scalenot reportedHER2+HER2-
mnBIMn TER+56%28%
mdWnot reportedIIaT sizeER-32%59%
rngHIIbnot reported≤2cmnot reportedunk12%13%
sdAIIIa>2cmPR+62%36%
menopausal status:OIIIb# positive nodes:PR-22%45%
pre-50%60%IV1–3+51%57%unk16%19%
post-50%40%≥449%43%
Moliterni et. al. 2003; rec. # 10210 RCT; CMF (Grp 1) versus CMF→A (Grp 2) Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2Scalenot reportedGrp1Grp2
mnBIMn TER+59%52%
mdnot reportedWnot reportedIIaT stage distribution not reportedER-34%39%
rngHIIbnot reported~65% <2.1 cm diam.unk7%9%
sdAIIIaN0PR+53%53%
<51 yr69%67%OIIIbN1100%100%PR-38%34%
IVN2unk9%13%
Colozza et al. 2005; rec. # 3820 RCT; CMF (Grp 1) vs epirubicin (Grp 2); n=133 each tested for HER2 status Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2Scalenot reportedGrp1Grp2
age (years):BItumor diameter (cm):ER+56%55%
<40129Wnot reportedIIa≤245%46%ER-41%44%
40–503240HIIbnot reported2–550%48%unk4%2%
>505651AIIIa>51%0%PR+63%63%
menopausal status:OIIIbunk5%6%PR-33%35%
pre5353IVN020%23%unk4%2%
post4747N1–359%52%
N4–921%26%
Pritchard et al. 2006; rec. # 1760 RCT; CMF versus CEF; n=163 FISH+ (Grp 1); n=465 FISH- (Grp2) Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2Scalenot reportedGrp1Grp2
age (%)BIT135%40%ER+56%62%
≤29 yr41Wnot reportedIIaT252%49%ER-35%27%
30–392722HIIbnot reportedT35%5%unk9%12%
40–495460AIIIa# positive nodes:PR+not reported
≥50 yr1517OIIIb000
all pre-menopausal; ineligible if post-menopausalIV1–357%63%
4–1036%31%
≤107%7%
Knoop et al. 2005; rec. # 3450 RCT (n=773); CMF (Grp 1; n=421) vs CEF (Grp 2; n=352) Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2Scalenot reportedGrp1Grp2
age in years (%)BIT size, cm (%)(%)(%)
<4016.4Wnot reportedIIa0–242.439.3ER+27.125
40–4947.6HIIbnot reported2.1–549.552.4ER-66.768.2
50–5922.0AIIIa>58.18.3PR+not reported
60–6914.0OIIIb# positive nodes (%)PR-
menopausal statusIV035.637.8
pre69.868.51–333.329.5
post30.231.5>331.332.7
Dressler et al. 2005, rec. # 4280; Thor et al. 1998, rec. # 40880; Grp 1, n=542, Dressler analysis; Grp 2, n=469, rest of CALGB 88693Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2Scalenot reportedGrp1Grp2
mn50.6 yr50.4 yrBIMn T2.962.86(%)(%)
mdWnot reportedIIa(cm)ER+68.264.8
rngHIIbnot reportedMn #4.624.68
pre-40.740.3AIIIaN+PR+59.155.7
OIIIb
IV
Del Mastro et al. 2004, 2005; rec # 48020 Grp 1: n=731, HER2 known Grp 2: n=483 HER2 unknown Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2Scalenot reportedGrp1Grp2
md5454BIT147.1%52.6%ER+54%49%
rng25–7026–70Wnot reportedIIaT246.2%42.2%ER-43%38%
<5035.8%43.1%HIIbnot reportedT3–45.3%4.4%ER?3%13%
50–5934.7%35.6%AIIIaT?1.4%0.8%PR+42%36%
>5929.5%21.3%OIIIbN+62.3%67.7%PR-50%44%
IVN-37.6%32.3%PR?8%20%
Tanner et al. 2006; rec. # 1820 (n=391 tested for HER2 status; 180 from FEC arm + 211 from CTCb arm) HER2+HER2-Grp1Grp2Grp1Grp2nHER2+HER2-Scalenot reportedHER2+HER2-
<50 years of age:BItumor size:ER+ and/or PR+:
n=22730.8%69.2%Wnot reportedIIa<2 cm12627%73%yes (210)22%78%
≥50 years of ageHIIbnot reported2–5cm21336%64%no (148)49%51%
n=16435.4%64.6%AIIIa>5 cm3730%70%unk (33)27%73%
OIIIbunk1547%53%
IV# positive nodes:
5–76834%66%
8–910727%73%
≥1021635%65%
Hayes et al. 2007; rec. # 47610 Grp1, n=643 Grp2, n=679 each is random mix of patients from 6 RCT arms4Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2Scalenot reportedGrp1Grp2
age, years:B8%9%Itumor size (cm):ER+57%62%
<4020%20%W84%84%IIa≤233%35%ER-43%38%
40–4940%38%H5%4%IIbnot reported>266%64%PR+not reported
50–5927%30%A2%2%IIIaunk<1%<1%
≥6012%12%O1%1%IIIb# positive nodes:
menopausal status:IV1–348%46%
pre61%61%4–940%43%
post39%39%≥1012%11%
Martin et al. 2005; rec # 47650 Grp 1: DAC, n=745 Grp 2: FAC, n=746 Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2ScaleGrp1Grp2Grp1Grp2
mnBIT140%43%100% had Karnofsky score ≥80%ER+ &/or
md4949Wnot reportedIIaT252%51%PR+76%76%
rng26–7023–70HIIbnot reportedT38%6%menopausal status:
sdAIIIaN000pre:56%55%
OIIIb1–3N+63%62%post:44%45%
IV≥4 N+37%38%
Neoadjuvant (Pre-operative) Chemotherapy
Learn et al. 2005; rec. # 47640 AC, n=50 AC+D, n=47 AC→D, n=47 pooled data on n=142 evaluated for clin. response mn48 yrsBI23.6%clinical tumor diameterScalenot reportedOf n=121 with biopsy specimens available for IHC:
md47 yrsWIIa39.6%≤2 cm28.2%ER+60.3%
rng27–73 yrsH71%IIb30.6%>2-≤5 cm47.2%ER-39.7%
sdAIIIa6.3%>5 cm24.6%PR+57.9%
OIIIb0N061.3%PR-42.1%
not reported29%IV0N138.7%
N≥20
Arriola et al. 2006; rec # 950 prospective single-arm series; n=232 mn47 yrsBIT230%Scalenot reportedER+67%
mdWIIaT370%ER-29%
rngHnot reportedIIbnot reportedN060%unk4%
sdAIIIaN140%PR+52%
OIIIbPR-43%
IVunk5%
Park et al. 2003; rec # 9960 retrospective single-arm series; n=67 yearsBItumor size (cm):
<5082%WIIa5–1091%Scalenot reportedER+46%
≥5018%Hnot reportedIIbnot reported>109%ER-54%
0IIIaPR status not reported
IV
Zhang et al. 2003; rec # 9820 retrospective single-arm series; n=97 md44.5 yrBIT113%Scalenot reportedER+65%
rng25–74 yrWnot reportedIIaT253%PR+56%
sdHIIbnot reported≥T334%
≥5044%AIIIaN-33%
<5056%OIIIbN+67%
IV
Tulbah et al. 2002; rec # 11560 HER2+HER2-BHER2+HER2-HER2+HER2-Scalenot reportedHER2+HER2-
age (yr)Wnot reportedI00T237ER+1216
≥502027HIIa13T3916ER-911
>5025AIIb69T4109unk15
menopausal status:OIIIa510N089PR+1111
pre2025IIIb1010N11218PR-1016
post27IV00N225unk15
Tinari et al. 2006; rec # 2300 retrospective single-arm series; n=77 mnBItumor size (cm):ER+62%
md46.1 yrsWnot reportedIIa2–575%Scalenot reportedER-38%
rng25.5–73.7 yrsHIIbnot reported>525%PR+45%
sdAIIIaPR-55%
OIIIb
IV
Chemotherapy for Advanced or Metastatic Disease
Harris et al 2006; rec. # 390 AllAllAllAllScaleAllAll
mnB20.6IT1ECOG performance status of 0,1,2=100ER+ and/or PR+=58
md54.9WIIT2
rngHIIInot reportedT3
sdAIVT4
preOUnkN0
postN1
N2
N3
median # mets: 1
Di Leo et al 2004; rec. # 5970 Grp 1: A, n=91 Grp 2: T, n=85 patients with tumor blocks tested for HER2 status Grp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2ScaleGrp1Grp2Grp1Grp2
mnBnot reportedI≥3 sites46%51%KarnofskyER+not reported
md54 yr51yrWIIVisceral Involvement79%76%all w specimens:PR+not reported
rngHIII60–70:15%15%reported (data not shown) other factors similar in HER2 status subgroups of each arm
sdAIV100%100%≥8085%85%
preOUnkHER2+ subgroup:
post60–70:33%0
Konecny et al 2004; rec. # 6740 Grp 1: EC, n=137 Grp 2: ET, n=138 data are for subgroups with known HER2 statusGrp1Grp2Grp1Grp2Grp1Grp2Grp1Grp2ScaleGrp1Grp2Grp1Grp2
Mn5555BINuclear GradeKarnofskyER+52.660.9
mdWNot reportedII12.22.9>60100100ER-37.232.6
rng31–7429–75HIII241.637.0Prior adj chemounk10.26.5
sdAIV100%100%338.746.3Yes40.232.6PR+48.949.3
preOUnkUnk17.513.8No59.165.9PR-40.142.7
post# of met sitesunk0.71.5unk11.08.0
135.831.9Prior palliative hormone therapy
221.221.0
≥342.342.0Yes14.613.8
Unk0.75.1No84.786.2
unk0.70
3 Also showed patient populations were similar in 3 CAF dose arms (high, moderate, low); data not abstracted here.

4 Also reported data comparing groups 1 and 2 with 1799 patients from CALGB 9344 not included in biomarker analysis; data showed similar baseline characteristics and 5-year outcomes.

* p<.05;

characteristics of subset with biomarker data, n=165, similar to those of patients w/o biomarker measurements, n=299

Table IIIa-C

HER2 Measurement Methods
StudyAssays (Name)Criteria for PositivityTest Results (%)Comments
Adjuvant Chemotherapy
Yang et al. 2003 rec. # 8840 FISHnot doneFISHnot donePos36%HER2+ = IHC3+ by DAKO scoring pre-ASCO/CAP
IHCNeomarker antibodyEquiv0
Neg64%
IHCstrong & complete membrane staining in >10% of tumor cells3+
2+
1+
0
Gusterson et al. 2003; rec. # 43690 FISHnot doneFISHnot donePos
IHCICR12 monoclonal antibodyEquiv
Neg
IHCstrong & complete membrane staining at dilution shown to give + signal if ≥3 copies of HER2 genePos16% of 760 node- pts; 19% of 746 node+ pts
Equivnone
Neg84% of 760 node- pts; 81 % of 746 node+ pts
Moliterni et. al. 2003; rec. # 10210 RCT; CMF (Grp1) vs CMF→A (Grp 2) FISHnot doneFISHnot donePos
IHCCB11 antibodyEquiv
Neg
IHCstrong membrane staining found equivalent to 3+ by HercepTestPosGrp1:18.2%;Grp 2:16.2%
Neg75.6%73.3%
ND6.2%10.5%
Colozza et al. 2005; rec. # 3820 RCT; CMF (Grp 1) vs epirubicin (Grp 2); n=133 each tested for HER2 status FISHnot doneFISHnot done
IHCCB11 antibody and HercepTestIHC≥50% CB11+HER+Grp 1:28%Grp2:41%
≤50% CB11+HER-41%36%
CB11 negativeHER-31%23%
HercepTest using3+7%9%
DAKO scoring system2+7%9%
1+10%11%
075%71%
Pritchard et al. 2006; rec. #1760 RCT; CMF versus CEF; FISHPathVysion kitFISHHER2/CEP17 ≥2.00Pos163 (26%)also reported concordance rates between the different assays used
IHCCB11 and TAB 250 antibodies (results reported separately from each antibody assay)Neg465 (74%)
PCRas described by O'Malley et al. 2001 (rec. #13790)IHCcomplete membrane staining, score ≥5 on Allred semi-quantitative scaleCB11+124 (20%)
CB11-510 (80%)PCR+ 195 (31%)
TAB250+116 (18%)PCR- 429 (69%)
TAB250-516 (82%)
Knoop et al. 2005; rec. # 3450 RCT (n=805 tested) FISHpharmDxFISHHER2/CEP17 ≥2, as in kit manufacturer's manual; only tested if IHC2+ or 3+ followed instructions in manual for HercepTest kit PosIHC2+: 21.0IHC3+: 89.4
IHCHercepTestEquiv6.28.9
Neg72.81.6
IHC3+30.6
2+10.1(IHC3+ or FISH+) = HER2+: 32.7%
1+32.7HER2-: 67.3%
026.7
Dressler et al. 2005, rec. # 4280; Thor et al. 1998, rec. # 40880 FISHPathVysion kitRCT arm:high-dosemod-doselow-dosetotal
IHCCB11 (n=346) or A0-11-854 (n=177) antibodiesFISHHER2/CEP17 >2Pos30 (5.7%)31 (5.9%)30 (5.7%)91 (17%)
Neg149 (28.4%)136 (26.0%)148 (28.2%)433 (83%)
PCRdifferential PCR assay as described in Thor et al. 1998IHC≥50% of invasive cells stained by antibodyPos44 (8.4%)43 (8.2%)40 (7.7%)127 (24%)
Neg134 (25.6%)124 (23.7%)138 (26.4%)396 (76%)
PCR“unequivocal amplification relative to ...normal...and amplified standard controls”Pos30 (6.1%)31 (6.3%)30 (6.1%)91 (18%)
Neg131 (26.7%)125 (25.5%)144 (29.3%)400 (82%)
Del Mastro et al. 2004, 2005; rec. # 48020 FISHnot doneFISHnot donePos
IHCCB11 antibodyNeg
FEC14 (n=370)FEC21 (n=361)
all slides scored by one pathologist, blinded to treatment arm & outcomeIHC3+ score on Dako scale: 103 of 731 (14%) with specimens available for assay3+50 (13.5%)53 (14.7%)
2+24 (6.5%)23 (6.4%)
1+19 (5.1%)20 (5.5%)
0277 (74.9%)265 (73.4%)
Tanner et al. 2006; rec. # 1820 data for n=180 from FEC arm tested for HER2 status FISHnot doneCISH≥6 copies in >30% of invasive carcinoma cells or ratio >2, HER2/CEP17Pos56 (31%)
CISHZymed probes (digoxigenin-labeled)Equiv0
IHCnot doneNeg124 (69%)
IHC3+
2+
1+
0
Hayes et al. 2007; rec. # 47610 FISHPathVysion kitFISHHER2/CEP17 ≥2.00Posproportions of HER2+ and HER2- patients not reported for any assay method
IHCCB11 antibody and HercepTestEquiv
Neg
IHCCB11: HER2+if ≥50% of breast cancer cells were stained; Herceptest: as in Dako manual3+
2+
1+
0
Martin et al. 2005; rec # 47650 FISHnot reportedFISHHER2/CEP17 ≥2.00Pos319 (21.4%; 20.8%, DAC arm; 22.0% FAC arm)
IHCCB11 antibody (only for 12 patients)Neg943 (63.3%; 63.8%, DAC arm; 62.7% FAC arm)
???229 (15.4%; 15.4%, DAC arm; 15.3% FAC arm)
IHCnot reported
Neoadjuvant (Pre-operative) Chemotherapy
Learn et al. 2005; rec. # 47640 n=104 classified for HER2 status FISHnot reportedFISHnot reported
IHCTAB250 antibody (Zymed; South San Francisco, CA)HER2 Pos41 (39.4% of those tested)
FISH performed on all specimens with “borderline” HER2 IHC scoresIHCnot reportedHER2 Neg63 (60.6% of those tested)
Arriola et al. 2006; rec # 950 n=223 tested by IHC/FISH initial algorithm & by CISH FISHOncor/Ventana Inform kitCISH>5 copies or ratio>2 for>5 copiesratio>2
CISHZymed probe and Spot-Light kitHER2/CEN17Pos18%14%
IHCCB11 antibody and HercepTestNeg82%86%
algorithm: CB11 first, then HercepTest for negatives only, then FISH for discordant IHC results; positives by initial algorithm tested byCISHIHC/FISH initial algorithm: CB11+ if complete membrane staining in >10% of cells; HercepTest+ if 2+ or 3+; FISH+ if >4 signals/cellPos19%
Neg 81%
Park et al. 2003; rec # 9960 FISHnot doneFISHnot done
CISHZymed SPOT-Light HER2 probe, digoxigenin-labeledCISHHER2 gene copy # >4, or large gene copy cluster in >50% of cancer cell nucleiPos46%
Neg54%
IHCnot doneIHCnot done
Zhang et al. 2003; rec # 9820 FISHPathVysion kitFISHgene copy ratio >2.0, HER2/chromosome 17 centromerePos13%overall, 28% (n=28) HER2+ defined as 3+ by IHC or FISH+; 72% (n=69) HER2-
IHCAB8 Neomarker antibodyNeg36%
untested51%
n=75 analyzed by IHCIHCstrong membrane staining in ≥10% of tumor cells3+23%
n=48 analyzed by FISH2+9%
n=97 all patients in study1+10%
035%
untested23%
Tulbah et al. 2002; rec # 11560 n=54 tested FISHnot doneFISHnot donePos
IHCHercepTestEquiv
Neg
IHCscored 0–3+, as in Dako kit guide; for analysis of response, only 3+ was considered HER2+3+22 (41%)
2+12 (22%)
1+8 (15%)
012 (22%)
Tinari et al. 2006; rec # 2300 retrospective single-arm series; n=77 FISHnot reportedFISHnot reported; used only if HercepTest scored 2+Pos20 (26%)
IHCHercepTest (Dako)Equiv0
Neg57 (74%)
IHCscored 0–3+, as in Dako kit guide; positive if IHC scored 3+ or if FISH+ and IHC 2+3+
2+
1+
0
Chemotherapy for Advanced or Metastatic Disease
Harris et al 2006; rec. # 390 FISHVysis PathVysion kit (Vysis Inc, Downers Grove, IL)FISHRatio of HER2 to CEP17 signal ≥ 2.0.Pos26Cohen's kappa = 83.0% (SE5.3%) for FISH vs CB11; 72.0%(SE6.2%) for Hercep-Test (0–1 vs 2–3) vs FISH; 79.2%(SE6.0%) for Hercep-Test (0–2 vs 3) vs FISH; 70.0%(SE6.3%) for Hercep-Test (0–1 vs2–3) vs CB11; 84.2%(SE5.4%) for Hercep-Test (1–2 vs 3) vs CB11.
Neg74
IHCMonoclonal antibody CB11 (Biogenex, San Ramon, CA); HercepTest (Dako Corp, Carpinteria, CA)IHCCB11: moderate to strong intensity staining in ≥10% of invasive carcinoma cells.
Pos20
Equiv
Neg80
Herceptest score of 3+; i.e., complete membrane staining of >10% tumor cells040
128By CB11, 9% of African American women are HER2+ vs 20% of Caucasian women (p=0.08).
211
321
Di Leo et al 2004; rec. # 5970 IHCCB-11 (Novocastra, Newcastle, UK)Grp1Grp2
IHC →FISH: FISH done if IHC stained membranes in ≥ 1% of invasive cells; HER2+ if signal ratio, HER2/CEP17 ≥2Pos16%25%
FISHSpectrum Orange HER-2/Spectrum Green CEP17 (PathVysion, Vysis, Downers Grove, IL)unknown14%16%
Neg69%59%
Konecny et al 2004; rec. # 6740 FISHPathVysion HER-2 Neu & CEP17 probes (Vysis, Downers Grove, IL)Grp1Grp2
FISH≥2 HER-2/neu genes per Chromosome 17 CentromerePos35.834.8
Equiv
Neg64.265.2
IHCnot done
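
The positivity criteria in Table IIIa-C can be hard to compare across studies, so here is a minimal sketch of one of the combined rules listed above (Tinari et al. 2006: positive if IHC scored 3+, or if IHC scored 2+ and FISH was positive), using the HER2/CEP17 ratio cutoff of 2.0 that several studies report. The function name `her2_status`, its parameters, and the "equivocal" label for an IHC 2+ specimen without a FISH result are illustrative assumptions, not taken from any of the reviewed studies.

```python
from typing import Optional


def her2_status(ihc_score: int,
                fish_ratio: Optional[float] = None,
                fish_cutoff: float = 2.0) -> str:
    """Classify HER2 status from an IHC score (0-3) and an optional FISH ratio.

    Illustrative rule only: IHC 3+ is positive; IHC 2+ is resolved by the
    HER2/CEP17 ratio (positive if ratio >= fish_cutoff); IHC 0 or 1+ is
    negative. "equivocal" for an untested IHC 2+ is an assumption.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2, or 3")
    if ihc_score == 3:
        return "positive"
    if ihc_score == 2:
        if fish_ratio is None:
            return "equivocal"
        return "positive" if fish_ratio >= fish_cutoff else "negative"
    # IHC 0 or 1+: FISH is not consulted under this rule
    return "negative"


if __name__ == "__main__":
    for ihc, ratio in [(3, None), (2, 2.4), (2, 1.3), (2, None), (1, None)]:
        print(ihc, ratio, her2_status(ihc, ratio))
```

Other studies in the table used different rules (for example, IHC alone, or a cutoff of ratio >2 rather than ≥2), so the cutoff is left as a parameter rather than fixed.
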
Eleven studies on postsurgical adjuvant chemotherapy. Although all investigated adjuvant chemotherapy, the eleven studies varied with respect to their patient groups' distributions of baseline characteristics and risk factors for recurrent disease (Appendix Tables IIIa-B and IIIa-C; Table 10). Only a subset of these studies compared the HER2 positive and negative subgroups for baseline characteristics and risk factors. Also, only a subset of the nine randomized, controlled trials compared patients included in the analysis by HER2 status with those excluded because tissue blocks were missing or unsuitable.

Studies on CMF. Of the two CMF studies, the retrospective series by Yang, Klos, Zhou, et al. (2003) pooled data for node-negative and node-positive patients, groups that Gusterson, Gelber, Goldhirsch, et al. (2003) randomized separately to different treatment arm pairs. Yang, Klos, Zhou, et al. (2003) only reported baseline characteristics and risk factors for all patients analyzed. Gusterson, Gelber, Goldhirsch, et al. (2003) compared HER2-positive versus HER2-negative patients separately for the node-positive and node-negative groups, but did not compare those with known HER2 status versus those lacking tissue blocks for HER2 assays. In node-negative patients, HER2 positivity was statistically significantly associated with larger tumor size, hormone-receptor negativity, and higher tumor grade. In node-positive patients, HER2 positivity was statistically significantly associated with menopausal status, hormone-receptor negativity, and higher tumor grade.

Studies on regimens with versus without an anthracycline. Three (Colozza, Sidoni, Mosconi, et al., 2005; Pritchard, Shepherd, O'Malley, et al., 2006; Tanner, Isola, Wiklund, et al., 2006) of five studies comparing adjuvant regimens with versus without an anthracycline compared baseline characteristics of HER2 positive and negative subgroups. Three (Colozza, Sidoni, Mosconi, et al., 2005; Knoop, Knudsen, Balslev, et al., 2005; Tanner, Isola, Wiklund, et al., 2006) explored whether subgroups tested for HER2 status were similar to the total study population or the subgroup not tested. Two trials (Moliterni, Menard, Valagussa, et al., 2003; Pritchard, Shepherd, O'Malley, et al., 2006) determined HER2 status on 92 percent or 89 percent, respectively, of the patients originally randomized and did not report comparisons to all or omitted patients. Each trial's full treatment arms were well balanced for baseline characteristics and prognostic factors.

Moliterni, Menard, Valagussa, et al. (2003) did not report data comparing baseline factors by HER2 status. All patients in this trial had one to three positive nodes, and approximately 65 percent had tumors smaller than 2.1 cm in diameter. Colozza, Sidoni, Mosconi, et al. (2005) reported that treatment arms were well balanced, whether comparing all patients randomized or only those tested for HER2 status. However, significantly more patients randomized to epirubicin than to CMF were HER2 positive (41 percent versus 28 percent, p=.03). Progesterone receptor positivity was the only factor statistically significantly associated with HER2 positivity. This trial included node-positive and node-negative patients (4 or more positive nodes in less than 25 percent), and approximately 45 percent with tumors 2 cm or smaller in diameter.
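
As a rough check on the imbalance reported by Colozza and co-workers, the sketch below applies a pooled two-proportion z-test to the HER2-positive counts given in Table 13 (54 of 133 in the epirubicin arm versus 37 of 133 in the CMF arm). The report does not state which test the investigators used, so the choice of test and the function name `two_proportion_z_test` are assumptions made for illustration.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p) for the difference between two proportions.

    Uses the pooled-variance normal approximation; this is an illustrative
    choice, not necessarily the test used in the original trial report.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # HER2-positive patients: epirubicin 54/133 (41%), CMF 37/133 (28%)
    z, p = two_proportion_z_test(54, 133, 37, 133)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the approximation gives a two-sided p-value close to the reported p=.03.
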

Pritchard, Shepherd, O'Malley, et al. (2006) reported that baseline characteristics of patients tested for HER2 status were similar to those of all randomized patients, but did not show data for this comparison. They showed data comparing FISH-positive and FISH-negative subgroups; except for a shift toward younger age in the FISH-positive subgroup, there were no significant differences. Just over half the patients in this trial had T2 or T3 tumors, and all had positive lymph nodes, with four or more positive nodes in 37 percent and 43 percent of the FISH-negative and FISH-positive groups, respectively. Knoop and co-workers (2005) reported that among all patients tested for HER2 status, treatment arms were well balanced for prognostic factors. However, they did not report comparing the HER2-positive versus HER2-negative patients, either by treatment arms or across treatments. Tumors were larger than 2 cm in diameter in approximately 60 percent of patients, and approximately 30 percent had four or more positive nodes. Tanner, Isola, Wiklund, et al. (2006) reported (but did not show data) that baseline characteristics of all patients tested for HER2 status did not differ from those of the entire trial cohort. They showed that baseline characteristics were similar for HER2-tested subgroups from each arm. However, the AuSCS arm was excluded from this review, and data were not reported comparing baseline characteristics of HER2-positive versus HER2-negative patients from the FEC arm.

Studies on dose or dose intensity of anthracycline-based regimens. Studies from randomized, controlled trials that compared dose (Dressler, Berry, Broadwater, et al., 2005) or dose intensity (Del Mastro, Bruzzi, Nicolo, et al., 2005) of anthracycline-based regimens reported that baseline characteristics and prognostic factors of patients with known HER2 status were similar to those of patients omitted from the analyses because their HER2 status was unknown. Dressler and co-workers (2005) did not report data comparing baseline characteristics or prognostic factors of HER2-positive versus HER2-negative patients. Del Mastro and co-workers (2005) found that a greater proportion of HER2-positive than HER2-negative patients lacked expression of both estrogen and progesterone receptors (62 percent versus 32.5 percent). Other baseline characteristics and prognostic factors were similar between subgroups by HER2 status and between treatment arms.

Studies on regimens with versus without a taxane. One of two studies from randomized, controlled trials on regimens with versus without a taxane compared baseline characteristics and prognostic factors of patients with known HER2 status versus those of patients with unknown HER2 status. The trial comparing paclitaxel versus observation after AC (Hayes, Thor, Dressler, et al., 2007) showed similar baseline characteristics, prognostic factors, and overall survival in the two subgroups they randomly selected and tested for HER2 status (n=643 and 679, respectively). These subgroups were also similar to all treated patients (n=3,121), and to all non-tested patients (n=1,799). Tumor diameter was 2 cm or smaller in approximately 35 percent, and approximately 54 percent had 4 or more positive nodes. The randomized, controlled trial that compared TAC versus FAC (Martin, Pienkowski, Mackey, et al., 2005) only compared patient characteristics and prognostic factors by treatment arm for all patients randomized. Neither study compared HER2-positive versus HER2-negative patients, either pooled across treatments or by treatment arm.

Six studies on preoperative neoadjuvant chemotherapy. The randomized, controlled trial on neoadjuvant therapy (Learn, Yeh, McNutt, et al., 2005) did not compare treatment arms or patient subgroups by HER2 status (neither known versus unknown nor positive versus negative) with respect to baseline characteristics or prognostic factors. This study only reported patient and tumor characteristics for all randomized patients.

Only one (Tulbah, Ibrahim, Ezzat, et al., 2002) of the five included series compared baseline characteristics and prognostic factors for HER2-positive and HER2-negative subgroups. Across all five studies, approximately 55 percent to 65 percent of included patients were positive for estrogen receptors, and 45 percent to 55 percent were positive for progesterone receptors. However, their study samples varied somewhat with respect to tumor size and number of positive nodes. The series reported by Arriola, Moreno, Varela, et al. (2006) included 30 percent T2 and 70 percent T3 tumors, with 60 percent of patients node negative and 40 percent N1. Most patients (91 percent) in the series reported by Park, Kim, Lim, et al. (2003) had tumors between 5 and 10 cm in diameter; the investigators did not report nodal status. Zhang, Yang, Smith, et al. (2003) included a few patients (13 percent) with T1 tumors, and approximately 33 percent node-negative patients. Most patients in the Tulbah, Ibrahim, Ezzat, et al. (2002) series had T3 or larger tumors, and approximately 55 percent had N1 disease. They reported generally well-balanced HER2-positive and HER2-negative subgroups. Finally, 75 percent of patients in the Tinari, Lattanzio, Natoli, et al. (2006) series had tumors with diameters between 2 and 5 cm; number of positive nodes was not reported.

Three studies on chemotherapy for advanced or metastatic breast cancer. Each of three included randomized, controlled trials reported that baseline characteristics and prognostic factors for the subgroup tested for HER2 status were similar to those of patients not tested. However, none compared HER2-positive versus HER2-negative subgroups, either separately by treatment arm or across arms.

Harris, Broadwater, Lin, et al. (2006) reported the only statistically significant difference between patients tested for HER2 (and other biomarkers) and those not tested was a shorter disease-free interval among those tested (19 versus 31 months, p=.0003). Investigators attributed this difference to discarding of tissue blocks after 10 years, thus a shorter interval from diagnosis to metastasis for those with blocks remaining. Hormone-receptor status (positive in 58 percent) and median number of metastatic sites (one) were the only prognostic factors reported among those tested for HER2 status. The analysis by HER2 status pooled patients across three trial arms randomized to different paclitaxel doses.

Di Leo, Chan, Paesmans, et al. (2004) showed the subgroups tested for HER2 status from each treatment arm were similar to each other and to the untested patients. Approximately half the included patients had three or more sites of disease, and more than three fourths had visceral involvement. They did not report hormone receptor status.

Konecny, Thomssen, Luck, et al. (2004) reported no statistically significant differences in baseline characteristics or prognostic factors between groups tested for HER2 and those not tested from each treatment arm compared separately. However, the HER2-positive and HER2-negative groups were not directly compared, either separately by treatment arm or pooled across arms.

Results, Key Question 3a

Eleven studies on postsurgical adjuvant chemotherapy

Table 13

Summary time to event outcomes, KQ3a
StudyTime to Event Outcomes
Adjuvant chemotherapy for resected early breast cancer
Yang et al., 2003 CMF; single-arm series OutcomeGrpNMed (mos)1 yr2.5 yr3 yr4 yr5 yrTestpHR (95%CI)Comments p=.002 in stratified log rank that adjusted for nodal status
DFSHER2+346–7 years~60%53%log rank <.01
HER2-60not reached~90%86%
Gusterson et al., 2003 760 node-neg pts randomized to periop CMF (Tx) or no adj Tx (Cx) OutcomeGrpNMed (mos)2 yr3 yr4 yr5 yr6 yrTestpHR (95%CI)Comments unadjusted univariate analyses; adjusted results also NS unadjusted univariate analyses; adjusted results also NS
OS (HER2+)Tx64not reached76±5CoxNS1.15 (0.54–2.46)
Cx54not reached79±6prop hazards
OS (HER2-)Tx436not reached85±2CoxNS1.04 (0.68–1.61)
Cx206not reached87±2prop hazards
DFS(HER2+)Tx64not reached~84%~68%~65%~62%61±6CoxNS1.22 (0.66–2.25)
Cx54not reached~86%~75%~73%~70%68±7prop hazards
DFS (HER2-)Tx436not reached~90%~85%~80%~77%71±2CoxNS0.82 (0.61–1.09)
Cx206not reached~85%~77%~72%~70%68±3prop hazards
Gusterson et al., 2003 746 node-pos pts randomized to prolonged (Tx) or periop (Cx) CMF OutcomeGrpNMed (mos)2 yr3 yr4 yr5 yr6 yrTestpHR (95%CI)Comments unadjusted univariate analyses; adjusted analyses gave similar results unadjusted univariate analyses; adjusted analyses gave similar results
OS (HER2+)Tx85not reported46±6CoxNS1.15 (0.62–1.54)
Cx55not reported40±7prop hazards
OS (HER2-)Tx406not reached71±2Cox.010.69 (0.52–0.92)
Cx200not reached61±4prop hazards
DFS(HER2+)Tx85~36~60%~50%~43%~40%38±5CoxNS0.77 (0.51–1.16)
Cx55~24~50%~42%~35%~30%29±6prop hazards
DFS (HER2-)Tx406>72~80%~70%~63%~57%52±3Cox<.00010.57 (0.46–0.72)
Cx200~40~63%~55%~45%~40%36±4prop hazards
Moliterni et al., 2003 RCT; CMF→A (Tx) vs CMF (Cx) OutcomeGrpNMed (mos)2 yr4 yr6 yr8 yr10 yrTestpHR (95%CI)Comments HR=0.48, p=.052 for treatment × HER2 interaction term HR=0.68, p not signif. for treatment × HER2 interaction term
OS (HER2+)Tx45>192~92%~83%~73%~68%64%Cox0.61 (0.32–1.16)
Cx50~170~90%~80%~63%~57%54%model
OS (HER2-)Tx203>192~97%~90%~86%~83%76%Cox1.26 (0.89–1.79)
Cx208>192~97%~94%~90%~83%77%model
RFS(HER2+)Tx45>192~85%~75%~62%~58%55%Cox0.83 (0.46–1.49)
Cx50~102~85%~65%~62%~52%46%model
RFS (HER2-)Tx203~162~90%~80%~65%~60%56%Cox1.22 (0.91–1.64)
Cx208>192~90%~80%~74%~65%59%model
Colozza et al., 2005 RCT; epirubicin (Tx) vs. CMF (Cx); n=133 each group tested for HER2 status OutcomeGrpNMed (mos)4 yr6 yr% at 8 yr±SDTestComments: CMF HER2+ versus CMF HER2-, p=.024; all other comparisons not statistically significant including epirubicin HER2+ versus epirubicin HER2-, p=0.24. Interaction terms by Cox MVA: for OS: HR=1.61, CI: 0.64–4.01, p not signif. for RFS: HR=1.02, CI: 0.40–2.58, p not signif.
OS (HER2+)Tx54not reached~89%~80%75.8±5.8log
Cx37not reached~77%~70%67.6±7.7rank
OS (HER2-)Tx79not reached~90%~87%84.5±4.1log
Cx96not reached~93%~90%87.4±3.4rank
RFS(HER2+)Tx5460.1±6.9log
Cx3768.6±7.2rank
RFS (HER2-)Tx7965.9±5.4log
Cx9670.3±4.7rank
Pritchard et al., 2006 RCT; CEF (Tx) vs. CMF (Cx); HER2 status by FISH results OutcomeGrpNMed (yrs)2 yr4 yr6 yr8 yr10 yrTestpHR (95%CI)Comments HR=2.04, CI: 1.14–3.65, p=.02 for treatment by HER2 interaction in Cox MVA HR=1.96, CI: 1.15–3.65, p=.01 for treatment by HER2 interaction in Cox MVA
OS (HER2+)Tx75not reached~93%~70%~62%~58%~57%log.060.65 (0.42–1.02)
Cx88~5.3~92%~62%~47%~46%~45%rank
OS (HER2-)Tx237not reached~93%~83%~75%~67%~63%logNS1.06 (0.83–1.44)
Cx228not reached~93%~80%~75%~67%~62%rank
RFS (HER2+)Tx75not reached~77%~67%~58%~57%~56%log.0030.52 (0.34–0.80)
Cx88~2.5~63%~43%~42%~34%~31%rank
RFS (HER2-)Tx237~10~81%~67%~60%~54%~50%logNS0.91 (0.71–1.18)
Cx228~10~81%~64%~58%~54%~50%rank
Knoop et al., 2005 RCT (n=805); CEF (Tx) vs. CMF (Cx) OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments HRs and 95% CIs all adjusted by multivariate analysis for T size, nodal & menopausal status; stratified for grade, ER and TOP2A status
OS (HER2+)Tx120Cox.090.73 (0.50–1.05)
Cx143proportional hazards
OS (HER2-)Tx249Cox.230.82 (0.59–1.13)
Cx293proportional hazards
RFS (HER2+)Tx120Cox.100.75 (0.53–1.06)
Cx143proportional hazards
RFS (HER2-)Tx249Cox.100.79 (0.60–1.05)
Cx293proportional hazards
Dressler et al., 2005; Thor et al., 1998; separate survival curves show similar results for HER2 status by IHC, FISH, and PCR; only abstracted data for HER2 by IHC OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments: HR and p data are for interaction of CAF dose with HER 2 status in model for DFS
OS HER2+high44>108~97%~97%93% (86–100)
by IHCmod43~87~93%~66%58% (47–75)
low40~96~90%~66%63% (49–80)
OS HER2-high134~100~93%~80%74% (67–81)
by IHCmod124>108~96%~86%78% (80–92)
low138~100~93%~80%74% (67–81)
DFS HER2+high44>108~97%~90%87% (74–96)multivariate.00030.42 (0.19–0.93)HER2 by IHC
by IHCmod43~36~60%~47%47% (34–64)
low40~66~65%~58%53% (39–71)proportional.0330.92 (0.81–1.04)HER2 by FISH
DFS HER2-high134>108~83%~70%64% (56–73)
by IHCmod124>108~83%~70%65% (57–74)hazards.0430.58 (0.25–1.35)HER2 by PCR
low138~90~78%~63%59% (51–68)
Del Mastro et al. 2004, 2005; Tx = FEC14 Cx = FEC21OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments for all FEC14, HER2+ vs. HER2-: EFS, HR=1.21 (0.65–2.24) p=.54; OS, HR=1.85 (0.88–3.89), p = .103 for all FEC21, HER2+ vs HER2-: EFS, HR=2.07 (1.27–3.38), p=.003; OS, HR=2.47 (1.34–4.57), p=.004
OS (HER2+)Tx50>72~100%~100%~96%~92%89.9%prop hazards.22 0.59 (0.26–1.37)
Cx53>72~98%~89%~85%~81%75.1%
OS (HER2-)Tx320>84~100%~99%~96%~95%91.9%prop hazards.34 0.79 (0.49–1.28)
Cx308>84~100%~99%~96%~94%90.7%
EFS (HER2+)Tx50>72~100%~98%~85%79%77.7%prop hazards.092 0.54 (0.27–1.11)
Cx53>72~91%~82%~68%~67%62.5%
EFS (HER2-)Tx320>84~100%~93%~90%~85%81.5%prop hazards.570.91 (0.65–1.27)
Cx308>84~98%~93%~87%~83%80.9%
Tanner et al., 2006 FEC arm only OutcomeGrpNMed (mos)2 yr3 yr4 yr5 yr6 yrTestpHR (95%CI)Comments only reported statistical comparisons of FEC vs. HDC/AuSCS, not HER2+ vs. HER2- in same arm
OSHER2+56~54~79%~64%~58%~46%~41%not reported
HER2-124>84~94~83%~74%~68%~64%
RFSHER2+56~48~68%~62%~50%~46%~46%not reported
HER2-124>84~84%~74%~67%~66%~65%
Hayes et al., 2007 AC→P (Tx) vs. AC alone (Cx) HER2 status based on CB11 IHC test results; OutcomeGrpNMed (mos)3 yr6 yr9 yrTestpHR (95%CI)Comments: total n=1322; HR & p for interaction of HER2+ status and effect of adding paclitaxel HR & p, as for OS
OS HER2+Txnot reached~87–92%~75–78%~70–78%Cox.010.57
Cx~60–96~70–75%~52–62%~47–49%multivariate regression
OS HER2-Txnot reached~87–92%~76–80%~68–70%
Cxnot reached~85–87%~74–77%~63–66%
DFS HER2+Txnot reached~80–87%~69–72%~62–67%Cox.010.59
Cx~48–60~53–60%~45–50%~45–48%multivariate regression
DFS HER2-Txnot reached~83–87%~70–75%~65–69%
Cx~120–132~80–85%~65–67%~55–60%
Martin et al., 2005 Tx = DAC Cx = FAC OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments K-M DFS curves not shown separately by HER2 status; p values not reported
DFS HER2+Tx155Cox prop hzrds models0.60 (0.41–0.88)
Cx164
DFS HER2-Tx4750.76 (0.59–1.00)
Cx468
DFS HER2Tx1150.72 (0.45–1.17)
UnknownCx114
Neoadjuvant (preoperative) chemotherapy for locally advanced breast cancer
Learn et al., 2005did not report time-to-event outcome
Arriola et al., 2006did not report time-to-event outcomes
Park et al., 2003did not report time-to-event outcomes
Zhang et al., 2003; FAC, n=97 (n=78 also given post-op chemoTx) OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments
DFSHER2+2848 (for all patients)~90%~83%~60%~45%not specifiedNSnot reported
HER2-69~90%~80%~70%~60%
Tulbah et al., 2002; OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments if HER2+ = IHC 2+/3+, OS favored HER2- 90% vs. 79% p=.051 if HER2+ = IHC 2+/3+ DFS still not statistically significant (p=.09)
OSHER2+22not reached~95%~79%~66%~66%log.31
HER2-32not reached~97%~97%~72%~72%rank
DFSHER2+2134.5±7.8~88%~75%~75%0log.43
HER2-31(all 52 pts)~92%~83%~52%~52%rank
Tinari et al., 2006;did not report time-to-event outcome by HER2 status
First- or second-line chemotherapy for advanced or metastatic breast cancer
Harris et al., 2006OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yr10yrTestp HR (95%CI)Comments
OSCB11+3011.3Log.14
CB11-12613.1rank
FISH+3710.9Log.26
FISH-10913.1rank
 HercepTest 2+/3+4611.5Log.84
 HercepTest 0/1+10513.2rank
Di Leo et al., 2004 Grp 1: A Grp 2: T OutcomeGrpNMed (mos)6 mos1 yr1.5 yr2 yr2.5 yrTestp HR (95%CI)Comments In full TAX 303 trial, no statistically significant differences between Tx arms with respect to OS or TTP
OS HER2+11510.8~.85~.3No lineNo lineNo lineCox regression.33 1.47(0.68–3.15)
22114.4~.95~.6~.46No lineNo line
OS HER2-16316.9~.8~.72~.5~.3No lineCox regression.07 0.64(0.40–1.03)
25012.6~.8~.6~.32~.280
3 mos6 mos9 mos12 mos15 mos18 mos
TTP HER2+1154.7 ~.75~.4~.25~.15~.150Cox.73 0.88(0.43–1.82)
2217.0 ~.75~.6~.15~.1~0No lineregression
TTP HER2-1635.9 ~.74~.5~.35~.25~.15<.1Cox.22 0.77(0.52–1.16)
2505.0 ~.74~.45~.2~.1<.1No lineregression
PFSTx
Cx
Konecny et al., 2004 Grp 1: EC Grp 2: ETOutcomeGrpNMed (mos) (95%CI)1 yr2 yr3 yr4 yr5 yrTestp HR (95%CI)Comments
OS1
 HER2+4916.4(12.1–20.1)~.65~.3~.25~.25log.010
 HER2-8833.1(20.9–50.6)~.78~.57~.45~.4rank
2
 HER2+4821.4(15.3–27.3)~.74~.45~.25~.1log.463
 HER2-9027.5(17.1–35.2)~.7~.55~.35~.2rank
 HER2+14916.4(12.1–20.1)~.6~.3~.25~.25log.319
24821.4(15.3–27.3)~.7~.4~.25~.1rank
 HER2-18833.1(20.9–50.6)~.78~.58~.43~.4log.292
29027.5(17.1–35.2)~.7~.55~.35~.15rank
PFS1
 HER2+497.1(4.1–9.3)~.2~.08~.08log.010
 HER2-8810.4(6.9–14.9)~.54~.22~.12rank
2
 HER2+4810.5(8.1–11.9)~.35~.1~.05log.584
 HER2-909.6(7.5–11.3)~.35~.15~.08rank
 HER2+1497.1(4.1–9.3)~.2~.08~.08log.116
24810.5(8.1–11.9)~.35~.1~.05rank
 HER2-18810.4(6.9–14.9)~.47~.25~.1log.350
2909.6(7.5–11.3)~.52~.13~.08rank

Table 14

Summary tumor response, KQ3a
StudyTumor Response (%)
Adjuvant chemotherapy for resected early breast cancer
Yang et al., 2003not reported
Gusterson et al., 2003not reported
Moliterni et al., 2003not reported
Colozza et al., 2005not reported
Pritchard et al., 2006not reported
Knoop et al., 2005not reported
Dressler et al., 2005not reported
Del Mastro et al 2004, 2005not reported
Tanner et al., 2006not reported
Hayes et al., 2007not reported
Martin et al., 2005not reported
Neoadjuvant (preoperative) chemotherapy for locally advanced breast cancer
Learn et al., 2005; n=104 classified for HER2 status | Grp | N | pCR | ORR (cCR+cPR) | Test | p | Comments: for ORR data, by multivariate analysis: AC, HER2+ vs. HER2-, p=0.06; AC+D, HER2+ vs. HER2-, p=0.99
HER2+, AC | 32 | 22% | 75% | logistic regression | NS
HER2+, AC+D | 9 | 22% | 78% | <.05
HER2-, AC | 37 | 24% | 51%
HER2-, AC+D | 26 | 24% | 81%
Arriola et al., 2006GrpNpCRPRSDPDNETest Mann-Whitneyp .03Comments “association of HER2+ with pCR”
all22927
Park et al., 2003 | Grp | N | pCR | PR | OR (CR+PR) | NR (PD+NE) | Test: Fisher's exact | p: .013 | Comments
HER2+ | 31 | 5 (16%) | 22 (71%) | 27 (87%) | 4 (13%)
HER2- | 36 | 0 | 17 (47%) | 17 (47%) | 19 (53%)
Zhang et al., 2003GrpNcCR+cPRcNRp .014RR 95%CIpCR+MRDERDp .53RR 95%CI 1.4 0.54–3.67 tests Fisher's exact & asymptotic
HER2+2893%7%1.21.1–1.418%82%
HER2-6978%22%13%87%
Tulbah et al., 2002GrpNpCRPRSDPDNETestpComments also NS if IHC 2+ and 3+ considered HER2+
HER2+216 (29%)NS
HER2-317 (23%)
Tinari et al., 2006GrpNTRORSDPDNETestpOR Comments 5.28 (1.57–19.6)
all7723.4%72.7%3.9%univariate .008 logistic regression
HER2+20
HER2-57
First- or second-line chemotherapy for advanced or metastatic breast cancer
Harris et al., 2006 | N | CR+PR (%) | p | Comments: did not report logistic regression analysis
HER2 by CB11 | | | 0.96
 Pos | 30 | 23
 Neg | 126 | 24
HER2 FISH | | | 0.70
 Pos | 37 | 22
 Neg | 109 | 25
HercepTest | | | 0.026
 Pos (2–3) | 46 | 35
 Neg (0–1) | 105 | 18
HercepTest | | | 0.98
 Pos (3) | 30 | 23
 Neg (0–2) | 121 | 23
Di Leo et al., 2004Grp1N%(CR+PR)A versus T OR (95%CI)pComments: By MV logistic regression, treatment × HER2 status OR=3.64, CI: 1.39–9.54 p=0.01; remains SS after adjusting for visceral & Tx × visceral interaction
 HER2+1527HER2+5.50(1.28–23.69).04
 HER2-6335HER2-1.24(0.58–2.68).70
 HER2 unk1331HER2 unk1.25(0.25–6.24)1.00
 All9133All1.72(0.94–3.18).09
 HER2+2167(In full TAX 303 trial, response rates were 48% with docetaxel (n=161), 33% with doxorubicin (n=165), p=0.008)
 HER2-5040
 HER2 unk1436
 All8546
Konecny et al., 2004 Grp1: EC Grp 2: ETGrp1NCR+PR(95%CI)SDPDNETestpComments by MV logistic regression, adjusted Tx*HER2 interaction: p=0.256
HER2+9760(51–70)chi sq.004
HER2-17841(34–49)
Grp1
 HER2+4945(32–60)chi sq.130
 HER2-8833(22–43)
Grp2
 HER2+4876(63–88)chi sq.005
 HER2-9050(39–61)
HER2+
 Grp14945(32–60)chi sq.004by MV logistic regression, OR=3.64 CI: 1.48–8.92, p=0.005
 Grp24876(63–88)
HER2-
 Grp18833(22–43)chi sq.002by MV logistic regression, OR=1.92 CI: 1.01–3.64, p=0.046
 Grp29050(39–61)

Abbreviations: ERD: extensive residual disease; MRD: minimal residual disease; NE: not evaluable; NS: not significant; OR: overall response (cPR + minimal residual disease + PR); ORR: overall response rate; PD: progressive disease; RR: relative risk; SD: stable disease; TR: tumor response (cPR + minimal residual disease);

Studies on CMF. Both studies on CMF reported superior outcomes in HER2-negative compared with HER2-positive patients (see Tables 13 and 14). The Gusterson, Gelber, Goldhirsch, et al. (2003) trial used proportional hazards models to compare hazard ratios (HR) for disease-free (DFS) and overall survival (OS) after no or one cycle of CMF in node-negative patients; none of the HRs was statistically significant. They also compared multiple versus single cycles of CMF in node-positive patients. Results favored multiple cycles and were statistically significant for the HER2-negative subgroup, but were not significant for HER2-positive patients:
  • OS, HER2- (n=406 multiple; n=200, one): HR=0.69, 95 percent CI: 0.52–0.92; p=.01

  • OS, HER2+ (n=85 multiple; n=55, one): HR=1.15, 95 percent CI: 0.62–1.54; p, NS

  • DFS, HER2- (n=406 multiple; n=200, one): HR=0.57, 95 percent CI: 0.46–0.72; p<.0001

  • DFS, HER2+ (n=85 multiple; n=55, one): HR=0.77, 95 percent CI: 0.51–1.16; p, NS
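
The significance pattern in the four results above follows from whether each 95 percent confidence interval includes 1.00 (no difference between arms). A minimal sketch of that check, using the intervals as reported for the node-positive comparison; the function name `ci_excludes_one` and the dictionary layout are illustrative assumptions:

```python
def ci_excludes_one(lower: float, upper: float) -> bool:
    """True if a hazard-ratio confidence interval lies entirely on one side of 1."""
    return lower > 1.0 or upper < 1.0


# 95 percent CIs as reported for multiple versus single cycles of CMF
gusterson_node_positive = {
    "OS, HER2-": (0.52, 0.92),
    "OS, HER2+": (0.62, 1.54),
    "DFS, HER2-": (0.46, 0.72),
    "DFS, HER2+": (0.51, 1.16),
}

if __name__ == "__main__":
    for label, (lower, upper) in gusterson_node_positive.items():
        verdict = "significant" if ci_excludes_one(lower, upper) else "not significant"
        print(f"{label}: CI {lower}-{upper} -> {verdict} at the 5 percent level")
```

This only restates the reported intervals; it does not test whether the treatment effect differs between the HER2-positive and HER2-negative subgroups, which requires an interaction term.
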

The Yang, Klos, Zhou, et al. (2003) uncontrolled series (n=94) reported that at 5 years, DFS in the HER2-negative subgroup was superior to DFS in the HER2-positive subgroup (n=60, 86 percent versus n=34, 53 percent; log rank p<.01; stratified log rank, p=.002 after adjustment for nodal status).

Studies on regimens with versus without an anthracycline. Only one (Pritchard, Shepherd, O'Malley, et al., 2006) of four included randomized, controlled trials comparing regimens with versus without an anthracycline reported superior outcomes with the anthracycline regimen that reached statistical significance for HER2-positive but not HER2-negative patients. Pritchard, Shepherd, O'Malley, et al. (2006) used multivariate analysis (MVA) to test for an interaction of comparative treatment effect with HER2 status. The study compared CEF versus CMF and reported the following results for OS and relapse-free survival (RFS):

  • OS, HER2- (n=237, CEF; n=228, CMF): HR=1.06, 95 percent CI: 0.83–1.44; p, NS

  • OS, HER2+ (n=75, CEF; n=88, CMF): HR=0.65, 95 percent CI: 0.42–1.02; p=.06

  • OS, treatment by HER2 interaction from MVA: HR=2.04, 95 percent CI: 1.14–3.65, p=.02

  • RFS, HER2- (n=237, CEF; n=228, CMF): HR=0.91, 95 percent CI: 0.71–1.18; p, NS

  • RFS, HER2+ (n=75, CEF; n=88, CMF): HR=0.52, 95 percent CI: 0.34–0.80; p=.003

  • RFS, treatment by HER2 interaction from MVA: HR=1.96, 95 percent CI: 1.15–3.65; p=.01

The other trials reported no statistically significant differences for any subgroups they compared. Moliterni, Menard, Valagussa, et al. (2003) compared CMF alone versus CMF followed by doxorubicin (CMF→A) in HER2-positive (n=50, CMF; n=45, CMF→A) and HER2-negative (n=208, CMF; n=203, CMF→A) subgroups. Confidence intervals spanned 1.00 and HRs were not statistically significant for either outcome (OS, RFS) in either subgroup. With Cox MVA, treatment by HER2 interaction terms were:

  • OS: HR=0.48, p=.052

  • RFS: HR=0.68, p, NS

Colozza, Sidoni, Mosconi, et al. (2005) compared CMF versus epirubicin alone (E) in HER2-positive (n=37, CMF; n=54, E) and HER2-negative (n=96, CMF; n=79, E) subgroups. Log rank analyses of Kaplan-Meier survival curves showed a statistically significant difference in OS at 8 years after CMF, favoring HER2-negative over HER2-positive patients: 87.4 ± 3.4 percent versus 67.6 ± 7.7 percent, p=.024. No other subgroup comparison was statistically significant, and Cox MVA interaction terms for treatment effect by HER2 status also were not statistically significant.

Knoop, Knudsen, Balslev, et al. (2005) compared CMF versus CEF in HER2-positive (n=143, CMF; n=120, CEF) and HER2-negative (n=293, CMF; n=249, CEF) subgroups. For both OS and RFS, hazard ratios from Cox multivariate analyses (stratified by tumor grade, estrogen receptor and TOP2A status; and adjusted for tumor size, nodal and menopausal status) uniformly spanned 1.00 and were not statistically significant for either HER2-positive or HER2-negative subgroups.

The Tanner, Isola, Wiklund, et al. (2006) study showed separate Kaplan-Meier curves for HER2-positive (n=56) and HER2-negative (n=124) subgroups from the tailored FEC arm for both OS and RFS. However, they did not report statistical significance of differences between these HER2 status subgroups (although they reported statistical significance of differences between HER2 status subgroups treated by HDC/AuSCS versus subgroups treated with tailored FEC).

Studies on dose or dose intensity of anthracycline-based regimens. In one of two included studies, multivariate proportional hazards analysis showed statistically significant interaction of anthracycline-based regimen dose or dose-intensity with HER2 status to predict outcome. Dressler, Berry, Broadwater, et al. (2005) compared DFS after high-, moderate-, or low-dose CAF regimens in HER2-positive and HER2-negative subgroups. They reported separate MVAs using FISH, IHC, or PCR to classify patients' HER2 status. Results for DFS at five years comparing high-dose versus low-dose plus moderate-dose CAF subgroups were:

  • HER2/FISH (n=91, HER2+; n=433, HER2-): HR=0.822 (95 percent CI: 0.553–1.220)

  • HER2/IHC (n=127, HER2+; n=396, HER2-): HR=0.834 (95 percent CI: 0.590–1.181)

  • HER2/PCR (n=91, HER2+; n=400, HER2-): HR=0.732 (95 percent CI: 0.507–1.056)

  • HER2/FISH, interaction CAF dose by HER2: HR=0.919 (95 percent CI: 0.814–1.038); p=.033

  • HER2/IHC, interaction CAF dose by HER2: HR=0.418 (95 percent CI: 0.188–0.930); p=.0003

  • HER2/PCR, interaction CAF dose by HER2: HR=0.585 (95 percent CI: 0.253–1.352); p=.043

Investigators stated (but did not report HRs, CIs, or p values) that MVA yielded similar results for statistically significant interaction of CAF dose with HER2 status to predict OS.

Del Mastro, Bruzzi, Nicolo, et al. (2005) compared outcomes after identical doses of FEC administered every 14 days (FEC14) or every 21 days (FEC21). Multivariate proportional hazards analysis showed that interaction terms for HER2 status by randomly assigned treatment (dose intensity or treatment frequency) were not statistically significant for EFS (HR=0.53; p=.12) or OS (HR=0.646; p=.379). HER2 status (HER2-positive, n=103; HER2-negative, n=628) was statistically significant to predict EFS (HR=2.04, p=.005) and OS (HR=2.41, p=.006), while randomly assigned treatment (FEC14, n=370; FEC21, n=361) was not statistically significant to predict either outcome (EFS, HR=0.85, p=.335; OS, HR=0.72, p=.379).

Studies on regimens with versus without a taxane. One of two included studies reported statistically significant interaction of HER2 status with added paclitaxel to predict treatment outcome. Hayes, Thor, Dressler, et al. (2007) compared outcomes with versus without paclitaxel (following AC) in HER2-negative and HER2-positive subgroups, separately for each of two groups they randomly selected for HER2 testing. For each group, OS and DFS for HER2-positive patients given paclitaxel were superior to the same outcomes in HER2-positive patients not given paclitaxel. In contrast, OS and DFS for HER2-negative patients given paclitaxel appeared similar to the same outcomes for HER2-negative patients not given paclitaxel. They used Cox multivariate analyses, separately in each randomly selected group, and in the two groups combined, to test the statistical significance of an interaction term for HER2 positivity and paclitaxel treatment. Results for Group 2 and for Groups 1 and 2 pooled showed a statistically significant interaction favoring paclitaxel treatment in HER2-positive patients:

  • Group 1, n=643: recurrence, HR=0.63, p=.15; death, HR=0.61, p=.17

  • Group 2, n=679: recurrence, HR=0.52, p=.03; death, HR=0.52, p=.03

  • Groups 1+2, n=1,322: recurrence, HR=0.59, p=.01; death, HR=0.57, p=.01

Hayes, Thor, Dressler, et al. (2007) also investigated whether patients' estrogen-receptor (ER) status modified the impact of HER2 status on outcomes of paclitaxel. The researchers reported results of an exploratory analysis suggesting that, among HER2-positive patients, paclitaxel improved DFS whether patients were ER-negative or ER-positive. Among HER2-negative patients, in contrast, paclitaxel apparently improved DFS for ER-negative patients but not for ER-positive patients. HER2-negative, ER-positive patients comprised more than 50 percent of the patients in this study. However, the authors caution that additional prospective studies are needed to validate this finding before clinical practice changes and HER2-negative, ER-positive patients are no longer offered taxanes.

Martin, Pienkowski, Mackey, et al. (2005) compared DFS in patients randomized to AC plus docetaxel (TAC, n=745; HER2 positive, 155; HER2 negative, 475; HER2 unknown, 115) versus AC plus fluorouracil (FAC, n=746; HER2 positive, 164; HER2 negative, 468; HER2 unknown, 114). Subgroup analyses using a Cox proportional hazards model adjusted for age, tumor size and other prognostic factors showed superior outcomes with TAC compared to FAC for all subgroups, including by known HER2 status. A test for interaction of HER2 status with treatment effect, using the ratio of hazard ratios, was not statistically significant (ratio of HRs=0.85; p=.41).

Six studies on preoperative neoadjuvant chemotherapy. The primary outcome of interest for studies on neoadjuvant (preoperative) therapy is pathologic complete (pCR) and partial (PR) response rates, although clinical responses (cCR, cPR) also are considered. One randomized, controlled trial compared responses after neoadjuvant chemotherapy regimens (AC) with versus without added docetaxel (AC+D) (Learn, Yeh, McNutt, et al., 2005). Rates of cPR were similar with each regimen for HER2-positive (22 percent of each subgroup; AC, n=32; AC+D, n=9) and HER2-negative (24 percent of each subgroup; AC, n=37; AC+D, n=26) patients. Multivariate logistic regression analysis of overall clinical responses (ORR = cCR+cPR) showed a statistically significant increase with added docetaxel in HER2-negative patients (AC, ORR=51 percent; AC+D, ORR=81 percent; p<.05) but not in HER2-positive patients (AC, ORR=75 percent; AC+D, ORR=78 percent; p, NS). However, investigators did not report inclusion of an interaction term in their analysis.

Although two (Zhang, Yang, Smith, et al., 2003; Tulbah, Ibrahim, Ezzat, et al., 2002) of five uncontrolled series did report OS and/or DFS outcomes, these may have been influenced by postsurgical treatments that were not identical for all patients. Three of five series reported statistically significantly higher likelihood of response in the HER2-positive subgroups. Arriola, Moreno, Varela, et al. (2006) evaluated clinical and pathologic responses after preoperative treatment with doxorubicin alone. Although they did not report response rates for the HER2-positive (n=43) and HER2-negative (n=180) subgroups, a Mann-Whitney U test showed p=.03 for association of HER2 positivity with pCR. Park, Kim, Lim, et al. (2003) also investigated preoperative therapy with doxorubicin alone. They reported statistically significantly higher pCR (16 percent versus 0) and PR (71 percent versus 47 percent) in the HER2-positive (n=31) than the HER2-negative (n=36) subgroups, p=.013 by Fisher's exact test.

The study reported by Tinari, Lattanzio, Natoli, et al. (2006) compared marker assay results in paired core biopsy specimens (pre-chemotherapy) and resected tumors (post-chemotherapy), and focused primarily on changes induced by anthracycline-based neoadjuvant chemotherapy in HER2 and topoisomerase IIα (TopIIα) expression. However, they also used multivariate logistic regression analysis to compare pathologic tumor responses (TR, defined as either a pCR or minimal residual disease) in HER2 subgroups by core biopsy assays. Tinari and colleagues (2006) reported a 5.28-fold higher likelihood (95 percent CI: 1.57–19.6; p=.008) of achieving TR in HER2-positive than in HER2-negative patients.

Zhang, Yang, Smith, et al. (2003) investigated preoperative FAC in HER2-positive (n=28) and HER2-negative (n=69) patients. While overall clinical response rate was higher for the HER2-positive than the HER2-negative subgroup (CR+PR: 93 percent versus 78 percent), the risk ratio for response was not statistically significant (RR=1.2, 95 percent CI: 1.1–1.4, p=.14, Fisher's exact test). Overall pathologic response rates (pCR plus minimal residual disease, MRD) showed an even smaller difference between HER2-positive and HER2-negative subgroups that also was not statistically significant (18 percent versus 13 percent, RR=1.4, 95 percent CI: 0.54–3.67, p=.53, Fisher's exact test). Tulbah, Ibrahim, Ezzat, et al. (2002) investigated preoperative paclitaxel plus cisplatin in HER2-positive (n=21) and HER2-negative (n=31) subgroups. Pathologic complete response rates did not differ significantly between the groups (29 percent versus 23 percent; p=NS).
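As a rough check on the clinical response comparison from Zhang, Yang, Smith, et al. (2003), the sketch below computes a risk ratio and a two-sided Fisher's exact p value from a 2×2 table. The cell counts (26 of 28 HER2-positive and 54 of 69 HER2-negative responders) are assumptions back-calculated from the reported percentages, not counts taken from the publication, and the function names are illustrative only.

```python
from math import comb


def risk_ratio(resp_a, n_a, resp_b, n_b):
    """Ratio of response proportions, group A relative to group B."""
    return (resp_a / n_a) / (resp_b / n_b)


def fisher_exact_two_sided(resp_a, n_a, resp_b, n_b):
    """Two-sided Fisher's exact p value for a 2x2 response table.

    Sums the hypergeometric probabilities of every table (with the
    observed margins) that is no more likely than the observed table.
    """
    total_resp = resp_a + resp_b
    denom = comb(n_a + n_b, total_resp)

    def prob(x):
        # Probability that x of the total responders fall in group A.
        return comb(n_a, x) * comb(n_b, total_resp - x) / denom

    p_obs = prob(resp_a)
    lowest = max(0, total_resp - n_b)
    highest = min(n_a, total_resp)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# ASSUMED counts, inferred from 93 percent of n=28 and 78 percent of n=69.
rr = risk_ratio(26, 28, 54, 69)
p_value = fisher_exact_two_sided(26, 28, 54, 69)
print(f"RR={rr:.2f}, Fisher two-sided p={p_value:.3f}")
```

Under these assumed counts the risk ratio rounds to the reported 1.2, and the exact p value should fall close to the reported .14. The sketch does not attempt to reproduce the reported confidence interval, since the method used to derive it is not stated here.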

Three studies on chemotherapy for advanced or metastatic breast cancer. One of three studies did not compare different regimens and pooled data across arms randomized to different paclitaxel doses (Harris, Broadwater, Lin, et al., 2006); one compared monotherapy with doxorubicin (A) versus monotherapy with docetaxel (T) (Di Leo, Chan, Paesmans, et al., 2004); and one compared epirubicin plus cyclophosphamide (EC) versus epirubicin plus paclitaxel (ET) (Konecny, Thomssen, Luck, et al., 2004).

Harris, Broadwater, Lin, et al. (2006) used log rank analysis to compare Kaplan-Meier curves for OS between HER2-positive and HER2-negative patients, separately for test results by three different HER2 assays: CB11 IHC, the HercepTest™ IHC, and FISH. Differences between the curves were not statistically significant for any comparison. They also compared overall response rates (ORR=CR+PR) for subgroups defined by each HER2 assay. Results were statistically significant (HER2-positive, n=46, ORR=35 percent; HER2-negative, n=105, ORR=18 percent; p=.026) only with the HercepTest™ assay, and only when both 2+ and 3+ scores were considered HER2 positive.

Di Leo, Chan, Paesmans, et al. (2004) compared OS and time to progression (TTP) in patients randomized to A or T in HER2-positive (A, n=15; T, n=21) and HER2-negative (A, n=63; T, n=50) subgroups. There were no statistically significant differences between treatment arms for either outcome in either HER2 status subgroup. In contrast, ORR statistically significantly favored T over A in the HER2-positive subgroup (T, n=21, ORR=67 percent versus A, n=15, ORR=27 percent; OR=5.50, 95 percent CI: 1.28–23.69; p=.04). However, the difference was not statistically significant for the HER2-negative subgroup (T, n=50, ORR=40 percent versus A, n=63, ORR=35 percent; OR=1.24, 95 percent CI: 0.58–2.68; p=.70).
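The odds ratios quoted above for Di Leo, Chan, Paesmans, et al. (2004) can be illustrated with a standard Wald (log odds ratio) calculation. This is a minimal sketch, not the authors' stated method: the responder counts (14 of 21 and 4 of 15 in the HER2-positive subgroup; 20 of 50 and 22 of 63 in the HER2-negative subgroup) are assumptions back-calculated from the reported percentages, and the function name is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_wald(resp_t, n_t, resp_a, n_a, z=1.96):
    """Odds ratio for response (arm T versus arm A) with a Wald 95% CI."""
    a, b = resp_t, n_t - resp_t   # responders / non-responders, arm T
    c, d = resp_a, n_a - resp_a   # responders / non-responders, arm A
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# ASSUMED counts inferred from the reported ORR percentages.
her2_pos = odds_ratio_wald(14, 21, 4, 15)    # 67 percent vs. 27 percent
her2_neg = odds_ratio_wald(20, 50, 22, 63)   # 40 percent vs. 35 percent

for label, (or_, lo, hi) in (("HER2+", her2_pos), ("HER2-", her2_neg)):
    print(f"{label}: OR={or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

With these assumed counts the calculation gives values that agree, after rounding, with the odds ratios and confidence intervals reported in the text. The wide interval in the HER2-positive subgroup reflects the small cell counts (as few as four responders in one arm).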

Konecny, Thomssen, Luck, et al. (2004) compared HER2-positive (EC, n=49; ET, n=48) and HER2-negative (EC, n=88; ET, n=90) subgroups randomized to EC or ET for OS and PFS. With the EC regimen, OS (median, 33.1 versus 16.4 months, log rank p=.01) and PFS (median, 10.4 versus 7.1 months, log rank p=.01) were significantly greater among HER2-positive than among HER2-negative patients. In each other comparison (OS or PFS; for the ET regimen by HER2 status, or for EC versus ET separately in subgroups by HER2 status) the difference was not statistically significant. Univariate chi square tests suggested each ORR difference was statistically significant (between all HER2-positive versus all HER2-negative patients, and separately by treatment arm and HER2 status subgroups; excluding those randomized to EC by HER2 subgroups). However, the interaction of treatment effect with HER2 status was not statistically significant (p=.256) by multivariate logistic regression.

Conclusions and Discussion, Key Question 3a

Across all three treatment settings (adjuvant, neoadjuvant, advanced/metastatic), currently available evidence comparing chemotherapy outcomes in HER2-positive and HER2-negative patient subgroups may be used to generate hypotheses, but is too weak to test hypotheses. Only one study (on adjuvant therapy; Martin, Pienkowski, Mackey, et al., 2005) is from a randomized, controlled trial that prespecified a multivariate subgroup analysis by HER2 status. Investigators reported the interaction of assigned treatment (with versus without docetaxel) with HER2 status to predict outcome was not statistically significant (ratio of HRs=0.85; p=.41).

All other evidence is from post-hoc analyses on subgroups not directly randomized, selected, or stratified by HER2 status. All other reports from randomized, controlled trials were secondary or correlative analysis on patient subgroups with archived tissue samples available for HER2 testing. Many compared baseline characteristics and prognostic factors of patients with known versus unknown HER2 status, sometimes separately by treatment arm, but more often pooled across treatment arms. However, since few directly compared baseline characteristics and prognostic factors for HER2-positive and HER2-negative subgroups separately from each arm, it is uncertain whether these subgroups were well balanced. A minority of studies reported multivariate analyses that tested the statistical significance of interactions between treatment effects of different regimens and HER2 status.

Evidence on adjuvant CMF chemotherapy. Evidence from two studies (one randomized, controlled trial and one series) suggests HER2-positive patients may derive quantitatively smaller benefit from CMF (smaller improvements in OS and DFS) than experienced by HER2-negative patients. However, such evidence cannot prove that CMF provides no benefit to HER2-positive patients.

Evidence on adjuvant anthracycline therapy. An analysis from one of four randomized, controlled trials reports a statistically significant interaction between use of a regimen that includes an anthracycline and HER2 status as outcome predictors. Data from this study suggest HER2-positive patients (but not HER2-negative patients) experience a statistically significant improvement in outcome from inclusion of an anthracycline in their treatment regimen. Again, this does not prove that HER2-negative patients do not benefit from anthracycline therapy. Given the highly statistically significant result favoring anthracycline therapy for the large population of breast cancer patients included in the Early Breast Cancer Trialists' Collaborative Group (EBCTCG 2005) overview analysis, a more complete test of this hypothesis is needed before one can conclude that omitting anthracyclines from adjuvant chemotherapy regimens does not worsen outcome in HER2-negative patients. The absence of a statistically significant interaction in three other randomized, controlled trials is not informative, given the differences in specific treatment regimens, populations studied, and small numbers in the HER2-positive subgroups.

Two trials compared different doses or dose intensities (frequencies) of anthracycline-based regimens. One (Dressler, Berry, Broadwater, et al., 2005) reported a statistically significant interaction of CAF dose with HER2 status to predict treatment outcome, whether HER2 status was based on FISH, IHC, or PCR assays. Data from this study suggested the highest of three CAF doses (now considered by many oncologists the standard dose for all patients) improved outcomes for HER2-positive patients, but suggested no benefit from the highest dose for HER2-negative patients. In contrast, the interaction of dose intensity (frequency) with HER2 status to predict treatment outcome was not statistically significant in a second randomized, controlled trial (Del Mastro, Bruzzi, Nicolo, et al., 2005). Available data are too weak to conclude that HER2-positive patients clearly experience better outcomes with the higher-dose or dose-intensity anthracycline-based regimens.

Evidence on adding paclitaxel to adjuvant AC chemotherapy. A correlative analysis from one randomized, controlled trial (Hayes, Thor, Dressler, et al., 2007) provides evidence that adding paclitaxel after AC improves OS and DFS for HER2-positive patients, but may not improve these outcomes for HER2-negative patients. Here again, these strongly suggestive data are too weak by themselves to conclude that use of paclitaxel in adjuvant regimens is not beneficial in HER2-negative patients. Additionally, the only trial with a prespecified multivariate subgroup analysis (Martin, Pienkowski, Mackey, et al., 2005) reported that the interaction of concurrently added docetaxel with HER2 status was not statistically significant.

The potential interaction between HER2 status, estrogen receptor status, and progesterone receptor status as predictors of chemotherapy efficacy is receiving increasing attention. The Hayes, Thor, Dressler, et al. (2007) article is the only included study on chemotherapy for breast cancer that addresses this issue, although the analysis only includes HER2 status and ER status. In an exploratory analysis, the authors found that adding paclitaxel improved survival for all HER2-positive patients and for HER2-negative/ER-negative patients, but not for HER2-negative/ER-positive patients. As discussed in the Conclusions and Discussion for Chapter 2, many researchers are investigating breast cancer subtypes identified by different combinations of ER, PR, and HER2, including the so-called “triple-negative” subtype (i.e., negative for HER2, estrogen receptor, and progesterone receptor), and the luminal subtypes (luminal A or luminal B) that are negative for HER2 but positive for at least one of the hormone receptors. There is evidence that the triple negative and luminal subsets differ with respect to prognosis, chemotherapy response, and outcomes (Carey, Dees, Sawyer et al., 2007; Liedtke, Mazouni, Hess et al., 2008), and they clearly differ with respect to effects of endocrine therapy. New phase III trials for patients with triple negative or “basal-like” breast cancer (Kilburn, 2008) should provide more insight in the future.

Systematic reviews on adjuvant chemotherapy. Recent systematic reviews and meta-analyses on HER2 status to predict chemotherapy outcomes were reported by Gennari and colleagues (Gennari, Sormani, Pronzato, et al., 2008) and by Pritchard and colleagues (Pritchard, Messersmith, Elavathil, et al., 2008; Dhesy-Thind, Pritchard, Messersmith, et al., 2008). Gennari and co-workers (2008) pooled data from eight randomized trials that compared adjuvant regimens with versus without an anthracycline (four of which did not meet selection criteria for this review). Two (NSABP B11, Paik, Bryant, Park, et al., 1998; NSABP B15, Paik, Bryant, Tan-Chiu, et al., 2000) considered patients HER2-positive if membranes of any tumor cells showed antibody staining by IHC, a threshold for HER2 positivity inconsistent with the ASCO/CAP and NCCN guidelines. Substantial numbers of patients from these early (but otherwise well done) randomized, controlled trials may have been classified as HER2 positive who would now be classified as HER2 negative using the currently recommended thresholds. Thus, pooling data from these analyses with later analyses that used current IHC scoring criteria to classify patients may potentially bias the outcome comparisons. We excluded a third study included by Gennari and colleagues (2008) since it was only published as an abstract, without slides available on the web (De Laurentiis, Caputo, Massarelli, et al., 2001). We excluded a fourth study they included (Di Leo, Gancberg, Larsimont, et al., 2002), since patients were not treated identically within each arm and patients with unknown hormone receptor status were given tamoxifen. We replicated the Gennari, Sormani, Pronzato, et al. (2008) meta-analysis using the same studies the authors included and obtained the same results. We then repeated the analysis including only the studies meeting criteria for the current review, which meant excluding the four studies mentioned above. Removing these studies widened the confidence intervals, but did not alter the overall conclusions.

The systematic reviews and meta-analyses reported by Pritchard and colleagues (Pritchard, Messersmith, Elavathil, et al., 2008; Dhesy-Thind, Pritchard, Messersmith, et al., 2008) also included randomized, controlled trials that did not meet selection criteria for this review. In addition to the four discussed above, we excluded three trials on anthracycline-based regimens that were reported only as meeting abstracts but without slides, audio or video available on the web to provide full access to presented data (Petruzelka, Pribylova, Vedralova, et al., 2000; Vera, Albanell, Lirola, et al., 1999; Arnould, Fargeot, Bonneterre, et al., 2003; Bonneterre, Roche, Kerbrat, et al., 2003). We also excluded one fully published study in which patients were not treated identically within each arm (Di Leo, Larsimont, Gancberg, et al., 2001) and a second fully published study on high-dose chemotherapy with autologous stem-cell transplant that did not report data by HER2 status separately for the conventional-dose arm (Rodenhuis, Bontenbal, van Hoesel, et al., 2006).

The Gennari and co-workers (2008) meta-analysis reports statistically significant improvement in DFS (six trials included) and OS (seven trials included) of HER2-positive patients given an anthracycline compared to the same outcomes for HER2-positive patients not given an anthracycline (HR for relapse=0.71, 95 percent CI: 0.61–0.83; p<.001; HR for death=0.73, 95 percent CI: 0.62–0.85; p<.001). In contrast, including an anthracycline apparently did not statistically significantly improve DFS or OS for patients with HER2-negative disease (HR for relapse=1.00, 95 percent CI: 0.90–1.11; p=.75; HR for death=1.03, 95 percent CI: 0.92–1.16; p=.60). The meta-analysis reported by Pritchard and co-workers (2008) included the same six trials for DFS and the same seven trials for OS, and reported identical pooled results (hazard ratios, confidence intervals) as those reported by Gennari and co-workers (2008). These analyses support the need for more definitive tests of the hypothesis that the balance of potential benefit versus harm of anthracyclines in HER2-negative patients may not justify their use. Furthermore, as discussed in Key Question 2 and in this section, future analyses and new studies should probably subdivide the HER2 negative group, and analyze subsets who are triple-negative (or “basal-like”) separately from those who are positive for one or both hormone receptors (luminal A or B).
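One common way to express the contrast between two subgroup hazard ratios is a ratio of hazard ratios with an approximate z test. The sketch below is a minimal illustration under strong assumptions: it back-calculates each standard error from the reported 95 percent CI, treats the two subgroup estimates as independent, and relies on a normal approximation on the log scale. It is not the method used by Gennari and co-workers or by Pritchard and co-workers, and the function names are hypothetical.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.959964  # two-sided 95 percent quantile of the standard normal


def se_from_ci(lower, upper):
    """Standard error of log(HR), back-calculated from a 95% CI."""
    return (log(upper) - log(lower)) / (2 * Z_95)


def ratio_of_hazard_ratios(hr_a, ci_a, hr_b, ci_b):
    """Ratio of two subgroup HRs with an approximate z statistic and p value.

    ASSUMPTION: the two subgroup estimates are independent.
    """
    diff = log(hr_a) - log(hr_b)
    se_diff = sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se_diff
    p_two_sided = erfc(abs(z) / sqrt(2))
    return exp(diff), z, p_two_sided


# Pooled HRs for relapse quoted in the text (HER2-positive, HER2-negative).
ratio, z, p = ratio_of_hazard_ratios(0.71, (0.61, 0.83), 1.00, (0.90, 1.11))
print(f"ratio of HRs={ratio:.2f}, z={z:.2f}, p={p:.4f}")
```

Any p value produced this way is only a rough screen of the published summary estimates. A formal test of interaction requires the trial-level data and the model actually fitted by the investigators.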

Pritchard, Messersmith, Elavathil, et al. (2008) also reported a meta-analysis on DFS that included three randomized, controlled trials comparing higher-dose or intensity versus lower-dose or intensity anthracycline regimens: two are included here (Dressler, Berry, Broadwater, et al., 2005; Del Mastro, Bruzzi, Nicolo, et al., 2005), and one we excluded (Di Leo, Larsimont, Gancberg, et al., 2001). They found significant improvement of DFS at higher doses for HER2-positive patients (HR=0.54; 95 percent CI: 0.38–0.79) but not for HER2-negative patients (HR=0.98; 95 percent CI: 0.78–1.22). However, a test for the interaction of anthracycline regimen dose or dose intensity with HER2 status to predict DFS was not statistically significant. Thus, present evidence is too weak to support conclusions about HER2 status as a sole predictor of differences in outcome between higher- and lower-dose anthracycline-based regimens. Longer-term data on potential toxicities (particularly decreased ejection fraction and congestive heart failure) of the higher doses are also needed.

Pritchard, Messersmith, Elavathil, et al. (2008) reported on a final meta-analysis that pooled results on DFS from two randomized, controlled trials on adjuvant therapy (Hayes, Thor, Dressler, et al., 2007; Martin, Pienkowski, Mackey, et al., 2005) and one on neoadjuvant therapy (Learn, Yeh, McNutt, et al., 2005) that compared taxane-containing versus non-taxane-containing regimens. While all three trials were included in this systematic review, the validity of pooling them for meta-analysis seems uncertain. Postsurgical therapy in the Learn, Yeh, McNutt, et al. (2005) trial may have affected DFS and may not have been uniform in all three arms. The meta-analytic results suggest the magnitude of benefit from including a taxane in the regimen may be greater for HER2-positive patients (HR=0.60; 95 percent CI: 0.46–0.78) than for HER2-negative patients (HR=0.83; 95 percent CI: 0.71–0.98). However, these results also show statistically significant evidence of benefit for each group from including a taxane in the regimen. Thus, the evidence is presently too weak to support conclusions on HER2 status as a sole predictor of whether or not any subgroup of breast cancer patients benefits from taxane therapy.

These meta-analyses were thorough and used appropriate methodologies. The difference in the trials included in the meta-analyses versus the current systematic review is due to varying prespecified inclusion and exclusion criteria, which are a matter of opinion. The main concern regarding the meta-analyses is their relevance to current practice. The current ASCO/CAP guidelines recommend a different approach to measuring HER2 status than used in the trials incorporated into the meta-analyses, which is why we chose not to perform a formal meta-analysis. Whether and how the change in measurement of HER2 status alters the results of the trials and meta-analyses is unknown since necessary data are unavailable.

Evidence on neoadjuvant chemotherapy. Available evidence on whether HER2 status affects rates of complete pathologic response (pCR) to neoadjuvant chemotherapy is limited to four uncontrolled series (retrospective analysis in three). Although two of four reported statistically significantly higher pCR rates in HER2-positive than HER2-negative patients, these data are too weak to conclude that the regimens tested are of no benefit to HER2-negative patients. Furthermore, data are lacking to directly compare any neoadjuvant regimens. Since a number of trials have already compared different neoadjuvant therapies, correlative studies using archived tissue samples may be useful. However, it is also possible that conclusions on relative benefits of different regimens from studies in the adjuvant setting may generalize to the neoadjuvant setting.

Evidence on chemotherapy for advanced disease. Evidence also is limited on differences by HER2 status for outcomes of chemotherapy for advanced or metastatic disease. Three randomized, controlled trials investigated different treatments: one studied paclitaxel alone (at different doses), one studied an anthracycline alone versus a taxane alone, and one studied an anthracycline plus cyclophosphamide versus an anthracycline plus a taxane. Small patient groups limited statistical power.

In summary, although present evidence is suggestive, it is too weak to determine, in the adjuvant, neoadjuvant, or metastatic disease setting, whether a more favorable balance of benefit versus risk from chemotherapy can be achieved by selecting patients for anthracycline- or taxane-based regimens based on HER2 status.

Research needs. Future trials that compare adjuvant chemotherapy regimens with versus without an anthracycline, or with versus without a taxane, could determine HER2 status at the time of diagnosis, and stratify randomization by HER2 assay results. This approach might provide more definitive tests for the hypotheses that neither an anthracycline nor a taxane improves outcomes of HER2-negative patients. Another possibility is for the EBCTCG to collect individual patient data on HER2 status using current scoring thresholds from all trials that compared adjuvant regimens with versus without an anthracycline, or with versus without a taxane. If sufficient tumor samples are available, this might be a more efficient and more definitive approach for testing hypotheses on the interaction of HER2 status with assigned treatment to predict outcome. Future analyses should also obtain more complete information on estrogen and progesterone receptor status of all patients. This would enable investigators to further subdivide the HER2-negative subset, so that triple-negatives (or those with “basal-like” breast cancer if gene array data were obtained) can be analyzed separately from the luminal A and B subtypes.

Key Question 3b

For breast cancer patients, what is the evidence on clinical benefits and harms of using HER2 assay results to guide selection of hormonal therapy?

Study Selection

Of the 219 articles retrieved for Question 3, 66 were assessed for potential relevance to Question 3b. Only six articles met the selection criteria. The primary reasons for article exclusion were: not reporting outcomes identified in selection criteria; not reporting outcomes by HER2 status; nonidentical treatment of patients; measurement of HER2 status inconsistent with current specialty society recommendations; lack of primary data; or inclusion of only HER2-positive patients, only HER2-negative patients, or fewer than 20 HER2-positive cases.

Two of the studies that did not meet the selection criteria were by Berry, Muss, Thor, et al. (2000) and by Ellis, Coop, Singh, et al. (2001). The first uses data from the CALGB 8541 trial, and data from this trial are included in the previous section on chemotherapy for breast cancer. It is excluded here because, while the chemotherapy regimens were randomized across patients, the use of tamoxifen was not. Rather, tamoxifen was prescribed based on clinician preferences. Tamoxifen use increased over time, both after recommendations for its use in ER-positive, postmenopausal women were released during the course of the trial and as the percentage of postmenopausal women recruited rose. Although the study by Ellis, Coop, Singh, et al. (2001) on the neoadjuvant use of letrozole versus tamoxifen reportedly affected clinical practice, it is excluded from this systematic review for two reasons: it reported on clinical response (breast palpation) rather than the more definitive pathological response, and it used a broader definition of HER2 positivity (IHC scores of 2+ and 3+ were designated as positive, without any further evaluation of IHC 2+ scores using FISH).

Table 15

Hierarchy of evidence, KQ3b
Level of Evidence | Study | n | Setting | Treatments | Outcome | Results
RCT stratified on HER2 status/HER2-guided vs. non-HER2-guided | | | | | |
RCT prespecified MV SGA | | | | | |
RCT post-hoc MV SGA | Von Minckwitz 2007 | 194 | Neoadjuvant, tumor ≥3 cm, age 18–70 | doxorubicin + docetaxel ± tamoxifen (TAM) | pCR | Univariate: Not reported; Log reg: IHC HER2 as predictor of pCR p=0.126; HER2*TAM not reported
 | Rasmussen 2008 | 3533 | Adjuvant, postmen, HR+ | tamoxifen vs. letrozole | DFS | Univariate: FISH/IHC HER2+ vs. HER2- p<.0001; Cox (a): FISH/IHC HER2*Tx, p=.60
 | | | | | TTR | Cox: FISH/IHC HER2*Tx NS
 | Dowsett 2008 | 1782 | Adjuvant, postmen, HR+ | tamoxifen vs. anastrozole (ANA) for 5 yrs | TETR | Univariate: FISH/IHC HER2 - vs. + ANA: p<0.0001, TAM: p=.002; Cox: FISH/IHC HER2 - vs. + ANA: p<0.001, TAM: p=.014; “no indication” of greater differential benefit of ANA vs. TAM but no statistics provided and only 44 HER2+ pts so CIs wide
 | Ryden 2005, 2007 | 470 | Adjuvant, Stage II, premen or <50 years | tamoxifen vs. observation | RFS | Univariate: IHC HER2- Tx vs. Cx p=.07 (ER+); IHC HER2+ Tx vs. Cx p=.2 (ER+); FISH HER2- or HER2+ Tx vs. Cx p=.14 (ER+); Cox regression: IHC HER2*TAM p=.4 (ER+); p=.3 (ER+/PR+); FISH/IHC HER2*TAM p=.95 (unclear if ER+ only)
 | Knoop 2001 | 1515 | Adjuvant, high risk, postmen | tamoxifen vs. observation | DFS | Univariate: IHC HER2 lo+ or - Tx vs. Cx p=.0001 (HR+); IHC HER2 hi+ Tx vs. Cx p=.5 (HR+); Cox regression: IHC HER2 and HER2*TAM not significant (HR+)
RCT treatment by HER2 SGA | | | | | |
1-arm prespecified MV analysis | | | | | |
1-arm post-hoc MV analysis | Arpino 2004 | 136 | Metastatic, 1st line Tx | tamoxifen | ORR | No stat signif diff FISH HER2 + vs. - in CR+PR+SD
 | | | | | TTF | Univariate: FISH HER2 - vs. + p=.007; Cox regression: HER2+ as predictor TTF p=.54
 | | | | | OS | Univariate: FISH HER2 - vs. + p=.07; Cox regression: HER2+ as predictor OS p=.97
1-arm UV analysis | | | | | |

(a) Stratified for randomization group and chemotherapy

Abbreviations: DFS: disease-free survival; HR: hazard ratio; MV: multivariate; ORR: overall response rate; OS: overall survival; pCR: pathologic complete response; RCT: randomized, controlled trial; RFS: recurrence-free survival; SGA: subgroup analysis; TETR: time to early tumor recurrence; TTF: time to treatment failure; TTR: time to tumor recurrence; Tx: treatment; UV: univariate analysis;

Table 16

Summary study quality assessment, KQ3b
Study | Prospective design | Prespecified hypotheses about relation of marker to outcome | Large, well-defined, representative study population | Marker assay methods well-described | Blinded assessment of marker in relation to outcome | Homogeneous treatment(s), either randomized or rule-based selection | Low rate of missing data (≤15%) | Sufficiently long followup | 1) Clear candidate variable selection | 2) Clear, appropriate model-building guidelines | 3) Assumptions tested | 4) Standard prognostic variables included | 5) Continuous variables well handled | 6) Validation
von Minckwitz et al., 2007 | ? | N | Y | Y | ? | Y | N | Y | Y | N | ? | ? | ? | N
Rasmussen et al., 2008, Mauriac et al., 2007 | N | Y | Y | Y | Not reported | Y | N | 51 mos/24 mos | ? | ? | ? | Y | ? | N
Dowsett et al., 2008 | N | N | Y | Y | ? | Y | N | Y | Y | ? | ? | ? | ? | N
Ryden et al., 2005, 2007 | Y | ? | ? | N | ? | Y | N | Med=14 yrs if no breast event | N | N | Y | ? | N | ?
Knoop et al., 2001 | Y | N | Y | N | ? | Y | Y | ? | N | N | ? | Y | N | ?
Arpino et al., 2004 | Y | N | Y | Y | ? | Y | N | ? | Y | ? | ? | N | N | ?

Four of the six studies that met selection criteria investigated outcomes of tamoxifen, while the other two compared an aromatase inhibitor (letrozole or anastrozole) with tamoxifen (Tables 15 and 16). No studies on other selective estrogen receptor modulators met selection criteria. Five of the studies were secondary analyses by HER2 status of randomized, controlled trials, while the sixth was a prospective, uncontrolled series. One of the secondary analyses addressed neoadjuvant therapy; four focused on adjuvant therapy; and the uncontrolled series reported on metastatic disease. None of these studies used trastuzumab for HER2-positive patients; studies addressing the use of trastuzumab were reviewed in Chapter 2.

Table 17

Summary design, enrollment and treatment, KQ3b
StudyTherapeutic SettingTreatment(s)AgeExtent of DiseasePerformance Status Scale Index ResultHormone Receptor Status (%)
Neoadjuvant Hormonal Therapy
von Minckwitz et al., 2007, Germany, multicenter RCT, secondary analysisPrimary breast carcinoma ≥ 3 cm largest diameter; no distant metastases, age 18–70Doxorubicin + docetaxel ± tamoxifen (TAM); followed by surgery within 14–28 daysMedian=48 Range=27–67 Premen: 51%(TAM-) 57%(TAM+)Positive nodes: 47% (TAM-) 53% (TAM+)Karnofsky score ≥70% 100% ≥90% 96.3%ER+ 59.2 (TAM-) 53.1 (TAM+) PR+ 43.9 (TAM-) 34.7 (TAM+)
Adjuvant Hormonal Therapy
Rasmussen et al., 2008, Mauriac et al., 2007, international, multicenter RCT, secondary analysisPostmenopausal women with HR+, early invasive breast cancer, in monotherapy arms of BIG 1–98 trialLetrozole (LET) vs. TAM; 44%–54% had mastectomy; 21%–32% had chemotherapyMedian=~60Positive lymph nodes: 42–47%Not reportedMedian ER=85–90 Median PR=10–70
Dowsett et al., 2008, international, multicenter RCT, secondary analysisPostmenopausal women with operable, invasive breast cancer HR+, in monotherapy arms of ATAC trial. Most from UK.Anastrozole (ANA) vs. TAM for 5 years; mastectomy, 41%; chemotherapy, 9%; TAM presurgery, 3%Median=63Positive lymph nodes: 30%Not reportedPr+, 78%
Ryden et al., 2005, multicenter RCT, secondary analysis Stage II, premenopausal/<50 yrsTAM for 2 yrs vs. control; mastectomy or breast-conserving surgery + radiotherapy; <2% pts received adjuvant chemotherapyMedian=45 Range=2–75~70% are node positive; tumor 25 in TAM group vs. 22 in control (p=0.03)Not reportedTAMCxER-
PR-3026
PR+810
ER+
PR-45
PR+5457
P=0.6
Not done42
Knoop et al., 2001, Denmark, multicenter RCT, secondary analysis Postmenopausal, “high risk”Grp 1: TAM 10 mg 3×/day for 1 year (n=868) + radiotherapy Grp 2: Radiotherapy (n=848)Median=66, Range=45–88High risk=positive axillary lymph nodes, tumor >5 cm, or tumor invaded skin or deep fasciaNot reportedER+66% (11% HER2+)
PR+43%
(7% HER2+)
Metastatic Hormonal Therapy
Arpino et al., 2004, multicenter, US? PROFirst line, ER+TAM 2×/day, 10 mg (n=56) or 10 mg/m2 (n=149).HER2+: 66%<65 yo; 16% premenNot reportedNot reportedHer2+Her2-
HER2-: 57%<65yo; 12% premenER+100100
PR+7896

Table 18

Summary time to event outcomes, KQ3b
StudyTime to Event Outcomes
Neoadjuvant Hormonal Therapy
von Minckwitz et al., 2007Not reported
Adjuvant Hormonal Therapy
Rasmussen et al., 2008, Mauriac et al., 2007OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yrTestpHR (95%CI)Comments HER2+ vs. HER2- (any Tx): HR=2.09(1.59–2.76) p<0.0001.
DFSHER2+LET vs. TAM
All239~0.95~0.87~0.82~0.75HER2+
LET134~0.97~0.90~0.86~0.790.62(0.37–1.03)
TAM105~0.94~0.84~0.75~0.70
HER2-
All3,294~0.98~0.96~0.91~0.88HER2-
LET1,648~0.98~0.97~0.95~0.900.72(0.59–0.87)
TAM1,646~0.98~0.95~0.90~0.86
Dowsett et al., 2008OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments “[N]o indication of a greater differential benefit of anastrozole over tamoxifen in the HER-2-positive patients. However, there were only 44 events in the HER-2-positive group, so the CIs are wide.”
TTRTAM837
HER2-~0.01~0.04~0.06~0.08~0.090.00182.25
HER2+~0.01~0.06~0.10~0.15~0.25
ANA875
HER2-~0.01~0.02~0.04~0.05~0.06<0.00013.27
HER2+~0.01~0.08~0.12~0.16~0.20
Ryden et al., 2005, 2007 ER+ pts OutcomeGrpNMed (mos)5 yr10yr15 yrTestpHR (95%CI)Comments No stat diff in RFS between HER2+and HER2- pts (measured by IHC or FISH) among untreated pts. VEGFR2 status was predictive of TAM efficacy. Using the combined HER2 measure, there was a TAM effect in the ER+/HER2- group (n=275; HR=0.64, 95%CI: 0.44–0.93, p=0.02), but not in the ER+/HER2+cohort (n=24; HR=0.71, 95%CI: 0.23–2.20, p=0.6).
RFSHER2+ (IHC 3+)
Tx8~0.7~0.7~0.7LR0.20.38 (0.08–1.79)
Cx13~0.4~0.4~0.4
RFSHER2- (IHC 0–2+)
Tx115~0.75~0.7~0.65LR0.070.69(0.46–1.03)
Cx124~0.7~0.6~0.55
RFSHER2+ (FISH)
TxData not reportedLR0.140.21 (0.03–1.67)
CxData not reported
RFSHER2- (FISH)
TxData not reportedLR0.140.73(0.47–1.12)
CxData not reported
Knoop et al., 2001 ER+ or PR+ pts only OutcomeGrpNMed (mos)5 yr10 yrTestpHR (95%CI)Comments
DFS:
HER2 -TAMNot reported57(2)34(2)LR.0001Bonferroni p=.0006
& low + (n=1,005)Cx43(2)26(2)
HER2 hi + (n=52)TAMNot reported63(11)37(12)LR.5Bonferroni p=.5
Cx41(9)35(8)
CoxHER2+ (n=54):  RR TAM vs. Cx=0.89  (95%CI:0.63–1.27)
HER2- (n=998):  RR TAM vs. Cx=0.86  (95%CI:0.78–0.93)
MV CoxHER2 and HER2*TAM: Not significant (p values not reported)
NOTE: Analysis limited to steroid-receptor positive pts. Standard errors in parentheses. LR=log-rank test of differences in DFS probabilities for pts with the variables in question when treated with TAM or not.
Metastatic Hormonal Therapy
Arpino et al., 2004OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yr10yrTestpHR (95%CI)Comments HER2+ pts had lower median ER levels, even when all pts ER+
TTFHER2-1047~.35~.18~.08~.0800LR.007
HER2+325~.20~.03No lineNo lineNo line
OSHER2-10431~.85~.60~.50~.25~.20~.05LR.07
HER2+3225~.90~.52~.30~.20~.08~.05
HER2+ as predictor of TTFMV Cox0.541.15adjusted
HER2+ as predictor of survivalMV Cox0.970.99adjusted

Abbreviations: ANA: anastrozole; Cx: control; DFS: disease-free survival; HR: hazard ratio; LET: letrozole; LR: log rank; MV: multivariate; OS: overall survival; RR: relative risk; TAM: tamoxifen; TTF: time to treatment failure; TTR: time to tumor recurrence; Tx: treatment;

Table 19

Summary tumor response and quality of life, KQ3b
Study | Tumor Response (%)
Neoadjuvant Hormonal Therapy
von Minckwitz et al., 2007:
Group | ALL (pCR), HER2+ | ALL (pCR), HER2- | ER+ (pCR), HER2+ | ER+ (pCR), HER2- | ER- (pCR), HER2+ | ER- (pCR), HER2-
TAM+ | 0% | 10.7% | 0% | 0% | 0% | 24.2%
TAM- | 8.7% | 9.6% | 9.1% | 2.2% | 8.3% | 21.4%
Adjuvant Hormonal Therapy
Rasmussen et al., 2008, Mauriac et al., 2007 | Not reported
Dowsett et al., 2008 | Not reported
Ryden et al., 2005 | Not reported
Knoop et al., 2001 | Not reported
Metastatic Hormonal Therapy
Arpino et al., 2004:
Grp | N | cCR+cPR+cSD | PD | NE | Test | p | Comments
HER2- | 104 | 56% | 44% | | χ2 | NS |
HER2+ | 32 | 47% | 53% | | | |

Abbreviations: cCR: clinical complete response; cPR: clinical partial response; cSD: clinical stable disease; ER: estrogen receptor; NE: not evaluable; NS: not significant; PD: progressive disease; SD: stable disease

Table IIIb-A

Design, Enrollment and Treatment
StudyDesignTherapeutic Settingn, Enrolled (Randomized)n, Evaluatedn, Withdrawn (Lost to F/U)Treatment Regimen (Agents)
Neoadjuvant Hormonal Therapy
None
Adjuvant Hormonal Therapy
Ryden et al 2005, multicenter, Sweden, 1986-1991RCT (secondary analysis)Stage II invasive cancer, premenopausal or <50 years old. Includes HR+ and HR-564428136 (64, no specimens; 72, not assessable by IHC). Another 55 not assessable by FISH. Baseline prognostic factors similar in groups with or without specimens.TAM for 2 yrs vs no TAM; also mastectomy or breast-conserving surgery + radiotherapy <2% pts received additional adjuvant chemotherapy (polychemotherapy, n=8; goserelin, n=1). Evenly distributed across arms.
Knoop et al. 2001, 27 sites, all in Denmark?, 8/77–11/82RCT (secondary analysis of Danish Breast Cancer Cooperative Group's 77c protocol)Adjuvant, postmenopausal, “high risk” (positive axillary lymph nodes, tumor > 5 cm, skin or deep fascia involvement) (Note: eligibility did not depend on hormone receptor status)17161515201 (167, no specimens; 33, unevaluable; 1, unaccounted for. Baseline prognostic factors & outcomes similar in groups with or without specimens.)TAM thrice daily for 1 year vs observation. All patients treated with mastectomy, lower ALND, and radiotherapy. 8% of the TAM pts were HER2-positive, vs. 14% of the observation arm (p=0.001) (per email from Dr. Knoop)
Metastatic Hormonal Therapy
Arpino et al. 2004, multicenter, US?, 1982-1987Prospective uncontrolled, SWOG protocol 8228 and ancillary study 9314First-line Tx for metastatic disease, ER+; prior adjuvant TAM or chemo completed > 3 mos before relapse349136213 (134, no specimens; 7, inevaluable specimens; 4, lost to F/U, 68, assays unsuccessful)TAM twice daily until disease progression (failure), 10 mg (n=56) or 10 mg/m2 (n=149)

Table IIIb-K

Case Series/Single Arm Trial Study Quality Ratings
StudyClearly Defined QuestionWell-Described Study PopulationWell-Described InterventionUse of Validated Outcome Measures (Independently Assessed)Appropriate Statistical AnalysisWell-Described ResultsDiscussion/Conclusions Supported by DataFunding/Sponsorship Source Acknowledged
Hormonal Therapy
Chemotherapy
The neoadjuvant study (von Minckwitz, Sinn, Raab, et al., 2007) was a secondary analysis of a randomized trial comparing a chemotherapy regimen (doxorubicin and docetaxel) with or without the addition of tamoxifen. The four secondary analyses of randomized trials of adjuvant therapy included comparisons of (1) letrozole versus tamoxifen (Rasmussen, Regan, Lykkesfeldt, et al., 2008; Mauriac, Keshaviah, Debled, et al., 2007); (2) anastrozole versus tamoxifen (Dowsett, Allred, Knox, et al., 2008); (3) tamoxifen plus radiotherapy versus radiotherapy alone (Knoop, Bentzen, Nielsen, et al., 2001); and (4) tamoxifen versus no tamoxifen following mastectomy or breast-conserving surgery plus radiotherapy (Ryden, Jirstrom, Bendahl, et al., 2005). The study of metastatic disease (Arpino, Green, Allred, et al., 2004) was a prospective, uncontrolled series of HER2-positive or HER2-negative patients given tamoxifen. Study hierarchy, quality assessment, summary descriptions, and results are summarized in Tables 15–19; detailed abstraction data can be found in Appendix Tables IIIb-A to IIIb-K.*

Patient Characteristics

Patients in the von Minckwitz, Sinn, Raab, et al. (2007) neoadjuvant trial had unilateral primary breast carcinoma at least 3 cm in largest diameter with no distant metastases or inflammatory disease. They comprised 194 of the 250 patients in the GEPARDO [German Preoperative Adriamycin-Docetaxel] trial. The median age was 48 years, and 51 percent (control [Cx] group) to 57 percent (tamoxifen [TAM] group) were premenopausal. Forty-seven percent (Cx) to 53 percent (TAM) had clinically positive lymph nodes, and all had a Karnofsky score of at least 70 percent. For hormone-receptor status, 53 percent (TAM) to 59 percent (Cx) were ER positive, while 35 percent (TAM) to 44 percent (Cx) were PR positive. HER2 status was measured centrally using IHC, and a HercepTest™ score of 3+ was considered positive. About 24 percent of the participants were HER2 positive.

Patients in the Rasmussen, Regan, Lykkesfeldt, et al. (2008) study comprised 3,533 of the 4,922 patients in the monotherapy arms of the BIG 1–98 trial. They were postmenopausal with early stage invasive cancer. The median age was around 60 years, and about 37 percent (HER2-negative patients) to 45 percent (HER2-positive patients) had tumors larger than 2 cm. Fewer than half had positive lymph nodes (42 percent for HER2-negative patients; 47 percent for HER2-positive patients). The median estrogen receptor level was 85 for HER2-positive patients and 90 for HER2-negative patients (p<0.0001), while the median progesterone receptor level was 10 in HER2-positive patients and 70 in HER2-negative patients (p<0.0001). HER2 positivity was defined as amplification by FISH or HercepTest™ 3+ by IHC (in 0.5 percent of patients with no FISH result). Seven percent of the patient population was HER2 positive.

Patients in the Dowsett, Allred, Knox, et al. (2008) study comprised 1,782 of the 5,880 patients in the monotherapy arms of the ATAC trial; most were from the United Kingdom. Sixty-seven percent of the patients had prior radiotherapy; 9 percent, prior chemotherapy; and 3 percent, tamoxifen prior to surgery. The median age was 63 years, and all of the women were postmenopausal. Sixty-seven percent had tumors that were no larger than 2 cm; 66 percent had negative lymph nodes; and all were hormone receptor positive (78 percent were PR positive). HER2 positivity was defined by a score of 3+ on IHC or 2+ on IHC plus FISH amplification. Ten percent of the patients in the study were HER2 positive.
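
The positivity rule described for this study can be written as a small function. The following is a minimal sketch of the rule as stated in the text, not code from the study; the function and parameter names are hypothetical, and the handling of an IHC 2+ case with no FISH result is an assumption.

```python
from typing import Optional

def her2_positive(ihc_score: int, fish_amplified: Optional[bool] = None) -> bool:
    """Classify HER2 status from an IHC score and an optional FISH result.

    Rule as described in the text: IHC 3+ is positive; IHC 2+ is positive
    only when FISH shows amplification; IHC 0 or 1+ is negative.
    Assumption: an IHC 2+ case with no FISH result is treated as negative.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2, or 3")
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        return fish_amplified is True
    return False

# Hypothetical cases, for illustration only.
for ihc, fish in [(3, None), (2, True), (2, False), (1, True)]:
    print(ihc, fish, her2_positive(ihc, fish))
```

Writing the rule out makes the contrast with a broader definition explicit: a rule that counted every IHC 2+ case as positive, without FISH confirmation, would classify the third case above differently.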

Patients in the Knoop, Bentzen, Nielsen, et al. (2001) adjuvant study were postmenopausal with a median age of 66 years. They had a high risk of recurrence, defined as having positive axillary lymph node(s), tumor larger than 5 cm diameter, or skin/deep fascial involvement. Sixty-six percent of the patients were estrogen-receptor (ER) positive, and 43 percent, progesterone-receptor (PR) positive.

In the original randomized, controlled trial, the Danish Breast Cancer Cooperative Group's 77c protocol, patients were randomized to receive tamoxifen three times daily for a year or to observation. All patients were also treated with mastectomy, lower axillary lymph node dissection, and radiotherapy. In the secondary analysis, data on HER2 status were available on a subset (n=1,515, 88 percent) of those in the original trial. Eighteen percent of these patients were HER2 positive by IHC, but approximately 11 percent had IHC results roughly comparable to a 3+ score by HercepTest™*. However, the proportions of HER2-positive patients differed between the arms of the trial: 8 percent of patients in the tamoxifen arm were HER2 positive, while 14 percent of those in the control arm were HER2 positive (p=0.001).

Patients in the Ryden, Jirstrom, Bendahl, et al. (2005) and Ryden, Landberg, Stal, et al. (2007) adjuvant trial had Stage II invasive cancer and included 470 of the 564 patients in the original trial. The median age was 45 years, and all were premenopausal or younger than 50 years old. The median tumor size ranged from 22 mm in the control group to 25 mm in the tamoxifen group. Both hormone-receptor-positive and hormone-receptor-negative patients were included. Fifty-four percent of patients in the tamoxifen group and 57 percent of patients in the control group were both ER positive and PR positive; 30 percent and 26 percent, respectively, were both ER negative and PR negative; the remainder were either ER negative/PR positive or ER positive/PR negative. Approximately 70 percent of the patients had positive lymph nodes. Patients were randomized to tamoxifen for two years versus no tamoxifen. Patients also underwent mastectomy or breast-conserving surgery plus radiotherapy. Less than 2 percent of patients, evenly distributed across arms in the original trial, received additional chemotherapy (n=8) or goserelin (n=1).

Data on HER2 status were available on 428 patients, or 76 percent of the original trial participants. The authors reported that baseline prognostic factors were similar in the groups with and without archived pathological specimens available for the secondary analysis. HER2 status was measured by FISH, using a cutoff of six signals/tumor cell (13 percent of patients were HER2 positive) and by IHC using a cutoff of 3+ on the HercepTest™ (15 percent were HER2 positive). The correlation between IHC 3+ and FISH amplification was r=0.82 (p<0.001); κ=0.84.

Patients with metastatic disease in the Arpino, Green, Allred, et al. (2004) single-arm study were drawn from the Southwest Oncology Group's (SWOG) protocol 8228 and ancillary study 9314. Approximately 60 percent of the patients were younger than 65 years old, and approximately 14 percent were premenopausal. All patients were ER positive; 78 percent of the HER2-positive and 96 percent of the HER2-negative patients were PR positive. Patients received tamoxifen twice daily as first-line therapy until disease progression.

Data on HER2 status were available on 136 patients, or about 39 percent of the original study participants. HER2 status was measured by FISH with a cutoff of HER2/CEP17 ratio of 2 or more (24 percent of patients were HER2 positive) and by IHC with a cutoff of complete membrane staining in 10 percent or more of tumor cells (21 percent of patients were HER2 positive), but only the FISH results were used in this analysis.

Outcomes Reported and Followup

The outcome for the neoadjuvant study (von Minckwitz, Sinn, Raab, et al., 2007) was pathological complete response, and surgery was performed within 14–28 days after chemotherapy was completed. In the two studies on the BIG 1–98 trial, Mauriac, Keshaviah, Debled, et al. (2007) assessed time to early tumor recurrence (TETR), defined as a recurrence within 2 years, which was also the median followup, while Rasmussen, Regan, Lykkesfeldt, et al. (2008) reported on disease-free survival with a median followup of 51 months. In the comparison of anastrozole versus tamoxifen from the ATAC trial, Dowsett, Allred, Knox, et al. (2008) examined time to recurrence; the duration of followup was unclear, possibly 68 months. The only outcome reported in the Knoop, Bentzen, Nielsen, et al. (2001) adjuvant study was disease-free survival (DFS); the duration of followup was not reported, but the tables included estimates of DFS at 10 years. The Ryden, Jirstrom, Bendahl, et al. (2005) adjuvant trial reported only recurrence-free survival (RFS) and had a median followup of 14 years for patients without a breast cancer event. The Arpino, Green, Allred, et al. (2004) uncontrolled study on metastatic disease reported overall response rates (ORR; sum of complete plus partial responses), time to failure (TTF), and overall survival (OS). “Nearly all” of the tumor blocks were more than 10 years old; some were more than 20 years old.

Results by Hierarchy Level, Study Quality Assessment

Randomization stratified on HER2/randomized to whether treatment was guided by HER2. No studies of this type were identified.

Randomized trial, prespecified multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, post-hoc multivariate subgroup analysis. Five of the six studies that met the selection criteria were post-hoc analyses of randomized controlled trials. The only neoadjuvant study compared pathological tumor response in patients receiving doxorubicin and docetaxel with or without tamoxifen (von Minckwitz, Sinn, Raab, et al., 2007). The pCR rate among ER-positive and HER2-positive patients was 0 percent for those receiving tamoxifen versus 9 percent for those not receiving it; among ER-negative and HER2-negative patients the corresponding numbers were 24 percent and 21 percent (Table 19). The numbers were small, however. There were only 25 ER-positive and HER2-positive patients, with 1 pCR, while there were 61 ER-negative and HER2-negative patients, with 14 pCRs. In a multivariate logistic regression model including menopausal status, tumor size, grade, and nodal status, the odds ratio for HER2 was 3.66 (95 percent CI: 0.69–19.30, p=.126). Analysis of the interaction term between HER2 status and treatment group was not reported. Consequently, HER2 status was not a statistically significant predictor of pCR in this analysis, and the study does not indicate whether tamoxifen is more or less effective in HER2-positive versus HER2-negative patients.
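
As a consistency check on the reported odds ratio, the standard error of a log odds ratio can be backed out of its 95 percent confidence interval and used to recompute a two-sided p value. The sketch below assumes the interval is a symmetric Wald interval on the log scale, which is an assumption because the method used in the study is not described here; the function name is illustrative.

```python
import math

def wald_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover SE, z, and a two-sided p value for a ratio estimate from its 95% CI.

    Assumes the CI was constructed as exp(log(estimate) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided p value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Odds ratio for HER2 and its 95% CI as reported for von Minckwitz et al. (2007).
se, z, p = wald_p_from_ci(3.66, 0.69, 19.30)
print(f"SE(log OR)={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under that assumption, the recomputed p value is close to the reported p=.126, so the interval and the p value are mutually consistent. The very wide interval reflects the small number of pCR events.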

Two studies compared the use of an aromatase inhibitor versus tamoxifen. In secondary analyses of the BIG 1–98 trial, disease-free survival and time to early tumor recurrence were examined. Rasmussen, Regan, Lykkesfeldt, et al. (2008) reported a hazard ratio of letrozole versus tamoxifen among HER2-positive patients of 0.62 (95 percent CI: 0.37–1.03) and among HER2-negative patients of 0.72 (95 percent CI: 0.59–0.87). While the numerical values of the hazard ratios are similar, the result for HER2-negative patients is statistically significant, while that for HER2-positive patients is not. The number of HER2-positive patients is 239, much smaller than the 3,294 HER2-negative patients. Mauriac, Keshaviah, Debled, et al. (2007) report that the time to early tumor recurrence does not appear to be statistically significantly different by treatment group in either HER2-positive or HER2-negative patients, and the HER2 status/treatment group interaction term in a multivariate analysis is not statistically significant. Consequently, this study suggests that letrozole increases disease-free survival among HER2-negative patients relative to tamoxifen, but it does not provide evidence on a greater effect among HER2-positive patients.
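
The distinction between "significant in one subgroup but not the other" and "significantly different between subgroups" can be illustrated by comparing the two hazard ratios directly. The sketch below applies a standard z test to the difference between two independent log hazard ratios, with standard errors recovered from the reported confidence intervals. It is an approximation under stated assumptions (Wald intervals on the log scale, independent subgroups) and is not the stratified Cox interaction model used by the study authors; the function names are illustrative.

```python
import math

def log_hr_and_se(hr, lower, upper, z_crit=1.96):
    """Return log(HR) and its SE recovered from a 95% CI (Wald, log scale)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return math.log(hr), se

def interaction_test(hr_a, ci_a, hr_b, ci_b):
    """z test for the difference between two independent log hazard ratios."""
    log_a, se_a = log_hr_and_se(hr_a, *ci_a)
    log_b, se_b = log_hr_and_se(hr_b, *ci_b)
    diff = log_a - log_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return math.exp(diff), z, p

# Letrozole vs. tamoxifen, DFS: HER2-positive and HER2-negative subgroups
# as reported for Rasmussen et al. (2008).
ratio, z, p = interaction_test(0.62, (0.37, 1.03), 0.72, (0.59, 0.87))
print(f"ratio of HRs={ratio:.2f}, z={z:.2f}, p={p:.2f}")
```

The approximate test gives a p value far from conventional significance, in line with the nonsignificant HER2-by-treatment interaction (p=.60) listed for this study in Table 15. The difference in significance between the subgroups therefore reflects mainly the much smaller HER2-positive sample, not a demonstrated difference in treatment effect.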

In the secondary analysis of the ATAC trial, Dowsett, Allred, Knox, et al. (2008) compare the effect of anastrozole and tamoxifen by HER2 status. They examine time to recurrence by HER2 status and report hazard ratios for HER2-positive relative to HER2-negative patients of 2.25 (p=.0018) for tamoxifen and 3.27 (p<.0001) for anastrozole (Table 18). These results demonstrate that HER2-positive patients have a poorer prognosis than HER2-negative patients but do not compare the effectiveness of each treatment within each HER2 group. The authors report that there is “no indication of a greater differential benefit of anastrozole over tamoxifen in the HER-2-positive patients. However, there were only 44 events in the HER-2-positive group, so the CIs are wide.” No further details of the analysis are provided. In the multivariate analysis, no analysis of an interaction term between HER2 status and treatment group is reported.
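
The authors' remark about wide confidence intervals can be illustrated with a common rule of thumb: for a two-arm comparison with roughly equal allocation, the variance of the log hazard ratio is approximately 4 divided by the total number of events. This is an approximation, not the method used in the ATAC analysis, and the hazard ratio of 1.0 used below is a hypothetical placeholder, not a reported result.

```python
import math

def approx_hr_ci(hr, events, z_crit=1.96):
    """Approximate 95% CI for a hazard ratio from the total number of events.

    Uses Var(log HR) ~= 4 / events, which assumes 1:1 allocation and
    similar event counts in each arm (a rule of thumb, not an exact result).
    """
    se = math.sqrt(4.0 / events)
    lower = hr * math.exp(-z_crit * se)
    upper = hr * math.exp(z_crit * se)
    return lower, upper

# 44 events is the figure quoted for the HER2-positive group;
# hr=1.0 is a hypothetical value chosen only to show the interval width.
lower, upper = approx_hr_ci(1.0, 44)
print(f"approximate 95% CI around HR=1.0 with 44 events: {lower:.2f} to {upper:.2f}")
```

An interval of this width is compatible with either a substantial advantage or a substantial disadvantage for one treatment, which is why a subgroup with so few events cannot settle whether the treatment effect differs by HER2 status.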

Table 20

Summary results for DFS in Knoop, Bentzen, Nielsen, et al. (2001)
HER2 Status | Treatment group | 5-year DFS (% ± SE) | 10-year DFS (% ± SE) | Log rank p value | p value with Bonferroni correction
Negative/low-positive (n=1,005) | Tamoxifen | 57% (±2) | 34% (±2) | .0001 | .0006
 | Control | 43% (±2) | 26% (±2) | |
High-positive (n=52) | Tamoxifen | 63% (±11) | 37% (±12) | .5 | .5
 | Control | 41% (±9) | 35% (±8) | |

Abbreviations: DFS: disease-free survival; SE: standard error;

Two studies compared patients treated with tamoxifen versus a control group; both included a multivariate analysis. Table 20 summarizes results reported by Knoop, Bentzen, Nielsen, et al. (2001) from their secondary analysis on outcomes of adjuvant tamoxifen by HER2 status in hormone-receptor-positive patients. The results showed that patients who were HER2 negative or low HER2 positive had statistically significantly longer disease-free survival when they were treated with tamoxifen; the difference in disease-free survival (with versus without tamoxifen) was not statistically significant for patients who were high HER2 positive.

A multivariate Cox model was constructed that included tumor size, proportion node positive, histologic grade, p53 value, EGFR, HER2, tamoxifen, and interactions between tamoxifen and p53, HER2, and EGFR. The coefficients for HER2 and for the interaction term for HER2 and tamoxifen were not statistically significant (specific p values and coefficients not reported for these variables). Node-positive proportion (RR=1.011), grade (RR=1.103), p53 (RR=1.54), and tamoxifen (RR=0.73) were statistically significant at p<.01. In other words, after controlling for other variables, HER2 was not a statistically significant predictor for outcomes of treatment with tamoxifen in this study.

Table 21

Summary results for RFS in Ryden, Jirstrom, Bendahl, et al. (2005)
HER2 Status | IHC: Log rank p value | IHC: Hazard ratio TAM vs. Cx (95% CI) | FISH: Log rank p value | FISH: Hazard ratio TAM vs. Cx (95% CI)
HER2- (n=239) (a) | .07 | 0.69 (0.46–1.03) | .14 | 0.73 (0.47–1.12)
HER2+ (n=21) (b) | .2 | 0.38 (0.08–1.79) | .14 | 0.21 (0.03–1.67)

(a) IHC 0–2+ or FISH nonamplified

(b) IHC 3+ or FISH amplified

Abbreviations: Cx: control; TAM: tamoxifen

The results of the secondary analysis of the adjuvant trial by Ryden, Jirstrom, Bendahl, et al. (2005) are summarized in Table 21. All patients were ER positive. No result was statistically significant.

The authors also reported that among untreated patients, the difference in outcome between HER2-positive and HER2-negative patients (measured with either IHC or FISH; in both univariate and multivariate Cox proportional hazard models) was not statistically significant. In contrast, the marker VEGFR2 was a statistically significant predictor of outcome of tamoxifen treatment. In a univariate analysis among ER-positive/PR-positive patients with HER2 status measured using IHC, the duration of RFS was longer among tamoxifen-treated patients than controls in the HER2-negative subgroups (p=.03) but not among HER2-positive (p=.3) patients.

In a multivariate Cox model, the interaction term between treatment (tamoxifen versus control) and HER2 status was not statistically significant when the model was run for ER-positive patients (p=.4) or ER-positive/PR-positive patients (p=.3). The covariates in the model were not clearly listed but probably included age, tumor size, nodal status, Nottingham histologic grade, tamoxifen, and the interaction term.

Randomized trial, treatment by HER2 subgroup analysis. No studies of this type were identified.

Single-arm study, prespecified multivariate analysis. No studies of this type were identified.

Single-arm study, post-hoc multivariate analysis. The prospective but uncontrolled study on use of tamoxifen for metastatic disease by Arpino, Green, Allred, et al. (2004) compared outcomes for HER2-positive versus HER2-negative patients. ORR was 56 percent for HER2-negative patients and 47 percent for HER2-positive patients (χ2 test, not significant). Median TTF was 7 months for HER2-negative patients versus 5 months for HER2-positive patients (log rank p=.007). Finally, median OS was 31 months for HER2-negative patients versus 25 months for HER2-positive patients (log rank p=.07). While all of the patients were ER positive, median ER levels were lower in HER2-positive than in HER2-negative patients.

Multivariate, partially nonparametric Cox models for TTF and OS included menopausal status, disease-free interval, ER and PR levels, HER1 status, and HER2 status. HER2-positive status was not a statistically significant predictor of either TTF or overall survival. HER1 status, premenopausal status, and disease-free interval before recurrence were statistically significant predictors of TTF, while ER and PR levels and disease-free interval prior to recurrence were significant predictors of OS. The hazard ratios for HER2-positive versus HER2-negative subgroups were 1.15 (p=.54) for TTF and 0.99 (p=.97) for OS. Therefore, after controlling for other factors, this study provided no evidence of a difference in outcomes after treatment with tamoxifen between HER2-positive and HER2-negative patients.

Single-arm study, univariate analysis. No studies of this type were identified.

Conclusions, Key Question 3b

The evidence on use of HER2 status to predict outcomes of hormonal therapy is weak and inconclusive. Four studies reviewed here addressed use of tamoxifen in different breast cancer patient populations; two compared tamoxifen with aromatase inhibitors. Evidence is lacking from the most informative types of studies: trials in which randomization is stratified by HER2 status, or trials in which patients are randomized to therapy that is or is not directed by HER2 results. Less-informative designs were used, including post-hoc multivariate analyses in five randomized trials and one post-hoc multivariate analysis in a single-arm study. In comparing tamoxifen with aromatase inhibitors in a secondary analysis of randomized, controlled trial results, the most persuasive finding would be a significant interaction term between HER2 status and treatment group, after controlling for other important prognostic factors.

In the two comparison studies included, one had a nonsignificant interaction term (suggesting that the relative impact of the two treatments does not differ by a patient's HER2 status), and the other did not report an interaction term, although the authors included a qualitative statement that there was no evidence that one treatment was more effective than the other in HER2-positive patients. Some results suggest that tamoxifen may be more effective among HER2-negative patients, but any such conclusion is undermined by the paucity of studies and inconsistent findings. Importantly, data demonstrating a difference in magnitude of benefit by HER2 status would not by themselves be sufficient to conclude there is no benefit in HER2-positive patients who are also positive for hormone receptors. Studying the differential impact of hormonal therapy by HER2 status is hindered by the inverse relationship between HER2 status and hormone receptor status, which leads to relatively small numbers of HR-positive and HER2-positive patients on which to base the results.

Key Question 4

What is the evidence that monitoring serum or plasma concentrations of HER2 extracellular domain in patients with HER2-positive breast cancer predicts response to therapy, or detects tumor progression or recurrence, and if so, what is the evidence that decisions based on serum or plasma HER2 assay results improve patient management and outcomes?

Study Selection

Studies were included for Key Question 4 if they were:

  • randomized trials, prospective single-arm studies, or retrospective series of identically treated patients; that

  • measured serum or plasma HER2 concentrations in breast cancer patients, either at baseline or at multiple time points; and either:

    • (a) associated baseline values or changes in HER2 concentration with one or more outcomes of interest (primary or secondary); or

    • (b) compared outcomes of treatment decisions based on assay results with outcomes of decisions made in absence of assay results.

Table IV-A

Design, Enrollment and Treatment
Study | Design | Therapeutic Setting | n, Enrolled (Randomized) | n, Evaluated | n, Withdrawn (Lost to F/U) | Treatment Regimen (Agents)
Gasparini et al. 2007, Italy, multicenter; 12/00 – 09/04 | PII RCT | untreated MBC, t-IHC 2+/3+ (1st-line metastatic disease) | 124 enrolled (61 grp 1, 63 grp 2); allocation concealment: A | 123 for efficacy and toxicity, 118 for ORR | 1 for efficacy and toxicity, 6 for ORR | Grp 1: paclitaxel; Grp 2: paclitaxel + trastuzumab
Im et al. 2005, Korea, multicenter | PII, single-arm | MBC no previous CHT for metastatic disease (1st-line) | 40 | 39 for toxicity, 38 for response | 1 for toxicity, 2 for response (refused further tx) | Epirubicin + docetaxel
Fornier et al. 2005, USA, 1 center | RET analysis of PII | MBC, HER2 overexpressing and non-overexpressing | 55 of 95 in trial who had 1° tumor tested for tHER2 | 55 | | Paclitaxel + trastuzumab
Muller et al. 2004, Germany, multicenter | RCT | 1st-line tx for MBC | 103 of 597 in trial | 101 | 2 | Grp1: epirubicin + paclitaxel (ET, n=47, 62% sHER2+); Grp2: epirubicin + cyclophosphamide (EC, n=54, 65% sHER2+)
Luftner et al. 2004, Germany, 1 center | PII | stage IV BC, 1 or 2 prev CHT (1 anthracycline-based) | 35 | 35 | | Dose-intensified paclitaxel (1st-line 6%, 2nd-line 60%, 3rd-line 34%)
Sandri et al. 2004, Italy, 1 center | Clinical trial | stage IV BC, ≥ 1 prev CHT for met dis (2nd-line+) | 64 | 39 | 25 | Cyclophosphamide + methotrexate
Colomer et al. 2004, Spain, 7 centers | PII | progressive advanced BC, no 1° tx for mets (1st-line) | 43 | 43 for toxicity, 42 for efficacy | 1 | Paclitaxel + gemcitabine
Burstein et al. 2003, US, 17 centers | PII | stage IV BC, IHC HER2 3+ or FISH+, no prev CHT for met dis (1st-line) | 55 | 54 (43 had sHER2 values at baseline and after 1 tx cycle) | 1 (did not receive protocol-based tx) | Trastuzumab + vinorelbine
Lipton et al. 2003, multinational, multicenter | RCT | postmenopausal locally advanced (stage IIIB), loco-regionally recurrent BC, MBC, ER+/PR? and/or PR+/ER? (1st-line) | 562 of 907; allocation concealment: B | 562 | | Grp1: letrozole (n=283); Grp2: tamoxifen (n=279)
Esteva et al. 2002, US, 1 center | PRO CS | MBC overexpressing tHER2, w/ or w/o previous tx for met dis, but no prior trastuzumab | 30 | 30 | | Trastuzumab + docetaxel
Colomer et al. 2000, Spain, 1 center | PRO CS | MBC, no previous CHT for met dis (1st-line) | 77 | 55 | 3 | Doxorubicin + paclitaxel
Colomer et al. 2006, Spain, 6 centers | PII | advanced BC (1st-line) | 52 | 47 | | IV vinorelbine + IV gemcitabine
Yamauchi et al. 1997, US, ? centers | RCT | MBC (1st-line) | 94 of 369 | 94 | | 3 doses of droloxifene

Table IV-K

Case Series/Single Arm Trial Study Quality Ratings
Study | Clearly Defined Question | Well-Described Study Population | Well-Described Intervention | Use of Validated Outcome Measures (Independently Assessed) | Appropriate Statistical Analysis | Well-Described Results | Discussion/Conclusions Supported by Data | Funding/Sponsorship Source Acknowledged
Im et al. 2005 | + | + | + | + (NA/-) | + | + | + | +
Fornier et al. 2005 | + | + | + | + (-) | + | + | + | +
Luftner et al. 2004 | + | + | + | + (-) | + | - (no AEs) | + | +
Sandri et al. 2004 | + | + | + | + (-) | + | - (no AEs) | + | +
Colomer et al. 2004 | + | + | + | + (-) | + | + | + | -
Burstein et al. 2003 | + | + | + | + (-) | + | + | + | +
Esteva et al. 2002 | + | + | + | + (-) | + | + | + | +
Colomer et al. 2000 | + | - | + | + (+) | + | - (no AEs) | + | +
Colomer et al. 2006 | + | + | + | + (-) | + | + | + | +
Of 15 studies meeting selection criteria, five were randomized trials and 10 were single-arm designs. One of the randomized trials compared three different doses of a single selective estrogen receptor modulator, droloxifene (Yamauchi, O'Neill, Gelman, et al., 1997). Because the doses assessed in that trial did not produce different results, the data pooled across dosing groups are treated here as a single-arm design. Therefore, four randomized trials and 11 single-arm designs are presented in separate summary tables; detailed abstraction data can be found in Appendix Tables IV-A through IV-K. All but one study meeting selection criteria addressed subgroup analyses of baseline sHER2 measurements to predict outcomes after treatment. The study reported by Fornier, Seidman, Schwartz, et al. (2005) was the only one that focused on changes in serial measurements. No studies meeting selection criteria addressed whether serial sHER2 measurements confer lead time compared with other monitoring techniques.

Patient Characteristics

Table 22

Randomized trials, design, treatment, patient characteristics, KQ4
Study | Therapeutic Setting | Treatments Compared | Age Mean, range | Number of Disease Sites (%): 1 / 2 / ≥3 | Performance Status (Scale: Index, Result) | Hormone Receptor Status (%)
Cameron et al., 2008, multicenter international | tHER2+ LABC/MBC, 2nd-line | Grp 1: capecitabine (n=201) | 51, 28–83 | 22 / 30 / 48 | ECOG: % 0, 59; % 1, 41 |
 | | Grp 2: lapatinib + capecitabine (n=198) | 54, 26–80 | 20 / 31 / 49 | ECOG: % 0, 62; % 1, 38 |
Gasparini et al., 2007, Italy, multicenter; 12/00 – 09/04, Phase II RCT | First-line, untreated MBC, t-IHC 2+/3+ | Grp 1: paclitaxel (n=61) | 54.27, 30–71 | 33 / 40 / 27 | ECOG: % 0, 82; % 1–2, 18 | E+&P+ 37; E+/P+ 27
 | | Grp 2: paclitaxel + trastuzumab (n=63) | 56.02, 32–72 | 40 / 33 / 27 | ECOG: % 0, 81; % 1–2, 19 | E+&P+ 37; E+/P+ 10
Muller et al., 2004, Germany, multicenter RCT | First-line tx for MBC | Grp1: epirubicin + paclitaxel (ET, n=54, 65% sHER2+); Grp2: epirubicin + cyclophosphamide (EC, n=47, 62% sHER2+) | Grp1+Grp2: 48, 31–63 | | | 61 (E+)
Lipton et al., 2003, multinational, multicenter RCT | First-line, postmenopausal locally advanced (stage IIIB), loco-regionally recurrent BC, MBC, ER+/PR? and/or PR+/ER? | Grp1: letrozole (n=283, 31% sHER2+) | 65, 42–94 | 53 / 37 / 10 | KPS: md 90, rng 50–100 | E+&P+ 38; E+/P+ 28
 | | Grp2: tamoxifen (n=279, 28% sHER2+) | 63, 31–90 | 55 / 34 / 11 | KPS: md 90, rng 50–100 | E+&P+ 40; E+/P+ 27

Abbreviations: BC: breast cancer; E+&P+: estrogen and progesterone receptor positive; E+/P+: estrogen and/or progesterone receptor positive; ECOG: Eastern Cooperative Oncology Group; ER: estrogen receptor; Grp: group; KPS: Karnofsky performance score; LABC: locally advanced breast cancer; MBC: metastatic breast cancer; md: median; PR: progesterone receptor; RCT: randomized, controlled trial; s: serum; rng: range; t: tissue; tx: treatment;

Randomized trials. Two of the four trials (Table 22) selected patients with metastatic breast cancer undergoing first-line systemic therapy. The comparisons in these two trials were paclitaxel with or without trastuzumab, and epirubicin with either paclitaxel or cyclophosphamide. The third trial included postmenopausal patients with locally advanced (stage IIIB), locoregionally recurrent or metastatic breast cancer randomized to either letrozole or tamoxifen. The fourth trial selected patients with locally advanced or metastatic breast cancer given capecitabine with or without lapatinib as second-line treatment after progression following treatment with an anthracycline, a taxane and trastuzumab. A total of 1,153 patients were included in these trials, with individual sample sizes ranging from 101 to 562.

Two of the randomized trials selected patients for being positive on tissue (t) HER2 testing. Gasparini, Gion, Mariani, et al. (2007) selected patients with 2+ or 3+ scores on the IHC HercepTest™. Cameron, Casey, Press, et al. (2008) included patients who were 3+ on IHC or 2+ with a positive FISH result. Muller, Witzel, Luck, et al. (2004) performed tissue testing on only 29 of 103 patients and only nine patients had 3+ results by Dako-style scoring of an IHC assay using the CB11 mAb. No tHER2 results were reported for Lipton, Ali, Leitzel, et al. (2003).

Patient characteristics were reported in various ways. Only age was reported by all four studies. Baseline data in the two treatment groups in the Muller, Witzel, Luck, et al. (2004) trial were combined; median age was 48 years. In the Gasparini, Gion, Mariani, et al. (2007) and Cameron, Casey, Press, et al. (2008) trials, median ages by treatment group were in the low and mid-50s and in the Lipton, Ali, Leitzel, et al. (2003) study median ages were in the mid-60s.

The proportion of patients with three or more disease sites was 27 percent in the Gasparini, Gion, Mariani, et al. (2007) study, 49 percent in the Cameron, Casey, Press, et al. (2008) trial and 10 percent and 11 percent of the two treatment groups studied by Lipton, Ali, Leitzel, et al. (2003).

Gasparini, Gion, Mariani, et al. (2007) used the ECOG performance status scale, finding that 82 percent and 81 percent had the highest level (0). Cameron, Casey, Press, et al. (2008) reported that 62 percent and 59 percent were at ECOG level 0. Median Karnofsky Performance Scale values were 90 in both groups included by Lipton, Ali, Leitzel, et al. (2003).

In the study by Gasparini and co-workers, 37 percent were both estrogen- and progesterone-receptor positive, while the proportions for the two groups from Lipton and co-workers' study were 38 percent and 40 percent, respectively. Muller, Witzel, Luck, et al. (2004) only noted that 61 percent were estrogen-receptor positive. Cameron, Casey, Press, et al. (2008) reported the proportions of patients in the two groups who were positive on one or both receptors: 48 percent and 46 percent.

Table 23

Single-arm studies, design, enrollment and treatment, KQ4
StudyTherapeutic SettingTreatments ComparedAge Mean, rangeNumber of Disease Sites (%)Performance Status
12≥3ScaleIndexResultER+&PR+
Im et al., 2005, Korea, multicenter MBC (1st-line)Epirubicin+paclitaxel (n=40, 14.8% sHER2+)49, 35–70443126ECOG%021
%154
%226
Colomer et al., 2000, Spain, 1 centerMBC, no previous CHT for met dz (1st-line)Doxorubicin+ paclitaxel (n=55, 43.6% sHER2+)35% premenopausal554567 (ER+)
Fornier et al., 2005, USA, 1 center MBC, tHER2 +/-Paclitaxel+trastuzumab (n=55 of 95, 69% sHER2+)51, 33–67med 2, rng 1–4KPSmn90
rng70–100
Esteva et al., 2002, US, 1 center MBC tHER2+, +/- previous tx for met dzTrastuzumab+docetaxel (n=30, 70% sHER2+)45, 33–78164042KPS%9063
%8020
%7016
Colomer et al., 2004, Spain, 7 centersprogressive advanced BC (1st-line)Paclitaxel+gemcitabine (n=42, 29.3% sHER2+)53, 29–72med 3, rng 1–6 49 (ER+)
Luftner et al., 2004, Germany, 1 center stage IV BC, 1 or 2 previous CHTDose-intensified paclitaxel (n=35; 1st-line 6%, 2nd-line 60%, 3rd-line 34%, 63% sHER2+)48, 31–632631431734
(# involved organs)
Burstein et al., 2003, US, 17 centers stage IV BC, tHER2+Trastuzumab+vinorelbine (n=43)55, 29–82md 3,rng 1–6ECOG%0703718
%128
%32
Colomer et al., 2007MBC (2nd-line)Letrozole (n=226, 25% sHER2+)~63/64363133ECOG%05162
%1–249
Yamauchi et al., 1997, US, ? centers MBC (1st-line)3 doses of droloxifene (n=94 of 369, 34% sHER2+)47% < 6445321855 (ER+)
53% ≥ 6434 (PR+)
Sandri et al., 2004, Italy, 1 centerstage IV BC, ≥ 1 prev CHT for met dz (2nd-line+)Cyclophosphamide + methotrexate (n=39)56, 36–81263936
Colomer et al., 2006, Spain, 6 centersadvanced BC (1st-line)IV vinorelbine+ IV gemcitabine (n=47, 29.8% sHER2+)64, 34–81med 2, rng 1–4ECOG%04167 (ER+)
%147
%212

Abbreviations: BC: breast cancer; CHT: chemohormonal therapy; dz: disease; ECOG: Eastern Cooperative Oncology Group; ER+: estrogen-receptor positive; IV: intravenous; MBC: metastatic breast cancer; med: median; met: metastatic; PR: progesterone-receptor positive; rng: range;

Single-Arm Designs. All 11 studies selected patients with metastatic breast cancer (Summary Table 23). The total number of patients across studies is 706; individual sample sizes ranged from 30 to 226. Treatments were first-line systemic therapy in six studies, second-line in one study, second- or third-line in one study, second-line or higher in one study and a mix of first- and second-line or higher in two studies. Regimens in six studies were taxane-based (two with anthracyclines, two with trastuzumab); one study combined trastuzumab with vinorelbine, one study used the aromatase inhibitor letrozole, one study used the selective estrogen receptor modulator droloxifene, and two studies used other chemotherapy regimens.
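The pooled total can be checked by tallying the per-study n values listed in Table 23. The snippet below is only an arithmetic sketch; the dictionary keys are shorthand labels chosen here for illustration.

```python
# Per-study n values as listed in Table 23 (single-arm studies, KQ4).
# Keys are shorthand labels introduced for this sketch.
sample_sizes = {
    "Im 2005": 40,
    "Colomer 2000": 55,
    "Fornier 2005": 55,
    "Esteva 2002": 30,
    "Colomer 2004": 42,
    "Luftner 2004": 35,
    "Burstein 2003": 43,
    "Colomer 2007": 226,
    "Yamauchi 1997": 94,
    "Sandri 2004": 39,
    "Colomer 2006": 47,
}

total = sum(sample_sizes.values())
smallest = min(sample_sizes.values())
largest = max(sample_sizes.values())

print(total, smallest, largest)  # 706 30 226
```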

Two studies selected patients who were tHER2 3+ on IHC or positive on FISH. Five studies included mixed patient populations that were positive and negative on HER2 tissue testing (Colomer, Llombart-Cussac, Lloveras, et al., 2007; Colomer, Montero, Lluch, et al., 2000; Im, Kim, Lee, et al., 2005; Fornier, Seidman, Schwartz, et al., 2005; Sandri, Johansson, Colleoni, et al., 2004). The remaining four studies did not provide data on tissue HER2 testing (Yamauchi, O'Neill, Gelman, et al., 1997; Colomer, Llombart-Cussac, Lluch, et al., 2004; Luftner, Henschke, Flath, et al., 2004; Colomer, Llombart-Cussac, Tusquets, et al., 2006).

Regarding age, one study had a median age of 48 years, another had a median of 49 years. One study had 53 percent at age 64 or older, another had a median age of 64 years and a third had mean ages in sHER2 positive and negative groups of 63 and 64 years. The other 6 studies had median ages in the 50s.

Nine studies gave the distribution of patients by number of disease sites and one study gave the number of involved organs (43 percent had three or more involved organs). In seven studies, the percentage of patients with three or more disease sites ranged from 18 percent to 43 percent; in another study all patients had two or fewer disease sites. Four studies provided average number of disease sites: the medians were two in two studies and three in two studies.

Four studies provided ECOG performance status data: the percentages in categories 0 or 1 (better performance status) were 75, 98, 98, and 88 percent. Two studies used the Karnofsky Performance Scale: in one study the mean value was 90 percent and in the other 83 percent were at 80 percent or 90 percent on the scale.

Seven studies gave baseline information on hormone receptor status, four of which reported the proportion of patients who were estrogen-receptor positive, ranging from 49 percent to 67.3 percent. One study gave the proportion who were progesterone-receptor positive (34 percent). Two studies gave percentages for different combinations of hormone receptor status: the proportions who were both estrogen- and progesterone-receptor positive were 17 percent and 37 percent; the proportions who were either estrogen- or progesterone-receptor positive were 34 percent and 18 percent.

Evidence Hierarchy and Quality Assessment

Table 24

Hierarchy of evidence, KQ4
Level of Evidence | Study | n | Setting | Treatments | Outcome | Results
RCT stratified on HER2 status/HER2-guided vs. non-HER2-guided | | | | | |
RCT prespecified MV SGA | Gasparini 2007 | 123 | MBC 1st, t+ | paclit vs. paclit+trastuz | TTP | Cox regression sHER2 by treatment interaction p=.0538
 | | | | | ORR | logistic regression sHER2 by treatment interaction p=.6044
RCT post-hoc MV SGA | | | | | |
RCT treatment by HER2 SGA | Cameron 2008 | 367 | LABC/MBC 2nd, t+ | capecit (Cp) +/- lapatinib (Lp) | PFS | cont sHER2/highest vs. other quartiles: Cp p<.001, Cp+Lp p=.12; Cp vs. Cp+Lp↑: highest quartile sHER2+ p<.001, other quartiles p=.002
 | Muller 2004 | 101 | MBC 1st, t+/- | epirub+paclit (ET) vs. epirub+cycloph (EC) | OS | ET sHER2+↓ vs. - p=.092, EC sHER2+ vs. - p=NS
 | | | | | PFS | sHER2- EC vs. ET p=NS, sHER2+ EC↓ vs. ET p=.0341
 | | | | | ORR | ET sHER2+ vs. - p=NS, EC sHER2+↓ vs. - p=.059
 | Lipton 2003 | 562 | locally advanced, recurrent, MBC 1st, t? | letrozole (LET) vs. tamoxifen (TAM) | TTP | sHER2+ LET↑ vs. TAM p=.0596, sHER2- LET↑ vs. TAM p=.0019
 | | | | | TTF | sHER2+ LET↑ vs. TAM p=.0418, sHER2- LET↑ vs. TAM p=.0066
 | | | | | ORR | sHER2+ LET vs. TAM p=.4507, sHER2- LET↑ vs. TAM p=.0078
 | | | | | CB | sHER2+ LET vs. TAM p=.3057, sHER2- LET↑ vs. TAM p=.0162
1-arm prespecified MV analysis | Colomer 2007 | 226 | MBC 2nd | letrozole (LET) | ORR | univariate sHER2+↓ vs. sHER2- p=.036
 | | | | | TTP | univariate sHER2+↓ vs. sHER2- p=.004; Cox regression sHER2+↓ vs. sHER2- p<.001
 | | | | | OS | univariate sHER2+↓ vs. sHER2- p<.0005
 | Colomer 2000 | 55 | MBC 1st, t+/- | doxorub+paclit | RD | univariate sHER2+↓ vs. sHER2- p=.035
 | | | | | RD | Cox regression sHER2+↓ vs. sHER2- p=.04
 | | | | | ORR | univariate sHER2+↓ vs. sHER2- p=.01
 | | | | | ORR | logistic regression sHER2+↓ vs. sHER2- p=.03
1-arm post-hoc MV analysis | Yamauchi 1997 | 94 | MBC 1st, t? | 3 doses droloxif | TTP | Cox regression sHER2+↓ vs. sHER2- p=.0003
 | | | | | OS | Cox regression sHER2+↓ vs. sHER2- p=.003
 | | | | | ORR | univariate sHER2+↓ vs. sHER2- p=.00001
 | | | | | ORR | logistic regression sHER2+↓ vs. sHER2- p=.0001
1-arm UV analysis | Im 2005 | 38 | MBC 1st, t+/- | epirub+paclit | RD | sHER2+↓ vs. sHER2- p<0.001
 | | | | | TTP | sHER2+↓ vs. sHER2- p<0.001
 | | | | | OS | sHER2+↓ vs. sHER2- p=0.076
 | | | | | Resp | sHER2+ vs. sHER2- p=0.45
 | Fornier 2005 | 55 | MBC, t+/- | paclit+trastuz | ORR | sHER2+ vs. sHER2- p=1.0, sHER2 Δ<15 vs. Δ≥15 p=0.005
 | | | | | ORR | sHER2 ≥15% vs. <15% p=0.015
 | Esteva 2002 | 30 | MBC 2nd+, t+ | trastuz+docet | ORR | sHER2+↑ vs. sHER2- p=0.04
 | Colomer 2004 | 42 | MBC 1st, t? | paclit+gemcitab | RD | sHER2+↓ vs. sHER2- p=0.04
 | | | | | Resp | sHER2+↓ vs. sHER2- p=0.02
 | Luftner 2004 | 35 | MBC 2nd+, t? | dose intense paclit | RD | sHER2+↓ vs. sHER2- p=0.042
 | | | | | PFS | sHER2+↓ vs. sHER2- p=0.098
 | | | | | ORR | sHER2+ vs. sHER2- p=0.40
 | Sandri 2004 | 39 | MBC 2nd+, t+/- | cycloph+methotrex | TTP | sHER2+↓ vs. sHER2- p=0.007
 | | | | | OS | sHER2+↓ vs. sHER2- p<0.001
 | Burstein 2003 | 43 | MBC, t+ | trastuz+vinorelb | Progr | no ↓ in sHER2 predicted progression; baseline, Δ did not predict
 | Colomer 2006 | 47 | MBC 1st, t? | IV vinorelb+IV gemcit | ORR | sHER2+ vs. sHER2- p=0.9

Abbreviations: cycloph: cyclophosphamide; DFS: disease-free survival; droloxif: droloxifene; epirub: epirubicin; gemcit: gemcitabine; HR: hazard ratio; MV: multivariate; ORR: overall response rate; OS: overall survival; paclit: paclitaxel; pCR: pathologic complete response; PFS: progression-free survival; RCT: randomized, controlled trial; RD: residual disease; RFS: recurrence-free survival; SGA: subgroup analysis; TETR: time to early tumor recurrence; trastuz: trastuzumab; TTF: time to treatment failure; TTR: time to tumor recurrence; Tx: treatment; UV: univariate analysis; vinorelb: vinorelbine;

Table 25

Study quality assessment, KQ4
Multivariate analysis criteria (last column): well-described, well-conducted multivariate analysis of outcome: 1) clear candidate variable selection, 2) clear, appropriate model-building guidelines, 3) assumptions tested, 4) standard prognostic variables included, 5) continuous variables well handled, 6) validation.

Study | Prospective design | Prespecified hypotheses about relation of marker to outcome | Large, well-defined, representative study population | Marker assay methods well-described | Blinded assessment of marker in relation to outcome | Homogeneous treatment(s), either randomized or rule-based selection | Low rate of missing data (≤ 15%) | Sufficiently long follow-up | Multivariate analysis criteria 1) to 6)
Cameron et al., 2008 | Y | N | Y | N | ? | Y | Y | ≥ 6 wk | treatment × HER2 SGA
Gasparini et al., 2007 | Y | Y | N | Y | ? | Y | Y | med: 16.6 mos | ?, ?, ?, ?, ?, N
Muller et al., 2004 | Y | N | N | Y | ? | Y | N | med 8.9 mo (0.5–36) | treatment × HER2 SGA
Lipton et al., 2003 | Y | N | N | Y | Y | Y | N | 3 mos | treatment × HER2 SGA
Colomer et al., 2000 | Y | Y | N | Y | ? | Y | N | med 23 mos | ?, ?, ?, ?, ?, N
Yamauchi et al., 1997 | Y | N | N | Y | ? | Y | N | ? | ?, N, ?, ?, ?, N
Colomer et al., 2007 | Y | Y | Y | Y | ? | Y | Y | ≥ 4 wk | ?, ?, ?, ?, ?, N
Im et al., 2005 | Y | Y | N | Y | ? | Y | Y | med 22.5 mos | NA
Fornier et al., 2005 | Y | N | N | Y | ? | Y | N | ≥ 4 wk | NA
Esteva et al., 2002 | Y | Y | N | Y | Y | Y | Y | ≥ 8 wk | NA
Colomer et al., 2004 | Y | Y | N | Y | ? | Y | Y | 26 mos | NA
Luftner et al., 2004 | Y | Y | N | Y | ? | Y | Y | ≥ 4 wk | NA
Burstein et al., 2003 | Y | Y | N | Y | ? | Y | Y | 8 wk | NA
Sandri et al., 2004 | Y | N | N | Y | ? | Y | N | 2 mo | NA
Colomer et al., 2006 | Y | Y | N | Y | ? | Y | Y | med 79 mo | NA

Abbreviations: mos: months; NA: not applicable; SGA: subgroup analysis; wks: weeks;

No studies conducted stratified randomization on sHER2 status or randomized patients to sHER2-guided versus non-sHER2-guided treatment (Tables 24 and 25), and only one randomized trial performed prespecified subgroup analyses (Gasparini, Gion, Mariani, et al., 2007). Three randomized trials reported results from post-hoc treatment by sHER2 subgroup analyses (Cameron, Casey, Press, et al., 2008; Muller, Witzel, Luck, et al., 2004; Lipton, Ali, Leitzel, et al., 2003). Three single-arm studies included multivariate analyses (Colomer, Llombart-Cussac, Lloveras, et al., 2007; Colomer, Montero, Lluch, et al., 2000; Yamauchi, O'Neill, Gelman, et al., 1997). Overall, the bulk of studies (8 of 15) belonged to the lowest category of the hierarchy.

Results by Hierarchy Level

Table 26

Randomized trials, summary time to event outcomes, KQ4
StudyTime to Event Outcomes
Cameron et al., 2008 capecitabine (Cp) +/- lapatinib (Lp) OutcomeGrpNMed (mos)6 mos1 yrTestpHR (95%CI)
PFSsHER2+,CpCox<.001sHER2 as continuous variable
sHER2-,Cp
sHER2+,CpLpCox.12
sHER2-,CpLp
sHER2+,Cp2.6Cox<.0012.3 (1.5, 3.6) highest sHER2 quartile vs. other quartiles
sHER2-,Cp4.8
sHER2+,CpLp6.0Cox.121.5 (0.9, 2.4)
sHER2-,CpLp6.7
sHER2+,Cp~3~17Cox<.0010.320 (0.181, 0.567) highest sHER2 quartile
sHER2+,CpLp~6~50
sHER2-,Cp~20~43~15Cox.0020.561 (0.389, 0.81) other quartiles
sHER2-,CpLp~30~52~28
Gasparini et al., 2007 paclitaxel vs. paclitaxel + trastuzumab OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments
TTPCox regression p value for sHER2 by treatment interaction: .0538
Muller et al., 2004 epirubicin + paclitaxel vs. epirubicin+ cyclophosphamide (ET vs. EC) OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments
OSsHER2+/EC19~8.4~50~15~0LR.092
sHER2-/EC35~22~77~40~15
sHER2+/ET18~16~60~10~0LRNS
sHER2-/ET29~14~65~10~0
PFSsHER2-/EC35~7~30~0~0LRNS
sHER2-/ET29~9~21~0~0
sHER2+/EC19~12~21~0~0LR.0341
sHER2+/ET18~9~28~0~0
Lipton et al., 2003 letrozole vs. tamoxifenOutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)Comments
TTPsHER2+164
letrozole876.1~28~7~6~4Cox.05960.73 (0.53,1.01)
tamoxifen773.3~17~5~3
sHER2-398
letrozole19612.2~53~29~20~14Cox.00190.70 (0.56,0.88)
tamoxifen2028.5~38~20~10~8
TTFsHER2+164
letrozole876.0Cox.0418
tamoxifen773.2
sHER2-398
letrozole19611.6Cox.0066
tamoxifen2026.2

Abbreviations: Cox: Cox proportional hazards; HR: hazard ratio; LR: log rank; med: median; mos: months; TTF: time to treatment failure; TTP: time to progression; yr: years;

Table 27

Randomized trials, summary tumor response, KQ4
StudyTumor Response (%)
Cameron et al., 2008 capecitabine (Cp) +/- lapatinib (Lp)Not reported
Gasparini et al., 2007 paclitaxel vs. paclitaxel + trastuzumab GrpNCRPRSDPDTestpComments
WHO criteria logistic regression p value for sHER2 by treatment interaction: 0.6044
Muller et al., 2004 epirubicin + paclitaxel vs. epirubicin+cyclophosphamide (ET vs. EC) GrpNCR+PRSDPDTestpComments
sHER2+/ET1850.033.316.7Chi sqNSUICC criteria
sHER2-/ET2646.238.515.4
sHER2+/EC1729.435.335.3Chi sq.059
sHER2-/EC3141.935.522.6
Lipton et al., 2003 letrozole vs. tamoxifenGrpNCR+PRSD+PDTestpComments
sHER2+164UICC criteria
letrozole1783log regr.4507
tamoxifen1387
sHER2-398
letrozole3961log regr.0078
tamoxifen2674
GrpNCR+PR +SDPDTestpComments
sHER2+164UICC criteria
letrozole3367log regr.3057
tamoxifen2674
sHER2-398
letrozole5743log regr.0162
tamoxifen4555

Abbreviations: Chi sq: Chi square; CR: complete response; Grp: group; log reg: logistic regression; NS: not significant; PR: partial response; UICC: International Union against Cancer; WHO: World Health Organization;

Table 28

Single-arm studies, summary time to event outcomes, KQ4
StudyTime to Event Outcomes
Im et al., 2005 epirubicin+paclitaxel (n=40) OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)
TTPsHER2+42.8LR<.001
sHER2-198.3
RDsHER2+31.50LR<.001
sHER2-136.7~43~33
OSsHER2+412.4~50~26LR.076
sHER2-23not reached~72~56
Colomer et al., 2000 doxorubicin+ paclitaxel (n=55) OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)
Resp DursHER2+157.5~26LR.035
sHER2-2411~50~35MV Cox.04
Colomer et al., 2004 paclitaxel+gemcitabine (n=42) OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)
Resp DursHER2+57.9~40~0?.04
sHER2-2414.4~55~37
Luftner et al., 2004 dose-intensified paclitaxel (n=35) OutcomeGrpNMn (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)
Resp DursHER2+96.0~0LR.042
sHER2-52~60
PFSsHER2+223~3LR.098
sHER2-134~10
Colomer et al., 2007 Letrozole (n=226) OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)
TTPsHER2+424~36~12~7LR.004
sHER2-18414~57~34~12
CPH<.001
OSsHER2+42~22~8244LR<0.0005
sHER2-184~9175~63
Yamauchi et al., 1997 3 doses of droloxifene (n=94 of 369) OutcomeGrpNMed (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)
TTPsHER2+32~3~13~13MV Cox0.00030.36 (0.21, 0.63) (adjusted)
sHER2-62~8~43~28
OSsHER2+32~28~74~54MV Cox0.0030.35 (0.17, 0.70) (adjusted)
sHER2-62~92~63
Sandri et al., 2004 cyclophosphamide+methotrexate (n=39)OutcomeGrpNMn (mos)1 yr2 yr3 yr4 yr5 yrTestpHR (95%CI)
TTPsHER2+?2~0~0~0LR0.007
sHER2-?8~34~12~7
OSsHER2+?11~47~0~0LR<0.001
sHER2-?16~84~49~42

Table 29

Single-arm studies, summary tumor response, KQ4
StudyTumor Response (%)
Im et al., 2005 epirubicin+paclitaxel (n=40) GrpNCRPRSDPDTestpComments
sHER2+407525Chi sq0.45WHO criteria
sHER2-2313.043.426.117.4
Colomer et al., 2000 doxorubicin+paclitaxel (n=55) GrpNCRPRNo responseTestpComments
sHER2+2406237Chi sq0.021WHO criteria
sHER2-31265223MV logistic regression for ORR
IHC+1195536Chi sq0.219sHER2 p value: 0.03
IHC-28186418
Fornier et al., 2005 paclitaxel+trastuzumab (n=55) GrpNResponseNo responseTestpComments
sHER2+385050FE1.0Response= CR+PR criteria described
sHER2-174743
Δ<15256832FE0.005
Δ≥15131585
Δ≥55%256832FE0.015OR 4.25, 95% CI: 1.37–13.19
Δ<55%303367
Esteva et al., 2002 trastuzumab+docetaxel (n=30) GrpNCR+PRSD+PDTestpComments
sHER2+217624FE0.04ECOG criteria
sHER2-93367
IHC 3+196337FE0.99
IHC 0–2+56040
FISH+246733FE0.60
FISH-45050
Colomer et al., 2004 paclitaxel+gemcitabine (n=42) GrpNResponseNo responseTestpComments
sHER2+154258FE0.02WHO criteria
sHER2-268317
Luftner et al., 2004 dose-intensified paclitaxel (n=35) GrpNCR+PRSDPDTestpComments
sHER2+2240.936.422.7MH0.40mean duration 25.7 wks
sHER2-1338.530.830.8mean duration 65.2 wks (p=0.042)
internationally accepted criteria (referenced)
Burstein et al., 2003 trastuzumab+ vinorelbine (n=43) GrpNNo progressionProgressionComments
sHER2+?AU ROC=0.8947, baseline or Δ in sHER2 do not predict response, but no ↓ in sHER2 predicted progression
sHER2-?
RECIST criteria
Colomer et al., 2007 letrozole (n=226) GrpNCR+PRNo responseTestpComments
sHER2+421486.036
sHER2-1843170
Yamauchi et al., 1997 3 doses of droloxifene (n=94 of 369) GrpNResponseNo responseTestpComments
sHER2+32991FE.00001criteria?
sHER2-625644MV logistic regression for response sHER2 p value: .0001
Colomer et al., 2006 IV vinorelbine+ IV gemcitabine (n=47)GrpNCR+PRNo responseTestpComments
sHER2+145050?.9WHO criteria
sHER2-3348.551.5

Abbreviations: AU: area under; Chi sq: Chi square; CR: complete response; FE: Fisher's exact; Grp: group; MV: multivariate; ORR: overall response rate; PR: partial response; RECIST: Response Evaluation Criteria in Solid Tumors; ROC: receiver operating characteristic; s: serum; t: tissue; WHO: World Health Organization;

Multivariate analysis was performed in only four studies: one randomized trial (Gasparini, Gion, Mariani, et al., 2007) and three single-arm designs (Colomer, Llombart-Cussac, Lloveras, et al., 2007; Colomer, Montero, Lluch, et al., 2000; Yamauchi, O'Neill, Gelman, et al., 1997). Summary study descriptions and results are arrayed in Tables 26–29.

Randomization stratified on HER2/randomized to whether treatment was guided by HER2. No studies of this type were identified.

Randomized trial, prespecified multivariate subgroup analysis. The only trial that performed a prespecified multivariate subgroup analysis was Gasparini, Gion, Mariani, et al. (2007; n=123 patients given first-line treatment with paclitaxel with or without trastuzumab for metastatic breast cancer). One quality concern was uncertainty over whether sHER2 results were scored blindly to outcome. Also, this study addressed 11 predictor variables plus treatment interaction terms in logistic and Cox regression analyses; however, there appeared to be too few response and progression events to support models with so many variables. Thus, the study was not large enough for the type of modeling used. Overall, it is unclear whether the multivariate analysis was well conducted: it is unclear how candidate variables were selected, what model-building strategy was used, whether assumptions were tested, whether the standard metastatic breast cancer prognostic factors were included in final models, and how continuous variables were categorized. The model also did not appear to have been validated.
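The concern about too few events can be made concrete with an events-per-variable (EPV) ratio. The sketch below is illustrative only: the minimum of 10 events per model term is a commonly cited rule of thumb, not a criterion stated in this report, and the event count in the usage lines is hypothetical because the trial's event counts are not given here.

```python
def events_per_variable(n_events: int, n_terms: int) -> float:
    """Return the number of outcome events available per model term."""
    if n_terms <= 0:
        raise ValueError("n_terms must be positive")
    return n_events / n_terms


def model_is_supported(n_events: int, n_terms: int, min_epv: float = 10.0) -> bool:
    """True when EPV meets the assumed rule-of-thumb minimum (default 10)."""
    return events_per_variable(n_events, n_terms) >= min_epv


# Hypothetical numbers: 60 events, 11 predictors plus 1 interaction term.
print(events_per_variable(60, 12))  # 5.0
print(model_is_supported(60, 12))   # False
```

Under this heuristic, a model with 12 terms would call for roughly 120 events; the actual adequacy of any given model depends on more than this single ratio.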

For time-to-progression, the Cox regression treatment by sHER2 interaction was nearly statistically significant (p=0.0538). Among patients with elevated sHER2 values, results significantly favored paclitaxel plus trastuzumab, while in those with normal sHER2, results nonsignificantly favored paclitaxel alone. Logistic regression analysis of overall response rate showed no significant treatment by sHER2 interaction (p=.6044); in both groups, combination treatment was favored, but not significantly.

Randomized trial, post-hoc multivariate subgroup analysis. No studies of this type were identified.

Randomized trial, treatment by HER2 subgroup analysis. Among three randomized trials that described treatment by sHER2 subgroup analyses, Muller, Witzel, Luck, et al. (2004) reported on a subset of 101 patients with serum available, out of 597 patients (17 percent) randomized to epirubicin plus either paclitaxel (ET) or cyclophosphamide (EC). This study was a retrospective analysis of a previously reported randomized trial. These authors found within the ET group a trend toward worse overall survival for sHER2-positive patients (p=.092), but no significant difference between sHER2 groups receiving EC. Regarding progression-free survival, outcomes for the two treatments did not differ among sHER2-negative patients, but results were significantly worse for EC among sHER2-positive patients. For overall response rate, sHER2 groups did not differ among those receiving ET, but those receiving EC had worse results when sHER2 was positive. No test for treatment by sHER2 interaction was reported.

These results should be viewed cautiously because the analyzed subset comprised less than 20 percent of those originally randomized and multivariate analysis was not used to adjust for any imbalances between treatments by sHER2 subgroups. Additionally, it is unclear whether sHER2 results were scored blindly with respect to outcome.

Lipton, Ali, Leitzel, et al. (2003) addressed 562 postmenopausal women given either letrozole or tamoxifen as first-line therapy for advanced breast cancer. This retrospective analysis included 62 percent of all patients randomized in the trial; however, this is the only randomized trial that used blinded assessment of sHER2 in relation to outcome. Results were better in terms of time-to-progression and time to treatment failure for those receiving letrozole, regardless of sHER2 status. For overall response rate and rate of clinical benefit (overall response plus stable disease), letrozole was significantly better than tamoxifen for sHER2 negative patients, but not for those sHER2 positive. No tests of treatment by sHER2 interaction were reported.

Cameron, Casey, Press, et al. (2008) randomized 399 patients with tissue HER2-positive locally advanced or metastatic breast cancer to receive capecitabine with or without lapatinib. Exploratory analyses of the relation between sHER2 status and progression-free survival were conducted in 92 percent of those randomized. When sHER2 was divided into the highest quartile versus other quartiles, both sHER2 subgroups had significantly better progression-free survival when treated with capecitabine plus lapatinib compared to capecitabine alone. This study did not describe the sHER2 assay methods clearly, did not report that sHER2 was scored blind to outcome, and used an uncommon threshold for sHER2 positivity. No test of treatment by sHER2 status interaction was reported.

Randomized trial results summary. The methodologic quality of these randomized trials is generally poor. Only one randomized trial was conducted with a prespecified plan to assess the relation of sHER2 to outcome. The same trial was the only one that conducted multivariate analyses; however, it appeared to have too few events to support the large number of predictor and interaction terms used, and the modeling techniques were poorly described overall. The other three trials performed retrospective treatment by sHER2 subgroup analyses of 17 percent, 62 percent and 93 percent of patients originally enrolled. Only one study used blinded assessment of sHER2 in relation to outcome.

These four randomized trials each addressed a different comparison of treatments. The only study that tested treatment by sHER2 status interactions found them to be nonsignificant for TTP and ORR in a comparison of paclitaxel with and without trastuzumab. A comparison of epirubicin either with paclitaxel or cyclophosphamide did not consistently find sHER2 to be related to different treatment outcomes (OS, PFS, ORR). A trial comparing letrozole and tamoxifen found sHER2 to be a more consistent predictor of treatment outcome for TTP and TTF, less so for ORR and clinical benefit. A trial of capecitabine with or without lapatinib found better PFS for those receiving combination treatment for both those in the highest quartile and lower quartiles of sHER2 values. Only the Gasparini, Gion, Mariani, et al. (2007) trial, which analyzed nearly all patients randomized, used multivariate methods, while the other two trials used univariate analyses of much smaller subsets of those randomized.
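Several of the trials above compared treatments separately within sHER2 subgroups without reporting a test of treatment by sHER2 interaction. The following is a minimal sketch of one simple form such a test can take for a binary outcome such as overall response: a Wald test on the difference between the subgroup log odds ratios. It is not the method used in any of the reviewed trials. The function names and all patient counts are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def log_odds_ratio(resp_a, nonresp_a, resp_b, nonresp_b):
    """Log odds ratio of response (treatment A vs. B) and its Woolf standard error."""
    cells = (resp_a, nonresp_a, resp_b, nonresp_b)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((resp_a * nonresp_b) / (nonresp_a * resp_b))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return log_or, se

def interaction_test(marker_pos, marker_neg):
    """Wald z-test that the treatment odds ratio differs between marker subgroups.

    Each argument is (responders A, nonresponders A, responders B, nonresponders B).
    Returns (ratio of odds ratios, z statistic, two-sided p-value).
    """
    lor_pos, se_pos = log_odds_ratio(*marker_pos)
    lor_neg, se_neg = log_odds_ratio(*marker_neg)
    diff = lor_pos - lor_neg
    se_diff = math.sqrt(se_pos ** 2 + se_neg ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return math.exp(diff), z, p

# Hypothetical counts, NOT taken from any study in this report.
marker_positive = (20, 30, 10, 40)
marker_negative = (30, 70, 28, 72)
ratio, z, p = interaction_test(marker_positive, marker_negative)
print(f"ratio of odds ratios = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts, the treatment effect is nominally significant in the marker positive subgroup and not in the marker negative subgroup, yet the interaction test itself does not reach p<.05. This is why a difference in subgroup-specific p-values cannot substitute for a formal interaction test.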

Single-arm study, multivariate analysis. Among three single-arm studies that conducted multivariate analysis, Colomer, Llombart-Cussac, Lloveras, et al. (2007) included 226 patients with metastatic breast cancer who received letrozole. The authors prespecified their interest in assessing the relation between sHER2 status and treatment outcomes; however, they provided inadequate detail in describing Cox regression methods such as selection of candidate variables, model-building strategy, testing of assumptions, forcing of standard prognostic variables and handling of continuous variables. It is unclear if sHER2 results were scored blind to outcomes, and validation of the final model was not mentioned. The multivariate analysis found sHER2 and ECOG performance status to be significant independent predictors of time to progression.

Colomer, Montero, Lluch, et al. (2000) included 55 patients with metastatic disease who were receiving first-line doxorubicin and paclitaxel. Of the 77 patients originally enrolled in this Phase II study, 75 percent had evaluable serum samples. The plan to assess the relation between sHER2 and outcome was prespecified in this study; however, the multivariate logistic and Cox regression techniques were poorly described. It is unclear how candidate variables were selected, what model-building strategy was used, whether assumptions were tested, whether final models included all standard prognostic variables and whether continuous variables were well handled. Furthermore, models did not appear to be validated and it is unclear if sHER2 was scored blindly to outcome. In the logistic regression of response, there were only 39 events, but six variables were entered into the multivariate model (more than the commonly recommended limit of one variable per 10 or more events). A similar problem existed for the Cox regression of response duration. These authors found elevated sHER2 to be significantly associated with poorer results on response duration and overall response rate, in both univariate and multivariate analyses.
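The events-per-variable concern raised above is simple arithmetic. The sketch below works through it for the 39 events and six variables reported for the logistic regression of response. The function names are hypothetical, and the threshold of 10 events per variable is the rule of thumb cited in the text, passed as an adjustable parameter rather than a fixed standard.

```python
def events_per_variable(n_events, n_variables):
    """Number of outcome events available per predictor in the model."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_events / n_variables

def max_variables(n_events, min_epv=10):
    """Largest number of predictors that keeps at least min_epv events per variable."""
    return n_events // min_epv

# Values reported for the logistic regression of response in Colomer et al. (2000).
epv = events_per_variable(39, 6)   # 6.5 events per variable
limit = max_variables(39)          # 3 variables under the 10-events rule of thumb
print(f"events per variable = {epv}, variables supported = {limit}")
```

Under this rule of thumb, 39 events would support about three predictors, so a six-variable model has roughly twice as many terms as the data can reliably support.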

The study by Yamauchi, O'Neill, Gelman, et al. (1997) was originally a randomized comparison of three doses of droloxifene as first-line hormonal therapy. Of the 369 patients randomized, 94 were included in this retrospective analysis (25 percent). Logistic regression of overall response and Cox regression of time-to-progression and overall survival all used the stepwise model building strategy, a method with major weaknesses. The description of modeling methods was poor, lacking details on candidate variable selection, whether assumptions were tested, whether final models included standard prognostic variables and whether continuous variables were well handled. The article did not make clear whether sHER2 results were scored blindly to outcome. Dose was entered into the multivariate models but was not retained, suggesting similar results across doses, and dose groups were pooled. After adjustment for other variables, this study found consistently worse results for sHER2 positive patients on time to progression, overall survival and overall response rate.

Single-arm study, univariate analysis. These studies reported on 55 patients or fewer. With the exception of the study by Esteva, Valero, Booser, et al. (2002), positive sHER2 results were associated with worse outcomes. The lack of multivariate analyses in these studies makes these findings of limited use for guiding treatment decisions. These studies could be described as exploratory, hypothesis-generating investigations that might inform future, more sophisticated studies.

Single-arm study results summary. This body of evidence is quite heterogeneous with respect to treatment regimens, outcomes assessed, and definitions of elevated sHER2. Only three of 11 studies conducted multivariate analyses, and their modeling methods were poorly described. Evidence from single-arm series more often shows that sHER2 status predicts outcomes among patients treated; however, there were several instances in which it was nonpredictive, and one study found better response among those with elevated sHER2, in conflict with all other studies.

Conclusions, Key Question 4

The evidence is weak on whether sHER2 predicts outcome after treatment with any regimens in any setting. Evidence primarily focused on first-line or second- and subsequent-line treatment of metastatic disease using a variety of regimens. Furthermore, these studies used different thresholds for a positive sHER2 result and varied on whether patient selection required positive tissue HER2 status. There were only four randomized trials and only one used multivariate analysis, while three single-arm studies performed multivariate analysis. Reporting on multivariate analyses lacked sufficient detail. Univariate analyses provide very limited information, at most suggesting candidate variables for future multivariate analyses. These studies do not support clear conclusions for whether sHER2 predicts disease progression, treatment response, or outcomes of any specific treatment regimen.

Key Question 5

In patients with ovarian, lung, prostate, or head and neck cancers, what is the evidence that:

  • a. testing tumor tissue for HER2; or

  • b. monitoring serum or plasma concentrations of HER2;

either predicts response to therapy, or detects tumor progression or recurrence; and if so, what is the evidence that decisions based on HER2 assay results improve patient management and outcomes?

Study Selection

Studies were included for Key Question 5 if they were:

  • randomized trials, prospective single-arm studies, or retrospective series of identically treated patients; that

  • measured HER2 in tumor tissue, serum, or plasma from patients with ovarian, lung, prostate, or head and neck cancers, and either:

    • a. associated HER2 status from tissue assays, or baseline values or changes in serum or plasma HER2 concentration, with one or more outcomes of interest (primary or secondary; see above); or

    • b. compared outcomes of treatment decisions based on tumor HER2 status, or serum or plasma assay results, with outcomes of decisions made in absence of test results.

Part I. Lung Cancer

Table 30

Hierarchy of evidence, KQ5, lung cancer

| Level of Evidence | Study | n | Setting | Treatments | Outcome | Results |
|---|---|---|---|---|---|---|
| RCT stratified on HER2 status/HER2-guided vs. non-HER2-guided | | | | | | |
| RCT prespecified MV SGA | | | | | | |
| RCT post-hoc MV SGA | | | | | | |
| RCT treatment by HER2 SGA | | | | | | |
| 1-arm prespecified MV analysis | | | | | | |
| 1-arm post-hoc MV analysis | Koukourakis 1999 | 189 | NSCLC T1–2, N0–1 | surgery | OS | univariate: IHC HER2 not associated with OS |
| | | | | | | Cox regression: IHC HER2 not entered in model |
| | Cappuzzo 2005 | 101 | locally advanced, metastatic NSCLC | gefitinib | ORR | univariate: IHC HER2+↑ vs. - p=.001 |
| | | | | | ORR | Cox regression IHC HER2+↑ vs. - p=.08 |
| | | | | | OS | univariate IHC HER2+↑ vs. - p=.056 (discrepancies) |
| | | | | | TTP | univariate IHC HER2+↑ vs. - p=.02 (discrepancies) |
| | Hirsch 2005 | 56 | stage IIIB/IV BAC, BAC-like AC | gefitinib | OS | univariate FISH HER2+ vs. - p=.80 |
| | | | | | OS | Cox regression FISH HER2 not entered in model |
| | | | | | ORR | univariate FISH HER2+ vs. - p>.05 |
| | Saad 2004 | 100 | stage I AC/BAC | surgery | OS | univariate AC IHC HER2+↓ vs. - p=signif |
| | | | | | OS | Cox regression AC IHC HER2+↓ vs. - signif independent predictor |
| | | | | | OS | univariate BAC IHC HER2+↓ vs. - p=signif |
| | | | | | OS | Cox regression BAC IHC HER2+↓ vs. - signif independent predictor |
| 1-arm UV analysis | Cappuzzo 2007 | 42 | stage III/IV NSCLC | gefitinib | Resp | FISH HER2+↑ vs. - p=.007 |
| | | | | | TTP | FISH HER2+ vs. - p=.2 |
| | | | | | OS | FISH HER2+ vs. - p=.1 |
| | Daniele 2007 | 42 | stage III/IV NSCLC | gefitinib | Resp | FISH/CISH+↑ vs. - p=.0005 |
| | Krug 2005 | 65 | stage IIIB/IV NSCLC | docet/paclit+trastuz | OS | IHC HER2+ vs. - p=NS |
| | Pelosi 2005 | 345 | stage I NSCLC | surgery | OS | FISH HER2+ vs. - p=NS |
| | | | | | DFS | FISH HER2+ vs. - p=NS |
| | Langer 2004 | 56 | stage IIIB/IV recurrent NSCLC | trastuz+paclit+carbopl | OS | IHC HER2 3+ vs. 2+ vs. 1+ p=.77 |
| | | | | | PFS | IHC HER2 3+ vs. 2+ vs. 1+ p=.34 |
| | Cappuzzo 2003 | 63 | stage IIIB/IV NSCLC | gefitinib | TTP | IHC HER2+ vs. - p=NS |
| | | | | | OS | IHC HER2+ vs. - p=NS |
| | | | | | ORR | IHC HER2+ vs. - p=.126 |
| | Koukourakis 2000 | 112 | T1–2, N0–1 NSCLC | surgery | OS | IHC HER2+ vs. - p=NS |
| | Graziano 1998 | 66 | stage IIIA NSCLC | cispl+etop (PE), surgery, PE, RT | OS | IHC HER2+ vs. - p=.617 |
| | | | | | ORR | IHC HER2+ vs. - p=.999 |
| | Pfeiffer 1996 | 186 | stage I-IV NSCLC | surgery | OS | IHC HER2 none vs. low vs. high p=NS |

Abbreviations: carbopl: carboplatin; cispl: cisplatin; DFS: disease-free survival; etop: etoposide; HR: hazard ratio; MV: multivariate; ORR: overall response rate; OS: overall survival; paclit: paclitaxel; pCR: pathologic complete response; PFS: progression-free survival; RCT: randomized, controlled trial; RFS: recurrence-free survival; SGA: subgroup analysis; trastuz: trastuzumab; TTP: time to progression; Tx: treatment; UV: univariate.

Table 31

Study quality assessment, KQ5, lung cancer
Study | Prospective design | Prespecified hypotheses about relation of marker to outcome | Large, well-defined, representative study population | Marker assay methods well-described | Blinded assessment of marker in relation to outcome | Homogeneous treatment(s), either randomized or rule-based selection | Low rate of missing data (≤ 15%) | Sufficiently long follow-up | Well-described, well-conducted multivariate analysis of outcome: 1) clear candidate variable selection, 2) clear, appropriate model-building guidelines, 3) assumptions tested, 4) standard prognostic variables included, 5) continuous variables well handled, 6) validation
1) | 2) | 3) | 4) | 5) | 6)
Koukourakis et al., 1999NN