Chapter 160: Impact of Gene Expression Profiling Tests on Breast Cancer Outcomes

Prepared for:

Agency for Healthcare Research and Quality

U.S. Department of Health and Human Services

540 Gaither Road

Rockville, MD 20850

www.ahrq.gov

Contract No. 290-02-0018

Prepared by:

The Johns Hopkins University Evidence-based Practice Center, Baltimore, MD

Investigators

Luigi Marchionni, M.D., Ph.D.

Renee F. Wilson, M.Sc.

Spyridon S. Marinopoulos, M.D., M.B.A.

Antonio C. Wolff, M.D.

Giovanni Parmigiani, M.D.

Eric B. Bass, M.D., M.P.H.

Steven N. Goodman, M.D., M.H.S., Ph.D.

AHRQ Publication No. 08-E002

January 2008

This document is in the public domain and may be used and reprinted without permission except those copyrighted materials noted for which further reproduction is prohibited without the specific permission of copyright holders.

Suggested Citation:

Marchionni L, Wilson RF, Marinopoulos SS, Wolff AC, Parmigiani G, Bass EB, Goodman SN. Impact of Gene Expression Profiling Tests on Breast Cancer Outcomes. Evidence Report/Technology Assessment No. 160. (Prepared by The Johns Hopkins University Evidence-based Practice Center under contract No. 290-02-0018). AHRQ Publication No. 08-E002. Rockville, MD: Agency for Healthcare Research and Quality. January 2008.

This report is based on research conducted by the Johns Hopkins University Evidence-based Practice Center (EPC) under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-02-0018). The findings and conclusions in this document are those of the author(s), who are responsible for its content, and do not necessarily represent the views of AHRQ. No statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.

The information in this report is intended to help clinicians, employers, policymakers, and others make informed decisions about the provision of health care services. This report is intended as a reference and not as a substitute for clinical judgment.

This report may be used, in whole or in part, as the basis for the development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.

The investigators have no relevant financial interests in the report. The investigators have no employment, consultancies, honoraria, or stock ownership or options, or royalties from any organization or entity with a financial interest or financial conflict with the subject matter discussed in the report.

Preface

The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The Centers for Disease Control and Prevention (CDC) requested and provided funding for this report. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.

To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.

AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.

We welcome comments on this evidence report. They may be sent by mail to the Task Order Officer named below at: Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850, or by e-mail to .

Acknowledgments

The Evidence-based Practice Center thanks Michael Oladubu, D.D.S., and Allison Jonas for their assistance with literature searching, database management, and project organization; Aly Shogan for her assistance in completing the sections on economics; and Brenda Zacharko for her assistance with budget matters and with final preparations of the report. The Center also wishes to thank Gurvaneet Randhawa, M.D., M.P.H., AHRQ Task Order Officer, for his efforts in guiding this project and coordinating with the CDC EGAPP group.

Structured Abstract

Objective: To assess the evidence that three marketed gene expression-based assays improve prognostic accuracy, treatment choice, and health outcomes in women diagnosed with early stage breast cancer.

Data Sources: MEDLINE®, EMBASE, the Cochrane databases, test manufacturer Web sites, and information provided by manufacturers.

Review Methods: We evaluated the evidence for three gene expression assays on the market: Oncotype DX™, MammaPrint®, and the Breast Cancer Profiling (BCP or H/I ratio) test, and for gene expression signatures underlying the assays. We sought evidence on: (a) analytic performance of tests; (b) clinical validity (i.e., prognostic accuracy and discrimination); (c) clinical utility (i.e., prediction of treatment benefit); (d) harms; and (e) impact on clinical decision making and health care costs.

Results: Few papers were found on the analytic validity of the Oncotype DX and MammaPrint tests, but these showed reasonable within-laboratory replicability. Pre-analytic issues related to sample storage and preparation may play a larger role than within-laboratory variation. For clinical validity, studies differed according to whether they examined the actual test that is currently being offered to patients or the underlying gene signature. Almost all of the Oncotype DX evidence was for the marketed test, the strongest validation study being from one arm of a randomized controlled trial (NSABP B-14) with a clinically homogeneous population. This study showed that the test added in a clinically meaningful manner to standard prognostic indices. The MammaPrint signature and the test itself were examined in studies with clinically heterogeneous populations (e.g., mix of ER positivity and tamoxifen treatment) and showed a clinically relevant separation of patients into risk categories, but it was not clear exactly how many predictions would be shifted across decision thresholds if this were used in combination with traditional indices. The BCP test itself was examined in one study, and the signature was tested in a variety of formulations in several studies. One randomized controlled trial provided high quality retrospective evidence of the clinical utility of Oncotype DX to predict chemotherapy treatment benefit, but evidence for clinical utility was not found for MammaPrint or the H/I ratio. Three decision analyses examined the cost-effectiveness of breast cancer gene expression assays, and overall were inconclusive.

Conclusions: Oncotype DX is furthest along the validation pathway, with strong retrospective evidence that it predicts distant spread and chemotherapy benefit to a clinically relevant extent over standard predictors, in a well-defined clinical subgroup with clear treatment implications. The evidence for clinical implications of using MammaPrint was not as clear as with Oncotype DX, and the ability to predict chemotherapy benefit does not yet exist. The H/I ratio test requires further validation. For all tests, the relationship of predicted to observed risk in different populations still needs further study, as does their incremental contribution, optimal implementation, and relevance to patients on current therapies.

Executive Summary

Introduction

Breast cancer is the most commonly diagnosed cancer in women. This tumor is the second leading cause of cancer-related deaths in women in the United States, with approximately 178,000 new cases and 40,000 deaths expected among U.S. women in 2007. Treatment for breast cancer usually involves surgery to remove the tumor and involved lymph nodes. Frequently, surgery is followed by radiation therapy (in case of breast conservation or in women with large tumors or many involved lymph nodes), endocrine therapy (for essentially all women with tumors that express the estrogen receptor (ER-positive)), and/or chemotherapy (for women having a high risk for a poor outcome such as those with large tumors, involved lymph nodes, advanced disease, or inflammatory breast cancer). More than three-quarters of patients are expected to survive with this multi-modality approach.

Gene expression profiling has been proposed as an approach to address this issue in clinical settings, and three breast cancer gene expression assays are now available in the U.S.: the Oncotype DX™ Breast Cancer Assay, the MammaPrint® Test, and the Breast Cancer Profiling test (BCP or H/I ratio). MammaPrint is based on the use of microarray technology, while the other two assays are based on the reverse transcriptase polymerase chain reaction (RT-PCR). All of these tests combine the measurements of gene expression levels within the tumor to produce a number associated with the risk of distant disease recurrence. These tests aim to improve on risk stratification schemes based on clinical and pathologic factors currently used in clinical practice. As therapeutic decisions are based on risk estimates, tests that improve such estimates have the potential to affect clinical outcome in breast cancer patients by either avoiding unnecessary chemotherapy and its attendant morbidity or by employing it where it might not otherwise have been used, thereby reducing recurrence risk.

The literature was searched for evidence about the use of gene expression profiling in breast cancer. Our analytical framework for reporting the results distinguishes between the assays, as they are offered to patients, and the underlying signatures, which comprise the genes whose expression is measured. This measurement of expression can be done in a number of ways that may not be identical to the procedures used for the marketed test, producing an unknown number of different predictions. We also distinguish between developmental and validation studies.

Methods

Working with the Agency for Healthcare Research and Quality (AHRQ), the Centers for Disease Control and Prevention (CDC), the Evaluation of Genomic Applications in Practice and Prevention (EGAPP) working group, and members of a technical expert panel, we formulated four key questions, and addressed them on the basis of the evidence available about the specific assays and the underlying gene expression signatures. The original set of key questions was refined to focus primarily on two gene expression profiling tests: Oncotype DX (Genomic Health, Inc.) and MammaPrint (Agendia). During the course of the evaluation, a third gene expression profiling test came to our attention, the H/I ratio test based on the two-gene signature (AviaraDX/Quest Diagnostics, Inc.), and was thus investigated. We searched and retrieved studies in MEDLINE®, EMBASE, and the Cochrane databases (1990-2006). We supplemented this search with recent publications that appeared after the time period initially considered in the systematic search, and with publications about the two-gene test (H/I ratio). We also searched for relevant documents on the Food and Drug Administration's web site, and solicited additional documentation from the companies offering the tests. The systematic searches yielded a total of 12,983 citations. Specific inclusion and exclusion criteria were developed and pairs of readers reviewed each title; the same procedure was used to review selected abstracts. We identified 63 studies for full text review. We developed tables to summarize each article. Initial data were abstracted by investigators and entered directly into evidence tables. Quality and consistency of the abstracted data were then evaluated by a second reviewer, and a senior investigator examined all reviews to identify potential problems with data abstraction. These were discussed at meetings of group members. A system of random data checks was applied to ensure data abstraction accuracy.

Results

Literature on Key Questions

Key Question 1. What is the direct evidence that gene expression profiling tests in women diagnosed with breast cancer (or any specific subset of this population) lead to improvement in outcomes?

Direct evidence was defined as a study where the primary intervention is the use of a prognostic test (with therapeutic decisionmaking directed by the result) and the outcomes are patient morbidity, mortality and/or quality of life. No direct evidence was found in the published data on improvement of patients' outcomes due to such testing in women diagnosed with breast cancer, nor were there any randomized studies using the tests' predictions to manage patients. However, as described under Key Questions 3 and 4, some of the tests' supporting evidence was derived from past randomized controlled trials (RCTs) with prospectively gathered patient samples, giving them strong evidential value. Two ongoing RCTs, TAILORx and MINDACT (using Oncotype DX and MammaPrint, respectively), will provide further evidence allowing almost direct inference about the impact on patient outcomes.

Key Question 2. What are the sources of and contributions to analytic validity in these two gene expression-based prognostic estimators for women diagnosed with breast cancer?

In the field of gene expression there are no “gold standards” outside the technologies used in the tests under study, i.e., microarrays and RT-PCR. Consequently, a definitive evaluation of the analytic validity of expression-based tests is difficult. Evidence about operational characteristics was partial and limited to a few publications. A 2007 paper by Cronin and colleagues on the analytic validity of Oncotype DX was the most detailed study for any of these tests so far, showing good performance for a number of analytic components of the assay. Data about the sources and contributions to variability of the tests and about their reproducibility were generally limited to analyses of few samples, and thus a complete evaluation of the impact of such variability on risk assessment was not available. Partial evidence about analytic validity was provided in the percentage of subjects whose samples were successfully analyzed with these tests, and those numbers were fairly good. Continuous monitoring of laboratory procedures and careful evaluation of the quality of the submitted specimens are major factors affecting test reliability.

Key Question 3. What is the clinical validity of these tests in women diagnosed with breast cancer?

  • a. How well does this testing predict recurrence rates for breast cancer compared to standard prognostic approaches? Specifically, how much do these tests add to currently known factors or combination indices that predict the probability of breast cancer recurrence (e.g., tumor type or stage, age, ER, and human epidermal growth factor receptor 2 (HER-2) status)?

  • b. Are there any other factors, which may not be components of standard predictors of recurrence (e.g., race/ethnicity or adjuvant therapy), that affect the clinical validity of these tests, and thereby generalizability of results to different populations?

Clinical validity is defined as the degree to which a test accurately predicts the risk of an outcome (i.e., calibration), as well as its ability to separate patients with different outcomes into separate risk classes (discrimination). Clinical validity was documented to some degree for all three gene expression signatures. Oncotype DX was validated on a homogeneous population of lymph node negative, ER positive patients all treated with tamoxifen, derived from an arm of an RCT, the National Surgical Adjuvant Breast and Bowel Project trial (NSABP B-14). MammaPrint, on the other hand, was validated on samples from a clinical series with a wide range of clinical and treatment characteristics, and sometimes it was the signature and not the MammaPrint test itself that was validated. Data that made clear the incremental value of the test over standardized risk predictors using classical clinical factors, in the form of risk reclassification tables, was limited to Oncotype DX in one population, and for one of those predictors (Adjuvant! Online for MammaPrint). The evidence behind the two-gene test is quite heterogeneous, in that the specific manner in which the index was calculated differed in each study, and only one study examined the index that is to be used as part of the BCP (or H/I ratio) test; that study was still using statistical methods to find optimal cut points, i.e., it was a training study. The Oncotype DX test, which has been validated in exactly the form given to patients on clinically homogeneous samples with clear treatment implications, is therefore regarded as the index with the strongest claim to clinical validity. It is not yet as clear to which populations MammaPrint best applies, and how much incremental value it would have within those clinically homogeneous populations above various standard predictors.

Since the number of validation studies for any of the tests is still relatively small, more remains to be learned about how stable the relationship between the expression-based score and the absolute observed risk is across different populations. Essentially nothing is known about how specific characteristics of these populations might affect test performance.
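The distinction between calibration and discrimination drawn above can be illustrated with a minimal Python sketch. It assumes a simplified setting with a binary recurrence outcome and no censoring, which is not how the validation studies were analyzed; the patient data, the 0.15 risk cutoff, and the function names `concordance` and `calibration_by_group` are all hypothetical and were invented for illustration.

```python
# Hypothetical patients: (predicted recurrence risk, recurred? 1 = yes, 0 = no).
# These values are invented for illustration and come from no study.
patients = [
    (0.05, 0), (0.08, 0), (0.10, 0), (0.12, 1),
    (0.20, 0), (0.25, 1), (0.30, 0), (0.35, 1),
]


def concordance(data):
    """Discrimination: fraction of (recurred, not recurred) patient pairs in
    which the patient who recurred had the higher predicted risk.
    Tied predictions count as half a concordant pair."""
    events = [risk for risk, recurred in data if recurred == 1]
    non_events = [risk for risk, recurred in data if recurred == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


def calibration_by_group(data, cutoff):
    """Calibration: for each risk group, compare the mean predicted risk
    with the observed recurrence rate. Returns {group: (predicted, observed)}."""
    groups = {"low": [], "high": []}
    for risk, recurred in data:
        groups["low" if risk < cutoff else "high"].append((risk, recurred))
    summary = {}
    for name, members in groups.items():
        if not members:
            continue  # skip empty groups rather than divide by zero
        mean_predicted = sum(r for r, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        summary[name] = (mean_predicted, observed_rate)
    return summary


print("concordance:", concordance(patients))
for group, (predicted, observed) in calibration_by_group(patients, 0.15).items():
    print(f"{group}: mean predicted {predicted:.3f}, observed {observed:.3f}")
```

In this invented example the score ranks patients reasonably well, yet the observed recurrence rate in the low-risk group is well above the mean predicted risk, so a test can discriminate adequately while being poorly calibrated.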

While the H/I ratio test shows some promise, it must be regarded as still being in a developmental phase; it cannot yet be considered fully validated. It was not clear whether samples were processed by Quest Diagnostics, which holds the current license. There are a number of intriguing biological insights and plausible mechanisms to support the rationale for the test, but its consistent value in well-defined clinical settings has not yet been firmly established.

Key Question 4. What is the clinical utility of these tests?

  • a. To what degree do the results of these tests predict the response to chemotherapy, and what factors affect the generalizability of that prediction?

  • b. What are the effects of using these two tests and the subsequent management options on the following outcomes: testing or treatment related psychological harms, testing or treatment related physical harms, disease recurrence, mortality, utilization of adjuvant therapy, and medical costs?

  • c. What is known about the utilization of gene expression profiling in women diagnosed with breast cancer in the United States?

  • d. What projections have been made in published analyses about the cost-effectiveness of using gene expression profiling in women diagnosed with breast cancer?

Few studies addressed the clinical utility of Oncotype DX recurrence score (RS) in predicting the benefits of adjuvant chemotherapy, although the probability of recurrence represents an upper bound on the degree of absolute benefit. One fairly strong retrospective study produced preliminary evidence that the RS has predictive power in assessing the benefit of chemotherapy usage in ER-positive, lymph node negative breast cancer patients. This study was embedded within a large, well conducted RCT (National Surgical Adjuvant Breast and Bowel Project (NSABP B-20)). Some patients from the tamoxifen-only arm of the trial were in the training data sets for the Oncotype DX assay development, and this could potentially translate into a somewhat enhanced estimate of the discriminatory effect of Oncotype DX, although it is unlikely to eliminate entirely the effect seen here. Other studies produced preliminary evidence that the RS from the Oncotype DX assay has predictive power in assessing the likelihood of pathologic complete response after pre-operative chemotherapy with various drugs and regimens, although very limited sets of patients have been used. One study produced preliminary evidence that the RS cannot predict pathologic complete response after primary chemotherapy in advanced breast cancer patients.

One study produced preliminary evidence that the knowledge of the RS from the Oncotype DX assay can have an impact on the clinical management of patients diagnosed with ER positive, lymph node negative, and early breast cancer. However, it did not report specifically what the patients (or doctors) were told or understood about their absolute risk of recurrence, and therefore was minimally informative as to the actual risk thresholds used by women and their treating physicians, or whether absolute risks even entered into the decision.

There were no studies that addressed the clinical utility of the MammaPrint or H/I ratio tests.

Three published studies have addressed economic outcomes associated with use of the breast cancer gene expression tests. One study reported that using the 21-gene RT-PCR assay to reclassify patients who were defined by 2005 National Comprehensive Cancer Network (NCCN) criteria as low risk (to intermediate or high risk) would lead to an average gain in survival per reclassified patient of 1.86 years. The associated cost-utility of using recurrence score testing for this cohort was $31,452 per quality-adjusted life-year (QALY) gained. The analysis also reported that using the 21-gene RT-PCR assay to reclassify patients who were defined by 2005 NCCN criteria as high risk (to low risk) was cost saving. In a hypothetical population of 100 patients with characteristics similar to those of the NSABP B-14 participants, more than 90 percent of whom were NCCN-defined as high risk, using the 21-gene RT-PCR assay was expected to improve quality-adjusted survival by a mean of 8.6 years and reduce overall costs by about $203,000. However, the EPC team had only moderate confidence in the results of this analysis because the study was sponsored in part by the manufacturer of the 21-gene RT-PCR assay and the authors did not provide sufficient information about methodological and structural uncertainties as well as other potential sources of bias such as the derivation of the utility estimates. Furthermore, the 2007 NCCN guideline indicates that the use of chemotherapy in these patients is now considered optional, further diminishing the usefulness of these projections.

The second study reported that use of the 21-gene RT-PCR assay was associated with a gain of 0.97 QALYs and a cost-utility ratio of $4,432 per QALY compared with use of tamoxifen alone, and a gain of 1.71 QALYs with net cost savings when compared with the chemotherapy and tamoxifen combination. However, the EPC team had little confidence in the results of this analysis, which was supported in part by the manufacturer, because the study did not meet many of the standards that the team used for appraising the quality of the analysis.

The third study compared the cost-effectiveness of the Netherlands Cancer Institute gene expression profiling (GEP) assay (MammaPrint) to the U.S. National Institutes of Health (NIH) guidelines for identification of early breast cancer patients who would benefit from adjuvant chemotherapy. The GEP assay was projected to yield a poorer quality-adjusted survival than the NIH guidelines (9.68 vs. 10.08 QALYs) and lower total costs ($29,754 vs. $32,636). To improve quality-adjusted survival, the GEP assay would need to have a sensitivity of at least 95 percent for detecting high risk patients while also having a specificity of at least 51 percent. The EPC team had confidence in the results of this analysis because it met most of the standards for appraising the quality of an economic analysis.
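The trade-off reported for the third study can be made concrete with a short Python sketch of an incremental cost-effectiveness ratio, using only the QALY and cost figures quoted above. The ratio itself is not reported in this summary, so the result is an illustrative calculation under that assumption, and the function name `icer` is hypothetical.

```python
def icer(cost_a, qaly_a, cost_b, qaly_b):
    """Incremental cost-effectiveness ratio of strategy A versus strategy B:
    additional cost per additional QALY gained by choosing A over B."""
    delta_qaly = qaly_a - qaly_b
    if delta_qaly == 0:
        raise ValueError("equal effectiveness: the ratio is undefined")
    return (cost_a - cost_b) / delta_qaly


# Figures quoted in the text for the third study.
nih_cost, nih_qaly = 32636, 10.08  # NIH guidelines
gep_cost, gep_qaly = 29754, 9.68   # GEP assay (MammaPrint)

# The GEP assay is both cheaper and less effective, so the informative
# comparison is what the NIH guidelines cost per QALY gained over the assay.
ratio = icer(nih_cost, nih_qaly, gep_cost, gep_qaly)
print(f"NIH guidelines vs. GEP assay: about ${ratio:,.0f} per QALY gained")
```

On these figures, following the NIH guidelines costs $2,882 more and yields 0.40 more QALYs than the GEP assay, or roughly $7,200 per QALY gained.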

Based on the appraisal of these three studies, the overall body of evidence on economic outcomes was inconclusive.

Limitations of the Report

The report included only English-language publications and was restricted to three gene expression tests.

Limitations of the Literature and Implications for Future Research

There are several issues that concern all of these tests.

  • 1. While all of the tests exhibit a fair bit of risk discrimination (i.e., separating patients into different risk groups), the calibration of the estimates (i.e., how close the predicted risk is to the observed risk) in varying settings is still not as well established. Of greatest interest is the observed risk in the lowest risk groups, since the absolute level of this risk is critical for informed decisionmaking, and patients may forego chemotherapy on the basis of this information.

  • 2. The manner in which the tests are best used (in combination with other prediction scores, as continuous scores, or as categorical predictors) has not been established. In addition, the current cut-points for designation of Low and High risks (with or without an intermediate category) are not clearly derived from decision-analytic criteria.

  • 3. The incremental value of these tests is best assessed from cross-classification tables that show how many subjects are placed in different risk categories (corresponding to different clinical decisions) by the addition of the information from the test in comparison or in addition to standard predictors. Such tables have been developed for Oncotype DX, but for only one set of risk thresholds, and some of the conventional guidelines used for those comparisons have since been updated.

  • 4. Pre-analytic issues related to sample preparation, transport, and processing could cause the tests to perform differently in practice than in investigational contexts; continued monitoring of test procedures and performance will be important as they are used more widely.

  • 5. The relevance of validation studies in past tamoxifen-treated populations for current populations treated with aromatase inhibitors needs further research.

  • 6. Studies examining the use of the tests should provide women and physicians with quantitative risk information and report how this alters clinical decisionmaking. The manner in which this risk information is presented should also be studied.
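The cross-classification tables described in item 3 can be sketched in Python as follows. The patient categories are invented for illustration and do not come from any study in this report; the function names `cross_classify` and `reclassified_fraction` are likewise hypothetical.

```python
from collections import Counter

# Hypothetical risk categories for ten patients (invented for illustration):
# (category from a standard clinical predictor, category after adding the assay)
pairs = [
    ("low", "low"), ("low", "low"), ("low", "high"),
    ("high", "low"), ("high", "low"), ("high", "low"),
    ("high", "high"), ("high", "high"), ("low", "low"), ("high", "high"),
]


def cross_classify(pairs):
    """Count patients in each (before, after) combination of risk categories."""
    return Counter(pairs)


def reclassified_fraction(pairs):
    """Share of patients whose risk category, and hence potentially the
    clinical decision, changes once the assay result is added."""
    moved = sum(1 for before, after in pairs if before != after)
    return moved / len(pairs)


table = cross_classify(pairs)
print("standard -> with assay: low, high")
for before in ("low", "high"):
    row = [table[(before, after)] for after in ("low", "high")]
    print(f"{before:>8}: {row}")
print("fraction reclassified:", reclassified_fraction(pairs))
```

The off-diagonal cells are the ones that matter for incremental value, since only patients who cross a decision threshold could receive a different treatment recommendation.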

Oncotype DX

  • 1. The role of the RS in guiding treatment of HER-2 positive patients is unclear, as most of these patients were classified in the high RS group in the initial trials.

  • 2. While awaiting the TAILORx results, the findings of the Paik 2006 study predicting treatment benefit need independent confirmation.

MammaPrint

  • 1. The prognostic value of the 70-gene signature has been assessed in different populations facing different therapeutic choices. In the analysis by van de Vijver and colleagues, 130 of the 295 patients received adjuvant therapy in a non-randomized fashion. Patients in the original development cohort were not treated, and Buyse validated the marketed assay in untreated patients. It is not yet clear which are the optimal patient populations for the use of this test, exactly what its performance is in those populations, and how many of its predictions would result in different therapeutic decisions. Larger independent validation studies in therapeutically homogeneous groups would be very valuable.

  • 2. There is no evidence for the degree to which this test predicts the benefit of adjuvant chemotherapy.

Breast Cancer Profile (H/I ratio) Test

  • 1. The BCP test is not yet as well validated as either of the other tests, with most of the supporting studies examining slightly different ways of either performing (e.g., different reference standards) or calculating the index. More work needs to be done documenting the risk discrimination and risk calibration of the marketed test in clinically homogeneous populations, as well as its incremental value.

  • 2. There is no evidence for the degree to which this test predicts the benefit of adjuvant chemotherapy.

In addition to the conclusions above, a series of other observations were made on the basis of what was learned in this investigation.

Assay Validation

In general, it is clear that validation studies need to deal with populations for whom the decision-making implications of various risk groupings are clear. For all tests except Oncotype DX, both validation and development studies have been on mixed populations, without sufficient sample sizes to stratify into large enough homogeneous groups to guide clinical decisionmaking. In addition, validation samples are often re-used by other investigators; the pool of such samples in the public domain needs to be greatly expanded.

Potential for Scale Problems

One problem that may arise in the future is the effect of increased demand for these tests. Whether the degree of accuracy seen in investigational settings can be maintained as demand grows should be monitored by scientific or regulatory bodies.

Genetic Variability and Gene Expression

It is unknown whether gene expression profiles are more or less likely than more traditional biomarkers to be generalizable beyond the populations in which they were initially developed. Gene expression may reflect fundamental biological tumor features, and thus be relatively stable across ethnic groups. This speaks to the importance of validating these tests in populations with varying genetic background. Of particular interest will be the variation of the observed absolute risk in those populations, and its correlates.

The Need for Databases, Reproducibility, and Standards

Consideration should be given to the development of databases with complete data on each patient tested with these and future tests (absent identifiers). The data should include all the analyses performed, laboratory logs, the raw and processed data, and all the information about procedures and analyses that have been performed to produce a risk estimate from a tumor sample.

Where is the Field Going?

We can expect many new tests, as well as new uses for the assays that already exist. More genes might be added to the signatures, and in the particular case of MammaPrint this will be possible without changing the experimental procedures, since the array contains more genes than the ones that are incorporated in the 70-gene signature. In this regard, we might also expect other modifications: subsets of the current signatures might be proposed as alternatives to current clinical risk factors, or be proposed in different populations or for different purposes. For Oncotype DX, a natural evolution could be related to its use as an alternative to immunohistochemistry and/or pathology to evaluate tumor Grade, S-phase index, ER, progesterone receptor, and HER2 expression, since such genes are part of the set included in the assay. Reporting of individual gene expression results may also prove useful.

“Comparative Effectiveness” Studies

As these tests mature and proliferate, an important question will be how they compare to each other, and whether there is value in their combination. In the therapeutic domain, this has been called “comparative effectiveness” research. Such research has traditionally been difficult to fund by government or by industry, because it may not hold out as much therapeutic promise as new discoveries, and because industry understandably is not eager to fund head-to-head comparisons with competing products. This same dynamic could easily take hold in the risk prediction arena, with a proliferation of licensed prediction indices without any clear notion of what new ones contribute over previous tests. From this perspective, development of future expression-based predictors should include direct comparisons with “established” methods.

Conclusion

The introduction of these gene-expression tests has ushered in a new era in which many conventional clinical markers and predictors may be seen merely as surrogates for more fundamental genetic and physiologic processes. The multidimensional nature of these predictors demands both large numbers of clinically homogeneous patients and exceptional rigor and discipline in the validation process, all with an eye toward how the test will be used in a clinical decisionmaking context. Every study provides an opportunity to tweak a genetic signature, but we must find the right balance between speed of innovation and development of scientifically and clinically reliable tools. Going forward, it will be important to harness as much genetic and clinical information as possible on patients who undergo these tests, to facilitate achieving each goal without unduly sacrificing the other.

Chapter 1. Introduction

Breast Cancer

Breast cancer is the most commonly diagnosed cancer in women.1 This tumor is currently the second leading cause of cancer-related deaths in women in the U.S., with approximately 178,000 new cases and 40,000 deaths expected among U.S. women in 2007.1 Treatment for breast cancer usually involves surgery to remove the tumor and involved lymph nodes. Frequently, surgery is followed by radiation therapy (in case of breast conservation or in women with large tumors or many involved lymph nodes), endocrine therapy (for essentially all women with tumors that are estrogen receptor (ER)-positive; see Appendix A for a list of acronyms), and/or chemotherapy (for women having a high risk for a poor outcome, such as those with large tumors, involved lymph nodes, advanced disease, or inflammatory breast cancer). Chemotherapy administered in addition to surgery is called “adjuvant” chemotherapy. More than three-quarters of all patients are expected to survive with this multi-modality approach.

One major challenge in breast cancer treatment relates to the decision about whether or not to use adjuvant chemotherapy. Although adjuvant chemotherapy can reduce the annual odds of recurrence and death for many women with breast cancer, especially those with ER-negative tumors,2 it has considerable adverse effects. Even though most women with early-stage breast cancer are advised to undergo chemotherapy, not all will benefit from it and some may remain free of disease recurrence at 10 years without it, especially those with small tumors and ER-positive disease. Decisionmaking protocols have been proposed with the intent of guiding clinicians involved in breast cancer treatment. Examples include the National Institutes of Health (NIH) Consensus Development criteria,3,4 the St. Gallen expert opinion criteria,5 the National Comprehensive Cancer Network (NCCN) guideline,6 and the computer-based algorithm Adjuvant! Online,7,8 which produces risk assessment and recommendations based on patient information, clinical data, tumor staging, and tumor characteristics (including age, menopausal status, comorbidity, tumor size, number of positive axillary nodes, and ER status). In addition, measurement of the human epidermal growth factor receptor 2 (HER-2) is now established as another predictive marker and has been incorporated into some of these indices,9 as it serves to identify candidates for adjuvant therapy with the monoclonal antibody trastuzumab (Herceptin®; Genentech, Inc., South San Francisco, CA). Such patients may also be candidates for adjuvant treatment with other new agents such as the anti-HER-2 tyrosine kinase inhibitor lapatinib (Tykerb®, GSK, PA) and the anti-vascular endothelial growth factor (VEGF) antibody bevacizumab (Avastin®; Genentech), which are being studied in trials now in progress.
With the proliferation of treatment advances in breast cancer, treatment decisions have become more complex, thereby increasing the demand for tests and predictive models that could help identify those patients most likely to benefit from specific therapies.

Breast cancer is increasingly understood as a broad umbrella label, with various tumor subtypes exhibiting different prognoses and different responses to the various treatment options available for use in the adjuvant setting. Evidence from large randomized trials, and systematic reviews, forms the basis of the various treatment algorithms and nomograms described above. These tools help caregivers determine the risk of recurrence and death and the chances of benefiting from a specific therapy within a tumor subtype (e.g., anti-estrogens alone for ER-positive disease, trastuzumab for HER-2-positive disease). Unfortunately, the predictive utility of these tools for an individual patient within a specific tumor subset is quite limited, and a large number of patients with ER-positive disease or HER-2-positive disease still experience tumor recurrence and die from their disease despite having received adjuvant anti-estrogen therapy or trastuzumab, respectively. Therefore, there is great interest in developing, testing, and validating strong predictive markers that can be used in daily clinical practice to accurately identify those patients most likely to benefit from specific therapy options such as chemotherapy, endocrine therapy, and anti-HER-2 therapy, alone or in combination.

Gene Expression Profiling

   Figure 1. Increasing complexity of information from genome to transcriptome and proteome: gene expression profiling focuses on the analysis of the transcriptome

Gene expression profiling (see Glossary, Appendix B) is an emerging technology for identifying genes whose activity may be helpful in assessing disease prognosis and guiding therapy. Gene expression profiling examines the composition of cellular messenger ribonucleic acid (mRNA) populations. The identity of the RNA transcripts (see Glossary, Appendix B) that make up these populations and the number of these transcripts in the cell provide information about the global activity of genes that give rise to them. The number of mRNA transcripts derived from a given gene is a measure of the “expression” of that gene. Given that mRNA molecules are translated into proteins, changes in mRNA levels are ultimately related to changes in the protein composition of the cells, and consequently to changes in the properties and functions of tissues and cells in the body. However, only 2 percent of the genome (see Glossary, Appendix B) is translated into proteins, and little is known about how the expression of this 2 percent is controlled. The key intermediate is the transcriptome (see Glossary, Appendix B), which is made up of all the individual transcripts produced by the cell (see Figure 1).

Investigators have developed approaches to gene expression analysis that have led to substantial advances in our understanding of basic biology. Gene expression profiling has been applied to numerous mammalian tissues, as well as plants, yeast, and bacteria.10-14 These studies have examined the effects of treating cells with chemicals and the consequences of overexpression of regulatory factors in transfected cells. Studies also have compared mutant strains with parental strains to delineate functional pathways. In cancer research, such investigation has been used to find gene expression changes in transformed cells and metastases, to identify diagnostic markers, and to classify tumors based on their gene expression profiles (see Glossary, Appendix B).15-18 The use of this approach for specific clinical problems, however, is relatively recent and poses several challenges related to the validity, reproducibility, and reliability required for use in diagnostic or predictive testing.

In recent years, gene expression profiling has been successfully used in breast cancer research. For instance, distinct subtypes of breast tumors (such as tumors expressing HER-2) have been identified as having distinctive gene expression profiles, representing diverse biologic entities associated with differences in clinical outcome.19-23 Other investigators24 have found gene expression signatures (see Glossary, Appendix B) associated with the ER and lymph node status of patients, thus identifying subgroups of patients with different clinical outcomes after therapy. From such studies, investigators have proposed a number of gene expression profiles that could be used to classify prognosis. In a case-control study from the Netherlands Cancer Institute (Amsterdam, the Netherlands), one such gene profile, consisting of 70 genes, was developed using archived frozen tissue from 78 young, node-negative women with breast cancer.21 In this study, tumors from patients who suffered rapid relapses after primary therapy had gene expression profiles that were quite distinct from those of patients who remained disease-free. These gene expression profiles were then applied to a second validation set of 295 frozen tissue specimens collected from young women (including 61 patients from the previous cohort), yielding very similar results.25 Indeed, it appeared that this 70-gene profile more accurately predicted outcomes than did the traditional clinical criteria. Results from these preliminary studies further suggested that gene expression profiling may provide a powerful tool for estimating prognosis and the likelihood of benefit from selected therapeutic agents.

Breast Cancer Assays on the Market

Three breast cancer gene expression profiling-based assays are now available in the U.S. These assays investigate the expression of specific panels of genes by measuring their RNA levels in breast cancer specimens using two different techniques, real-time reverse transcription-polymerase chain reaction (RT-PCR)26 and DNA microarrays27 (see Glossary, Appendix B):

  • 1

    The Oncotype DX™ Breast Cancer Assay (Genomic Health, Redwood City, CA) quantifies gene expression for 21 genes in breast cancer tissue by RT-PCR.28 This test is intended to predict the likelihood of recurrence in women of all ages with newly diagnosed Stage I or II breast cancer, lymph node-negative and ER-positive, who will be treated with tamoxifen, an anti-estrogen agent.

  • 2

    The MammaPrint® Test is based on microarray technology, uses the 70-gene expression profile developed by van't Veer and colleagues,21,25 and is marketed by Agendia (Amsterdam, the Netherlands). This is a prognostic test for women 61 years of age or younger with primary invasive breast cancer who are lymph node-negative and ER-positive or negative. The company voluntarily submitted this test to the U.S. Food and Drug Administration for approval under proposed new guidelines for such tests, and received such approval in February 2007. These guidelines were finalized in July 2007.

  • 3

    The Breast Cancer Profiling Test is based on the expression ratio of the two genes HOXB13 and IL17RB, and for this reason is also known as the H/I ratio test. The assay was developed by AviaraDX and licensed to Quest Diagnostics, Inc. (Lyndhurst, NJ). This assay is based on RT-PCR and is offered to treatment-naïve women with ER-positive, lymph node-negative breast cancer.

Table 1

Description of the three gene expression profile assays

Assay: Oncotype DX™, Genomic Health

Assay general information:
  Analytic studies: Cronin 2004,44 Cronin 2007,45
  Clinical validity and utility studies: Chang 2007,55 Cobleigh 2005,47 Esteva 2005,48 Gianni 2005,49 Habel 2006,50 Mina 2006,51 Oratz in press,56 Paik 2004,28 Paik 2006,53
  Economics studies: Lyman 2007,75 Hornberger 2005,67
  What is measured: 16 cancer genes, 5 normalizing genes by real time RT-PCR
  To whom it is offered:
    1) Stage I, ER positive, who will be treated with tamoxifen;
    2) Stage II, ER positive, LN negative, who will be treated with tamoxifen
  Web site: http://www.genomichealth.com/oncotype/default.aspx

Measurements:
  Genes for normalization: ACTB, GAPDH, RPLP0, GUS, TFRC
  Cancer related genes (the following functional groups are used to assess patients' Recurrence Score):
    Proliferation: Ki67, STK15, Survivin, CCNB1, MYBL2
    HER2: GRB7, HER2
    Estrogen: ER, PGR, BCL2, SCUBE2
    Invasion: MMP11, CTSL2
    Single genes: GSTM1, CD68, BAG1
  Algorithm: The recurrence score (RS) is obtained in four steps as follows:
    1. The expression for each gene is normalized relative to the expression of the 5 reference genes. Reference-normalized measurements range from 0 to 15, with a 1-unit increase reflecting approximately a doubling of RNA;
    2. Scores for the groups of genes are calculated from individual expression measurements, as follows:
       HER2 group = 0.9*GRB7 + 0.1*HER2 (set to 8, if less);
       ER group = (0.8*ER + 1.2*PGR + BCL2 + SCUBE2) ÷ 4;
       Proliferation group = (Survivin + KI67 + MYBL2 + CCNB1 + STK15) ÷ 5 (set to 6.5, if less);
       Invasion group = (CTSL2 + MMP11) ÷ 2
    3. The unscaled recurrence score (uRS) is calculated, using predefined coefficients defined in the three training sets:
       uRS = +0.47*HER2 group - 0.34*ER group + 1.04*proliferation group + 0.10*invasion group + 0.05*CD68 - 0.08*GSTM1 - 0.07*BAG1
    4. The RS is rescaled from the uRS, as follows:
       RS = 0 if uRS < 0
       RS = 20*(uRS - 6.7) if 0 ≤ uRS ≤ 100
       RS = 100 if uRS > 100
    Risk groups:
       Low risk: RS ≤ 17
       Intermediate risk: 18 ≤ RS ≤ 30
       High risk: RS ≥ 31

Assay procedures:
  1. FFPE specimen shipment.
  2. Central pathological review for tumor content.
  3. RNA preparation and quality evaluation.
  4. Triplicate real time RT-PCR.
  5. Gene expression measures normalization.
  6. RS score computation and re-scaling; Risk group assignment.
  Results: Report with RS and Risk Group.

Assay: MammaPrint®, Agendia

Assay general information:
  Analytic studies: Ach 2007,57 Glas 2006,58
  Clinical validity and utility studies: Buyse 2006,59 van de Vijver 2002,25 van't Veer 2002,21
  Other studies: Fan 2006,79 and Espinosa 2005,80
  What is measured: gene expression of 1900 genes, including the 70 genes in triplicate, by two-color microarray (Agilent Technologies)
  To whom it is offered:
    1) Stage I, ER positive or negative, < 61 years;
    2) Stage II, ER positive or negative, LN negative, < 61 years
  Web site: http://www.agendia.com/en/Professional/About-MammaPrint/About-MammaPrint

Measurements:
  Genes for normalization: ~1800
  70-gene signatures (the following functional groups are NOT used to assess patients' risk):
    Cell signaling, growth factors, transcription: MS4A7, GPR180, RTN4RL1, ZNF533, GPR126, ECT2, ESM1, FGF18, FLT1, GNAZ, STK32B, IGFBP5, IGFBP5, MELK, EBF4, NMU, CDC42BPA, TGFB3, WISP1, SCUBE2;
    Cell cycle, chromatin, nuclear proteins: TSPYL5, CCNE2, CENPA, CDCA7, LGP2, EXT1, NDC80, MTDH, DTL, NUSAP1, MCM6, ORC6L, PRC1, RFC4;
    Cell adhesion, cell motility, cytoskeleton organization: AYTL2, DIAPH3, COL4A2, DIAPH3, DIAPH3, MMP9;
    Metabolism, intracellular transport, Golgi: ALDH4A1, AP2B1, QSOX2, GMPS, GSTM3, PITRM1, OXCT1, PECI, PECI, RAB6A, SLC2A3, EGLN1;
    Ubiquitination: FBXO31, UCHL5;
    Apoptosis: BBC3;
    Drug resistance: DCK;
    Unknown function: LOC286052, PALM2-AKAP2, AA834945, LOC643008, AA404325, AI283268, AI224578, RUNDC1, C9orf30, AW014921, C16orf61, C20orf46, HRASLS, SERF1A
  Algorithm: classification of patients into risk groups is obtained as follows:
    1. Scanned image analysis, background correction for non-specific hybridization, and gene expression measurements normalization. Expression values are expressed as log10 ratios. Probes are excluded from further calculations if their background corrected intensities are below zero and/or if spots are flagged as outliers, as determined by the image analysis software. The overall expression value for each signature gene is computed using an error weighted mean over the triplicate probes.
    2. The gene signature risk classification is given as a dichotomized value only: high or low risk. A tumor is defined as having a low-risk gene signature if the cosine correlation coefficient for the expression of the 70-gene profile in that tumor with the previously established classifier is above 0.4, the cut point used in the original study by van't Veer, 2002.21

Assay procedures:
  1. Fresh specimen shipment;
  2. Central pathological review for tumor content;
  3. RNA preparation and quality evaluation;
  4. Dye swap microarray hybridization on MammaPrint®;
  5. Image analysis;
  6. Gene expression measures normalization;
  7. 70-gene signature computation and correlation to 70-gene profile; Risk group assignment;
  Results: Report with Risk Group.

Assay: Breast Cancer Profiling (BCP), also known as H/I assay, AviaraDX/Quest Diagnostics

Assay general information:
  Clinical validity and utility studies: Goetz 2006,62 Jansen 2007,72 Jerevall 2007,63 Ma 2004,64 Ma 2006,61 Reid 2005,69 and Fan 2006,79
  What is measured: 6 genes by real time RT-PCR
  To whom it is offered:
    1) Treatment-naïve individuals with ER positive, LN negative breast cancer
  Web site: http://www.aviaradx.com/index.html
  http://www.questdiagnostics.com/index.html

Measurements:
  Genes for normalization: ACTB, HMBS, SDHA, and UBC
  Two gene ratio: HOXB13, IL17BR
  Algorithm: The H/I index is obtained in four steps as follows:
    1. For each sample, cycle thresholds for the 4 reference genes are averaged to obtain the reference CT (refCT);
    2. Relative expression levels for HOXB13 and IL17BR are computed as individual CT differences from the refCT;
    3. Normalized CT differences are z-transformed and the H/I index is obtained by subtracting these two values;
    4. An optimized cut-off point is used to dichotomize patients into two risk groups, low and high

Assay procedures:
  1. FFPE specimen shipment;
  2. Central pathological review for tumor content;
  3. Specimen macro-dissection, if needed, LCM;
  4. RNA preparation and quality evaluation;
  5. Duplicate real time RT-PCR;
  6. Gene expression measures normalization and Z-transformation;
  7. Two-gene index computation; Risk group assignment;
  Results: Report with risk group.

FFPE = formalin-fixed paraffin-embedded; RT-PCR = reverse transcriptase polymerase chain reaction; ER = estrogen receptor; LN = lymph node; HER = human epidermal growth factor receptor; RS = recurrence score; CT = cycle threshold; LCM = laser-capture micro dissected
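
The Recurrence Score arithmetic listed for Oncotype DX in Table 1 can be illustrated with a short Python sketch. It is only an illustration of the formula as summarized in the table, not the vendor's implementation. The function names and the example expression values are hypothetical, and two points are assumptions. First, the table states the rescaling conditions in terms of uRS; the sketch reads them as limiting the rescaled value 20*(uRS - 6.7) to the range 0 to 100. Second, the risk-group boundaries are written as RS < 18 and RS < 31, which matches the table for whole-number scores and assigns any fractional score to the lower group.

```python
def group_scores(expr):
    """Return the four gene-group scores from reference-normalized expression values."""
    her2 = max(0.9 * expr["GRB7"] + 0.1 * expr["HER2"], 8.0)  # set to 8 if less
    er = (0.8 * expr["ER"] + 1.2 * expr["PGR"] + expr["BCL2"] + expr["SCUBE2"]) / 4
    prolif = (expr["Survivin"] + expr["KI67"] + expr["MYBL2"]
              + expr["CCNB1"] + expr["STK15"]) / 5
    prolif = max(prolif, 6.5)  # set to 6.5 if less
    invasion = (expr["CTSL2"] + expr["MMP11"]) / 2
    return her2, er, prolif, invasion


def unscaled_recurrence_score(expr):
    """Weighted sum of group scores and single genes (coefficients from Table 1)."""
    her2, er, prolif, invasion = group_scores(expr)
    return (0.47 * her2 - 0.34 * er + 1.04 * prolif + 0.10 * invasion
            + 0.05 * expr["CD68"] - 0.08 * expr["GSTM1"] - 0.07 * expr["BAG1"])


def recurrence_score(expr):
    """Rescale uRS; ASSUMPTION: the result is limited to the range 0 to 100."""
    scaled = 20 * (unscaled_recurrence_score(expr) - 6.7)
    return min(max(scaled, 0.0), 100.0)


def risk_group(rs):
    """Low: RS <= 17, intermediate: 18-30, high: >= 31 (for whole-number RS)."""
    if rs < 18:
        return "low"
    if rs < 31:
        return "intermediate"
    return "high"


CANCER_GENES = ["GRB7", "HER2", "ER", "PGR", "BCL2", "SCUBE2", "Survivin", "KI67",
                "MYBL2", "CCNB1", "STK15", "CTSL2", "MMP11", "CD68", "GSTM1", "BAG1"]

# Hypothetical tumor: every gene at 5, except strongly expressed ER-group genes.
example = {gene: 5.0 for gene in CANCER_GENES}
example.update({"ER": 12.0, "PGR": 12.0, "BCL2": 12.0, "SCUBE2": 12.0})

if __name__ == "__main__":
    rs = recurrence_score(example)
    print(rs, risk_group(rs))
```

In this made-up example the high ER-group score pulls the unscaled score below 6.7, so the rescaled value is held at 0 and the tumor falls in the low-risk group.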

All three tests have defined protocols for evaluating the tumor content of the specimens to be analyzed, preparing the RNA samples, normalizing the raw expression measurements, and computing summary indices which are related to patient prognosis. The characteristics of the assays, the gene panels used, and the procedures involved in the analysis are summarized in Table 1. Detailed descriptions of the genes can be found in Appendix C. These differences between tests must be taken into account in the evaluation of the available evidence about such tests. In the following section, we provide a brief description of the technologies that are used. A more detailed description is presented in Appendix D.
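
The MammaPrint classification rule in Table 1 compares a tumor's 70-gene expression profile with the previously established classifier by means of a cosine correlation coefficient and a cut point of 0.4. A minimal Python sketch of that comparison follows. The reference and tumor vectors are invented four-gene placeholders (the actual reference values are not given in this report), the function names are hypothetical, and the cosine correlation is taken here to mean the uncentered cosine of the angle between the two log-ratio vectors, which is an assumption.

```python
import math

LOW_RISK_CUT_POINT = 0.4  # cut point cited in Table 1 (van't Veer 2002)


def cosine_correlation(profile, reference):
    """Uncentered cosine between two equal-length vectors of log10 ratios."""
    if len(profile) != len(reference):
        raise ValueError("profile and reference must have the same length")
    dot = sum(p * r for p, r in zip(profile, reference))
    norm_profile = math.sqrt(sum(p * p for p in profile))
    norm_reference = math.sqrt(sum(r * r for r in reference))
    if norm_profile == 0 or norm_reference == 0:
        raise ValueError("cosine is undefined for an all-zero vector")
    return dot / (norm_profile * norm_reference)


def classify(profile, reference, cut_point=LOW_RISK_CUT_POINT):
    """Low risk if the correlation is above the cut point, otherwise high risk."""
    if cosine_correlation(profile, reference) > cut_point:
        return "low risk"
    return "high risk"


# Hypothetical four-gene stand-ins for the 70-gene vectors.
reference_profile = [0.5, -0.5, 0.25, -0.25]
similar_tumor = [0.4, -0.6, 0.2, -0.1]
opposite_tumor = [-0.5, 0.5, -0.25, 0.25]

if __name__ == "__main__":
    print(classify(similar_tumor, reference_profile))
    print(classify(opposite_tumor, reference_profile))
```

Because the result is dichotomized, two tumors with correlations just above and just below 0.4 receive different risk labels even though their profiles are nearly identical, which is one reason the reproducibility of the underlying measurements matters.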

RT-PCR

RT-PCR is a molecular biology technique that combines reverse transcription with real-time PCR (see Glossary, Appendix B). This methodology allows the quantification of a defined RNA molecule. It is accomplished by reverse transcription of the specific RNA into its complementary DNA, followed by amplification of the resulting DNA using PCR. The quantification of the DNA produced after each round of amplification is accomplished by the use of fluorescent dyes that intercalate with double-stranded DNA, or by modified DNA oligonucleotide probes (see Glossary, Appendix B) that fluoresce when hybridized with complementary DNA.

   Figure 2. Quantitative RT-PCR. Panel A: PCR reaction using sets of quenched primers and probes. Panel B: binding of fluorescent probe molecules to double-stranded DNA. Panel C: fluorescence intensity curves for different dyes and samples: on the x-axis, the number of PCR cycle is shown, and on the y-axis, the corresponding fluorescence detected is indicated; the dashed line is used to calculate the cycle threshold for each sample. Panel D: computation of the relative levels of expression

In a PCR reaction, the relative proportions of template, product, and reagents change as amplification proceeds. At the beginning of the reaction, reagents are in excess, and template and product are present in low concentrations and do not compete with primer binding, so that the amplification proceeds at a constant, exponential rate. After this initial phase, the process enters a linear phase of amplification, and then in the late reaction cycles, the amplification reaches a plateau phase and no more product accumulates. To achieve accuracy and precision, it is necessary to collect quantitative data during the exponential phase of amplification, since in this phase the reaction is extremely reproducible. In RT-PCR, this process is automated, and measurements are made at each cycle. Finally, several implementations of this technique allow multiple DNA species to be measured in the same sample (multiplex PCR), since fluorescent dyes with different emission spectra may be attached to the different probes. Multiplex PCR allows internal controls to be co-amplified with the target transcripts (see Glossary, Appendix B) and permits allele discrimination in single-tube, homogeneous assays (Figure 2).
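
Because the amount of product grows by a roughly constant factor per cycle during the exponential phase, a difference in cycle threshold (CT) between a target gene and the reference genes can be converted into a relative expression level. The Python sketch below illustrates this common calculation under the assumption of a fixed amplification efficiency of 2 (perfect doubling at each cycle). The function names and CT values are hypothetical and are not taken from any of the assays in this report.

```python
def relative_expression(ct_target, ct_references, efficiency=2.0):
    """Expression of a target relative to the mean of the reference genes.

    A lower CT means the threshold was reached earlier, that is, more
    starting template. ASSUMPTION: the same efficiency applies to every gene.
    """
    reference_ct = sum(ct_references) / len(ct_references)
    delta_ct = ct_target - reference_ct
    return efficiency ** (-delta_ct)


def fold_change(ct_target_a, ct_refs_a, ct_target_b, ct_refs_b, efficiency=2.0):
    """Ratio of reference-normalized expression in sample A versus sample B."""
    return (relative_expression(ct_target_a, ct_refs_a, efficiency)
            / relative_expression(ct_target_b, ct_refs_b, efficiency))


if __name__ == "__main__":
    # Hypothetical CT values: the target crosses the threshold 3 cycles after
    # the reference mean in sample A and 1 cycle after it in sample B.
    print(relative_expression(24.0, [20.0, 22.0]))
    print(fold_change(24.0, [20.0, 22.0], 22.0, [20.0, 22.0]))
```

With perfect doubling, each additional cycle needed to reach the threshold corresponds to half as much starting template, so a difference of two cycles between samples corresponds to a fourfold difference in expression.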

This technique is extremely sensitive. The development of novel chemistries and instrumentation platforms has led to widespread adoption of real-time RT-PCR as the method of choice for quantifying absolute changes in gene expression. Moreover, this technique has become the preferred method for validating results obtained from microarray analyses and other techniques that evaluate gene expression changes on a global scale.
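
The H/I index steps summarized for the BCP assay in Table 1 (average the reference-gene CTs, take CT differences for the two genes, z-transform, subtract, and apply a cut-off) can be sketched in Python as follows. Everything numeric here is a placeholder: the report does not give the means and standard deviations used for the z-transformation or the optimized cut-off, so the values below are invented. The sign convention (reference CT minus gene CT, so that larger values mean higher expression) and the direction of the cut-off (a higher index taken as higher risk) are also assumptions, as are the function names.

```python
from statistics import mean

# HYPOTHETICAL (mean, standard deviation) pairs for the z-transformation.
Z_PARAMS = {"HOXB13": (-5.0, 2.0), "IL17BR": (-3.0, 1.0)}
CUT_OFF = 0.0  # HYPOTHETICAL cut-off; the optimized value is not reported here


def normalized_expression(ct_gene, ct_references):
    """CT difference from the averaged reference CT (refCT)."""
    return mean(ct_references) - ct_gene


def z_transform(value, gene):
    """Standardize a normalized CT difference with the gene's parameters."""
    center, spread = Z_PARAMS[gene]
    return (value - center) / spread


def hi_index(ct_hoxb13, ct_il17br, ct_references):
    """Difference between the z-transformed HOXB13 and IL17BR values."""
    z_hoxb13 = z_transform(normalized_expression(ct_hoxb13, ct_references), "HOXB13")
    z_il17br = z_transform(normalized_expression(ct_il17br, ct_references), "IL17BR")
    return z_hoxb13 - z_il17br


def risk_group(index, cut_off=CUT_OFF):
    """Dichotomize the index into two risk groups."""
    return "high" if index > cut_off else "low"


if __name__ == "__main__":
    reference_cts = [20.0, 21.0, 22.0, 21.0]  # four reference genes
    index = hi_index(25.0, 23.0, reference_cts)
    print(index, risk_group(index))
```

Because the index is a difference of two standardized values, it depends on the population used to estimate the z-transformation parameters, which may contribute to the variation in how the index has been calculated across studies.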

Microarrays

The analysis of gene expression by microarray technology is based on the Watson-Crick pairing of complementary nucleic acid molecules. In this technique, a collection of DNA sequences, called probes (see Glossary, Appendix B), are “arrayed” on a miniaturized solid support (microarray) and used to detect the concentration of the corresponding complementary RNA sequences, called targets (see Glossary, Appendix B), present in a sample of interest. The advancements made in attaching or synthesizing nucleic acid sequences to solid supports and robotics have allowed investigators to miniaturize the scale of the reactions, and it is now possible to assess the expression of thousands of different genes in a single reaction.29-31

   Figure 3. Schematic model for microarray hybridizations. Panel A: two-color scheme design. Panel B: single-color design

In the basic microarray experiment, RNA harvested from the sample of interest is labeled with a fluorescent dye and hybridized to the microarray together with RNA from a different sample labeled with a different fluorescent dye. In this two-color experimental design, samples can be directly compared to one another or to a common reference RNA, and their relative expression levels can be quantified. After hybridization, gray-scale images corresponding to fluorescent signals are obtained by scanning the microarray with dedicated instruments, and the fluorescence intensity corresponding to each gene investigated is quantified by specific software. After normalization, the intensity of the hybridization signals can be compared to detect differential expression by using sophisticated computational and statistical techniques (Figure 3).
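
In a two-color design, the relative expression of a gene is commonly summarized as the log ratio of the two channel intensities, and Table 1 notes that MammaPrint reports log10 ratios and uses dye-swap hybridization. The Python sketch below shows one simple way such a value could be computed for a single gene. It assumes background-corrected, normalized intensities and assumes that a dye-swap pair is combined by averaging the two log ratios; the function names and intensities are hypothetical, and the sketch is not a description of any vendor's software.

```python
import math


def log_ratio(sample_intensity, reference_intensity):
    """log10 ratio of sample to reference, or None if a value is unusable.

    Non-positive intensities are skipped, mirroring the exclusion of probes
    whose background-corrected intensity falls below zero.
    """
    if sample_intensity <= 0 or reference_intensity <= 0:
        return None
    return math.log10(sample_intensity / reference_intensity)


def dye_swap_log_ratio(fwd_sample, fwd_reference, rev_sample, rev_reference):
    """Average the log ratios from a forward hybridization and its dye swap.

    The dye assignment is exchanged between the two hybridizations, so a bias
    tied to one dye enters the two ratios with opposite sign and tends to cancel.
    """
    forward = log_ratio(fwd_sample, fwd_reference)
    reverse = log_ratio(rev_sample, rev_reference)
    usable = [value for value in (forward, reverse) if value is not None]
    if not usable:
        return None
    return sum(usable) / len(usable)


if __name__ == "__main__":
    # Hypothetical intensities for one gene in the two hybridizations.
    print(dye_swap_log_ratio(2000.0, 1000.0, 1600.0, 1000.0))
```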

Sources of Variability in Gene Expression Analysis

Gene expression analysis poses several general challenges that can affect the reproducibility and reliability of the measurements obtained. The control of such sources of variability is clearly a concern when such technologies are used to make decisions about the clinical management of patients. Given the complexity of the procedures used in this type of investigation, the sources of uncertainty are multiple, from the preparation of tissue specimens to the computational analysis used to quantify expression levels.

The first source of variability relates to the various types of specimens that can be used to prepare the RNA to be used in gene expression analysis, including tissue specimens obtained in vivo. In this case, the resulting RNA template will be a mixture of the RNA content of all the cells contained in the specimen, and the relative content of the different cell populations (malignant vs. normal) present in the specimen processed is a major source of variability in gene expression. For this reason, special care must be taken when tumors are sampled for gene expression analysis. In general, macro- or micro-dissection of the samples is performed to ensure that the specimens contain a sufficient percentage of cancer cells.

A second major source of variability is related to the protocols used to prepare the specimens, since several alternatives have been used in the field, including the use of formalin-fixed, paraffin-embedded (FFPE) tumor specimens or laser-captured, micro-dissected (see Glossary, Appendix B) specimens and fresh or snap-frozen samples. Other factors likely to affect RNA quality include storage time, the reagents, and the particular batches used. Unlike DNA, RNA is very unstable. The degradation of RNA can be triggered by pH changes as well as by specific enzymes called ribonucleases (see Glossary, Appendix B) that are present in cells and that can remain active in the RNA preparation if the RNA isolation is not properly carried out.

Watson-Crick hybridization of complementary nucleic acid moieties is the fundamental principle that forms the basis of any gene expression analysis. For this reason, sequence selection and gene annotation (see Glossary, Appendix B) are among the most relevant factors that can contribute to variability in the analysis of gene expression.

As in any other laboratory investigation, the use of different platforms (see Glossary, Appendix B), protocols, and reagents can also affect the variability of the obtained measurements, and thus the reproducibility within and across laboratories. Indeed, numerous platforms exist to perform both RT-PCR and microarray-based gene expression analyses. Moreover, within each technique, the same procedure can be performed using different instruments, each with its own different operational characteristics and performance.

Finally, since gene expression measures are virtually never used as raw output but rather undergo sequential steps of mathematical transformation, another source of variability is data pre-processing and analysis. Moreover, the levels of gene expression can be further processed and combined according to complex algorithms to obtain composite summary measurements that are associated with the phenotypes investigated.

International standards have been developed to address the quality of microarray-based gene expression analysis, focusing on documentation of experimental design, details, and results (see MIAME in Glossary, Appendix B).32 Several publications also have addressed the levels of reproducibility across platforms and laboratories.33,34 Such efforts emphasize the importance of trying to control the many described sources of variability in gene expression analysis and of ensuring that the information derived from such analyses is specific and does not represent accidental associations.

Objectives of the Evidence Report

The overall purpose of this evidence report is to review and synthesize the available evidence concerning the analytic and clinical validity of breast cancer gene expression profiling in predicting disease recurrence, as well as its efficacy and effectiveness in improving chemotherapy choices and subsequent outcomes (clinical utility) in women newly diagnosed with early-stage breast cancer. The report was prepared by the Evidence-based Practice Center (EPC) at the Johns Hopkins University (JHU) Bloomberg School of Public Health in response to a task order issued by the Agency for Healthcare Research and Quality (AHRQ) on behalf of the Centers for Disease Control and Prevention (CDC) Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Project. The key questions we were charged with addressing in this evidence report were:

  • 1

    What is the direct evidence that gene expression profiling tests in women diagnosed with breast cancer (or any specific subset of this population) lead to improvement in outcomes?

  • 2

    What are the sources of and contributions to analytic validity in these gene expression-based prognostic estimators for women diagnosed with breast cancer?

  • 3

    What is the clinical validity of these tests in women diagnosed with breast cancer?

    • a

      How well does this testing predict recurrence rates for breast cancer when compared to standard prognostic approaches? Specifically, how much do these tests add to currently known factors or combination indices that predict the probability of breast cancer recurrence (e.g., tumor type or stage, ER and HER-2 status)?

    • b

      Are there any other factors, which may not be components of standard predictors of recurrence (e.g., race/ethnicity or adjuvant therapy), that affect the clinical validity of these tests and thereby the generalizability of the results to different populations?

  • 4

    What is the clinical utility of these tests?

    • a

      To what degree do the results of these tests predict the response to chemotherapy, and what factors affect the generalizability of that prediction?

    • b

      What are the effects of using these two tests and the subsequent management options on the following outcomes: testing- or treatment-related psychological harms, testing- or treatment-related physical harms, disease recurrence, mortality, utilization of adjuvant therapy, and medical costs?

    • c

      What is known about the utilization of gene expression profiling in women diagnosed with breast cancer in the United States?

    • d

      What projections have been made in published analyses about the cost-effectiveness of using gene expression profiling in women diagnosed with breast cancer?

This task is of particular relevance, since the National Cancer Institute (NCI) recently announced its sponsorship of a clinical trial to be conducted by The North American Breast Cancer Intergroup (TBCI) assessing individualized options for breast cancer treatment: the Trial Assigning Individualized Options for Treatment (TAILORx). In this trial, tumors of patients with ER-positive and lymph node-negative breast cancer (and who will be treated with tamoxifen) will be tested using the Oncotype DX assay, and patients will be divided into groups according to the recurrence scores derived from the use of the assay. Patients showing low recurrence scores will receive endocrine therapy alone, while patients with high recurrence scores will receive endocrine therapy and adjuvant chemotherapy. Patients with mid-range scores will receive endocrine therapy and be randomly assigned to chemotherapy or no chemotherapy. This trial is designed to evaluate the treatment implications of Oncotype DX results in a large representative patient population, focusing primarily on patients with intermediate recurrence scores. The trial will also allow for generation of new data on patients with recurrence scores near the ends of the spectrum. Patients at the low end of the recurrence score spectrum will be compared to a pre-specified target of 95 percent recurrence-free survival. It should be noted that the cutoff values used in the TAILORx trial are different than those delineated in other studies of Oncotype DX. The results of the TAILORx trial will not be available for some time (around 2013) and with growing interest in and use of these tests (particularly Oncotype DX) in the oncology community, this evidence review could have an impact on clinical practice in the interim.35
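The group assignment described for TAILORx can be sketched in code. This is a minimal illustration only, not the trial protocol. The cutoff values are left as parameters (`low_cutoff`, `high_cutoff`) because this report does not state them. The names `Arm` and `assign_arm`, and the treatment of scores exactly equal to a cutoff, are assumptions made for the example.

```python
from enum import Enum


class Arm(Enum):
    """Treatment groups as described in the text (labels are illustrative)."""
    ENDOCRINE_ALONE = "endocrine therapy alone"
    RANDOMIZED = "endocrine therapy, randomized to chemotherapy or no chemotherapy"
    ENDOCRINE_PLUS_CHEMO = "endocrine therapy plus adjuvant chemotherapy"


def assign_arm(recurrence_score: float, low_cutoff: float, high_cutoff: float) -> Arm:
    """Map a recurrence score to a trial group.

    low_cutoff and high_cutoff are hypothetical parameters; the actual
    TAILORx cutoffs are not given in this section. Scores equal to a
    cutoff are placed in the mid-range group here, which is an assumption.
    """
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be smaller than high_cutoff")
    if recurrence_score < low_cutoff:
        return Arm.ENDOCRINE_ALONE
    if recurrence_score > high_cutoff:
        return Arm.ENDOCRINE_PLUS_CHEMO
    return Arm.RANDOMIZED


# Placeholder cutoffs of 10 and 20, chosen only to exercise the function.
for score in (5, 15, 25):
    print(score, "->", assign_arm(score, low_cutoff=10, high_cutoff=20).value)
```

Keeping the cutoffs as arguments reflects the point made in the text that the TAILORx cutoff values differ from those used in other studies of Oncotype DX.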

A separate trial (MINDACT, or Microarray in Node-negative Disease may Avoid ChemoTherapy) has recently been activated by TRANSBIG (Translating molecular knowledge into early breast cancer management: building on the Breast International Group (BIG)), a research network of 39 institutions in 21 countries. The trial will compare two different ways of assessing the risk of cancer recurrence and making therapeutic decisions: a “traditional method” using Adjuvant! Online versus the MammaPrint assay. The rationale for this study is that many women who actually have “low risk” tumors are currently classified as “average” or “high risk” and are therefore recommended to receive adjuvant chemotherapy that may ultimately be of no benefit. The investigators estimate that 12–20 percent of women with early-stage breast cancer fall into this category.36

Structured Approach to Assessment of the Questions

The EPC team used a structured approach to assess the evidence regarding the key questions listed above. The structured approach was based on the following questions:

  • 1

    What was tested? One fundamental concept is the distinction between the investigated gene expression signatures (see Glossary, Appendix B) and the actual gene expression-based tests. The gene “signature” is the collection of genes whose expression levels are measured in a given test, together with the algorithm that combines those levels into a prognostic index; it is akin to a test's “recipe.” But just as a recipe can be implemented in subtly different ways with different results, a signature can be measured using a variety of technologies and procedures that may not be identical to those used in the actual marketed test being offered to patients. This distinction is important because clinicians' decisions, patients' choices, and the resulting benefits and harms will ultimately depend on the performance of marketed tests rather than on the more general gene expression signatures, although the two typically track closely. Information about the signatures is highly relevant to the assessment of the marketed test, but is not identical.

  • 2

    What population was tested? This question required consideration of whether the study involved a representative sample of patients, from a clinical series or from a clinical trial subject to detailed eligibility criteria. This also required consideration of whether the population was clinically homogeneous enough for the implications of risk prediction to be clear and similar for every member of the study population (or for each subgroup). For example, predicting the relapse of patients on tamoxifen therapy may be different than predicting outcomes for untreated patients. The latter tests “intrinsic tumor aggressiveness,” which may not be the same as the factors that determine resistance to tamoxifen.

  • 3

    Was the study a developmental or validation study? Developmental studies were defined as the original reports in which new gene expression signatures were first described or in which previously developed gene expression signatures were first proposed to have a use different from the original use (e.g., the use on different subsets of patients with different purposes). Validation studies were defined as those that confirmed results in independent populations (with approximately the same characteristics as the population of the corresponding development study). If a developmental study, were appropriate statistical methods used to adjust for multiplicities, and was internal validation done? If a validation study, were all the test procedures, cutoffs, definitions, and measurements predefined?

  • 4

    Is it clear, from a clinical decisionmaking perspective, what is the incremental value of the test over and above standardized clinical predictors? It was not sufficient to simply insert clinical predictors into regression equations since this does not properly quantify the numerical consequences of decisions made with and without the new test.

  • 5

    Were the ways in which the tests had been evaluated optimal for clinical decisionmaking? This question required consideration of the choice of cutoffs, definition of categories, and combinations (or lack thereof) with other predictors.

  • 6

    What was the strength of the study design used to estimate clinical utility? Randomized controlled trials in which all samples were taken concurrently provide the strongest evidence of utility, even if the trials themselves took place in the past.

  • 7

    For studies of clinical utilization, what specific information was provided to patients and their physicians? Such studies are informative only if they are specific about the information that was given and how it informed decisionmaking.

Using this structured approach, the EPC team evaluated the evidence regarding the key questions of analytic validity, clinical validity, and clinical utility of each test, evaluated separately. The EPC team then used the review of the evidence to formulate both test-specific and general conclusions.

Chapter 2. Methods

The CDC submitted a request for an evidence report on the “Impact of Gene Expression Profiling Tests on Breast Cancer Outcomes” to the AHRQ on behalf of the EGAPP. This evidence report will be used to inform the CDC's Working Group as part of their work in formulating evidence-based recommendations. Our project consisted of recruiting technical experts, formulating and refining the specific questions, performing a comprehensive literature search, summarizing the state of the literature, constructing evidence tables, and submitting the evidence report for peer review.

Recruitment of Technical Experts and Peer Reviewers

At the beginning of the project, we assembled a core team of experts from JHU who had strong expertise in medical oncology, clinical trials, and biostatistics as well as a special interest in gene expression profiling tests. We also recruited external technical experts from diverse professional backgrounds, including academic, clinical, and corporate settings. The core team asked the technical experts and members of the EGAPP working group to give input regarding key steps of the process, including the selection and refinement of the questions to be examined. Peer reviewers were recruited from professional societies with an interest in breast cancer and gene expression profiling tests. Representatives from Agendia (MammaPrint®), Genomic Health, Inc. (Oncotype DX™), and Quest Diagnostics, Inc.® (BCP or H/I ratio) were also asked to review the report (see Appendix Ea).

Key Questions

The core team worked with the technical experts and representatives of the EGAPP and AHRQ to develop the Key Questions that are presented in the Specific Aims section of Chapter 1 (Introduction). The Key Questions apply to any gene expression profiling test, but they have been focused primarily on two gene expression profiling tests, Oncotype DX and MammaPrint, because these are the tests that were expected to be commercially available in 2007. During the course of this review, a third gene expression profiling test, the Breast Cancer Profiling (BCP, or H/I ratio) Test (AviaraDX through Quest Diagnostics, Inc.), came to our attention. Although the BCP test was not included in our initial consideration of the Key Questions, we added studies regarding this test as an example of the types of gene expression profiling tests that are likely to be available in the coming years.

Literature Search Methods

Searching the literature involved identifying reference sources, formulating a search strategy for each source, and executing and documenting each search. For the searching of electronic databases we used medical subject heading (MeSH) terms that were relevant to breast cancer and gene expression profiling. We used a systematic approach for searching the literature to minimize the risk of bias in selecting articles for inclusion in the review. In this systematic approach, we were very specific about defining the eligibility criteria for inclusion in the review. The systematic approach was intended to help identify gaps in the published literature.

This strategy was used to identify all the relevant literature that applied to our Key Questions. The team specifically looked for articles that would provide information about the gene expression profiling tests identified in the Key Questions. We also looked for eligible studies by reviewing the references in eligible studies and pertinent reviews, by querying our experts, by contacting the manufacturers of the two tests, and by reviewing abstracts from relevant professional conferences.

Sources

Our comprehensive search plan included electronic and hand searching. On January 9, 2007, we ran searches of the MEDLINE® and EMBASE® databases, and on February 7, 2007, we searched the Cochrane database, including Cochrane Reviews and The Cochrane Central Register of Controlled Trials (CENTRAL), and CINAHL®. All searches were limited to articles published in 1990 or later. This cut-off year was established based on the introduction date of the MeSH heading “gene expression profiling,” 2000, and the introduction date of the MeSH heading “gene expression,” 1990. Also, test searches of earlier dates returned limited and irrelevant results.

“Gray” literature was searched following a protocol that was reviewed and approved by EGAPP and the technical expert panel:

  • 1

    Conference abstracts were reviewed using the same criteria as for journal articles but were only included if we felt we had a sufficient understanding of the underlying study and the data reported were critical enough to merit inclusion.

  • 2

    Web sites for the gene profiling tests included in this review, Agendia (MammaPrint®) and Genomic Health (Oncotype DX™), were searched for additional information not available in the peer-reviewed literature.

  • 3

    Agendia and Genomic Health, Inc. were contacted directly with requests for the following information:

    • a

      A listing of articles that applied to the analytic validity or clinical utility of the gene profiling test,

    • b

      Marketing materials on the gene profiling test, and

    • c

      Any pertinent unpublished data.

  • 4

    We searched the Web site of the Food and Drug Administration (FDA) Center for Devices and Radiological Health for additional publicly available, unpublished information.37–39

  • 5

    A request was sent to the Center for Medical Technology Policy (CMTP) Gene Expression Profiling for Early Stage Breast Cancer Work Group to provide all background materials available on our study topic.

Search Terms and Strategies

Search strategies specific to each database were designed to enable the team to focus available resources on articles most likely to be relevant to the Key Questions. We developed a core strategy for MEDLINE, accessed via PubMed, based on an analysis of the MeSH terms and text words of key articles identified a priori. The PubMed strategy formed the basis for the strategies developed for the other electronic databases (see Appendix F).

Organization and Tracking of the Literature Search

   Figure 4. Summary of literature search and review process (number of articles)

The results of the searches were downloaded into ProCite® version 5.0.3 (ISI ResearchSoft, Carlsbad, CA). Duplicate articles retrieved from the multiple databases were removed prior to initiating the review. We then reviewed the citations by scanning the titles, abstracts, and the full articles as described below (Figure 4).

Title Review

To efficiently identify citations that were obviously not relevant, paired reviewers first independently scanned the article titles. For a title to be eliminated at this level, both reviewers had to indicate that it was clearly ineligible (see Appendix G, Title Review Form).

Abstract Review

Inclusion and Exclusion Criteria

The abstract review phase was designed to identify articles that reported on the analytic validity, clinical validity, and/or clinical utility of the gene expression profile tests of interest. Abstracts were reviewed independently by two investigators and were excluded only if both investigators agreed that the article met one of the following exclusion criteria:

  • 1

    The study applied only to breast cancer biology;

  • 2

    The study did not involve Oncotype DX or MammaPrint;

  • 3

    The study did not involve original data or original data analysis;

  • 4

    The study did not involve women;

  • 5

    The study did not involve breast cancer patients;

  • 6

    The study was not in the English language; or

  • 7

    The study did not apply to the key questions.

We excluded letters to the editor and editorials when they did not present original data (usually in the form of electronic supplements in the case of letters). If a letter or editorial cited some original data, it generally was not sufficiently original for consideration in this report. As mentioned earlier, the initial scope of this project did not include the H/I ratio test, and thus this test was not identified on the abstract review form (Appendix G, Abstract Review Form).

Abstracts were promoted to the article review level if both reviewers agreed that the abstract could apply to one or more of the key questions. Differences of opinion regarding abstract eligibility were resolved through consensus adjudication.
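The paired-review rule for abstracts can be expressed as a small decision function. This is a sketch under stated assumptions, not the review team's actual tooling: the names `Decision` and `screen_abstract` are hypothetical, and each reviewer's judgment is reduced to a single yes/no on whether the abstract met any exclusion criterion.

```python
from enum import Enum


class Decision(Enum):
    EXCLUDE = "exclude"
    PROMOTE = "promote to article review"
    ADJUDICATE = "resolve by consensus adjudication"


def screen_abstract(reviewer_a_excludes: bool, reviewer_b_excludes: bool) -> Decision:
    """Combine two independent abstract reviews.

    An abstract is excluded only when both reviewers judge that it meets
    an exclusion criterion, promoted when both judge that it could apply
    to a key question, and sent to adjudication when they disagree.
    """
    if reviewer_a_excludes and reviewer_b_excludes:
        return Decision.EXCLUDE
    if not reviewer_a_excludes and not reviewer_b_excludes:
        return Decision.PROMOTE
    return Decision.ADJUDICATE


print(screen_abstract(True, False).value)
```

Requiring agreement before exclusion means that a single reviewer cannot remove an abstract from consideration on their own.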

Article Inclusion/Exclusion

Full articles selected for review during the abstract review phase underwent another independent review by paired investigators to determine whether they should be included in the full data abstraction. At this phase of review, investigators determined which of the Key Questions each article addressed (see Appendix G, Article Inclusion/Exclusion Form). If articles were deemed to have applicable information, they were included in the final data abstraction. Differences of opinion regarding article eligibility were resolved through consensus adjudication. A list of articles excluded at this level is included in Appendix H.

Data Abstraction

The purpose of the article review was to confirm the relevance of each article to the research questions and to collect evidence that addressed the questions. Articles eligible for full review had to address one or more of the Key Questions. Because of the heterogeneous nature of the applicable literature, we used a loosely structured approach for extracting data from the studies. Reviewers were given a standard matrix in which to enter data from each article (Appendix G, Data abstraction tables).

For all the data abstracted from the studies, we used a sequential review process. In this process, the primary reviewer completed all data abstraction forms. The second reviewer checked the first reviewer's data abstraction forms for completeness and accuracy. Reviewer pairs were formed to include personnel with both clinical and methodological expertise. Reviewers were not masked to the articles' authors, institutions, or journal.40 In most instances, data were directly abstracted from the article. If possible, relevant data were also abstracted from the figures. A number of articles provided links to supplemental data, and these resources were used during the data abstraction process. Differences of opinion were resolved through consensus adjudication.

For all articles, reviewers extracted information on general study characteristics, such as study design, study participants, and sample size (see Appendix G, Data abstraction tables). Data abstracted regarding participants' characteristics were: information on intervention arms, age, menopausal status, race, diagnoses, methods of diagnosis, exclusion criteria, treatments, and treatment outcomes.

An analytic validity (Key Question 2) data abstraction matrix was developed by the team (see Appendix G, Data abstraction tables). Our data abstraction was designed to capture data in the following general areas: tumor specimen processing validity; annotation validity; within- and across-laboratory validity; and validity associated with gene expression data preprocessing and analysis.

Studies addressing clinical validity (Key Question 3a, 3b) and utility (Key Question 4a, 4b, 4c) were approached in a similar manner (see Appendix G, Data abstraction tables). The free-form tables developed for these questions were designed to capture details regarding a study's context, the methods used to analyze the data collected, results of the study, and conclusions made by the study authors.

Only three articles addressed the cost-effectiveness of the gene expression profiling tests. Therefore, the reviewers did not use standardized data abstraction forms to abstract results from these studies. Instead, the reviewers extracted information directly into the table that is presented as Evidence Table 5. Please refer to the Philips, 2004 41 article for a detailed explanation of why these domains and their sub-domains are important.

Quality Assessment

We used a synthesis of the general principles of the REporting recommendations for tumour MARKer prognostic studies (REMARK)42 and Standards for Reporting of Diagnostic Accuracy (STARD)43 guidelines. The REMARK guidelines were developed to encourage transparent and relevant reporting of study design, preplanned hypotheses, patient and specimen characteristics, assay methods, and statistical analysis methods, in order to help others judge the usefulness of the data presented.42 STARD was developed to improve the accuracy and completeness of studies reporting diagnostic accuracy, in order to allow readers to assess the potential for bias in a study and to evaluate the generalizability of the results43 (Appendix G, Quality Assessment Matrix).

Because of the extreme variability of the articles included in this report, we did not systematically apply the general principles to them. The strengths and weaknesses of each study were also dependent on the question(s) to which it applied. These strengths and weaknesses are highlighted in the Results section and the Discussion.

The EPC team appraised economic analyses using published guidelines for good practice in decision-analytic modeling in health technology assessment (Philips, 2004). The appraisal took into consideration the domains of structure, data, and consistency (see Evidence Table 5 for details).

Data Synthesis

We created a set of detailed evidence tables containing all the information extracted from eligible studies and stratified the tables according to the gene expression profile test. The investigators reviewed the tables and eliminated items that were rarely reported. They then used the resulting versions of the evidence tables to prepare the text of the report and selected summary tables.

Data Entry and Quality Control

Initial data were abstracted by the investigators and entered directly into the data abstraction tables. Second reviewers were generally more experienced members of the research team, and one of their main priorities was to check the quality and consistency of the first reviewers' answers. In addition to the second reviewers checking the consistency and accuracy of the first reviewers, a senior investigator examined all reviews to identify problems with the data abstraction. If problems were recognized in a reviewer's data abstraction, the problems were discussed at a meeting with the reviewers. In addition, research assistants used a system of random data checks to assure data abstraction accuracy.

Grading of the Evidence

After reviewing the available evidence on the Key Questions, the core team concluded that it would be inappropriate to grade the overall body of evidence using any of the published schemes for grading evidence. None of the grading schemes fit the nature of the data in these studies about gene expression profiling tests. The team therefore decided that it was more appropriate to focus on the specific strengths and weaknesses of the studies on each Key Question.

Peer Review

Throughout the project, the core team sought feedback from the external technical experts and the EGAPP Working Group through ad hoc and formal requests for guidance. A draft of the report was sent to the technical experts and peer reviewers, as well as to representatives of AHRQ, the CDC, the NIH, and the FDA. In response to the comments from the technical experts and peer reviewers, we revised the evidence report and prepared a summary of the comments and their disposition that was submitted to the AHRQ.

Chapter 3. Results

Key question 1. What is the direct evidence that gene expression profiling tests in women diagnosed with breast cancer, or any specific subset of this population, lead to improvement in outcomes?

In a study defined as providing direct evidence of improvement in outcomes, the use of the test in decisionmaking is compared to not using the test, with health outcomes as an endpoint, generally in the form of an RCT. There is currently no direct evidence that the investigated gene expression profiling tests lead to improvement in outcomes in any subset of women diagnosed with breast cancer. Two ongoing RCTs aim to provide almost direct evidence for Oncotype DX™, and for MammaPrint®. These studies are described at the end of this chapter.

Key question 2. What are the sources of and contributions to analytic validity in these gene expression-based prognostic estimators for women diagnosed with breast cancer?

Analytical validity is usually assessed by determining how much observed measurements differ from expected values derived from a standard reference method. In the measurement of gene expression, however, universal standard reference RNAs and universally accepted, definitive methods of analysis are not available. Consequently, a definitive evaluation of the analytic validity of this type of test is difficult. It is more appropriate to focus instead on test variability. In clinical use, gene expression-based prognostic tests involve multiple steps with individual components that are difficult to separate. Ultimately, reproducibility of patient classification into clinically relevant risk groupings is what matters. From this perspective, the most important sources of variability are tumor sampling and handling, specimen preparation, and biologic variation within and between different samples of the same tumor. The analytic validity of expression-based tests can therefore be assessed by asking the following questions:

  • 1

    How reproducible is the test when applied repeatedly to the same patient, either by examining the same specimen, or a different specimen?

  • 2

    How reproducible is the test over time?

  • 3

    What are the factors that most affect the overall performance of the test?

Few existing studies directly address analytical issues involved with the assays, and additional information could only be collected from clinical studies. Overall, this evidence was heterogeneous, spanning technical aspects, reproducibility, the number of successfully performed assays, and the comparison of RNA and protein levels of individual genes. Table 1 describes the three assays: Oncotype DX, MammaPrint, and the H/I ratio (HOXB13 and IL17RB).

Oncotype DX™

Table 2. Successful assays, Oncotype DX™

Study, year | Protocol | Measure | Success | Failures and exclusions (reported under: Tumor < 5%, Poor RNA, Pathological review, RT-PCR, Low reference genes, Clinically ineligible)
Chang, 2007 55 | Standard | FFPE | 80/97 (82.4%) | 16/97 (16.5%); 1/80 (1.2%)
Cobleigh, 2005 47 | Standard | FFPE | 78/85 (91.7%) | 7 (8.2%); 1*
Esteva, 2005 48 | Standard | FFPE | 149/220 (67.7%) | 42/220 (19.0%)*; 4/220 (1.8%); 3/220 (1.3%); 22/220 (10%)
Gianni, 2005 49 | Standard | FFPE | 89/95 (93.7%) | 2 patients (2.1%); 4 patients (4.2%)
Habel, 2006 50 | Standard | FFPE | 865/79019 (7.9%) | 59§; 1%
Mina, 2006 51 | Standard | FFPE | 45/57 (78.9%) | 3/57 (5.3%) (< 20%); 9/57 (15.8%)
Oratz, in press 56 | Standard | FFPE | 72/74 (97.3%) | 2/74 (2.7%)
Paik, 2004 28 | Standard | FFPE | 668/675 (98.9%) | 675/754 (89.5%)
Paik, 2006 53 | Standard | FFPE | 651/670 (97.2%) |

* Clinically ineligible (Stage IV)

Studies performed according to Oncotype DX protocols, but not performed at Genomic Health

It is not clear what the total number was, since it was not reported

§ 59 tumors underwent macro-dissection; it is not clear whether they were used in the subsequent analyses

Out of the 70 eligible patients 57 were analyzed (No consent: 3 patients; No specimen available: 10 patients)

The total of 754 (79+675) patients was computed and not clearly reported in the study

FFPE = formalin fixed paraffin embedded; RNA = ribonucleic acid; RT-PCR = reverse transcriptase polymerase chain reaction.

Evidence about the analytic validity of Oncotype DX is available from two technical studies (Cronin et al., 2004,44 and Cronin et al., 200745) and from several clinical reports. Information about the overall success rate of the assay was documented in 9 studies (Chang, 2007,55 Cobleigh, 2005,47 Esteva, 2005,48 Gianni, 2005,49 Habel, 2006,50 Mina, 2006,51 Oratz, in press,52 Paik, 2004,28 and Paik, 200653). This success rate ranged from 78.9 percent to 98.9 percent, and only some of the studies provided detailed descriptions of the reasons for assay failure. Reported failures were mainly ascribed to an insufficient number of cancer cells in the specimens, to poor RNA quality, and in a few cases, to failure of the RT-PCR technique. A synopsis of this evidence is provided in Table 2.
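The success percentages in Table 2 can be recomputed from the reported counts. The sketch below is illustrative only: the counts are copied from the Success column of Table 2, Habel, 2006 is omitted because its counts are not legible in the table, and the names `assay_counts` and `success_rate` are hypothetical. Recomputed values can differ from the printed ones in the last digit (for example, 80/97 rounds to 82.5 percent, whereas the table prints 82.4 percent).

```python
# (successful assays, specimens attempted), as printed in the Success column of Table 2.
# Habel, 2006 is left out because its counts cannot be read reliably from the table.
assay_counts = {
    "Chang, 2007": (80, 97),
    "Cobleigh, 2005": (78, 85),
    "Esteva, 2005": (149, 220),
    "Gianni, 2005": (89, 95),
    "Mina, 2006": (45, 57),
    "Oratz, in press": (72, 74),
    "Paik, 2004": (668, 675),
    "Paik, 2006": (651, 670),
}


def success_rate(successes: int, attempted: int) -> float:
    """Return the assay success rate as a percentage."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 100.0 * successes / attempted


rates = {study: success_rate(*counts) for study, counts in assay_counts.items()}

for study, rate in rates.items():
    print(f"{study}: {rate:.1f}%")
```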

Table 3. Variability and reproducibility, Oncotype DX™

Cronin, 2007 45
  Aims and methods:
    To assess individual gene and RS reproducibility
    Repeated measurements of 2 aliquots of a single RNA across multiple days, operators, RT-PCR plates, 7900HT instruments, and liquid-handling robots
    Mixed-effect ANOVA was used to estimate components of variance
  Results:
    Reproducibility, CT measurements SD for the individual genes:
      Total SD range: 0.06 to 0.15 CT units
      Between day SD range: 0 to 0.055 CT units
      Between plate SD range: 0 to 0.090 CT units
      Within plate SD range: 0.057 to 0.147 CT units
    At a CT of 30 a maximum SD of 0.15 translates into a CV of 0.5%
    The largest differences between operators, liquid handling robots, and 7900HT instruments < 0.5 CT
    Reproducibility, CT measurements SD for the RS:
      Total SD was 0.792 RS units
      Between day SD was 0 RS units
      Between plate SD was 0 RS units
      Within plate SD was 0.792 RS units
  Conclusions:
    Authors reported that the following procedures were performed to assure the reproducibility of the assay:
      A standard RNA control sample is assayed at least once per batch of patient (46 samples)
      PCR controls are run in every assay plate
      RT-PCR failures are excluded from analysis
      Expression values are assigned when at least 2 of 3 assay wells provide acceptable RT-PCR results
      All 21 genes must have an expression value assigned for an RS to be calculated and reported

Habel, 2006 50
  Aims and methods:
    To assess RS reproducibility
    Pearson's correlation and ANOVA to assess within-patients correlation and variability:
      60 blocks that did not undergo macro-dissection from 20 patients (2 to 5 blocks per patient)
      49 core biopsies or tumor resection blocks
  Results:
    RS (as a continuous value) SD and Pearson's correlation observed in two unpublished studies:
      Overall between blocks SD was 3.0 RS units
      For 16 of the 20 patients, the between blocks SD was < 2.5 RS units
      Pearson's correlation = 0.86
    Similar results from the second study

Paik, 2004 28
  Aims and methods:
    To assess individual genes and RS reproducibility
    Reproducibility within and between blocks was assessed by performing the assay in five serial sections from six blocks in two patients
  Results:
    Reproducibility evaluation:
      16 Cancer genes SD ranged from 0.07 to 0.21 CT units
      Within-block RS SD = 0.72 RS unit (95% CI = 0.55 to 1.04)
      Total within-patient SD (including between and within-block SD) = 2.2 RS units
    Similar variability in the RS was observed in reanalysis of clinical trial samples on separate days with different reagent lots (data not shown).

RS = recurrence score; RNA = ribonucleic acid; RT-PCR = reverse transcriptase polymerase chain reaction; ANOVA = analysis of variance; CT = cycle threshold; SD = standard deviation

Data on assay variability and reproducibility were available from 3 studies (Cronin, 2007,45 Habel, 2006,50 and Paik, 200428). These studies assessed the variability of repeated gene expression measurements using RNA from either the same or different FFPE blocks at repeated time points, and across different instruments and operators. Data reported in these studies concerned the variability of individual genes in the assay as well as RS reproducibility. Variability evidence was reported for 66 FFPE blocks from 22 distinct patients, and from repeated measurements of two aliquots of a pooled reference RNA. Overall, the standard deviation (SD) for the recurrence score was below 3 RS units, although the authors did not discuss the impact on risk stratification. This evidence is reported in Table 3.
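Some of the figures in Table 3 can be related to one another with simple arithmetic. The sketch below is an illustration under stated assumptions, not a reanalysis of the studies. It treats the coefficient of variation (CV) as SD divided by the mean and assumes that variance components are independent and additive, so that SDs combine in quadrature. The function names are hypothetical, and the back-calculated between-block SD is not a value reported by the study authors.

```python
import math


def coefficient_of_variation(sd: float, mean: float) -> float:
    """CV as a fraction: standard deviation divided by the mean."""
    return sd / mean


def combine_sd(*components: float) -> float:
    """Total SD from independent components (variances add)."""
    return math.sqrt(sum(c ** 2 for c in components))


def residual_sd(total_sd: float, known_sd: float) -> float:
    """SD of the remaining component once a known component is removed."""
    if known_sd > total_sd:
        raise ValueError("known_sd cannot exceed total_sd")
    return math.sqrt(total_sd ** 2 - known_sd ** 2)


# Cronin, 2007: an SD of 0.15 CT units at a CT of 30.
cv_at_ct30 = coefficient_of_variation(0.15, 30.0)

# Cronin, 2007: between-day 0, between-plate 0, within-plate 0.792 RS units.
rs_total_sd = combine_sd(0.0, 0.0, 0.792)

# Paik, 2004: total within-patient SD 2.2 and within-block SD 0.72 RS units.
# The between-block value is an illustrative back-calculation (assumption).
between_block_sd = residual_sd(2.2, 0.72)

print(f"CV at CT 30: {cv_at_ct30:.1%}")
print(f"Combined RS SD: {rs_total_sd:.3f} RS units")
print(f"Implied between-block SD: {between_block_sd:.2f} RS units")
```

Under these assumptions the first two results reproduce the 0.5 percent CV and the 0.792 RS unit total SD given in Table 3, and the third suggests that most of the within-patient variability in the Paik, 2004 data arises between blocks rather than within a block.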

Table 4

Analytic validity, Oncotype DX™
Study, year | Context and Methods | Results | Comments
Cronin, 200444 | RT-PCR protocol optimization from FFPE specimens; Pearson's correlation to compare FFPE and frozen specimens; comparison between FFPE and frozen specimens, 48 genes | Correlation = 0.91, p value < 0.0001 | Gene expression profiling analysis is possible using FFPE blocks and comparable to results obtained from frozen specimens.
Cronin, 200745 | Analytic components addressed: LOD; LOQ; amplification efficiency; dynamic range linearity; accuracy and precision; reproducibility. A pooled reference RNA was used to perform repeated measurement and a serial dilutions experiment to assess LOD, LOQ, amplification efficiency, dynamic range linearity, and accuracy and precision. In all the experiments template input was equivalent to 2 ng RNA; in the linearity study input varied between 2×10^-10 and 2000 ng per reaction. Repeated measurements of 2 aliquots of a single RNA sample across multiple days, operators, RT-PCR plates, RT-PCR instruments, and liquid-handling robots were performed to assess individual gene and RS reproducibility by ANOVA | The LOD and the LOQ for all the genes proved to be within the pre-specified limits of CT units (< 40 cycles). Amplification efficiencies for all the genes ranged from 75.3% (GAPDH) to 112.1% (BAG1), with values exceeding 100% due to the cumulative nature of the analysis along the sample-dilution series. Assay linearity and dynamic range: 6 genes were linear over the entire range; 4 genes were linear over a range of 2e-8 to 2000 ng; the estimated maximal deviation from linearity was below 1 CT over a linear range > 2000-fold, as specified by CLSI. Assay quantitative bias and precision at the 2-ng/well for the 16 cancer-related genes: range = -10% (BAG1) to 6% (CTSL2); estimated mean deviation = 0.3%; CV of 5.7%. Assay quantitative bias and precision at the 2-ng/well for the 5 reference genes: range = -1.5% (GUSB) to 3.3% (ACTB); estimated mean deviation = 0.7%; CV of 3.2% | Authors reported that the following procedures were performed to assure the reproducibility of the assay: a standard RNA control sample is assayed at least once per batch of patient (46 samples); PCR controls are run in every assay plate; RT-PCR failures are excluded from analysis; expression values are assigned when at least 2 of 3 assay wells provide acceptable RT-PCR results; all 21 genes must have an expression value assigned for an RS to be calculated and reported

RT-PCR = reverse transcriptase polymerase chain reaction; FFPE = formalin-fixed paraffin-embedded; LOD = limits of detection; LOQ = limits of quantification; RNA=ribonucleic acid; RS = recurrence score; ANOVA = analysis of variance; CT=cycle threshold; CLSI = Clinical and Laboratory Standards Institute; CV=coefficients of variation

Two studies addressed technical and operational aspects of analytic validity (Cronin, 2004,44 and Cronin, 200745). The first study presented data about the development of the assay procedures, comparing gene expression measurements between frozen tumor specimens and FFPE blocks. The optimization of the RT-PCR primers (see Glossary, Appendix B1) and the normalization strategy were discussed. The second study addressed relevant analytic components of the assay, such as detection and quantification limits (limit of detection (LOD) and limit of quantification (LOQ), respectively), amplification efficiency, linearity, dynamic range, accuracy, precision, and assay reproducibility. The available evidence is reported in Table 4.

Table 5

RT-PCR vs IHC comparison assays, Oncotype DX™
Study, year | Comparison | Method | Details | Results | Comments
Chang, 200755 | IHC and RT-PCR concordance; methods for IHC not described | Cohen's κ statistics | Positivity from RT-PCR: ER, > 6.5 CT; PR, > 5.5 CT; HER2, > 11.5 CT | ER, k = 1.00, 95% CI = 1.00 to 1.00; PR, k = 0.57, 95% CI = 0.37 to 0.77; HER2, k = 0.74, 95% CI = 0.45 to 1.00 | The RT-PCR technology provides a potential platform for a predictive test using small amounts of routinely processed specimens (core biopsies)
Cobleigh, 200547 | IHC and RT-PCR concordance; IHC by standard biotin-streptavidin method and appropriate antibody (DAKO, CA, USA); ER+ if IHC staining was present in more than 10% of cells | Cohen's κ statistics | Positivity from RT-PCR: ER, > 6.5 CT | ER, k = 0.83; PR, k = 0.40; HER2, k = 0.67; Ki-67/MIB1, k = 0.22 | The accuracy and specificity of this RT-PCR assay of formalin-fixed, paraffin embedded tumor tissue was supported by comparison of the results of RT-PCR assay of RNA and IHC assay of protein for ER, PR, and HER2
Cronin, 200444 | Methods for IHC and FISH not described | Percentage of agreement | ER positivity from RT-PCR: ER, > 8 CT | ER, 93.5%; PR, 84%; HER2, 100% |
Esteva, 200548 | IHC and RT-PCR concordance; IHC methods in Esteva et al., 200388 and Wang et al., 200289 | Cohen's κ statistics; logistic model (IHC as a quantal response) | Positivity from RT-PCR: ER, > 7.0 CT; PR, > 6.0 CT; HER2, > 11.5 CT. RT-PCR specificity and sensitivity, in comparison to IHC for HER2, were obtained at the different thresholds | ER, k = 0.80 (± 0.05); PR, k = 0.48; HER2, k = 0.6 (± 0.08); logistic model p value: < 0.001. HER2, RT-PCR > 11.50 CT: specificity 77%, sensitivity 84%. HER2, RT-PCR > 11.5 CT: specificity 89%, sensitivity 84%. HER2, RT-PCR > 12.0 CT: specificity 95%, sensitivity 68% |
Gianni, 200549 | IHC and RT-PCR concordance; IHC by reagents from Lab Vision-Neomarkers (Fremont, CA); ER+ if IHC staining was present in more than 10% of cells | Cohen's κ statistics | Positivity from RT-PCR: ER, > 6.5 CT | ER, k = 0.83; PR, k = 0.40; HER2, k = 0.67; Ki-67/MIB1, k = 0.22 |
Habel, 200650 | ER status in medical records and RT-PCR concordance; ER status methods not defined | Cohen's κ statistics | Positivity from RT-PCR: ER, > 6.5 CT | ER, k = 0.49, 95% CI 0.41–0.56; 115/122 discordances ER+ by RT-PCR |
Mina, 200651 | IHC and RT-PCR concordance; IHC by standard biotin-streptavidin method and appropriate antibody (DAKO, CA, USA); ER+ if IHC staining was present in more than 10% of cells | Percentage of agreement | Positivity from RT-PCR: ER, > 6.5 CT | 41/45 concordant samples: 2 ER+ by IHC were ER- by RT-PCR; 2 ER+ by RT-PCR were ER- by IHC | Gene expression analysis on core biopsy samples is feasible; agreement data for PR, Ki-67 and HER2/neu were not reported
Paik, 200428 | ER and PR receptor proteins were measured by ligand-binding assays | HER2 DNA was measured by a fluorescence in situ hybridization assay (PathVysion, Vysis) | | |

IHC = immunohistochemistry; RT-PCR = reverse transcriptase polymerase chain reaction; ER = estrogen receptor; PR = progesterone receptor; HER-2 = human epidermal growth factor receptor 2; FISH = fluorescence in situ hybridization; CT = cycle threshold; CI = confidence interval.

Finally, eight studies (Chang, 2007,55 Cobleigh, 2005,47 Esteva, 2005,48 Gianni, 2005,49 Cronin, 2004,44 Habel, 2006,50 Mina, 2006,51 and Paik, 200428) compared gene expression measurements of specific individual genes (ER, progesterone receptor (PR), HER-2) to measurements of the corresponding proteins produced by those genes as obtained by other techniques, in particular immunohistochemistry (IHC). These studies used various cycle thresholds (CTs) to define positivity for the genes (see Table 5). Overall agreement between RT-PCR and IHC proved generally good for ER (k statistics ranging from 0.80 to 1). In one study (Habel, 200650), agreement was low (0.49), although in that study RT-PCR measurements were compared with data available in the clinical records. In general, agreement for PR and HER-2 was moderate or poor. This evidence is reported here for completeness (see Table 5), although it does not contain any relevant information about the assay as a whole.

Individual studies are briefly described below.

Cronin et al., 2004.44 In this study, the authors discussed the primer (see Glossary, Appendix B) design optimization and expression level normalization necessary to obtain reliable RT-PCR measurements from archival FFPE samples, with the goal of establishing the reliability of their results with partially degraded RNA samples. The authors compared gene expression levels in 62 matched FFPE and frozen tissue specimens prepared from the same breast tumor. They showed that the relative expression profiles obtained from the two analyses were similar (correlation = 0.91, P value < 0.0001), although the magnitude of the measurements differed. They successfully corrected the differences using normalization based on the expression of five reference genes. The study thus provided convincing evidence supporting the use of the implemented protocols to assess gene expression levels in archival (i.e., formalin-fixed, paraffin-embedded) tumor specimens.

The authors also analyzed several genes reported in the literature20 to show similar patterns of co-expression,54 and confirmed these correlations. Specifically, the expression levels of cytokeratin 5 and cytokeratin 17 (r = 0.85), LPL and RBP4 (r = 0.84), HER-2 and GRB7 (r = 0.71), and ER1 and GATA3 (r = 0.6) were highly correlated.

Additionally, the authors compared RT-PCR measurements of ER, PR, and HER-2 (all components of Oncotype DX) expression to IHC analysis of corresponding protein levels, and to fluorescent in situ hybridization (FISH) analysis for HER-2 for a subset of 17 samples. The concordance among the different assays detecting the protein products of the genes and the relative RNA levels as measured by RT-PCR was high (94 percent, 84 percent, and 100 percent, respectively, see Table 5).

In summary, this study provided a foundation for the use of the Oncotype DX assay in archival tissue, although it did not contain data about the development of the RS (Appendix I, Evidence Tables 1, 2 and 3).

Paik et al., 2004.28 In this clinical study, the authors reported data on the variability of the RS, and the overall success rate of the assay. The authors evaluated the reproducibility of the Oncotype DX assay within and between FFPE blocks from the same patient. The Oncotype DX assay was carried out on 5 serial sections from 6 different blocks from 2 distinct patients. Seventy-nine blocks out of 754 were not analyzed due to insufficient tumor content, but RT-PCR was successful in 668 of the remaining 675 (98.9 percent) tissue blocks.

For the 16 cancer-related genes considered in the RS, the SD of expression ranged from 0.07 to 0.21 CT units across serial sections from the same block. The within-block SD of the combined RS proved to be 0.72 RS units (95 percent CI: 0.55–1.04), while the within-patient SD, which included both among-block and within-block variation, proved to be 2.2 RS units. The impact of this variation on the risk stratification provided by the RS was not discussed in the paper. The difference between the low- and high-risk groups is 14 RS units, far larger than the standard deviations reported. Although ER, PR and HER-2 were also assessed by other techniques, the agreement of the measurements obtained by the different technologies was not reported.
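The relationship among the SDs reported by Paik et al. can be illustrated with a short calculation. The sketch below is not part of the original study: it assumes that within-block and between-block errors are independent (so that their variances add) and that measurement error is approximately normal. The function names and the 3-unit example distance are hypothetical, chosen only for illustration.

```python
import math

# SDs reported by Paik et al., 2004 (in RS units).
within_block_sd = 0.72
total_within_patient_sd = 2.2

# Assumption: independent error sources, so total variance = within + between.
between_block_sd = math.sqrt(total_within_patient_sd**2 - within_block_sd**2)


def prob_error_exceeds(distance, sd):
    """Two-sided probability that a normal error with this SD exceeds `distance`."""
    return math.erfc(distance / (sd * math.sqrt(2.0)))


def prob_crossing_cutoff(gap_to_cutoff, sd):
    """One-sided probability that a single measurement lands beyond a cutoff
    located `gap_to_cutoff` RS units away from the true value."""
    return 0.5 * math.erfc(gap_to_cutoff / (sd * math.sqrt(2.0)))


print(f"implied between-block SD: {between_block_sd:.2f} RS units")
print(f"P(|error| > 14 RS units): {prob_error_exceeds(14, total_within_patient_sd):.1e}")
# Hypothetical patient whose true RS lies 3 units from a category boundary.
print(f"P(crossing a cutoff 3 units away): {prob_crossing_cutoff(3, total_within_patient_sd):.3f}")
```

Under these assumptions, an error as large as the 14-unit gap between the low- and high-risk groups is extremely unlikely, but a patient whose true score lies close to a category boundary could still be reclassified by measurement variation alone, which is the question the paper left unaddressed.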

In summary, this paper reported evidence about the fraction of tissue blocks that can be successfully typed by the Oncotype DX assay, as well as limited data about the reproducibility of the RS between different sections and FFPE blocks from the same patient. The impact of such variability on the risk stratification was not examined (Table 3, Appendix I, Evidence Tables 1, 2 and 3).

Esteva et al., 2005.48 In this study, the authors evaluated the correlation of RS, both as a whole and broken into its components, with known standard prognostic markers in FFPE tumor specimens. Specifically, the relationship between RT-PCR and IHC for ER, PR, and HER-2 was examined. The concordance was high for ER (k = 0.81), moderate for HER-2 (k = 0.60), and poor for PR (k = 0.48).

A logistic model using IHC HER-2 measurement as a quantal response indicated a significant (P < 0.0001) degree of correlation between IHC and RT-PCR. Sensitivity and specificity for HER-2 were also measured, using different RT-PCR cutoff points and positivity, and are reported in Table 5.

In summary, this paper reported evidence about the percentage of successfully analyzed samples (67.7 percent, 149/220) in a large population from a single institution (M.D. Anderson Cancer Center) (Table 5, Appendix I, Evidence Tables 1, 2 and 3).

Cobleigh et al., 2005.47 This study reports on the development of the 21-gene Recurrence Score assay (Oncotype DX). Duplicate gene expression measurements were obtained by RT-PCR in archival FFPE tumor tissue blocks. An initial set of 192 genes (187 cancer-related and 5 controls) was analyzed, and 16 additional candidate genes were added at a later time. Ninety-one point six percent (78/85) of samples were successfully analyzed.

IHC-measured protein levels and RT-PCR mRNA levels for ER, PR, HER-2, and Ki-67/MIB-1 (a proliferation marker of cancer cells) were compared. The concordance was high for both ER (k = 0.83) and HER-2 (k = 0.67), somewhat lower for PR (k = 0.40), and poor for Ki-67 (k = 0.22) (Table 5, Appendix I, Evidence Tables 1, 2 and 3).

Gianni et al., 2005.49 The authors of this paper evaluated the correlation of IHC-measured protein levels with RT-PCR mRNA measurements of ER and PR expression in tumors. The concordance was high for ER (k = 0.84; 95 percent CI, 0.71 to 0.96) and moderate for PR (k = 0.71; 95 percent CI, 0.56 to 0.86). This paper also reports preliminary evidence about the use of the Oncotype DX assay in fixed core biopsies from breast cancer patients. The percentage of successfully analyzed samples was 93.6 percent (89/95) (Table 5, Appendix I, Evidence Tables 1, 2 and 3).

Mina et al., 2006.51 In this study, the authors evaluated the usefulness of FFPE core biopsies from a completed phase II trial in identifying genes that correlated with a response to primary chemotherapy. Out of the 70 patients enrolled in the study, 67 gave their consent, and specimens from 57 patients were available to perform gene expression analysis by RT-PCR. Out of these 57 patients, gene expression levels could be accurately measured in 45 patients. Failures were due either to low RNA yield (9 patients) or low tumor content in the biopsies (3 patients).

In this study the authors compared the expression levels of ER mRNA obtained by RT-PCR to ER protein expression as measured by IHC. Using a pre-defined cutoff of 6.5 CT, 64 percent of the 45 tumors were ER positive, while 36 percent were considered ER negative. ER expression by IHC correlated well with ER mRNA expression by RT-PCR (see Table 5), and only four of the 45 samples did not show agreement. The authors concluded that gene expression analysis on core biopsy samples was feasible. Data for PR, HER-2 and Ki-67 were not reported.
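Several of the studies summarized here report agreement as Cohen's κ, whereas Mina et al. reported only the raw agreement (41 of 45 samples). As an illustration of how the two measures relate, the following sketch computes κ from a 2×2 table. The cell counts are reconstructed rather than reported: they assume that 64 percent of 45 tumors corresponds to 29 RT-PCR-positive samples and use the two discordant samples in each direction listed in Table 5. The resulting κ is therefore an illustration, not a result from the paper.

```python
def cohens_kappa(both_pos, a_only_pos, b_only_pos, both_neg):
    """Cohen's kappa for two binary raters (here: IHC = a, RT-PCR = b)."""
    n = both_pos + a_only_pos + b_only_pos + both_neg
    observed = (both_pos + both_neg) / n
    a_pos_rate = (both_pos + a_only_pos) / n
    b_pos_rate = (both_pos + b_only_pos) / n
    # Agreement expected by chance if the two methods were independent.
    expected = a_pos_rate * b_pos_rate + (1 - a_pos_rate) * (1 - b_pos_rate)
    return (observed - expected) / (1 - expected)


# Reconstructed counts (assumption): 29 RT-PCR-positive samples, of which
# 2 were IHC-negative; 2 IHC-positive samples were RT-PCR-negative.
kappa = cohens_kappa(both_pos=27, a_only_pos=2, b_only_pos=2, both_neg=14)
raw_agreement = (27 + 14) / 45
print(f"raw agreement = {raw_agreement:.3f}, kappa = {kappa:.3f}")
```

Because κ discounts the agreement expected by chance, it is lower than the raw percentage of agreement computed from the same table, which should be kept in mind when comparing the percentage-based rows of Table 5 with the κ-based rows.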

In summary, this paper reported preliminary evidence about the expression of some of the Oncotype DX assay genes in fixed core biopsies from breast cancer patients. The percentage of successfully analyzed samples was about 79 percent (45/57), raising concerns about the real feasibility in clinical settings (Table 5, Appendix I, Evidence Tables 1, 2 and 3).

Habel et al., 2006.50 This study contains several results that are relevant for the overall analytic validity of the Oncotype DX assay. The authors cited two unpublished studies with data concerning the reproducibility of the RS. These studies analyzed, respectively, 60 blocks from a total of 20 distinct patients, and 49 core biopsies or resections from advanced breast cancer patients. In the first study the RS SD between different blocks from the same patient was 3.0 RS units, and less than 2.5 for 16 out of 20 patients. Similar results were claimed for the second study, although the actual data were not shown.

Finally, the authors compared the agreement of ER status, as obtained by RT-PCR, to the ER status reported in the medical records. A positive or negative classification was based on a CT cutoff point of 6.5. The RT-PCR failure rate was about 1 percent for specimens available after pathological review, and 7.9 percent of the samples were not assessable due to low tumor content. In this study population, the concordance between RT-PCR and the medical chart information was only moderate (k = 0.49, 95 percent CI 0.41–0.56). In the multivariate models used in the subsequent statistical analyses, the RT-PCR-based ER status was used.

In summary, this paper reported a high percentage of successfully analyzed samples in a large population from a single institution and the reproducibility of the RS between different blocks from the same patient. The impact of such variability on the risk stratification was not addressed (Table 5, Appendix I, Evidence Tables 1, 2 and 3).

Paik et al., 2006.53 In this clinical study the authors reported several results that can be used as indirect evidence for the overall analytic validity of the Oncotype DX assay. Of particular relevance, FFPE blocks with sufficient tumor content were available from 670 of the 2,299 eligible patients in the NSABP B-20 trial, and the RT-PCR assay was successful for 651 of the 670 patients (97.2 percent) (Appendix I, Evidence Tables 1, 2 and 3).

Cronin et al., 2007.45 This study is the most extensive analysis to date of the analytic components of the Oncotype DX assay. Detection and quantification limits of the RT-PCR reactions, amplification efficiency, linearity, dynamic range, accuracy, precision, and assay reproducibility were investigated in serial dilution experiments, using a common RNA obtained by pooling 15 distinct RNA samples.

Detection and quantification limits proved to be well within the instrument's pre-specified CT unit limits for all the genes. Amplification efficiencies (100 percent efficiency means that the RT-PCR reaction products achieved perfect doubling) for the 16 cancer-related genes ranged from 75 percent to 112 percent, with an average of 96 percent, while the mean efficiency proved to be 88 percent for the reference genes, with a range from 75 percent to 101 percent.
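For readers unfamiliar with how amplification efficiency is derived from a dilution series, the following is a minimal sketch of the conventional standard-curve calculation, in which CT is regressed on log10 of the RNA input and efficiency is computed as 10^(-1/slope) - 1. It is an assumption that this matches the exact procedure used by Cronin et al.; the dilution data below are hypothetical and constructed to represent perfect doubling.

```python
import math


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency(log10_inputs, ct_values):
    """Efficiency as a fraction (1.0 = 100 percent = perfect doubling per cycle)."""
    slope = least_squares_slope(log10_inputs, ct_values)
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold dilution series (log10 of ng input per well).
log10_inputs = [-2, -1, 0, 1, 2, 3]
# With perfect doubling, each 10-fold dilution adds log2(10) (about 3.32) cycles.
perfect_cts = [30.0 - math.log2(10) * x for x in log10_inputs]
# A shallower curve (fewer cycles per dilution step) yields an apparent
# efficiency above 100 percent.
shallow_cts = [30.0 - 3.1 * x for x in log10_inputs]

print(f"perfect doubling: {amplification_efficiency(log10_inputs, perfect_cts):.1%}")
print(f"shallower slope:  {amplification_efficiency(log10_inputs, shallow_cts):.1%}")
```

The second case shows how an estimate above 100 percent, such as the 112 percent upper bound reported for the cancer-related genes, can arise from the fitted slope of a dilution curve rather than from more than a doubling per cycle.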

Accuracy and precision studies were conducted at the target RNA concentration of 2 ng per assay well, which is what is used in the Oncotype DX assay. The mean percent bias from each gene target was -0.3 percent (ranging from -10 percent to 6 percent) for cancer-related genes, and 0.7 percent for reference genes (-1.5 percent to 3.3 percent), indicating 99 percent mean quantitative correctness at this assay condition. The CV averaged 5.7 percent for the cancer-related genes and 3.2 percent for reference genes. The implications of such variability for RS were not discussed.

Finally, individual gene and RS reproducibility were measured by performing repeated analyses across multiple days, operators, RT-PCR plates, RT-PCR instruments, and liquid-handling robots. Two operators obtained replicate CT measurements on two aliquots of a single RNA sample over the course of five days with three real time PCR instruments (7900HT instruments) and two liquid-handling robots. The study design allowed the estimation of all main effects, including operator, RT-PCR instrument, and liquid-handling robot. Total SD in CT measurements varied from 0.06 to 0.15 CT units across the 21 genes, and the upper bounds on 2-sided 95 percent confidence intervals for the CV were all within 10 percent. The authors reported that a maximum SD of 0.15 at a CT of 30 translates into a CV of 0.5 percent, allowing a 15 percent change in gene expression to be distinguished. The day-to-day SD for all 21 genes ranged from 0 to 0.055, the between-plate SD ranged from 0 to 0.09, while the within-plate SD ranged from 0.057 to 0.147. The standard deviation for the overall RS (total and within-plate) was 0.8 RS unit. The largest differences between operators, as well as between liquid handling robots and 7900HT instruments, were 0.5 CT units for each of the 21 Oncotype DX genes, while SD and CV for the RS were not reported.
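The arithmetic behind the CV quoted above can be checked directly. The sketch below reproduces the conversion of an SD of 0.15 CT units at a CT of 30 into a CV, and then relates CT differences to fold changes in expression. The fold-change conversion assumes that CT is on a log2 scale with roughly 100 percent amplification efficiency; that assumption, and the function names, are illustrative and not taken from the study.

```python
import math

sd_ct = 0.15    # maximum total SD reported, in CT units
mean_ct = 30.0  # CT value at which the authors evaluated the CV

# Coefficient of variation = SD / mean, expressed as a percentage.
cv_percent = 100.0 * sd_ct / mean_ct


def ct_shift_for_fold_change(fold_change):
    """CT difference corresponding to a fold change (assumes doubling per cycle)."""
    return math.log2(fold_change)


def fold_change_for_ct_shift(ct_shift):
    """Fold change corresponding to a CT difference (assumes doubling per cycle)."""
    return 2.0 ** ct_shift


print(f"CV at CT 30: {cv_percent:.2f} percent")
print(f"CT shift for a 15 percent change: {ct_shift_for_fold_change(1.15):.3f}")
print(f"Fold change for one SD (0.15 CT): {fold_change_for_ct_shift(sd_ct):.3f}")
```

Under the doubling assumption, a 15 percent change in expression corresponds to a shift of about 0.2 CT units, which is of the same order as the largest reported SD of 0.15 CT units.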

In summary, this study presented extensively detailed results about several relevant analytic components of the assay (Table 4, Appendix I, Evidence Tables 1, 2 and 3).

Chang et al., 2007.55 This clinical study reported several results that can be used as indirect evidence for the overall analytic validity of the Oncotype DX assay. Ninety-seven FFPE blocks from core biopsies were analyzed by the standard assay protocols, and the percentage of successfully analyzed samples was 82.4 percent.

In summary, this paper provides preliminary evidence about the use of the Oncotype DX assay in fixed core biopsies from breast cancer patients (Table 2, Appendix I, Evidence Tables 1, 2 and 3).

Oratz et al., in press.56 This clinical study evaluated the impact of the Oncotype DX assay on clinical management, and also provided indirect evidence for the assay's overall analytic validity. Seventy-four FFPE blocks were analyzed by the standard assay protocols, and the percentage of successfully analyzed samples was 97.3 percent. No explicit eligibility criteria were used. The samples were included based on the request for analysis from the patient's clinician.

In summary, this paper contains evidence about the use of the Oncotype DX assay on FFPE blocks from breast cancer patients (Table 2, Appendix I, Evidence Tables 1, 2 and 3).

MammaPrint®

Analytic validity and variability evidence for MammaPrint was available from two technical studies (Ach, 2007,57 and Glas, 200658), and information on the overall success rate of the assay was documented in just one study, Buyse, 200659 (80.9 percent).

Table 6

Successful assays, MammaPrint®
Study, yearProtocolMeasureSuccessTumor < 5%Poor RNAPathological reviewRT-PCRLow reference genesClinically ineligible
Buyse, 200659MammaPrintFreshtumors326/403 (80.9%)77/403 (19.1%)

RT-PCR=reverse transcriptase polymerase chain reaction.

Table 7

Reproducibility, MammaPrint®
Study, year | Context and Methods | Results | Conclusions
Ach, 200757 | MammaPrint assay intra- and inter-laboratory reproducibility. Variation in RNA amplification and labeling, hybridization and wash, and slide scanning was measured on 4 tumors, dye-swap design, 24 slides (8 per site). Methods: to assess reproducibility in this study, ANOVA P values and Pearson's correlation were used | Replicate hybridizations, Pearson's correlation at the same site: for 1 tumor in 1 sub-array = 0.983; for 2 tumors in 2 sub-arrays = 0.988; for all the other technical replicates > 0.993. Scanning reproducibility across sites: Cy3: Pearson correlation > 0.995, slope = 0.97; Cy5: less reproducible (data not shown). 70-gene signature reproducibility: no differences between hybridization sites; no differences between hybridization days (regardless of site); statistically significant difference (P value < 0.05) between labeling sites for two tumors | The authors showed very low influence on sample-to-reference ratios based on averaged triplicate measurements in the two-color experiments. RNA labeling was the largest contributor to inter-laboratory variation. Overall, despite this variation, measurement of 70-gene signature in three different laboratories was found to be highly robust
Glas, 200658 | Pearson's correlation to assess correlation with original data and reproducibility. ANOVA analysis to model variability in repeated experiments using the 70 genes of the signature. Reproducibility in time was assessed by repeated measurements of RNA aliquots: 1 patient with cosine correlation to Good profile = 0.61, for 12 months; 1 patient with cosine correlation to Good profile = -0.44, for 12 months; 1 border-line with cosine correlation to Good profile = 0.43, for 4 months | Comparison to original 70-gene signature data, Pearson's correlations and in repeated measurements: 78 van't Veer21 patients, r = 0.92, p value < 0.0001; 145 (84+61) van de Vijver25 LN-negative patients: r = 0.88, p value < .0001; 49 patients analyzed twice, r = 0.995. Reproducibility results from ANOVA analysis: no variation within individuals (p value = 0.96); significant variation between individuals and genes. Reproducibility in time analysis results: for both patients assessed over a period of 12 months, measurements SD was 0.028 of the cosine correlation; for the 1 border-line sample assessed over a period of 4 months, measurements SD was 0.027 of the cosine correlation; this latter sample was misclassified 6 times (15%) | Microarray technology can be used as a reliable diagnostic tool. The MammaPrint assay performed similarly to the original 70-genes signature

ANOVA = analysis of variance; Cy3: the green fluorescent dyes commonly used in two colors design microarray hybridization; Cy5 the red fluorescent dyes commonly used in two colors design microarray hybridization; LN = lymph node; SD = standard deviation

Table 8

Analytic validity, MammaPrint®
Study, year | Context and Methods | Results | Comments
Ach, 200757 | Context: MammaPrint assay was used to evaluate the intra- and inter-laboratory reproducibility of the assay involving three laboratories. Variation in RNA amplification and labeling, hybridization and wash, and slide scanning was measured on 4 tumors, dye-flip design, 24 slides (8 per site). Methods: to assess reproducibility in this study, ANOVA P values and Pearson's correlation were used | Replicate hybridizations, Pearson's correlation at the same site: for 1 tumor in 1 sub-array = 0.983; for 2 tumors in 2 sub-arrays = 0.988; for all the other technical replicates > 0.993. Scanning reproducibility across sites: Cy3: Pearson correlation > 0.995, slope = 0.97; Cy5: less reproducible (data not shown). 70-gene signature reproducibility: no differences between hybridization sites; no differences between hybridization days (regardless of site); statistically significant difference (P value < 0.05) between labeling sites for two tumors | The authors showed very low influence on sample-to-reference ratios based on averaged triplicate measurements in the two-color experiments. RNA labeling was the largest contributor to inter-laboratory variation. Overall, despite this variation, measurement of 70-gene signature in three different laboratories was found to be highly robust
Glas, 200658 | Context: MammaPrint assay development through re-analysis of patients from the van't Veer21 and van de Vijver25 series. A different reference RNA was used, as well as a different quantification method. Methods: 162 total samples from fresh-frozen specimens: 84 patients from the van de Vijver25 cohort; all 78 patients from the van't Veer cohort21; a combination of the two populations above: 145 (84+61) LN-negative patients; 49 patients analyzed twice | Comparison to original 70-gene signature data, Pearson's correlations and in repeated measurements: 78 van't Veer21 patients, r = 0.92, p value < 0.0001; 145 (84+61) van de Vijver25 LN-negative patients: r = 0.88, p value < .0001; 49 patients analyzed twice, r = 0.995. Reproducibility results from ANOVA analysis: no variation within individuals (p value = 0.96); significant variation between individuals and genes. Reproducibility in time analysis results: for both patients assessed over a period of 12 months, measurements SD was 0.028 of the cosine correlation; for the 1 border-line sample assessed over a period of 4 months, measurements SD was 0.027 of the cosine correlation; this latter sample was misclassified 6 times (15%) | The authors demonstrate for the first time that microarray technology can be used as a reliable diagnostic tool. The MammaPrint assay performed similarly to the original 70-genes signature

ANOVA = analysis of variance; Cy3: the green fluorescent dyes commonly used in two colors design microarray hybridization; Cy5 the red fluorescent dyes commonly used in two colors design microarray hybridization; LN = lymph node; SD = standard deviation

Data about variability and reproducibility were obtained in these studies using repeated gene expression measurements over time, within and across individual microarrays, and across different laboratories, protocols, instruments, and operators (see Tables 6, 7, 8). No comparisons were made between expression measurements of individual genes and their corresponding protein level by IHC.

The following is a brief description of each study.

Glas et al., 2006.58 In this study the authors reported a summary of the results obtained during the development of the commercially marketed version of the 70-gene prognostic signature,21,25 the expression array-based test known as MammaPrint. The authors evaluated and compared both technical aspects and the clinical validity of the assay using the originally published data (see Key Question 3).

MammaPrint uses a microarray with 1,900 features (individual microarray locations where the probes are positioned), containing each of the 70 genes in the signature spotted in triplicate. In this paper the authors re-analyzed the data from the original series21,25 using the new array, a dye-swap hybridization design, a different reference RNA and a different approach to computing gene expression levels. Triplicate measurements were obtained for each gene of the 70-gene signature and summarized by an error-weighted average, rather than the approach proposed by Hughes et al., 2000,60 which was used in the original studies.

The results obtained with the new signature were comparable to the original results. Briefly, MammaPrint proved reproducible on the original development series21 (Pearson's correlation coefficient = 0.92, P value < 0.0001), and in a subset of the van de Vijver cohort25 (Pearson's correlation coefficient for 145/151 lymph node-negative patients = 0.88, P value < 0.0001). The replication of the experiment within patients and over time suggested high reproducibility as well. In particular, the Pearson's correlation coefficient on 49 patients analyzed twice was 0.995, and no significant variability within individuals was found by an analysis of variance (ANOVA) for the 70-gene signature (P value = 0.96).

Risk classification by MammaPrint is obtained by measuring the cosine correlation of individual patients' gene expression profiles to the mean gene expression profile obtained in the van't Veer21 series. The variability of this correlation was measured by repeated analysis of 3 patients over time and showed very small SDs (0.028, 0.028, and 0.027, respectively).
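To make the classification step concrete, the sketch below computes a cosine correlation between a patient profile and a template profile, and then estimates how often a borderline sample would fall on the other side of a classification threshold given the reported SD. The threshold value of 0.4 is an assumption introduced for illustration (it is not stated in this section), the normal-error model is also an assumption, and the toy three-gene profiles are hypothetical.

```python
import math


def cosine_correlation(profile, template):
    """Cosine of the angle between two expression profiles of equal length."""
    dot = sum(p * t for p, t in zip(profile, template))
    norm_p = math.sqrt(sum(p * p for p in profile))
    norm_t = math.sqrt(sum(t * t for t in template))
    return dot / (norm_p * norm_t)


def prob_below_threshold(mean_corr, sd, threshold):
    """P(measured correlation < threshold), assuming normally distributed error."""
    z = (threshold - mean_corr) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Hypothetical toy profiles (log ratios for three genes only).
print(cosine_correlation([0.5, -0.2, 0.1], [0.4, -0.3, 0.2]))

ASSUMED_THRESHOLD = 0.4  # assumption, not taken from this section
# Borderline sample from Table 7: mean cosine correlation 0.43, SD 0.027.
p_flip = prob_below_threshold(0.43, 0.027, ASSUMED_THRESHOLD)
print(f"estimated probability of falling below the threshold: {p_flip:.2f}")
```

Under these assumptions the estimated probability is of the same order as the 15 percent misclassification rate reported for the borderline sample in Table 7, which illustrates that even a small SD matters for patients whose correlation lies close to the decision boundary.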

In summary, this study reported detailed data about the development of the MammaPrint assay as it is offered in clinical settings, as well as data about the reproducibility of the assay within a single laboratory (Tables 7 and 8, Appendix I, Evidence Tables 6, 7 and 8).

Buyse et al., 2006.59 In this clinical study the authors reported several results that can be used as indirect preliminary evidence for the overall analytic validity of the MammaPrint® assay. Fresh frozen blocks from primary breast cancer patients collected in 5 distinct institutions were shipped for analysis to Agendia, and the percentage of successfully analyzed samples was 80.9 percent (326/403 patients) (Appendix I, Evidence Tables 6, 7 and 8).
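Because the success rates cited in this chapter come from samples of very different sizes, it can help to attach an interval to each proportion. The sketch below computes a Wilson score interval for two of the proportions reported in the text (326/403 for Buyse et al. and 45/57 for Mina et al.). The intervals are an illustration added here, not results reported by the studies, and the choice of the Wilson method and of z = 1.96 is an assumption.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95 percent Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Success counts as reported in the text.
reported = {
    "Buyse, 2006 (fresh frozen, MammaPrint)": (326, 403),
    "Mina, 2006 (core biopsies, RT-PCR)": (45, 57),
}

for label, (successes, n) in reported.items():
    low, high = wilson_interval(successes, n)
    print(f"{label}: {successes / n:.1%} ({low:.1%} to {high:.1%})")
```

The smaller study yields a much wider interval, so an apparent difference in success rate between two small series should be interpreted with caution.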

Ach et al., 2007.57 The inter-laboratory reproducibility of the MammaPrint assay was assessed in this paper. Results for the same set of four patients were obtained at three different sites and compared in order to assess the variation resulting from several important phases of analysis, including RNA amplification and labeling, hybridization and wash, and slide scanning. The same input RNA was used for all experiments.

In the first phase of the analysis, two laboratories, one in Amsterdam and one in California, amplified and labeled the RNA samples, then exchanged aliquots of the templates. Hybridization and slide scanning were performed at both locations and the scanned slides were then exchanged for re-analysis by the other laboratory. The same lot of labeling kits and microarrays was used at both sites. Technical replication variability was assessed by analyzing two separate slides on two different days. This experimental design allowed examination of both intra- and inter-laboratory variation.

The Pearson correlation coefficient across all technical replicates for all tumors analyzed proved to be above 0.983, indicating that the signals from replicate hybridizations correlated extremely well for genes expressed at all the measured intensity levels.

The reproducibility of laboratory scanning procedures was evaluated by scanning each of the 16 microarray slides at both sites. Signals for green fluorescent dye proved extremely reproducible, irrespective of the site of first hybridization and scan (Pearson correlation coefficient > 0.995, slope = 0.97), while signals for the red dye correlated less well and were always lower on the rescanned slide. The correlation of the 70-gene expression profile to the previously developed59 mean signature58 was computed for each dye-swapped pair of arrays, and ANOVA was used to evaluate the variability by hybridization site, labeling site, and hybridization day. No significant differences were found between hybridization sites or hybridization days (regardless of site), but two tumors showed a statistically significant difference (P value <0.05) between labeling sites. Variability due to the RNA labeling site was further confirmed for expression measurements of individual genes of the 70-gene expression profile, as well as for the 182 most highly expressed genes.

In the second phase of the study, the assay performance was evaluated by a third laboratory in Paris, France, using a different batch of arrays, reagents, and labeling kits, on the same four tumor RNAs, several months after the initial comparison. The 70-gene signature correlation values for each of the four tumors were compared by ANOVA. Significant differences were found for two of the tumors when stratified by labeling site (P values of 0.0004 and 0.01, respectively), whereas one tumor proved to be significantly different (P value, 0.016) by hybridization site. The authors predicted, but did not provide supporting data, that if variations in the washing protocols were introduced between laboratories, significant discrepancies in the 70-gene signature results would emerge. They concluded that while some sources of variation have measurable influence on individual microarray measurements, the overall impact on the 70-gene signature is low.

In summary, this study thoroughly investigated factors that could affect the reproducibility of the 70-gene signature within and across different laboratories. RNA labeling proved to be the largest contributor to inter-laboratory variation, but the authors did not address the impact of such factors on the classification of individual patients into different risk groups. The data (although from only four distinct patients) imply that results from MammaPrint testing cannot be compared across laboratories and that the test must be centralized (Tables 7 and 8, Appendix I, Evidence Tables 6, 7 and 8).

H/I Ratio

None of the studies reviewed here explicitly referred to the marketed H/I ratio (BCP assay). However, one publication, Ma, 2006,61 described the analytic procedures involved in such a test. The rest of the available analytic validity and variability evidence was specific to the way in which the two-gene ratio profile was computed in each clinical study, and did not contain direct information about the marketed test.

Table 9

Successful assays, two-gene signature and H/I ratio assays
Study, yearProtocolMeasureSuccessInsufficient TumorPoor RNAPathological reviewRT-PCRLow reference genesClinically ineligible
Goetz, 200662Study SpecificFFPE206/211 (97.6%)*211/227 (93%)5/211 (2.4%)
Jerevall, 200763Study SpecificFFPE357/373 (95.7%)16/373 (4.3%)
Ma, 200661H/I ratio assay852/870 (98%)132/1002 (13.2%)18/870 (2%)
* Out of the 256 eligible patients, 227 were analyzed (no specimen available: 29 patients)

Out of the 1002 eligible

Tumor content < 10%

RT-PCR = reverse-transcriptase polymerase chain reaction; FFPE = formalin fixed paraffin embedded

Table 10

Reproducibility, two-gene signature and H/I ratio assay
Study, year | Methods | Results
Jerevall 2007,63 | Reproducibility between two institutions, Pearson's correlations, 10 patients | HOXB13:b-actin, r = 0.96, P < 0.001; IL17BR:b-actin, r = 0.87, P = 0.002; HOXB13:IL17BR, r = 0.99, P < 0.001
Ma 2004,64 | Correlations between microarray and RT-PCR, 59 patients | HOXB13, r = 0.83; IL17BR, r = 0.93; HOXB13/IL17BR, r = 0.83

RT-PCR = real time polymerase chain reaction

Table 11

RT-PCR vs IHC comparison assays, two-gene signature and H/I ratio assays
Study, year | Comparison | Method | Details | Results | Comments
Ma 2006,61 | IHC and RT-PCR concordance | Cohen's κ statistics | IHC Allred92 scores of 3 to 8 were considered positive for ER or PR;93 methods for IHC in 90,91. Both ER and PR mRNA RT-PCR measurements were bimodal; midpoints used as cutoffs: 2.5 CT for ER, 5.9 CT for PR | ER, 91% concordance, κ = 0.83, P value = .0001; PR, 85% concordance, κ = 0.70, P value = .0001 | According to the authors, these results confirmed the significant correlations between mRNA and protein levels for ER and PR and provided validation of their FFPE gene expression assay platform.

IHC = immunohistochemistry; RT-PCR = real time polymerase chain reaction; FFPE = formalin fixed paraffin embedded; ER = estrogen receptor; PR = progesterone receptor; mRNA = messenger ribonucleic acid; CT = cycle threshold

Three studies (Goetz 2006,62 Jerevall 2007,63 Ma 200661) reported the overall success rate of the analyses; one, Jerevall 2007,63 assessed the reproducibility between two different institutions; one, Ma 2004,64 assessed the correlation between RT-PCR and microarray-based gene expression measurements for the two genes (HOXB13 and IL17BR); and one, Ma 2006,61 compared ER and PR status by RT-PCR and IHC (see Tables 9, 10, and 11). No comparisons were made between expression measurements of HOXB13 and IL17BR transcripts and the corresponding proteins by IHC. For completeness, a brief description of individual studies follows.

Ma et al., 2004.64 In this study the authors developed the HOXB13/IL17BR two-gene ratio signature. They identified differentially expressed genes associated with breast cancer recurrence in patients who were treated with tamoxifen, using gene expression arrays on whole mount as well as on laser capture micro-dissected (LCM) specimens. From a total of 5,475 genes selected because of their high variability across tumors, three differentially expressed genes proved to be common between the two analyses (macro-dissected specimens vs. LCM). These genes were HOXB13 (identified twice as AI700363 and BC007092), the interleukin 17B receptor IL17BR (AF208111), and EST AI240933.

HOXB13 was found to be over-expressed in tamoxifen recurrence cases, whereas IL17BR and AI240933 were over-expressed in tamoxifen non-recurrence cases. The authors confirmed the microarray-derived relative gene expression by RT-PCR analysis in 59 of the 60 original patients. The Pearson correlation coefficient between array and RT-PCR results was 0.83 for HOXB13 and 0.93 for IL17BR. The RT-PCR-derived HOXB13/IL17BR ratios were also highly correlated with their microarray-derived counterparts (r = 0.83). The authors also evaluated by RT-PCR 20 additional ER-positive early-stage primary breast tumors from women treated with adjuvant tamoxifen monotherapy between 1991 and 2000. These were used as a validation set (see Key Question 3).

In summary, this study provides a foundation for the use of the H/I ratio signature in LCM FFPE specimens (Table 10, Appendix I, Evidence Tables 10, 11 and 12).

Ma et al., 2006.61 The authors developed the two-gene index concept in this study, based on the two-gene ratio they originally published in Ma et al, 2004.64 New RT-PCR primers/probes for HOXB13 and IL17BR were used, and four reference genes were introduced for normalization. Total RNA was isolated from two 7-micrometer thick tissue sections for each sample, reverse transcribed into cDNA using a pool of gene-specific primers, and quantitated by TaqMan RT-PCR in duplicate in a 384-well plate. For each sample, CT values for the four reference genes were averaged and the relative expression level of each target gene was expressed as the difference from mean reference CT after Z-transformation. This resulting value is no longer a simple ratio, and is thus referred to as the two-gene index.
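
The normalization steps described above can be sketched as follows. This is only an illustration under stated assumptions: the CT values are hypothetical, the sign convention (mean reference CT minus target CT) is assumed, and combining the two standardized values as a simple difference is an assumption made here for illustration, since the text above does not specify how the index is formed from the two genes.

```python
from statistics import mean, stdev

def normalized_expression(target_ct, reference_cts):
    """Mean reference CT minus target CT.

    ASSUMED sign convention: a larger value means more transcript,
    because a lower CT corresponds to a more abundant target.
    """
    return mean(reference_cts) - target_ct

def z_transform(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# HYPOTHETICAL CT values: (HOXB13, IL17BR, four reference genes) per sample
samples = [
    (30.1, 27.5, [24.0, 25.1, 23.8, 24.6]),
    (27.3, 29.9, [24.2, 24.9, 24.1, 24.4]),
    (31.0, 26.8, [23.9, 25.3, 24.0, 24.8]),
    (28.2, 28.4, [24.1, 25.0, 23.7, 24.5]),
]

hoxb13 = z_transform([normalized_expression(h, refs) for h, _, refs in samples])
il17br = z_transform([normalized_expression(i, refs) for _, i, refs in samples])

# ASSUMPTION: index taken as the difference of the two standardized values
two_gene_index = [h - i for h, i in zip(hoxb13, il17br)]
print([round(v, 2) for v in two_gene_index])
```

Because each gene is standardized across the cohort before the two are combined, the result depends on the reference population, which is one reason it is no longer a simple ratio.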

RNA for this study was prepared from cancer cells isolated by LCM from FFPE tissue microarray sections (see Glossary, Appendix B) of originally frozen tumor specimens. From 870 patients, 98.0 percent of samples were successfully processed (Table 9).

In this study the authors evaluated the concordance between ER and PR protein levels assessed by IHC and the corresponding gene expression measured by RT-PCR. Since the distributions were found to be bimodal for both genes, the midpoints between the two populations were used as cutoff points (2.5 CT for ER and 5.9 CT for PR). Both ER status (91 percent concordance; kappa = 0.83; P value = .0001) and PR status (85 percent concordance; kappa = 0.70; P value = .0001) proved to be highly concordant. According to the authors, this confirmed the significant correlations between mRNA and protein levels for ER and PR and provided validation of their gene expression analysis.
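
For readers unfamiliar with the kappa statistic, the sketch below shows how percent concordance and Cohen's kappa are obtained from a 2×2 table of IHC versus RT-PCR calls. The cell counts are hypothetical (chosen only so that the concordance is 91 percent) and are not the counts from Ma et al.; the resulting kappa is therefore not expected to reproduce the published value.

```python
def cohens_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Return (observed agreement, Cohen's kappa) for two binary raters."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    # Marginal proportion called positive by each method
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n
    # Agreement expected by chance if the two methods were independent
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return observed, (observed - expected) / (1 - expected)

# HYPOTHETICAL counts: IHC+/PCR+, IHC+/PCR-, IHC-/PCR+, IHC-/PCR-
concordance, kappa = cohens_kappa(60, 5, 4, 31)
print(f"concordance = {concordance:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw concordance because it discounts the agreement that two methods would reach by chance given how often each calls a sample positive.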

In summary, this clinical study, in which the HOXB13-to-IL17BR index was developed, represents the foundation for using the two-gene ratio signature in the analysis of FFPE tissue microarray specimens (Table 11, Appendix I, Evidence Tables 10, 11 and 12).

Goetz et al., 2006.62 In this clinical study, FFPE tumor samples from 206 of 211 primary breast cancer patients were successfully processed by laser capture micro-dissection (LCM) prior to total RNA preparation. This study provides generic evidence about the analytic validity of the two-gene signature in primary breast cancer patients, as computed from LCM-processed FFPE blocks (Table 9, Appendix I, Evidence Tables 10, 11 and 12).

Jerevall et al., 2007.63 This paper quantified expression of HOXB13 and IL17BR (normalized to beta-actin) by RT-PCR in fresh frozen specimens from two distinct institutions in Sweden. RT-PCR reactions at the two institutions were performed using the same sets of primers and fluorescent probes, and two distinct instruments. Ninety-six percent of the 373 samples were successfully analyzed.

In summary, good reproducibility of the measurement between institutions was documented for each individual gene and the ratio (Pearson's correlation coefficient = 0.99, P value < 0.001) (Table 10, Appendix I, Evidence Tables 10, 11 and 12).

Key Question 3. What is the clinical validity of gene expression profiling tests in women diagnosed with breast cancer?

Table 12

Clinical validity, Oncotype DX™
Study, Population characteristicsEnd points and Exclusion criteriaClinical validity and utility resultsConclusions/Comments
Cobleigh, 200547Metastases;RS < 18:14% of patients:Development of RS
78/86 analyzed patients:<10 LN+; Recurrence: 29% (95%CI: 0–53%)GEP was correlated with the likelihood of DRFS
mean age 57 y, all LN+;<5% cancer cells;RS >18 and RS < 31: 24% of patients:
TS 0–2cm: 33%; TS >5cm:non invasive breast cancer; 10 years recurrence: 72% (95%CI: 38–88%)
31%; tamoxifen 54%;10 y DRFS in LN+, ER+ and ER- patientsRS > 31: 62% of patients
TG-1: 29%; TG-3: 36%; 10 years recurrence: 80% (95% CI: 63–89%)
adjuvant CMF 80%
Esteva, 200548follow-up < 5y;No significant correlation between age, tumor size, or RS and DFRSDifferences between the NSABP B-14 population used in Paik et al.:
149/220 analyzed patients:adjuvant therapy;No significant difference between RS risk groups with respect to distant recurrence-free survival; ER+ and ER- patients were used;
mean age 58 y;LN+; ER+ patients not treated with tamoxifen;
ER+: 69.1%, PR+: 66.4%;<5% of cancer cells; Association between high nuclear grade and improved outcome;
HER-2+: 16.8%;DFRS in LN-, untreated patients; Patients from a single institution
TG-1: 12.1%, TG-3: 30.2%;
median TS 2.3 cm
Habel, 200650LN+, age >75 y;RS <18 / ER+, tamoxifen: 2.8%, 95%CI: 1.7–3.9RS associated with risk of breast cancer death in:
220/234 analyzed cases:initially treated with chemotherapy;RS <18 / ER+, no tamoxifen: 6.2%, 95%CI: 4.5–7.9 ER+ patients treated with tamoxifen;
TG-Well: 11%; TG-Poor:metastases, inflammatory or bilateral breast cancer;RS 18–30 / ER+, tamoxifen: 10.7%, 95%CI: 6.3–14.9 ER+ patients not treated with tamoxifen;
47%;TS < 2: 64%; TS >2cm:unknown tamoxifen;RS 18–30 / ER+, no tamoxifen: 17.8%, 95%CI: 11.8–23.3 ER- patients;
36%;ER+: 76%prior cancer;RS ≥31 / ER+, tamoxifen: 15.5%, 95%CI: 7.6–22.8Such associations remained after accounting for tumor size and grade. Moreover the RS was able to identify a larger subset of patients with low risk of breast cancer death than it was possible with either of these standard prognostic indicators
570/631 analyzed controls:10 y breast cancer-specific mortality in ER+, LN patients;RS ≥31 / ER+, no tamoxifen: 19.9%, 95%CI: 14.2–25.2
TG-Well: 31%; TG-Poor:
23%; TS < 2: 79%; TS >2cm:
21%;ER+: 90%
Paik, 200428<5% cancer cells;51% of the patients RS <18, KM estimates = 6.8, 95%CI = 4.0–9.6RS validated in tamoxifen-treated, LN-, ER+ breast cancer patients
668/754 analyzed patientsinsufficient RNA;22% of the patients RS > 18 <31, KM estimates = 14.3–95%CI=8.3–20.3
(tamoxifen treatment arm of NSABP B-14)weak RT-PCR signal (average cycle threshold for reference genes >35);27% of the patients RS > 31, KM estimates =30.5–95%CI=23.6–37.4)
distant recurrence and the Overall Survival in LN-, ER+ breast cancer;PR, ER, HER, age, size, grade, and RS: p-Value=0.001, Hazard Ratio =2.81 (95%CI= 1.70–4.64 for 50 units increase
Paik, 200653<5% invasive tumor;20.6% of the patients RS<18, tamoxifen, 96.8 93.7% to 99.9%The RS assay predicts the magnitude of chemotherapy benefit in women with node-negative, ER-positive breast cancer
651/670 analyzed patients:insufficient RNA;33.5% of the patients RS<18, chemotherapy, 95.6 92.7% to 98.6%If RS risk groups are considered:
TS < 2: 66%; TG-Well: 13%;weak RT-PCR signal (average cycle threshold for reference genes >35);7% of the patients RS >18 <31, tamoxifen: 90.9 82.5% to 99.4% a minimal benefit from chemotherapy is seen in the low risk group, however with large intervals
TG-Poor: 28%; TS >2cm: 34%;distant recurrence in ER+, LN- breast cancer from NSABP B2013.7% of the patients RS >18 <31, chemotherapy: 89.1 82.4% to 95.9% benefit is not assessable in the Intermediate risk group due to the uncertainty in the estimates
ER+: 100%, LN-: 100%7.2% of the patients RS>31, tamoxifen: 60.5 46.2% to 74.8% a large chemotherapy benefit is seen in the high risk group
tamoxifen treatment arm of NSABP B-2018% of the patients RS>31, chemotherapy: 88.1 82.0% to 94.2%

LN = lymph node; TS = tumor size; TG = tumor grade; CMF = cyclophosphamide, methotrexate, and fluorouracil; DRFS = distant recurrence-free survival; ER = estrogen receptor; CI = confidence interval; RS = recurrence score; GEP = gene expression profiling; HER = human epidermal growth factor receptor; NR = not reported; pCR = complete pathological response; INT = Italian National Cancer Institute of Milan, Italy; NSABP = The National Surgical Adjuvant Breast and Bowel Project; RT-PCR = reverse transcriptase polymerase chain reaction; PR = progesterone receptor; KM = Kaplan Meier.

A synopsis of the clinical validity evidence presented in the following section is reported in Table 12.

Oncotype DX

Paik et al., 2004.28 This study was the first to assess the prognostic validity of Oncotype DX in a population independent from that used to develop the test. The population consisted of a sample of 668 (out of 2617) lymph node-negative, ER positive breast cancer patients from the tamoxifen-treated arm of the National Surgical Adjuvant Breast and Bowel Project (NSABP) Trial B-14. This 668-patient subset had enough analyzable tissue in paraffin blocks to be evaluated using the Oncotype DX assay, and was reported to be similar in baseline characteristics to the overall sample. A more complete sample was impossible because of sample unavailability or processing problems. In this study, the overall 10-year distant recurrence rate was 15 percent and the RS was significantly correlated with disease-free survival and overall survival (P<0.001 for both). The authors reported that RS alone was a better predictor of the distant recurrence risk than traditionally used predictors. In a multivariate model including age, tumor size, grade, ER, PR, and HER, the RS hazard ratio was 2.81 (95 percent CI, 1.70–4.64, P<0.001, per 50 unit increase). Forty-four of the 109 patients with small tumors (diameter less than 1 cm) were classified using Oncotype DX into the intermediate or high risk groups (Table 12, Appendix I, Evidence Tables 1, 2 and 4).
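
Because the hazard ratio above is quoted per 50-unit increase in RS, it can be helpful to see what it corresponds to on other scales. The sketch below rescales the reported value under the assumption that the RS enters the model as a continuous, log-linear term (so that hazard ratios multiply across units); the function name is illustrative, and the rescaled values are arithmetic consequences of that assumption rather than figures reported by the study.

```python
import math

def rescale_hazard_ratio(hr, from_units, to_units):
    """Convert a hazard ratio per `from_units` into one per `to_units`.

    ASSUMES a log-linear effect: log(HR) scales in proportion to the
    size of the increment.
    """
    return math.exp(math.log(hr) * to_units / from_units)

hr_per_50 = 2.81  # reported RS hazard ratio per 50-unit increase
hr_per_1 = rescale_hazard_ratio(hr_per_50, 50, 1)
hr_per_10 = rescale_hazard_ratio(hr_per_50, 50, 10)

print(f"per 1 unit:   {hr_per_1:.3f}")
print(f"per 10 units: {hr_per_10:.3f}")
```

Under this assumption, a hazard ratio of 2.81 per 50 units corresponds to roughly a 2 percent increase in hazard per RS unit.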

Esteva et al., 2005.48 In this study the Oncotype DX assay was evaluated in a population of 149 patients treated at the MD Anderson Cancer Center between 1978 and 1995. These patients had been diagnosed with node-negative breast cancer, did not receive tamoxifen or chemotherapy, and had a median followup of 18 years. The number of recurrences was not reported, and this study failed to find a correlation between RS and distant breast cancer recurrence. ER, PR, and HER-2 showed no prognostic value, and well-differentiated tumors were correlated with worse survival than higher grade tumors, the reverse of what would be expected. The population was unusual in that it received no treatment, and was different from the one used by Paik et al.28 (Table 12, Appendix I, Evidence Tables 1, 2, and 4).

Cobleigh et al., 2005.47 This report is the only study among the three used to develop the 21-gene Recurrence Score assay (Oncotype DX) to be published in a peer-reviewed journal. Seventy-eight breast cancer patients with more than 10 positive nodes from Rush University Cancer Center were studied, and 55 had recurred. Two hundred and fifty-five candidate genes were amplified with RT-PCR from FFPE tumor tissue obtained as long as 24 years earlier. Twenty-two genes were significantly correlated with distant recurrence-free survival (DRFS) (unadjusted P value < 0.05). An RS was developed using these genes which very strongly predicted disease-free survival, but as these were training and not validation data, the result has minimal evidential value in assessing Oncotype DX predictive properties (Table 12, Appendix I, Evidence Tables 1, 2 and 4).

Habel et al., 2006.50 The Oncotype DX assay was used to assess the risk of breast cancer-specific mortality among women in a large case-control study population derived from fourteen Northern California Kaiser community hospitals with ER positive, node-negative breast cancer.

There were a total of 4,964 eligible patients; 220 cases who had died and 570 living controls were analyzed. All were younger than 75 years old, diagnosed between 1985 and 1994, and had not been treated with adjuvant chemotherapy. For ER positive tamoxifen-treated patients, RS risk groups (as defined by pre-specified thresholds chosen by the test developers) showed 10-year risks of death from breast cancer (3 percent, 12 percent, and 27 percent, respectively, for the low, intermediate, and high risk groups) similar to those Paik28 reported for the NSABP B-14 patients. Multivariate analysis showed that RS and tumor size were significant and independent risk predictors of breast cancer death in both ER positive, tamoxifen-treated patients (RS hazard ratio per 50 units = 7.6, P<0.001) and untreated patients (RS hazard ratio per 50 units = 4.1, P<0.001). Tamoxifen-treated patients were shown to have a higher risk of death, and tumor grade proved to be a significant, independent predictor as well. The RS showed some prognostic value in ER negative patients, although this group was too small to perform a reliable analysis.

ER status was missing from the medical record for a substantial proportion of patients in this study, and therefore ER status based on gene expression was used in the analysis. Cases and controls were matched with respect to tamoxifen treatment, so it was not possible to assess whether the RS was able to identify patients who are likely to respond to tamoxifen therapy. The performance of the Oncotype DX assay RS was not compared to standard risk stratification methods (e.g., St. Gallen, NIH criteria, or Adjuvant! Online) (Table 12, Appendix I, Evidence Tables 1, 2, and 4).

Table 13

Risk classification of Oncotype DX™ against the St. Gallen criteria
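St. Gallen risk group | 10 yr. risk of distant relapse by St. Gallen | Oncotype DX risk group | 10 yr. risk of distant relapse | % of St. Gallen stratum (n)
Low (n=53; 8%) | 5% | Low | 0% | 72% (38)
Low (n=53; 8%) | 5% | Medium | 18% | 22% (12)
Low (n=53; 8%) | 5% | High | 43% | 6% (3)
Intermediate (n=222; 33%) | 9% | Low | 5% | 60% (134)
Intermediate (n=222; 33%) | 9% | Medium | 6% | 23% (51)
Intermediate (n=222; 33%) | 9% | High | 21% | 17% (37)
High (n=393; 59%) | 18% | Low | 8% | 42% (134)
High (n=393; 59%) | 18% | Medium | 18% | 22% (51)
High (n=393; 59%) | 18% | High | 33% | 36% (37)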

Table 14

Risk classification of Oncotype DX™ against the 2004 NCCN guidelines
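2004 NCCN risk group | 10 yr. risk of distant relapse by NCCN | Oncotype DX risk group | 10 yr. risk of distant relapse | % of NCCN stratum (n)
Low (n=53; 8%) | 5% | Low | 0% | 72% (38)
Low (n=53; 8%) | 5% | Medium | 18% | 22% (12)
Low (n=53; 8%) | 5% | High | 43% | 6% (3)
High (n=615; 92%) | 15% | Low | 8% | 49% (300)
High (n=615; 92%) | 15% | Medium | 14% | 22% (137)
High (n=615; 92%) | 15% | High | 30% | 29% (178)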

NCCN = National Comprehensive Cancer Network

Table 15

Risk classification of Oncotype DX™ against the Adjuvant! guidelines
Adjuvant! Online risk group | 10 yr. risk of distant relapse by Adjuvant! | Oncotype DX risk group | 10 yr. risk of distant relapse | % of Adjuvant! Online stratum (n)
Low (n=354; 53%) | 8% | Low | 6% | 61% (216)
Low (n=354; 53%) | 8% | Medium-High | 13% | 39% (138)
Med-High (n=314; 47%) | 22% | Low | 9% | 39% (122)
Med-High (n=314; 47%) | 22% | Medium-High | 31% | 61% (192)

Paik et al., 2004,65 Bryant 2005,66 and Hornberger et al., 2005.67 These posters showed the cross-classified risk predictions of the Oncotype DX assay compared to the risk stratifications using the 2004 NCCN and 2003 St. Gallen criteria, with the observed 10-year risks of relapse in the cross-classified strata. NCCN guidelines have since been modified, and the St. Gallen criteria did not account for HER-2. Patients came from the Paik65 NSABP B-14 validation cohort, N=668. Using the 2004 NCCN guidelines, the study indicated that of the 92 percent who were in the high-risk NCCN category, about half were reclassified as low-risk by RS, with a 10-year relapse risk of 7 percent (95 percent CI, 4–11 percent), which is similar to the risk observed in the low risk RS group without the NCCN information.65 Finally, against the Adjuvant! Online criteria, roughly 40 percent of those assessed to be at high risk (22 percent relapsed) were reclassified as having an 8 percent risk if they had a low RS. These data demonstrate that optimal predictions may come from a combination of expression predictors and standardized indices, although the latter contribute less than the RS to the risk estimate (Tables 13, 14, and 15).
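
The "about half" and "roughly 40 percent" figures can be checked against the stratum counts in Tables 14 and 15. The sketch below is simple arithmetic on those tabulated counts; the variable and function names are illustrative only.

```python
def stratum_shares(counts):
    """Share of a conventional-risk stratum falling in each RS group."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Counts from Table 14: NCCN high-risk stratum (n=615), by Oncotype DX group
nccn_high = {"Low": 300, "Medium": 137, "High": 178}
# Counts from Table 15: Adjuvant! Online medium-high stratum (n=314)
adjuvant_med_high = {"Low": 122, "Medium-High": 192}

nccn_shares = stratum_shares(nccn_high)
adjuvant_shares = stratum_shares(adjuvant_med_high)

print(f"NCCN high risk, low RS:         {nccn_shares['Low']:.1%}")
print(f"Adjuvant! med-high risk, low RS: {adjuvant_shares['Low']:.1%}")
```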

MammaPrint

Table 16

Clinical Validity, MammaPrint® and 70-gene signature
Study, Population characteristicsEnd points and Exclusion criteriaClinical validity and utility resultsConclusions/Comments
Buyse, 200659Exclusion criteria: Age > 61 y, TS >5cm, previous malignancy (except basal cell carcinoma), bilateral synchronous breastKM analysis stratified by MammaPrint and Adjuvant (% of patients with distant recurrence):MammaPrint is a better predictor of TTM than Age, Size, Grade, ER, Adjuvant!, NPI, St Gallen
307/403 analyzed patients, all age < 60 y, all < 5cm (ER missing: 5 patients)End points: OS, RFS, TTM Good(R>0.4), Adjuvant!Low: 52 patientsSt Gallen is a better predictor of DFS than MammaPrint
Clinical low risk/gene low risk n=52 (TS < 2cm: 67%, ER+: 100%, TG-Good: 43%, TG-Poor: 0%) OS(10years): 0.88 (0.74 to 0.95)MammaPrint is a better predictor for OS than Age, Size, Grade, ER, Adjuvant!, NPI, St Gallen
Clinical low risk/gene high risk n=28 (TS < 2cm: 59%, ER+: 100%, TG-Good: 43%, TG-Poor: 0%) Good(R>0.4), Adjuvant!High: 59 patientsThe signature remained a statistically significant prognostic factor for TTM and OS even after adjustment for various risk classifications methods based on clinicopathologic factors
Clinical high risk/gene low risk n=59 (TS < 2cm: 29%, ER+: 91%, ER-: 9%, TG-Good: 12%, TG-Poor: 18%) OS(10years): 0.89 (0.77 to 0.95)The lack of statistical significance for DFS was explained by the fact that the signature was originally developed using TTM as the endpoint
Clinical high risk/gene high risk n=163 (TS < 2cm: 25%, ER+: 48%, ER-: 52%, TG-Good: 3%, TG-Poor: 69%) Poor(R<0.4), Adjuvant!Low: 28 patientsOverall the 70-gene signature adds independent prognostic information to clinicopathologic risk assessment for node-negative untreated patients with early breast cancer
 OS(10years): 0.69 (0.45 to 0.84)Clinical risk hazard ratios, adjusted for the gene signature were not significant, suggesting that most of their prognostic utility is subsumed by the gene signature
 Poor(R<0.4), Adjuvant!High: 163 patients
 OS(10years): 0.69 (0.61 to 0.76)
Hazard Ratios (unadjusted), MammaPrint:
 TTM=2.32 (95% CI = 1.35–4.00)
 DFS=1.50 (95% CI = 1.04–2.16)
 OS=2.79 (95% CI = 1.60–4.87)
MammaPrint adjusted by Adjuvant:
 TTM= 2.13 (95% CI = 1.19 to 3.82)
 DFS= 1.36 (95% CI = 0.91 to 2.03)
 OS= 2.63 (95% CI =1.45 to 4.79)
Development of metastases within 5 years:
 Sensitivity for Gene signature 0.90 (0.78 to 0.95)
 Sensitivity for Adjuvant! 0.87 (0.75 to 0.94)
 Specificity for Gene signature 0.42 (0.36 to 0.48)
 Specificity for Adjuvant! 0.29 (0.24 to 0.35)
ROC area under the curve:
 MammaPrint®: TTM: 0.681
 MammaPrint®: OS: 0.659
 Adjuvant: TTM: 0.648
 Adjuvant: OS: 0.576
van't Veer, 200221Exclusion criteria: Age >55 y, TS >5cm, metastases, previous malignancy, diagnosed before 1983, or after 199665/78 correct predictions:The 70-genes signatures is a better predictor of the risk of distant metastases than standard clinical predictors
Population: 78 + 19 patients, (mean age 44.9 y, TS < 2cm: 41.2%, ER+: 70.2%. PR+: 57.7%, LN-: 100%, TG-Good: 12%, TG-Poor: 49%)End point: distant metastases as first relapse event (5 years) 5 poor in the 70-gene Good group
Therapy:Outcome: 8 good in the 70-gene Poor group
Hormonal (3 patients), Chemotherapy (3 patients)No metastases within 5yrs: 51;17/19 correct predictions:
Metastases within 5yrs: 46 1 poor in the 70-gene Good group
 1 good in the 70-gene Poor group
Univariate OR=15, (95%CI=4–56, P =0.0000041)
Multivariate OR = 18, (95%CI=3.3–94, P = 0.00014)
van de Vijver, 200225Exclusion criteria: Age >52 y, TS >5cm, previous malignancy, apical axillary LN involvementThe 70-genes association with age, tumor grade, ER (P value<0.001), and tumor size (P =0.012);The authors demonstrate for the first time that microarray technology can be used as a reliable diagnostic tool
Population: 295 patients, all age < 53 y, all < 5cm, 61 in common with van't Veer 2002:21End point: distant metastases as first relapse event, OS67 LN- patients (not in van't Veer 200221) OR = 15.3, (95%CI = 1.8–127, P = 0.003)The MammaPrint assay performed similarly to the original 70-genes signature and is, therefore, an excellent tool to predict outcome of disease in breast cancer patients
Poor Prognosis n=180, (TS < 2cm: 647%, LN-: 51%, ER+: 63%);180 LN+ and LN- patients (not in van't Veer 200221): OR = 14.6, (95%CI = 4.3–50, P < 0.0001)
Hormonal (13% patients), Chemotherapy (37% patients)All patients, HR = 5.1, (95%CI = 2.9–9.0, P < 0.001)
Good Prognosis n=115, (TS < 2cm: 62%, LN-: 52%, ER+: 97%);151 LN-, HR = 5.5, (95%CI = 2.5–12.2, P < 0.001)
Hormonal (15% patients), Chemotherapy (38% patients)144 LN+, HR = 4.5, (95%CI = 2–10.2, P < 0.001)
Multivariate HR = 4.6, 95%CI = 2.3–9.2, p value < 0.001
Glas, 200658Exclusion criteria:See van't Veer, 200225 above78 patients from the van't Veer21 series:The authors demonstrate for the first time that microarray technology can be used as a reliable diagnostic tool
Population: 162 LN-, untreated patients (<55 years), from the van de Vijver and van't Veer cohortsEnd point: distant metastases as first relapse event MammaPrint OR = 13.95 (95%CI = 3.9–44);The MammaPrint assay performed similarly to the original 70-genes signature and is, therefore, an excellent tool to predict outcome of disease in breast cancer patients
All 78 patients form the van't Veer were re-analyzed 70-genes signature OR = 15, 95%CI = 2.1 to 19)
 7/78 differently classified by MammaPrint
145 LN- patients from the van de Vijver25 series:
 MammaPrint HR = 5.6 (95%CI = 2.4–7.3, P = 0.0001)
Similar results were obtained for OS

ER = estrogen receptor; TS = tumor size; TG = tumor grade; OS=overall survival; RFS=relapse free survival; TTM=time to distant metastases; ROC = Receiver operating characteristic; NPI=Nottingham prognostic index; DFS=disease free survival; OR = odds ratio; CI = confidence interval; LN = lymph node; HR=hazard ratio; KM=Kaplan-Meier.

A synopsis of the clinical validity evidence presented in the following section is reported in Table 16. In the following section we distinguish between MammaPrint, the marketed assay, and the 70-gene signature, the gene expression profile originally published by van't Veer et al., in 2002.21

van't Veer et al., 2002.21 This study reported the development data for the 70-gene panel that is the basis for the MammaPrint test. A gene expression array containing 25,000 features was used to select genes associated with metastasis-free survival at 5 years from surgery in 78 node negative patients, including 34 patients whose disease had recurred within 5 years and 44 whose disease had not. Using the development of metastasis within 5 years as the first relapse event, 65 of the 78 patients were correctly classified into good and poor prognosis groups by the 70-gene signature. Among the 13 misclassified patients, 5 patients with poor prognosis were in the good prognosis group, while 8 patients with good prognosis were classified in the poor prognosis group. Seventeen of 19 patients were correctly classified in the validation set.

Table 17

MammaPrint® compared with traditional composite risk markers
Criteria | Patients with metastases (n=34): number classified as “high risk” (sensitivity) | Patients without metastases (n=44): number classified as “high risk” (specificity)
St. Gallen5 | 33 (97%) | 31 (30%)
NIH3,4 | 32 (94%) | 40 (9%)
70-Gene panel | 31 (91%) | 18 (59%)

The odds ratio (OR) for developing metastases within 5 years was 28 (95 percent CI, 7–107), while after leave-one-out cross-validation it was 15 (95 percent CI, 4–56). Using univariate analysis, the 70-gene signature performed better than tumor grade, size, patient age (less than 40 years), ER status, and angioinvasion. Using multivariate analysis, the 70-gene signature was an independent predictor of metastases within 5 years, OR = 18 (95 percent CI, 3.3–94) (Tables 16 and 17, Appendix I, Evidence Tables 6, 7 and 9).
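
The sensitivity and specificity in Table 17 follow directly from the counts of patients classified as "high risk". The sketch below reproduces them for the 70-gene panel row and also computes the crude odds ratio from the same 2×2 table. The crude value comes out near the cross-validated figure of 15 quoted above, although the report does not state that the published odds ratio was derived from exactly these counts; the function name is illustrative.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Sensitivity, specificity and crude odds ratio from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fn * fp)
    return sensitivity, specificity, odds_ratio

# 70-gene panel row of Table 17
with_metastases, without_metastases = 34, 44
tp = 31                       # metastases, classified high risk
fn = with_metastases - tp     # metastases, classified low risk
fp = 18                       # no metastases, classified high risk
tn = without_metastases - fp  # no metastases, classified low risk

sensitivity, specificity, odds_ratio = diagnostic_summary(tp, fn, fp, tn)
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}, "
      f"crude OR = {odds_ratio:.1f}")
```

The same function applied to the St. Gallen and NIH rows shows why those criteria have high sensitivity but low specificity: almost all patients, with or without metastases, are classified as high risk.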

van de Vijver et al., 2002.25 This was the first major validation of the 70-gene signature as reported in van't Veer 2002 using the same protocol and approach. Banked tumor specimens from the Netherlands Cancer Institute were used from a consecutive series of 295 women with breast cancer, with a mix of lymph node positivity, ER status, chemotherapy, and tamoxifen treatment. Time to metastases, as well as overall survival (OS) were used as primary end points in survival models, and 61 patients in this cohort had been in van't Veer's21 original 78 patient training set.

Patients were young (less than 52 years) with small tumors (less than 5 cm). The 70-gene signature was shown to be associated with grade, size, and ER positivity, with almost all ER positive patients falling into the good prognosis category. Those with “good prognosis” 70-gene expression signatures had dramatically better 5-year (95 percent vs. 61 percent) and 10-year (85 percent vs. 51 percent) DRFS, and better OS (95 percent vs. 55 percent at 10 years), than the “poor prognosis” group. Multivariate analysis showed that the prognosis group, tumor size, and adjuvant chemotherapy were the strongest predictors of distant metastases. The “poor prognosis” signature had the largest hazard ratio, 4.6 (95 percent CI, 2.3–9.2). Analyses excluding the 61 previously-included patients produced similar results. Fourteen of the 115 “good signature” patients experienced a recurrence by 10 years, demonstrating that the “good prognosis” group may not be at low enough long-term risk to justify forgoing chemotherapy when the 70-gene signature is used alone.

The authors did not compare a regression-based predictor using only conventional variables with one including the 70-gene panel. However, the authors demonstrated the prognostic value of the 70-gene index using survival curves stratified by the NIH and St. Gallen criteria, which showed substantial separation between 70-gene prognostic groups that were either low or high risk by those conventional indices. These stratified survival curves also showed that optimal prediction was achieved when the gene index and conventional predictors were combined (Table 16, Appendix I, Evidence Tables 6, 7, and 9).

Buyse et al., 2006.59 This study compared the MammaPrint assay with the conventional combination risk predictors Adjuvant! Online, Nottingham Prognostic Index (NPI), and St. Gallen. Patients were drawn from five distinct European institutions, in the context of an independent, multicenter validation study performed by the TRANS-BIG consortium. Gene expression was characterized with the MammaPrint® assay in frozen tumor specimens from node negative patients younger than 60 years who were diagnosed between 1980 and 1998 and did not receive systemic adjuvant chemotherapy. Final results were obtained for 302 out of 402 eligible patients. The median followup was 13.6 years, and the overall rate of distant metastasis was 25 percent.

The three primary end points of the study were time to distant metastases (TTM), DFS, and OS. The hazard ratios of the MammaPrint assay for TTM and OS were statistically significant after adjustment for St. Gallen, NPI and Adjuvant! Online, but were generally far below (in the 1.5–2.5 range) those seen in the original validation cohort.25,58 The partial explanation offered by the authors was that this study had a longer median followup time than the van de Vijver25 cohort. Additionally, the authors introduced an interesting analysis showing the marked (3–6 fold) lowering of the hazard ratio for various endpoints when patients were artificially censored at increasing times, up to 10 years. Also, none of the ER positive patients reported in this study received hormonal therapy, as did some of the original van de Vijver25 cohort.

Specificity and sensitivity of the MammaPrint assay and the Adjuvant! algorithm were compared for distant metastases within 5 years and for death within 10 years. Similar sensitivities were found, but a higher specificity was demonstrated for MammaPrint. The areas under the receiver operating characteristic (ROC) curves were comparable between MammaPrint and Adjuvant! (0.68 vs. 0.66 for distant metastases at 5 years). The use of alternative thresholds for the Adjuvant! Online results did not change the overall results, and Adjuvant! hazard ratios were greater than unity but not statistically significant when adjusted for the gene signature. Finally, there was no statistical heterogeneity in any outcomes between centers, suggesting that this prediction model has transportability across populations with possibly different genotypic patterns.
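To make the comparison above concrete, the sketch below shows how sensitivity and specificity are derived from a two-by-two classification table, and how two tests can share the same sensitivity while differing in specificity. The counts are hypothetical and chosen only for illustration; they are not taken from Buyse et al., and the function name is an assumption.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# HYPOTHETICAL counts, for illustration only (not data from the study).
# "Positive" = classified as high risk; "event" = distant metastasis within 5 years.
hypothetical_tests = {
    "test A": dict(true_pos=45, false_neg=5, true_neg=100, false_pos=150),
    "test B": dict(true_pos=45, false_neg=5, true_neg=40, false_pos=210),
}

for name, counts in hypothetical_tests.items():
    sens, spec = sensitivity_specificity(**counts)
    print(f"{name}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy setting both tests flag the same share of patients who go on to have an event, but test A spares more event-free patients from a high-risk label, which is the pattern described for MammaPrint relative to Adjuvant!.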

This study is particularly important in that it provided the first evidence for the degree of clinical validity of the MammaPrint assay distinct from the 70-gene signature. It provided insight into the impact of differing lengths of followup in validation cohorts, and concluded that the prognostic contribution was sizable. However, this study's predictions were made in the context of no adjuvant hormonal or chemotherapy treatment, thus its applicability to women over 60 years old and treated with tamoxifen is unknown68 (Table 16, Appendix I, Evidence Tables 6, 7, and 9).

Glas et al., 2006.58 This study used the same patients as in the van't Veer21 and van de Vijver25 studies and compared the currently offered MammaPrint assay results to the results of the previous studies. RNA was available for all 78 patients in the van't Veer series, but only 145 lymph node negative patients were available for reanalysis from the van de Vijver series. A different reference RNA was used, as well as a different quantification method; however, odds ratios and hazard ratios were very similar. A total of 15 patients were incorrectly classified into discrepant risk categories. The results of the 70-gene signature used in the original cohorts therefore apply equally to the MammaPrint assay based on that signature (Table 16, Appendix I, Evidence Tables 6, 7 and 9).

H/I Ratio

Table 18

Clinical Validity, two-gene signature and H/I ratio assays
Study, Year: Goetz, 2006 (ref. 62)
Population size, N:
  206/256 eligible patients, from the randomized NCCTG 89-30-52 trial on tamoxifen treatment (TS < 3 cm: 76%, LN-: 63%, ER+: 100%, HER2 0: 11%, HER2 1: 36%, HER2 2: 34%, TG-1: 26%, TG-3: 18%)
  Exclusion criteria: NR
End Points and Major Findings:
  End points: RFS (any event of recurrence), DFS (recurrence, or death), and OS (death)
  Clinical validity and utility results:
  Nodal status, tumor size and Nottingham grade significantly associated with endpoints
  All patients in the study:
    RFS, Multivariate F-S 1.45 (95% CI 0.93, 2.27)
    DFS, Multivariate F-S 1.57 (95% CI 1.04, 2.38)
    OS, Multivariate F-S 1.29 (95% CI 0.81, 2.08)
  Node negative patients in the study (n=130):
    RFS, Multivariate F-S 1.73 (95% CI 0.92, 3.25)
    DFS, Multivariate F-S 1.77 (95% CI 0.99, 3.16)
    OS, Multivariate F-S 2.01 (95% CI 1.02, 3.99)
Comments:
  According to the authors, a high 2-gene expression ratio is associated with increased relapse and death in patients with node-negative, ER positive breast cancer treated with tamoxifen
  In this study the 2-gene ratio was normalized by standard curve, and no reference genes were used; optimized cut-off points were identified and used

Study, Year: Jansen, 2007 (ref. 72)
Population size, N:
  1,252/1,693 eligible patients, subsets:
    DFS: ER+, LN-, no Adjuvant (N = 468)
    PFS: ER+, first-line tamoxifen (N = 193)
  Exclusion criteria: distant recurrence within the first month of surgery, missing LN, ER, and HOXB13 and IL17BR, < 30% tumor cells in specimens, poor RNA quality
End Points and Major Findings:
  End points: disease-free survival (DFS), progression free survival (PFS), post-relapse survival (PRS), and overall survival (OS)
  Clinical validity and utility results:
  Multivariate analysis, ER+, LN-, no Adjuvant (N = 468):
    Dichotomized ratio with 3′ IL17BR, DFS HR = 1.74; 95% CI = 1.17 to 2.59; P = 0.006
  Multivariate analysis, relapsing ER+, tamoxifen (N = 193):
    Optimal dichotomized ratio with 3′ IL17BR, PFS HR = 2.97; 95% CI = 1.82 to 4.86; P < 0.001
    Standard* dichotomized ratio with 3′ IL17BR, PFS HR = 1.95; 95% CI = 1.39 to 2.73; P < 0.001
    *As in Ma et al., 2006 (ref. 61)
Comments:
  High HOXB13-to-IL17BR ratio expression levels associate with both tumor aggressiveness and tamoxifen therapy failure
  The ratio was significantly associated with DFS and PFS in the specific subsets of patients
  In multivariate analysis, the ratio was associated with a shorter DFS for node-negative patients only
  Expression levels were normalized to a different set of control genes than in Ma et al., 2006 (ref. 61), using fresh frozen samples

Study, Year: Jerevall, 2007 (ref. 63)
Population size, N:
  357 patients analyzed, 264 post-menopausal, and 93 pre-menopausal.
    Postmenopausal patients: randomized clinical trial, comparing 2 years (163 patients, 62%) and 5 years (101 patients, 38%) of adjuvant tamoxifen treatment.
  Exclusion criteria: NR
End Points and Major Findings:
  End points: Correlation with clinical prognostic factors.
  The ratio was significantly associated with:
    Tumor size, P = 0.003;
    ER, P < 0.001;
    PR, P < 0.001;
    HER2, P = 0.003;
    NHG, P < 0.001;
    Ploidy, P < 0.001;
    S-phase, P = 0.005;
    ER, HER2, S-phase and NHG correlations are mostly due to IL17BR;
    PR and ploidy correlations have contributions from both genes
Comments:
  Lower expression of IL17BR, but not HOXB13, was correlated to several factors related to poor prognosis; IL17BR might be an independent prognostic factor in breast cancer

Study, Year: Ma, 2004 (ref. 64)
Population size, N:
  By RT-PCR, Frozen specimens: 59/60, FFPE: 20/20 eligible patients
    Frozen, recurrence, n=28 (mean age: 65.1, LN-: 57%, TS > 5 cm: 9, ER+: 97%, TG-1: 7%, TG-3: 39%)
    Frozen, non recurrence, n=32 (mean age: 69.1, LN-: 47%, ER+: 100%, TG-1: 3%, TG-3: 22%)
    FFPE, recurrence, n=10 (mean age: 65.5, LN-: 80%, ER+: 100%, TG-1: 10%, TG-3: 30%)
    FFPE, non recurrence, n=10 (mean age: 65.2, LN-: 100%, ER+: 100%, TG-1: 10%, TG-3: 10%)
  Exclusion criteria: NR
End Points and Major Findings:
  End points: DFS (months) was calculated from the date of diagnosis.
  Clinical validity and utility results:
    Frozen, recurrence, DFS: 54.8 (range: 5–137)
    Frozen, non recurrence, DFS: 115.6 (range: 61–169)
    FFPE, recurrence, DFS: 51.4 (range: 15–117)
    FFPE, non recurrence, DFS: 95.8 (range: 25–123)
  KM analysis, log-rank test, RT-PCR on the training set: P value = 0.0000058
  KM survival analysis, log-rank test, RT-PCR on the validation set: P value = 0.002
  Classification results in the validation set (RT-PCR data): 16/20 correctly classified
Comments:
  The authors concluded that the HOXB13/IL17BR 2-gene ratio predicts tumor recurrence in the setting of tamoxifen therapy

Study, Year: Ma, 2006 (ref. 61)
Population size, N:
  852/1,002 eligible patients:
    All samples, n=852 (age > 50 y: 82%, LN-: 72%, ER+: 73%);
    Tamoxifen treated, 286 (age > 50 y: 91%, LN-: 40%, ER+: 89%);
    Untreated, 566 (age > 50 y: 77%, LN-: 84%, ER+: 65%);
  Exclusion criteria: NR
End Points and Major Findings:
  End points: Relapse-free survival (RFS), defined as the time from initial diagnosis to any recurrence. Optimal threshold for dichotomization of the 2-gene ratio was identified and applied in the analysis
  Clinical validity and utility results:
  Two-gene index on a continuous scale, 5-year recurrence risk for untreated patients:
    Index of -2.0 = 15% (95% CI, 9.8% to 20.5%)
    Index of +2.0 = 36% (95% CI, 26.5% to 45.2%)
  Multivariate Cox Regression Analysis; ER+ node negative, untreated test set and tamoxifen treated patients (n = 225), dichotomized HOXB13:IL17BR index (high versus low)
    HR = 3.9 (95% CI = 1.5 to 10.3) p value = 0.007
Comments:
  According to the authors, these results confirmed the significant correlations between mRNA and protein levels for ER and PR and provided validation of the FFPE gene expression assay platform
  The HOXB13:IL17BR index was only significant in node-negative patients
  Higher HOXB13:IL17BR index was associated with a higher risk of relapse
  Two-gene index was a significant predictor of clinical outcome in ER+, node-negative patients irrespective of tamoxifen therapy

Study, Year: Reid, 2005 (ref. 69)
Population size, N:
  Tamoxifen 20 mg/day for 5 years: 58 patients (Age > 50 yrs: 93.1%, TS ≤2 cm: 37.9%, LN-: 22.5%, HER2+: 20.7%, PR+: 79.3%, ER+: 100%)
  Exclusion criteria: NR
End Points and Major Findings:
  End points: Disease Free Survival (DFS)
  Clinical validity and utility results:
  Univariate logistic regression, odds ratio:
    HOXB13 OR = 1.04, 95% CI = 0.92 to 1.16, P = 0.54
    IL17BR OR = 0.69, 95% CI = 0.40 to 1.20, P = 0.18
    HOXB13/IL17BR OR = 1.30, 95% CI = 0.88 to 1.93, P = 0.18
  Similar results by the other methods
Comments:
  Although the proposed predictive model is very appealing, the use of the two-gene ratio signature in an independent population yielded statistically non significant results
  The authors failed to confirm the association of the 2-gene ratio with response to tamoxifen in their cohort (which is, however, different in terms of clinical characteristics from the original Ma 2004 cohort)
  The authors also failed to classify patients using Discriminant Linear Analysis on two published data sets, including the Ma 2004 original series

NCCTG= North Central Cancer Treatment Group; TS = tumor size; LN=lymph node; ER= estrogen receptor; HER-2= Human epidermal growth factor receptor 2; TG = tumor grade; NR = not reported; RFS = relapse-free survival; DFS = disease-free survival; OS = overall survival; CI= confidence interval; RNA=ribonucleic acid; PFS = progression-free survival; PRS = post-relapse survival; NHG= Nottingham histologic grade; PR = progesterone receptor; RT-PCR = reverse transcriptase polymerase chain reaction; FFPE = formalin-fixed paraffin-embedded; KM = Kaplan Meier; HR= hazard ratio.

A synopsis of the clinical validity evidence presented in the following section is reported in Table 18.

Ma et al., 2004.64 This study reported the development of the two-gene ratio predictor. The authors generated gene expression profiles with gene chips from whole and laser-capture microdissected (LCM) frozen tumor specimens from 60 ER positive, node positive or negative breast cancer patients all treated with adjuvant tamoxifen monotherapy. Twenty-eight of the cohort (46 percent) experienced a distant recurrence within 4 years and 54 percent had no recurrence by 10 years. Twenty-two thousand genes were screened in the whole tissue sections and in LCM samples for their ability to predict DFS. Only three genes were highly predictive of DFS in both tissue sets, with over-expression of HOXB13 predicting recurrence and over-expression of IL17BR predicting non-recurrence. These expression values were combined in the form of a ratio, which outperformed both existing biomarkers and either gene alone. The univariate OR (interquartile) was 10.2 (95 percent CI, 2.9–36), and the multivariate OR was 7.3 (95 percent CI, 2.1–26.3) with adjustment for tumor size, PR and ERBB2 (none statistically significant) in a logistic regression. Areas under the receiver operating characteristic curve (AUCs) for the ratio were reported in the 0.8 range.

Next, the above analysis was repeated using just the two-gene ratio calculated by RT-PCR on 59 fresh-frozen samples from the training set, along with 20 additional FFPE specimens used to independently validate the ratio. Sixteen of these 20 were accurately predicted. The RT-PCR-measured expression was reported to have similar predictive power to that measured via gene arrays. No comparison with the full array of clinical predictors (e.g., tumor grade) or with standard combination predictors (e.g., Adjuvant!) was performed (Table 18, Appendix I, Evidence Tables 10, 11, and 13).
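To give a sense of how much uncertainty a validation set of 20 specimens carries, the sketch below computes a Wilson score interval for the reported 16 of 20 correct classifications. The choice of the Wilson method and the function name `wilson_interval` are assumptions made here for illustration; the report does not give an interval for this proportion.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95 percent interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return center - half_width, center + half_width


# 16 of the 20 FFPE validation specimens were accurately predicted (from the text).
lower, upper = wilson_interval(16, 20)
print(f"accuracy {16 / 20:.0%}, approximate 95% interval {lower:.0%} to {upper:.0%}")
```

With so few specimens the interval is wide, which is one reason the later, larger cohorts matter for judging the ratio.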

Reid et al., 2005.69 In this paper the authors attempted to validate the two-gene ratio on an independent cohort of 58 patients with ER positive breast cancer. These patients had been treated with tamoxifen monotherapy, had larger tumors, a higher frequency of lymph node metastases (78 percent vs. 47 percent), and a higher HER-2 positivity (21 percent vs. 5 percent) than those in the Ma et al., 2004 study. Eighteen patients had distant recurrences within a median time of 31 months, and 40 had no recurrence after a median of 93 months (range 70–125). The expression of the genes HOXB13 and IL17BR was measured by RT-PCR and the association between their expression and outcome was assessed by use of univariate logistic regression, AUC, a two-sample t test, and a Mann-Whitney test. None of these analyses revealed any statistical relationship with outcome.

The authors then took the original data of Ma et al.64 and applied standard supervised methods to this and to another independent data set with 99 similar patients.70 They tried to estimate the classification accuracy obtainable by using two or more genes in a microarray-based predictive model, using linear discriminant analysis and extensive cross-validation. The authors failed to validate the two-gene ratio and found high error rates with two-gene predictors.

Overall, findings from this paper argued against the prognostic utility of the two-gene ratio in ER positive breast cancer patients treated with tamoxifen. However, it must be noted that different parts of the transcripts were assayed in the two studies and that differences could be due to the documented differences in the populations used, which were neither clinically nor therapeutically homogeneous, with small validation sets71 (Table 18, Appendix I, Evidence Tables 10, 11 and 13).

Goetz et al., 2006.62 To investigate the prognostic performance of the two-gene ratio, this study analyzed FFPE samples from 206 ER-positive patients treated in the tamoxifen-only arm of a Phase III randomized trial of tamoxifen alone versus tamoxifen plus fluoxymesterone conducted through the NCCTG (North Central Cancer Treatment Group).64 RT-PCR expression values for each gene were normalized using a standard curve (Appendix D) obtained by analyzing the human universal total RNA (Stratagene, La Jolla, CA), rather than the standard reference gene method, although the authors stated that control genes were not necessary to assess the expression ratio. The following end points were considered: RFS (time from randomization to any event of recurrence, contralateral breast cancer or death), DFS (time from randomization to any event of recurrence, or contralateral breast cancer, or other cancer, or death), and OS (time from randomization to death).

Cutoff points that best predicted RFS, DFS and OS were identified: the optimal cut-off for the entire cohort was -1.85, corresponding to the 58th percentile, whereas the 59th percentile (-1.34) was used for the node-negative group (n = 130), and the 90th percentile (4.4) best discriminated in the node positive group (n = 86).

The ratio showed modest outcome prediction value in the entire cohort, with cross-validated hazard ratios near 1.5 and P values around 0.05, with the predictive value being restricted to the node-negative subset of patients (hazard ratios 1.7 to 2, P values = 0.04–0.06). In the node-positive group the ratio had no relationship to relapse or survival. The authors concluded that a high 2-gene expression ratio is associated with increased relapse and death in patients with node-negative, ER positive breast cancer treated with tamoxifen.

Overall, this study provided some support for the two-gene ratio signature's prognostic value in ER positive, lymph node negative patients, but both the magnitude of that effect and the statistical support were modest, and the relevant cutoffs used for discrimination between high and low risk were optimized for each endpoint and patient subgroup. Hence, this is closer to a training exercise than a validation exercise (Table 18, Appendix I, Evidence Tables 10, 11 and 13).

Ma et al., 2006.61 This study examined a consecutive series of patients from Baylor University diagnosed between 1973 and 1993 with stage I or II breast cancer. The patients did not have distant spread, and non-relapsed cases had a median followup of 6.8 years. The authors reported data on the clinical validity of the two-gene expression index (HOXB13:IL17BR), which is the basis of the H/I assay. A different normalization strategy (Table 1) from Ma et al., 200464 was applied to obtain this index. FFPE samples yielded only 852 analyzable cases out of 1,002 patients.

This population had 72 percent node negative, 73 percent ER positive, and 16 percent HER-2 positive patients, with an overall recurrence rate of 31 percent. A higher HOXB13:IL17BR index was associated with a higher risk of relapse (hazard ratio = 1.5, P<0.001). In a stratified analysis, univariate Cox regression indicated that the HOXB13:IL17BR index was only significant in node-negative patients (hazard ratio = 1.6, P<0.001 vs. hazard ratio = 1.2, P = 0.1 in node-positive patients), and further subsetting indicated that the interaction with node status was statistically significant for the HOXB13:IL17BR index (P = 0.02) only in ER positive patients. The HOXB13:IL17BR index correlated significantly with predictors of poor prognosis (i.e., HER-2 amplification, S-phase fraction, and number of positive lymph nodes) and correlated inversely with ER and PR expression.

The authors identified the optimal cut-off point for the index by analyzing a training set of ER-positive untreated patients (n=205), in order to obtain the smallest P value from a log-rank test in Kaplan-Meier survival analysis. The selected threshold (of about 1.0) was validated in a separate test set of untreated patients (n=103), and was also applied in the analysis of the tamoxifen-treated group of patients (n=122). Kaplan-Meier curves and univariate Cox regression analysis indicated that this cut point stratified patients into significantly different risk groups. Results from the Kaplan-Meier plots suggested that the prognostic power of the two-gene index was independent of tamoxifen therapy. The hazard ratio obtained in multivariate Cox proportional hazards regression, incorporating age, tumor size, S-phase fraction, PR status, and tamoxifen therapy, confirmed the prognostic role of the HOXB13:IL17BR index (hazard ratio=3.9, 95 percent CI = 1.5 to 10.3, P value = 0.007), in ER positive, node negative, patients irrespective of tamoxifen treatment. The index was also demonstrated to be a continuous predictor of DFS in untreated patients. The authors concluded that the two-gene index was a significant predictor of clinical outcome in ER positive, node-negative, patients regardless of tamoxifen therapy.

This study validated the two-gene ratio gene expression profile, developed the two-gene index, and provided preliminary evidence for its prognostic value. Classification probabilities were not presented, and its incremental value over conventional predictors was not reported, although some components of such predictors were included in the multivariate analyses (Table 18, Appendix I, Evidence Tables 10, 11 and 13).

Jansen et al., 2007.72 This clinical study evaluated the ability of the HOXB13-to-IL17BR expression ratio to predict DFS in breast cancer patients treated with tamoxifen. The HOXB13 and IL17BR expression levels were measured by RT-PCR in tumors from 1,252 primary breast cancer patients and normalized with respect to 3 housekeeping genes.73 The study population was a mix of ER-positive (73 percent), lymph node-positive (52 percent), tamoxifen-treated (14 percent), and chemotherapy-treated (17 percent) patients, with additional patients treated with tamoxifen or chemotherapy after relapse (55 percent). Patients with ER-positive, node negative primary breast cancer (N = 468) were followed for DFS. Patients with recurrent breast cancer treated with first-line tamoxifen monotherapy (N = 193) were followed for progression free survival (PFS). This study used different populations, protocols, normalization strategy, and ratio thresholds than Ma et al. 2006.61

The study evaluated the relation between the HOXB13-to-IL17BR ratio and tumor aggressiveness in lymph node negative, ER positive patients who did not receive adjuvant systemic chemotherapy (N=468). Of these patients, 46 percent had a relapse during the followup period. The HOXB13-to-IL17BR ratio, as a univariate continuous variable, was significantly associated with a poor DFS (hazard ratio=1.6, P=0.02) and a poor OS (P<0.001, data not reported). When traditional factors were added to the model, the HOXB13-to-IL17BR ratio continued to contribute significantly to DFS and OS prediction, either as a continuous variable or after dichotomization according to published pre-specified thresholds61 (Table 18).

The same analysis was performed on ER-positive, lymph node-positive tumors from untreated patients, who were mainly enrolled in the early 1980s (n=151). In univariate analysis, the continuous HOXB13-to-IL17BR ratio was associated with a poor DFS and a poor OS. In the multivariate model for this population, the index was significantly associated with OS (P value = 0.001), but less strongly with DFS (P value = 0.065). The dichotomized index was not related to DFS (data not shown).

Finally, the authors evaluated the prognostic performance of the HOXB13-to-IL17BR ratio in 193 ER-positive primary breast tumors in relapsed patients treated with first-line tamoxifen monotherapy. Both univariate and multivariate analyses revealed that the ratio, continuous and dichotomized, was strongly associated with PFS (Table 18).

This study is by far the largest done so far concerning the potential value of the 2-gene ratio. It provided evidence of the clinical validity of the HOXB13-to-IL17BR ratio in ER positive, node negative patients who did not receive systemic adjuvant therapy, and also in ER positive relapsing patients whose relapse was treated with tamoxifen. However, the ratio was calculated and dichotomized in a somewhat different manner than in Ma et al., 2006.61 Additionally, comparisons were not provided with conventional combination risk indices, nor were classification probabilities provided for the models with and without the ratio. Therefore, incremental predictive values could not be accurately assessed. Although qualitative conclusions are not affected, there are some differences between the quantitative results reported in the text and tables (Table 18, Appendix I, Evidence Tables 10, 11 and 13).

Jerevall et al., 2007.63 In this paper the authors investigated whether the two-gene ratio can predict the benefit of 2 versus 5 years of tamoxifen treatment in postmenopausal breast cancer patients, and also assessed the ratio's prognostic value in systemically untreated premenopausal patients. Expression of HOXB13 and IL17BR was quantified by RT-PCR in tumors from 264 randomized postmenopausal patients and 93 systemically untreated premenopausal patients. The two study populations were collected as part of a collaborative study between two centers in Sweden, and 72 percent of the randomized patients were lymph node positive and 74 percent ER positive. To stratify the patients into risk groups the authors dichotomized the ratio using the median, a procedure and dichotomization differing from the approach used by Ma 2006.61 The results from the prediction of prolonged treatment benefit are reported under Key Question 4, Clinical Utility.

The ratio proved to be significantly correlated to tumor size, ER, PR, HER-2, Nottingham histologic grade (NHG), ploidy, and S-phase. ER, HER-2, S-phase and NHG correlations were mostly due to IL17BR, while PR and ploidy correlations showed contribution from both genes. The authors concluded that a lower expression of IL17BR, but not HOXB13, was correlated to several factors related to poor prognosis, and thus IL17BR might be an independent prognostic factor in breast cancer, and that HOXB13 may be correlated with tamoxifen resistance. However, the ratio had no prognostic value in ER negative postmenopausal patients and they were excluded from subsequent analyses.

In summary, this study produced additional developmental evidence of the prognostic value of the HOXB13-to-IL17BR ratio, and of the two individual genes, in ER positive breast cancer patients who received systemic adjuvant therapy. However, neither the patient profile nor the mode of calculation of the ratio was identical to previous studies, and the results differed from previous reports, as the ratio predicted a worse outcome in lymph node positive patients (Table 18, Appendix I, Evidence Tables 10, 11 and 13).

Key Question 4. What is the clinical utility of these tests?

The clinical utility of a test tells us whether the test helps discriminate between those who will have more or less benefit from a therapeutic intervention. This can only be assessed in the context of randomized clinical trials, where benefit can be measured in terms of an improvement of clinical outcomes such as overall survival, disease-free survival, chemotherapy toxicity, or quality of life.

The prognostic estimates provided in the previous section, however, do have a relationship to clinical utility: they provide an upper limit on the degree of clinical benefit that can be provided by chemotherapy for a given endpoint. For example, if the 10-year cancer recurrence rate without adjuvant chemotherapy is estimated to be 5 percent, the maximum absolute benefit to be derived from chemotherapy cannot exceed 5 percent. Furthermore, knowledge that chemotherapy generally prevents only a minority of recurrences tells us that the absolute benefit in terms of recurrence in that situation will likely be less than 2 percent. So while prognostic estimates are not direct estimates of benefit per se, they provide enough information to crudely estimate benefit and can sometimes be relevant for patient decision-making.
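The arithmetic behind this bound can be sketched as follows. The 5 percent baseline risk comes from the example above; the relative risk reductions of 25 and 35 percent are assumed values chosen only to illustrate "a minority of recurrences" and are not estimates from this report. The function name is likewise illustrative.

```python
def absolute_benefit(baseline_risk, relative_risk_reduction):
    """Absolute risk reduction = baseline risk x relative risk reduction."""
    if not 0.0 <= relative_risk_reduction <= 1.0:
        raise ValueError("relative_risk_reduction must be between 0 and 1")
    return baseline_risk * relative_risk_reduction


baseline = 0.05  # 10-year recurrence risk without chemotherapy (example in the text)

# Upper limit: chemotherapy prevents every recurrence.
print(f"maximum possible absolute benefit: {absolute_benefit(baseline, 1.0):.1%}")

# ASSUMED relative risk reductions, for illustration only.
for rrr in (0.25, 0.35):
    benefit = absolute_benefit(baseline, rrr)
    print(f"relative risk reduction {rrr:.0%}: absolute benefit {benefit:.2%}")
```

Under these assumptions the absolute benefit stays below 2 percent, in line with the statement above; a higher baseline risk would scale the same relative reduction into a larger absolute benefit.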

Oncotype DX

Currently a prospective randomized clinical trial, TAILORx, is underway with the goal of assessing the value of adjuvant chemotherapy among patients with mid-range RS results. However, one other published study does address the potential value of the RS in predicting chemotherapy benefit.

Table 19

Clinical Utility, Oncotype DX™
Study, Year: Chang, 2007 (ref. 55)
Population size, N:
  72/97 eligible patients (mean age 48.5 y, ER+: 69.1%, TG-Well: 2.8%, TG-Poor: 56.9%, LN-: 90%, HER-2+: 13.5%, treated with docetaxel)
  Exclusion criteria: NR
End Points and Major Findings:
  End points: prediction of clinical response (by the RECIST method) to docetaxel treatment in women with breast cancer
  Clinical validity and utility results:
    Clinical CR was more likely in the high RS risk group (P = 0.008); a 50-unit increase in the RS showed an OR = 5 (95% CI = 1.3–6.0); AUROC = 0.73
Comments:
  The authors concluded that Oncotype DX can potentially be used as a predictive test of chemosensitivity using small amounts of routinely processed specimens

Study, Year: Gianni, 2005 (ref. 49)
Population size, N:
  89/95 patients (mean age 49.9 y, stage-T4b: 79%, stage-T4d: 18%, stage-T2: 1%, stage-T3: 2%, ER+: 58%, TG-1: 24%, TG-3: 21%, LN-: 16%, adjuvant with doxorubicin/paclitaxel followed by paclitaxel)
  Exclusion criteria: NR
End Points and Major Findings:
  End points (goal): to examine the correlation between RS and pCR, and to identify additional genes associated with pCR
  Clinical validity and utility results:
    The global likelihood ratio test assessing probit regression based models with and without the incorporation of the RS resulted in a P value of 0.005
Comments:
  The authors showed that the RS was strongly correlated with pCR, and identified a set of genes whose expression correlated with pCR to neoadjuvant doxorubicin and paclitaxel

Study, Year: Habel, 2006 (ref. 50)
Population size, N:
  220/234 eligible cases (TG-Well: 11%, TG-Poor: 47%, TS < 2 cm: 64%, TS > 2 cm: 36%, ER+: 76%, ER-: 24%)
  570/631 eligible control patients (TG-Well: 31%, TG-Poor: 23%, TS < 2 cm: 79%, TS > 2 cm: 21%, ER+: 90%)
  Exclusion criteria: LN+, age >75 y, initially treated with chemotherapy, inflammatory or bilateral cancer, metastases, prior invasive cancer, unknown/unconfirmed tamoxifen
End Points and Major Findings:
  End points: The risk of breast cancer-specific mortality among women with ER+, LN- breast cancer. Patients were matched by age, race, year of diagnosis and tamoxifen treatment.
  Clinical validity and utility results:
  10 y death risk according to RS (with tumor size and grade):
    RS <18 / ER+, tamoxifen: 2.8%, 95% CI: 1.7–3.9;
    RS <18 / ER+, no tamoxifen: 6.2%, 95% CI: 4.5–7.9;
    RS 18–30 / ER+, tamoxifen: 10.7%, 95% CI: 6.3–14.9;
    RS 18–30 / ER+, no tamoxifen: 17.8%, 95% CI: 11.8–23.3;
    RS ≥31 / ER+, tamoxifen: 15.5%, 95% CI: 7.6–22.8;
    RS ≥31 / ER+, no tamoxifen: 19.9%, 95% CI: 14.2–25.2
Comments:
  The authors showed that the RS was strongly associated with risk of breast cancer death among:
    ER+ patients treated with tamoxifen;
    ER+ patients not treated with tamoxifen;
    ER- patients
  Such associations remained after accounting for tumor size and grade. Moreover, the RS was able to identify a larger subset of patients with low risk of breast cancer death than was possible with either of these standard prognostic indicators

Study, Year: Mina, 2006 (ref. 51)
Population size, N:
  45/70 eligible patients (mean age 49 y, median TS 6.8 cm, TG-Well: 24%, TG-Poor: 49%, ER+: 57%, HER2+: 18%, LN+: 47%, adjuvant with doxorubicin/docetaxel, tamoxifen in ER+)
  Exclusion criteria: NR
End Points and Major Findings:
  End points: complete pathological response (pCR) to primary chemotherapy with anthracyclines and taxanes
  Clinical validity and utility results:
    No correlation between Oncotype DX RS and pCR
Comments:
  Though the Oncotype DX RS correlated with pCR in the INT Milan cohort of the Gianni et al. study (ref. 49), this association was not found in the present study

Study, Year: Paik, 2006 (ref. 53)
Population size, N:
  651/670 eligible patients (TS < 2 cm: 66%, TG-Well: 13%, TG-Poor: 28%, TS > 2 cm: 34%, ER+: 100%, LN-: 100%, tamoxifen treatment arm of NSABP B-20)
  Exclusion criteria: specimen shows <5% invasive tumor, insufficient RNA extracted from specimen, weak RT-PCR signal (average cycle threshold for reference genes >35)
End Points and Major Findings:
  End points: freedom from distant recurrence in women with ER-positive, node-negative breast cancer from NSABP B-20.
  Clinical validity and utility results:
    20.6% of the patients, RS <18, Tamoxifen: 96.8% (93.7% to 99.9%);
    33.5% of the patients, RS <18, Chemotherapy: 95.6% (92.7% to 98.6%);
    7% of the patients, RS >18 <31, Tamoxifen: 90.9% (82.5% to 99.4%);
    13.7% of the patients, RS >18 <31, Chemotherapy: 89.1% (82.4% to 95.9%);
    7.2% of the patients, RS >31, Tamoxifen: 60.5% (46.2% to 74.8%);
    18% of the patients, RS >31, Chemotherapy: 88.1% (82.0% to 94.2%)
Comments:
  The RS assay predicts the magnitude of chemotherapy benefit in women with node-negative, ER-positive breast cancer
  If RS risk groups are considered:
    a minimal benefit from chemotherapy is seen in the low risk group, however with large intervals;
    benefit is not assessable in the intermediate risk group due to the uncertainty in the estimates;
    a large chemotherapy benefit is seen in the high risk group

ER = estrogen receptor; TG = tumor grade; LN = lymph node; HER = human epidermal growth factor receptor; NR = not reported; RECIST = Response Evaluation Criteria in Solid Tumors; CR = complete response; RS = recurrence score; OR = odds ratio; CI = confidence interval; AUROC = area under the receiver operating characteristic curve; pCR = pathological complete response; TS = tumor size; RT-PCR = reverse transcriptase polymerase chain reaction; INT = Italian National Cancer Institute of Milan, Italy; NSABP = The National Surgical Adjuvant Breast and Bowel Project.

A synopsis of the clinical utility evidence presented in the following section is reported in Table 19.

Paik et al., 2006.53 The authors used the Oncotype DX assay to investigate whether the RS was a predictor of the benefit from chemotherapy in ER-positive, lymph node negative, breast cancer patients. This study used 651 patients from the NSABP B-20 randomized trial and compared a group treated with both tamoxifen and chemotherapy with a group of patients who were randomized to tamoxifen only. Gene expression analysis was found to be correlated with chemotherapy benefit, defined in terms of 10-year distant recurrence-free survival (DRFS).

Kaplan-Meier analysis on all patients showed a significant benefit from the use of chemotherapy (P value = 0.02); however, when the data were stratified by RS risk groups, only the high RS risk group of patients benefited from chemotherapy (P value = 0.001).

When the authors used multivariate Cox proportional hazard analysis, findings about the benefit from chemotherapy were unclear in the low and intermediate RS risk groups because the confidence intervals were wide and included 1 (low RS risk group, RR=1.31; 95 percent CI: 0.46–3.78; intermediate RS risk group, RR=0.61; 95 percent CI: 0.24–1.59). Patients classified in the high RS risk group, however, showed a significant benefit from the use of chemotherapy (RR=0.26; 95 percent CI: 0.13–0.53).
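The reading of these intervals can be made explicit with a short sketch. The snippet below is an illustration only and is not part of the Paik analysis: it flags whether each reported 95 percent CI excludes a relative risk of 1 (no effect). The function name and data layout are assumptions introduced here; the estimates are those quoted above.

```python
# Relative risks and 95 percent CIs by RS risk group, as quoted in the text.
reported = {
    "low": (1.31, 0.46, 3.78),
    "intermediate": (0.61, 0.24, 1.59),
    "high": (0.26, 0.13, 0.53),
}


def ci_excludes_no_effect(lower, upper, null_value=1.0):
    """Return True when the interval lies entirely on one side of the null value."""
    return upper < null_value or lower > null_value


for group, (rr, lower, upper) in reported.items():
    verdict = "excludes 1" if ci_excludes_no_effect(lower, upper) else "includes 1"
    print(f"{group}: RR={rr}, 95% CI {lower} to {upper} -> {verdict}")
```

Only the high RS risk group has an interval that excludes 1, which matches the narrative summary of the multivariate results.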

The authors also looked for interaction between each variable and chemotherapy treatment using separate likelihood ratio tests. The RS was the only significant interaction (P=0.038), with only slight statistical weakening when age, tumor size, tumor grade and site were added to the model individually (P values from 0.035 to 0.068). When RS was fit as a continuous score, there was not a clear threshold that predicted no benefit for chemotherapy.53

Overall, this study produced preliminary, high-quality evidence that the RS from the Oncotype DX assay has clinical utility, i.e. predictive power in assessing the benefit of chemotherapy usage in ER-positive, lymph node negative breast cancer patients. The embedding of this study within a large, well conducted RCT was a strength. However, some patients from the tamoxifen-only arm of the NSABP B-20 trial were in the training data sets for the Oncotype DX assay. While the algorithm was trained for the outcome of recurrence and not chemotherapy benefit, optimization of recurrence prediction in one arm of this study could translate into a somewhat enhanced estimate of chemotherapy benefit, although it is unlikely to account for the large effect seen here. Finally, while the models could not sustain the inclusion of all possible clinical variables, they could have included a composite score, either standard risk predictors, or one tailored for the data set (Table 19, Appendix I, Evidence Tables 1, 2 and 4).

Correlation between RS and chemotherapy response

Gianni et al., 2005.49 This study focused on the complete pathological response (pCR) to preoperative chemotherapy in node negative and node positive patients, looking at the correlation between pCR and RS. Two independent cohorts of patients were used, one from the Italian National Cancer Institute of Milan, Italy, and one from the M.D. Anderson Cancer Center of Houston, U.S. (Appendix I, Evidence Table 2), and were evaluated by two different technologies (RT-PCR and the Affymetrix hgu133a array). The study also identified additional genes that are associated with pCR and allowed the development of a new gene panel associated with pCR, as well as the evaluation of the association of Oncotype DX RS with pCR.

Results of the Oncotype DX assay in the Milan cohort. Three hundred and eighty-four genes were analyzed by RT-PCR in the Milan cohort of patients, including the 21 genes assessed by the Oncotype DX assay. Data showed good discrimination of pCR by RS. A global likelihood ratio test comparing probit regression models with and without the RS gave a P value of 0.005.

This study provided preliminary evidence that the RS from the Oncotype DX assay has predictive power in assessing the likelihood of pCR after pre-operative chemotherapy (Table 19, Appendix I, Evidence Tables 1, 2 and 4).

Mina et al., 2006.51 In this study, paraffin-embedded pre-treatment core biopsies from a completed phase II trial of 70 patients with newly diagnosed stage II or III breast cancer, who were treated with sequential doxorubicin and docetaxel, were used to identify genes that correlate with pCR. Gene expression was investigated by RT-PCR in 45 patients, using the same procedures as the Oncotype DX assay. A total of 192 genes (187 candidate genes and 5 reference genes) were tested, including those used to compute the Oncotype DX Recurrence Score.

Individual genes, as well as groups of biologically related genes, were found to be associated with pCR; however, no correlation between Oncotype DX RS and pCR was found (P = 0.67). A total of 22 individual genes had an uncorrected P value of less than 0.05 in a likelihood ratio test derived from logistic regression models; however, 13 genes would be expected to correlate with pCR at the 0.05 significance level by chance alone.

This study provides preliminary evidence that the RS from the Oncotype DX assay cannot predict pCR after primary chemotherapy in advanced breast cancer patients (with variable ER and HER-2 status, lymph node involvement, tumor size, and tumor grade) (Table 19, Appendix I, Evidence Tables 1, 2 and 4).

Chang et al., 2007.55 This study is currently in press in Breast Cancer Research and Treatment. The authors investigated whether expression of the 21 genes of the Oncotype DX assay and other candidate genes in locally advanced breast cancer tumors could be used to predict response to docetaxel treatment. The 97 women in this study were enrolled in three phase II studies of neoadjuvant docetaxel at Baylor College of Medicine, Houston, U.S. Clinical response was assessed by Response Evaluation Criteria in Solid Tumors (RECIST): clinical complete response (CR) was defined as complete disappearance of the tumor, while partial response (PR) was defined as at least a 30 percent decrease in unidimensional size. An increase of more than 25 percent was defined as clinical progressive disease (PD). Any response that did not meet the definition of CR, PR, or PD was defined as stable disease (SD). All patients received primary surgery and standard adjuvant therapy. Core biopsies from 97 patients were obtained before treatment, and RNA expression levels for the selected genes were studied by RT-PCR, following the specified protocols for the Oncotype DX assay.
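The response categories described above amount to a simple decision rule. The following is a minimal sketch of that rule, using the thresholds exactly as quoted in this report; the function name, the use of two size measurements in centimeters, and the handling of boundary values are assumptions made for illustration and are not taken from the study protocol.

```python
def classify_response(baseline_cm, followup_cm):
    """Classify clinical response from unidimensional tumor size (cm).

    Thresholds follow the definitions quoted in the text; the input
    format is hypothetical.
    """
    if baseline_cm <= 0:
        raise ValueError("baseline size must be positive")
    if followup_cm == 0:
        return "CR"  # complete disappearance of the tumor
    change = (followup_cm - baseline_cm) / baseline_cm
    if change <= -0.30:
        return "PR"  # at least a 30 percent decrease
    if change > 0.25:
        return "PD"  # increase of more than 25 percent
    return "SD"  # neither CR, PR, nor PD


for sizes in [(6.0, 0.0), (6.0, 4.0), (6.0, 5.0), (6.0, 8.0)]:
    print(sizes, classify_response(*sizes))
```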

Of the selected 97 patients, 81 (84 percent) had sufficient invasive cancer, 80 (82 percent) had sufficient RNA to perform the RT-PCR based assay, and 72 (74 percent) had known clinical response data. The mean age was 48.5 years, and the median tumor size was 6 cm. A clinical CR was observed in 12 patients (16.7 percent), a partial response in 41 (56.9 percent), stable disease in 17 (23.6 percent), and progressive disease in 2 (2.8 percent). Pathologically, pCR was observed in 2 patients (3.2 percent), ‘incomplete’ responses were observed in 61 patients (96.8 percent), and pathologic response was unknown for 9 patients.

The authors found that a CR was more likely to be associated with a high RS (P = 0.008). When the RS was used as a continuous variable, a 50 unit increase in the RS was associated with a five-fold increase in the odds of achieving clinical CR (95 percent CI 1.3, 6.0). Moreover, the logistic model for the RS indicated that a 14-unit increase in the RS (the difference between low and high risk groups, as defined by the standard thresholds) was associated with an odds ratio for clinical CR of 1.7 (95 percent CI 1.15, 2.60). The authors concluded that a high risk patient is at least 1.7 times more likely to achieve a clinical CR with neoadjuvant chemotherapy compared to a low risk patient. Finally, the accuracy of the Oncotype DX RS in predicting the response to neoadjuvant chemotherapy with docetaxel throughout the range of RS values was judged to be at least moderate, with an AUC of 0.73.

Overall, this study provided preliminary evidence that the RS from the Oncotype DX assay has predictive value in assessing the likelihood of a clinical CR to primary chemotherapy with docetaxel. However, the small patient cohort points to the need for further confirmation (Table 19, Appendix I, Evidence Tables 1, 2 and 4).

Oncotype influence on decisionmaking

Oratz et al., 2007 (in press).56 This study investigated whether the Oncotype DX RS had influenced both clinicians' treatment recommendations and the actual treatment administered in patients with ER positive, lymph node negative, early (stage I or II) breast cancer. A retrospective analysis was performed on 74 patients from a community-based oncology practice for whom RS was determined. Treatment recommendations prior to RS knowledge were compared with treatment recommendations after RS knowledge, and to the treatment eventually administered.

Knowledge of RS changed the clinicians' treatment recommendations in 21 percent of patients, and the actual administered treatment in 25 percent of the patients. In particular, the decision to add chemotherapy to the hormonal therapy was generally associated with the high-risk group, whereas the decision to change from chemotherapy to hormonal therapy was associated, in general, with low RS.

While this study produced preliminary evidence that knowledge of the RS from the Oncotype DX assay can have an impact on the clinical management of patients diagnosed with ER positive, lymph node negative, early breast cancer, it did not report specifically what the patients (or doctors) were told or understood about their risk of recurrence. Because it is unknown whether absolute risks were a factor in decision-making, the study is minimally informative as to the actual risk thresholds used by women and their treating physicians (Appendix I, Evidence Tables 1, 2 and 4).

Economic studies

Hornberger et al., 2005.67 The objectives of this study were twofold. First, the authors sought to estimate the incremental benefits, costs, and cost-effectiveness of using Oncotype DX to better classify patients by risk of distant recurrence in early stage breast cancer. Second, the authors wanted to assess the factors that most influence the potential benefits and efficient use of the 21-gene RT-PCR recurrence score. The outcomes of interest included overall survival, relevant costs of breast cancer care, and distant recurrence-free survival.

Table 20

Comparison of economic studies
Study | Test evaluated | Comparison (guidelines) | Economic outcomes evaluated | Estimated cost difference | Estimated difference in mean QALYs | Confidence in the analysis
Hornberger (ref. 67) | Oncotype DX | NCCN | Cost, QALY, DRFS, OS | $2,028 in favor of RS | 0.086 in favor of RS | Moderate
Lyman (ref. 75) | Oncotype DX | NCCN | Cost, LYS, C/E, ΔCost, ΔLYS, ΔC/E LYS, QALY | RS = $4,272 vs. Tam; RS = -$2,255 vs. ChemoRx+Tam | RS = +0.97 vs. Tam; RS = +1.71 vs. ChemoRx+Tam | Weak
Oestreicher (ref. 76) | GEP | NIH | Cost, QALY | $2,882 in favor of GEP | 0.22 in favor of NIH | Strong

QALY = Quality Adjusted Life Years; NCCN = National Comprehensive Cancer Network; DRFS = Distant Recurrence-Free Survival; OS = Overall Survival; RS = Recurrence Score; LYS = Life Years Saved; C/E = Cost Effectiveness; Δ = Change in; ChemoRx = Chemotherapy; Tam = Tamoxifen; GEP = Gene Expression Profiling; NIH = National Institutes of Health.

Cost-utility analyses used a Markov model to forecast overall survival, quality of life, costs, and cost-effectiveness. Two scenarios were considered, based on the NCCN classification of patients with lymph node negative, ER positive, early stage breast cancer who were expected to receive 5 years of hormonal therapy: a low risk group (T1a N0-1mi) that did not receive chemotherapy and a high risk group (T1b with unfavorable features, or T1c) that did. Patients were then reclassified using the RS. Annual risks of recurrence and survival were obtained from published meta-analyses of clinical trials, and the study model included costs of the assay and drugs, including chemotherapy (Table 20, Appendix I, Evidence Tables 1, 2, and 5).

Summary of study findings. The analysis reported that using the 21-gene RT-PCR assay to reclassify patients who were defined by NCCN criteria as low risk (to intermediate or high risk) would lead to an average gain in overall survival per reclassified patient of 1.86 years. Total cost estimates increased by about $25,000. This amount included $12,190 to identify intermediate- or high-risk patients and at least $15,000 for chemotherapy, and was offset by savings of $2,344 because of the lower risk of recurrence. The cost-utility of RS testing for this cohort was $31,452 per quality-adjusted life-year (QALY) gained.
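The cost components in this paragraph can be checked with simple arithmetic. The sketch below only adds up the figures quoted above; the variable names are introduced here for illustration, and the chemotherapy figure is treated as exactly $15,000 even though the report states "at least" that amount.

```python
# Figures quoted in the text (US dollars per reclassified patient).
testing_cost = 12_190        # identifying intermediate- or high-risk patients
chemotherapy_cost = 15_000   # stated as "at least" this amount
recurrence_savings = 2_344   # savings from the lower risk of recurrence

net_increase = testing_cost + chemotherapy_cost - recurrence_savings
print(net_increase)  # 24846, consistent with "about $25,000"
```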

The authors also reported that reclassifying patients defined as high risk (by 2005 NCCN criteria) to low risk (using the 21-gene RT-PCR assay) was cost saving. The added cost of testing ($7,073) to identify 1 reclassified patient was offset by an estimated $15,000 in savings for eliminating the need for chemotherapy.

Using the 21-gene RT-PCR assay was expected to improve quality-adjusted survival by a mean of 8.6 QALYs and reduce overall costs by about $203,000 in a hypothetical population of 100 patients with characteristics similar to those of the NSABP B-14 participants, more than 90 percent of whom were NCCN-defined as high risk. The estimated cost-effectiveness was most influenced by the propensity to administer chemotherapy based on the RS, and by the very small proportion of patients at low risk as defined by 2005 NCCN guidelines. The 2007 NCCN guideline indicates that the use of chemotherapy in these patients is now considered optional, thereby diminishing the utility of this model.
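The cohort-level totals can be converted to per-patient values for comparison with Table 20. The sketch below simply divides the quoted totals by the cohort size; the variable names are illustrative, and the small gap between the result and the $2,028 shown in Table 20 presumably reflects rounding of the "about $203,000" total.

```python
cohort_size = 100
qaly_gain_cohort = 8.6            # QALYs gained across the hypothetical cohort
cost_reduction_cohort = 203_000   # approximate dollars saved across the cohort

qaly_gain_per_patient = qaly_gain_cohort / cohort_size
cost_reduction_per_patient = cost_reduction_cohort / cohort_size
print(round(qaly_gain_per_patient, 3), round(cost_reduction_per_patient))  # 0.086 2030
```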

Critical appraisal of the analysis. The EPC team appraised the analysis using published guidelines for good practice in decision-analytic modeling in health technology assessment (Philips et al., 2004).41 The appraisal took into consideration the domains of structure, data, and consistency (Table 20, Appendix I, Evidence Table 5).

Structure and Data. The authors provided a clear description of many aspects of the structure of the analysis, including the decision problem, objectives of the evaluation, perspective of the analysis, rationale for the model structure, and structural assumptions. However, the model inputs were not entirely consistent with the stated perspective of the analysis. For instance, the model did not include all costs that are relevant from a societal perspective, such as decreased productivity and days lost from work. Also, the authors did not address the limitations in how utility estimates were derived. This is an important limitation because utility estimates can vary substantially depending on the methods used to derive them. The authors also did not justify extrapolating beyond the 10-year followup period for which recurrence data are available. Finally, the authors reported little information about their assessment of methodological and structural uncertainties. Without such information it is difficult to determine how their projections might differ if different assumptions were made in the decision model.

The authors correctly pointed out that the 2005 version of the NCCN breast cancer guideline recommends chemotherapy for all node-negative tumors greater than 1 cm (T1c).74 Since 84 percent of the patients included in the Paik study28 had tumors larger than 1 cm (T1c), it is unsurprising that a very large proportion of patients overall would be spared chemotherapy, given that gene expression profiling would be expected to identify approximately half of these patients as having a low RS. However, by 2007 the NCCN panel had refined its criteria for recommending chemotherapy,6 which is now considered optional (adjuvant hormonal therapy ± chemotherapy) for those with ER-positive, HER-2-negative disease and tumors greater than 1 cm (T1c). Therefore, it is reasonable to speculate that approximately half of these patients might opt for no chemotherapy. This is similar to the proportion of patients that would be found to have a low RS, although these two groups of patients may not necessarily be the same.

Consistency. Appendix I, Evidence Table 5 notes that the authors did not report information about the internal and external consistency of their analysis. The analysis would be more convincing if it gave more information on whether the mathematical logic of the model had been tested (internal consistency) or if results from other models were available for comparison (external consistency). Nevertheless, the results of the model make intuitive sense and seem to be consistent with published data on the performance characteristics of the 21-gene RT-PCR recurrence score.

Summary of critical appraisal. Overall, the EPC team concluded that this economic analysis met most of the standards set by the rigorous guidelines of Philips et al., 2004.41 It is not clear whether the limitations noted above biased the results for or against the 21-gene RT-PCR assay, but extension of the timeframe beyond 10 years could overstate the benefits of using the assay. Given that this study was sponsored in part by the manufacturer of the 21-gene RT-PCR assay (Genomic Health, Inc., Redwood City, California), the EPC team would have had more confidence in the results if the authors had provided more information about methodological and structural uncertainties as well as other potential sources of bias such as the derivation of the utility estimates. The generalizability of these results to patients in 2007 is also limited, as the 2005 NCCN guidelines have since been updated. Thus, the team has only moderate confidence that the results of the economic analysis provide reasonable estimates of the potential cost-effectiveness of using the 21-gene RT-PCR assay to guide treatment of early stage breast cancer.

Lyman et al., 2007.75 The main objective of the second study7 was to estimate the cost-effectiveness of 21-gene RT-PCR assay-guided treatment of patients with ER positive, lymph node-negative, early-stage breast cancer with either tamoxifen alone or the combination of chemotherapy and tamoxifen.

This analysis incorporated data that validated the prognostic accuracy for distant RFS using a 21-gene RT-PCR assay in 668 lymph node-negative, ER positive women with early-stage breast cancer receiving tamoxifen on NSABP B-14. The analysis also incorporated data that validated the predictive accuracy for treatment efficacy in 651 patients randomized in NSABP B-20, and 645 patients in NSABP B-14.

The study design involved cost-utility analyses using a “clinical decision model” designed to compare clinical, economic, and quality of life outcomes for three adjuvant treatment strategies: 1) tamoxifen alone, 2) chemotherapy followed by tamoxifen, or 3) therapy based on the results of the 21-gene RT-PCR assay. Using the RS from the 21-gene RT-PCR assay, patients were classified as high risk (RS ≥ 31), intermediate risk (RS 18–30), or low risk (RS < 18) for distant recurrence at 10 years. The third strategy assumed that low-risk patients would receive tamoxifen, and intermediate or high-risk patients would receive chemotherapy and tamoxifen. Clinical outcomes were estimated in terms of life expectancy or life-years saved as derived from NSABP B-20 and B-14 data. Economic outcomes included selected costs of cancer care, including the costs of chemotherapy, surveillance without recurrence, use of the 21-gene RT-PCR assay, and treatment of recurrence. Quality of life outcomes were estimated based on the utility associated with use of chemotherapy. The treatment strategies were compared in terms of the additional cost of one strategy over another (marginal cost), the additional clinical benefit (marginal efficacy), and the additional quality-adjusted clinical benefit (marginal utility) (Table 20, Appendix I, Evidence Tables 1, 2 and 5).

Summary of study findings. The lowest expected mean cost per life-year saved was associated with treatment with tamoxifen alone ($11,890), whereas the greatest expected mean cost was associated with treatment with both chemotherapy and tamoxifen ($18,418). The expected cost of each strategy increased as the assumed cost of treating distant recurrence increased. Above a cost of $100,759 for treating recurrence, therapy guided by the RS provided a net cost savings compared with other strategies and was always cost-saving compared with the chemotherapy and tamoxifen strategy. The tamoxifen strategy was associated with the lowest costs for all reasonable followup cost assumptions among those without recurrence. Therapy guided by the RS was favored over chemotherapy and tamoxifen for total chemotherapy costs exceeding $5,822. The use of therapy guided by the RS was more costly for low-cost chemotherapy regimens not requiring additional supportive care, whereas a net cost savings between $500 and $10,000 was estimated with RS guided therapy for other commonly used and higher-cost adjuvant chemotherapy regimens.

Compared to tamoxifen alone, the expected incremental cost associated with RS-guided therapy was $4,272. The expected incremental cost associated with chemotherapy and tamoxifen was $6,527. The incremental cost-effectiveness ratio compared with tamoxifen alone favored the use of RS-guided therapy ($1,944 per life-year saved) over the use of chemotherapy and tamoxifen ($3,385 per life-year saved). When the analysis considered increases in healthy life expectancy, the incremental life-years saved increased for the RS-guided therapy compared with tamoxifen alone, and the corresponding marginal cost-effectiveness decreased.

Expected QALYs favored RS-guided therapy over chemotherapy and tamoxifen for all health utility values, with increasing incremental QALYs as the impact of chemotherapy on measured utility increased. Recurrence-score-guided therapy had greater expected QALYs compared with tamoxifen alone, until the utility associated with chemotherapy fell below 0.80. At a utility of 0.90 for adjuvant chemotherapy, RS-guided therapy was associated with a gain of 0.97 QALYs, a cost-utility ratio of $4,432 per QALY compared with tamoxifen alone, and a gain of 1.71 QALYs with net cost savings when compared with the chemotherapy and tamoxifen combination.
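The cost-utility ratio reported here follows the usual incremental form, the difference in cost divided by the difference in effect. As a rough check under that assumption, the sketch below divides the reported incremental cost of RS-guided therapy versus tamoxifen alone ($4,272) by the reported QALY gain (0.97). The function name is introduced for illustration, and the result is close to, but not identical with, the published $4,432 per QALY, which may reflect rounding of the published inputs or model details that are not reported.

```python
def incremental_ratio(delta_cost, delta_effect):
    """Incremental cost per unit of effect (for example, dollars per QALY)."""
    if delta_effect == 0:
        raise ValueError("effect difference must be non-zero")
    return delta_cost / delta_effect


ratio = incremental_ratio(4_272, 0.97)
print(round(ratio))  # 4404
```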

Critical appraisal of the analysis. The EPC team appraised the analysis using published guidelines for good practice in decision-analytic modeling in health technology assessment (Philips et al., 2004),41 taking into consideration the domains of structure, data, and consistency (Table 20, Appendix I, Evidence Table 5).

Structure. Although the authors provided a clear description of the decision problem, they did not state the perspective of the model. Moreover, the authors did not provide enough information about the structure of the model to allow an evaluation of the appropriateness of the model type or of the causal relationships described by the model. The authors also did not justify extrapolating beyond the 10-year period for which recurrence data are available.

Data. The authors provided some explanation and justification of the data used in the analysis, citing previous work for some of the details. However, the authors did not include all relevant costs. They included the costs of adjuvant chemotherapy, surveillance, use of the Oncotype DX assay, and treatment of recurrence, but they did not include other treatment-related direct costs (e.g., costs of administration, associated testing, and transportation) or indirect costs (e.g., decreased productivity). Although indirect costs may be implicitly included in utility values assigned to relevant health states, the authors did not provide enough information to determine whether that was done. The analysis would have been stronger if it had estimated cost-effectiveness with and without inclusion of indirect costs and other treatment-related costs. The authors did not mention any health-state utilities other than the utility with chemotherapy, and did not give sufficient detail about how they estimated the utility with chemotherapy. In addition, the authors did not report on the quality of the data. A single study was used as the source of estimates for the relative effects of the treatment strategies. The authors also did not report sufficient information about the sensitivity analysis and alternative assumptions. Finally, the authors did not report much information about their assessment of methodological and structural uncertainties.

Consistency. The authors did not report information about the internal and external consistency of their analysis, but the results of the model make intuitive sense. Generally, the results seem to be consistent with the cited data on the performance characteristics of the 21-gene RT-PCR RS.

Summary of critical appraisal. Overall, the EPC team concluded that this economic analysis did not meet many of the standards set by the rigorous guidelines of Philips et al., 2004.41 These limitations are particularly serious because the authors received research support from the manufacturer of the 21-gene RT-PCR assay. Consequently, the EPC team has little confidence in the results of this analysis.

Summary of available studies. Based on the evidence from the stronger of the two available studies, the EPC team concluded that the 21-gene assay, when used to guide treatment for patients previously classified as low risk by NCCN-defined criteria, may be cost-effective compared to standard treatment approaches in women with lymph node-negative, ER positive early-stage breast cancer. Similarly, the EPC team concluded that the 21-gene assay, when used to guide treatment for patients previously classified as high risk by NCCN criteria, may be cost-saving compared to standard treatment. The overall body of evidence on economic outcomes is weak because of the limitations of the two available studies.

MammaPrint

No published studies evaluated the ability of the 70-gene signature for the main MammaPrint assay to predict chemotherapy benefit.

Economic studies

Oestreicher et al., 2005.76 The main objective of this study was to compare the cost-effectiveness of the Netherlands Cancer Institute gene expression profiling (GEP) assay to the NIH guidelines for the identification of early stage breast cancer patients who would benefit from adjuvant chemotherapy based on risk of distant recurrence. Although the references cited for the performance characteristics of the GEP assay indicate that the investigators were using data on MammaPrint, the article does not clearly state that they were analyzing MammaPrint.

The study design involved a cost-utility analysis. Using a Markov model, the investigators estimated the incremental cost and QALYs associated with use of the GEP assay as compared to use of the NIH guidelines in a hypothetical cohort of premenopausal women averaging 44 years of age newly diagnosed with stage I/II breast cancer. The performance characteristics of the tests were based on data from the Netherlands Cancer Institute cohort.25 In the Markov model, the investigators assumed that the results of the GEP assay would be used to classify patients as having a “good prognosis” or a “poor prognosis” based on a test cutoff derived from the first validation study of the GEP assay.21 They also assumed that the NIH guidelines would be used to classify patients as having a “good prognosis” or a “poor prognosis,” that women with a “poor prognosis” would receive adjuvant chemotherapy, and that women with a “good prognosis” would not receive chemotherapy. The model considered the following clinical events: distant recurrence of breast cancer, mortality due to distant recurrence, and mortality from other causes. The economic outcomes included the cost of the GEP assay, the cost of adjuvant chemotherapy, and the cost of managing distant recurrence of breast cancer. Quality of life outcomes were estimated in terms of QALYs, with utility estimates for specific health states derived from previous publications. The two strategies were compared in terms of the number of cases of distant recurrence prevented, costs, and QALYs (Table 20, Appendix I, Evidence Table 5).
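A Markov cohort model of this general kind can be sketched in a few lines. The example below is not the investigators' model: its three states collapse the two causes of death into one, and every transition probability, utility, cost, and the 3 percent discount rate are hypothetical placeholders chosen only to show the mechanics of accumulating discounted QALYs and costs over annual cycles.

```python
# Hypothetical three-state model with annual cycles; NOT taken from the study.
STATES = ("disease_free", "distant_recurrence", "dead")
TRANSITIONS = {
    "disease_free": {"disease_free": 0.96, "distant_recurrence": 0.03, "dead": 0.01},
    "distant_recurrence": {"disease_free": 0.0, "distant_recurrence": 0.70, "dead": 0.30},
    "dead": {"disease_free": 0.0, "distant_recurrence": 0.0, "dead": 1.0},
}
UTILITY = {"disease_free": 0.95, "distant_recurrence": 0.60, "dead": 0.0}  # hypothetical
ANNUAL_COST = {"disease_free": 500.0, "distant_recurrence": 30_000.0, "dead": 0.0}  # hypothetical


def run_cohort(years, discount_rate=0.03):
    """Return (discounted QALYs, discounted cost) per patient over `years` cycles."""
    occupancy = {"disease_free": 1.0, "distant_recurrence": 0.0, "dead": 0.0}
    qalys = 0.0
    cost = 0.0
    for year in range(years):
        weight = 1.0 / (1.0 + discount_rate) ** year
        qalys += weight * sum(occupancy[s] * UTILITY[s] for s in STATES)
        cost += weight * sum(occupancy[s] * ANNUAL_COST[s] for s in STATES)
        # Advance the cohort by one annual cycle.
        occupancy = {
            target: sum(occupancy[source] * TRANSITIONS[source][target] for source in STATES)
            for target in STATES
        }
    return qalys, cost


qalys, cost = run_cohort(years=20)
print(f"QALYs per patient: {qalys:.2f}; cost per patient: ${cost:,.0f}")
```

Comparing two strategies would mean running the cohort once per strategy with strategy-specific inputs (for example, a lower recurrence probability and an up-front chemotherapy cost for women classified as poor prognosis) and taking the differences in cost and QALYs.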

Summary of study findings. The NIH guidelines identified 96 percent of the cohort as high risk, whereas the GEP assay identified 61 percent of patients as high risk, with sensitivities of 98 percent for the NIH guidelines and 84 percent for GEP. Specificities were 51 percent for GEP and 5 percent for the NIH guidelines. Given a 35 percent reduction in the risk of distant recurrence with chemotherapy, using the NIH guidelines to identify high-risk women and treat them with chemotherapy prevented 34 percent of distant recurrences, compared to 29 percent for GEP. After including the negative impact on life expectancy and quality of life from chemotherapy and distant recurrence, the NIH guidelines and GEP yielded 10.08 and 9.86 QALYs, respectively. Total costs were $32,636 for the NIH guidelines and $29,754 for GEP.
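The figures in this paragraph can be reproduced with simple arithmetic. The sketch below assumes, as a simplification that is not spelled out in the source, that the share of distant recurrences prevented equals test sensitivity multiplied by the 35 percent risk reduction from chemotherapy. The function and variable names are introduced here for illustration.

```python
RELATIVE_RISK_REDUCTION = 0.35  # reduction in distant recurrence with chemotherapy


def fraction_prevented(sensitivity, risk_reduction=RELATIVE_RISK_REDUCTION):
    """Share of all distant recurrences prevented, under the stated assumption."""
    return sensitivity * risk_reduction


for strategy, sensitivity in {"NIH guidelines": 0.98, "GEP": 0.84}.items():
    print(f"{strategy}: {fraction_prevented(sensitivity):.0%} of distant recurrences prevented")

cost_difference = 32_636 - 29_754          # dollars, NIH guidelines minus GEP
qaly_difference = round(10.08 - 9.86, 2)   # QALYs, NIH guidelines minus GEP
print(cost_difference, qaly_difference)    # 2882 0.22
```

The two differences match the entries for this study in Table 20 ($2,882 in favor of GEP and 0.22 QALYs in favor of the NIH guidelines).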

Although the GEP assay was projected to identify 35 percent fewer women for chemotherapy than NIH guidelines, quality of life benefits in the women who did not need chemotherapy were outweighed by the decrease in life expectancy in the women who needed chemotherapy but did not receive it because of GEP's lower sensitivity.

The authors concluded that, in order to improve quality of life by allowing women to safely avoid chemotherapy while not missing women whose survival is compromised by avoiding therapy, GEP's sensitivity would have to increase to at least 95 percent while maintaining a specificity of 51 percent. The GEP assay did not attain a sensitivity of 95 percent regardless of the test cutoff used in the analysis.

Critical appraisal of the analysis. The EPC team appraised the analysis using published guidelines for good practice in decision-analytic modeling in health technology assessment,41 taking into consideration the domains of structure, data, and consistency (Table 20, Appendix I, Evidence Table 5).

Structure. As indicated in Table 20, the authors provided a clear description of most aspects of the structure of the analysis, including the decision problem, objectives of the evaluation, perspective of the analysis, rationale for the model structure, and structural assumptions. The model inputs were consistent with the stated perspective of the analysis. The authors did not justify using a timeframe beyond the 6.7-year period for which recurrence data are available.

Data. The article was very strong in providing explanation and justification of the data used in the analysis. Limitations were that the authors did not justify extrapolation of data beyond 6.7 years of followup and that they only compared their model to the NIH guideline. In addition, although the authors listed a number of references for their use of utilities, they did not provide any explanation of how they derived specific utility estimates from these references. They also did not provide any explanation of the methods or scaling techniques that were used to derive the utility estimates. Thus, we cannot determine whether the utilities were based on the standard gamble technique, which is the gold standard, or on other scaling techniques. This is important because the standard gamble technique generally yields utility values that are higher than the values derived using other techniques. The estimates used in this study seem low compared to the values assigned to most serious health conditions.77,78 Also, these references for the utility estimates are significantly more dated than some of the references used to obtain cost data.

Consistency. The authors discussed the internal and external consistency of their analysis, and the results of the model make intuitive sense.

Summary of critical appraisal. Overall, the EPC team concluded that this economic analysis met most of the rigorous standards set by Phillips et al., 2004.41 The EPC team therefore has confidence in the results of this analysis. Although we had some uncertainty about the utilities used in the analysis, the EPC team believes that this limitation is unlikely to have changed the overall conclusion of the authors, which is based on the lack of sensitivity of the GEP assay.

H/I Ratio

Jerevall et al., 2007.63 This paper investigated whether the two-gene ratio can predict the benefit of 2 years versus 5 years of tamoxifen treatment in postmenopausal breast cancer patients, and also assessed the prognostic value of the ratio in systemically untreated premenopausal patients. Expression of HOXB13 and IL17BR was quantified by RT-PCR in tumors from 264 randomized postmenopausal patients and 93 systemically untreated premenopausal patients. The two study populations were collected as part of a collaborative study between two centers in Sweden, and 72 percent of the randomized patients were lymph node positive and 74 percent ER positive. To stratify the patients into risk groups, the authors dichotomized the ratio using the median. Thus the normalization procedure and dichotomization differed from the approach used by Ma.61 The prognostic results from this study are reported under Key Question 3 (clinical validity).
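
Dichotomizing at the median means the cutoff is set by the cohort itself rather than by a pre-specified threshold, which is one reason results are hard to compare across studies. The sketch below illustrates the general idea only. The ratio values are hypothetical, and the handling of values exactly equal to the median (assigned here to the low group) is an assumption, not a detail reported by the authors.

```python
from statistics import median

def dichotomize_by_median(ratios):
    """Label each ratio 'high' if it exceeds the cohort median, else 'low'."""
    cutoff = median(ratios)
    return cutoff, ["high" if r > cutoff else "low" for r in ratios]

# Hypothetical HOXB13:IL17BR ratios, for illustration only (not study data)
example_ratios = [0.2, 0.5, 0.9, 1.4, 2.8, 6.0]

cutoff, groups = dichotomize_by_median(example_ratios)
print(cutoff, groups)
```

Because the cutoff moves with the cohort, the same tumor could be classified differently in a population with a different distribution of ratios.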

Kaplan-Meier analysis of data from postmenopausal ER-positive patients demonstrated that a low HOXB13-to-IL17BR ratio was associated with a benefit from receiving 5 vs. 2 years of tamoxifen treatment (univariate P = 0.021), which appeared to be due mainly to the low expression of HOXB13 (P = 0.010, also in Kaplan-Meier analysis). There was no benefit (P = 0.9) in patients who had a high ratio. The predictive significance of both the two-gene ratio and the HOXB13 gene alone was maintained in Cox proportional hazards modeling, adjusting for tumor size, PR status, and lymph node status.

The authors concluded that the ratio, or even HOXB13 alone, could predict the benefit of prolonged endocrine therapy, and that a lower expression of IL17BR, given its correlation to poor prognosis, could be an independent prognostic factor.

Table 21

Clinical Utility, two-gene signature and H/I ratio

Study, Year: Jerevall, 2007 (reference 63)

Population size, N:

  • Population: 357 patients analyzed, 264 postmenopausal and 93 premenopausal.

  • Postmenopausal patients: randomized clinical trial, comparing 2 years (163 patients, 62%) and 5 years (101 patients, 38%) of adjuvant tamoxifen treatment.

  • Exclusion criteria: NR

End Points and Major Findings:

  • End points: Relapse-free survival (RFS), defined as the time from diagnosis to local, regional, or distant recurrence or death due to breast cancer; OS, defined as the time elapsed from diagnosis to the date of death due to breast cancer.

  • Clinical validity and utility results: Postmenopausal ER+ patients, low ratio: benefit from prolonged tamoxifen (P = 0.021; in KM analysis for RFS) due to the low expression of HOXB13 genes (P = 0.010, in KM analysis for RFS).

  • Postmenopausal ER+ patients (n=179), multivariate Cox proportional hazard model analysis:
    Recurrence Rate (5y vs 2y), low ratio: 0.39 (95% CI = 0.17–0.91), P value = 0.030
    Test for interaction: P value = 0.035
    Recurrence Rate (5y vs 2y)

  • Postmenopausal ER+, node negative, patients (n=134), multivariate Cox proportional hazard model analysis:
    Recurrence Rate (5y vs 2y), low ratio: 0.27 (95% CI = 0.10–0.72), P value = 0.0087

Comments:

  • In this study the expression levels were normalized to β-actin using fresh frozen samples. Patients were collected from two distinct institutions; of 373 tumor samples analyzed, RNA expression data were obtained from 357 tumors.

  • The ratio or HOXB13 alone can predict the benefit of endocrine therapy, with a high ratio or a high expression rendering patients less likely to respond.

BCP = breast cancer profiling; NR = not reported; RFS = relapse-free survival; OS = overall survival; ER = estrogen receptor; KM = Kaplan-Meier; CI = confidence interval.

Neither the patient profile nor the mode of calculation of the ratio was identical to previous studies (Table 21, Appendix I, Evidence Tables 10, 11 and 13). However, this study produced additional developmental evidence about the prognostic utility of the HOXB13-to-IL17BR ratio, and of the two individual genes, in ER positive breast cancer patients who received systemic adjuvant therapy.

Ongoing Studies

TAILORx (Trial Assigning IndividuaLized Options for Treatment (Rx))

The primary objective of TAILORx is to compare the DFS of women with previously-resected axillary-node-negative breast cancer who have an Oncotype DX RS of between 11 and 25 when treated with both adjuvant chemotherapy and hormonal therapy versus hormonal therapy alone. It should be noted that this range is lower on both ends than the standard “Intermediate” RS range, viz. 18–30. This represents a more conservative approach to the use of the RS than is suggested by current categories, in that subjects who agree to forego chemotherapy in this trial will be at lower risk than those in the current “low risk” RS group. The secondary objective is to determine if adjuvant hormonal therapy alone is sufficient treatment (i.e., 10-year distant DFS of at least 95 percent) for patients with an RS of less than or equal to 10.
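
The effect of the shifted range can be seen by classifying the same scores under both sets of cutoffs. The sketch below assumes integer recurrence scores, takes the standard groups as low (below 18), intermediate (18 to 30), and high (above 30), and takes the TAILORx groups as low (10 or less), intermediate (11 to 25), and high (above 25). The function name and scheme labels are illustrative only.

```python
def rs_category(rs, scheme="standard"):
    """Map an integer recurrence score (RS) to a risk group.

    'standard': low < 18, intermediate 18-30, high > 30
    'tailorx' : low <= 10, intermediate 11-25, high > 25
    """
    if scheme == "standard":
        low_max, mid_max = 17, 30
    elif scheme == "tailorx":
        low_max, mid_max = 10, 25
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    if rs <= low_max:
        return "low"
    if rs <= mid_max:
        return "intermediate"
    return "high"

# Compare the two schemes on a few example scores
for rs in (8, 15, 22, 28, 35):
    print(rs, rs_category(rs), rs_category(rs, "tailorx"))
```

Scores of 11 to 17 move from the low group to the intermediate group under the trial's cutoffs, and scores of 26 to 30 move from intermediate to high.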

This study will not provide direct evidence for the value of Oncotype DX, as all patients in the trial will receive the test. The trial results will indicate whether adjuvant chemotherapy is of value within the trial's intermediate RS range, and will serve as further validation of the absolute risk of recurrence in subjects with scores above and below the range. This will provide better estimates of the degree of benefit from utilization of the test, but will not directly examine what therapeutic choices would have been made and clinical outcomes incurred if only standard risk prediction tools were used. However, since standard risk prediction indices will be calculable, that information may be inferred. First results from this trial are expected in approximately 2013.

MINDACT (Microarray In Node-negative Disease may Avoid ChemoTherapy)

MINDACT is a multi-center, prospective, phase III randomized study comparing use of the MammaPrint assay with a common clinical-pathological prognostic tool, Adjuvant! Online, to select patients for adjuvant chemotherapy in node-negative breast cancer. Patients at low risk by both MammaPrint and standard clinical-pathological criteria will not receive chemotherapy, patients at high risk by both criteria will receive chemotherapy, and patients with discordant criteria will be randomized to use either MammaPrint only or standard criteria to decide treatment (i.e., randomized to receive adjuvant chemotherapy or not). This will directly test whether the choice of chemotherapy guided by MammaPrint provides benefit over that guided by the Adjuvant! criteria.
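
The allocation rule described above can be sketched as a small function. This is an illustration of the logic only, not the trial's actual randomization procedure: the equal-probability choice between the two criteria, the function name, and the return format are all assumptions.

```python
import random

def mindact_allocation(genomic_risk, clinical_risk, rng=random):
    """Return (decision_basis, receives_chemotherapy) for one patient.

    Risk inputs are 'low' or 'high'. Concordant patients follow the shared
    result; discordant patients are randomized to follow either the genomic
    or the clinical result (1:1 here, as an illustrative assumption).
    """
    for risk in (genomic_risk, clinical_risk):
        if risk not in ("low", "high"):
            raise ValueError("risk must be 'low' or 'high'")
    if genomic_risk == clinical_risk:
        return "concordant", genomic_risk == "high"
    basis = rng.choice(["genomic", "clinical"])
    risk_used = genomic_risk if basis == "genomic" else clinical_risk
    return basis, risk_used == "high"

print(mindact_allocation("low", "low"))
print(mindact_allocation("high", "low", random.Random(0)))
```

Only the discordant group contributes a randomized comparison, which is why the trial's information about the added value of the assay comes from those patients.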

Other Relevant Studies

Fan et al., 2006.79 No key question relevant to the evaluation of gene expression-based prognostic estimators was directly addressed in this study, but the agreement between gene-expression tests and other predictors was evaluated, as well as their individual performance on a common dataset. In particular, the 70-gene signature, the gene panel used in Oncotype DX, the 2-gene ratio, and other gene expression signatures were considered. This investigation was carried out on the 295 samples from stage I–II breast cancer patients, which had been used to develop the 70-gene test.21 The Oncotype DX RS and the 2-gene ratio were estimated from microarray gene expression data (i.e., not RT-PCR), and thus were not obtained according to the protocols and methods used in the corresponding marketed assays. These are therefore described as “derived” scores below.

All tests except the 2-gene ratio (hazard ratio of about 1) were highly significant predictors of OS and DFS. The agreement between MammaPrint and derived RS was 81 percent (239/295). However, the intermediate and high risk groups, as defined by the RS gene panel, were considered as one group in this paper and compared to the poor prognosis group of patients, as defined by the MammaPrint signature. ER status, tumor grade, tumor size, and lymph node involvement also proved to be significant univariate predictors. The coefficients of clinical predictors were allowed to vary between models in this analysis. All the analyses were repeated for the ER positive (N=225) subset with qualitatively similar results. Good, but not perfect, correlation between predictions was found. This was surprising since classification was obtained using different gene sets. The degree of prediction over and above “standard” clinical stratifiers was not clear in the paper, and the reclassification of samples was not done.

This study is of interest since it compared 5 different classifiers. However, it should not be regarded as a validation of either the Oncotype DX or the H/I ratio assays, since actual tests were not used on these patients and the RS and the two-gene index estimates were obtained from microarray data. In addition, since this was the same dataset used in the development of the 70-gene signature, that signature would be expected to perform better than the RS, for which this was a true test set.

Espinosa et al., 2005.80 In this paper the authors developed an RT-PCR based version of the 70-gene expression signature.21,25 RT-PCR was used to measure, in breast cancer biopsy specimens, the expression of the 70-gene signature, as well as four additional genes (HER-2, EGFR, PLAT, and MUC-1) related to prognosis. The study population was 96 patients diagnosed between 1991 and 1997 for whom samples and followup were available and who were seen in a single Madrid hospital. Half of the patients were lymph node positive, 75 percent ER positive, and 25 relapses were observed after a median of 70 months of followup. Eighty percent of ER positive patients received tamoxifen, and 74 percent of patients overall received adjuvant chemotherapy.

The objective of the authors was to reproduce the results obtained with the 70-gene profile through an alternative technology. However, for technical reasons only 60 of the 70 genes could be investigated. For this reason, the study cannot be considered a validation of the 70-gene signature. According to the results obtained by RT-PCR, Kaplan-Meier estimates for RFS and OS in the good- and poor-profile patient groups were as follows:

  • RFS for Good vs. Poor prognosis profile 70 months after surgery: 85 percent vs. 62 percent.

  • OS for Good vs. Poor prognosis profile 70 months after surgery: 97 percent vs. 72 percent.

Univariate and multivariate Cox proportional hazards regression analyses were performed to compute a hazard ratio for the risk groups for both endpoints. Only the lymph node status (hazard ratio, 1.2; 95 percent CI, 1.09 to 1.36) and the gene profile (hazard ratio, 6.3; 95 percent CI, 1.28 to 31.07) proved to be independent prognostic variables for OS. Only the number of positive lymph nodes (≤ 3 versus >3) (hazard ratio, 1.13; 95 percent CI, 1.05 to 1.25) and again the gene profile (hazard ratio, 2.74; 95 percent CI, 1.13 to 6.61) were independent prognostic variables for RFS.
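
The width of these intervals indicates how imprecise the estimates are in a cohort of 96 patients. As a rough sketch, the standard error of the log hazard ratio can be recovered from a reported interval, assuming the interval was computed as a Wald-type 95 percent CI that is symmetric on the log scale (an assumption; the paper's exact method is not described here). The helper names are illustrative.

```python
import math

def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a log-symmetric 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def ci_midpoint(lower, upper):
    """Geometric midpoint of the CI; should sit near the reported HR."""
    return math.sqrt(lower * upper)

# (hazard ratio, lower bound, upper bound) for the gene profile, as reported above
reported = {
    "OS, gene profile": (6.3, 1.28, 31.07),
    "RFS, gene profile": (2.74, 1.13, 6.61),
}

for label, (hr, lower, upper) in reported.items():
    se = log_se_from_ci(lower, upper)
    print(f"{label}: HR {hr}, CI midpoint {ci_midpoint(lower, upper):.2f}, "
          f"SE(log HR) {se:.2f}, upper/lower ratio {upper / lower:.1f}")
```

The geometric midpoints fall close to the reported hazard ratios, which is consistent with the log-symmetry assumption, and the OS interval spans a much wider range than the RFS interval.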

In subgroup analyses, the signature did not predict significantly in lymph node negative patients (many of whom received adjuvant chemotherapy), or in women >52 years of age.

The profile predicted both local and distant relapses in the general population of women with breast cancer. In the poor-prognosis group, most patients survived less than 2 years after relapse, regardless of the site of first relapse. In contrast, patients in the good prognosis group usually had low-risk relapses and survived longer than 2 years after relapse.

This study cannot be considered an independent validation of the MammaPrint assay, since only 60 out of 70 genes were considered, the genes were assessed by a different technology (RT-PCR rather than microarray), and the population was far more heavily treated with adjuvant chemotherapy than previously-tested populations. It therefore did not test a population in whom these results would have a clear implication for therapeutic decisions.

Studies Excluded Upon Complete Review

Eden et al., 2004.81 This paper was excluded because it did not provide new information on the assays investigated. The gene expression markers identified by van't Veer and colleagues21 were compared to both conventional markers and newly constructed indices to predict distant metastases. However, the analysis was conducted in the same van't Veer patient cohort, and therefore was not a new validation of the 70-gene signature.

Weigelt et al., 2005.82 This paper was excluded because it does not include prognostic information for the investigated assays, although it does provide some useful biologic insights. These authors showed that distant metastases display both the same molecular breast cancer subtype and 70-gene prognosis signature as their primary tumors. These results suggest that the capacity to metastasize is an inherent feature of most breast cancers, implying that poor-prognosis breast carcinomas, as classified by the intrinsic gene set or the 70-gene profile, represent distinct disease entities. These findings support the hypothesis that molecular subtypes might originate from different cell types within the breast, and therefore reflect different biological entities that are maintained throughout the multistep metastatic process. Indeed, the metastatic nature of poor-prognosis breast carcinomas, which are depicted by the 70-gene profile or the luminal B, HER-2 positive, or basal-like molecular subtype, is an inherent feature of breast cancers that remains stable with time and across distinct tumor outgrowth locations within the same individual.

Nuyten et al., 2006.83 This paper was excluded because the authors used a subset of the van de Vijver25 data set and looked at local recurrence.

This group searched for gene expression signatures that predict the risk of local recurrence after breast-conserving therapy (BCT) in a series of 161 early-stage breast cancer patients who were a subset of the original van de Vijver25 cohort. The 70-gene signature, originally designed to predict metastasis, failed to predict local recurrence after BCT.

In this paper other gene signatures were also evaluated. The supervised wound-response signature22,84 was the only gene expression profile that could predict a local recurrence after BCT, while both the 70-gene and the primary hypoxia signatures85 failed to predict local recurrence.

Naderi et al., 2007.86 This study was excluded because it was not related to the assays investigated for this review. The authors developed a Cox-ranked 70-gene signature, which is a ‘new’ signature, and it is not related to the MammaPrint test.

Sun et al., 2007.87 This paper was also excluded because it is not related to the assays investigated for this review. The authors developed a new predictor for recurrence (with only 3 genes from the 70-gene profile) based on the van't Veer data set and used the 70-gene signature for comparison; the new signature performed better than the 70-gene signature.

Chapter 4. Discussion

Using the analytical framework described, we evaluated the evidence available on three commercially available gene expression-based assays, and on the gene expression profiles underlying these tests. Specifically, our review focused on the MammaPrint® assay, based on the 70-gene prognostic signature developed by van't Veer and colleagues,21,25,58,59 on the Oncotype DX™ assay, based on the 21-gene profile developed by Paik and colleagues,28,50,53 and on the Breast Cancer Profiling (BCP) assay, based on the two-gene ratio signature developed by Ma et al.61,64

For the first question (is there any direct evidence that these tests in breast cancer patients lead to improvement in outcomes?), direct evidence is defined as randomized clinical trials comparing the outcomes of patients following standard management to those of patients managed with the aid of the expression-based assays. No such studies have been conducted. Two prospective randomized trials are in progress: TAILORx35 and MINDACT36 were recently initiated to prospectively evaluate the clinical utility of Oncotype DX and MammaPrint, respectively. As described in Chapter 3, TAILORx will provide information on the appropriate RS threshold for recommending adjuvant chemotherapy, and will not directly assess the effect of clinical decisionmaking with and without the test. The data generated may allow indirect inferences to be made. MINDACT will allow more direct inferences on clinical utility, since use of the test will be compared directly to the use of a conventional risk index. For both trials, patient health outcomes will be endpoints.

The evidence available on the subsequent key questions allowed us to draw conclusions about the specific tests, as well as about the methodology of test development and current and future clinical uses of gene expression assays. Currently established methods for risk stratification of patients with breast cancer, such as the St. Gallen Consensus Guidelines5 or Adjuvant! Online,7 rely on a combination of prognostic factors like tumor size, grade, lymph node status, and presence of hormone receptors and the human epidermal growth factor receptor 2 (HER-2). The latter also incorporates a nomogram to generate estimates of benefit from specific therapies. A critical question is how much gene expression-based tests add to standard risk assessment methods or guidelines. A second question is how clearly the current evidence relates to the test's proposed use in a decisionmaking context, i.e., how well defined or homogeneous are the patient populations, in terms of their current therapy and decisions about future therapy? Is it clear how the test information should be implemented, i.e., using cutoffs, as a continuous score, or in combination with other indices? When viewed through the prism of clinical decisionmaking, the current evidence base for these technologies leaves many uncertainties.

Many aspects of expression-based predictors differ in qualitative ways from other kinds of risk predictors. First, the mechanism by which the expression of any particular gene, or combination thereof, is related to outcome is generally less well understood than with standard predictors, as are the methods by which the combinations are chosen. Gene expression levels are markers of activation or inactivation of complex biological processes. As Fan et al.79 demonstrated, similar risk classifications can be achieved with predictors having few or no overlapping genes. Second, there is no “gold standard” for gene expression values; the technologies used here (RT-PCR and microarrays) represent the state of the art. In the end it is less analytic validity (i.e., proximity to a true value) than analytic variability (i.e., variation in the calculated value) that must be understood to predict whether investigational results are likely to be similar to those produced in practice, and whether the results in practice are likely to be stable over time and with broader use. Third, we know little about the stability of the predictive value of such markers over populations with different genetic profiles. Arguments can be made that genetic predictors (particularly from tumors) are likely to be either more or less universal than physiologic ones, so there is still much to be learned about the generalizability of these rules. However, in spite of these differences, the latter half of the developmental pathway for these tests must follow the same principles and procedures as those for any multivariate clinical prediction rule. These have been outlined in detail in the clinical literature,94,95 enshrined in reporting guidelines,96 and articulated with specific respect to expression-based predictors in a series of articles by Simon.68,71,97

The three signatures and assays considered differ not only in the technologies used and their implementations, but also in the nature of the validation studies. An important distinction for all expression-based tests is that between the signature and the licensed test, as offered to a patient. Data about the actual tests offered to the patients are available only for MammaPrint and Oncotype DX, albeit more limited for the former. There is only one published study that used the two-gene index as it is implemented in its marketed version, the BCP assay,61 although it is not clear whether the lab performing the assay in this report was the same as the one with current rights to perform the test. The remaining reports considered the signature, with the expression of the two genes measured and combined in varying ways.61–63,72

Recent publications have begun to address the analytic validity of the tests. There is now evidence about several aspects of gene expression measurements for two of the tests (MammaPrint and Oncotype DX).44,45,57,58 The public release of these data is useful as it supports the rationale behind two of the currently available assays and encourages development and publication of similar information for future assays. However, evidence about analytic features of the assays does not obviate the need for continuous monitoring of the experimental procedures involved with such testing. In this regard it is worth mentioning that the U.S. FDA Office of In Vitro Diagnostic Device Evaluation and Safety (OIVD) is developing a Guidance Document on In Vitro Diagnostic Multivariate Index Assays (IVDMIAs) that will affect the development of future assays in the U.S. Moreover, the laboratories offering such assays, like any other laboratory providing diagnostic services, must adhere to the Clinical Laboratory Improvement Amendments (CLIA).

Below follows the discussion of the specific tests and key questions considered in the present report, along with recommendations and conclusions.

Oncotype DX

Oncotype DX, the basis for the “recurrence score,” was first developed, then applied and used as an assay in investigational settings. All evidence about the RS (apart from the comparison study by Fan and colleagues79 and the development studies44)48 was obtained using the same assay that is offered to patients, with sample processing done in the same manner by the same laboratory.

Analytic Validity

Analytic validity evidence now exists for some of the operational/laboratory characteristics/procedures of this test, as well as about its reproducibility, although information about this latter point is limited to a few repeated analyses. These studies demonstrated that the reproducibility of the test across different samples of the same block, and samples from different blocks, is reasonably high.45 The test involves not only the simple assessment of the RNA levels by RT-PCR, but also the preparation of the RNA, following a central review of the specimens shipped to Genomic Health to check for tumor content. No direct evidence is available about the sample preparation aspect of the test, although there is indirect evidence from peer-reviewed literature in the form of the overall success rate of extracting analyzable mRNA, which appears to be fairly high. Centralization is a current strength of Oncotype DX with regard to reproducibility, but additional scrutiny may be needed if other laboratories offer such testing in the future.

Clinical Validity

“Clinical validity” is defined here as the ability of a prediction test to accurately predict risk. Whether or not those risk predictions differ enough to justify its use in a clinical setting (i.e., whether discrimination is sufficient) is a second issue. The clinical validity of Oncotype DX has been evaluated in various settings. The first validation study28 used tamoxifen-treated women with ER positive, lymph node negative breast cancer, from the randomized clinical trial NSABP B-14. This study independently validated the prognostic value of the RS, which had been previously tested in the tamoxifen-treated population of NSABP B-20. Perhaps the most important aspect of this population is that it was clinically and prognostically well defined, in that everyone was presumed to be eligible for chemotherapy, and all subjects had similar treatment (i.e., tamoxifen), making for a relatively clear interpretation of the results in terms of both treatment biology and clinical decisionmaking. Predictors of response on a specified therapy are not necessarily prognostic factors independent of that therapy, so studies which mix treated and untreated patients, or patients differently treated, can produce results that do not apply well to either. While this study took place in the past, all measurements were done concurrently and independently of the outcome, so the study has evidential value quite close to that of a concurrent prospective study. The main issues raised by the non-concurrency are whether the 668 subjects examined were a representative sample of the more than 2000 in the original study; the degree to which the findings in tamoxifen-treated women will apply to aromatase inhibitor-treated patients today; the role of HER-2 testing and treatment; and whether there was anything clinically relevant about how the early stage cancer was diagnosed (e.g., clinically or by mammogram) that might differ today.

While this study reported hazard ratios for the RS in the presence of clinical predictors, it did not provide predictions by the RS cross-classified by those of standardized combination risk predictors to see exactly how many women would be re-classified in risk strata that might change decisions regarding chemotherapy. This information was, however, presented in poster form in 2004,65 and it showed that the RS had considerable predictive power beyond that of the St. Gallen or NCCN risk stratification guidelines (n.b. St. Gallen did not include HER-2 at that time). Another poster66 showed the same information for Adjuvant! Online (Tables 13 and 14, Chapter 3). All of these cross-classifications showed that the greatest contribution of the test was likely to be in the reclassification of patients from high to low risk, i.e., in reducing the number of patients who might unnecessarily undergo chemotherapy. They also showed that optimal predictions probably would be achieved with a combination of both expression and clinical predictors. It must be stressed that the cross-classified risk of patients in the low-risk RS groups in these posters does not represent the lowest attainable risk; it is an artifact of the “low risk” category threshold. Patients who had low absolute RS scores would be predicted to have lower risks than the “low risk” category average, and those lower risks are probably low enough for many women classified as high risk by other indices to change their treatment decisions. One very important remaining question is the degree to which the absolute observed risks in this population, particularly in these lowest risk groups, are similar to those in other populations, and in those in whom the test is currently being used, i.e., whether the calibration of the test predictions will vary. The low risk arm of the TAILORx trial, in which patients with an RS of less than or equal to 10 will not be treated with adjuvant chemotherapy, will help address this question.

A second large study looked at the clinical performance of the Oncotype DX assay to predict breast cancer death (at 10 years) in a community-based population of ER positive, node-negative patients treated with tamoxifen, confirming the B-14 results among the tamoxifen-treated patients, and showing predictive value, albeit lesser, in ER positive patients not treated with tamoxifen.50 The Esteva study,48 which showed no predictive value of RS in a small population of patients who received neither tamoxifen nor chemotherapy, showed such anomalous results with standard predictors (i.e., higher grade predicting better prognosis) that its results cannot be regarded as reliable. Finally, the Fan study,79 while testing the RS signature measured by microarrays and not the actual Oncotype DX test based on RT-PCR, showed good discriminatory power in a relatively large, independent dataset, albeit with a heterogeneous mix of treatments, receptor and nodal status. This is the same dataset on which the MammaPrint signature was developed.

These studies in combination provide fairly strong support for the clinical validity of the Oncotype DX test over and above standard predictors, in a well defined population (ER positive, lymph node negative, tamoxifen treated) with clear treatment indication (adjuvant chemotherapy). Exactly how much it adds, however, exactly what proportion of these patients would benefit from its use, and the stability of the observed risk in the various risk categories in other (or current) populations, are not as clear. Discussion will continue below about its use in a clinical setting.

Clinical Utility

Clinical utility is the degree to which a test is predictive of treatment benefit, and hence is a critical foundation for the use of a test in clinical decisionmaking. Prognostic ability itself speaks to this to some degree, as it puts a ceiling on the degree of clinical benefit. For example, if the 10 year distant relapse rate is 5 percent, by definition additional treatment cannot provide more than a 5 percent absolute benefit, and background knowledge about treatment efficacy tells us it will be less. So if the risk of distant recurrence can be reliably established as low enough, this has clinical utility in itself.
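
The ceiling argument can be made concrete with a short calculation. The sketch below uses the 5 percent baseline risk from the example above. The 35 percent relative risk reduction is borrowed from the economic analysis summarized earlier and is used purely as an illustrative assumption, not as an estimate for any particular patient group.

```python
def absolute_benefit(baseline_risk, relative_risk_reduction):
    """Absolute risk reduction; it can never exceed the baseline risk."""
    if not 0 <= baseline_risk <= 1 or not 0 <= relative_risk_reduction <= 1:
        raise ValueError("inputs must be proportions between 0 and 1")
    return baseline_risk * relative_risk_reduction

baseline = 0.05     # 10-year distant relapse rate from the example above
assumed_rrr = 0.35  # illustrative assumption only

arr = absolute_benefit(baseline, assumed_rrr)
print(f"ceiling on benefit: {baseline:.1%}; benefit under assumed RRR: {arr:.2%}")
```

Under these assumptions the expected absolute benefit is well below the 5 percent ceiling, which is the sense in which a reliably low predicted risk has clinical utility in itself.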

However, it is of considerable value to have a direct estimate of the degree of treatment benefit. This can only be done reliably in the context of RCTs, prospectively or retrospectively, as they assess treatment effect in an unbiased manner. This was addressed by Paik et al. in their study of the correlation of the RS with the degree of adjuvant chemotherapy benefit in the context of the NSABP B-20 trial.53 This showed that the chemotherapy benefit in ER positive, node negative patients randomized to tamoxifen versus tamoxifen plus chemotherapy was almost entirely restricted to those in the high risk RS category. The CIs in the low and intermediate risk categories were wide and included the possibility of benefit, whereas the CI for the high risk group was narrow and showed clear benefit. A statistical interaction was also found with patient age, although those data were not reported. The only caveat is that the tamoxifen arm of this population was part of the training set for the assay, although the outcome measure used in the training set was not treatment benefit. It is not clear whether the information in clinical predictors was optimally used (i.e., as continuous rather than dichotomous variables), but that is unlikely to have accounted for the degree of differential effect predicted by the RS. HER-2 positivity reportedly had no effect on the results. Several other studies evaluated the value of the RS information in different populations of patients to predict other correlates of treatment effect. For example, evaluation of pathologic response after preoperative chemotherapy49,55 supports clinical utility, although that was evaluated in patients in whom chemotherapy was already determined to be necessary.

The NSABP-20 study probably represents the strongest evidence that can be derived from existing data regarding the clinical utility of the Oncotype DX test. While prospective confirmation of these findings is definitely needed, as well as analysis of existing patient samples from other completed trials, this provides reasonable justification in the interim for the use of the test by women in this specific population.

Use in clinical decisionmaking. One published study has reported the impact of using the RS on clinical management,56 and there have been examinations of the economic implications of testing.75 In general, studies showing that physicians change recommendations or that women change treatment decisions in response to their Oncotype DX risk category are minimally informative if the study is not designed to specifically explore the woman's risk thresholds for making that decision. The reported study does not specify what information was conveyed to the patients, i.e., a risk score, the risk category, or the risk itself. If the latter, the number they were told is important to know. In the absence of this information, it is not possible to know the threshold of risk below which most women (or any given proportion) would forgo chemotherapy, or conversely, the risks at which they would choose it. Nor can it be known whether the study is effectively examining compliance with physician recommendations, careful weighing of risks and benefits, or the effect of test marketing.

There are still uncertainties about the optimal use of this test in practice. First, while the cut-offs are valuable for test validation purposes, it is not clear whether the current thresholds actually correspond to the cut-offs that would be derived using a formal decision-analytic approach based on utility assessments. For an individual woman, a risk based on her exact RS value would be preferable, since by definition, those with RS values near the upper boundary of the “low risk” range have a predicted risk higher than the average of the group, and those with low values have lower risk. The fact that the boundaries used in the studies may not be optimal for decisionmaking is seen in the different cut-offs used by the TAILORx trial, in which the low-risk group is defined as RS less than or equal to 10 instead of 18, and the high-risk group is defined as greater than 25 instead of 30.
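The effect of moving the boundaries can be illustrated with a minimal sketch. The names `Cutoffs`, `ORIGINAL`, `TAILORX`, and `risk_category` are hypothetical, the RS is assumed to be an integer, and the original low-risk boundary is assumed to be RS below 18 (the text above gives only the value 18).

```python
from typing import NamedTuple


class Cutoffs(NamedTuple):
    low_below: int  # RS strictly below this value is "low"
    high_from: int  # RS at or above this value is "high"


# Assumption: original categories are RS < 18 (low) and RS > 30 (high).
ORIGINAL = Cutoffs(low_below=18, high_from=31)
# TAILORx as described in the text: RS <= 10 (low) and RS > 25 (high).
TAILORX = Cutoffs(low_below=11, high_from=26)


def risk_category(rs: int, cutoffs: Cutoffs) -> str:
    """Map an integer Recurrence Score to a risk category."""
    if rs < cutoffs.low_below:
        return "low"
    if rs >= cutoffs.high_from:
        return "high"
    return "intermediate"


# Illustrative scores only; these are not patient data.
for rs in (8, 15, 28, 35):
    print(rs, risk_category(rs, ORIGINAL), risk_category(rs, TAILORX))
```

A woman with an RS of 15 or 28 lands in a different category depending on which set of boundaries is applied, which is why a risk estimate tied to the exact RS value is more informative than the category alone.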

The second uncertainty is the optimal use of conventional predictors. While the RS has been shown to have more value than most predictors, the same studies show that clinical predictors retain predictive value, and clinical prediction models continue to evolve and improve. An improved prediction tool would involve a combination of the expression-based and clinical predictors, but this has not been systematically explored in any study, and absolute risks produced by regression models or stratified tables with all predictors included are generally not reported. As noted previously, cross-classification data using the most updated standardized clinical indices would be one form of such data, although those do not show the risk from combinations of the exact RS and clinical predictor score.

Cost-effectiveness. While our review highlights many gaps in what is known about the clinical utility of using gene expression profiling in women diagnosed with breast cancer, the review also revealed that little is known about the cost-effectiveness of using these tests. Once studies have demonstrated the clinical utility of these gene expression profiles, policy makers and health care providers will need information about the cost-effectiveness of those tests that have proven utility. Such information will be particularly important given the relatively high expected costs of the tests. Oncotype DX, for example, costs more than $3000 for each use of the test.

In our review, we found three published studies that have addressed economic outcomes associated with use of the breast cancer gene expression tests. One study reported that using the 21-gene RT-PCR assay to reclassify patients would be cost-effective for those who were defined by 2005 NCCN criteria as low risk ($31,452 per quality-adjusted life-year (QALY) gained) and would be cost-saving for those who were defined by NCCN criteria as high risk.67 The EPC team had only moderate confidence in these projections because the study did not provide enough information about potential sources of bias in the analysis, and because the study was supported by the manufacturer, which may introduce a conflict of interest. The 2007 NCCN guideline now indicates that use of chemotherapy in these patients is optional, further diminishing the value of these projections.

The second study reported that use of the 21-gene RT-PCR assay was associated with a cost-utility ratio of $4432 per QALY compared with use of tamoxifen alone, and a gain of 1.71 QALYs with net cost savings when compared with chemotherapy plus tamoxifen.75 The EPC team had little confidence in this analysis, which was supported by the manufacturer, because it did not meet many of the standards that were used for appraising the quality of the analysis.

The third study compared the cost-effectiveness of the Netherlands Cancer Institute gene expression profiling (GEP) assay (MammaPrint) to the U.S. National Institutes of Health (NIH) guidelines for identification of early breast cancer patients who would benefit from adjuvant chemotherapy. The GEP assay was projected to yield a poorer quality-adjusted survival than the NIH guidelines (9.68 vs. 10.08 QALYs) and lower total costs ($29,754 vs. $32,636). To improve quality-adjusted survival, the GEP assay would need to have a sensitivity of at least 95 percent for detecting high risk patients while also having a specificity of at least 51 percent. The EPC team had confidence in the results of this analysis because it met most of the standards for appraising the quality of an economic analysis.
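The trade-off in the third study can be made concrete with a short calculation. This is a minimal sketch using only the two cost and QALY figures quoted above; the function name is hypothetical, and the resulting ratio is derived arithmetic, not a number reported by the study.

```python
def incremental_cost_per_qaly(cost_a, qaly_a, cost_b, qaly_b):
    """Extra cost per extra QALY of strategy A relative to strategy B."""
    d_qaly = qaly_a - qaly_b
    if d_qaly == 0:
        raise ValueError("strategies have equal QALYs; ratio is undefined")
    return (cost_a - cost_b) / d_qaly


# Figures quoted in the text for the third study.
nih = {"cost": 32636, "qaly": 10.08}
gep = {"cost": 29754, "qaly": 9.68}

icer = incremental_cost_per_qaly(nih["cost"], nih["qaly"], gep["cost"], gep["qaly"])
print(f"NIH guidelines vs. GEP assay: ${icer:,.0f} per QALY gained")
```

On these figures the NIH guidelines cost $2,882 more per patient and yield 0.40 more QALYs, or roughly $7,205 per additional QALY relative to the GEP assay.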

Since the overall body of evidence is inconclusive about the economic outcomes associated with use of breast cancer gene expression tests, this is an area that will require further investigation. Future economic analyses of validated tests should take into consideration existing guidelines for the performance and reporting of such analyses.41 Ideally, the analyses should be performed by investigators who have not received financial support from manufacturers of the tests.

Questions Regarding the Clinical Validity and Utility of the Oncotype DX Assay

  • 1

    Better information is needed about the predictions from combining the RS with current versions of standardized risk predictors, both in the form of cross-classification tables, and perhaps of regression-based combinations that optimize individual risk predictions. Formal development of cutoffs to optimize patient utility is also needed.

  • 2

    While Oncotype DX exhibits a fair bit of risk discrimination (i.e., separating patients into different risk groups), the stability across different populations of the observed absolute risk in patients with a given risk score (i.e., calibration) needs further study. Of greatest interest is the observed risk in the lowest risk groups, since the absolute level of this risk is critical for informed decisionmaking, and patients may forego chemotherapy on the basis of this information.

  • 3

    Data are currently available mainly for tamoxifen-treated patients and for those treated with cyclophosphamide-methotrexate-5-fluorouracil chemotherapy. It is important to assess whether RS applies to other hormonal treatments such as aromatase inhibitors, as well as more contemporary chemotherapy regimens using taxanes and anthracyclines.

  • 4

    It is not clear whether RS can be used to help guide treatment of HER-2 positive patients and additional studies are needed, as most of these patients were classified in the high RS group in the initial trials.

  • 5

    While awaiting the TAILORx results, the findings of the Paik 2006 study53 predicting treatment benefit need independent confirmation, particularly for low and intermediate risk groups.

  • 6

    Studies examining the use of Oncotype DX should provide women and physicians with quantitative risk information and report how this alters clinical decisionmaking. The manner in which this risk information is presented should also be studied.
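The calibration question raised in the list above (whether observed risk matches predicted risk within a risk group) can be checked with a simple grouped comparison. This is a minimal sketch on made-up numbers; the function name, the bin edges, and the data are all hypothetical, and a real analysis of time-to-event outcomes would need to account for censoring.

```python
def calibration_table(predicted, observed, edges):
    """Compare mean predicted risk with observed event rate in each risk bin.

    Bins are half-open intervals [lo, hi) built from consecutive edges.
    Returns rows of (lo, hi, n, mean_predicted, observed_rate).
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, p in enumerate(predicted) if lo <= p < hi]
        if not idx:
            continue  # skip empty bins
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        obs_rate = sum(observed[i] for i in idx) / len(idx)
        rows.append((lo, hi, len(idx), mean_pred, obs_rate))
    return rows


# Hypothetical predicted risks and outcomes (1 = distant recurrence).
predicted = [0.05, 0.05, 0.05, 0.05, 0.30, 0.30, 0.30, 0.30]
observed = [0, 0, 0, 1, 0, 1, 0, 0]

rows = calibration_table(predicted, observed, edges=[0.0, 0.1, 1.0])
for lo, hi, n, mean_pred, obs_rate in rows:
    print(f"[{lo:.2f}, {hi:.2f}) n={n} predicted={mean_pred:.2f} observed={obs_rate:.2f}")
```

In this toy example the low-risk bin has a predicted risk of 0.05 but an observed rate of 0.25, the kind of miscalibration in the lowest risk group that would matter most for a decision to forgo chemotherapy.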

MammaPrint

Published evidence includes both reports about MammaPrint57–59 and studies about the associated 70-gene signature. The manuscripts that used the signature provide useful information about the validity of the biological correlations underlying the profile and suggest that it can be used in a clinical setting, but cannot be considered to be a direct validation of the assay.

The assay is based on the gene signature first proposed in 2002 by investigators at the Netherlands Cancer Institute,21 using 78 lymph node negative patients, younger than 55 years old, who did not carry a breast cancer gene (BRCA) mutation, and whose tumors were less than 5 cm in diameter. This signature was validated in a second study by the same group, using a series of 295 consecutive stage I or II breast cancer patients, who were either lymph node negative or positive, and who were younger than 53 years.25 This validation was only partial, since the investigators included 61 of the 78 patients used to develop the prognosis profile. The MammaPrint test itself was further validated in a multicenter European study of 302 patients not treated with chemotherapy or tamoxifen, showing that it provided prognostic information beyond that of standard clinical-pathologic indices for those patients.59 Recently, this signature was implemented as a commercial assay, and RNA available from the original cohorts was reanalyzed, yielding consistent results.58 It is the first prognostic test submitted to the FDA under its new, non-binding IVDMIA guidance, and received approval in February 2007.

Analytic Validity

This assay is the first microarray-based test introduced in the field. Two recent papers addressed issues related to the reproducibility of the test within laboratories, as well as across laboratories. Such evidence, however, was obtained from a limited number of patients and using a moderate number of replication experiments. Results showed good reproducibility within a laboratory, and a good degree of agreement across laboratories, although RNA labeling emerged as a possible source of variation capable of affecting the results. Whether this issue has an impact on risk classification was not thoroughly investigated, and thus the portability of the assay result from one laboratory to another remains an open question. A second relevant point is the fact that the only validation study using the MammaPrint assay showed that only about 80 percent of specimens from the field (in this case 5 different European institutions) were analyzable, raising some concern about the analysis of fresh-frozen specimens. As more patients are analyzed by this test, the overall success rate may change. Finally, it must be noted that although this technology requires fresh rather than paraffin embedded specimens, Agendia performs a central pathologic review of the specimens as is performed with FFPE samples at Genomic Health, before evaluation with the test.

Clinical Validity

Overall, published evidence supports MammaPrint as a better predictor of the 5-year risk of distant recurrence than traditionally used tumor characteristics or algorithms. However, the cohorts in whom it was developed and validated are more clinically heterogeneous than those used for the Oncotype DX test, with a mix of lymph node status, ER status and current treatment. Additionally, evidence was derived only from patients younger than 55 to 60 years of age. Even so, it is interesting that it had 80 percent concordance with the array-based RS classification when applied to the same patients, although it remains to be seen how well it predicts in cohorts with the same degree of clinical and treatment homogeneity as used in the Oncotype DX development, and which differ from its training set. Evidence about its value in comparison with clinical predictors was assessed in a collaborative study among 5 different institutions in Europe, where data were compared to standard clinical predictors like Adjuvant!.59 The area under the receiver operating characteristic curve was 0.68 for MammaPrint and 0.66 for the Adjuvant! score. Such estimates indicate that both methods have similar and modest discriminatory power in absolute terms. Similar results were also obtained using the 10-year overall survival end point. However, when Adjuvant! and MammaPrint were cross-classified against each other, Adjuvant! had no additional predictive value. Adjustment for other predictors (St. Gallen and the Nottingham Prognostic Index) had a minimal effect on the regression coefficient of the MammaPrint score or its significance, but no other data were reported on their incremental value. Of note, no significant heterogeneity in the hazard ratio estimates was shown among centers, although original hazard ratio estimates were significantly higher than those obtained in this validation study. The validation cohort had a longer time of observation, included older women, and excluded patients who received adjuvant therapy.
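To make the area under the receiver operating characteristic curve easier to interpret, the following minimal sketch computes it as the probability that a randomly chosen patient who relapsed has a higher score than a randomly chosen patient who did not. The function name and the scores are hypothetical, and the sketch treats the outcome as binary, ignoring the censoring that a time-to-event analysis would need to handle.

```python
from itertools import product


def auc(scores_events, scores_nonevents):
    """Fraction of (event, non-event) pairs ranked correctly; ties count as half."""
    pairs = list(product(scores_events, scores_nonevents))
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e, n in pairs)
    return wins / len(pairs)


# Hypothetical risk scores, not data from any study.
relapsed = [0.9, 0.6, 0.4]
relapse_free = [0.7, 0.3, 0.2, 0.1]

print(round(auc(relapsed, relapse_free), 3))
```

A value of 0.5 corresponds to ranking pairs no better than chance and 1.0 to perfect ranking, which is why values of 0.68 and 0.66 are described above as modest.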

Clinical Utility

No studies evaluated clinical utility of this test.

Use in clinical practice. No studies explored the use of this test in clinical practice.

In summary, MammaPrint is the first commercialized microarray-based gene expression profile with a prognostic purpose. The underlying signature has been evaluated in approximately 700 patients, although MammaPrint itself has only been evaluated in one study of 307 untreated patients. A reanalysis of the original training data of the signature using the marketed test showed a net reclassification of only one patient of 78.58 It is unclear what population of patients would derive benefit from use of the test, and what the magnitude of that benefit would be. Prospective data from trials like MINDACT will be extremely valuable. Overall, published evidence supports MammaPrint as a better predictor of the risk of distant recurrence than traditionally used tumor characteristics or algorithms, but its performance in therapeutically homogeneous populations is not yet known with precision, and it is unclear for how many women the lowest predicted risks are low enough to forgo chemotherapy. No evidence is available to permit conclusions regarding the clinical utility of MammaPrint to select women who will benefit from chemotherapy.

To conclude, the literature on the 70-gene signature includes numerous studies that focused more on its biological underpinning and less on the clinical implications of this gene expression profile, although it has now received FDA approval for clinical use. It has been shown that this signature is maintained along the cancer progression process.82 This profile was directly investigated by two different platforms (microarray and RT-PCR),48 and was successfully re-implemented in two distinct microarray platforms, showing that it has a fair degree of analytic robustness.

Here we summarize open questions as well as research gaps found in the evidence about the clinical validity and utility of the MammaPrint assay and the 70-gene signature.

  • 1

    The prognostic value of the 70-gene signature has been assessed in different populations facing different therapeutic choices. In the analysis by van de Vijver and colleagues, 130 of the 295 patients received adjuvant therapy in a non-randomized fashion. Patients in the original development cohort were not treated, and Buyse validated the marketed assay in untreated patients. It is not yet clear which are the optimal patient populations for the use of this test, exactly what its performance is in those populations, and how many of its predictions would result in different therapeutic decisions. Larger independent validation studies in therapeutically homogeneous groups would be very valuable.

  • 2

    Previous comments noted in the Oncotype DX summary apply here as well, including the presentation of data regarding the test in combination with standard predictors, the use of risk categories instead of a continuous risk measure, and the importance of confirming the stability of the test's calibration in different populations.

H/I Ratio Signature and Breast Cancer Profiling (BCP)

This test, licensed by AviaraDX to Quest Diagnostics, Inc., is based on the two-gene ratio signature originally proposed by Ma and colleagues.64 Specifically, the assay is based on the two-gene index that includes normalization to specific reference genes followed by a mathematical transformation.61 Overall, large collections of patients have been investigated using the signature, but its prognostic and predictive value has been inconsistent: strong in some studies, weak or absent in others. In the Fan study, which evaluated the ratio based on the signature (not the marketed BCP test), it was completely non-predictive, whereas both the Oncotype DX and MammaPrint signatures were predictive. The reason for that may have been a technical failure of the array technology used to simulate the test,98 or the test's value may be restricted to certain populations. The populations in which it has been developed have been heterogeneous, although stratified analyses were used. Differences have been found in its ability to predict in various subgroups of those populations, differences that are not consistent across studies. A major limitation of the evidence is that the signature has been formulated in a variety of ways: as a simple ratio, as an index, by normalizing to a different set of reference genes, or to a standard calibration RNA. In the 2006 study in which the index as currently marketed was tested,61 statistical methods to find optimal cutoffs were applied, meaning that this assay still requires further external validation. We found no analytic validity data for the BCP assay.

In summary, while this test shows some promise, it must be regarded as being in a developmental phase. It was not clear in the Ma 2006 study61 whether samples were processed by Quest Diagnostics, Inc., which holds the current license. There are a number of intriguing biological insights and plausible mechanisms to support the rationale for the test, but its consistent value in well-defined clinical settings has not yet been firmly established.

General Comments on Analytic Validity and Laboratory Quality Control

Until recently there were no multi-gene RNA-expression-based assay kits approved by the FDA for use in breast cancer. Such tests are currently offered as laboratory services (“home-brew” tests) subject to CLIA general laboratory standards. In February 2007, and again in July 2007, the FDA published draft guidelines on regulation of IVDMIAs, which cover tests combining complex algorithms and data from multiple laboratory tests. The release of these draft guidelines suggests that in the future these tests will be subject to FDA evaluation. Under this model, all the assays to be used to make medical decisions about therapeutic options will be regarded as Class II or III devices and will go through a Pre-Market Approval (PMA) process, and will require specific post-market revision. Based on such draft guidelines, MammaPrint received IVDMIA approval upon voluntary submission of data.38,39

Nevertheless, analytic validity is an issue related to quality control in the laboratories where the test is carried out, and these data are not in the literature, but in the laboratories' log books. An effort has been made by Genomic Health, Inc. and Agendia to clarify the laboratory procedures and acknowledge critical issues, but periodic review and reporting of the procedures needs to be established to monitor the reproducibility of the procedures, success rates, and quality control indices.

A critical and often underappreciated analytic issue for the success of these tests is the way specimens are handled. Unlike DNA, RNA is unstable, so the length of time from excision to freezing or fixation, prolonged storage, and other factors related to specimen processing can lead to significant variability in the quality of mRNA available for expression profiling. Even if central labs offering the test are certified and use reliable procedures, preanalytic issues at the sending sites such as specimen acquisition and handling can potentially affect the results of the testing. Both Oncotype DX and BCP use standard formalin-fixed specimens, which tend to be stable, whereas MammaPrint requires fresh tissue. The use of fresh tissue required for gene array testing is challenging and, according to on-line information available from the Agendia website, careful procedures must be used when sampling the tumor to avoid necrotic parts and stromal tissue. Samples are reviewed centrally at Genomic Health and Agendia for tumor content, and BCP is performed after laser capture microdissection. Regardless of the technology used, standardized protocols, use of new reagents specifically designed to preserve mRNA for gene expression profiling, and reduction in RNA degradation (during sample processing, storage, and preparation) are important to assuring reliable measurements of mRNA levels for use in gene expression profiling.

Overall Implications and Recommendations

The discussion above covers issues specific to the tests under examination, but this analysis also raises some larger issues of which groups involved in assessing the value of these tests should be aware.

Assay Validation

In general, it is clear that validation studies need to deal with populations for whom the decision-making implications of various risk groupings are clear. The studies examined herein have established the proof-of-concept that tumor gene expression has prognostic value, but for all tests except Oncotype DX, both validation and development studies have been on mixed populations, without sufficient sample sizes to stratify into large enough homogeneous groups to guide clinical decisionmaking. In addition, validation samples are often re-used by other investigators; the pool of such samples in the public domain needs to be greatly expanded.

Potential for Scale Problems

One problem that may be faced in the future is the consequence of an increase in demand for these tests. Scaling up production could represent a challenge for the reproducibility and reliability of the tests in any setting, especially if more than one laboratory offers the assays, since procedures to ensure inter-lab reproducibility will be needed. Not only analytical aspects will need monitoring, but also procedures involving specimen evaluation prior to testing. With a larger number of tests, for instance, the ability to reliably perform the central pathologic review might become an issue, while in the case of MammaPrint the availability of the current reference RNA could potentially become a limiting factor.

Genetic Variability and Gene Expression

It is unknown whether gene expression profiles are more or less likely than more traditional biomarkers to be generalizable beyond the populations in which they were initially developed. Gene expression may reflect fundamental biological tumor features, and thus be relatively stable across ethnic groups. However, gene expression patterns have also been associated with specific genetic mutations (e.g., BRCA1), indicating that specific DNA mutations or polymorphisms21,99 may affect the performance of a signature. This speaks to the importance of validating these tests in populations with varying genetic background. Biological and genetic evidence potentially addressing these issues is expected to become available in the form of single nucleotide polymorphism (SNP) arrays coupled to expression arrays.

The Need for Databases, Reproducibility, and Standards

MammaPrint® is the first assay based on microarrays that has completed the path from the bench to FDA approval for clinical application. For data storage, the MIAME standards32 represent the basis for the proper collection and storage of microarray data, and should be used to develop procedures going forward for the archiving of the tests performed in real patients, much as databases have been developed to facilitate outcomes research to complement clinical trials. Consideration should be given to the development of databases with complete data on each patient (absent identifiers), including all the analyses performed, laboratory logs, the raw and processed data, and all the information about procedures and analyses that have been performed to produce a risk estimate from a tumor sample. These apply equally to the other two assays, differing only in the type of data that would be stored.

Where Is the Field Going?

The current evidence for the feasibility of such gene expression based tests in clinical settings, along with the demand for better tools to manage patients, is leading to both an evolution of the available tests, and the addition of novel alternative tests. The number of publications is growing, and several alternative signatures not considered here have already been proposed for breast cancer as well as for other neoplasms. We can expect many new tests, as well as new uses for the assays that already exist. More genes might be added to the signatures, and in the particular case of MammaPrint this will be possible without changing the experimental procedures, since the array contains thousands more genes than the ones that are incorporated in the 70-gene signature. In this regard, we might also expect other modifications: subsets of the current signatures might be proposed as alternatives to current clinical risk factors, or be proposed in different populations or for different purposes. For Oncotype DX, a natural evolution could be related to its use as an alternative to immunohistochemistry and/or pathology to evaluate tumor Grade, S-phase index, ER, PR, and HER-2 expression, since such genes are part of the set included in the assay. Reporting of individual gene expression results may also prove useful. A great deal more work needs to be done on the prediction of therapeutic benefit, which is the ultimate goal of all such tests.

“Comparative Effectiveness” Studies

The emphasis in virtually all of the papers and in our evidence assessment is on the establishment of the value of each of these predictors over standard clinical predictors. However, as gene expression tests mature and proliferate, an important question will be how they compare to each other, and whether there is value in their combination. In the therapeutic domain, this has been called “comparative effectiveness” research. Such research has traditionally been difficult to fund by government or by industry, because it may not hold out as much therapeutic promise as new discoveries, and because industry understandably is not anxious to fund head-to-head comparisons with competitive products. This same dynamic could easily take hold in the risk prediction arena, with a proliferation of licensed prediction indices without any clear notion of what new ones are contributing over previous tests. Development of future expression-based predictors should make clear their incremental value over pre-existing methods. In the absence of better oversight of test development, physicians and patients are likely to be awash in new tests that all claim to offer similar guidance, or perhaps new guidance in previously neglected clinical subsets, with no way to sort out those claims.

Conclusion

The introduction of these gene-expression tests has ushered in a new era in which many conventional clinical markers and predictors may be seen merely as surrogates for more fundamental genetic and physiologic processes. The multidimensional nature of these predictors demands both large numbers of clinically homogeneous patients to be used in the validation process, and exceptional rigor and discipline. Every study provides an opportunity to tweak a genetic signature, but we must find the right balance between speed of innovation and development of scientifically and clinically reliable tools. Going forward, it will be important to harness, if possible, as much genetic and clinical information on patients who undergo these tests to facilitate each goal without unduly sacrificing the other.

References and Included Studies
1.
Jemal A, Siegel R, Ward E. et al. Cancer statistics, 2007. CA Cancer J Clin. 2007; 57(1): 43–66.
2.
Berry DA, Cirrincione C, Henderson IC. et al. Estrogen-receptor status and outcomes of modern chemotherapy for patients with node-positive breast cancer. JAMA. 2006; 295(14): 1658–67.
3.
Eifel P, Axelson JA, Costa J. et al. National Institutes of Health Consensus Development Conference Statement: adjuvant therapy for breast cancer, November 1–3, 2000. J Natl Cancer Inst. 2001; 93(13): 979–89.
4.
National Institutes of Health (NIH) Consensus Development Criteria web site. Available at: http://consensus.nih.gov/2000/2000AdjuvantTherapyBreastCancer114html.htm. Accessed July 25, 2007.
5.
Goldhirsch A, Glick JH, Gelber RD. et al. Meeting Highlights: International Expert Consensus on the Primary Therapy of Early Breast Cancer 2005. Breast. 2005; 14(6): 643.
6.
Carlson RW, Anderson BO, Burstein HJ. et al. Invasive breast cancer. J Natl Compr Canc Netw. 2007; 5(3): 246–312.
7.
Ravdin PM, Siminoff LA, Davis GJ. et al. Computer program to assist in making decisions about adjuvant therapy for women with early breast cancer. J Clin Oncol. 2001; 19(4): 980–91.
8.
Adjuvant!, Inc. Adjuvant! Online. Available at: http://www.adjuvantonline.com. Accessed July 25, 2007.
9.
Wolff AC, Hammond ME, Schwartz JN. et al. American Society of Clinical Oncology/College of American Pathologists guideline recommendations for human epidermal growth factor receptor 2 testing in breast cancer. J Clin Oncol. 2007; 25(1): 118–45.
10.
Braxton S, Bedilion T. The integration of microarray information in the drug development process. Curr Opin Biotechnol. 1998; 9(6): 643–9.
11.
Mirnics K. Microarrays in brain research: the good, the bad and the ugly. Nat Rev Neurosci. 2001; 2(6): 444–7.
12.
Mirnics K, Middleton FA, Lewis DA. et al. Analysis of complex brain disorders with gene expression microarrays: schizophrenia as a disease of the synapse. Trends Neurosci. 2001; 24(8): 479–86.
13.
Schulze A, Downward J. Navigating gene expression using microarrays—a technology review. Nat Cell Biol. 2001; 3(8): E190–5.
14.
van Berkum NL, Holstege FC. DNA microarrays: raising the profile. Curr Opin Biotechnol. 2001; 12(1): 4852. [PubMed]
15.
DeRisi J, Penland L, Brown PO. et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet. 1996; 14(4): 45760. [PubMed]
16.
Alizadeh A, Eisen M, Davis RE. et al. The lymphochip: a specialized cDNA microarray for the genomic-scale analysis of gene expression in normal and malignant lymphocytes. Cold Spring Harb Symp Quant Biol. 1999; 64: 718. [PubMed]
17.
Alizadeh AA, Ross DT, Perou CM. et al. Towards a novel classification of human malignancies based on gene expression patterns. J Pathol. 2001; 195(1): 4152. [PubMed]
18.
Rew DA. DNA microarray technology in cancer research. Eur J Surg Oncol. 2001; 27(5 ): 5048. [PubMed]
19.
Hu Z, Fan C, Oh DS. et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics. 2006; 7: 96. [PubMed] [Free Full Text in PMC icon.Free Full text in PMC]
20.
Sorlie T, Perou CM, Tibshirani R. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A. 2001; 98(19): 10869–74. [PubMed] [Free full text in PMC]
21.
van 't Veer LJ, Dai H, van de Vijver MJ. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002; 415(6871): 530–6. [PubMed]
22.
Chang HY, Nuyten DS, Sneddon JB. et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci U S A. 2005; 102(10): 3738–43. [PubMed] [Free full text in PMC]
23.
Sotiriou C, Wirapati P, Loi S. et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006; 98(4): 262–72. [PubMed]
24.
Huang E, Cheng SH, Dressman H. et al. Gene expression predictors of breast cancer outcomes. Lancet. 2003; 361(9369): 1590–6. [PubMed]
25.
van de Vijver MJ, He YD, van't Veer LJ. et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002; 347(25): 1999–2009. [PubMed]
26.
Bustin SA. Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J Mol Endocrinol. 2000; 25(2): 169–93. [PubMed]
27.
Schena M, Shalon D, Davis RW. et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995; 270(5235): 467–70. [PubMed]
28.
Paik S, Shak S, Tang G. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004; 351(27): 2817–26. [PubMed]
29.
Baldwin D, Crane V, Rice D. A comparison of gel-based, nylon filter and microarray techniques to detect differential RNA expression in plants. Curr Opin Plant Biol. 1999; 2(2): 96–103. [PubMed]
30.
Watson A, Mazumder A, Stewart M. et al. Technology for microarray analysis of gene expression. Curr Opin Biotechnol. 1998; 9(6): 609–14. [PubMed]
31.
Schena M, Heller RA, Theriault TP. et al. Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. 1998; 16(7): 301–6. [PubMed]
32.
Brazma A, Hingamp P, Quackenbush J. et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001; 29(4): 365–71. [PubMed]
33.
Bammler T, Beyer RP, Bhattacharya S. et al. Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods. 2005; 2(5): 351–6. [PubMed]
34.
Irizarry RA, Warren D, Spencer F. et al. Multiple-laboratory comparison of microarray platforms. Nat Methods. 2005; 2(5): 345–50. [PubMed]
35.
National Cancer Institute. The TAILORx Breast Cancer Trial. Available at: http://www.cancer.gov/clinicaltrials/digestpage/TAILORx. Accessed August 14, 2007.
36.
TransBIG. MINDACT. Available at: http://www.breastinternationalgroup.org/TransBIG/Mindact.aspx. Accessed August 15, 2007.
37.
Food and Drug Administration, Center for Devices and Radiological Health. Available at: http://www.fda.gov/cdrh/. Accessed July 25, 2007.
38.
Food and Drug Administration, Center for Devices and Radiological Health. 510(k) Substantial Equivalence Determination Decision Summary, No. k062694. Available at: http://www.fda.gov/cdrh/reviews/K062694.pdf. Accessed July 25, 2007.
39.
Food and Drug Administration, Center for Devices and Radiological Health. 510(k) Submission for MammaPrint Service in the U.S. Summary, No. k070675. Available at: http://www.fda.gov/cdrh/pdf7/K070675.pdf. Accessed July 25, 2007.
40.
Berlin JA. Does blinding of readers affect the results of meta-analyses? University of Pennsylvania Meta-analysis Blinding Study Group. Lancet. 1997; 350(9072): 185–6. [PubMed]
41.
Philips Z, Ginnelly L, Sculpher M et al. Review of guidelines for good practice in decision-analytic modelling in health technology assessment. Health Technol Assess 2004;8(36):iii–iv, ix–xi, 1–158.
42.
Bossuyt PM, Reitsma JB, Bruns DE. et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Fam Pract. 2004; 21(1): 4–10. [PubMed]
43.
McShane LM, Altman DG, Sauerbrei W. et al. REporting recommendations for tumour MARKer prognostic studies (REMARK). Br J Cancer. 2005; 93(4): 387–91. [PubMed]
44.
Cronin M, Pho M, Dutta D. et al. Measurement of gene expression in archival paraffin-embedded tissues: development and performance of a 92-gene reverse transcriptase-polymerase chain reaction assay. Am J Pathol. 2004; 164(1): 35–42. [PubMed] [Free full text in PMC]
45.
Cronin M, Sangli C, Liu ML. et al. Analytical Validation of the Oncotype DX Genomic Diagnostic Test for Recurrence Prognosis and Therapeutic Response Prediction in Node-Negative, Estrogen Receptor-Positive Breast Cancer. Clin Chem. 2007; 53(6): 1084–91. [PubMed]
46.
Braxton S, Bedilion T. The integration of microarray information in the drug development process. Curr Opin Biotechnol. 1998; 9(6): 643–9. [PubMed]
47.
Cobleigh MA, Tabesh B, Bitterman P. et al. Tumor gene expression and prognosis in breast cancer patients with 10 or more positive lymph nodes. Clin Cancer Res. 2005; 11(24 Pt 1): 8623–31. [PubMed]
48.
Esteva FJ, Sahin AA, Cristofanilli M. et al. Prognostic role of a multigene reverse transcriptase-PCR assay in patients with node-negative breast cancer not receiving adjuvant systemic therapy. Clin Cancer Res. 2005; 11(9): 3315–9. [PubMed]
49.
Gianni L, Zambetti M, Clark K. et al. Gene expression profiles in paraffin-embedded core biopsy tissue predict response to chemotherapy in women with locally advanced breast cancer. J Clin Oncol. 2005; 23(29): 7265–77. [PubMed]
50.
Habel LA, Shak S, Jacobs MK. et al. A population-based study of tumor gene expression and risk of breast cancer death among lymph node-negative patients. Breast Cancer Res. 2006; 8(3): R25. [PubMed]
51.
Mina L, Soule SE, Badve S et al. Predicting response to primary chemotherapy: gene expression profiling of paraffin-embedded core biopsy tissue. Breast Cancer Res Treat 2006.
52.
Bertone P, Stolc V, Royce TE. et al. Global identification of human transcribed sequences with genome tiling arrays. Science. 2004; 306(5705): 2242–6. [PubMed]
53.
Paik S, Tang G, Shak S. et al. Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. J Clin Oncol. 2006; 24(23): 3726–34. [PubMed]
54.
Sorlie T, Tibshirani R, Parker J. et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A. 2003; 100(14): 8418–23. [PubMed] [Free full text in PMC]
55.
Chang JC, Makris A, Gutierrez MC et al. Gene expression patterns in formalin-fixed, paraffin-embedded core biopsies predict docetaxel chemosensitivity in breast cancer patients. Breast Cancer Res Treat 2007.
56.
Oratz R and Dev P. Impact of Oncotype DX™ Recurrence Score on Decision Making in Early-Stage Breast Cancer. Journal of Oncology Practice in press.
57.
Ach RA, Floore A, Curry B. et al. Robust interlaboratory reproducibility of a gene expression signature measurement consistent with the needs of a new generation of diagnostic tools. BMC Genomics. 2007; 8(1): 148. [PubMed] [Free full text in PMC]
58.
Glas AM, Floore A, Delahaye LJ. et al. Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics. 2006; 7: 278. [PubMed] [Free full text in PMC]
59.
Buyse M, Loi S, van't Veer L. et al. Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J Natl Cancer Inst. 2006; 98(17): 1183–92. [PubMed]
60.
Hughes TR, Marton MJ, Jones AR. et al. Functional discovery via a compendium of expression profiles. Cell. 2000; 102(1): 109–26. [PubMed]
61.
Ma XJ, Hilsenbeck SG, Wang W. et al. The HOXB13:IL17BR expression index is a prognostic factor in early-stage breast cancer. J Clin Oncol. 2006; 24(28): 4611–9. [PubMed]
62.
Goetz MP, Suman VJ, Ingle JN. et al. A two-gene expression ratio of homeobox 13 and interleukin-17B receptor for prediction of recurrence and survival in women receiving adjuvant tamoxifen. Clin Cancer Res. 2006; 12(7 Pt 1): 2080–7. [PubMed]
63.
Jerevall PL, Brommesson S, Strand C et al. Exploring the two-gene ratio in breast cancer-independent roles for HOXB13 and IL17BR in prediction of clinical outcome. Breast Cancer Res Treat 2007.
64.
Ma XJ, Wang Z, Ryan PD. et al. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell. 2004; 5(6): 607–616. [PubMed]
65.
Paik S, Shak S, and Tang G. Risk classification of breast cancer patients by the recurrence score assay: comparison to guidelines based on patient age, tumor size, and tumor grade. Abstract presented at Annual San Antonio Breast Cancer Symposium; December 8–11, 2004. San Antonio, TX. Abstract 104.
66.
Bryant, et al., 2005. Toward a More Rational Selection of Tailored Adjuvant Therapy. Poster Presentation. St. Gallen Conference, March 2005.
67.
Hornberger J, Cosler LE, Lyman GH. Economic analysis of targeting chemotherapy using a 21-gene RT-PCR assay in lymph-node-negative, estrogen-receptor-positive, early-stage breast cancer. Am J Manag Care. 2005; 11(5): 313–24. [PubMed]
68.
Simon Z, Sipka S, Gergely L. et al. Investigation of monoclonal gammopathy of undetermined significance: a single-centre study. Clin Lab Haematol. 2006; 28(3): 164–9. [PubMed]
69.
Reid JF, Lusa L, De Cecco L. et al. Limits of predictive models using microarray data for breast cancer clinical treatment outcome. J Natl Cancer Inst. 2005; 97(12): 927–30. [PubMed]
70.
Sotiriou C, Neo SY, McShane LM. et al. Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci U S A. 2003; 100(18): 10393–8. [PubMed] [Free full text in PMC]
71.
Simon R. Development and validation of therapeutically relevant multi-gene biomarker classifiers. J Natl Cancer Inst. 2005; 97(12): 866–7. [PubMed]
72.
Jansen MP, Sieuwerts AM, Look MP. et al. HOXB13-to-IL17BR expression ratio is related with tumor aggressiveness and response to tamoxifen of recurrent breast cancer: a retrospective study. J Clin Oncol. 2007; 25(6): 662–8. [PubMed]
73.
Sieuwerts AM, Meijer-van Gelder ME, Timmermans M. et al. How ADAM-9 and ADAM-11 differentially from estrogen receptor predict response to tamoxifen treatment in patients with recurrent breast cancer: a retrospective study. Clin Cancer Res. 2005; 11(20): 7311–21. [PubMed]
74.
Carlson RW, Anderson BO, Burstein HJ. et al. Breast cancer. J Natl Compr Canc Netw. 2005; 3(3): 238–89. [PubMed]
75.
Lyman GH, Cosler LE, Kuderer NM. et al. Impact of a 21-gene RT-PCR assay on treatment decisions in early-stage breast cancer: an economic analysis based on prognostic and predictive validation studies. Cancer. 2007; 109(6): 1011–8. [PubMed]
76.
Oestreicher N, Ramsey SD, Linden HM. et al. Gene expression profiling and breast cancer care: what are the potential benefits and policy implications? Genet Med. 2005; 7(6): 380–9. [PubMed]
77.
Brauer CA, Rosen AB, Greenberg D, Neumann PJ. Trends in the measurement of health utilities in published cost-utility analyses. Value Health. 2006; 9(4): 213–8. [PubMed]
78.
Torrance GW, Feeny D. Utilities and quality-adjusted life years. Int J Technol Assess Health Care. 1989; 5(4): 559–75. [PubMed]
79.
Fan C, Oh DS, Wessels L. et al. Concordance among gene-expression-based predictors for breast cancer. N Engl J Med. 2006; 355(6): 560–9. [PubMed]
80.
Espinosa E, Vara JA, Redondo A. et al. Breast cancer prognosis determined by gene expression profiling: a quantitative reverse transcriptase polymerase chain reaction study. J Clin Oncol. 2005; 23(29): 7278–85. [PubMed]
81.
Eden P, Ritz C, Rose C. et al. “Good Old” clinical markers have similar power in breast cancer prognosis as microarray gene expression profilers. Eur J Cancer. 2004; 40(12): 1837–41. [PubMed]
82.
Weigelt B, Hu Z, He X. et al. Molecular portraits and 70-gene prognosis signature are preserved throughout the metastatic process of breast cancer. Cancer Res. 2005; 65(20): 9155–8. [PubMed]
83.
Nuyten DS, Kreike B, Hart AA. et al. Predicting a local recurrence after breast-conserving therapy by gene expression profiling. Breast Cancer Res. 2006; 8(5): R62. [PubMed]
84.
Chang HY, Sneddon JB, Alizadeh AA. et al. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol. 2004; 2(2): E7. [PubMed]
85.
Chi JT, Wang Z, Nuyten DSA. et al. Gene expression programs in response to hypoxia: Cell type specificity and prognostic significance in human cancers. PLoS Med. 2006; 3(3): 395–409.
86.
Naderi A, Teschendorff AE, Barbosa-Morais NL et al. A gene-expression signature to predict survival in breast cancer across independent data sets. Oncogene 2006.
87.
Sun Y, Goodison S, Li J. et al. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics. 2007; 23(1): 30–7. [PubMed]
88.
Esteva FJ, Sahin AA, Rassidakis GZ. et al. Jun Activation Domain Binding Protein 1 Expression Is Associated with Low p27Kip1 Levels in Node-Negative Breast Cancer. Clin Cancer Res. 2003; 9(15): 5652–5659. [PubMed]
89.
Wang J, Buchholz TA, Middleton LP. et al. Assessment of histologic features and expression of biomarkers in predicting pathologic response to anthracycline-based neoadjuvant chemotherapy in patients with breast carcinoma. Cancer. 2002; 94(12): 3107–14. [PubMed]
90.
Harvey JM, Clark GM, Osborne CK. et al. Estrogen receptor status by immunohistochemistry is superior to the ligand-binding assay for predicting response to adjuvant endocrine therapy in breast cancer. J Clin Oncol. 1999; 17(5): 1474–81. [PubMed]
91.
Mohsin SK, Weiss H, Havighurst T. et al. Progesterone receptor by immunohistochemistry and clinical outcome in breast cancer: a validation study. Mod Pathol. 2004; 17(12): 1545–54. [PubMed]
92.
Allred DC, Harvey JM, Berardo M, Clark GM. Prognostic and predictive factors in breast cancer by immunohistochemical analysis. Mod Pathol. 1998; 11(2): 155–68. [PubMed]
93.
Allred DC, Harvey JM, Berardo M. et al. Prognostic and predictive factors in breast cancer by immunohistochemical analysis. Mod Pathol. 1998; 11(2): 155–68. [PubMed]
94.
Laupacis A, Sekar N, Stiell IG. Clinical prediction rules. A review and suggested modifications of methodological standards. JAMA. 1997; 277(6): 488–94. [PubMed]
95.
Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996; 15(4): 361–87. [PubMed]
96.
McShane LM, Altman DG, Sauerbrei W. et al. Reporting recommendations for tumor marker prognostic studies (REMARK). J Natl Cancer Inst. 2005; 97(16): 1180–4. [PubMed]
97.
Dupuy A, Simon RM. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007; 99(2): 147–57. [PubMed]
98.
Goetz MP, Ingle JN, Couch FJ. Gene-expression-based predictors for breast cancer. N Engl J Med. 2007; 356(7): 752. author reply 752–3. [PubMed]
99.
Hedenfalk I, Duggan D, Chen Y. et al. Gene-expression profiles in hereditary breast cancer. N Engl J Med. 2001; 344(8): 539–48. [PubMed]
100.
Auer H, Liyanarachchi S, Newsom D. et al. Chipping away at the chip bias: RNA degradation in microarray analysis. Nat Genet. 2003; 35(4): 292–3. [PubMed]
101.
Baldwin D, Crane V, Rice D. A comparison of gel-based, nylon filter and microarray techniques to detect differential RNA expression in plants. Curr Opin Plant Biol. 1999; 2(2): 96–103. [PubMed]
102.
Klein JP. Small sample moments of the estimators of the variance of the Kaplan-Meier and Nelson-Aalen estimators. Scand J Stat. 1991; 18: 333–40.

Appendix A: List of Acronyms

| Acronym | Definition |
| --- | --- |
| AHRQ | Agency for Healthcare Research and Quality |
| ANOVA | Analysis of variance |
| AUC | Area under the receiver-operating-characteristic curve |
| BCP | Breast Cancer Profiling |
| BCT | Breast conserving therapy |
| BIG | Breast International Group |
| BRCA | Breast cancer gene |
| CCD | Charge Coupled Devices |
| CDC | Centers for Disease Control and Prevention |
| cDNA | Complementary DNA |
| CENTRAL | The Cochrane Central Register of Controlled Trials |
| CI | Confidence Interval |
| CLIA | Clinical Laboratory Improvement Amendments |
| CMTP | Center for Medical Technology Policy |
| CR | Complete response |
| CT | Cycle threshold |
| CV | Coefficient of variation |
| DFS | Disease free survival |
| DNA | Deoxyribonucleic acid |
| DRFS | Distant recurrence-free survival |
| EGAPP | Evaluation of Genomic Applications in Practice and Prevention |
| EPC | The Evidence-based Practice Center |
| ER | Estrogen receptor |
| FDA | Food and Drug Administration |
| FFPE | Formalin-fixed paraffin-embedded |
| FISH | Fluorescent in situ hybridization |
| FRET | Förster Resonance Energy Transfer |
| GEP | Gene expression profiling |
| HER2 | Human epidermal growth factor receptor 2 |
| HR | Hormone Receptors |
| IHC | Immunohistochemical |
| INT | Italian National Cancer Institute of Milan, Italy |
| IVDMIAs | In Vitro Diagnostic Multivariate Index Assays |
| JHU | Johns Hopkins University |
| LCM | Laser capture microdissection |
| LMC | Laser micro-dissection |
| LOD | Limit of detection |
| LOQ | Limit of quantification |
| MeSH | Medical subject heading |
| MIAME | Minimum Information About a Microarray Experiment |
| MINDACT | Microarray In Node-negative Disease may Avoid ChemoTherapy |
| mRNA | Messenger ribonucleic acid |
| NCCN | National Comprehensive Cancer Network |
| NCCTG | North Central Cancer Treatment Group |
| NCI | National Cancer Institute |
| NHG | Nottingham Histologic Grade |
| NIH | National Institutes of Health |
| NPI | Nottingham Prognostic Index |
| NSABP | National Surgical Adjuvant Breast and Bowel Project |
| OR | Odds Ratio |
| OS | Overall survival |
| OVID | Office of In Vitro Diagnostic Evaluation and Safety |
| pCR | Complete pathological response |
| PCR | Polymerase chain reaction |
| PD | Progressive disease |
| PFS | Progression free survival |
| PMA | Pre-Market Approval |
| PR | Progesterone receptor |
| PR | Partial response |
| QALY | Quality-adjusted life-year |
| RCT | Randomized Controlled Trial |
| RECIST | Response Evaluation Criteria in Solid Tumors |
| REMARK | Reporting recommendations for tumour MARKer prognostic studies |
| RFS | Relapse Free Survival |
| RNA | Ribonucleic acid |
| ROC | Receiver operating characteristic |
| RR | Relative risk |
| RS | Recurrence Score |
| RT-PCR | Reverse transcriptase polymerase chain reaction |
| SD | Standard deviation |
| SNP | Single nucleotide polymorphisms |
| STARD | Standards for Reporting of Diagnostic Accuracy |
| TAILORx | Trial Assigning Individualized Options for Treatment |
| TBCI | The North American Breast Cancer Intergroup |
| TRANSBIG | Translating molecular knowledge into early breast cancer management: building on the BIG (Breast International Group) network for improved treatment tailoring |
| TTM | Time to distant metastases |
| VEGF | Vascular endothelial growth factor |

Appendix B: Glossary

Cycle Threshold (Ct, CT)

During an RT-PCR reaction, the relative ratios of template, products, and reagents vary. At the beginning of the process, reagents are in excess, and template and products are at low concentrations and do not compete with primer binding, so amplification proceeds at a constant, exponential rate. After this initial phase, the process enters a linear phase of amplification because product renaturation competes with primer binding. In late reaction cycles, amplification reaches a plateau phase and no more product accumulates. To achieve accuracy and precision, quantitative data must be collected during the exponential phase, since amplification is extremely reproducible in this phase. In real-time RT-PCR, this process is automated and measurements are made at each cycle. The ‘cycle threshold’ is the cycle of the RT-PCR reaction corresponding to the beginning of the exponential phase of amplification, that is, the cycle at which the signal from the accumulating product first rises detectably above background.
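
As an illustration only, a cycle threshold can be estimated from per-cycle fluorescence readings by finding the first cycle at which the signal reaches a chosen threshold and interpolating between the two neighboring readings. The function name, the threshold value, and the linear interpolation are assumptions of this sketch; instrument software may use other methods.

```python
def cycle_threshold(fluorescence, threshold):
    """Estimate the fractional cycle at which the signal first reaches `threshold`.

    fluorescence[i] is the reading taken after cycle i + 1 (cycles are 1-based).
    Returns None if the signal never reaches the threshold.
    This is a hypothetical helper for illustration, not an instrument algorithm.
    """
    for i, reading in enumerate(fluorescence):
        if reading >= threshold:
            if i == 0:
                return 1.0  # already above threshold at the first cycle
            previous = fluorescence[i - 1]
            # Linear interpolation between cycle i (previous) and cycle i + 1 (reading).
            fraction = (threshold - previous) / (reading - previous)
            return i + fraction
    return None


# Illustrative readings (arbitrary units) for four cycles.
readings = [0.0, 0.25, 0.75, 1.5]
ct = cycle_threshold(readings, threshold=0.5)
```

A sample with more starting template reaches the threshold sooner, so a lower cycle threshold indicates higher initial abundance of the measured transcript.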

DNA Microarray

A DNA microarray (also commonly referred to as a “gene chip” or “DNA chip”) is a collection of microscopic DNA spots (called “features”), commonly representing single genes or transcripts, arrayed on a solid surface by covalent attachment to chemically suitable matrices or synthesized directly on them. DNA microarrays use DNA as part of their detection system. Qualitative or quantitative measurements with DNA microarrays rely on the selective nature of DNA-DNA or DNA-RNA hybridization under high-stringency conditions and on fluorophore-based detection. DNA arrays are commonly used for gene expression profiling, i.e., monitoring expression levels of thousands of genes simultaneously, or for comparative genomic hybridization.

Gene Annotation

Gene annotation is the body of information that is associated with genes, as well as the process involved with the generation and maintenance of such information. Molecular biology and bioinformatics have faced the need for DNA annotation since the 1980s. Today a number of genomic and proteomic annotation projects have made this information publicly available.

Gene Expression

Gene expression refers to the conversion of the information encoded in a gene into an RNA transcript. Expressed transcripts include messenger RNAs (mRNA), which are translated into proteins, as well as other types of RNA, such as transfer RNA (tRNA), ribosomal RNA (rRNA), micro RNA (miRNA), and non-coding RNA (ncRNA), that are not translated into protein. Gene expression is a highly specific process by which cells switch genes on and off in a timely manner, according to their state. The study of mRNA expression in a cell is an indirect way to study the corresponding proteins.

Gene Expression Classifier

The term classifier is derived from the field of machine learning. The goal of classification is to assign items that have similar feature values to the same group. Usually, in the context of gene expression analysis, a classifier is a composite algorithm that classifies patients by using gene expression measurements.
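
To make the idea concrete, the following is a minimal sketch of one simple kind of classifier, a nearest-centroid rule, applied to small made-up expression profiles. It is not the algorithm of any test discussed in this report; the function names, labels, and data values are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(profiles, labels):
    """Average the training profiles of each class into one centroid per class."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        total = sums.setdefault(label, [0.0] * len(profile))
        for j, value in enumerate(profile):
            total[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(profile, centroids):
    """Assign a new profile to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(profile, centroids[label]))


# Made-up expression values for two genes in four training samples.
training_profiles = [[1.0, 1.0], [2.0, 1.0], [8.0, 9.0], [9.0, 9.0]]
training_labels = ["low risk", "low risk", "high risk", "high risk"]

centroids = fit_centroids(training_profiles, training_labels)
prediction = classify([2.0, 2.0], centroids)
```

A classifier intended for clinical use would additionally need validation on independent samples, which this sketch does not attempt.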

Gene Expression Profiling

This term refers to any genomic technique that measures the fraction of the genes that is expressed in a specific sample. The definition covers techniques that allow the assessment of more than one gene at a time, especially microarrays and real-time RT-PCR.

Gene expression profile: This is any set of genes for which the expression in a specific sample is known. A gene expression profile may account for a variable number of genes, and the corresponding expression values may be obtained by different techniques. Gene expression profiles can be associated, by various techniques, with phenotypes.

Gene expression pattern: This is an equivalent term currently in use to refer to “gene expression profile.”

Gene expression signature: This is an equivalent term currently in use to refer to a specific “gene expression profile,” usually associated with a specific phenotype.

Genome

In biology, the genome of an organism is its whole hereditary information, encoded in DNA (or, for some viruses, RNA). It includes both the genes that encode proteins and the non-coding sequences of the DNA. The term, coined in 1920 by Hans Winkler, is the fusion of the words gene and chromosome. The study of the global properties of genomes is usually referred to as ‘genomics’, which distinguishes it from genetics, which generally studies the properties of single genes or groups of genes.

Laser Capture Microdissection

Laser Capture Microdissection (LCM) is a method for isolating pure cells of interest from specific regions of tissue sections. In this procedure, a special film is applied to tissue sections that are examined under the microscope. When the cells of choice are identified, the operator can use a laser to dissect the cells and transfer them onto the film, leaving all unwanted cells behind in the tissue section. LCM does not alter or damage the morphology and chemistry of the collected sample, from which it is possible to prepare DNA, RNA, and/or protein. LCM can be performed on a variety of tissue samples, including blood smears, cytologic preparations, cell cultures, and frozen and paraffin-embedded archival tissue.

MIAME

MIAME (Minimum Information About a Microarray Experiment) is a standard for reporting microarray experiments. It is intended to specify all the information necessary to interpret the results of the experiment unambiguously and to reproduce the experiment. While the standard defines the content desired for reports, it does not specify the format in which this data should be presented. There are a number of file formats for representing this data, and both public and subscription-based repositories for such experiments.

Normalization

In an experimental context, normalizations are used to standardize data to enable differentiation between real (biological) variations and variations due to the measurement process. In gene expression analysis (by DNA microarray or RT-PCR), normalization refers to the process of identifying and removing the systematic effects, bringing the data from different samples onto a common scale. Several alternative methods and approaches to perform normalization exist both for RT-PCR and DNA microarray.
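
One common normalization approach for RT-PCR data is to express each gene's cycle threshold relative to the average of a set of reference genes measured in the same sample. The sketch below assumes that the five genes ACTB, GAPDH, RPLP0, GUSB, and TFRC (listed in Appendix C) serve as the reference set; that choice, the function name, and the Ct values are illustrative assumptions, not the procedure of any specific assay.

```python
from statistics import mean

# Assumed reference set for this illustration only.
REFERENCE_GENES = ("ACTB", "GAPDH", "RPLP0", "GUSB", "TFRC")


def reference_normalize(ct_values, reference_genes=REFERENCE_GENES):
    """Return reference-normalized expression for every non-reference gene.

    ct_values maps gene symbol -> cycle threshold in one sample.
    The result is (mean reference Ct - gene Ct), so larger values
    mean higher expression relative to the reference genes.
    """
    reference_mean = mean(ct_values[gene] for gene in reference_genes)
    return {
        gene: reference_mean - ct
        for gene, ct in ct_values.items()
        if gene not in reference_genes
    }


# Two hypothetical samples with the same biology but different RNA input:
# every Ct in sample_b is shifted by two cycles.
sample_a = {"ACTB": 20.0, "GAPDH": 20.0, "RPLP0": 20.0, "GUSB": 20.0, "TFRC": 20.0, "MKI67": 25.0}
sample_b = {gene: ct + 2.0 for gene, ct in sample_a.items()}

normalized_a = reference_normalize(sample_a)
normalized_b = reference_normalize(sample_b)
```

Because the shift in input amount affects reference and target genes alike, the two samples yield the same normalized value, which is the systematic effect normalization is meant to remove.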

Oligonucleotide

Oligonucleotides are short sequences of nucleotides (RNA or DNA), typically with twenty or fewer bases, although automated synthesizers allow the synthesis of oligonucleotides up to 200 bases. The length of a synthesized oligonucleotide is usually denoted by the suffix ‘mer’: for example, a fragment of 25 bases would be called a 25-mer. Oligonucleotides are used as probes to detect complementary DNA or RNA molecules. Specific DNA oligonucleotides are used in PCR, and in this instance they are referred to as “primers,” since they provide a place for the DNA polymerase to bind and extend the primers themselves by the addition of nucleotides to make a copy of the target sequence. Oligonucleotides may be referred to as “oligos.”

Platform

In the context of gene expression profiling analysis the term “platform” is often used to refer to the technology, instruments, and protocols used to measure gene expression. In this sense real time RT-PCR, cDNA microarrays, and oligonucleotide microarrays represent different platforms.

Polymerase Chain Reaction (PCR)

PCR is a molecular biology technique for isolating and exponentially amplifying a DNA sequence of interest in vitro via enzymatic replication. This technique has been extensively modified to perform a wide array of tasks, and it is now a common tool used in medical and biological research. PCR is now used to obtain the sequence of genes, diagnose hereditary diseases, identify genetic fingerprints (forensic medicine), detect infectious diseases, and create transgenic organisms. Coupled with “reverse transcription,” it is used to amplify RNA molecules.
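
The exponential character of the amplification can be sketched with a simple idealized model in which every cycle multiplies the number of copies by (1 + efficiency), where an efficiency of 1.0 means perfect doubling. The function name and the constant-efficiency assumption are illustrative; real reactions slow down in later cycles.

```python
def copies_after(initial_copies, cycles, efficiency=1.0):
    """Idealized number of DNA copies after a number of PCR cycles.

    efficiency is the fraction of molecules duplicated per cycle
    (1.0 = perfect doubling). Assumes efficiency stays constant,
    which holds only during the exponential phase.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting molecule, 30 cycles of perfect doubling.
ideal = copies_after(1, 30)
# Ten starting molecules, 3 cycles at 50% efficiency.
partial = copies_after(10, 3, efficiency=0.5)
```

Under perfect doubling, a single molecule yields about one billion copies after 30 cycles, which is why very small amounts of starting material can be detected.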

Primer

A primer is a nucleic acid strand or a related molecule that serves as a starting point for DNA replication. A primer is required because most DNA polymerases cannot begin synthesizing a new DNA strand from scratch, but can only add to an existing strand of nucleotides. In most natural DNA replication, the ultimate primer for DNA synthesis is a short strand of RNA. This RNA is produced by “primase,” and is later removed and replaced with DNA by a DNA polymerase. Many laboratory techniques of biochemistry and molecular biology that involve DNA polymerases, such as DNA sequencing and polymerase chain reaction, require primers. The primers used for these techniques are usually short, chemically synthesized DNA molecules with a length of about twenty bases.

Probe

In molecular biology, a hybridization probe is a fragment of DNA of variable length, which is used to detect the presence of nucleotide sequences that are complementary to the sequence in the probe. The complementary sequences are referred to as “targets.” The hybridization probe is usually labeled radioactively, or with immunological or fluorescent markers. The labeled probe is then denatured (by heating) into single DNA strands and hybridized to target DNA (Southern blotting) or RNA (Northern blotting) immobilized on a membrane or in situ. In a DNA microarray the hybridization scheme is reversed and the probes are attached to a solid surface, while the labeled targets are in the reaction solution. Similarly, in real time RT-PCR, probes are fragments of DNA that fluoresce when hybridized to the complementary investigated RNA molecule.
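
The notion of complementarity between probe and target can be sketched as a string operation: a DNA probe can pair with a target that contains the reverse complement of the probe sequence. The function names and sequences below are hypothetical, and the sketch considers only perfect matches between DNA strands, ignoring hybridization conditions and partial mismatches.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def probe_matches(probe, target):
    """True if the target contains a stretch perfectly complementary to the probe."""
    return reverse_complement(probe) in target.upper()


complement_of_probe = reverse_complement("ATGC")
hit = probe_matches("GCAT", "TTATGCAA")
miss = probe_matches("GGGG", "TTATGCAA")
```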

Proteome

The term proteome was coined by Mark Wilkins in 1994, as the fusion between proteins and genome. This term refers to the entire set of proteins expressed by a genome, cell, tissue or organism at a given time under defined conditions. The proteome is larger and more complex than the genome, especially in eukaryotes, in the sense that there are more proteins than genes. This is due to alternative splicing of genes and post-translational modifications like glycosylation or phosphorylation.

Real Time Reverse Transcriptase Polymerase Chain Reaction (RT-PCR)

Real-time RT-PCR is a molecular biology technique that allows the amplification and the quantification in real time of defined RNA molecules from specific specimens. This technology has been used for several years in research and clinical settings to measure RNA molecules. In the first step, DNA copies of the investigated RNA molecules present in the template are obtained by a reaction named reverse transcription. DNA amplification is then obtained using PCR, while the quantification of the accumulating DNA product is accomplished by the use of specific fluorescent reagents. The quantification of the target RNA molecule is based on the analysis of the accumulation curve of the complementary DNA, as measured by the fluorescence detected at each cycle of the reaction.

Reverse Transcription

In biochemistry, reverse transcription is the enzymatic reaction catalyzed by the RNA-dependent DNA polymerase. This enzyme, also known as reverse transcriptase, is a DNA polymerase that copies single-stranded RNA into DNA. This process is the reverse of normal transcription, which involves the synthesis of RNA from DNA.

Ribonuclease

This type of enzyme, commonly abbreviated as RNase, is a nuclease that catalyzes the hydrolysis of RNA molecules into smaller components. Ribonucleases are divided into endonucleases (which cut RNA molecules internally) and exonucleases (which degrade RNA from the ends of the molecules).

Target

In gene expression profiling analysis, a target is the RNA transcript that is under investigation using its complementary counterpart, the probe.

Tissue Microarrays

Tissue microarrays (TMA) consist of paraffin blocks in which up to 1000 separate tissue cores can be embedded, assembled in array fashion to allow simultaneous histological analysis.

Transcription

Transcription is the process by which DNA sequences are copied into complementary RNA molecules by the enzyme RNA polymerase. This reaction represents the transfer of genetic information from DNA into RNA, that is, from “storage” to “function.” The RNA molecule produced by transcribing a DNA sequence is called a “transcript.”

Transcriptome

The transcriptome is the set of all RNA molecules, or “transcripts,” produced in one cell or a population of cells. The term can be applied to the total set of transcripts in a given organism, or to the specific subset of transcripts present in a particular cell type. Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary from cell to cell and with external environmental conditions. Because it includes all RNA transcripts in the cell, the transcriptome reflects the genes that are being actively expressed at any given time. The study of the transcriptome examines the expression level of RNAs in a given cell population, often using high-throughput techniques based on DNA microarray technology or RT-PCR.

Appendix C: Description of Genes

ONCOTYPE™: the 21-gene signature

| Acc | UGCluster | Name | Symbol | EGID | UGRepAcc | LLRepProtAcc | Chromosome | Cytoband |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NM_001101 | Hs.520640 | Actin, beta | ACTB | 60 | AK125561 | NP_001092 | 7 | 7p15-p12 |
| NM_002046 | Hs.544577 | Glyceraldehyde-3-phosphate dehydrogenase | GAPDH | 2597 | BF983396 | NP_002037 | 12 | 12p13 |
| NM_001002 | Hs.546285 | Ribosomal protein, large, P0 | RPLP0 | 6175 | BQ051850 | NP_444505 | 12 | 12q24.2 |
| NM_000181 | Hs.255230 | Glucuronidase, beta | GUSB | 2990 | AK096764 | NP_000172 | 7 | 7q21.11 |
| NM_003234 | Hs.529618 | Transferrin receptor (p90, CD71) | TFRC | 7037 | BC001188 | NP_003225 | 3 | 3q29 |
| NM_002417 | Hs.80976 | Antigen identified by monoclonal antibody Ki-67 | MKI67 | 4288 | NM_002417 | NP_002408 | 10 | 10q25-qter |
| NM_003600 | Hs.250822 | Aurora kinase A | AURKA | 6790 | NM_198433 | NP_940839 | 20 | 20q13.2–q13.3 |
| NM_001168 | Hs.514527 | Effector cell peptidase receptor 1 | EPR1 | 8475 | NM_001012271 |  | 17 | 17q25 |
| NM_031966 | Hs.23960 | Cyclin B1 | CCNB1 | 891 | NM_031966 | NP_114172 | 5 | 5q12 |
| NM_002466 | Hs.179718 | V-myb myeloblastosis viral oncogene homolog (avian)-like 2 | MYBL2 | 4605 | BX647151 | NP_002457 | 20 | 20q13.1 |
| NM_004448 | Hs.446352 | V-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) | ERBB2 | 2064 | NM_001005862 | NP_004439 | 17 | 17q11.2–q12; 17q21.1 |
| NM_005310 | Hs.86859 | Growth factor receptor-bound protein 7 | GRB7 | 2886 | NM_005310 | NP_005301 | 17 | 17q12 |
| NM_000043 | Hs.244139 | Fas (TNF receptor superfamily, member 6) | FAS | 355 | AB209361 | NP_690616 | 10 | 10q24.1 |
| NM_000926 | Hs.368072 | Progesterone receptor | PGR | 5241 | X51730 | NP_000917 | 11 | 11q22–q23 |
| NM_000633 | Hs.150749 | B-cell CLL/lymphoma 2 | BCL2 | 596 | NM_000633 | NP_000648 | 18 | 18q21.33; 18q21.3 |
| NM_020974 | Hs.523468 | Signal peptide, CUB domain, EGF-like 2 | SCUBE2 | 57758 | NM_020974 | NP_066025 | 11 | 11p15.3 |
| NM_005940 | Hs.143751 | Matrix metallopeptidase 11 (stromelysin 3) | MMP11 | 4320 | NM_005940 | NP_005931 | 22 | 22q11.2; 22q11.23 |
| NM_001333 | Hs.660866 | Cathepsin L2 | CTSL2 | 1515 | BC067289 | NP_001324 | 9 | 9q22.2 |
| NM_000561 | Hs.301961 | Glutathione S-transferase M1 | GSTM1 | 2944 | BQ880398 | NP_666533 | 1 | 1p13.3 |
| NM_001251 | Hs.647419 | CD68 molecule | CD68 | 968 | NM_001251 | NP_001242 | 17 | 17p13 |
| NM_004323 | Hs.377484 | BCL2-associated athanogene | BAG1 | 573 | NM_004323 | NP_004314 | 9 | 9p12 |

MAMMAPRINT®: the 70-gene signature
ORIGINALID | Acc | UGCluster | Name | Symbol | LLID | UGRepAcc | LLRepProtAcc | Chromosome | Cytoband
AA555029_RC | AA555029 | Hs.100691 | Hypothetical protein LOC286052 | LOC286052 | 286052 | AK095104 | - | 8 | 8q24.13
AF052162 | AF052162 | Hs.368853 | Acyltransferase like 2 | AYTL2 | 79888 | AK090444 | NP_079106 | 5 | 5p15.33
NM_007203 | NM_007203 | Hs.591908 | PALM2-AKAP2 protein | PALM2-AKAP2 | 445815 | NM_053016 | NP_671492 | 9 | 9q31–q33
AL080059 | AL080059 | Hs.173094 | TSPY-like 5 | TSPYL5 | 85453 | NM_033512 | NP_277047 | 8 | 8q22.1
AL137718 | AL137718 | Hs.283127 | Diaphanous homolog 3 (Drosophila) | DIAPH3 | 81624 | NM_001042517 | NP_112194 | 13 | 13q21.2
NM_003748 | NM_003748 | Hs.77448 | Aldehyde dehydrogenase 4 family, member A1 | ALDH4A1 | 8659 | NM_003748 | NP_733844 | 1 | 1p36
NM_001282 | NM_001282 | Hs.514819 | Adaptor-related protein complex 2, beta 1 subunit | AP2B1 | 163 | NM_001030006 | NP_001273 | 17 | 17q11.2–q12
U82987 | U82987 | Hs.467020 | BCL2 binding component 3 | BBC3 | 27113 | AF332558 | NP_055232 | 19 | 19q13.3–q13.4
NM_004702 | NM_004702 | Hs.567387 | Cyclin E2 | CCNE2 | 9134 | NM_057735 | NP_477097 | 8 | 8q22.1
NM_020974 | NM_020974 | Hs.523468 | Signal peptide, CUB domain, EGF-like 2 | SCUBE2 | 57758 | NM_020974 | NP_066025 | 11 | 11p15.3
NM_001809 | NM_001809 | Hs.1594 | Centromere protein A | CENPA | 1058 | BM911202 | NP_001800 | 2 | 2p24-p21
AF201951 | AF201951 | Hs.530735 | Membrane-spanning 4-domains, subfamily A, member 7 | MS4A7 | 58475 | NM_032597 | NP_996823 | 11 | 11q12
X05610 | X05610 | Hs.508716 | Collagen, type IV, alpha 2 | COL4A2 | 1284 | NM_001846 | NP_001837 | 13 | 13q34
Contig20217_RC | AA834945 | Hs.604604 | Transcribed locus, moderately similar to XP_001091104.1 similar to lin-9 homolog [Macaca mulatta] | AA834945 | - | AA834945 | - | 1 | -
Contig24252_RC | AW024884 | Hs.528605 | PP12104 | LOC643008 | 643008 | XM_928053 | - | 17 | 17q25.1
Contig28552_RC | AA992378 | Hs.283127 | Diaphanous homolog 3 (Drosophila) | DIAPH3 | 81624 | NM_001042517 | NP_112194 | 13 | 13q21.2
Contig32125_RC | AA404325 | Hs.523036 | CDNA FLJ38245 fis, clone FCBBF2007186 | AA404325 | - | AK095564 | - | 1 | -
Contig32185_RC | AI377418 | Hs.657472 | G protein-coupled receptor 180 | GPR180 | 160897 | NM_180989 | NP_851320 | 13 | 13q32.1
Contig35251_RC | AI283268 | Hs.634333 | CDNA: FLJ22719 fis, clone HSI14307 | AI283268 | - | AK026372 | - | 7 | -
Contig38288_RC | AI554061 | Hs.657864 | Quiescin Q6 sulfhydryl oxidase 2 | QSOX2 | 169714 | AJ318051 | NP_859052 | 9 | 9q34.3
Contig40831_RC | AI224578 | Hs.595493 | Full-length cDNA clone CS0DI029YM01 of Placenta Cot 25-normalized of Homo sapiens (human) | AI224578 | - | BF675485 | - | 8 | -
Contig46218_RC | AI813331 | Hs.283127 | Diaphanous homolog 3 (Drosophila) | DIAPH3 | 81624 | NM_001042517 | NP_112194 | 13 | 13q21.2
Contig46223_RC | AA528243 | Hs.22917 | Reticulon 4 receptor-like 1 | RTN4RL1 | 146760 | NM_178568 | NP_848663 | 17 | 17p13.3
Contig48328_RC | AI694320 | Hs.655005 | Zinc finger protein 533 | ZNF533 | 151126 | BC092423 | NP_689733 | 2 | 2q31.2–q31.3
Contig51464_RC | AI817737 | Hs.567582 | F-box protein 31 | FBXO31 | 79791 | AF318348 | NP_079011 | 16 | 16q24.2
Contig55377_RC | AI918032 | Hs.632255 | RUN domain containing 1 | RUNDC1 | 146923 | BC039247 | NP_775102 | 17 | 17q21.31
Contig55725_RC | AI992158 | Hs.470654 | Cell division cycle associated 7 | CDCA7 | 83879 | AL834186 | NP_665809 | 2 | 2q31
Contig56457_RC | AI741117 | Hs.530272 | Chromosome 9 open reading frame 30 | C9orf30 | 91283 | AK092292 | NP_542386 | 9 | 9q31.1
Contig63102_RC | AI583960 | Hs.55918 | Likely ortholog of mouse D11lgp2 | LGP2 | 79132 | AK021416 | NP_077024 | 17 | 17q21.2
Contig63649_RC | AW014921 | Hs.446388 | CDNA FLJ41489 fis, clone BRTHA2004582 | AW014921 | - | AK123483 | - | 11 | -
NM_020188 | NM_020188 | Hs.388255 | Chromosome 16 open reading frame 61 | C16orf61 | 56942 | BM463756 | NP_064573 | 16 | 16q23.2
NM_000788 | NM_000788 | Hs.709 | Deoxycytidine kinase | DCK | 1633 | CD014015 | NP_000779 | 4 | 4q13.3–q21.1
AL080079 | AL080079 | Hs.318894 | G protein-coupled receptor 126 | GPR126 | 57211 | NM_020455 | NP_940971 | 6 | 6q24.1
Contig25991 | AI738508 | Hs.518299 | Epithelial cell transforming sequence 2 oncogene | ECT2 | 1894 | AY376439 | NP_060568 | 3 | 3q26.1–q26.2
NM_007036 | NM_007036 | Hs.129944 | Endothelial cell-specific molecule 1 | ESM1 | 11082 | X89426 | NP_008967 | 5 | 5q11.2
NM_000127 | NM_000127 | Hs.492618 | Exostoses (multiple) 1 | EXT1 | 2131 | NM_000127 | NP_000118 | 8 | 8q24.11–q24.13
NM_003862 | NM_003862 | Hs.87191 | Fibroblast growth factor 18 | FGF18 | 8817 | AF075292 | NP_387498 | 5 | 5q34
NM_018354 | NM_018354 | Hs.516834 | Chromosome 20 open reading frame 46 | C20orf46 | 55321 | AK126837 | NP_060824 | 20 | 20p13
NM_002019 | NM_002019 | Hs.654360 | Fms-related tyrosine kinase 1 (vascular endothelial growth factor/vascular permeability factor receptor) | FLT1 | 2321 | NM_002019 | NP_002010 | 13 | 13q12
NM_003875 | NM_003875 | Hs.591314 | Guanine monophosphate synthetase | GMPS | 8833 | NM_003875 | NP_003866 | 3 | 3q24
NM_002073 | NM_002073 | Hs.584760 | Guanine nucleotide binding protein (G protein), alpha z polypeptide | GNAZ | 2781 | BC037333 | NP_002064 | 22 | 22q11.22
NM_000849 | NM_000849 | Hs.2006 | Glutathione S-transferase M3 (brain) | GSTM3 | 2947 | NM_000849 | NP_000840 | 1 | 1p13.3
NM_006101 | NM_006101 | Hs.414407 | NDC80 homolog, kinetochore complex component (S. cerevisiae) | NDC80 | 10403 | NM_006101 | NP_006092 | 18 | 18p11.32
NM_018401 | NM_018401 | Hs.133062 | Serine/threonine kinase 32B | STK32B | 55351 | AY358353 | NP_060871 | 4 | 4p16.2-p16.1
AF055033 | AF055033 | Hs.635441 | Insulin-like growth factor binding protein 5 | IGFBP5 | 3488 | NM_000599 | NP_000590 | 2 | 2q33–q36
NM_000599 | NM_000599 | Hs.635441 | Insulin-like growth factor binding protein 5 | IGFBP5 | 3488 | NM_000599 | NP_000590 | 2 | 2q33–q36
NM_014791 | NM_014791 | Hs.184339 | Maternal embryonic leucine zipper kinase | MELK | 9833 | NM_014791 | NP_055606 | 9 | 9p13.2
AK000745 | AK000745 | Hs.377155 | Metadherin | MTDH | 92140 | BC045642 | NP_848927 | 8 | 8q22.1
AB037863 | AB037863 | Hs.471955 | Early B-cell factor 4 | EBF4 | 57593 | XM_044921 | - | 20 | 20p13
NM_016448 | NM_016448 | Hs.656473 | Denticleless homolog (Drosophila) | DTL | 51514 | NM_016448 | NP_057532 | 1 | 1q32.1–q32.2
NM_016359 | NM_016359 | Hs.615092 | Nucleolar and spindle associated protein 1 | NUSAP1 | 51203 | AK222819 | NP_060924 | 15 | 15q15.1
NM_020386 | NM_020386 | Hs.36761 | HRAS-like suppressor | HRASLS | 57110 | BC048095 | NP_065119 | 3 | 3q29
NM_005915 | NM_005915 | Hs.444118 | Minichromosome maintenance complex component 6 | MCM6 | 4175 | NM_005915 | NP_005906 | 2 | 2q21
NM_004994 | NM_004994 | Hs.297413 | Matrix metallopeptidase 9 (gelatinase B, 92kDa gelatinase, 92kDa type IV collagenase) | MMP9 | 4318 | NM_004994 | NP_004985 | 20 | 20q11.2–q13.1
NM_014889 | NM_014889 | Hs.528300 | Pitrilysin metallopeptidase 1 | PITRM1 | 10531 | CR749279 | NP_055704 | 10 | 10p15.2
NM_006681 | NM_006681 | Hs.418367 | Neuromedin U | NMU | 10874 | BF034907 | NP_006672 | 4 | 4q12
NM_014321 | NM_014321 | Hs.49760 | Origin recognition complex, subunit 6 like (yeast) | ORC6L | 23594 | NM_014321 | NP_055136 | 16 | 16q12
NM_000436 | NM_000436 | Hs.278277 | 3-oxoacid CoA transferase 1 | OXCT1 | 5019 | NM_000436 | NP_000427 | 5 | 5p13.1
NM_006117 | NM_006117 | Hs.15250 | Peroxisomal D3, D2-enoyl-CoA isomerase | PECI | 10455 | AB209917 | NP_996667 | 6 | 6p24.3
AF257175 | AF257175 | Hs.15250 | Peroxisomal D3, D2-enoyl-CoA isomerase | PECI | 10455 | AB209917 | NP_996667 | 6 | 6p24.3
NM_003607 | NM_003607 | Hs.35433 | CDC42 binding protein kinase alpha (DMPK-like) | CDC42BPA | 8476 | NM_003607 | NP_055641 | 1 | 1q42.11
NM_003981 | NM_003981 | Hs.567385 | Protein regulator of cytokinesis 1 | PRC1 | 9055 | NM_003981 | NP_955446 | 15 | 15q26.1
NM_016577 | NM_016577 | Hs.12152 | RAB6A, member RAS oncogene family | RAB6A | 5870 | NM_016577 | NP_942599 | 3 | 11q13.3
NM_002916 | NM_002916 | Hs.518475 | Replication factor C (activator 1) 4, 37kDa | RFC4 | 5984 | NM_002916 | NP_853551 | 3 | 3q27
AF073519 | AF073519 | Hs.658079 | Small EDRK-rich factor 1A (telomeric) | SERF1A | 8293 | AF073519 | NP_068802 | 5 | 5q12.2–q13.3
NM_006931 | NM_006931 | Hs.419240 | Solute carrier family 2 (facilitated glucose transporter), member 3 | SLC2A3 | 6515 | AB209607 | NP_008862 | 12 | 12p13.3
Contig2399_RC | W90004 | Hs.444450 | Egl nine homolog 1 (C. elegans) | EGLN1 | 54583 | AF229245 | NP_071334 | 1 | 1q42.1
NM_003239 | NM_003239 | Hs.592317 | Transforming growth factor, beta 3 | TGFB3 | 7043 | AK122902 | NP_003230 | 14 | 14q24
NM_015984 | NM_015984 | Hs.591458 | Ubiquitin carboxyl-terminal hydrolase L5 | UCHL5 | 51377 | AK225794 | NP_057068 | 1 | 1q32
NM_003882 | NM_003882 | Hs.492974 | WNT1 inducible signaling pathway protein 1 | WISP1 | 8840 | AF100779 | NP_543028 | 8 | 8q24.1–q24.3
BCP - H/I assay: 2-gene signature and normalizing genes
Symbol | UGCluster | Name | EGID | UGRepAcc | LLRepProtAcc | Chromosome | Cytoband
ACTB | Hs.520640 | Actin, beta | 60 | AK125561 | NP_001092 | 7 | 7p15-p12
HMBS | Hs.82609 | Hydroxymethylbilane synthase | 3145 | BU168137 | NP_000181 | 11 | 11q23.3
SDHA | Hs.440475 | Succinate dehydrogenase complex, subunit A, flavoprotein (Fp) | 6389 | AK131478 | NP_004159 | 5 | 5p15
UBC | Hs.520348 | Ubiquitin C | 7316 | AB209436 | NP_066289 | 12 | 12q24.3
HOXB13 | Hs.66731 | Homeobox B13 | 10481 | AY937237 | NP_006352 | 17 | 17q21.2
IL17RB | Hs.654970 | Interleukin 17 receptor B | 55540 | NM_018725 | NP_758434 | 3 | 3p21.1

Symbol: official gene symbol

UGCluster: Unigene cluster identifier

Name: gene name according to Unigene

EGID: Entrez Gene identifier

UGRepAcc: representative GenBank accession number according to Unigene

LLRepProtAcc: representative Protein accession number according to Entrez Gene

Chromosome: chromosomal location

Cytoband: cytogenetic band

Appendix D: Technologies

Reverse Transcription-Polymerase Chain Reaction (RT-PCR) and Real-Time RT-PCR

Reverse transcription polymerase chain reaction (RT-PCR) is a molecular biology technique for amplifying a specific piece of a ribonucleic acid (RNA) molecule. The RNA molecule is first reverse transcribed into complementary DNA (cDNA), and the resulting DNA is then amplified by polymerase chain reaction (PCR), the common method used to amplify specific parts of a DNA molecule by means of the enzyme DNA polymerase under temperature-controlled conditions. PCR uses short specific oligonucleotides, called primers, that are complementary to the target sequence to be amplified and serve to prime the polymerase reaction. The sequence of these oligonucleotides determines the specificity of the reaction for the target nucleic acid fragment under analysis. PCR proceeds through successive amplification cycles determined by controlled temperature shifts of the reaction mixture.

Real-time polymerase chain reaction is a laboratory technique that simultaneously amplifies and quantifies the specific part of the nucleic acid sequence under analysis. In this technique, the quantity of DNA produced after each round of amplification is measured by one of several methods. The most common quantification protocols are based on the use of fluorescent dyes that intercalate with double-stranded DNA, or on modified DNA oligonucleotide probes that fluoresce when hybridized with the complementary DNA.

Real-time RT-PCR is the combination of the described techniques and enables gene expression evaluation at a particular time, or in a particular cell or tissue type. This technique is extremely sensitive and has been used to measure RNA from a single cell. The development of novel chemistries and instrumentation platforms has led to widespread use of this approach to measure gene expression changes. Moreover, this technique has become the preferred way to validate results obtained from microarray analyses and other techniques that evaluate gene expression changes on a global scale.

RT-PCR procedures

During PCR amplification, the relative ratios of template, product, and reagents vary. At the beginning of the reaction, reagents are in excess, while template and products are at low concentrations. In this phase the products do not compete with the primers for binding, so amplification proceeds at an exponential rate. Following this initial phase, the reaction enters a linear phase of amplification, in which annealing of the PCR products competes with primer binding. Finally, in late reaction cycles, amplification reaches a plateau and no more PCR products accumulate. Accurate and precise quantitative data are collected during the exponential phase, in which amplification is extremely reproducible. In real-time PCR this process is automated and measurements are made at each cycle.
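As a rough illustration of why measurements are taken in the exponential phase, the sketch below uses an idealized model in which the amount of product grows by a constant factor (1 + E) at each cycle. The model, the function names, and the efficiency parameter E are assumptions introduced here for illustration only; they are not taken from this report, and the model does not describe the linear or plateau phases.

```python
import math


def copies_after_cycles(initial_copies: float, cycles: int,
                        efficiency: float = 1.0) -> float:
    """Idealized exponential-phase yield: N = N0 * (1 + E) ** cycles.

    efficiency (E) is the fraction of templates copied per cycle;
    E = 1.0 means perfect doubling (an assumption, not a measured value).
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_threshold(initial_copies: float, threshold_copies: float,
                        efficiency: float = 1.0) -> float:
    """Fractional cycle at which the idealized curve reaches the threshold."""
    return (math.log(threshold_copies / initial_copies)
            / math.log(1.0 + efficiency))


if __name__ == "__main__":
    # With perfect doubling, 100 starting copies become 102,400 after 10 cycles.
    print(copies_after_cycles(100, 10))
    # A 10-fold larger input crosses the same threshold about 3.3 cycles earlier.
    print(cycles_to_threshold(100, 1e9) - cycles_to_threshold(1000, 1e9))
```

Under this model, the cycle at which a fixed threshold is reached depends only on the starting amount and the efficiency, which is the property that threshold-based quantitation relies on.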

Several options are currently available to perform RT-PCR and real time RT-PCR: TaqMan® (Applied Biosystems, Foster City, CA, USA), Molecular Beacons, Scorpions® and the use of SYBR® Green (Molecular Probes). In all of these technologies PCR products are detected by generation of a fluorescent signal. TaqMan® probes, Molecular Beacons and Scorpions® rely on Förster Resonance Energy Transfer (FRET): a dye molecule and a quencher moiety are bound to the same or different oligonucleotide substrates and fluorescence is emitted when they are separated. SYBR Green is a fluorogenic dye that emits a strong fluorescent signal upon binding to double-stranded DNA.

TaqMan probes

TaqMan technology depends on the 5′ nuclease activity of the DNA polymerase used for PCR. This activity separates the quencher from the dye, relieving FRET-mediated quenching and thus producing fluorescence. During the reaction, the enzyme hydrolyzes the oligonucleotide probes that are hybridized to the target sequence; the dye and quencher are decoupled, and fluorescence increases at each cycle in proportion to the amount of probe cleaved.

Molecular beacons

Molecular Beacons are also based on FRET, although the design of the probes is different. In this chemistry, a dye is attached to the 5′ end and a quencher is bound to the 3′ end of an oligonucleotide substrate. The 5′ nuclease activity of the DNA polymerase is not required: in solution, Molecular Beacons probes form a loop structure that prevents fluorescence, whereas after hybridization to the target sequence the dye and quencher are separated, FRET is relieved, and light is emitted upon irradiation.

Scorpions

Scorpion technology combines the amplification primer and the reporter sequence in the same oligonucleotide. In solution, the dye attached to the 5′ end of the probe is quenched by a moiety coupled to a complementary sequence, which is linked at its 3′ end to the primer through a non-amplifiable monomer. During PCR, after extension of the Scorpion primer, the probe sequence is able to bind its complement within the extended strand, thus opening up the hairpin loop, relieving quenching, and causing signal emission.

SYBR Green

SYBR Green binds double-stranded DNA and fluoresces upon excitation. As PCR products accumulate, light emission increases. SYBR® Green is sensitive, inexpensive, and easy to use. However, it binds to any double-stranded DNA molecule in the reaction, including primer-dimers and other non-specific reaction products, which may result in an overestimation of the target concentration. Because the dye binds to any double-stranded DNA, there is no need to design specific probes for any particular target under analysis.

Real-time reporters for Multiplex PCR

Several implementations of this technique (TaqMan, Molecular Beacons, and Scorpions) allow multiple DNA species to be measured in the same sample (multiplex PCR), because fluorescent dyes with different emission spectra can be coupled to the probes assaying different targets. This approach allows the use of internal controls, which can be co-amplified along with the target sequence under analysis in the same reaction tube. Multiplexing is not possible with SYBR Green.

Quantitation of results

Two methods are commonly used to quantify the results obtained by real-time RT-PCR:

  • The standard curve method;

  • The comparative threshold method.

The standard curve method

In this method, a standard curve is obtained from a serially diluted nucleic acid template of known concentration. This curve is subsequently used as a reference to derive quantitative information about mRNA targets of unknown concentration. Such standards can be RNA molecules transcribed in vitro from cDNA plasmids, or other nucleic acid templates prepared for the purpose. cDNA plasmids are the preferred standards for obtaining the standard curve; however, their use does not allow inferences about the efficiency of the reverse transcription reaction or about possible differences in RNA template input. For this reason, normalization to one or more housekeeping genes is often used.
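The standard curve calculation can be sketched as a straight-line fit of threshold cycle against the logarithm of the known standard concentrations, which is then inverted for unknown samples. The code below is a minimal illustration under that assumption; the function names and the dilution values are hypothetical, and the slope-to-efficiency relationship shown is the conventional one rather than something stated in this report.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate the amount in an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


def efficiency_from_slope(slope):
    """Conventional estimate E = 10 ** (-1 / slope) - 1 (1.0 = doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series (copies) and measured Ct values.
    standards = [1e6, 1e5, 1e4, 1e3, 1e2]
    cts = [15.0, 18.32, 21.64, 24.96, 28.28]
    slope, intercept = fit_standard_curve(standards, cts)
    print(slope, intercept)
    print(quantity_from_ct(21.64, slope, intercept))
    print(efficiency_from_slope(slope))
```

Estimates are most trustworthy for unknowns whose Ct falls inside the range spanned by the standards.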

Comparative threshold method

This approach compares the cycle threshold (CT) values of the samples of interest with the CT values of a control RNA sample, after internal normalization of each CT to an appropriate endogenous housekeeping gene. For this method to be valid, the amplification efficiencies of the target and the endogenous reference must be similar. If no housekeeping gene with an amplification efficiency similar to that of the target can be found, the standard curve method is preferable.
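A minimal sketch of the comparative threshold calculation follows. It assumes the commonly used 2^(-ΔΔCT) form, which presumes that target and reference both amplify with near-perfect (doubling) efficiency; that formula and the function names are assumptions made for illustration and are not quoted from this report.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize a target CT to the housekeeping (reference) gene CT."""
    return ct_target - ct_reference


def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the sample relative to the control: 2 ** (-ddCT).

    Assumes similar, near-100% amplification efficiency for the target
    and the reference gene.
    """
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Sample: target CT 24, reference CT 18 (delta 6).
    # Control: target CT 26, reference CT 18 (delta 8).
    # ddCT = -2, so the target is 4-fold higher in the sample.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

If the two efficiencies differ appreciably, the fold change computed this way is biased, which is why the text recommends the standard curve method in that case.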

Instrumentation for Real-time RT-PCR

Real-time RT-PCR requires a platform consisting of a thermal cycler, a computer, optics for fluorescence excitation and emission collection, and data acquisition and analysis software. Such instrumentation, available from several manufacturers, varies in terms of sample capacity (single tubes, 96-well, 384-well formats), excitation method, and overall sensitivity.

DNA Microarrays

The introduction of automated large-scale sequencing, supported by adequate computational tools and developments in bioinformatics, has greatly increased our general knowledge of the organization and function of genomic sequences. This knowledge is the basis for investigating gene expression on a global scale through parallel analysis of thousands of genes in a single assay. In microarray analysis, the Northern blotting scheme is reversed: the labeled moiety is obtained from the RNA sample, and a number of immobilized known sequences are used as probes (Baldwin, Crane et al. 1999). Advances in robotics and in attaching nucleic acid sequences to glass supports allowed investigators to miniaturize the scale of the reactions, so that modified microscope slides could be used to deposit thousands of nucleic acid sequences. The same result was also obtained by borrowing photolithography techniques from semiconductor manufacturing to synthesize oligonucleotides directly onto a solid support (Watson, Mazumder et al. 1998). Altogether, these advances led, in 1995, to the first papers in which the term “microarray” was used in its current meaning (Schena, Heller et al. 1998).

Principles of microarray analysis

All the different technical solutions that have been so far developed to perform microarray analysis are miniaturized hybridization assays that allow investigators to simultaneously query thousands of nucleic acid fragments. All microarray systems share the following key components:

  • The array, which contains the immobilized nucleic acid sequences, known as “probes”;

  • One or more labeled samples or “targets”, that are hybridized against the microarray;

  • A detection system that quantifies the hybridization signals.

Microarrays and DNA-chips

Spotted microarrays consist of a collection of preformed nucleic acid sequences immobilized onto a solid support so that each unique sequence forms a tiny feature called a “spot.” These nucleic acids are obtained in numerous ways, and there are different methods for depositing them onto microarray slides (for instance, by simple contact, by ink-jet technology, or by micro-syringe pumping). In general, nucleic acids prepared for deposition on microarrays consist of cDNA clones amplified by polymerase chain reaction (cDNA microarrays), or of synthesized oligonucleotides of various lengths (oligonucleotide microarrays, e.g., microarrays from Agilent Technologies). The size of the spots differs from one system to another, but it is usually less than two hundred micrometers in diameter. A modified glass slide or glass wafer acts as the solid support onto which up to tens of thousands of spots can be arrayed in a total area of a few square centimeters. In contrast, DNA-chips are produced by a proprietary technology (GeneChip®, Affymetrix) quite different from spotting, as it is based on direct photolithographic synthesis of short oligonucleotides (20–25 bases) on the solid support.

Target labeling

Regardless of the kind of microarray used, the DNA probes present on the array are interrogated by nucleic acid hybridization with a labeled target. The sample may be mRNA for a gene expression study, or genomic DNA for other purposes (promoter usage analysis: ChIP-on-Chip; genomic rearrangements: FISH-on-Chip). The sample is converted to a labeled population of nucleic acids, known as the target. The target consists of several thousand different labeled nucleic acid fragments, and its complexity is much greater than that usually encountered in other routine molecular biology experiments. Therefore, these hybridizations should be carried out under conditions that do not promote annealing of non-complementary fragments. Fluorescent dyes, especially the cyanine dyes Cy3 and Cy5, have been widely adopted as the predominant labels in microarray analysis. Fluorescence has the advantage of permitting the detection of two or more different signals in one experiment, which has allowed investigators to perform comparative analysis of two or more samples on one microarray. This scheme is usually adopted for cDNA microarray analysis, while single-channel experiments are best suited to GeneChip® technology, thanks to the high manufacturing reproducibility of the chips. The use of fluorescence has also increased the accuracy and throughput of microarray analysis over filter-based macroarrays, in which only one radioactively labeled sample can be conveniently analyzed at a time.

Microarray hybridization

In a microarray hybridization, the labeled fragments in the target are expected to form duplexes with their immobilized complementary probes. This requires that the nucleic acids be single-stranded and accessible to each other. The number of duplexes formed reflects the relative number of each specific fragment in the target, as long as the immobilized nucleic acid probe is in excess and does not limit the kinetics of hybridization. Two or more samples labeled with different fluorescent dyes can be hybridized simultaneously, so that all samples hybridize at each spot at the same time. By measuring the different fluorescent signals associated with each feature, the relative abundance of specific sequences in each of the samples can be determined.

Scanning and data analysis

Microarray scanners typically contain two different lasers that emit light at wavelengths suitable for exciting the fluorescent dyes used as labels. A detector system attached to a confocal microscope records the light emitted from each feature of the array, permitting high-resolution detection of the hybridization signals. Alternative solutions use CCD-camera devices to detect the fluorescence. Despite their small size, microarrays generate a large amount of data even from a single hybridization. For this reason, computerized data processing is necessary to handle the data and to gain maximum information from the experiment. This is usually achieved with specialized software that extracts primary data from scanned microarray slide images, normalizes these data to remove the influence of experimental variation, and finally manipulates the data so that biologically meaningful conclusions can be drawn.

Applications of microarray analysis

The versatility of microarray analysis is confirmed by its rapid emergence as a general analytical technique in molecular biology. An increasing number of researchers are now exploiting this technology in diverse biomedical disciplines. Microarrays have not replaced established techniques, but rather represent a powerful approach to analyses that were previously time consuming. By using information derived from the several complete or near-complete genome sequences, including the human genome, it is now possible to perform genome-wide experiments using microarray technology. This has already been demonstrated for Saccharomyces cerevisiae, in which all the expressed genes are known (Chu, DeRisi et al. 1998; Spellman, Sherlock et al. 1998). Because millions of data points become available at once, microarrays have enabled global analysis of fundamental biological processes: gene expression analysis, genome analysis, and drug discovery have been three of the main areas in which microarray analysis has been applied so far.

Gene expression analysis

Gene expression analysis examines the composition of cellular messenger RNA populations. The identity of the transcripts that make up these populations, and their expression levels, are informative of the cell state and of the activity of the genes; because mRNAs are the precursors of translated proteins, changes in mRNA levels are related to changes in the proteome. In the simplest scheme, a typical microarray gene expression experiment compares the relative expression levels of specific transcripts in two samples. Usually one of the samples is a control, while the other is obtained from cells whose response or status is being explored. Each of the two samples is labeled with a different fluorescent dye, and equal amounts of the labeled samples are combined and hybridized with the microarray. After hybridization, two grayscale images (usually in 16-bit TIFF format) corresponding to the fluorescent signals of the two dyes are obtained independently by scanning the microarray, and the fluorescence intensity of each feature is subsequently quantified by dedicated software. After normalization, the intensities of the two hybridization signals can be compared: equal signal from both samples suggests equal expression of the gene in question in both samples, while a disparity between the signals suggests differential expression.
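The two-channel comparison described above is often summarized as a log ratio per feature. The sketch below is one simple way to do this, assuming a global median centering step as the normalization; the report does not specify a normalization method, so that choice, the function name, and the example intensities are all illustrative assumptions.

```python
import math
from statistics import median


def normalized_log_ratios(sample_intensities, control_intensities):
    """Per-feature log2(sample / control), centered on the array-wide median.

    Median centering assumes that most features are not differentially
    expressed; background subtraction and intensity-dependent corrections
    are deliberately left out of this sketch.
    """
    raw = [math.log2(s / c)
           for s, c in zip(sample_intensities, control_intensities)]
    center = median(raw)
    return [r - center for r in raw]


if __name__ == "__main__":
    # Hypothetical background-corrected intensities for three features.
    sample = [200.0, 400.0, 800.0]
    control = [100.0, 200.0, 100.0]
    # Raw log2 ratios are 1, 1, 3; after centering they become 0, 0, 2.
    print(normalized_log_ratios(sample, control))
```

A value near zero suggests equal expression in the two samples, while positive or negative values suggest higher or lower expression in the sample relative to the control.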

An important caveat is that microarray analysis does not give any information about absolute gene expression levels in the samples. This is because the intensity of the fluorescent signal depends not only on the number of hybridized fragments, but also on the length of these fragments and the number of fluorescent labels each fragment carries (the specific activity of the target, or labeling density). These parameters are determined by the unique nucleotide sequence of each transcript, so they vary from gene to gene. If the two samples have been labeled under similar conditions, the length and labeling density of specific transcripts will be similar, allowing comparison of the relative abundance of the transcripts in the analyzed targets. For these reasons, a strong hybridization signal does not necessarily correspond to a highly expressed gene: it could be derived, for instance, from a gene that is expressed at a relatively low level but yields highly labeled target fragments.

Gene expression analysis with microarrays has been applied to numerous mammalian tissues, plants, yeast, and bacteria (Braxton and Bedilion 1998; Mirnics 2001; Mirnics, Middleton et al. 2001; Schulze and Downward 2001; van Berkum and Holstege 2001). These studies have examined the effects of treating cells with chemicals and the consequences of over-expression of regulatory factors in transfected cells, and have compared mutant strains with parental strains to delineate functional pathways. In cancer research, microarrays have been used to find gene expression changes in transformed cells and metastases, to identify diagnostic markers, and to classify tumors based on their gene expression profiles (DeRisi, Penland et al. 1996; Alizadeh, Eisen et al. 1999; Alizadeh, Ross et al. 2001; Rew 2001; van 't Veer, Dai et al. 2002).

Genomic analysis

In addition to gene expression analysis, microarrays are now also established tools for genomic analysis (Shoemaker, Schadt et al. 2001). Microarrays can be used to reveal transcription factor interactions with specific sequences and motifs regulating gene expression. For example, by combining immunoprecipitation of transcription factor-DNA complexes with microarray identification of the DNA fragments on a genomic microarray, it was possible to identify functional regulatory elements in the yeast genome (Lieb, Liu et al. 2001). Furthermore, microarrays have been used to predict splice variants of transcripts and to investigate genomic fragments derived from genetic analysis methods, such as genomic mismatch scanning and representational difference analysis (Hu, Madore et al. 2001; Meltzer 2001), and specific oligonucleotide microarrays have been applied to the analysis of known single nucleotide polymorphisms (SNPs) and mutations (Sapolsky, Hsie et al. 1999; Larsen, Christiansen et al. 2001). Moreover, microarray hybridization can also be used to sequence DNA samples, thus providing a suitable means of identifying new genetic variants (Drobyshev, Mologina et al. 1997).

Drug discovery

A typical drug discovery process requires several years of research, and only a few candidate compounds eventually become approved drugs. For this reason, methods that increase the efficiency of the process and improve the probability of developing effective drugs are needed. From this perspective, microarray analysis has proved useful at different stages of drug discovery (Lockhart and Winzeler 2000; Meltzer 2001; van Berkum and Holstege 2001). For instance, the identification of potential therapeutic compounds can be achieved by elucidating metabolic pathways through the search for co-expressed genes. Once drug candidates have been selected, microarrays can be used to define their toxic properties by examining the expression profiles induced by drug treatment (Jain 2000). Moreover, the gene expression changes elicited by different drug treatments have also recently been used to identify their mechanisms of action (Jain 2000).

Specific Gene Expression Based Assays

Oncotype DX™ process

Oncotype DX is a multi-gene assay designed to provide a quantitative assessment of the likelihood of distant recurrence of breast cancer. It is offered by Genomic Health, where the assay was developed. The assay comprises the following steps: RNA is extracted and purified from the tumor specimen; the expression levels of 21 genes (16 cancer-related genes and 5 control genes) are measured by RT-PCR; and the Recurrence Score™ is calculated from the gene expression results.

In the current implementation of the assay, a pathologist at Genomic Health reviews the tumor content of the specimens to be processed; RNA is then extracted from formalin-fixed, paraffin-embedded (FFPE) specimens, and contaminant DNA is removed by DNase I treatment. Total RNA yield is measured and the absence of DNA contamination is verified. Real-time RT-PCR is then performed with TaqMan® technology in 384-well plates. The expression of the 16 cancer-related genes is measured in triplicate and then normalized to the expression levels of the 5 reference genes. Finally, the normalized expression levels of the 16 cancer-related genes are used to compute the Recurrence Score (RS), on a scale from 1 to 100. Clinical studies showed that the RS correlates with the likelihood of distant recurrence at 10 years, which increases continuously as the RS increases; however, three distinct risk groups were defined: low risk (RS < 18), intermediate risk (RS 18–30), and high risk (RS ≥ 31) (Paik, Shak et al. 2004). The Oncotype DX test is offered to patients who meet the following criteria:

  • Newly diagnosed;

  • Will be treated with tamoxifen;

  • Stage I invasive breast cancer, ER positive;

  • Stage II invasive breast cancer, ER positive and lymph node negative.
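
The risk-group cutoffs cited above (Paik, Shak et al. 2004) can be illustrated with a minimal sketch. The function name is hypothetical, and the handling of non-integer scores between 30 and 31 (treated here as intermediate risk) is an assumption, since the cutoffs are stated only for whole-number scores. The sketch does not reproduce how the Recurrence Score itself is computed.

```python
def recurrence_risk_group(rs: float) -> str:
    """Map a Recurrence Score (RS) to the risk group cited in the text.

    Cutoffs from the text: low risk (RS < 18), intermediate risk (RS 18-30),
    high risk (RS >= 31). Assumption: a score between 30 and 31 is treated
    as intermediate risk.
    """
    if rs < 18:
        return "low"
    if rs < 31:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    for score in (10, 18, 30, 31, 55):
        print(score, recurrence_risk_group(score))
```

Because the likelihood of distant recurrence rises continuously with the RS, the three labels are a convenience for reporting rather than a substitute for the score itself.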

MammaPrint®

MammaPrint is a multi-gene, microarray-based diagnostic assay designed to provide a quantitative prediction of the risk of metastasis in breast cancer patients. The assay measures in triplicate the expression levels of 70 distinct genes, which were originally identified in research performed at the Netherlands Cancer Institute (Amsterdam, The Netherlands) (van 't Veer, Dai et al. 2002; van de Vijver, He et al. 2002). Patients are divided into two risk groups with different prognoses by measuring the cosine correlation between the 70-gene expression profile of each individual patient and the original signature, and comparing it with a pre-specified threshold.
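
The classification step described above can be sketched as follows. This is an illustration of cosine correlation against a template under stated assumptions, not the vendor's implementation: the function names, the `template` vector, the `threshold` value, and the direction of the rule (correlation above the threshold assigned to the lower-risk group) are all hypothetical and are not specified in this report.

```python
import math
from typing import Sequence


def cosine_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two expression vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of genes")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("profiles must not be all zeros")
    return dot / (norm_x * norm_y)


def classify_profile(profile: Sequence[float],
                     template: Sequence[float],
                     threshold: float) -> str:
    """Assign one of two risk groups from the correlation with a template.

    Assumption: a correlation above the (hypothetical) threshold is labeled
    lower risk; the actual threshold and direction are defined by the assay.
    """
    if cosine_correlation(profile, template) > threshold:
        return "low risk"
    return "high risk"


if __name__ == "__main__":
    # Toy 4-gene vectors; the real signature has 70 genes.
    template = [1.0, -0.5, 0.8, -1.2]
    patient = [0.9, -0.4, 0.7, -1.0]
    print(classify_profile(patient, template, threshold=0.5))
```

Cosine correlation depends only on the direction of the profile vector, not its magnitude, so a uniform scaling of all measurements leaves the classification unchanged.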

The CE-marked, FDA-cleared assay is offered by the certified (QSR/GMP, ISO 17025, CLIA (#99D1030869) and CAP) Agendia laboratory (Amsterdam, The Netherlands), with the following features:

  • The assay is performed on fresh (non-FFPE) specimens;

  • A validated method for sampling and transporting fresh tissue at ambient temperature;

  • A histologic review of the shipped specimens;

  • RNA extraction and quality evaluation prior to microarray analysis;

  • Triplicate gene expression measurements and duplicate sample measurements, in a dye-swap design;

  • Use of a constant, standardized reference RNA in each hybridization.

The MammaPrint test is offered to patients who meet the following criteria:

  • Below age 61;

  • Stage I invasive breast cancer with ER positive or ER negative;

  • Stage II invasive breast cancer with ER positive or ER negative and lymph node negative;

  • Tumor size less than 5 cm.

Breast Cancer Profiling (BCP or H/I ratio)

The Breast Cancer Profiling (BCP) assay is based on the two-gene expression index (HOXB13/IL17BR) developed by Ma and colleagues (Ma, Wang et al. 2004; Ma, Hilsenbeck et al. 2006). Expression levels of the two genes are measured by real-time RT-PCR and normalized to a specific set of reference genes before the index is computed. The two-gene index is a continuous marker of recurrence risk in untreated ER-positive, node-negative patients. This assay is licensed by AviaraDX to Quest Diagnostics, and it is offered as a laboratory service, with the following features:

  • The assay is performed on FFPE specimens;

  • Laser capture microdissection is performed if the specimen content is <30% cancer cells;

  • RNA preparation and quality evaluation;

  • Real-time PCR analysis of HOXB13 and IL-17BR gene expression;

  • Formulation of the normalized two-gene expression index;

  • Result formulation with 5-year recurrence risk.

The BCP assay is offered to patients who meet the following criteria:

  • Treatment-naïve individuals with ER-positive/lymph node-negative breast cancer
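
The idea of a normalized two-gene index can be sketched as follows. This is an illustrative calculation only: the report does not give the formula used by the commercial assay, so the delta-Ct normalization, the assumption that product doubles with each PCR cycle, and all function and parameter names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def normalized_log2_expression(gene_ct: float,
                               reference_cts: Sequence[float]) -> float:
    """Delta-Ct normalization (assumed): mean reference Ct minus gene Ct.

    On this log2 scale a higher value means higher expression, assuming
    the amount of product doubles with each PCR cycle.
    """
    return mean(reference_cts) - gene_ct


def two_gene_index(hoxb13_ct: float,
                   il17br_ct: float,
                   reference_cts: Sequence[float]) -> float:
    """Log2 ratio of normalized HOXB13 to IL17BR expression (illustrative)."""
    hoxb13 = normalized_log2_expression(hoxb13_ct, reference_cts)
    il17br = normalized_log2_expression(il17br_ct, reference_cts)
    return hoxb13 - il17br


if __name__ == "__main__":
    # Made-up Ct values for illustration only.
    print(two_gene_index(hoxb13_ct=25.0, il17br_ct=27.0,
                         reference_cts=[20.0, 22.0]))
```

In this simple form the reference term cancels in the difference; normalization still matters in practice for quality control and for any calibration applied to each gene separately, which this sketch does not model.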

Bibliography
Alizadeh A, Eisen M. et al. The lymphochip: a specialized cDNA microarray for the genomic-scale analysis of gene expression in normal and malignant lymphocytes. Cold Spring Harb Symp Quant Biol. (1999); 64: 71–8.
Alizadeh AA, Ross DT. et al. Towards a novel classification of human malignancies based on gene expression patterns. J Pathol. (2001); 195(1): 41–52.
Baldwin D, Crane V. et al. A comparison of gel-based, nylon filter and microarray techniques to detect differential RNA expression in plants. Curr Opin Plant Biol. (1999); 2(2): 96–103.
Braxton S, Bedilion T. The integration of microarray information in the drug development process. Curr Opin Biotechnol. (1998); 9(6): 643–9.
Chu S, DeRisi J. et al. The transcriptional program of sporulation in budding yeast. Science. (1998); 282(5389): 699–705.
DeRisi J, Penland L. et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet. (1996); 14(4): 457–60.
Drobyshev A, Mologina N. et al. Sequence analysis by hybridization with oligonucleotide microchip: identification of beta-thalassemia mutations. Gene. (1997); 188(1): 45–52.
Hu GK, Madore SJ. et al. Predicting splice variant from DNA chip expression data. Genome Res. (2001); 11(7): 1237–45.
Jain KK. Applications of biochip and microarray systems in pharmacogenomics. Pharmacogenomics. (2000); 1(3): 289–307.
Larsen LA, Christiansen M. et al. Recent developments in high-throughput mutation screening. Pharmacogenomics. (2001); 2(4): 387–99.
Lieb JD, Liu X. et al. Promoter-specific binding of Rap1 revealed by genome-wide maps of protein-DNA association. Nat Genet. (2001); 28(4): 327–34.
Lockhart DJ, Winzeler EA. Genomics, gene expression and DNA arrays. Nature. (2000); 405(6788): 827–36.
Ma XJ, Hilsenbeck SG. et al. The HOXB13:IL17BR expression index is a prognostic factor in early-stage breast cancer. J Clin Oncol. (2006); 24(28): 4611–9.
Ma XJ, Wang Z. et al. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell. (2004); 5(6): 607–16.
Meltzer PS. Spotting the target: microarrays for disease gene discovery. Curr Opin Genet Dev. (2001); 11(3): 258–63.
Mirnics K. Microarrays in brain research: the good, the bad and the ugly. Nat Rev Neurosci. (2001); 2(6): 444–7.
Mirnics K, Middleton FA. et al. Analysis of complex brain disorders with gene expression microarrays: schizophrenia as a disease of the synapse. Trends Neurosci. (2001); 24(8): 479–86.
Paik S, Shak S. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. (2004); 351(27): 2817–26.
Rew DA. DNA microarray technology in cancer research. Eur J Surg Oncol. (2001); 27(5): 504–8.
Sapolsky RJ, Hsie L. et al. High-throughput polymorphism screening and genotyping with high-density oligonucleotide arrays. Genet Anal. (1999); 14(5–6): 187–92.
Schena M, Heller RA. et al. Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. (1998); 16(7): 301–6.
Schulze A, Downward J. Navigating gene expression using microarrays—a technology review. Nat Cell Biol. (2001); 3(8): E190–5.
Shoemaker DD, Schadt EE. et al. Experimental annotation of the human genome using microarray technology. Nature. (2001); 409(6822): 922–7.
Spellman PT, Sherlock G. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. (1998); 9(12): 3273–97.
van 't Veer LJ, Dai H. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. (2002); 415(6871): 530–6.
van Berkum NL, Holstege FC. DNA microarrays: raising the profile. Curr Opin Biotechnol. (2001); 12(1): 48–52.
van de Vijver MJ, He YD. et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. (2002); 347(25): 1999–2009.
Watson A, Mazumder A. et al. Technology for microarray analysis of gene expression. Curr Opin Biotechnol. (1998); 9(6): 609–14.

Appendix E: Technical Experts and Peer Reviewers

Technical Experts and Peer Reviewers

  • Katrina Armstrong, M.D., M.S.C.E.*
    Director of Research
    Leonard Davis Institute of Health Economics
    University of Pennsylvania School of Medicine
    Philadelphia, PA

  • Richard A. Bender, M.D., F.A.C.P.
    Medical Director-Oncology
    Quest Diagnostics Nichols Institute-SJC
    San Juan Capistrano, CA

  • Lyndsay N. Harris, M.D.
    Assoc Prof Med
    Dir Breast Cancer Disease Unit and Int Med
    Medical Oncology
    333 Cedar St
    New Haven, CT

  • Daniel F. Hayes, M.D.
    University of Michigan
    Comprehensive Cancer Center
    Ann Arbor, MI

  • Charles Perou, Ph.D.
    University of North Carolina, Chapel Hill
    Department of Genetics
    Chapel Hill, NC

  • Kathryn Phillips, Ph.D.*
    Prof. of Health Economics and Health Services Research
    School of Pharmacy, Institute for Health Policy Studies, and UCSF Comprehensive Cancer Center
    University of California, San Francisco
    San Francisco, CA

  • Margaret Piper, M.P.H., Ph.D.*
    Associate Director
    Blue Cross/Blue Shield Association
    Technology Evaluation Center

  • Steve Teutsch, M.D., M.P.H.*
    Executive Director of Outcomes Research
    Merck & Co., Inc.
    West Point, PA

  • Steven Shak, M.D.
    Chief Medical Officer
    Genomic Health
    Redwood City, CA

  • Richard Simon, D.Sc.
    NCI
    Chief, Biometric Research Branch
    Rockville, MD

  • Laura van't Veer, Ph.D.
    Netherlands Cancer Institute
    Amsterdam
    The Netherlands

Internal Technical Experts

  • Giovanni Parmigiani, M.S., Ph.D.
    Johns Hopkins University
    School of Medicine, Oncology Bioinformatics
    550 Building, 11-03
    Baltimore, MD

  • Antonio C. Wolff, M.D.
    Johns Hopkins University
    School of Medicine, Oncology Center
    CRB 189
    1650 Orleans Street
    Baltimore, MD

Appendix F: Detailed Electronic Database Search Strategies

MEDLINE Strategy

(((“breast neoplasms”[mh] OR “breast cancer”[tiab] OR (breast[tiab] AND neoplasm[tiab])) AND ((Gene[tiab] AND expression[tiab]) OR “gene expression profiling”[mh] OR “gene expression”[mh]) AND 1990 : 2007[dp] AND Eng[lang]) NOT((animals[mh]NOT humans[mh]) OR review[pt])) NOT Tumor Cells, Cultured[mh]

Results: 3356

Cochrane Library (Reviews and CENTRAL) Strategy

(“breast neoplasms” or “breast cancer”:ti,ab,kw or (breast AND cancer):ti,ab,kw or (breast AND neoplasm):ti,ab,kw) AND (“gene expression profiling” or “gene expression” or “gene expression” AND profiling:ti,ab,kw or “gene expression” AND (test OR tesing):ab)

Results: 55

EMBASE Strategy

((((‘breast tumor’/exp) OR (breast:ti,ab AND cancer:ti,ab)) AND (((‘gene expression’/exp) OR (‘gene expression profiling’/exp)) OR (‘gene expression’:ab,ti AND profiling:ab,ti))) NOT ((‘cell culture’/exp) OR (‘validation study’/exp) OR (apoptosis:ab,ti) OR (‘cell death’:ab,ti) OR (transcriptional:ti,ab AND mechanism:ti,ab) OR (transcriptional:ti,ab AND machinery:ti,ab)) AND [english]/lim AND [humans]/lim AND [1990-2007]/py) NOT (review:it)

Results: 7531

CINAHL

((MH “breast neoplasms” or TX “breast cancer” ) OR (TX ( Breast AND cancer ) or TX ( breast AND neoplasm ))) AND (TX “gene expression profiling” or TX ( “gene expression” AND profiling ) or TX ( gene AND profiling ) )

Results: 73

MEDLINE (targeted authors search)

(((van't veer LJ[au] OR Dai H[au] OR van de vijver MJ[au] OR He YD[au] OR Hart AM[au] OR Hart AA[au] OR Mao M[au] OR Peterse HL[au] OR van der kooy K[au] OR Marton MJ[au] OR Witteveen AT[au] OR Schreiber GJ[au] OR Kerkhoven RM[au] OR Roberts C[au] OR Linsley PS[au] OR Bernards R[au] OR Friend SH[au] OR Voskuil DW[au] OR Parrish M[au] OR Atsma D[au] OR Witteveen A[au] OR Glas A[au] OR Delahaye L[au] OR van der velde T[au] OR Bartelink H[au] OR Rodenhuis S[au] OR Rutgers ET[au]) OR (paik S[au] OR shak S[au] OR Tang G[au] OR Kim C[au] OR Baker J[au] OR Cronin M[au] OR baehner FL[au] OR walker MG[au] OR Watson D[au] OR Park T[au] OR Hiller W[au] OR Fisher ER[au] OR Wickerham DL[au] OR Bryant J[au] OR Wolmark N[au]) OR (Ma XJ[au] OR Wang Z[au] OR Ryan PD[au] OR Isakoff SJ[au] OR Barmettler A[au] OR Fuller A[au] OR Muir B[au] OR Mohapatra G[au] OR Salunga R[au] OR Tuggle JT[au] OR Tran Y[au] OR tran D[au] OR Tassin A[au] OR Amon P[au] OR Wang W[au] OR Enright E[au] OR Stecker K[au] OR Estepa-Sabal E[au] OR Smith B[au] OR Younger J[au] OR Balis U[au] OR Michaelson J[au] OR bhan A[au] OR Habin K[au] OR Baer TM[au] OR Brugge J[au] OR Haber AH[au] OR Erlander MG[au] OR Sgroi DC[au])) AND gene[tw]) AND (((Gene[tiab] AND expression[tiab]) OR “gene expression profiling”[mh] OR “gene expression”[mh]) AND 1990: 2007[dp] AND Eng[lang] NOT (animals[mh] NOT humans[mh]) OR review[pt])

Results: 1947
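
A long author clause like the one in the targeted search above is easier to maintain when it is generated from a list of names. The sketch below is a hypothetical helper that only assembles a query string in the `[au]` field-tag style shown above; it does not submit a search, and the function names and the three-author example are illustrative assumptions rather than the procedure used for this report.

```python
from typing import Iterable


def author_clause(authors: Iterable[str]) -> str:
    """Join author names into a parenthesized OR clause with [au] tags."""
    return "(" + " OR ".join(f"{name}[au]" for name in authors) + ")"


def targeted_author_query(author_groups: Iterable[Iterable[str]],
                          topic_term: str) -> str:
    """Combine several author groups with OR, then AND a topic term."""
    groups = " OR ".join(author_clause(group) for group in author_groups)
    return f"(({groups}) AND {topic_term})"


if __name__ == "__main__":
    query = targeted_author_query(
        [["van't veer LJ", "Dai H"], ["Paik S"]],
        "gene[tw]",
    )
    print(query)
```

Generating the clause from a list makes it straightforward to spot duplicated or misspelled author names before the search is run.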

Appendix G: Title Review Forms

van 't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415(6871):530–6.

Does this article POTENTIALLY apply to the key questions?

() POTENTIALLY eligible

() INELIGIBLE

Abstract Review Form

  • 1

    Record ID: 1781

van 't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415(6871):530–6.

ABSTRACT: Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcome. The strongest predictors for metastases (for example, lymph node status and histological grade) fail to classify accurately breast tumours according to their clinical behaviour. Chemotherapy or hormonal therapy reduces the risk of distant metastases by approximately one-third; however, 70–80% of patients receiving this treatment would have survived without it. None of the signatures of breast cancer gene expression reported to date allow for patient-tailored therapy strategies. Here we used DNA microarray analysis on primary breast tumours of 117 young patients, and applied supervised classification to identify a gene expression signature strongly predictive of a short interval to distant metastases (‘poor prognosis’ signature) in patients without tumour cells in local lymph nodes at diagnosis (lymph node negative). In addition, we established a signature that identifies tumours of BRCA1 carriers. The poor prognosis signature consists of genes regulating cell cycle, invasion, metastasis and angiogenesis. This gene expression profile will outperform all currently used clinical parameters in predicting disease outcome. Our findings provide a strategy to select patients who would benefit from adjuvant therapy.

Should this article be REVIEWED? (choose one)

[1] YES: indicate the questions that this article might apply to (below)

This article potentially applies to the following key questions (choose all that apply)

  • 1

    What is the direct evidence that the Mammaprint or OncotypeDX gene expression profiling tests in women diagnosed with breast cancer (or any specific subset of this population) lead to improvement in outcomes?

  • 2

    What are the sources of and contributions to analytic variability in these two gene expression-based prognostic estimators for women diagnosed with breast cancer?

  • 3

    What is the clinical validity of these tests in women diagnosed with breast cancer?

    • a

      How well does this testing predict recurrence rates for breast cancer compared to standard prognostic approaches? Specifically, how much do these tests add to currently known factors or combination indices that predict the probability of breast cancer recurrence, (e.g., tumor type or stage, age, estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER2) status)?

    • b

      Are there any other factors, which may not be components of standard predictors of recurrence (e.g., race/ethnicity or adjuvant therapy), that affect the clinical validity of these tests, and thereby generalizability of results to different populations?

  • 4

    What is the clinical utility of these tests?

    • a

      To what degree do the results of these tests predict the response to chemotherapy, and what factors affect the generalizability of that prediction?

    • b

      What are the effects of using these two tests and the subsequent management options on the following outcomes: testing- or treatment-related psychological harms, testing- or treatment-related physical harms, disease recurrence, mortality, utilization of adjuvant therapy, and medical costs?

    • c

      What is known about the utilization of Mammaprint and OncotypeDX gene expression profiling in women diagnosed with breast cancer in the United States?

    • d

      What projections have been made in published analyses about the cost-effectiveness of using Mammaprint and OncotypeDX gene expression profiling in women diagnosed with breast cancer?

[2] Unclear/No abstract (promote to article review)

[3] NOT eligible (exclude): indicate reason for exclusion (below)

Reason for EXCLUSION? (choose any that apply)

[1] Study applies only to breast cancer biology

[2] Study only applies to single or multiple gene predictors and does not involve OncotypeDX or Mammaprint profiles

[3] Does not involve OncotypeDX or Mammaprint gene expression profiling tests

[4] Does not involve original data or original data analysis

[5] Does not involve women

[6] Does not involve breast cancer patients

[7] Not English language

[8] Does not apply to the key questions

[9] OTHER______________

[10] Unclear

[4] No, may be useful for BACKGROUND material (pull for hand searching if published in 2002 or later)

Article Review Form

ARTICLE inclusion/exclusion

Record ID: 750

Reid, J. F., Lusa, L., De Cecco, L., Coradini, D., Veneroni, S., Daidone, M. G., Gariboldi, M., and Pierotti, M. A. Limits of predictive models using microarray data for breast cancer clinical treatment outcome. Journal of the National Cancer Institute 2005;97(12):927–30.

ABSTRACT:

Should this article be REVIEWED? (choose one)

[1] YES: indicate the questions that this article might apply to (below)

This article potentially applies to the following key questions (choose all that apply)

  • 1

    What is the direct evidence that the Mammaprint or OncotypeDX gene expression profiling tests in women diagnosed with breast cancer (or any specific subset of this population) lead to improvement in outcomes?

  • 2

    What are the sources of and contributions to analytic variability in these two gene expression-based prognostic estimators for women diagnosed with breast cancer?

  • 3

    What is the clinical validity of these tests in women diagnosed with breast cancer?

    • a

      How well does this testing predict recurrence rates for breast cancer compared to standard prognostic approaches? Specifically, how much do these tests add to currently known factors or combination indices that predict the probability of breast cancer recurrence, (e.g., tumor type or stage, age, estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER2) status)?

    • b

      Are there any other factors, which may not be components of standard predictors of recurrence (e.g., race/ethnicity or adjuvant therapy), that affect the clinical validity of these tests, and thereby generalizability of results to different populations?

  • 4

    What is the clinical utility of these tests?

    • a

      To what degree do the results of these tests predict the response to chemotherapy, and what factors affect the generalizability of that prediction?

    • b

      What are the effects of using these two tests and the subsequent management options on the following outcomes: testing- or treatment-related psychological harms, testing- or treatment-related physical harms, disease recurrence, mortality, utilization of adjuvant therapy, and medical costs?

    • c

      What is known about the utilization of Mammaprint and OncotypeDX gene expression profiling in women diagnosed with breast cancer in the United States?

    • d

      What projections have been made in published analyses about the cost-effectiveness of using Mammaprint and OncotypeDX gene expression profiling in women diagnosed with breast cancer?

[2] Unclear/No abstract (promote to article review)

[3] NOT eligible (exclude): indicate reason for exclusion (below)

Reason for EXCLUSION? (choose any that apply)

[1] Study applies only to breast cancer biology

[2] Study only applies to single or multiple gene predictors and does not involve OncotypeDX or Mammaprint profiles

[3] Does not involve OncotypeDX or Mammaprint gene expression profiling tests

[4] Does not involve original data or original data analysis

[5] Does not involve women

[6] Does not involve breast cancer patients

[7] Not English language

[8] Does not apply to the key questions

[9] OTHER______________

[10] Unclear

[4] No, may be useful for BACKGROUND material (pull for hand searching if published in 2002 or later)

Data Abstraction Tables

Population Characteristics

Study, Year | Intervention | General Characteristics | Diagnosis(es) | Treatments and Outcomes

Study Design

Study, Year | Country | Study period (data collection period) | Study Type | Population size, N | Blinded (Y/N) | Study purpose

Data Extraction Tables

Clinical Validity/Utility

Study, year | Context | Methods | Results | Conclusions

Analytic Validity

Study, year | Measure | Conclusions

Quality Assessment Matrix

Section | Measure
Patients
  • Describes population characteristics

  • Describes participant recruitment

  • Describes participant sampling

  • Describes inclusion/exclusion criteria

  • Describes treatments received

  • Describes randomization.

Materials and Methods
  • Describes the reference standard.

  • Describes technical specifications of material and methods involved.

  • Describes type of biological material used (including control samples).

  • Includes definition of and rationale for the units, cutoffs and/or categories of the results of the index tests and the reference standard.

  • Describes blinding.

  • Describes methods for calculating or comparing measures.

  • Describes methods for calculating test reproducibility.

  • Describes methods of preservation and storage

  • Specifies the assay method used and provides (or references) a detailed protocol, including specific reagents or kits used, quality control procedures, reproducibility assessments, quantitation methods, and scoring and reporting protocols.

Results
  • Describes the flow of patients through the study, including the number of patients included in each stage of the analysis (both overall and for each subgroup extensively examined).

  • Describes distributions of basic demographic characteristics (at least age and sex), standard (disease-specific) prognostic variables, and tumor marker, including numbers of missing values.

  • Presents univariate analyses showing the relation between the marker and outcome, with the estimated effect (e.g. hazard ratio and survival probability).

  • Provides similar analyses for all other variables being analyzed (for the effect of a tumor marker on a time-to-event outcome, a Kaplan-Meier plot is recommended).

  • For key multivariable analyses, reports estimated effects (e.g. hazard ratio) with confidence intervals for the marker and, at least for the final model, all other variables in the model.

  • Provides estimated effects with confidence intervals from an analysis in which the marker and standard prognostic variables are included, regardless of their significance.

Appendix H: Excluded Articles

References

Blanchard A, Shiu R, Booth S. et al. Gene expression profiling of early involuting mammary gland reveals novel genes potentially relevant to human breast cancer. Front Biosci. 2007; 12: 2221–32.
Charafe-Jauffret E, Ginestier C, Monville F. et al. Gene expression profiling of breast cell lines identifies potential new basal markers. Oncogene. 2006; 25(15): 2273–84.
Dolled-Filhart M, Ryden L, Cregger M. et al. Classification of breast cancer using genetic algorithms and tissue microarrays. Clin Cancer Res. 2006; 12(21): 6459–68.
Drubin D, Smith JS, Liu W. et al. Comparison of cryopreservation and standard needle biopsy for gene expression profiling of human breast cancer specimens. Breast Cancer Res Treat. 2005; 90(1): 93–6.
Eden P, Ritz C, Rose C. et al. “Good Old” clinical markers have similar power in breast cancer prognosis as microarray gene expression profilers. Eur J Cancer. 2004; 40(12): 1837–41.
Ein-Dor L, Zuk O, Domany E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Nat Acad Sci U S A. 2006; 103(15): 5923–8.
Espinosa E, Vara JA, Redondo A. et al. Breast cancer prognosis determined by gene expression profiling: a quantitative reverse transcriptase polymerase chain reaction study. J Clin Oncol. 2005; 23(29): 7278–85.
Fielden MR, Chen I, Chittim B. et al. Examination of the estrogenicity of 2,4,6,2′,6′-pentachlorobiphenyl (PCB 104), its hydroxylated metabolite 2,4,6,2′,6′-pentachloro-4-biphenylol (HO-PCB 104), and a further chlorinated derivative, 2,4,6,2′,4′,6′-hexachlorobiphenyl (PCB 155). Environ Health Perspect. 1997; 105(11): 1238–48.
Girault I, Lerebours F, Amarir S. et al. Expression analysis of estrogen receptor (alpha) coregulators in breast carcinoma: Evidence that NCOR1 expression is predictive of the response to tamoxifen. Clin Cancer Res. 2003; 9(4): 1259–1266.
Gradishar WJ. Hormone therapy in postmenopausal women with breast cancer. Adv Stud Med. 2005; 5(9 B): S817–S822.
Jansen MP, Foekens JA, van Staveren IL. et al. Molecular classification of tamoxifen-resistant breast carcinomas by gene expression profiling. J Clin Oncol. 2005; 23(4): 732–40.
Kaklamani V. A genetic signature can predict prognosis and response to therapy in breast cancer: Oncotype DX. Expert Rev Mol Diag. 2006; 6(6): 803–9.
Kominsky SL. Claudins: Emerging targets for cancer therapy. Expert Rev Mol Med. 2006; 8(18): 1–11.
Kroll T, Odyvanova L, Clement JH. et al. Molecular characterization of breast cancer cell lines by expression profiling. J Cancer Res Clin Oncol. 2002; 128(3): 125–134.
Lipka C, Mankertz J, Fromm M. et al. Impairment of the antiproliferative effect of glucocorticosteroids by 11(beta)-hydroxysteroid dehydrogenase type 2 overexpression in MCF-7 breast-cancer cells. Horm Metabol Res. 2004; 36(7): 437–444.
Master SR, Chodosh LA. Evolving views of involution. Breast Cancer Res. 2004; 6(2): 89–92.
Mauriac L, Debled M, MacGrogan G. When will more useful predictive factors be ready for use? Breast. 2005; 14(6): 617–623.
Molist R, Remvikos Y, Dutrillaux B. et al. Characterization of a new cytogenetic subtype of ductal breast carcinomas. Oncogene. 2004; 23(35): 5986–5993.
Naderi A, Teschendorff AE, Barbosa-Morais NL. et al. A gene-expression signature to predict survival in breast cancer across independent data sets. Oncogene. 2007; 26(10): 1507–16.
Nuyten DS, Kreike B, Hart AA. et al. Predicting a local recurrence after breast-conserving therapy by gene expression profiling. Breast Cancer Res. 2006; 8(5): R62.
Oehr P. Proteomics as a tool for detection of nuclear matrix proteins and new biomarkers for screening of early tumors stage. Anticancer Res. 2003; 23(2 A): 805–812.
Paik S, Kim CY, Song YK. et al. Technology insight: Application of molecular techniques to formalin-fixed paraffin-embedded tissues from breast cancer. Nat Clin Pract Onco. 2005; 2(5): 246–54.
Pusztai L. Oncogenomics 2005 - Dissecting cancer through genome research: 2–6 February 2005, San Diego, CA, USA. IDrugs. 2005; 8(3): 215–218.
Ray ME, Yang ZQ, Albertson D. et al. Genomic and Expression Analysis of the 8p11–12 Amplicon in human breast cancer cell lines. Cancer Res. 2004; 64(1): 40–47.
Rha SY, Jeung HC, Yang WI. et al. Alteration of hTERT full-length variant expression level showed different gene expression profiles and genomic copy number changes in breast cancer. Oncol Rep. 2006; 15(4): 749–755.
Rodningen OK, Overgaard J, Alsner J. et al. Microarray analysis of the transcriptional response to single or multiple doses of ionizing radiation in human subcutaneous fibroblasts. Radiother Oncol. 2005; 77(3): 231–40.
Rody A, Karn T, Gatje R. et al. Gene expression profiles of breast cancer obtained from core cut biopsies before neoadjuvant docetaxel, adriamycin, and cyclophosphamide chemotherapy correlate with routine prognostic markers and could be used to identify predictive signatures. Zentralblatt fur Gynakologie. 2006; 128(2): 76–81.
Ross DT, Perou CM. A comparison of gene expression signatures from breast tumors and breast tissue derived cell lines. Dis Markers. 2001; 17(2): 99–109.
Rugo HS. Oncotype DX predicts tamoxifen benefits in ER+ breast cancer. Comment. Oncol Rep. 2005;-(FALL):18–19.
Schneider J, Buness A, Huber W. et al. Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use in microarray experiments. BMC Genomics. 2004; 5(1): 29.
Sengupta K, Banerjee S, Saxena NK. et al. Differential expression of VEGF-A mRNA by 17(beta)-estradiol in breast tumor cells lacking classical ER-(alpha) may be mediated through a variant form of ER-(alpha). Mol Cell Biochem. 2004; 262(1–2): 215–224.
Severgnini M, Bicciato S, Mangano E. et al. Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment. Anal Biochem. 2006; 353(1): 43–56.
Sgroi DC, Haber DA, Ryan PD. et al. RE: A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell. 2004; 6(5): 445.
Sun Y, Goodison S, Li J. et al. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics. 2007; 23(1): 30–7.
Timar J, Ladanyi A, Petak I. et al. Molecular pathology of tumor metastasis III. Target array and combinatorial therapies. Pathology Oncology Research. 2003; 9(1): 49–72.
Verlinden I, Gungor N, Wouters K. et al. Parity-induced changes in global gene expression in the human mammary gland. Eur J Cancer Prev. 2005; 14(2): 129–37.
Weigelt B, Hu Z, He X. et al. Molecular portraits and 70-gene prognosis signature are preserved throughout the metastatic process of breast cancer. Cancer Res. 2005; 65(20): 9155–8.
Weil MR, Widlak P, Minna JD. et al. Global survey of chromatin accessibility using DNA microarrays. Genome Res. 2004; 14(7): 1374–1381.
Weisz A, Basile W, Scafoglio C. et al. Molecular identification of ERalpha-positive breast cancer cells by the expression profile of an intrinsic set of estrogen regulated genes. J Cell Physiol. 2004; 200(3): 440–450.
Welsh J, Wietzke JA, Zinser GM. et al. Vitamin D-3 receptor as a target for breast cancer prevention. J Nutr. 2003; 133(7 SUPPL.): 2425S–2433S.
West RB, Nuyten DS, Subramanian S. et al. Determination of stromal signatures in breast carcinoma. PLoS Biol. 2005; 3(6): e187.
Yared MA, Middleton LP, Bernstam FM. et al. Expression of c-kit proto-oncogene product in breast tissue. Breast J. 2004; 10(4): 323–327.
Zhang JY, Casiano CA, Peng XX. et al. Enhancement of antibody detection in cancer using panel of recombinant tumor-associated antigens. Cancer Epidemiol Biomark Prev. 2003; 12(2): 136–143.

Appendix I: Evidence Tables

Footnotes
a

Appendixes cited in this report are provided electronically at: http://www.ahrq.gov/clinic/tp/brcgenetp.htm

*

Evaluation of Genomic Applications in Practice and Prevention (EGAPP) working group member.
