The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The Centers for Disease Control and Prevention (CDC) requested and provided funding for this report. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.
To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.
AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.
We welcome comments on this evidence report. They may be sent by mail to the Task Order Officer named below at: Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850, or by e-mail to epc@ahrq.gov.
Carolyn M. Clancy, M.D.
Director
Agency for Healthcare Research and Quality
Julie Louise Gerberding, M.D., M.P.H.
Director
Centers for Disease Control and Prevention
Jean Slutsky, P.A., M.S.P.H.
Director, Center for Outcomes and Evidence
Agency for Healthcare Research and Quality
Beth A. Collins Sharp, Ph.D., R.N.
Director, EPC Program
Agency for Healthcare Research and Quality
Gurvaneet Randhawa, M.D., M.P.H.
EPC Program Task Order Officer
Agency for Healthcare Research and Quality
The authors gratefully acknowledge R. Julian Irvine for assistance with project management and copy-editing, Udita Patel for assistance with project management, and Dr. Gurvaneet Randhawa, AHRQ Task Order Officer, for overall assistance.
The authors also acknowledge the contributions of Dr. Ralph Coates of the Centers for Disease Control and Prevention's (CDC) Division of Cancer Prevention and Control and Dr. Linda Bradley in the CDC's Office of Genomics.
Objective: To assess the evidence that the use of genomic tests for ovarian cancer screening, diagnosis, and treatment leads to improved outcomes.
Data Sources: MEDLINE® and reference lists of recent reviews.
Review Methods: We evaluated tests for: (a) single gene products; (b) genetic variations affecting risk of ovarian cancer; (c) gene expression; and (d) proteomics. For tests covered in recent evidence reports (cancer antigen 125 [CA-125] and breast cancer genes 1 and 2 [BRCA1/2]), we added studies published subsequent to the reports. We sought evidence on: (a) the analytic performance of tests in clinical laboratories; (b) the sensitivity and specificity of tests in different patient populations; (c) the clinical impact of testing in asymptomatic women, women with suspected ovarian cancer, and women with diagnosed ovarian cancer; (d) the harms of genomic testing; and (e) the impact of direct-to-consumer and direct-to-physician advertising on appropriate use of tests. We also constructed a computer simulation model to test the impact of different assumptions about ovarian cancer natural history on the relative effectiveness of different strategies.
Results: There are reasonable data on the clinical laboratory performance of most radioimmunoassays, but the majority of the data on other genomic tests comes from research laboratories. Genomic test sensitivity/specificity estimates are limited by small sample sizes, spectrum bias, and unrealistically large prevalences of ovarian cancer; in particular, estimates of positive predictive values derived from most of the studies are substantially higher than would be expected in most screening or diagnostic settings. We found no evidence relevant to the question of the impact of genomic tests on health outcomes in asymptomatic women. Although there is a relatively large literature on the association of test results and various clinical outcomes, the clinical utility of changing management based on these results has not been evaluated. We found no evidence that genomic tests for ovarian cancer have unique harms beyond those common to other tests for genetic susceptibility or other tests used in screening, diagnosis, and management of ovarian cancer. Studies of a direct-to-consumer campaign for BRCA1/2 testing suggest increased utilization, but the effect on “appropriateness” was unclear. Model simulations suggest that annual screening, even with a highly sensitive test, will not reduce ovarian cancer mortality by more than 50 percent; frequent screening has a very low positive predictive value, even with a highly specific test.
Conclusions: Although research remains promising, adaptation of genomic tests into clinical practice must await appropriately designed and powered studies in relevant clinical settings.
Ovarian cancer is the leading cause of cancer death from gynecologic malignancies in the United States, with an annual incidence of over 25,000 and an annual mortality of approximately 14,000. Cancer incidence increases dramatically with age.
The high case-fatality rate has largely been attributed to the fact that most ovarian cancers are diagnosed in advanced stages (Stage III, where the cancer has spread beyond the pelvis to the organs of the upper abdominal cavity, and Stage IV, where the cancer has spread outside the peritoneal cavity). Stage I cancer (limited to the ovaries) has a survival rate of over 90 percent.
There are five potential strategies for prevention of the morbidity and mortality from ovarian cancer. One is primary prevention through either medical or surgical therapy in the general population. Although observational studies suggest that the risk of developing ovarian cancer is reduced in women who used oral contraceptives or underwent tubal ligation, there are no prospective trials to allow estimation of the risks and benefits of these options specifically for ovarian cancer prevention. Although in theory prophylactic oophorectomy at the time of hysterectomy for other diseases should almost eliminate the chances of developing ovarian cancer, there are also no prospective studies of the benefits of this approach, and a recent decision analysis suggested that the harms in terms of other effects might outweigh the benefits. An alternative strategy for primary prevention is identifying groups of women at particularly high risk of developing ovarian cancer, and then using primary prevention strategies. Observational studies suggest that use of oral contraceptives reduces risk of ovarian cancer in women with inherited predisposition to ovarian cancer, but this has not been tested prospectively. Prophylactic oophorectomy does appear to reduce the risk of ovarian cancer in high-risk groups.
Another strategy for prevention of ovarian cancer mortality is screening to detect early stage cancers, either in the general population or in high-risk groups. To date, screening using the available technologies of physical examination, ultrasound, and/or cancer antigen 125 (CA-125) has not been shown to be effective in either situation.
Finally, use of targeted therapy based on the results of tests may identify subgroups of patients for whom specific therapies are likely to be effective; for example, identification of overexpression of human epidermal growth factor receptor 2 (HER2) in some breast cancers has led to improved survival with the use of a monoclonal antibody targeted against the receptor. To date, similar breakthroughs have not occurred in ovarian cancer.
Continued developments in technology have led to rapidly expanding knowledge about genes, gene expression, and protein patterns in a variety of disease processes. Because currently available strategies for the prevention of ovarian cancer have not proven as effective as interventions targeted against other cancers in women, there has been tremendous interest in using the tools of genomics and proteomics to identify potential new markers which can be used in any of the five classes of strategies. Although the term “genomics” has been used in many different ways, for the purposes of this report we define “genomic tests” as one of the following broad categories: (1) tests for the presence or quantity of the product of a single gene - the classic example of this is radioimmunoassay for CA-125; (2) tests for inherited or acquired mutations in genes which convey an increased risk of developing ovarian cancer, or which predict differential responses to therapy - the classic example is testing for polymorphisms of breast cancer genes 1 and 2 (BRCA 1/2); (3) tests for quantitative expression of either single genes or multiple genes - differential patterns of expression between normal patients and ovarian cancer patients may aid in diagnosis and management, or help identify potential new single gene products for evaluation as screening and diagnostic tools; and (4) tests for protein expression, particularly in serum, which identify differential patterns between normal patients and patients with ovarian cancer.
This report focuses on the current evidence for the clinical utility of genomic tests, as defined above, in any of the five potential strategies for reducing ovarian cancer morbidity and mortality. Because evidence on the use of CA-125 for screening and diagnosis of ovarian cancer and the use of BRCA1/2 testing for identification of high-risk patients has been covered in recent evidence reports, we do not review that evidence directly; we do summarize the results of the earlier reports and discuss relevant studies subsequently published. The results of the present report are intended primarily to: (a) provide a resource for the Evaluation of Genomics Applications in Practice and Prevention (EGAPP) project of the Centers for Disease Control and Prevention (CDC); (b) provide a resource for other clinicians and policymakers developing guidelines on the use of genomic tests in ovarian cancer prevention; and (c) provide a resource for researchers and funding agencies in identifying gaps in our knowledge and research priorities.
Working with the Agency for Healthcare Research and Quality (AHRQ), the CDC, the EGAPP working group, and members of the technical expert panel, we refined six research questions to be addressed, using an analytic framework which incorporated probability of developing ovarian cancer, test results, and management based on those test results.
We searched MEDLINE® (1966–May 2006). Searches of the databases were supplemented by reviews of reference lists of included articles, relevant review articles, and meta-analyses. We also searched the Food and Drug Administration web site for relevant documents. The searches yielded a total of 1,303 citations. Pairs of readers reviewed each abstract and selected 552 articles for full text review. Specific inclusion criteria were developed for each question, and both readers were required to agree on inclusion. After this review, a total of 113 articles were included for abstraction.
We developed tables to abstract each article, and quality criteria were adapted from the evidence report on omega-3 fatty acids for coronary heart disease prevention. For studies of diagnostic test performance, 2-by-2 tables were constructed for each included article, and sensitivity, specificity, and positive and negative predictive values, with 95 percent confidence intervals, were calculated.
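The 2-by-2 calculation described above can be sketched as follows. This is a minimal illustration, not the report's actual procedure: the report does not state which confidence interval method was used, so the Wilson score interval here is an assumption, and the function names and the counts in the usage lines are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the report does not name its interval method; Wilson is
    used here because it behaves reasonably for small samples.
    z=1.96 corresponds to an approximate 95 percent interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def diagnostic_summary(tp, fp, fn, tn):
    """Point estimates and intervals from the four cells of a 2-by-2 table.

    tp/fn: cancers with positive/negative tests.
    fp/tn: non-cancers with positive/negative tests.
    """
    cells = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {
        name: {"estimate": k / n, "ci95": wilson_interval(k, n)}
        for name, (k, n) in cells.items()
    }


# Hypothetical counts, for illustration only (not taken from any study).
summary = diagnostic_summary(tp=40, fp=10, fn=10, tn=40)
for name, result in summary.items():
    low, high = result["ci95"]
    print(f"{name}: {result['estimate']:.2f} ({low:.2f} to {high:.2f})")
```

Note that the predictive values computed this way reflect the proportion of cancers in the study sample, so they transfer to practice only when that proportion resembles the prevalence in the intended population.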
We also further refined a Markov model of the natural history of ovarian cancer; the model is able to closely approximate age-specific incidence and mortality from ovarian cancer under two different assumptions about natural history - one that requires a stepwise progression through all four stages of the disease, and one which allows some cancers to spread directly from the ovaries (Stage I) to the upper abdomen (Stage III). The model is then used to estimate the implications of these different assumptions on the relative effectiveness of different prevention strategies.
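The difference between the two natural history assumptions can be sketched with a toy Markov cohort model. This is a deliberately simplified illustration and not the report's model: it omits age dependence, detection, treatment, and death from other causes, and every probability, state name, and function name below is a hypothetical placeholder rather than a calibrated value.

```python
STATES = ["well", "stage_I", "stage_II", "stage_III", "stage_IV", "cancer_death"]


def transition_matrix(p_onset, p_progress, p_death, direct_fraction=0.0):
    """Row-stochastic transition matrix for one model cycle.

    direct_fraction is the share of Stage I progressions that skip Stage II
    and move straight to Stage III; 0.0 gives the stepwise assumption.
    All parameters are hypothetical per-cycle probabilities.
    """
    if not 0.0 <= direct_fraction <= 1.0:
        raise ValueError("direct_fraction must be between 0 and 1")
    to_stage_ii = p_progress * (1 - direct_fraction)
    to_stage_iii = p_progress * direct_fraction
    return [
        [1 - p_onset, p_onset, 0.0, 0.0, 0.0, 0.0],
        [0.0, 1 - p_progress, to_stage_ii, to_stage_iii, 0.0, 0.0],
        [0.0, 0.0, 1 - p_progress, p_progress, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1 - p_progress, p_progress, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1 - p_death, p_death],
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    ]


def run_cohort(matrix, cycles):
    """Advance a cohort that starts entirely in the 'well' state."""
    n = len(STATES)
    dist = [1.0] + [0.0] * (n - 1)
    for _ in range(cycles):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dict(zip(STATES, dist))


def early_stage_share(dist):
    """Share of all cancers to date that are currently in Stage I or II."""
    ever_cancer = 1.0 - dist["well"]
    if ever_cancer <= 0:
        return 0.0
    return (dist["stage_I"] + dist["stage_II"]) / ever_cancer


# Hypothetical parameters, chosen only to make the contrast visible.
stepwise = run_cohort(transition_matrix(0.001, 0.3, 0.3, direct_fraction=0.0), 20)
direct = run_cohort(transition_matrix(0.001, 0.3, 0.3, direct_fraction=0.5), 20)
print("early-stage share, stepwise:", round(early_stage_share(stepwise), 3))
print("early-stage share, direct:  ", round(early_stage_share(direct), 3))
```

In this sketch, allowing direct Stage I to Stage III transitions shrinks the share of cancers sitting in early stages at any given time, which is the mechanism by which the natural history assumption can affect the apparent effectiveness of screening.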
Question 1: What is the evidence that ovarian cancer genomic tests performed in a typical clinical laboratory actually measure what they are purported to measure? The published data on clinical laboratory performance suggest that currently available radioimmunoassays for single gene products have acceptable reproducibility and reliability, although even this level of variability may have some impact on clinical interpretation of results, especially when comparing relatively small serial changes, or levels close to the discriminatory threshold.
There is insufficient evidence to estimate how newer technologies such as microarrays or protein profiles would perform in a “typical clinical laboratory.”
Question 2: What is the sensitivity and specificity of genomic tests in detecting ovarian cancer in asymptomatic and symptomatic women, including high-risk women? In general, single gene products other than CA-125 have not been shown to be useful in the diagnosis of ovarian cancer, either in symptomatic or asymptomatic women; the sensitivity of CA-125 in screening populations is approximately 80 percent. Small sample sizes, lack of detail on the prediagnosis history of patients, and an unrealistically high prevalence of ovarian cancer in the majority of studies make it difficult to assess how any of these tests would perform in clinical practice.
Estimating the clinical value of more complex tests, using multiple gene and/or protein markers, is even more difficult. Studies of protein expression, in particular, are limited by lack of consensus on appropriate statistical methods, small sample sizes with substantially higher prevalences of ovarian cancer than would be found in the general population, spectrum bias, lack of reproducibility, and uncertainty about the specificity of the biological processes resulting in the observed protein patterns.
Question 3: What is the evidence that genomic testing to detect ovarian cancer in asymptomatic women, including high-risk women, changes clinical management and leads to improved clinical outcomes? We did not identify any evidence on the value of tests other than CA-125 to detect ovarian cancer in asymptomatic women. Screening with CA-125 has not been shown to reduce ovarian cancer mortality or improve quality of life; in series of women with mutations of BRCA1 and BRCA2, screening with CA-125 and transvaginal ultrasound does not appear to prevent development of advanced stage ovarian cancer.
Question 4: What is the evidence that genomic testing in women with clinical suspicion of ovarian cancer or with already-diagnosed ovarian cancer changes clinical management and leads to improved health outcomes? Although there is a reasonable amount of data on the association between genomic tests, particularly CA-125, and the likelihood of different clinical outcomes, we did not identify any studies which provided evidence for changes in management leading to improved outcomes based on the results of the tests, other than for CA-125. Based on the results of another evidence report, CA-125 is helpful in distinguishing malignant from benign masses in postmenopausal women.
Question 5: What are the harms of using genomic tests for ovarian cancer prevention and management? The majority of the available literature focuses on BRCA1/2 testing and rarely describes results specifically for ovarian cancer. In the few studies that did, concerns about the risk of ovarian cancer were considerably less than for breast cancer; it is unclear whether testing for genetic markers of ovarian cancer susceptibility alone has different implications compared to testing for genes which affect both breast and ovarian cancer risk.
Conceptually, the harms of testing for genetic susceptibility for ovarian cancer should be no different than testing for genetic susceptibility of other cancers; the main issues are the effectiveness and potential risks of prevention strategies in those who are identified as high risk (primarily the risks of prophylactic oophorectomy), and issues related to reproduction. Similarly, the qualitative harms of the use of genomic tests for screening, diagnosis, and management - the psychological effect of a potential cancer diagnosis, the risks of diagnostic and therapeutic procedures including laparotomy, the harms of a false negative result leading to delayed or inappropriate management - are not conceptually different for genomic tests than for other types of tests, such as imaging; the main difference lies in the quantitative risks of these events, which in turn are determined by the sensitivity and specificity of the test and the pretest probability of disease.
Question 6: Has direct-to-consumer and direct-to-physician marketing of genomic tests for ovarian cancer increased the “appropriate” use of these tests? We identified two studies which compared utilization of BRCA1/2 tests for breast and ovarian cancer susceptibility before and after an advertising campaign; in both cases, utilization was compared in cities where the campaign was put in place to geographically distant cities where there was no formal campaign. The studies suggested increased utilization of testing, and one study found that the positive predictive value of testing declined after the campaign, but there was no way to judge whether the changes in testing were “appropriate.”
The model is able to approximate reported age-specific incidence and mortality from ovarian cancer under both assumptions about natural history. At a given value for test sensitivity, screening was less effective in reducing mortality in a model assuming direct transition from Stage I to Stage III than in one assuming that all cancers progress to Stage II prior to Stage III. However, screening frequency was much more important than test sensitivity; even at a test sensitivity of 99 percent, screening intervals of less than 12 months are needed to reduce ovarian cancer mortality by more than 50 percent. At these high screening frequencies, positive predictive values are less than three percent, even for a test with specificity of 99 percent.
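The dependence of positive predictive value on pretest probability follows from Bayes' rule, and a short sketch makes the arithmetic concrete. The function name and the prevalence values below are illustrative assumptions, not figures from the report or its model; only the 99 percent sensitivity and specificity are taken from the text above.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test (Bayes' rule).

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be probabilities between 0 and 1")
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    if true_positives + false_positives == 0:
        raise ValueError("no positive tests are possible with these inputs")
    return true_positives / (true_positives + false_positives)


# Hypothetical prevalences of detectable cancer at a single screening round,
# alongside a case-control-style sample in which half the subjects have cancer.
for prevalence in (0.0001, 0.0002, 0.001, 0.5):
    ppv = positive_predictive_value(0.99, 0.99, prevalence)
    print(f"prevalence {prevalence:.4f}: PPV {ppv:.3f}")
```

Because more frequent screening leaves fewer undetected cancers in the population at each round, the effective prevalence per round falls and the positive predictive value falls with it. The same relationship explains why studies with a high proportion of cancer cases report predictive values that would not be reproduced in screening populations.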
The report did not include non-English publications. We did not formally attempt to estimate pooled sensitivity and specificity for tests because of heterogeneity of study design. Because many of the parameters in the natural history of ovarian cancer are unknown, any model will require assumptions and imputation of key parameters; calibrating a cohort model to cross-sectional data may result in errors in the imputation of these parameters because of unmeasured cohort effects in the cross-sectional data.
Common limitations of the literature included: failure to adequately describe relevant patient characteristics; small sample size with subsequent wide confidence intervals for estimates of sensitivity and specificity; unrealistically high prevalences of ovarian cancer; a spectrum of disease severity which does not reflect screening populations; lack of reproducibility for complex statistical algorithms; potentially inappropriate choices for cases and controls in initial developmental studies; and underlying assumptions about the natural history of ovarian cancer that may not reflect the actual biology of the disease.
Research priorities include:
A minimal consensus data set on key patient characteristics, with results presented with stratification by those characteristics as appropriate;
Consensus reporting of key laboratory performance characteristics such as reproducibility, with estimates of the impact of reproducibility on test performance in practice;
Documentation of the effect of any biological variability in test results within subjects on interpretation of results, especially for tests designed to be used in a serial fashion;
Better characterization of true “negative” results, with documentation of followup;
Evaluation of tests in realistic clinical situations, especially with regard to pretest probability;
Explicit evaluation of the effect of management changes based on test results on patient outcomes; and
Better understanding of the natural history of ovarian cancer in order to help prioritize research into better prevention strategies.
Despite intensive research efforts, ovarian cancer remains a leading cause of cancer death in women, and efforts at reducing its impact have been noticeably less successful than those for other cancers in women.
The prospect of new strategies for the prevention of ovarian cancer morbidity and mortality based on greater understanding of the molecular biology of the disease is exciting; unfortunately, we did not find any evidence that currently available tests have had a substantial impact on improving patient outcomes. Our modeling work suggests that the natural history of ovarian cancer may make substantial mortality reductions difficult using a strategy based primarily on screening. Although research remains promising, adaptation of genomic tests into clinical practice must await appropriately designed and powered studies in relevant clinical settings.
Malignant tumors of the ovary can either arise in the ovary (primary ovarian cancer) or be the result of metastasis from another site, such as the breast or colon. Primary ovarian tumors, whether benign or malignant, can arise from three broad types of cells: the cells on the surface (epithelial cells); the cells that form eggs (germ cells); and the cells surrounding the eggs, including the cells that produce ovarian hormones (sex cord-stromal cells). Epithelial tumors are the most common type, accounting for 60 percent of all ovarian tumors and up to 90 percent of primary cancers. Sex cord-stromal tumors account for 10 to 15 percent of all tumors, while germ cell tumors account for 25 percent of tumors. In general, sex cord-stromal tumors and germ cell tumors are relatively more common in younger premenopausal women. Thus, although ovarian cancer is relatively rare in younger women, when it does occur it is more likely to be non-epithelial than are cancers in postmenopausal women.4
Within the broad classification of epithelial, sex cord-stromal, and germ cell tumors, tumors are further classified by the individual cell types from which the tumor is derived. For example, the most common epithelial tumors are serous and mucinous tumors, the most common sex cord-stromal tumors are fibromas (arising from the connective tissue surrounding eggs), and the most common germ cell tumors are teratomas. Within each histological class, tumors can be benign or malignant, based on their ability to metastasize.4
Some epithelial tumors are classified as “borderline” or “low malignant potential” (LMP) tumors. These are tumors in which there is no invasion into the ovarian stroma, but there is histologic evidence of proliferation (increased cell division, changes in the appearance of the cell nucleus). There is controversy over whether these tumors represent preinvasive cancer that, if untreated, would go on to become a cancer, or whether they represent a subtype of tumor which has a relatively small chance of becoming a cancer.4 In estimating the diagnostic accuracy of tests for determining whether a mass is benign or malignant, whether one classifies LMP tumors as benign or malignant can have an effect on the estimates of test performance, as we will discuss later in the report.
Ovarian cancer spreads primarily by dissemination throughout the peritoneal cavity; common sites of metastasis are the small and large bowel, the omentum, the liver, and the diaphragm. Spread to retroperitoneal lymph nodes is also common.
Treatment for ovarian cancer consists of surgical removal of the ovaries, fallopian tubes, and uterus (if present), along with as much metastatic disease as possible; if there is no obvious spread beyond the ovaries, the lymph nodes are sampled to determine if there has been lymphatic metastasis. Surgery is followed by chemotherapy, with responsiveness to chemotherapy depending on the amount of tumor left after surgical removal and the cell type of tumor, among other factors.4
The high case-fatality rate observed in ovarian cancer has largely been attributed to the fact that most ovarian cancers are diagnosed in advanced stages (Stages III, where the cancer has spread beyond the pelvis to organs of the upper abdominal cavity, and IV, where the cancer has spread outside of the peritoneal cavity), when survival is poor. Stage I cancer (limited to the ovaries) has a survival rate of over 90 percent. Thus, there has long been a clinical and research emphasis on identifying methods for early detection of ovarian cancer, under the rationale that increasing the proportion of cancers detected in early stages will lead to decreases in morbidity and mortality.
Conceptually, there are five basic strategies for reducing ovarian cancer morbidity and mortality; we briefly review the rationale for each below.
Primary prevention can be achieved either through medical or surgical treatment which preserves the ovaries but reduces the incidence of ovarian cancer, or by removal of the ovaries themselves.
Although oral contraceptives and tubal ligation have consistently been associated with reduction in ovarian cancer in epidemiological studies,5 the use of these measures as prophylaxis has never been prospectively tested in an adequately designed and powered trial; given the relative rarity of ovarian cancer, as well as the rarity of some of the serious side effects of oral contraceptives, such as an increased risk of deep vein thrombosis, such a trial may ultimately not be feasible.
Although primary peritoneal carcinomatosis, a condition which histologically and clinically is almost identical to ovarian cancer,6 can occur after removal of the ovaries, it appears to be rare in average-risk women.7 Bilateral oophorectomy in perimenopausal women undergoing hysterectomy for other causes has traditionally been recommended for prevention of ovarian cancer; however, this practice has also not been subjected to rigorous prospective study. A recent decision analysis suggests that, based on the available evidence, the potential harms from the other effects of oophorectomy may outweigh the benefits of ovarian cancer prevention.8
Primary prevention in high-risk women depends on two things: the availability of a test for ascertainment of individuals at increased risk for developing ovarian cancer, and the availability of effective primary preventive treatment.
Although no randomized trials have been conducted, several observational studies suggest that women with an inherited predisposition to developing ovarian cancer who undergo prophylactic oophorectomy are at reduced risk of developing ovarian cancer compared to the expected incidence in this population.9–11 Observational data also suggest that oral contraceptive use reduces ovarian cancer incidence in high-risk groups.12, 13
Unlike cervical cancer, where screening has proven remarkably effective, no screening test has proven effective in reducing ovarian cancer mortality. Physical examination using the bimanual pelvic examination,14 serum testing using the tumor marker cancer antigen 125 (CA-125), and imaging using vaginal ultrasound15 have all proven ineffective; the U.S. Preventive Services Task Force (USPSTF) gives a D recommendation to current methods for screening for ovarian cancer (at least fair evidence that the practice is ineffective or that harms exceed benefits). Additional studies are currently being conducted.
As with primary prevention in high-risk women, screening in high-risk groups depends on both effective screening methods and the ability to accurately determine who is at “high risk.” Screening, including more frequent screening, has not resulted in a reduced ovarian cancer incidence, or a substantial shift in stage distribution of detected cancers, in high-risk groups.16–19
Identification of women who are particularly likely to respond to specific therapies, or identification of new targets for therapy, could lead to improved survival and quality of life in women with ovarian cancer. Although there is much ongoing research into possible targets for therapy, ovarian cancer therapy lags behind therapy for breast cancer, where identification of particular molecular targets appears to be effective.20 This category could also include tests that help distinguish particular types of ovarian cancer from other types, or that distinguish primary ovarian cancer from cancer metastatic to the ovary from other sites, since misclassification could lead to relatively less effective therapy.
Advances in molecular biology, including the decoding of the human genome, have led to intensive research across the spectrum of human disease. The terms “genomics” and “genetic test” have been used differently in different settings. For the purposes of this report, we include the following types of tests based on the interests of the Agency for Healthcare Research and Quality (AHRQ), the Centers for Disease Control and Prevention (CDC), and the Evaluation of Genomics Applications in Practice and Prevention (EGAPP) program.
Tests for single gene products measure the concentration or presence/absence of proteins which are associated with the presence of ovarian cancer. The classic example of this type of test is CA-125, a protein for which several validated commercial assays are available. Levels of CA-125 are increased in patients with ovarian cancer compared to normal subjects, and the test is useful in discriminating benign from malignant masses in postmenopausal patients.14 Typically, these tests are for proteins detectable in serum, although, in some cases, tests may be performed in fluid aspirated from an ovarian mass or the peritoneal cavity, or immunohistochemistry stains may be performed on ovarian or tumor tissue.
Tests for inherited or acquired mutations (e.g., breast cancer genes 1 and 2 [BRCA1/2]) in single genes can potentially identify patients at higher risk for developing cancer. Alternatively, mutations in some genes in the cancer itself may indicate greater or lesser likelihood of responding to a given therapy, or of developing side effects with a given therapy. In addition, changes in the overall pattern of the genome, such as loss of heterozygosity, are characteristic of many cancers, and potentially have a role in diagnosis.21 Finally, epigenetic changes (reversible changes to DNA and chromatin, such as the addition or subtraction of methyl groups) are currently under active investigation in a variety of cancers, including ovarian cancer.22–24
Quantitative or semi-quantitative measurement of the expression (either higher or lower than normal) of particular genes in serum or tumor tissue has the potential to help in diagnosis (either as a screening tool or in discriminating particular subtypes of cancer), or potentially to aid in targeted therapy; for example, overexpression of human epidermal growth factor receptor 2 (HER-2) in breast cancer predicts responsiveness to therapy with an antibody against the receptor, trastuzumab.20 Both single genes, and patterns of expression of multiple genes using technologies such as microarray, can be helpful. The introduction of high-throughput technology has facilitated the search for patterns of expression associated with specific outcomes, allowing simultaneous comparison of multiple genes in specimens from patients with and without the outcome. Studies of gene expression may also serve as the basis for identification of single gene products which can subsequently be evaluated as markers for screening, diagnosis, or management guidance.
Finally, quantification of protein patterns, typically in serum, can be performed using mass spectroscopy; one of the more common techniques is surface-enhanced laser desorption ionization time-of-flight (SELDI-TOF).25 As with multiple gene expression, protein patterns can be compared between patients with and without a given outcome of interest, or used to identify single markers.
Although there is widespread interest in genomic tests for prevention of morbidity and mortality for a wide range of conditions, ovarian cancer has been an area of particular interest on the part of the scientific community and lay public, largely because of the lack of an effective screening test. In particular, efforts to rapidly commercialize a proteomics-based test, OvaCheck™, prior to validation of the test in a large population, have led to a realization of the need for critical evaluation of the validity of these tests.26, 27
| Test | Type of test | Use of test | |||
|---|---|---|---|---|---|
| Increased risk | Screening | Diagnosis | Management | ||
| Commercially available | |||||
| Routine use in ovarian cancer | |||||
| Cancer antigen 125 (CA-125) | Single gene product | X | X | ||
| Beta human chorionic gonadotropin (β-hCG; germ cell tumors) | Single gene product | X | X | ||
| Breast cancer gene 1/2 (BRCA1/2) | Genetic variation | X | |||
| Carcinoembryonic antigen (CEA) | Single gene product | X | |||
| Investigational for ovarian cancer | |||||
| Cancer antigen 27-29 (CA-27-29) | Single gene product | X | X | ||
| Lipid-associated sialic acid (LASA) | Single gene product | X | X | ||
| Human epidermal growth factor receptor 2 (HER2)/neu | Gene expression | X | |||
| Investigational | |||||
| Chromosome 8q gain | Genetic variation | X | |||
| DNA methylation | Genetic variation | X | X | X | |
| Epidermal growth factor receptor (EGFR) | Single gene product | X | X | ||
| Genome-wide loss of heterozygosity | Genetic variation | X | |||
| Lysophospholipids (LSA) | Single gene product | X | X | ||
| Matrix metalloproteinases (MMP) | Single gene product | X | |||
| Protein expression profiles (OvaCheck, etc.) | Protein expression | X | |||
| Urinary plasminogen activator | Single gene product | X | |||
Because the majority of applications for genomic tests are investigational, there are few formal guidelines for their use, other than recommendations for the use of CA-125 as an adjunct to diagnosis of ovarian cancer,29 against the use of CA-125 for routine screening for ovarian cancer,15, 30 and for the use of BRCA1 and 2 testing in women with family histories suggestive of familial breast or ovarian cancer.31
Because the use of BRCA 1 and 2 testing for identifying women at high risk and the use of CA-125 for screening and as a diagnostic test in women with an adnexal mass have been recently covered by AHRQ evidence reports,14, 30, 31 we have summarized the findings of these reports in the appropriate sections, incorporating any additional relevant evidence published subsequent to the reports.
In this review, and particularly in the discussion of the results and suggestions for future research, we will attempt to identify: (a) issues related to evaluation of specific strategies for ovarian cancer prevention; (b) issues related to evaluation of specific classes of “genomic tests;” and (c) where applicable, specific issues related to the evaluation of a given class of genomic test for a given prevention strategy.
This section of the report describes the basic methodology used to develop the evidence report, including topic assessment and refinement, analytic framework, literature search strategies and results, literature screening, quality assessment, data abstraction methods, and quality control procedures.
The Centers for Disease Control and Prevention (CDC) and the Agency for Healthcare Research and Quality (AHRQ) originally identified six key questions to be addressed by the report, which is intended to assess the evidence for the diagnostic accuracy, benefits, and harms of genomic tests in screening and management of ovarian cancer. The Duke research team clarified and refined the overall research objectives and key questions by first consulting with the two study sponsors, AHRQ and CDC, and then convening a national panel of technical experts to serve as advisors to the project. These experts were selected to represent relevant specialties. Members of the technical expert panel were:
Alfred O. Berg, M.D., M.P.H.; Department of Family Medicine, University of Washington; Seattle, WA (member of the CDC Evaluation of Genomic Applications in Practice and Prevention [EGAPP] Working Group)
Katrina Armstrong, M.D., M.S.C.E.; Leonard Davis Institute of Health Economics, University of Pennsylvania School of Medicine; Philadelphia, PA (EGAPP Working Group member)
Jeffrey Botkin, M.D., M.P.H.; Department of Pediatrics and Medical Ethics, University of Utah; Salt Lake City, UT (EGAPP Working Group member)
JoEllen Schildkraut, Ph.D.; Department of Prevention Research, Duke University; Durham, NC
As a result of an initial conference call with the technical experts, AHRQ, and CDC, the Duke research team finalized the key research questions to be included in the report and the approach that would be used to address them. The final key questions are as follows:
Question 1: What is the evidence that ovarian cancer genomic tests performed in a typical clinical laboratory actually measure what they are purported to measure?
Question 2: What is the sensitivity and specificity of genomic tests in detecting ovarian cancer in asymptomatic and symptomatic women, including high-risk women?
Question 3: What is the evidence that genomic testing to detect ovarian cancer in asymptomatic women, including high-risk women, changes clinical management and leads to improved health outcomes?
Question 4: What is the evidence that genomic testing in women with clinical suspicion of ovarian cancer or with already-diagnosed ovarian cancer changes clinical management and leads to improved health outcomes?
Question 5: What are the harms of using genomic tests for ovarian cancer prevention and management?
Question 6: Has direct-to-consumer and direct-to-physician marketing of genomic tests on ovarian cancer increased the “appropriate” use (as defined by study investigators) of these tests?
Note: Numbers refer to key questions
The analytic framework depicted above serves to clarify the relevant key questions as follows:
Genomic tests can detect an inherited predisposition, genes and proteins that are associated with the presence of cancer, or genes and proteins that identify targets for therapy or predict response to therapy. Question 1 addresses whether available tests perform as intended at the level of the laboratory (“analytic validity”).
Genomic tests in the second category, above, may detect ovarian cancer either in women without symptoms (used as a screening test) or as part of the evaluation of women with symptoms (Question 2).
Based on the results of genomic testing, women may follow different strategies: women with a predisposition to ovarian cancer may undergo primary or secondary prevention, while, ideally, asymptomatic women whose cancers are detected through genomic tests will have lower ovarian cancer mortality than women who do not undergo genetic testing, without unacceptable levels of harm from testing and diagnosis (Question 3).
Genomic testing can potentially serve as a test to help discriminate cancer from benign conditions in women with symptoms, or lead to specific therapies with better outcomes in women who have already had a diagnosis of ovarian cancer (Question 4).
As with any test, there are potential harms associated with genomic testing. These include anxiety about the risk of ovarian cancer and difficult decisions regarding reproduction and possible prophylactic surgery in women with inherited predispositions; additional diagnostic tests, including diagnostic surgery, or use of inappropriate therapy, in women with false-positive tests; and the failure to further evaluate, or appropriately treat, women with false-negative tests (Question 5).
Although not in the formal pathway, marketing to consumers and physicians may make women more likely to undergo testing. Particularly in asymptomatic women, this testing may lead to (a) diagnosis of a predisposition in the absence of clear evidence on appropriate management strategies, or (b) diagnosis of “abnormality,” leading to additional tests, including surgery (Question 6).
The primary source of literature was MEDLINE® (1966–May 2006). Searches of this database were supplemented by reviews of reference lists contained in all included articles and in relevant review articles and meta-analyses.
The basic search strategy used the National Library of Medicine's Medical Subject Headings (MeSH) key word nomenclature developed for MEDLINE®. Searches were limited to articles published in English. The exact search string used is given in Appendix A.* The three searches yielded a total of 1,303 citations, whose records were maintained in a ProCite (Thomson ISI ResearchSoft, Berkeley, CA) database.
Paired researchers from the Duke research team independently reviewed abstracts and classified each as “include” or “exclude” according to study-specific criteria, which they also developed. An abstract was included if at least one of the paired reviewers recommended inclusion. A total of 552 abstracts were included for the full-text review stage. Interrater reliability for include/exclude decisions at the abstract screening stage was tested by having seven pairs of readers review 813 abstracts. Agreement varied across pairs (kappa 0.36 to 0.75), ranging from fair to substantial on the commonly used Landis and Koch scale.
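As an illustration of how a kappa value for one reviewer pair can be obtained from include/exclude decisions, the following minimal sketch computes Cohen's kappa from the four cells of an agreement table. The function name and the counts are hypothetical and are not data from this report; the report does not state which software was used for this calculation.

```python
def cohen_kappa(both_include, a_only, b_only, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions.

    both_include / both_exclude: abstracts on which the reviewers agreed.
    a_only / b_only: abstracts that only reviewer A (or only B) included.
    """
    n = both_include + a_only + b_only + both_exclude
    if n == 0:
        raise ValueError("no abstracts were rated")
    observed = (both_include + both_exclude) / n
    # Marginal proportion of "include" decisions for each reviewer.
    a_inc = (both_include + a_only) / n
    b_inc = (both_include + b_only) / n
    # Agreement expected by chance alone.
    expected = a_inc * b_inc + (1 - a_inc) * (1 - b_inc)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for a single reviewer pair (illustration only).
kappa = cohen_kappa(both_include=40, a_only=10, b_only=8, both_exclude=58)
print(round(kappa, 2))
```

Because kappa corrects for chance agreement, two reviewers who exclude most abstracts can show high raw agreement and still obtain a modest kappa.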
At the full-text review stage, the paired researchers independently reviewed a set of the articles and indicated a decision to “include” or “exclude” the article for the data-abstraction stage. When a pair of reviewers arrived at different opinions about whether to include an article, they were asked to reconcile the difference. Detailed inclusion and exclusion screening criteria were developed by research question and are described immediately below.
Abstracts were included for full-text review if they met the criteria described below, or if insufficient information was provided to judge whether they met the criteria. Articles were included for abstraction if full-text review showed that all criteria were met.
An article was included if it pertained to:
Epithelial ovarian cancer or primary peritoneal carcinomatosis; and
Genomics as defined by AHRQ for this project to mean any gene-based test used for predicting risk of developing disease, screening, diagnosis of disease, disease management, or prognosis only in strategies for the prevention of ovarian cancer morbidity and mortality. These included single gene products (e.g., cancer antigen 125 [CA-125]); genetic variations (e.g., breast cancer genes 1/2 [BRCA1/2]); gene expression (e.g., human epidermal growth factor receptor 2 [HER2]/neu); and either single or multiple genes (e.g., microarrays) and protein expression (e.g., mass spectroscopy of multiple proteins in sera of patients with ovarian cancer compared with controls).
We included tests for:
Inherited mutations or gene polymorphisms which increase the risk of development of ovarian cancer;
Genes, RNA, or protein markers which are present or produced (or are present or produced in greater quantity) only in cells that have already undergone the transformation to cancer, and which can be used to detect asymptomatic or symptomatic cancers; and
Genes or proteins which may help predict the response to specific types of therapy, or themselves be targets of specific therapies.
We excluded the following:
Studies on BRCA1/2 screening and identification of risk covered in an earlier AHRQ evidence report;31
Studies on CA-125 screening and diagnosis covered in earlier AHRQ evidence reports;14, 30
Studies involving only germ cell or stromal ovarian cancer, or non-ovarian primary;
Studies where patients are not the denominator;
Studies involving a cell line only;
Studies where reported data do not allow construction of a 2-by-2 table.
| Stage of review | Number of articles |
|---|---|
| Articles identified | 1,303 |
| Abstracts screened | 1,303 |
| Abstracts included | 552 |
| Abstracts excluded | 751 |
| Full-text articles screened | 549† |
| Full-text articles included | 113 |
| Full-text articles excluded | 436 |

† We were unable to obtain copies of 3 articles that passed the abstract screen.
| Question | Number of articles |
|---|---|
| Question 1: Analytic validity of testing | 32 |
| Question 2: Sensitivity and specificity of tests | 50 |
| Question 3: Impact on clinical management of asymptomatic patients | 0 |
| Question 4: Impact on clinical management of diagnosed patients | 29 |
| Question 5: Harms of testing | 4 |
| Question 6: Impact of direct-to-consumer or physician marketing | 2 |
| Total number of included articles | 113† |
† Total does not equal sum of number of articles across questions because some articles were included for more than one question.
| Category of genomic test | Clinical use of test | |||
|---|---|---|---|---|
| Predisposition | Screening | Diagnosis | Management | |
| Single gene products | CA-125 | Alpha-L-fucosidase | Bcl-2 (anti-apoptosis protein) | |
| CA-125, CA-72-4, CA-15-3, CA-19-9 | CA-125 | |||
| CEA | CASA | |||
| c-erb-2 | Cathepsin-D | |||
| CYFRA 21-1 | CYFRA 21-1 | |||
| Epithelial cell adhesion molecule | c-erb-B2 | |||
| FAS | hK6, hK10 | |||
| G-CSF | IL-6 | |||
| hK6, hK10 | LRP | |||
| IL-6, IL-8 | Mdm2 | |||
| M-CSF | MDR-1 | |||
| OVX1 | MRP1/2 | |||
| p55, p75 (tumor necrosis factor receptors) | nm23 (metastasis suppressor) | |||
| Secretory leukocyte protease inhibitor | Pgp | |||
| Serum cadherin | p53 = TP53 (transcription factor) | |||
| Soluble IL-2 alpha | TN | |||
| Soluble intracellular adhesion molecule | TPS | |||
| TPS | ||||
| TATI | ||||
| Urinary gonadotropin peptide | ||||
| VEGF | ||||
| Genetic variations | BRCA1 | p53 = TP53 (transcription factor) | ||
| BRCA2 | ||||
| Gene expression | CK19 | c-erb-B2 | ||
| Multiple genes: ascitic fluid | Multiple genes: microarray | |||
| Multiple genes: immunohistochemistry | ||||
| Proteomics | Ciphergen ProteinChips: SAX2, WCX2 | |||
| Mass spectrometry using SELDI | ||||
| (statistical methods varied widely) | ||||
The Duke research team developed data abstraction forms/evidence table templates for abstracting data for the various key questions (Appendix C*). Based on clinical expertise, a pair of researchers was assigned to the research questions to abstract data from the eligible articles. One of the pair abstracted the data, and the second researcher over-read the article and the accompanying abstraction to check for accuracy and completeness. The completed evidence tables are provided in Appendix D.*
At the data abstraction stage, abstractors were asked to evaluate each included article for factors affecting internal and external validity. The quality assessment criteria used for this purpose were previously developed by the Tufts-New England Medical Center Evidence-based Practice Center for an evidence report on “Effects of Omega-3 Fatty Acids on Cardiovascular Disease.”32 Abstractors were instructed to assign a “+” or “-” to each item and provide a brief rationale for their decisions. Quality criteria assessed in this way were:
For Questions 1 and 2:
Reference standard
Verification bias
Test reliability/variability
Sample size
Statistical tests
Blinding
Definition of +/- on screening test
For Questions 3–5 (randomized controlled trials [RCTs]):
Randomization method
Blinding
Dropout rate < 20 percent
Adequacy of randomization concealment
For Questions 3–5 (cohort studies):
Unbiased selection of the cohort (prospective recruitment of subjects)
Large sample size
Adequate description of the cohort
Use of validated method for genomic test (i.e., analytic validity established)
Use of validated method for ascertaining clinical outcomes (e.g., surgical pathology, use of validated quality-of-life instrument, death)
Adequate followup period
Completeness of followup
Analysis (multivariate adjustments) and reporting of results
For Questions 3–5 (case-control studies):
Valid ascertainment of cases
Unbiased selection of cases
Appropriateness of the control population
Verification that the control is free of cancer
Comparability of cases and controls with respect to potential confounders
Appropriateness of statistical analyses
After evaluating each study against its question- and design-specific quality criteria, abstractors applied a three-category (A, B, C) summary quality grading system that has been used in previous evidence reports by the Tufts-New England Medical Center Evidence-based Practice Center, including the report cited above.32 This scheme defines a generic grading system for study quality that is applicable to each type of study design (i.e., RCT, cohort study, case-control study). The categories are defined as follows:
A. Least bias; results are valid. A study that mostly adheres to the commonly held concepts of high quality, including the following: a formal randomized study; clear description of the population, setting, interventions, and comparison groups; appropriate measurement of outcomes; appropriate statistical and analytic methods and reporting; no reporting errors; less than 20 percent dropout; clear reporting of dropouts; and no obvious bias.
B. Susceptible to some bias, but not sufficient to invalidate the results. A study that does not meet all the criteria in category A. It has some deficiencies but none likely to cause major bias. Study may be missing information, making assessment of the limitations and potential problems difficult.
C. Significant bias that may invalidate the results. A study with serious errors in design, analysis, or reporting. These studies may have large amounts of missing information or discrepancies in reporting.
For test characteristics, a Microsoft Excel® spreadsheet was developed that calculated appropriate test characteristics (sensitivity, specificity, negative predictive value, positive predictive value) for individual studies if studies provided enough data to input (a) values for individual cells of a 2-by-2 table, (b) the prevalence of disease and values for sensitivity and specificity, or (c) sufficient data to solve for two equations involving sensitivity, specificity, or predictive values. Ninety-five percent confidence intervals were automatically estimated using the approximate formula for proportions:
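The confidence interval formula itself is not reproduced in this text. The sketch below assumes it is the usual normal-approximation (Wald) interval for a proportion, p ± 1.96 × sqrt(p(1 − p)/n), and shows how the test characteristics named above follow from a 2-by-2 table, or from prevalence, sensitivity, and specificity (case b). It is a Python illustration, not the Excel spreadsheet used for the report; function names and counts are hypothetical, and clipping the interval to the range 0 to 1 is an added convenience.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) using the normal approximation.

    Assumption: the report's "approximate formula for proportions" is the
    Wald interval. The bounds are clipped to [0, 1].
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def summarize_2x2(tp, fp, fn, tn):
    """Test characteristics, each with a 95 percent interval, from a 2-by-2 table."""
    return {
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
        "ppv": proportion_ci(tp, tp + fp),
        "npv": proportion_ci(tn, tn + fn),
    }


def table_from_prevalence(n, prevalence, sensitivity, specificity):
    """Rebuild (tp, fp, fn, tn) when only summary values are reported."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sensitivity * diseased
    tn = specificity * healthy
    return tp, healthy - tn, diseased - tp, tn


# Hypothetical study: 100 women with cancer and 100 without.
results = summarize_2x2(tp=90, fp=20, fn=10, tn=80)
for name, (estimate, lower, upper) in results.items():
    print(f"{name}: {estimate:.3f} ({lower:.3f} to {upper:.3f})")
```

The normal approximation becomes unreliable when a cell count is small or a proportion is close to 0 or 1; in those situations an exact or Wilson interval would usually be preferred.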

Model description. We developed a Markov model to estimate the life expectancy for asymptomatic women who are considered candidates for potential prevention and screening strategies for ovarian cancer. The model tracks a hypothetical cohort of 40-year-old women over their lifetimes and compares the impact of each of six strategies for the prevention of ovarian cancer on cancer incidence, mortality, and overall life expectancy.
Simulation model. Women enter the Markov model (Figure 3
Historically, cancer progression has been modeled as a serial progression through clinical stages: Stage I is followed by Stage II, Stage II by Stage III, and Stage III by Stage IV. This conceptual model has worked well with cervical cancer, but it is not clear that using this overall “model” for ovarian cancer is appropriate for the following reasons:
The main purpose of cancer staging is to identify groups of patients who have similar prognosis; this allows treatment results to be compared in both prospective and retrospective studies. Although the concept that stages also represent biological progression is attractive, it is not necessarily true, and, at least in the case of the ovarian cancer staging system of the International Federation of Gynecology and Obstetrics (FIGO),33 plays no role in the development and validation of a staging system.
Cervical cancer is, in many ways, unique among human cancers: it has a single cause (persistent infection with certain types of human papilloma virus); exposure to this cause in most people occurs within a relatively narrow time frame (roughly ages 15 to 25, the times of highest sexual activity with multiple partners); and the most common type of cancer is a squamous type, which primarily spreads through direct extension. In contrast, the cause or causes of ovarian cancer are unclear, duration of exposure is unclear, and, most importantly, the pattern of spread and metastases is quite different.
By definition, Stage I ovarian cancer is limited to the ovary, Stage II involves the ovary and other organs in the pelvis, and Stage III, the most common stage at diagnosis, involves organs in the upper abdomen, including the large and small bowel, the omentum, the diaphragm, and other peritoneal surfaces. Peritoneal fluid constantly circulates, and it is not uncommon for loops of small bowel to come in contact with the ovary. In order for the “conceptual model” requiring an intervening Stage II prior to development of Stage III to be correct, one has to assume that cancer cells on the surface of the ovary must necessarily spread to the uterus or other pelvic organs before they can spread to areas in the upper abdomen via transport in peritoneal fluid or via direct contact with small bowel. We postulate that a scenario in which a certain unknown proportion of ovarian cancers progress directly from Stage I to Stage III is at least as plausible.
Given this uncertainty about the clinical progression of ovarian cancer, we therefore modeled the progression under two alternative assumptions: (1) that ovarian cancer needs to progress from Stage I to Stage II before progressing to Stage III; and (2) that a proportion of ovarian cancer progresses directly from Stage I to Stage III. We evaluated how these two competing assumptions about the natural history of ovarian cancer affect the required stage progression and mortality rates and the estimated life expectancies of the alternative prevention strategies.
We assumed that women who have survived their detected cancer for 5 years are to be considered disease-free and to have mortality equal to that of the general population. Each month, a woman may also choose to have a benign oophorectomy, reducing her risk of ovarian cancer. Throughout their lifetimes, all women are at risk for age-specific mortality unrelated to ovarian cancer. We also included age-specific rates for bilateral oophorectomy, under the assumption that women without ovaries are not at risk for developing ovarian cancer; we did not specifically model the possibility of primary peritoneal carcinomatosis in these women.
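The difference between the two natural history assumptions can be illustrated with a deliberately simplified Markov cohort sketch. It is not the report's model (which was built in DATA Pro and also covers detection, treatment, oophorectomy, and age-specific rates); the state names and every probability below are made-up placeholders chosen only to show the mechanics of a monthly cycle.

```python
STATES = ["well", "stage1", "stage2", "stage3", "dead_cancer", "dead_other"]


def transition_matrix(direct_fraction, p_onset=0.0001, p_progress=0.05,
                      p_cancer_death=0.03, p_other_death=0.001):
    """Monthly transition probabilities; all defaults are hypothetical.

    direct_fraction is the share of progressing Stage I cancers that move
    straight to Stage III (0.0 reproduces strictly serial progression).
    """
    if not 0.0 <= direct_fraction <= 1.0:
        raise ValueError("direct_fraction must lie between 0 and 1")
    alive = 1.0 - p_other_death  # survives other-cause mortality this month
    return {
        "well": {"dead_other": p_other_death,
                 "stage1": alive * p_onset,
                 "well": alive * (1 - p_onset)},
        "stage1": {"dead_other": p_other_death,
                   "stage2": alive * p_progress * (1 - direct_fraction),
                   "stage3": alive * p_progress * direct_fraction,
                   "stage1": alive * (1 - p_progress)},
        "stage2": {"dead_other": p_other_death,
                   "stage3": alive * p_progress,
                   "stage2": alive * (1 - p_progress)},
        "stage3": {"dead_other": p_other_death,
                   "dead_cancer": alive * p_cancer_death,
                   "stage3": alive * (1 - p_cancer_death)},
        "dead_cancer": {"dead_cancer": 1.0},
        "dead_other": {"dead_other": 1.0},
    }


def run_cohort(matrix, months):
    """Advance a cohort that starts entirely in the 'well' state."""
    dist = {state: 0.0 for state in STATES}
    dist["well"] = 1.0
    for _ in range(months):
        nxt = {state: 0.0 for state in STATES}
        for src, mass in dist.items():
            for dst, prob in matrix[src].items():
                nxt[dst] += mass * prob
        dist = nxt
    return dist


serial = run_cohort(transition_matrix(direct_fraction=0.0), months=480)
direct = run_cohort(transition_matrix(direct_fraction=0.5), months=480)
print("cancer deaths, serial progression:", serial["dead_cancer"])
print("cancer deaths, direct I to III allowed:", direct["dead_cancer"])
```

In a calibrated model the progression rates would be re-estimated under each assumption so that both versions reproduce the observed stage distribution and mortality; this sketch holds the rates fixed purely to show how the structure changes.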
Data sources. We obtained age-specific estimates of ovarian cancer incidence, mortality, stage distribution, and survival from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) online database Cancer Query System (http://seer.cancer.gov/canques).
Estimates for other-cause mortality were obtained by subtracting age-specific ovarian cancer mortality from age-specific all-cause mortality for women, using U.S. lifetables available from the National Center for Health Statistics (www.cdc.gov/nchs/deaths.htm).
Estimates for age-specific oophorectomy rates were obtained from AHRQ's Nationwide Inpatient Sample, using ICD-9 codes for bilateral oophorectomy, bilateral salpingoophorectomy, or removal of remaining ovary or remaining tube and ovary (http://hcup.ahrq.gov/HCUPnet.asp).
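A short sketch of how such inputs might be prepared for a model with monthly cycles follows. The subtraction mirrors the other-cause mortality calculation described above; the conversion from an annual rate to a monthly probability assumes a constant hazard within the year, which is a common convention but is not stated in the report. The rates shown are hypothetical, not SEER or life-table values.

```python
import math


def other_cause_rate(all_cause_rate, ovarian_cancer_rate):
    """Annual other-cause mortality rate for one age group."""
    if ovarian_cancer_rate > all_cause_rate:
        raise ValueError("cause-specific rate cannot exceed all-cause rate")
    return all_cause_rate - ovarian_cancer_rate


def monthly_probability(annual_rate):
    """Monthly transition probability, assuming a constant hazard."""
    if annual_rate < 0:
        raise ValueError("rate must be non-negative")
    return 1.0 - math.exp(-annual_rate / 12.0)


# Hypothetical annual rates per woman for a single age group.
rate = other_cause_rate(all_cause_rate=0.0050, ovarian_cancer_rate=0.0002)
print(rate, monthly_probability(rate))
```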
Software. We constructed the model and performed all analyses using DATA Pro 2006 (Williamstown, MA: TreeAge Software, Inc).
Prevention strategies. We modeled six clinical strategies of prevention for ovarian cancer (Figure 4
The baseline strategy of no screening or prevention (NoScreen) where women are identified with ovarian cancer only through development of clinical symptoms.
A primary prevention strategy (PrimaryPrevention) where women undertake a hypothetical method of primary prevention which reduces their incidence of ovarian cancer.
An interval screening strategy (IntervalScreen) where women are screened at recurrent intervals for ovarian cancer using a hypothetical test. Women identified through screening could benefit from early treatment.
A genetic screening strategy where women are tested for a specific genetic mutation and if positive undergo primary prevention for ovarian cancer (Genetic&PrimaryPrevention). The overall population risk for ovarian cancer is unchanged; we varied incidence in those with and without the putative mutation.
A genetic screening strategy where women are tested for a specific genetic mutation and if positive they undergo screening for ovarian cancer at recurrent intervals (Genetic&IntervalScreen).
A strategy where women once identified with ovarian cancer are tested for a hypothetical marker which allows targeted treatment for ovarian cancer (TargetTx). Women who are positive for the marker and undergo the targeted treatment experience greater survival.
Approach. Because the majority of the literature on genomic testing does not allow definitive conclusions about the relative effectiveness of different strategies using different tests, we adopted a “generic” approach to comparison of different strategies.
We chose as a goal a 20 percent reduction in ovarian cancer death, similar to the reductions targeted for other cancers in the Healthy People 2010 objectives. With this target, we used the calibrated models to explore the following clinical questions:
How effective would a primary prevention intervention need to be to reduce ovarian cancer deaths by 20 percent?
What combinations of test sensitivity and frequency result in at least a 20 percent reduction in mortality?
What combinations of (a) prevalence of a genetic mutation in the population and (b) relative risk associated with that mutation would result in the target 20 percent reduction in ovarian cancer deaths with either primary prevention (at various levels of effectiveness) or interval screening (at varying levels of sensitivity and frequency)?
How effective would a targeted treatment for ovarian cancer need to be (and in what proportion of the patient population would the marker for that treatment need to exist)? Note that we assume that targeted therapy would be equally effective across all stages of disease.
How do the test characteristics for targeted treatment or genetic screening affect the results?
How do the above results differ under the assumption that cancer must progress from Stage I to II and then III versus the assumption that ovarian cancer may progress directly from Stage I to Stage III?
What effect does the assumption about natural history have on the relative efficacy of screening?
What is the impact of attributable risk proportion on the potential efficacy of genetic risk factors?
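The interplay between mutation prevalence, relative risk, and the 20 percent target can be illustrated with a back-of-the-envelope calculation that sits outside the Markov model. The sketch assumes a perfectly accurate genetic test, complete uptake of primary prevention among carriers, and deaths that fall in proportion to cases; none of these assumptions come from the report, and the function names and input values are hypothetical. The quantity computed is the share of all cases arising in carriers, which is related to, but not the same as, the attributable risk proportion.

```python
def carrier_case_fraction(prevalence, relative_risk):
    """Share of all ovarian cancers that arise in mutation carriers."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    carrier_cases = prevalence * relative_risk
    return carrier_cases / (carrier_cases + (1.0 - prevalence))


def required_effectiveness(prevalence, relative_risk, target=0.20):
    """Risk reduction needed in carriers to reach the target.

    Returns None when even a fully effective intervention in carriers
    cannot reach the target reduction for the whole population.
    """
    fraction = carrier_case_fraction(prevalence, relative_risk)
    if fraction == 0:
        return None
    needed = target / fraction
    return needed if needed <= 1.0 else None


# Hypothetical mutation profiles (illustration only).
print(required_effectiveness(prevalence=0.05, relative_risk=10))
print(required_effectiveness(prevalence=0.01, relative_risk=20))
```

With the second profile, carriers account for fewer than 20 percent of all cases, so no intervention restricted to carriers could reach the target under these assumptions.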
We employed internal and external quality-monitoring checks through every phase of the study to reduce bias, enhance consistency, and verify accuracy. Examples of internal monitoring procedures include: three progressively stricter screening opportunities for each article (abstract screening, full-text article review, data abstraction review); involvement of three individuals (two clinicians and a copy editor) in each data abstraction; and agreement of at least two clinicians on all included studies.
Our principal external quality-monitoring device is the peer-review process. Nominations for peer reviewers were solicited from several sources, including the technical expert panel and interested federal agencies. The list of nominees was forwarded to AHRQ for vetting and approval. A list of peer reviewers submitting comments on this draft is provided in Appendix E.*
Question 1 is: What is the evidence that the ovarian cancer genomic tests performed in a typical clinical laboratory actually measure what they are purported to measure?
We sought to identify articles that provided details on the performance of genomic tests in a laboratory setting, with an emphasis on laboratories providing results for clinical care. Because data on sensitivity and specificity are covered under Question 2, our emphasis in this question was on evidence related to analytic performance, such as:
Test reproducibility, as measured by inter- and intra-assay coefficients of variation for quantitative tests, or measurements of observer variability for tests that require human observation (such as immunohistochemistry).
Measurements of correlation with other tests, including previous generations of other tests.
Quantification of variability between laboratories.
Analytic sensitivity and specificity in comparison to a recognized reference standard.
We included only articles that specifically addressed the laboratory performance of genomic tests for ovarian cancer. Although specific assays may have documented analytic validity when used for other cancers, or other conditions, our focus was on ovarian cancer.
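For readers unfamiliar with the reproducibility measures listed above, the following sketch shows one common way to compute intra- and inter-assay coefficients of variation from replicate measurements of the same control sample. Conventions differ between laboratories; here the intra-assay value is taken as the mean of the within-run coefficients and the inter-assay value as the coefficient of the run means, which is an assumption rather than the method of any included study. The measurements are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return 100.0 * statistics.stdev(values) / mean


def assay_cvs(runs):
    """Return (intra-assay CV, inter-assay CV) in percent.

    runs: list of runs, each a list of replicate results for one control.
    """
    within_run = [coefficient_of_variation(run) for run in runs]
    run_means = [statistics.mean(run) for run in runs]
    return statistics.mean(within_run), coefficient_of_variation(run_means)


# Hypothetical CA-125 control results (U/mL): three runs of three replicates.
runs = [[46.0, 47.5, 48.1], [45.2, 46.8, 47.0], [49.0, 48.2, 50.1]]
intra_cv, inter_cv = assay_cvs(runs)
print(f"intra-assay CV {intra_cv:.1f}%, inter-assay CV {inter_cv:.1f}%")
```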
Radioimmunoassays for single gene products - cancer antigen 125 (CA-125). We identified six articles that compared the performance of a next-generation radioimmunoassay (RIA) for CA-125 (CA-125 II) from various manufacturers to earlier generation tests or to other RIAs for other tumor markers.34–39 All six studies reported high correlation coefficients with previous assays. All studies reported low inter- and intra-assay coefficients of variation (values generally less than 10 percent for inter-assay, less than 5 percent for intra-assay). Of note, two studies examined coefficients of variation at different levels of CA-125 and found changing variability with CA-125 levels. Fillela et al.,38 using an automated analyzer, found coefficients of variation of 2.8 to 6.4 percent for “level 2” values of CA-125 (mean 47.1 U/mL), with values of 1.8 to 4 percent for “level 3” (mean 164.3 U/mL). Hubl et al.34 reported slightly higher intra-assay coefficients of variation in mid-range (40 U/mL) compared to low-range values (10 to 20 U/mL). A third study39 did not find an effect of concentration in the clinically relevant range. Because 35 U/mL is the most commonly used threshold for considering a CA-125 value suspicious for cancer, these results suggest that random variation in test results may have some impact on sensitivity and specificity at values close to the threshold. The clinical impact of this variability would ultimately depend on how values close to the threshold are managed.
Tuxen and colleagues performed serial measurements of CA-125 over the course of a year in 26 women with known ovarian cancer40 and 31 healthy controls41 to assess the relative effect of analytic variability and inter- and intra-individual biologic variation on CA-125 levels. In women with cancer, analytic imprecision accounted for 12 percent of the variability in levels, intra-individual variations 24.0 percent, and inter-individual variations 43.6 percent; after accounting for this imprecision, the investigators estimated that a change of greater than 62.6 percent in the reference value would be needed in order to be statistically significant. Similar values were found in healthy controls, with imprecision being greater in premenopausal women (69.5 percent) compared with postmenopausal women (35.7 percent) due to variability in levels over the course of the menstrual cycle. The change in reference value required for significance after accounting for variation in the entire group was 50 percent.
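The change needed for significance is commonly derived with the critical difference (reference change value) formula, z × sqrt(2) × sqrt(CVa² + CVi²), where CVa is the analytic and CVi the intra-individual coefficient of variation. The text above does not state which formula the investigators used; the sketch below assumes this one, reads the 12 and 24.0 percent figures as coefficients of variation, and uses z = 1.65 (a one-sided 95 percent level), which is also an assumption. Under those assumptions the result is close to the 62.6 percent quoted for women with cancer.

```python
import math


def reference_change_value(cv_analytic, cv_within, z=1.65):
    """Percent change between two serial results that exceeds combined
    analytic and within-person variation at the chosen z value."""
    if cv_analytic < 0 or cv_within < 0:
        raise ValueError("coefficients of variation must be non-negative")
    return z * math.sqrt(2.0) * math.sqrt(cv_analytic ** 2 + cv_within ** 2)


# Inputs taken from the figures quoted above for women with cancer.
rcv = reference_change_value(cv_analytic=12.0, cv_within=24.0)
print(round(rcv, 1))  # about 62.6
```

Inter-individual variation does not enter the calculation, because serial monitoring compares a woman with her own earlier values rather than with other women.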
One study37 compared sensitivity and specificity of the new generation and first generation assays using 138 stored samples and found slightly higher sensitivity with the new assay (89.8 vs. 84.7 percent), and lower specificity (83.5 vs. 84.7 percent). However, the prevalence of cancer in the samples was much higher than would be expected in a typical clinical population, and the confidence intervals for the sensitivity and specificity estimates overlapped.
Radioimmunoassays for other single gene products. Few studies of other markers were performed in clinical laboratories. Hasholzner and colleagues36 evaluated clinical laboratory performance of an RIA for cancer antigen 72-4 (CA-72-4); intra-assay coefficients of variation were 3.5 to 4 percent, and inter-assay coefficients of variation were 5 to 7.4 percent. CA-125 and CA-72-4 levels were essentially uncorrelated in healthy controls (correlation coefficient -0.066) and moderately correlated in serous ovarian cancer patients (0.576).
Tuxen and colleagues41 measured carcinoembryonic antigen (CEA) and tissue polypeptide antigen (TPA) along with CA-125 in healthy controls in the study described above. For CEA, the change in reference values needed for significance after accounting for imprecision was 44.8 percent; for TPA, the value was 67.9 percent. Unlike with CA-125, menopausal status did not affect the degree of intra-individual variability.
Two studies reported research laboratory performance of two other single gene products and preliminary clinical validation. Riisbro and colleagues42 reported an inter-assay coefficient of variation of 7.6 percent, and an intra-assay coefficient of variation of 4.6 percent for an RIA for soluble urokinase plasminogen activator receptor. Using 129 stored serum samples, levels were correlated with malignancy and stage of disease, but not after adjusting for other variables. Thougaard and colleagues43 compared three different antibodies targeted against tetranectin, with similar performance in terms of assay variability, differences in absolute levels of 10 percent or less, and similar correlations in ovarian cancer patients (n = 43); levels were observed to decrease as cancer stage worsened.
Two studies44, 45 reported on the performance of RIAs for candidate single gene products that were first identified by using microarrays to find overexpressed genes. These studies are discussed below under “Microarrays.”
Other assays. Sapi and colleagues46 reported a method for removing peripheral lymphocytes from blood samples or ascitic fluid in order to measure telomerase activity. Using this method, telomerase activity was observed in 8 of 8 patients with Stage IV ovarian cancer, 7 of 20 patients with Stage III, and 0 of 30 controls. CA-125 levels were higher in patients with positive telomerase assays.
Single gene mutation/polymorphism. Janatova and colleagues47 evaluated the performance of Spreadex Polymer NAB (electrophoresis gels) in patients with known breast cancer gene 1/2 (BRCA1/2) mutations (n = 13) and 13 controls; the technique successfully identified mutations only in those subjects with known mutations; all patients with known mutations had mutations detected.
Wen and colleagues48 compared microarray with gel-based DNA sequencing for identifying mutations in the p53 gene in 108 patients with ovarian cancer. Both methods detected mutations in 57 cancers; together with the cancers in which neither method detected a mutation, this gave an overall concordance of 81 percent.
Microarrays. Only one study specifically examined test performance in a clinical laboratory. Zarrinkar et al.49 compared high-throughput microarray using parallel analysis to single sample assays in specimens from 31 patients with known ovarian cancer and found a high level of correlation (0.980).
Two studies reported preliminary data from research laboratories on candidate single-gene products identified initially through microarray studies.44, 45 Hellstrom and colleagues45 compared the performance of an antibody to human epididymis protein 4 (HE4) to CA-125 in 121 subjects, of whom 37 (30.6 percent) had ovarian cancer. Reported sensitivities were better for HE4 than for CA-125 at a fixed specificity of 96 percent, but confidence intervals were quite wide. Mok and colleagues44 examined the performance of prostasin in 201 subjects, 64 (31.8 percent) of whom had ovarian cancer. Prostasin levels correlated poorly with CA-125 levels; sensitivity of prostasin was less than that of CA-125 at the same specificity of 94 percent, but the combination of the two markers had a sensitivity of 92 percent at the same level of specificity.
Proteomics. Although we identified 10 studies that looked at protein expression in serum as a potential biomarker for ovarian cancer,50–59 all were performed in research laboratories. Because several of these studies have attracted wide attention in the media, we will discuss them in more detail here.
Petricoin et al.57 created a proteomics-based genetic algorithm with cluster analysis to distinguish between ovarian cancer and non-ovarian cancer serum samples using a training set of 50 ovarian cancers and 50 healthy controls from a high-risk population. The new algorithm was then tested using a validation set consisting of 50 ovarian cancers and 66 non-cancers, some with benign ovarian cysts, benign gynecologic disease, or benign non-gynecologic disease. The algorithm successfully classified 50/50 cancers (sensitivity = 100 percent) and 63/66 non-cancers (specificity = 95 percent) in the validation set. The study has two major limitations. First, the proteins used to distinguish cancers from non-cancers were not identified, leading to questions of whether proteins of interest were actually produced by tumor cells or by other inflammatory responses in the tumor's microenvironment. Second, although the reported positive predictive value is 94 percent in the study, the low prevalence of ovarian cancer (1 in 2,500) in the general population would reduce the positive predictive value of proteomic screening to less than one percent in a screening population.
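The dependence of positive predictive value on prevalence can be checked directly with Bayes' rule. The following is a minimal sketch using only the figures quoted above (50/50 cancers and 63/66 non-cancers correctly classified, and a general-population prevalence of 1 in 2,500); the function and variable names are illustrative and do not come from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


sensitivity = 50 / 50   # validation-set cancers correctly classified
specificity = 63 / 66   # validation-set non-cancers correctly classified

# Prevalence within the validation set itself (50 cancers among 116 samples)
study_ppv = positive_predictive_value(sensitivity, specificity, 50 / 116)

# Prevalence quoted for the general population (1 in 2,500)
screening_ppv = positive_predictive_value(sensitivity, specificity, 1 / 2500)

print(f"PPV in validation set:       {study_ppv:.1%}")
print(f"PPV in screening population: {screening_ppv:.2%}")
```

With test characteristics held fixed, nearly all positive results in a low-prevalence population are false positives, which is the basis for the "less than one percent" figure.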
These investigators subsequently published three datasets online as the Clinical Proteomics Program Databank (http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp). The first dataset (2-16-02) consists of 100 control, 100 ovarian cancer, and 16 benign disease samples run on a Ciphergen H4 ProteinChip array. Ovarian Dataset 4-3-02 consists of the same samples run on Ciphergen WCX2 ProteinChip array. Ovarian Dataset 8-7-02 contains serum profiles run on Ciphergen WCX2 ProteinChip array of 162 ovarian cancer patients subdivided into stages and 91 non-cancer control subjects.
Sorace et al.55 analyzed Ovarian Dataset 8-7-02 using a training set containing 45 controls and 80 cancers. A 2-sided Wilcoxon test was used to compare intensity between controls and cancers at different mass-to-charge (M/Z) values. A subset of M/Z values that resulted in the lowest Wilcoxon p-values was selected, and stepwise discriminant analysis was used to determine the subset of M/Z values that best discriminated cancers from controls. Classification rules were then used on the remainder of the patient data (test set). Three classification rules were developed, all with sensitivity > 90 percent and specificity > 90 percent when applied to the test set. The authors expressed concerns over the existence of highest discriminatory ability in the M/Z < 500 range, where data are traditionally discarded due to increased “noise.” They hypothesized several explanations for these findings including very low molecular weight (MW) biomarkers such as LPA, low MW degradation products of higher MW macromolecules, and systematic processing error.
Li et al.59 analyzed all three Clinical Proteomics Program Datasets using two different approaches: support vector machine statistical testing (SVM-ST) and support vector machine with genetic algorithm (SVM-GA). Datasets were not split into training and validation sets; instead, leave-one-out cross-validation was used. Sensitivity and specificity for analysis of Dataset 2-16-02 were lower than in the analysis by Petricoin et al.57 of the same data (0.79 and 0.80 for SVM-ST, and 0.96 and 0.948 for SVM-GA, respectively). Sensitivity and specificity were improved with analysis of the other two datasets, achieving 100 percent sensitivity and 100 percent specificity using SVM-GA to analyze Dataset 8-7-02. The authors were unable to reproduce the sensitivity and specificity reported by Petricoin et al. when training an SVM with the discriminatory features identified in the latter's paper.
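For readers unfamiliar with leave-one-out cross-validation, the sketch below shows the general procedure: each sample is held out in turn, a classifier is fit to the remaining samples, and the held-out sample is then classified. This is not the authors' method. A toy nearest-mean classifier on a single feature stands in for the support vector machines, and the data values are hypothetical.

```python
from statistics import mean


def nearest_mean_predict(train, x):
    """Toy classifier: assign x to the class whose training mean is closer."""
    cancer = [value for value, label in train if label == 1]
    control = [value for value, label in train if label == 0]
    return 1 if abs(x - mean(cancer)) < abs(x - mean(control)) else 0


def leave_one_out(samples):
    """Estimate (sensitivity, specificity) by leave-one-out cross-validation."""
    tp = fn = fp = tn = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]   # every sample except the i-th
        predicted = nearest_mean_predict(train, x)
        if label == 1:
            tp += predicted == 1
            fn += predicted == 0
        else:
            fp += predicted == 1
            tn += predicted == 0
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical single-feature intensities: (value, label), label 1 = cancer
toy_samples = [(1.0, 0), (1.2, 0), (0.9, 0), (2.4, 0),
               (3.0, 1), (3.3, 1), (2.8, 1), (1.1, 1)]

sens, spec = leave_one_out(toy_samples)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Because every sample is used for both fitting and testing (though never at the same time), leave-one-out estimates still reflect the case mix of the original dataset and do not substitute for validation in an independent clinical population.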
Zhang et al.53 performed a multicenter study to analyze serum proteomic expression profiles using Ciphergen ProteinChip in 153 patients with epithelial ovarian cancers, 42 with other ovarian cancers, 166 with benign pelvic masses, and 142 healthy controls. Results were cross-validated against different subsets of the data to identify biomarkers. Three biomarkers (apolipoprotein A1, transthyretin, both down-regulated in ovarian cancers, and a fragment of human inter-alpha trypsin inhibitor, upregulated) were identified and immunoassays performed on serum from another subset of patients. Levels of these three biomarkers were included in a multivariate model to predict malignancy, and the model was tested on a validation set consisting of 138 ovarian cancers and 63 healthy controls. The resulting model had a sensitivity and specificity of 0.775 and 0.968, respectively, in distinguishing cancer from healthy controls. The authors also created a model incorporating CA-125 with the three markers, which improved on the specificity of CA-125 alone. The discovered biomarkers were all acute phase reactants deemed unlikely to be released by tumor cells. Controls were not age-matched and were significantly younger (median test and validation sets 39 and 44) compared to cancer patients (median 52 and 57).
Kozak et al.60 analyzed serum from 109 ovarian cancers, 19 patients with benign disease, and 56 healthy donors using the Ciphergen ProteinChip SAX2. Samples were divided into training and test sets. Proteins differentially expressed were identified using t-test and Wilcoxon rank sum tests. Three biomarker protein panels were then developed: SBP (five markers), VBP I (five markers), and VBP II (four markers). Multivariate logistic regression was used to develop panels with the best predictive value. Sensitivity and specificity were 0.957 and 0.826, respectively, for SBP; 0.815 and 0.949 for VBP I; and 0.728 and 0.949 for VBP II. Test sets were employed. Panels correctly identified early stage disease with variable sensitivity. Individual discriminatory proteins were not identified.
Although all studies reported good discrimination for the particular protein profile studied, there were several recurrent issues that limit the ability to draw inferences about potential clinical applicability:
Technical issues with the assay. For example, Conrads and colleagues58 noted that “comparisons...revealed that the variation in mass spectra (overall amplitude, total record count and deviation between ovarian cancer cases and control samples) was statistically indistinguishable from the variance within the process itself, as indicated by the serum reference standard.” Sorace and Zahn,55 in an analysis of a dataset used by several other groups, found sensitivity and specificity of 100 percent in a training set, but noted that much of the discrimination of the profile lies in the region of the spectroscopy results with low mass-to-charge ratios. They note that this region is problematic both because of technical issues of measurement and because differences in protein profiles in this region may result from processes independent of cancer.
Varying analytic methods. No consistent methodology was used. Given the complexity of the data and the variety of methods used, it is difficult to draw consistent conclusions about performance. Li and colleagues59 found marked variability in results using similar statistical methods on different datasets, as well as using different statistical methods on the same dataset.
Unrealistically high prevalence of ovarian cancer. The majority of the studies compared serum samples from known ovarian cancer patients to healthy controls, using relatively small datasets of 100 to 200 subjects, with a prevalence of cancer of 30 to 50 percent. Although repeated sampling and resampling was performed in all of these studies, the prevalence of cancer was still substantially higher than it would be in a screening population (approximately 0.05 percent). Only one study59 provided estimates for the positive predictive value within a screening population; these estimates were in general at least an order of magnitude lower than the results based on the original dataset.
The majority of the literature we identified that specifically addressed issues of clinical laboratory performance in ovarian cancer dealt with radioimmunoassays of single gene products, with CA-125 being the most common product. Test reproducibility and validity are in general quite good for these assays, although a series of Danish studies by Tuxen and colleagues suggests that both inherent laboratory variation and biological variation should be taken into account when setting thresholds for clinically relevant changes in concentrations of these markers. In addition, coefficients of variation for CA-125 are generally greatest when levels are in the range of the most commonly used discriminatory threshold of 35 U/mL, suggesting that this irreducible imprecision may have some impact on sensitivity and specificity in practice.
We did not identify any relevant literature on the clinical laboratory performance of other types of genomic tests. Although there were numerous articles describing research laboratory performance, the relevance of these studies to widespread clinical practice is uncertain. In particular, the prevalence of ovarian cancer in studies of potential proteomic patterns as predictors of early stage ovarian cancer is at least an order of magnitude higher than the likely prevalence in the general population.
The published data on clinical laboratory performance suggest that currently available radioimmunoassays for single gene products have acceptable reproducibility and reliability, although even this level of variability may have some impact on clinical interpretation of results, especially when comparing relatively small serial changes, or levels close to the discriminatory threshold.
There is insufficient evidence to estimate how newer technologies such as microarrays or protein profiles would perform in a “typical clinical laboratory.”
Question 2 is: What is the sensitivity and specificity of genomic tests in detecting ovarian cancer in asymptomatic and symptomatic women, including high-risk women?
We sought to identify articles that provided details on the sensitivity and specificity of genomic tests in a clinical setting. We separately reviewed studies intended for screening purposes, both in the general population and in women identified as high risk based on family history and/or BRCA testing, and studies used for diagnostic purposes, either in women with symptoms or women with a diagnosed mass.
Asymptomatic women - average risk. The systematic review conducted for the U.S. Preventive Services Task Force (USPSTF)30 concluded that annual CA-125 screening had an estimated sensitivity of 80 percent, with false positive rates of 0.1 to 0.6 percent, based on three studies with small numbers of cancers and variable, relatively short, followup durations; the estimated positive predictive value for screening was 1 percent for women called for additional testing, and 15 percent for women undergoing surgery.
The evidence report on management of adnexal masses14 found that the majority of studies did not report results separately for women with asymptomatic masses compared with those who had masses detected because of symptoms. Of note, the report also found an extremely low sensitivity (less than 50 percent) of the bimanual pelvic examination as both a screening test and an initial diagnostic test.
Asymptomatic women - high risk. The BRCA1 and BRCA2 systematic review for the USPSTF31 did not specifically address the sensitivity and specificity of genomic tests in this setting.
Symptomatic women - average risk. Again, the literature on the use of diagnostic tests, including genomic tests, does not provide useful information on differences in test performance in symptomatic versus asymptomatic women. The evidence report on management of adnexal masses14 found an approximate sensitivity of 78 percent for CA-125 in the diagnosis of cancer in adnexal masses, with an approximate specificity of 78 percent; both sensitivity and specificity were higher in postmenopausal women. Other genomic tests (all single gene products) reviewed included TAG-72, cancer antigen 19-9 (CA-19-9), and CEA; all had sensitivities lower than the pooled estimates for CA-125. There were few studies examining combination testing; those that did failed to find improved discrimination compared to CA-125 alone.
Symptomatic women - high risk. The adnexal mass evidence report14 did not identify any studies uniquely in high-risk populations.
Asymptomatic women. We did not identify any studies of genomic tests other than CA-125 that provided evidence of sensitivity and specificity as primary screening tests for ovarian cancer in asymptomatic women. The one major study published subsequent to the USPSTF review reported the initial baseline results of the National Cancer Institute Prostate, Lung, Colorectal, and Ovarian (PLCO) screening trial.61 In this study, over 28,000 women aged 55 or older were screened with transvaginal ultrasound and CA-125; 402 women (1.4 percent) had an abnormal CA-125. Of the 19 invasive cancers, four had normal CA-125 levels, for a sensitivity of 78.9 percent (95 percent confidence interval [CI], 60.6 to 97.3 percent) and a specificity of 98.7 percent (95 percent CI, 98.5 to 98.8 percent), consistent with previous studies in postmenopausal women.
Only one study62 provided sufficient detail about patient characteristics to be able to ascertain test performance of a genomic test (in this case, vascular endothelial growth factor [VEGF]) as a diagnostic tool in asymptomatic women identified with a pelvic mass through screening; sensitivity was 55.9 percent, and specificity 55.3 percent, too low to be considered useful as a second line diagnostic test. All of the other studies that included women with a pelvic mass failed to report the proportion of women with a mass who had presented on the basis of symptoms, or on the basis of asymptomatic detection of a mass through a pelvic examination or imaging study; this limitation is shared by the majority of the literature on diagnosis of ovarian cancer in women with masses.14
Symptomatic women - single gene products. The majority of studies identified were retrospective studies that compared serum or, in some cases, tissue from women with known ovarian cancer to serum from women with benign adnexal masses and/or asymptomatic women. There were more than two studies identified for only two markers, CA-72-4 and VEGF.
| Study | Gene product | TP | FN | FP | TN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Prevalence |
|---|---|---|---|---|---|---|---|---|---|---|
| Wakahara et al., 200163 | CA-72-4 | 20 | 20 | 18 | 108 | 50.0% (34.5 to 65.5%) | 85.7% (79.6 to 91.8%) | 52.6% (36.8 to 68.5%) | 84.4% (78.1 to 90.7%) | 24.3% |
| Schutter et al., 199864 | CA-72-4 | 28 | 15 | 6 | 86 | 65.0% (50.7 to 79.3%) | 93.0% (87.8 to 98.2%) | 82.4% (69.5 to 95.2%) | 85.1% (78.2 to 92.1%) | 31.9% |
| Fayed et al., 199865 | CA-72-4 | 21 | 9 | 3 | 57 | 70.0% (53.6 to 86.4%) | 95.0% (89.5 to 100%) | 87.5% (74.3 to 100%) | 86.4% (78.1 to 94.6%) | 33.3% |
| Zakrzewska et al., 199966 | CA-72-4 | 39 | 31 | 0 | 26 | 55.7% (44.1 to 67.4%) | 100% (88.5 to 100%) | 100% (92.3 to 100%) | 45.6% (32.7 to 58.5%) | 72.9% |
| Hasholzner et al., 199636 | CA-72-4 (benign vs. cancer) | 66 | 57 | 1 | 36 | 54.0% (45.2 to 62.8%) | 97.0% (91.5 to 100%) | 98.5% (95.6 to 100%) | 38.7% (28.8 to 48.6%) | 76.9% |
| Hasholzner et al., 199636 | CA-72-4 (healthy vs. cancer) | 66 | 57 | 1 | 29 | 54.0% (45.2 to 62.8%) | 97.0% (90.9 to 100%) | 98.5% (95.6 to 100%) | 33.7% (23.7 to 43.7%) | 80.4% |
Abbreviations: CI = confidence interval; FN = false negative; FP = false positive; NPV = negative predictive value; PPV = positive predictive value; TN = true negative; TP = true positive
| Study | Gene product | TP | FN | FP | TN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Prevalence |
|---|---|---|---|---|---|---|---|---|---|---|
| Tanir et al., 200367 | VEGF | 11 | 1 | 6 | 44 | 91.7% (76.0 to 100%) | 88.0% (79.0 to 97.0%) | 64.7% (42.0 to 87.4%) | 97.8% (93.5 to 100%) | 19.4% |
| Gorelik et al., 200568 | VEGF | 35 | 9 | 27 | 55 | 79.5% (67.6 to 91.4%) | 67.4% (57.3 to 77.5%) | 56.5% (44.1 to 68.8%) | 85.9% (77.4 to 94.5%) | 34.9% |
| Obermair et al., 199869 | VEGF | 24 | 20 | 19 | 62 | 54.5% (39.8 to 69.3%) | 76.5% (67.3 to 85.8%) | 55.8% (41.0 to 70.7%) | 75.6% (66.3 to 84.9%) | 35.2% |
| Cooper et al., 200270 | VEGF | 75 | 26 | 16 | 34 | 74.0% (65.4 to 82.6%) | 68.0% (55.1 to 80.9%) | 82.4% (74.6 to 90.2%) | 56.7% (44.1 to 69.2%) | 66.9% |
| Oehler and Caffier, 199971 | VEGF (benign mass controls) | 29 | 12 | 7 | 13 | 70.7% (56.8 to 84.7%) | 65.0% (44.1 to 85.9%) | 80.6% (67.6 to 93.5%) | 52.0% (32.4 to 71.6%) | 67.2% |
| Oehler and Caffier, 199971 | VEGF (healthy controls) | 30 | 11 | 6 | 14 | 73.2% (59.6 to 86.7%) | 70.0% (49.9 to 90.1%) | 83.3% (71.2 to 95.5%) | 56.0% (36.5 to 75.5%) | 67.2% |
Abbreviations: CI = confidence interval; FN = false negative; FP = false positive; NPV = negative predictive value; PPV = positive predictive value; TN = true negative; TP = true positive
| Study | Gene product | TP | FN | FP | TN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Prevalence |
|---|---|---|---|---|---|---|---|---|---|---|
| Abdel-Aleem et al., 199672 | Alpha-L-fucosidase | 43 | 5 | 0 | 28 | 89.6% (80.9 to 98.2%) | 100% (89.3 to 100%) | 100% (93.0 to 100%) | 84.8% (72.6 to 97.1%) | 63.2% |
| Cherchi et al., 200273 | CA-15-3 | 10 | 10 | 6 | 38 | 50.0% (28.1 to 71.9%) | 86.4% (76.2 to 96.5%) | 62.5% (38.8 to 86.2%) | 79.2% (67.7 to 90.7%) | 31.3% |
| Cherchi et al., 200273 | CA-19-9 | 13 | 7 | 2 | 42 | 65.0% (44.1 to 85.9%) | 95.5% (89.3 to 100%) | 86.7% (69.5 to 100%) | 85.7% (75.9 to 95.5%) | 31.3% |
| Wakahara et al., 200163 | CA-19-9 | 24 | 42 | 78 | 127 | 36.4% (24.8 to 48.0%) | 62.0% (55.3 to 68.6%) | 23.5% (15.3 to 31.8%) | 75.1% (68.6 to 81.7%) | 24.3% |
| Cherchi et al., 200273 | CEA | 8 | 12 | 0 | 44 | 40.0% (18.5 to 61.5%) | 100% (93.2 to 100%) | 100% (62.5 to 100%) | 78.6% (67.8 to 89.3%) | 31.3% |
| Zakrzewska et al., 199966 | CEA | 7 | 63 | 0 | 26 | 10.0% (3.0 to 17.0%) | 100% (88.5 to 100%) | 100% (57.1 to 100%) | 29.2% (19.8 to 38.7%) | 72.9% |
| Mabrouk and Ali-Labib, 200374 | c-erb-2 | 4 | 16 | 4 | 16 | 20.0% (2.5 to 37.5%) | 80.0% (62.5 to 97.5%) | 50.0% (15.4 to 84.6%) | 50.0% (32.7 to 67.3%) | 50.0% |
| Inaba et al., 199575 | CYFRA 21-1 | 48 | 27 | 3 | 137 | 64.0% (53.1 to 74.9%) | 97.9% (95.5 to 100%) | 94.1% (87.7 to 100%) | 83.5% (77.9 to 89.2%) | 34.9% |
| Tempfer et al., 199876 | CYFRA 21-1 | 15 | 22 | 2 | 38 | 40.5% (24.7 to 56.4%) | 95.0% (88.2 to 100%) | 88.2% (72.9 to 100%) | 63.3% (51.1 to 75.5%) | 48.1% |
| Gorelik et al., 200568 | EGF | 37 | 7 | 19 | 63 | 84.1% (73.3 to 94.9%) | 76.7% (67.5 to 85.9%) | 66.1% (53.7 to 78.5%) | 90.0% (83.0 to 97.0%) | 34.9% |
| Kim et al., 200377 | Epithelial cell adhesion molecule | 22 | 30 | 2 | 50 | 42.3% (28.9 to 55.7%) | 96.2% (90.9 to 100%) | 91.7% (80.6 to 100%) | 62.5% (51.9 to 73.1%) | 50.0% |
| Hefler et al., 200078 | FAS | 28 | 24 | 3 | 62 | 53.0% (39.4 to 66.6%) | 95.0% (89.7 to 100%) | 90.3% (79.9 to 100%) | 72.1% (62.6 to 81.6%) | 44.4% |
| Gorelik et al., 200568 | G-CSF | 32 | 12 | 21 | 61 | 72.7% (59.5 to 85.9%) | 74.4% (65.0 to 83.8%) | 60.4% (47.2 to 73.5%) | 83.6% (75.1 to 92.1%) | 34.9% |
| Diamandis et al., 200379 | hK6 | 69 | 77 | 12 | 226 | 47.0% (38.9 to 55.1%) | 95.0% (92.2 to 97.8%) | 85.2% (77.4 to 92.9%) | 74.6% (69.7 to 79.5%) | 38.0% |
| Luo et al., 200180 | hK10 | 62 | 18 | 0 | 42 | 77.5% (68.3 to 86.7%) | 100% (92.9 to 100%) | 100% (95.2 to 100%) | 70.0% (58.4 to 81.6%) | 65.6% |
| Berek et al., 199181 | IL-6 | 18 | 18 | 2 | 10 | 50.0% (33.7 to 66.3%) | 83.3% (62.2 to 100%) | 90% (76.9 to 100%) | 35.7% (18.0 to 53.5%) | 75.0% |
| Gorelik et al., 200568 | IL-6 | 37 | 7 | 11 | 71 | 84.1% (73.3 to 94.9%) | 86.0% (78.5 to 93.5%) | 77.1% (65.2 to 89.0%) | 91.0% (84.7 to 97.4%) | 34.9% |
| Gorelik et al., 200568 | IL-8 | 39 | 5 | 25 | 57 | 88.6% (79.2 to 98.0%) | 69.8% (59.9 to 79.7%) | 60.9% (49.0 to 72.9%) | 91.9% (85.2 to 98.7%) | 34.9% |
| Gorelik et al., 200568 | MCP | 37 | 7 | 23 | 59 | 84.1% (73.3 to 94.9%) | 72.1% (62.4 to 81.8%) | 61.7% (49.4 to 74.0%) | 89.4% (82.0 to 96.8%) | 34.9% |
| van Haaften-Day et al., 200182 | M-CSF | 69 | 134 | 28 | 166 | 34.0% (27.5 to 40.5%) | 85.6% (80.6 to 90.5%) | 71.1% (62.1 to 80.2%) | 55.3% (49.7 to 61.0%) | 51.1% |
| Bon et al., 199683 | Mucin-like carcinoma-associated antigen | 29 | 47 | 0 | 70 | 38.2% (27.2 to 49.1%) | 100% (95.7 to 100%) | 100% (89.7 to 100%) | 59.8% (50.9 to 68.7%) | 52.1% |
| van Haaften-Day et al., 200182 | OVX1 | 38 | 165 | 16 | 178 | 18.7% (13.4 to 24.1%) | 91.8% (87.9 to 95.6%) | 70.4% (58.2 to 82.5%) | 51.9% (46.6 to 57.2%) | 51.1% |
| Onsrud et al., 199684 | p55 | 26 | 19 | 3 | 24 | 57.8% (43.3 to 72.2%) | 88.9% (77.0 to 100%) | 89.7% (78.6 to 100%) | 55.8% (41.0 to 70.7%) | 62.5% |
| Opala et al., 200585 | p55 | 28 | 23 | 1 | 15 | 54.9% (41.2 to 68.6%) | 93.8% (81.9 to 100%) | 96.6% (89.9 to 100%) | 39.5% (23.9 to 55.0%) | 76.1% |
| Onsrud et al., 199684 | p75 | 7 | 38 | 1 | 26 | 15.6% (5.0 to 26.1%) | 96.3% (89.2 to 100%) | 87.5% (64.6 to 100%) | 40.6% (28.6 to 52.7%) | 62.5% |
| Opala et al., 200585 | p75 | 22 | 29 | 3 | 13 | 43.1% (29.5 to 56.7%) | 81.3% (62.1 to 100%) | 88.0% (75.3 to 100%) | 31.0% (17.0 to 44.9%) | 76.1% |
| Tsukishiro et al., 200586 | Secretory leukocyte protease inhibitor | 42 | 13 | 5 | 20 | 76.0% (64.7 to 87.3%) | 80.0% (64.3 to 95.7%) | 89.4% (80.5 to 98.2%) | 60.6% (43.9 to 77.3%) | 68.8% |
| Darai et al., 199887 | Serum cadherin | 11 | 5 | 0 | 52 | 68.8% (46.0 to 91.5%) | 100% (94.2 to 100%) | 100% (72.7 to 100%) | 91.2% (83.9 to 98.6%) | 23.5% |
| Baron et al., 200388 | Serum EGFR | 125 | 100 | 8 | 136 | 55.6% (49.1 to 62.0%) | 94.4% (90.7 to 98.2%) | 94.0% (89.9 to 98.0%) | 57.6% (51.3 to 63.9%) | 60.9% |
| Baron et al., 200589 | Serum EGFR | 141 | 84 | 86 | 160 | 62.7% (56.3 to 69.0%) | 65.0% (59.1 to 71.0%) | 62.1% (55.8 to 68.4%) | 65.6% (59.6 to 71.5%) | 47.8% |
| Udagawa et al., 199890 | Serum GAT | 68 | 68 | 13 | 285 | 50.0% (41.6 to 58.4%) | 95.6% (93.3 to 97.9%) | 84.0% (76.0 to 91.9%) | 80.7% (76.6 to 84.9%) | 31.3% |
| Sedlaczek et al., 200291 | sIL-2R | 54 | 13 | 1 | 31 | 80.6% (71.1 to 90.1%) | 96.9% (90.8 to 100%) | 98.2% (94.7 to 100%) | 70.5% (57.0 to 83.9%) | 67.8% |
| Hurteau et al., 199592 | Soluble IL-2 receptor alpha | 37 | 2 | 58 | 3 | 94.9% (87.9 to 100%) | 4.9% (0 to 10.3%) | 38.9% (29.1 to 48.8%) | 60.0% (17.1 to 100%) | 39.0% |
| Opala et al., 200393 | Soluble intercellular adhesion molecule 1 | 42 | 9 | 7 | 9 | 82.4% (71.9 to 92.8%) | 56.3% (31.9 to 80.6%) | 85.7% (75.9 to 95.5%) | 50.0% (26.9 to 73.1%) | 76.2% |
| McIntosh et al., 200494 | Soluble mesothelin-related marker (benign masses) | 15 | 37 | 4 | 216 | 28.8% (16.5 to 41.2%) | 98.2% (96.4 to 99.9%) | 78.9% (60.6 to 97.3%) | 85.4% (81.0 to 89.7%) | 19.1% |
| Sedlaczek et al., 200291 | TPS | 53 | 14 | 6 | 26 | 79.1% (69.4 to 88.8%) | 81.3% (67.7 to 94.8%) | 89.8% (82.1 to 97.5%) | 65.0% (50.2 to 79.8%) | 67.8% |
| Medl et al., 199595 | TATI | 75 | 40 | 67 | 200 | 65.2% (56.5 to 73.9%) | 74.9% (69.7 to 80.1%) | 52.8% (44.6 to 61.0%) | 83.3% (78.6 to 88.0%) | 30.1% |
| Peters-Engl et al., 199596 | TATI | 114 | 66 | 60 | 154 | 63.3% (56.3 to 70.4%) | 72.0% (65.9 to 78.0%) | 65.5% (58.5 to 72.6%) | 70.0% (63.9 to 76.1%) | 45.7% |
| Schutter et al., 199997 | Urinary gonadotropin peptide | 7 | 2 | 7 | 14 | 78.0% (50.9 to 100%) | 65.0% (44.6 to 85.4%) | 50.0% (23.8 to 76.2%) | 87.5% (71.3 to 100%) | 30.0% |
Abbreviations: CI = confidence interval; FN = false negative; FP = false positive; NPV = negative predictive value; PPV = positive predictive value; TN = true negative; TP = true positive
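The summary measures in the tables above can be recomputed from the four cell counts. The sketch below assumes a simple normal-approximation (Wald) 95 percent confidence interval, truncated to the range 0 to 100 percent. This choice is an inference from the tabulated values rather than a method stated in the report: it reproduces the intervals for rows with estimates between 0 and 100 percent, but rows reporting exactly 100 percent evidently used some other method, because the Wald half-width is zero there. Function names are illustrative.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation CI, truncated to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_summary(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, NPV (each with CI) and prevalence."""
    return {
        "sensitivity": wald_ci(tp, tp + fn),
        "specificity": wald_ci(tn, tn + fp),
        "ppv": wald_ci(tp, tp + fp),
        "npv": wald_ci(tn, tn + fn),
        "prevalence": (tp + fn) / (tp + fn + fp + tn),
    }


# Cell counts from the Fayed et al. CA-72-4 row
fayed = diagnostic_summary(tp=21, fn=9, fp=3, tn=57)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    estimate, low, high = fayed[name]
    print(f"{name}: {estimate:.1%} ({low:.1%} to {high:.1%})")
print(f"prevalence: {fayed['prevalence']:.1%}")
```

Because positive and negative predictive values depend on the proportion of cancers in each sample, the tabulated values apply only at the prevalence shown in the final column and should not be carried over to screening populations.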
Symptomatic women - DNA variations. We did not identify any studies that allowed estimation of sensitivity and specificity of inherited or acquired mutations in detecting ovarian cancer.
Symptomatic women - gene expression. We identified one study that reported on the sensitivity of cytological tests of ascitic fluid for the presence of a series of genes believed to be activated in ovarian cancer;98 although specificities were universally high, sensitivities ranged from 8 to 60 percent, with wide confidence intervals for both values. These low sensitivities were even more striking given the high prevalence of ovarian cancer in the samples (61 percent).
We identified one study that used immunohistochemistry for a range of gene products in the diagnosis of ovarian cancer in ovarian tissue;99 performance for different markers ranged widely. Because there were only 20 ovarian cancer patients (out of a total of 70), confidence intervals were wide. In addition, no data were provided on the reproducibility of the assay or interpretation of results.
Another study measured cytokeratin 19 (CK19) expression in peripheral blood mononuclear cells;100 false-positive rates were quite high in both discriminating cancer from benign ovarian tumors (specificity 28.6 percent) and cancer from normal controls (specificity 40.0 percent). Again, confidence intervals were very wide.
No other studies directly reported the sensitivity and specificity of gene expression patterns identified through the use of microarray technology.
Symptomatic women - proteomics. Studies of protein profiles as a potential tool for early diagnosis of ovarian cancer have attracted considerable attention recently. However, all of the identified studies examined test performance using databases; none have been tested in a clinical population.
In general, single gene products other than CA-125 have not been shown to be useful in the diagnosis of ovarian cancer, either in symptomatic or asymptomatic women. Small sample sizes, lack of detail on the prediagnosis history of patients, and an unrealistically high prevalence of ovarian cancer in the majority of studies make it difficult to assess how any of these tests would perform in clinical practice.
Estimating the clinical value of more complex tests (those using multiple gene and/or protein markers) is even more difficult. Studies of protein expression, in particular, are limited by lack of consensus on appropriate statistical methods, small sample sizes with substantially higher prevalences of ovarian cancer than would be found in the general population, lack of reproducibility, and uncertainty about the specificity of the biological processes resulting in the observed protein patterns. Most importantly, none have been tested in clinical populations.
Question 3 is: What is the evidence that genomic testing to detect ovarian cancer in asymptomatic women, including high-risk women, changes clinical management and leads to improved health outcomes?
We searched for articles related to the use of genomic tests in screening for ovarian cancer in asymptomatic women, including any studies that focused on screening in women previously identified as being at greater risk on the basis of family history or other genomic tests. We excluded studies of CA-125 that were previously reviewed in a report for the USPSTF;30 we did, however, review studies published subsequent to the USPSTF review.
In the review of ovarian cancer screening for the USPSTF,30 studies did consistently show a greater prevalence of Stage I ovarian cancer among women screened with CA-125 (based on small numbers of cancers), but there were no data on the impact of screening on mortality.
In the BRCA1 and BRCA2 review,31 there were “limited” data on the efficacy of intensive screening among carriers and no prospective studies of chemoprevention (especially oral contraceptives) or tubal ligation, although observational studies offered some suggestion that both interventions might reduce ovarian cancer risk. In three retrospective studies and one prospective cohort study, prophylactic oophorectomy reduced the risk of ovarian cancer by 85 to 100 percent, although the authors of the review noted that the confidence interval for risk reduction crossed 1.0 in the prospective study.
We did not identify any studies of the use of genomic tests for screening asymptomatic women in any risk group that met our inclusion criteria.
To date, no test has been shown to have acceptable sensitivity and specificity for screening for ovarian cancer, or to reduce the morbidity and mortality associated with ovarian cancer. Because definitive diagnosis of ovarian cancer requires surgery, a high level of specificity is needed in order to minimize the costs and potential complications of unnecessary surgery; although screening in high-risk groups could theoretically have better outcomes (because a higher pretest probability of cancer should result in better positive predictive values), this has not been demonstrated in adequately designed studies. One study of 1,610 women at increased risk because of family history found a positive predictive value for ovarian cancer of less than five percent (3 of 61 abnormal tests),101 a value similar to that observed in the 28,000 average-risk women in the PLCO study (3.7 percent).61 The ability to detect early cancer in these populations, even with intensive screening, may be limited; a study of 291 high-risk women who were screened every 6 months with ultrasound and CA-125 detected early stage ovarian cancer in only one of the eight women who developed ovarian or peritoneal cancer during 10 years of followup; five of the eight had had normal screening tests within 6 months of diagnosis.18 Other studies in similar populations have reported similar findings.16, 19 The degree to which the lack of effectiveness of screening is due to insufficient test sensitivity rather than the inherent biology of ovarian cancer is discussed further in the section on modeling.
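The dependence of positive predictive value on pretest probability described above follows from Bayes' theorem. As a reminder, in standard notation that is not taken from the cited studies, with sensitivity Se, specificity Sp, and pretest probability (prevalence) p:

$$
\mathrm{PPV} = \frac{\mathrm{Se}\cdot p}{\mathrm{Se}\cdot p + (1-\mathrm{Sp})\,(1-p)}
$$

For fixed Se and Sp, PPV rises as p rises, which is why screening in high-risk groups could in principle perform better. When p is very small, the false-positive term (1 − Sp)(1 − p) dominates the denominator unless specificity is very close to 1.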
We found no articles on the use of genomic tests (other than CA-125) for detecting ovarian cancer in asymptomatic women, regardless of risk group.
Question 4 is: What is the evidence that genomic testing in women with clinical suspicion of ovarian cancer or with already-diagnosed ovarian cancer changes clinical management and leads to improved health outcomes?
Studies included in this section reported data on the association of genomic test results with either a change in clinical management or a health outcome related to a particular management strategy. For example, genomic tests whose results were associated with response to therapy are included here. We did not, however, include studies that related genomic tests strictly to prognosis, for example, by describing survival differences based on genomic test results. Similarly, if genomic test results were associated with staging data, we did not include these studies here despite the fact that staging may in turn be used to select treatment. Our rationale was that we were primarily interested in how genomic testing could inform clinical management beyond usual clinical staging, which is already routinely used to guide therapy.
We found no studies that compare two groups of women, one of which underwent genomic testing and one of which did not. Ideally, such a study would be prospective with random allocation to the groups. In fact, we did not encounter any non-randomized comparative studies, either prospective or retrospective, that compare management or health outcomes in two such groups. Therefore, the following review considers only uncontrolled studies describing the association of test-positive and test-negative women with management and health outcomes. This design limits the certainty with which one might infer that applying the test in clinical practice could result in improved management decisions or health outcomes compared to not applying the test.
Women with clinical suspicion of ovarian cancer. We found no studies that describe evidence regarding change in management or health outcome resulting from use of genomic testing in women with a clinical suspicion of ovarian cancer. A large number of studies discussed under Question 2 describe the diagnostic accuracy of genomic tests in women with a clinical suspicion of ovarian cancer (based on symptoms), but none of these studies, nor any others screened in the literature search, described clinical management changes or health outcomes resulting from these tests.
Women with already diagnosed ovarian cancer. The most studied use of genomic tests was for predicting or detecting response to treatment after debulking therapy and adjuvant chemotherapy. The studies were of three types. First, several studies sought to predict which patients would have a favorable response to chemotherapy (e.g., complete or partial response vs. stable or progressive disease). Second, other studies sought to predict, among women who appeared to have no evidence of disease on clinical evaluation, who would have evidence of disease on second-look laparotomy (SLL). Third, several studies related genomic test results at the time of primary debulking surgery to the ability to achieve optimal cytoreduction.
Predicting favorable response to chemotherapy. We found six studies describing the association between CA-125 and favorable response to chemotherapy;79, 102–106 these studies used a wide range of threshold values, from 10 to 500 U/mL. In addition, the following tests were described in one study each: human kallikrein 6 (hK6);79 low-density lipoprotein receptor-related protein (LRP), multidrug resistance protein (MRP), and P-glycoprotein (Pgp);107 multidrug resistance gene 1 (MDR-1), MRP-1, and MRP-2;108 TP53;109 c-erb-B2;110 and human kallikrein 10 (hK10).111
| Test/Study | TP | FN | FP | TN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Probability of response to chemotherapy |
|---|---|---|---|---|---|---|---|---|---|
| CA-125 | |||||||||
| Balbi et al., 2005102 | 32 | 8 | 2 | 24 | 80.00% (65.2 to 89.5%) | 92.30% (75.9 to 97.9%) | 94.10% (80.9 to 98.4%) | 75.00% (57.9 to 86.7%) | 60.60% |
| Balbi et al., 2005102 | 22 | 4 | 2 | 19 | 84.60% (66.5 to 93.8%) | 90.50% (71.1 to 97.3%) | 91.70% (74.2 to 97.7%) | 82.60% (62.9 to 93.0%) | 55.30% |
| Rustin et al., 2001104 | 80 | 5 | 1 | 2 | 94.10% (87.0 to 97.5%) | 66.70% (20.8 to 93.9%) | 98.80% (93.3 to 99.8%) | 28.60% (8.2 to 64.1%) | 96.60% |
| Rustin et al., 1996105 | 73 | 12 | 4 | 42 | 85.90% (76.9 to 91.7%) | 91.30% (79.7 to 96.6%) | 94.80% (87.4 to 98.0%) | 77.80% (65.1 to 86.8%) | 64.90% |
| Gronlund et al., 2004106 | 27 | 1 | 14 | 30 | 96.40% (82.3 to 99.4%) | 68.20% (53.4 to 80.0%) | 65.90% (50.5 to 78.4%) | 96.80% (83.8 to 99.4%) | 38.90% |
| Gadducci et al., 2004103 (CA-125 half-life) | 26 | 16 | 10 | 19 | 61.90% (46.8 to 75.0%) | 65.50% (47.3 to 80.1%) | 72.20% (56.0 to 84.2%) | 54.30% (38.2 to 69.5%) | 59.20% |
| Gadducci et al., 2004103 (CA-125 % reduction) | 26 | 17 | 10 | 19 | 60.50% (45.6 to 73.6%) | 65.50% (47.3 to 80.1%) | 72.20% (56.0 to 84.2%) | 52.80% (37.0 to 68.0%) | 59.70% |
| hK6 | |||||||||
| Diamandis et al., 200379 | 17 | 4 | 46 | 61 | 81.00% (60.0 to 92.3%) | 57.00% (47.5 to 66.0%) | 27.00% (17.6 to 39.0%) | 93.80% (85.2 to 97.6%) | 16.40% |
| MDR-1 | |||||||||
| Kamazawa et al., 2002108 | 21 | 0 | 1 | 5 | 100.00% (84.5 to 100.0%) | 83.30% (43.6 to 97.0%) | 95.50% (78.2 to 99.2%) | 100.00% (56.6 to 100.0%) | 77.80% |
| TP53 | |||||||||
| Kupryjanczyk et al., 2003109 | 98 | 57 | 37 | 37 | 63.20% (55.4 to 70.4%) | 50.00% (38.9 to 61.1%) | 72.60% (64.5 to 79.4%) | 39.40% (30.1 to 49.5%) | 67.70% |
| c-erb-B2 | |||||||||
| Lassus et al., 2004110 | 234 | 66 | 30 | 51 | 78.00% (73.0 to 82.3%) | 63.00% (52.1 to 72.7%) | 88.60% (84.2 to 91.9%) | 43.60% (34.9 to 52.6%) | 78.70% |
| hK10 | |||||||||
| Luo et al., 2003111 | 74 | 44 | 7 | 14 | 62.70% (53.7 to 70.9%) | 66.70% (45.4 to 82.8%) | 91.40% (83.2 to 95.8%) | 24.10% (15.0 to 36.5%) | 84.90% |
| Pgp | |||||||||
| Izquierdo et al., 1995107 | 32 | 9 | 8 | 0 | 78.00% (63.3 to 88.0%) | 0.00% (0.0 to 32.4%) | 80.00% (65.2 to 89.5%) | 0.00% (0.0 to 29.9%) | 83.70% |
| MRP | |||||||||
| Izquierdo et al., 1995107 | 13 | 28 | 3 | 5 | 31.70% (19.6 to 47.0%) | 62.50% (30.6 to 86.3%) | 81.30% (57.0 to 93.4%) | 15.20% (6.7 to 30.9%) | 83.70% |
| LRP | |||||||||
| Izquierdo et al., 1995107 | 5 | 36 | 5 | 3 | 12.20% (5.3 to 25.5%) | 37.50% (13.7 to 69.4%) | 50.00% (23.7 to 76.3%) | 7.70% (2.7 to 20.3%) | 83.70% |
Abbreviations: CI = confidence interval; FN = false negative; FP = false positive; NPV = negative predictive value; PPV = positive predictive value; TN = true negative; TP = true positive
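The columns in the table above can be recomputed from the four cell counts. The following is a minimal sketch in Python. The function names are illustrative, and the choice of the Wilson score interval is an assumption: the report does not state which interval method was used, although the Wilson interval appears to reproduce the tabulated intervals for the first CA-125 row. The final column is taken here to be (TP + FN) / total, which matches the tabulated value for that row.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Return the (lower, upper) Wilson score interval for a proportion."""
    if total == 0:
        return (0.0, 1.0)  # no data: the proportion is unconstrained
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return (max(0.0, centre - half_width), min(1.0, centre + half_width))


def accuracy_measures(tp, fn, fp, tn):
    """Compute point estimates and Wilson intervals from 2x2 counts."""
    total = tp + fn + fp + tn
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    results = {}
    for name, (num, den) in pairs.items():
        estimate = num / den if den else float("nan")
        results[name] = (estimate, wilson_interval(num, den))
    # Share of all patients who had the outcome (e.g., responded to chemotherapy)
    results["probability_of_outcome"] = (tp + fn) / total if total else float("nan")
    return results


if __name__ == "__main__":
    # Counts from the first CA-125 row (Balbi et al., 2005): TP=32, FN=8, FP=2, TN=24
    for name, value in accuracy_measures(32, 8, 2, 24).items():
        if name == "probability_of_outcome":
            print(f"{name}: {value:.1%}")
        else:
            estimate, (low, high) = value
            print(f"{name}: {estimate:.1%} ({low:.1%} to {high:.1%})")
```

Recomputing rows in this way is one means of checking a table for transcription errors, since the four counts fully determine every other column.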
Predicting residual disease on SLL. A second goal of studies was to predict, among women who appeared to have no evidence of disease on clinical evaluation, who would have evidence of disease on SLL. A test might be clinically useful if it could predict with a high sensitivity which patients with clinically undetectable disease might have cancer progression on SLL. Such a test might obviate the need for SLL or at least improve the accuracy of clinical staging. CA-125 is one marker used to detect early recurrence.
We found five studies describing the association between CA-125 and positive disease on SLL.112–116 Two studies described cancer-associated serum antigen (CASA).113, 114 In addition, the following markers were described in one study each: cathepsin D and nm23;117 p53, murine double minute protein (Mdm2), and Bcl-2;118 interleukin 6 (IL-6);81 cytokeratin fragment 21-1 (CYFRA 21-1);119 tetranectin (TN);113 and TPS.115
Three reports classified patients with microscopic disease as disease-negative,81, 117, 118 while the remaining studies classified macroscopic or microscopic disease at SLL as disease-positive.
| Test/Study | TP | FN | FP | TN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Probability of residual disease |
|---|---|---|---|---|---|---|---|---|---|
| CA-125 | |||||||||
| Folk et al., 1995112 | 3 | 24 | 0 | 18 | 11.1% (3.9 to 28.1%) | 100.0% (82.4 to 100.0%) | 100.0% (43.9 to 100.0%) | 42.9% (29.1 to 57.8%) | 60.0% |
| Hogdall et al., 1996113 | 21 | 17 | 0 | 30 | 55.3% (39.7 to 69.9%) | 100.0% (88.6 to 100.0%) | 100.0% (84.5 to 100.0%) | 63.8% (49.5 to 76.0%) | 55.9% |
| Senapad et al., 2000115 (CA-125 > 10) | 11 | 8 | 11 | 3 | 57.9% (36.3 to 76.9%) | 21.4% (7.6 to 47.6%) | 50.0% (30.7 to 69.3%) | 27.3% (9.7 to 56.6%) | 57.6% |
| Kierkegaard et al., 1995114 (CA-125 > 15) | 23 | 335 | 0 | 35 | 6.4% (4.3 to 9.5%) | 100.0% (90.1 to 100.0%) | 100.0% (85.7 to 100.0%) | 9.5% (6.9 to 12.9%) | 91.1% |
| Wong et al., 2000116 (CA-125 > 35) | 5 | 23 | 1 | 17 | 17.9% (7.9 to 35.6%) | 94.4% (74.2 to 99.0%) | 83.3% (43.6 to 97.0%) | 42.5% (28.5 to 57.8%) | 60.9% |
| CASA | |||||||||
| Hogdall et al., 1996113 | 12 | 26 | 0 | 29 | 31.6% (19.1 to 47.5%) | 100.0% (88.3 to 100.0%) | 100.0% (75.8 to 100.0%) | 52.7% (39.8 to 65.3%) | 56.7% |
| Kierkegaard et al., 1995114 | 13 | 45 | 0 | 35 | 22.4% (13.6 to 34.7%) | 100.0% (90.1 to 100.0%) | 100.0% (77.2 to 100.0%) | 43.8% (33.4 to 54.7%) | 62.4% |
| Cathepsin D | |||||||||
| Baekelandt et al., 1999117 | 22 | 14 | 78 | 61 | 61.1% (44.9 to 75.2%) | 43.9% (35.9 to 52.2%) | 22.0% (15.0 to 31.1%) | 81.3% (71.1 to 88.5%) | 20.6% |
| CYFRA 21-1 | |||||||||
| Gadducci et al., 2001119 | 20 | 4 | 5 | 6 | 83.3% (64.1 to 93.3%) | 54.5% (28.0 to 78.7%) | 80.0% (60.9 to 91.1%) | 60.0% (31.3 to 83.2%) | 68.6% |
| p53 | |||||||||
| Baekelandt et al., 1999118 | 20 | 16 | 64 | 75 | 55.6% (39.6 to 70.5%) | 54.0% (45.7 to 62.0%) | 23.8% (16.0 to 33.9%) | 82.4% (73.3 to 88.9%) | 20.6% |
| Ayhan et al., 1998120 | 9 | 6 | 5 | 10 | 60.0% (35.7 to 80.2%) | 66.7% (41.7 to 84.8%) | 64.3% (38.8 to 83.7%) | 62.5% (38.6 to 81.5%) | 50.0% |
| Other | |||||||||
| Baekelandt et al., 1999117 (nm23) | 24 | 12 | 100 | 39 | 66.7% (50.3 to 79.8%) | 28.1% (21.3 to 36.0%) | 19.4% (13.4 to 27.2%) | 76.5% (63.2 to 86.0%) | 20.6% |
| Baekelandt et al., 1999118 (Mdm2) | 4 | 32 | 25 | 114 | 11.1% (4.4 to 25.3%) | 82.0% (74.8 to 87.5%) | 13.8% (5.5 to 30.6%) | 78.1% (70.7 to 84.0%) | 20.6% |
| Baekelandt et al., 1999118 (Bcl-2) | 19 | 17 | 49 | 90 | 52.8% (37.0 to 68.0%) | 64.7% (56.5 to 72.2%) | 27.9% (18.7 to 39.6%) | 84.1% (76.0 to 89.8%) | 20.6% |
| Berek et al., 199181 (IL-6) | 16 | 5 | 2 | 13 | 76.2% (54.9 to 89.4%) | 86.7% (62.1 to 96.3%) | 88.9% (67.2 to 96.9%) | 72.2% (49.1 to 87.5%) | 58.3% |
| Hogdall et al., 1996113 (TN) | 9 | 29 | 0 | 30 | 23.7% (13.0 to 39.2%) | 100.0% (88.6 to 100.0%) | 100.0% (70.1 to 100.0%) | 50.8% (38.4 to 63.2%) | 55.9% |
| Combination markers | |||||||||
| Kierkegaard et al., 1995114 | 27 | 31 | 0 | 35 | 46.6% (34.3 to 59.2%) | 100.0% (90.1 to 100.0%) | 100.0% (87.5 to 100.0%) | 53.0% (41.2 to 64.6%) | 62.4% |
| Senapad et al., 2000115 | 11 | 8 | 13 | 1 | 57.9% (36.3 to 76.9%) | 7.1% (1.3 to 31.5%) | 45.8% (27.9 to 64.9%) | 11.1% (2.0 to 43.5%) | 57.6% |
Abbreviations: CI = confidence interval; FN = false negative; FP = false positive; NPV = negative predictive value; PPV = positive predictive value; TN = true negative; TP = true positive
Studies also differ with regard to the separation in time between the time at which the marker was measured and the time of SLL. The immunohistochemical tests are based on surgical samples, while serum markers were measured after surgery, after adjuvant chemotherapy, and immediately prior to SLL.
Predicting ability to perform optimal cytoreduction. Several studies evaluated genomic tests for their ability to predict whether optimal cytoreduction (by surgical debulking) was possible. Definitions for optimal cytoreduction were identical across studies, based on the Gynecologic Oncology Group criteria,121 requiring no residual tumor masses > 1 cm at debulking surgery.
| Study/Test | TP | FN | FP | TN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Probability of optimal cytoreduction |
|---|---|---|---|---|---|---|---|---|---|
| Berchuck et al., 2004122 (Chip) | 20 | 5 | 7 | 12 | 80.0% (60.9 to 91.1%) | 63.2% (41.0 to 80.9%) | 74.1% (55.3 to 86.8%) | 70.6% (46.9 to 86.7%) | 56.8% |
| Diamandis et al., 200379 (hK6) | 53 | 28 | 9 | 40 | 65.4% (54.6 to 74.9%) | 81.6% (68.6 to 90.0%) | 85.5% (74.7 to 92.2%) | 58.8% (47.0 to 69.7%) | 62.3% |
| Gemer et al., 2001123 (CA-125) | 10 | 6 | 4 | 20 | 62.5% (38.6 to 81.5%) | 83.3% (64.1 to 93.3%) | 71.4% (45.4 to 88.3%) | 76.9% (57.9 to 89.0%) | 40.0% |
| Memarzadeh et al., 2003124 (CA-125) | 14 | 12 | 31 | 42 | 53.8% (35.5 to 71.2%) | 57.5% (46.1 to 68.2%) | 31.1% (19.5 to 45.7%) | 77.8% (65.1 to 86.8%) | 26.3% |
| Obeidat et al., 2004125 (CA-125) | 13 | 5 | 6 | 16 | 72.2% (49.1 to 87.5%) | 72.7% (51.8 to 86.8%) | 68.4% (46.0 to 84.6%) | 76.2% (54.9 to 89.4%) | 45.0% |
| Saygili et al., 2002126 (CA-125) | 33 | 11 | 12 | 36 | 75.0% (60.6 to 85.4%) | 75.0% (61.2 to 85.1%) | 73.3% (59.0 to 84.0%) | 76.6% (62.8 to 86.4%) | 47.8% |
| Gemer et al., 2005127 (CA-125) | 126 | 56 | 138 | 104 | 69.2% (62.5 to 72.9%) | 57.0% (50.8 to 63.2%) | 54.8% (48.4 to 61.2%) | 71.1% (64.8 to 77.5%) | 42.9% |
Abbreviations: CI = confidence interval; FN = false negative; FP = false positive; NPV = negative predictive value; PPV = positive predictive value; TN = true negative; TP = true positive
The vast majority of the available literature on the use of genomic tests in the management of patients with ovarian cancer consists of serum measurement of single-gene products, particularly CA-125, as predictors of (a) initial response to chemotherapy; (b) complete resolution of disease (i.e., negative SLL); or (c) the ability to perform optimal cytoreduction. There is also a substantial literature that reports on the association of various genomic tests with prognosis, but the majority of these studies were excluded because they did not describe patient management.
The studies that sought to predict initial response to chemotherapy were generally performed in unselected women with ovarian cancer, not just those with optimal debulking, for example. Although, in theory, there is significant benefit, both clinically and in the research setting, of being able to predict who will respond to chemotherapy, we did not identify any studies that demonstrated this; there were no studies, for example, that compared different chemotherapeutic regimens based on test results.
Studies evaluating the association of genomic test results with second-look surgery were commonly limited to women who had appeared to have a complete response to initial debulking and chemotherapy. In these studies serum markers were often measured close to the time of SLL (e.g., at the end of chemotherapy or immediately prior to SLL); however, immunohistochemical tests were usually based on tissue obtained at the time of primary surgery. SLL is sometimes used to evaluate women who appear clinically to have had a complete response, since other techniques, such as CA-125 and imaging, are fairly insensitive for very small-volume disease. However, most of the data suggest that there is no substantial survival benefit to SLL, even if residual tumor is removed; a Gynecologic Oncology Group non-randomized study reported a difference in median survival of only 1 month.128 Thus, SLL might be more properly thought of as a test for monitoring disease or the outcome of treatment (a reference standard) rather than as a therapeutic option itself; the potential benefit of better sensitivity at detecting residual disease would be the ability to avoid the need for SLL altogether (and its concomitant cost and morbidity).
Finally, the prediction of optimal debulking is potentially helpful; for example, patients might benefit by referral to particularly expert surgeons, or to research protocols. Patients unlikely to obtain optimal debulking could be selected for neoadjuvant chemotherapy to improve the likelihood of debulking success; however, this strategy has not been tested.129 As with tests for response to chemotherapy, there are, as yet, no studies prospectively demonstrating improved patient outcomes from such a strategy.
Although there is a reasonable amount of data on the association between genomic tests, particularly CA-125, and the likelihood of different clinical outcomes, we did not identify any studies that provided evidence for changes in management leading to improved outcomes based on the results of the tests.
Question 5 is: What are the harms of using genomic tests for ovarian cancer prevention and management?
The nature of the potential harms associated with genomic testing in ovarian cancer varies depending on the potential application of the test:
Testing for increased risk of ovarian cancer. Potential harms associated with testing for inherited or acquired genetic changes that are associated with increased risk of ovarian cancer include:
The harms associated with the management of women who have positive results. These include complications of primary preventive therapy (for example, surgical complications from prophylactic oophorectomy) and sequelae of the therapy (loss of fertility, premature menopause). For strategies involving more frequent screening, the impact on time and any discomforts associated with the screening test need to be considered.
The effects on quality of life and other psychological measures of a diagnosis that provides knowledge of an increased risk of disease, with little direct evidence for the benefit of management strategies.
The potential impact on decisions about childbearing for inherited mutation.
Screening for early ovarian cancer, diagnosis of ovarian cancer. The main potential harm associated with the use of genomic tests for screening for ovarian cancer or for the diagnosis of ovarian cancer is the risk of a false-positive test result, with potential for anxiety about the diagnosis, as well as the risks of definitive diagnostic surgery.14
Testing for targets for specific therapy. The main potential harm associated with testing for specific targets is a false-positive result, which would lead to inappropriately exposing a patient to the risks of the targeted treatment, provision of ineffective treatment, and delayed start of potentially more effective therapy.
We searched for studies that described these classes of adverse outcomes for any type of genomic testing, excluding studies covered in previous reviews of the potential harms of screening using CA-125 and BRCA1/2 conducted for the USPSTF,14, 30, 31 but including articles published after the inclusion dates for these reviews.
The review of ovarian cancer screening for the USPSTF did not identify any specific articles describing harms of screening, but pointed out the low positive predictive value of available tests and the large number of unnecessary surgeries.30 The adnexal mass evidence report was unable to draw any conclusions about the potential harms of false-positive surgeries because of limitations in the literature, primarily a failure to distinguish the preoperative indication for surgery from the postoperative findings.14
The review of BRCA1/2 testing identified relatively few studies addressing the harms of testing.31 Only one study reported complications of prophylactic oophorectomy in carriers (4 of 80 women). Quality-of-life studies which involved only prophylactic oophorectomy (as opposed to prophylactic mastectomy with or without oophorectomy) were inconclusive.31
We did not identify any articles that specifically described the harms associated with genomic testing for ovarian cancer. In the PLCO study, 62 of 402 women with an elevated CA-125 underwent surgical biopsy; of these, 16 had any neoplasm, with 13 (3.7 percent) having an invasive cancer.61 The paper did not report whether there were any complications from these surgeries.
McInerney-Leo and colleagues published two papers from the same large study. In the first paper,130 they reported data on a variety of psychological measures, including depression, self-esteem, and cancer-related distress, in 212 adult members of families with documented BRCA1/2 mutations before and after counseling and possible testing. Intrusive thoughts, depressive symptoms, and breast and ovarian cancer worries improved in subjects with negative test results, but there was no change in those with positive results, or in those who declined testing.
In the second paper, McInerney-Leo and colleagues131 measured the impact of BRCA1/2 testing on family relationships using a validated index in the same study population. Interestingly, subjects who declined testing had more positive changes than those who accepted testing. In those who accepted testing, there was a non-significant trend towards decreased expressiveness among family members in those who had an abnormal test result.
Two studies reported measures separately for ovarian and breast cancer in women at risk for BRCA1 and BRCA2 mutations. In the first, Bish and colleagues132 collected data on a variety of quality-of-life measures in 203 subjects undergoing counseling regarding BRCA1 and 2 mutation testing because of family histories, and found that (1) worry about ovarian cancer was significantly less than worry for breast cancer; (2) worry about ovarian cancer was highest in women with a personal history of cancer, independent of the degree of risk or results of testing and (3) there was no overall change in worry about ovarian cancer in response to testing.
Claes and colleagues133 performed a similar study in 71 similar subjects. As in the Bish study, there were differences in responses by cancer type: risk perception and distress were higher for breast cancer than for ovarian cancer. After testing, distress related to ovarian cancer was higher in carriers than in non-carriers, but was not significantly different from baseline levels by 12 months posttesting. Women who underwent prophylactic oophorectomy had decreased levels of concern, but higher levels of somatic symptoms.
Only two articles specifically reported outcomes relevant to ovarian cancer. The majority of the available literature focuses on BRCA1/2 testing, but does not report results separately for ovarian and breast cancer outcomes. The differences observed in the studies we did identify suggest that testing for genetic markers of ovarian cancer susceptibility alone may have different implications compared to testing for genes that affect both breast and ovarian cancer risk.
For the most part, the potential harms associated with the use of genomic tests in screening, diagnosis, and management of ovarian cancer are no different than those of other tests, such as imaging: the risks of false-positive results leading to unnecessary and potentially dangerous treatment, as well as the psychological effects of a cancer diagnosis; and the risks of false-negative results leading to delayed diagnosis and therapy, with a potential for a poorer prognosis. The types of risks are similar; the only potential difference between genomic tests and other modalities lies in the quantitative risk of false-negative and false-positive results, which in turn depends on test sensitivity, specificity, and the pretest probability of disease. Higher quality evidence about the test characteristics of genomic tests for ovarian cancer should allow better estimation of these types of harms. The one type of test that might have implications similar to those for markers of increased risk would be a test for an inherited polymorphism that affected the likelihood of response to therapy; if such a polymorphism also affected the likelihood of responses or side effects for therapies for other conditions, the longer term implications for the patient and her family would have to be considered.
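To make the dependence on sensitivity, specificity, and pretest probability concrete, the sketch below tabulates expected false-positive and false-negative counts for a hypothetical tested cohort. All input values are illustrative assumptions chosen for the example; none is an estimate from this report or from the cited studies.

```python
def expected_test_errors(n_tested, pretest_probability, sensitivity, specificity):
    """Expected error counts and PPV when a test is applied to n_tested people."""
    with_disease = n_tested * pretest_probability
    without_disease = n_tested - with_disease
    true_positives = with_disease * sensitivity
    false_negatives = with_disease - true_positives
    true_negatives = without_disease * specificity
    false_positives = without_disease - true_negatives
    positives = true_positives + false_positives
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "ppv": true_positives / positives if positives else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical inputs (assumptions for illustration only):
    # the same test applied at a lower and a higher pretest probability.
    scenarios = {
        "lower pretest probability": expected_test_errors(100_000, 0.0004, 0.80, 0.98),
        "higher pretest probability": expected_test_errors(100_000, 0.01, 0.80, 0.98),
    }
    for label, result in scenarios.items():
        print(
            f"{label}: {result['false_positives']:.0f} false positives, "
            f"{result['false_negatives']:.0f} false negatives, "
            f"PPV {result['ppv']:.1%}"
        )
```

Under these assumed inputs, the number of false positives changes little between the two scenarios, while the share of positive results that are true positives changes considerably, which is the quantitative point made in the paragraph above.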
For tests of markers of increased risk, such as inherited mutations, there are some additional potential harms, namely, the impact of knowledge of increased risk when optimal options for reducing that risk, either through primary or secondary prevention, are unclear. Specifically, in the case of ovarian cancer, bilateral oophorectomy in premenopausal women affects childbearing potential and induces premature menopause; given the uncertainty about optimal methods for hormone replacement, this option may be even more confusing for some women. Because of varying degrees of penetrance, even the estimate of increased risk associated with a given mutation is subject to a fairly wide degree of uncertainty. Issues surrounding heritability in subsequent generations may also be important to some women. In this case, although the literature on BRCA testing is helpful in identifying some of these issues, there is a greater need for providing results specifically for ovarian cancer-related issues in studies of markers which affect the risk of several types of cancers. The available literature suggests that, for most women with BRCA mutations, breast cancer is a greater concern than ovarian cancer. Given the differences in both quantitative risk and the types of risks associated with testing, diagnosis, and prophylaxis, results need to be provided specifically for ovarian cancer-related outcomes in studies of BRCA testing.
The literature on the harms of genomic tests for ovarian cancer is sparse, with the majority of the available literature on psychological impacts of testing consisting of studies of women tested for susceptibility to both breast and ovarian cancer. Future studies will need not only to identify the short-term psychological impact of different test results, but also provide data on the outcomes of strategies used to reduce the risk of ovarian cancer in patients who undergo testing for susceptibility. The theoretical harms of genomic tests in the setting of screening, diagnosis, and management of ovarian cancer are similar to those for other types of tests and should ultimately be estimated based on better evidence for test characteristics and the effectiveness of management strategies based on test results.
Question 6 is: Has direct-to-consumer and direct-to-physician marketing of genomic tests on ovarian cancer increased the “appropriate” use (as defined by study investigators) of these tests?
We searched for articles that specifically measured responses by providers and/or patients to direct advertising campaigns. We also considered alternative sources of data on the nature and volume of direct-to-consumer and direct-to-provider marketing and identified methodological issues involved with utilizing these alternative sources.
We did not identify any articles specifically targeting ovarian cancer genomic testing. We identified two articles that investigated the impact of a single advertising campaign for BRCA1/2 testing, targeted at women at high risk for breast or ovarian cancer.134, 135
From September 2002 through February 2003, the U.S. manufacturer of BRCA1/2 testing conducted a pilot direct-to-consumer marketing campaign in Atlanta, Georgia, and Denver, Colorado. The campaign was targeted at women aged 25 to 54 with personal or family histories of breast and/or ovarian cancer, along with their healthcare providers. Television, radio, and print advertisements were generated to raise awareness about BRCA1/2 testing and to motivate women to ask their providers about how testing might help assess risk and change management. Providers received information and patient support materials prior to the beginning of the campaign. Although this marketing campaign was not designed as a research study, two groups were able to take advantage of the campaign and design studies to assess the impact of the campaign on test utilization.
In a study conducted by the Centers for Disease Control and Prevention (CDC) and the state health departments of Colorado, Georgia, North Carolina, and Washington,134 investigators surveyed providers and consumers in the two pilot cities and two comparison metropolitan areas (Raleigh-Durham, NC, and Seattle, WA). From April 21 through May 20, 2003, a 51-question consumer telephone survey was conducted using random telephone numbers, with a target response of 1,600 women. Questions included family history; campaign awareness; interest in genetic testing for BRCA1/2; cancer concerns; and interactions with providers, family, and friends. A 35-question survey and monetary incentive were mailed to providers (randomly selected to be proportionately representative of family practice, internal medicine, obstetrics/gynecology, and oncology) on May 1, 2003; the target response was approximately 1,600.
A total of 1,635 women completed the survey, for a response rate of 45 percent; most were non-Hispanic white women with more than high school education and a median age of 40 years. Women in the pilot cities were significantly more likely to have heard of the test, but no significant differences were observed in stated knowledge about genetic testing, concern about risk for breast and ovarian cancer, or proportion of women who had talked with someone about genetic testing. Family histories were similar among those who expressed an increased interest compared to those who did not.
A total of 1,054 providers completed the survey (66 percent response rate). Providers in pilot cities were significantly more likely to report that they and their patients had been exposed to an advertisement about genetic testing and to report an increase in the number of patients asking about testing, asking for genetic counseling, and requesting testing. The number of tests ordered increased significantly in the pilot cities as well, although the number of referrals for counseling did not increase. Provider knowledge about testing did not differ between cities, but knowledge did differ between specialties, with obstetricians/gynecologists and oncologists having higher levels of knowledge.
Limitations noted by study investigators included lack of data on non-responders; the potential for bias because of the low response rate among consumers; a relatively short lag time between the advertising campaign and the survey, which might have been insufficient to allow all those interested in testing to undertake and complete the process; lack of availability of data on the number of tests actually performed and the appropriateness of those tests because of the proprietary nature of the tests; and lack of data on the appropriateness of education, counseling, and testing ordered by providers.
A separate study conducted by investigators at Kaiser Permanente Colorado compared utilization of testing before and after the advertising campaign to similar time periods using data from the Henry Ford Health System in Detroit, Michigan.135 Utilization assessment was through electronic records. The investigators noted a 240 percent increase in the number of referrals for genetic testing in Colorado during the advertising campaign compared to a similar time period 1 year prior to the campaign (from 144 referrals per average membership to 499 referrals per average membership), while no change was seen in Detroit (53 and 52 referrals per average membership during the two time periods).
Interestingly, although the absolute number of women with 10 percent or greater pretest probability of a mutation increased during the advertising campaign, the proportion of all referrals with a high pretest probability decreased from 69 percent to 48 percent in Denver, while no change was seen in Detroit. An increase in referrals from non-physician providers was noted.
The authors noted the difference between self-reported patient behavior in the CDC report and their observations; possible explanations included inaccurate self-report, differences in interest in testing between the general population and women with prepaid access to healthcare, concurrent education efforts by Kaiser Permanente, and discussion among women at workplaces or other settings with common insurance coverage.
We identified only two relevant articles on the impact of direct-to-consumer advertising on utilization of genomic tests, both involving the same advertising campaign. One study found evidence of increased awareness of the test covered in the advertising campaign among consumers, but no self-reported increase in knowledge or intention to get tested. Conversely, providers reported the perception of an increased number of patients discussing and requesting testing and reported ordering more tests. The second study used administrative data to measure test utilization within a managed care organization before and after the campaign and found an increase in the number of tests ordered, with a decrease in the proportion of women with a high pretest probability of a mutation.
There are a number of methodological issues involved in assessing the impact of advertising on genetic test utilization:
Definition of “appropriate” use of testing. Possible definitions include:
Use of the test only in those women with characteristics similar to those for whom the benefit of the test has been conclusively demonstrated, preferably through a randomized trial.
Use of the test only in those women with characteristics that meet criteria agreed upon by expert consensus.
Use of the test only in women who receive unbiased counseling on the state of knowledge regarding the benefits and harms of the test and who, based on their personal preferences, wish to have the test.
Use of the test only in women for whom use of the test is estimated to result in an acceptable cost-effectiveness ratio.
Given the state of the literature on genomic tests in ovarian cancer, consensus opinions on appropriateness are likely to be the only criteria available for the near future.
Measuring test utilization. There are several challenges to measuring test utilization:
As the CDC researchers noted, data on tests performed in private laboratories may not be publicly available; companies may view these data as proprietary information which, if available, would put them at a disadvantage in a competitive marketplace.
Test utilization data could be obtained through administrative data, such as from a managed care organization. The ability to link to clinical data could help in estimating “appropriateness”; estimation of pretest probability of BRCA1/2 mutations by the Kaiser researchers is an example of this. However, this type of data is also often proprietary and not readily accessible to outside researchers. In addition, depending on the data source, there may be issues about the generalizability of results, since presumably utilization of tests and other resources would be more tightly constrained within a managed care organization. Also, as the Kaiser researchers point out, women enrolled in managed care plans may be different in many respects from the general population.
Quantifying direct-to-consumer advertising. Challenges to measuring and quantifying direct-to-consumer advertising include:
Estimating exposure to various advertising in various types of media requires complex survey methodology.
Measurement of the impact of non-advertising coverage in the media, such as coverage in news reports of scientific meetings, journal publications, or other forums (such as congressional hearings), needs to be considered.
It is possible that other publicly available information could provide some insight into advertising; for example, annual reports of publicly traded companies might provide a breakdown of marketing expenses, although the extent to which specific inferences could be drawn about the nature of these marketing expenses is unclear.
Quantifying direct-to-physician advertising. Methodological challenges in measuring direct-to-physician advertising include:
Quantifying advertising in journals would require either hand searches or access to records from a specific journal. This would be particularly difficult for non-peer-reviewed journals, which are frequently not maintained in medical libraries, but which may represent a significant portion of the literature.
Quantifying other types of “advertising,” such as sponsorship of symposia at meetings, exhibits at meetings, sponsorship of continuing education activities at local hospitals, etc.
Study design. Even with a clear definition of “appropriate use” and methods for measuring utilization and advertising, there remain additional methodological issues. For example, although randomized trials would be ideal, it is difficult to imagine how one would be feasible. Before-and-after studies, with “exposed” and control populations, as were done with BRCA1/2 testing, appear to be the most practical, but require considerable planning, including, ideally, advance notice about the advertising campaign.
There are considerable methodological issues involved in determining the effect of direct-to-consumer and direct-to-physician marketing on the appropriate or inappropriate use of tests. We identified two studies of a single advertising campaign which suggested increased utilization of testing in the near term after a direct-to-consumer campaign, but provided little information on the appropriateness of the testing. The decrease in pretest probability observed after the advertising campaign suggests that, for some types of genomic tests, there may be a decrease in positive predictive value in response to advertising.
The results presented below represent initial calibration of the natural history models (one assuming that all cancers progress through Stage II, and one allowing direct progression to Stage III from Stage I).
| Age | Stage I: SEER | Stage I: Model 1* | Stage I: Model 2* | Stage II: SEER | Stage II: Model 1* | Stage II: Model 2* | Stage III/IV: SEER | Stage III/IV: Model 1* | Stage III/IV: Model 2* |
|---|---|---|---|---|---|---|---|---|---|
| 45–54 | 25.77% | 20.66% | 21.18% | 8.25% | 9.71% | 5.81% | 65.98% | 69.63% | 73.02% |
| 55–64 | 17.71% | 19.57% | 20.33% | 7.29% | 9.45% | 5.79% | 75.00% | 70.98% | 73.88% |
| 65–74 | 12.63% | 19.14% | 19.98% | 6.32% | 9.34% | 5.78% | 81.05% | 71.53% | 74.24% |
| 75+ | 9.20% | 18.83% | 19.74% | 5.75% | 9.26% | 5.78% | 85.06% | 71.91% | 74.49% |
| All ages | 16.33% | 19.55% | 20.31% | 6.90% | 9.44% | 5.79% | 76.77% | 71.01% | 73.91% |
*Model 1 assumes that patients must transition from Stage I ovarian cancer through Stage II before progressing to Stage III. Model 2 assumes that some proportion of patients proceed directly to Stage III ovarian cancer from Stage I.
Both models closely approximate the Surveillance, Epidemiology, and End Results (SEER) lifetime risk for women at age 40 of 1.44 percent; there is a slight underestimation of mortality risk, primarily for women 85 and older (which is likely due to the assumption of constant stage-specific probability of diagnosis; because of age-specific variations in access to care, prevalence of conditions mimicking ovarian cancer, etc., this assumption may be incorrect). Lifetime risk of dying from ovarian cancer within the SEER data is 1.13 percent; model predictions are 1.06 percent.
| Parameter | Value: Model 1 | Value: Model 2 |
|---|---|---|
| Annual probability of progression | ||
| Stage I to Stage II or III | 0.75 | 0.725 |
| Stage II to Stage III | 0.925 | 0.75 |
| Stage III to Stage IV | 0.35 | 0.35 |
| Proportion of Stage I tumors progressing directly to Stage III | 0 | 0.75 |
| Annual probability of detection | ||
| Stage I | 0.25 | 0.25 |
| Stage II | 0.25 | 0.4 |
| Stage III | 0.7 | 0.7 |
| Stage IV | 1 | 1 |
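
As a rough illustration of how the progression and detection probabilities in the table interact, the following Python sketch computes the stage at which a single tumor would be detected. It is not the report's simulation model. It assumes, hypothetically, that within each annual cycle detection is evaluated before progression, and it ignores age, incidence, and death from any cause, so its output should not be expected to match the calibrated stage distributions reported above. The function name and the cycle ordering are assumptions made for this example.

```python
STAGES = ["I", "II", "III", "IV"]

# Annual probabilities copied from the parameter table.
# Stage IV has no further progression, so its value is set to 0 here.
MODEL_1 = {
    "progress": {"I": 0.75, "II": 0.925, "III": 0.35, "IV": 0.0},
    "detect": {"I": 0.25, "II": 0.25, "III": 0.7, "IV": 1.0},
    "direct_to_iii": 0.0,
}
MODEL_2 = {
    "progress": {"I": 0.725, "II": 0.75, "III": 0.35, "IV": 0.0},
    "detect": {"I": 0.25, "II": 0.4, "III": 0.7, "IV": 1.0},
    "direct_to_iii": 0.75,
}


def stage_at_detection(model):
    """Probability that a tumor starting in Stage I is detected in each stage.

    Assumed cycle (hypothetical): each year the tumor is detected with
    probability d; if undetected, it progresses with probability p;
    otherwise it stays in the same stage for another year.
    """
    reach = dict.fromkeys(STAGES, 0.0)  # probability of ever reaching a stage
    reach["I"] = 1.0
    detected = {}
    for i, stage in enumerate(STAGES):
        d = model["detect"][stage]
        p = model["progress"][stage]
        leave = 1.0 - (1.0 - d) * (1.0 - p)  # chance of leaving the stage per year
        detected[stage] = reach[stage] * d / leave
        moved = reach[stage] * (1.0 - d) * p / leave
        if stage == "I":
            # Split Stage I progressions between Stage II and direct Stage III.
            reach["III"] += moved * model["direct_to_iii"]
            reach["II"] += moved * (1.0 - model["direct_to_iii"])
        elif stage != "IV":
            reach[STAGES[i + 1]] += moved
    return detected


for name, model in (("Model 1", MODEL_1), ("Model 2", MODEL_2)):
    dist = stage_at_detection(model)
    print(name, {s: round(v, 3) for s, v in dist.items()})
```

Under these simplifying assumptions, allowing direct progression from Stage I to Stage III shifts detections away from Stage II and toward Stage III/IV, which is the same direction of difference seen between the two calibrated models.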
The wider variation in stage distribution observed between age groups within the SEER data compared to model predictions may be due to any of several factors:
There may be age-specific variations in the rates of progression between stages; for example, age-related changes in hormonal status or immune function may affect the likelihood of metastasis.
There may be age-specific variations in the rates of detection within stages; for example, older women may have their cancer detected at later stages because of less access to physicians, different thresholds for seeking care for symptoms, or delayed diagnosis because the non-specific symptoms of ovarian cancer frequently mimic other conditions common in older women.
SEER data is cross-sectional, and the model is simulating a cohort. There may be unmeasured cohort effects in exposure to risk factors for ovarian cancer, competing risks, etc., which are not captured.
Future versions of the model will explore these possibilities by allowing the probabilities of progression and detection to vary with age. Allowing either more rapid progression to Stage III/IV or lower probability of detection among older women, in particular, would result in a greater proportion of advanced stage disease and higher mortality rates, and result in a closer match to SEER data. However, the greater “precision” of such an approach must be balanced against the risks of introducing inaccuracies by “overfitting” a cohort model, which does not incorporate potential cohort effects, to cross-sectional data.
| Strategy | Model 1 | Model 2 |
|---|---|---|
| Primary prevention | Efficacy of primary prevention must be greater than 20% | |
| Interval screening | Screening (with 95% specific and 90% sensitive test) should be every 33 months or less | Screening (with 95% specific and 90% sensitive test) should be every 31 months or less |
| Interval screening | Assuming every 2-year screening, sensitivity must be greater than 67% | Assuming every 2-year screening, sensitivity must be greater than 69% |
| Genetic screening and interval screening of women with the mutation | Assuming every 2-year screening for positive patients, a genetic mutation needs to confer at least a 30× risk increase | Assuming every 2-year screening for positive patients, a genetic mutation needs to confer at least a 32× risk increase |
| Genetic screening and interval screening | Assuming every 2-year screening for positive patients, the genetic mutation needs to be prevalent in 60% of the population | Assuming every 2-year screening for positive patients, the genetic mutation needs to be prevalent in 64% of the population |
| Genetic screening and primary prevention in women with the mutation (effectiveness of primary prevention = 20%) | A genetic mutation needs to confer at least a 30× risk increase | A genetic mutation needs to confer at least a 30× risk increase |
| Genetic screening and primary prevention in women with the mutation (effectiveness of primary prevention = 20%) | The genetic mutation needs to be prevalent in 51% of the population | The genetic mutation needs to be prevalent in 52% of the population |
| Targeted treatment | If the targeted treatment reduces cancer mortality by 67% then the targeted risk factor needs to be prevalent in approximately 80% of the population. An 89% reduction would require 35% prevalence of the targeted risk factor. | If the targeted treatment reduces cancer mortality by 67% then the targeted risk factor needs to be prevalent in approximately 89% of the population. An 89% reduction would require 35% prevalence of the targeted risk factor. |
Addressing the original questions:
How effective would a primary prevention intervention need to be to reduce ovarian cancer deaths by 20 percent?
What combinations of test sensitivity and frequency result in at least a 20 percent reduction in mortality?
What combinations of (a) prevalence of a genetic mutation in the population and (b) relative risk associated with that mutation would result in the target 20 percent reduction in ovarian cancer deaths with either primary prevention (at various levels of effectiveness) or interval screening (at varying levels of sensitivity and specificity)?
How effective would a targeted treatment for ovarian cancer need to be (and in what proportion of the population would the marker for that treatment need to exist)? Note that we assume that targeted therapy would be equally effective across all stages of disease.
How do the test characteristics for targeted treatment or genetic screening affect the results?
How do the above results differ under the assumption that cancer must progress from Stage I to Stage II and then to Stage III (Model 1) versus the assumption that ovarian cancer may progress directly from Stage I to Stage III (Model 2)?
What effect does the assumption about natural history have on the relative efficacy of screening?
The current model closely approximates lifetime cancer incidence, mortality, and stage distribution. Differences between observed and predicted values for age-specific stage distribution and mortality will require further imputation of values for stage-specific progression and detection probabilities.
Strategies which seek to identify high-risk groups are likely to have relatively small impact on overall ovarian cancer mortality, even if they are highly successful in reducing mortality in the risk group.
Therapies after diagnosis which are based on genomic targets may reduce mortality substantially, but only if the targets are common and treatment highly effective.
Reductions in ovarian cancer mortality of greater than 50 percent through screening require testing at intervals of 12 months or less, and are relatively independent of test sensitivity. However, the number of false-positive results is quite high at these screening intervals unless test specificity is also very high.
These findings suggest that the failure to identify effective strategies for ovarian cancer screening may be due at least in part to the natural history of the disease, rather than the failure of the tests evaluated.
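The limited population-level impact of strategies aimed at high-risk groups can be illustrated with simple attributable-fraction arithmetic. The Python sketch below is a static calculation, not the report's simulation model, and it will not reproduce the thresholds in the strategy table. It assumes that deaths are distributed between carriers and non-carriers in proportion to cases. The function name and all input values in the example are hypothetical.

```python
def overall_mortality_reduction(prevalence, relative_risk, efficacy):
    """Fraction of all ovarian cancer deaths averted by an intervention
    that is applied only to carriers of a risk marker.

    prevalence    -- proportion of the population carrying the marker
    relative_risk -- risk in carriers relative to non-carriers
    efficacy      -- proportional mortality reduction achieved in carriers
    Assumes deaths are proportional to cases in both groups.
    """
    carrier_cases = prevalence * relative_risk
    noncarrier_cases = 1.0 - prevalence
    share_in_carriers = carrier_cases / (carrier_cases + noncarrier_cases)
    return share_in_carriers * efficacy


# Hypothetical inputs: a marker carried by 1 percent of women that confers
# a 10-fold risk, with an intervention that prevents every death in carriers.
reduction = overall_mortality_reduction(0.01, 10.0, 1.0)
print(f"Overall mortality reduction: {reduction:.1%}")
```

With these hypothetical inputs, carriers account for only about 9 percent of all cases, so even a perfectly effective intervention in carriers leaves more than 90 percent of deaths unaffected.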
There are several limitations to this evidence report:
We did not review articles published in languages other than English because of a lack of resources for translation. It is possible that this led to failure to include some relevant studies.
We did not attempt to perform meta-analysis of specific tests, because of the considerable heterogeneity in design and patient populations.
Although we attempted to provide some sense of study quality, the validity and reproducibility of measures of study quality is uncertain.
Many of the key parameters used in modeling ovarian cancer incidence and mortality are unknown, and, in some cases, unknowable. In particular, this is true for the probability of progression between stages, and the probability that a woman with ovarian cancer at a given stage will have her cancer detected on the basis of symptoms. Although the model can be calibrated to provide a good fit to current data, it is possible that choices about the imputed values used for these parameters are incorrect in ways that affect the validity of the model.
The model is calibrated against reported age-specific incidence. Because the model simulates a cohort but is calibrated against cross-sectional age-specific data, it is possible that cohort effects in important variables, such as exposure to causes of ovarian cancer, exposure to risk modifiers such as pregnancy or contraceptive use, or competing risks such as other cause mortality or oophorectomy rates, play important roles in the observed incidence in specific age groups. Failure to take these into account during calibration may result in errors in the model (this is a common but rarely discussed issue with almost all cohort models of cancer incidence).
Many of the issues identified in the evidence report on adnexal mass14 were found in this literature as well. The majority of the papers reviewed failed to adequately describe the patient population; in particular, for those studies that included women with both benign and malignant ovarian disease, the manner in which the mass was originally detected and the subsequent evaluation can affect the probability of underlying disease, and thus predictive values. Depending on study design, prevalence may also indirectly affect estimates of sensitivity and specificity, especially in cases where women who have a negative test do not undergo the reference standard evaluation. Thus, because the performance of a given test may vary depending on whether the patient is symptomatic or asymptomatic, the failure of studies to describe this aspect of their population makes it difficult to draw inferences about applicability in specific clinical settings.
Another common shortcoming was the failure of many studies to describe potential differences in results stratified by age or menopausal status. Given the clear and widely recognized relationship between age and ovarian cancer risk, as well as the effect of menopausal status on the prevalence of biological processes that may affect the levels of some tumor markers such as cancer antigen 125 (CA-125), we believe that this should be standard in all studies of potential ovarian cancer tests. This is especially true for studies of complex phenomena such as multiple gene or protein expression patterns, where the discovery process is based on identifying differences in patterns between populations; if some of the identified differences between cancer patients and controls in gene expression or protein profiles are in fact differences related to aging, menopausal status, or other processes unrelated to ovarian cancer, the ultimate sensitivity and specificity of tests based on these pattern recognitions may be substantially worse than in preliminary reports.
Few of the studies we reviewed included a priori sample size calculations, and use of confidence intervals for parameter estimates was uncommon. Our calculated confidence intervals were, for the most part, quite wide.
The majority of the studies we reviewed included prevalence of ovarian cancer of 30 to 60 percent. This is several orders of magnitude higher than the observed prevalence in screening studies in the U.S. of 0.05 to 0.2 percent.14 The higher study prevalence leads to artificially narrow confidence intervals for the estimates of sensitivity. Even more importantly, as only one author59 pointed out, the much lower prevalence in a screening setting lowers the positive predictive value of tests to substantially less than 5 percent.
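The dependence of positive predictive value on prevalence follows directly from Bayes' rule, and a short Python sketch makes the size of the effect concrete. The sensitivity of 90 percent and specificity of 95 percent are the illustrative test characteristics used in the modeling section of this report; the prevalence of 45 percent is simply the midpoint of the range seen in the reviewed studies, and the function name is an assumption made for this example.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a woman with a positive test actually has cancer."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Prevalence in a typical reviewed study versus the screening range cited above.
for prevalence in (0.45, 0.002, 0.0005):
    ppv = positive_predictive_value(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.2%}: PPV {ppv:.1%}")
```

With the same test, the positive predictive value falls from roughly 94 percent at a study prevalence of 45 percent to roughly 3.5 percent and 0.9 percent at screening prevalences of 0.2 and 0.05 percent.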
The prevalence in studies of the diagnostic use of a test would be expected to be higher, but, as discussed above, how much higher is dependent on the age, menopausal status, symptom status, etc., of the patient. Failure to describe these characteristics prevents assessment of how closely the study population reflects a likely clinical population.
Given that most ovarian cancer presents in later stages, the stage distribution of samples used for test development and validation is likely to be skewed towards later stages. Any abnormalities identified may be more common in advanced cancers than early cancers; since the goal of screening is to identify early stage (or even preinvasive) cancers, the sensitivity of tests derived from these types of samples may be considerably lower in real-world settings. This is especially true for tests which identify simultaneous multiple changes in a variety of markers (such as gene arrays or studies of protein patterns) without a clear understanding of the underlying biological significance of the changes. Changes associated with late stage cancer may not be seen in early cancers.
Many studies, especially of tests for single gene products, reported measures of assay reproducibility. However, we identified only one series of studies40, 41 that reported on the impact of both test and biological variability on interpretation of test results, in this case the significance of changes in CA-125 levels. Since both test reproducibility and biological variations may affect test characteristics (especially in applications where serial measurements are used to make clinical decisions), documentation of these effects for other tests should be required.
Similarly, for studies of multiple gene or protein expression, demonstration of reproducibility of results by different groups using similar analytic approaches is necessary; the available evidence suggests that reproducibility is still an issue.55, 59
The majority of the studies we identified on the use of genomic tests in patients already diagnosed with ovarian cancer reported on associations between test results and certain clinical outcomes, such as lack of response to chemotherapy, positive second look laparotomy, or length of overall or disease-free survival. We did not identify any studies which explicitly discussed how these results could be used, let alone any studies which formally tested the impact of use of the tests on patient outcomes. For example, if the results of a genomic test indicate a greater likelihood of failure to respond to standard chemotherapy, should that patient be offered only experimental therapies, or comfort care, rather than undergoing the effects of therapy which is unlikely to work? There are obvious ethical and feasibility issues involved in designing studies of such an approach, but if a patient will undergo the same therapy regardless of the results of the genomic tests, there seems to be little clinical value to performing the test. It is possible that there is some value to the patient and her family in having greater information about the probability of various outcomes, even if therapy is not affected, but this should be demonstrated using appropriate study designs and instruments.
The most common approach to initial test development in the studies we reviewed was to use serum or tissue from patients with cancer and compare results to a comparison group with no disease, or with non-ovarian cancer disease. In some cases, an attempt was made to discriminate between normal women, women with early stage ovarian cancer, and women with late stage ovarian cancer, in the hopes of identifying markers of early stage disease.
There are several implicit assumptions involved with this type of study design:
If attempts are not made to discriminate between stages, then the assumption is that all cancers, regardless of stage, exhibit a similar pattern. However, if there are changes in gene and/or protein patterns which are associated with advancing stage, failure to examine differences between stages may affect test accuracy.
If stages are examined differently, then the assumption is that all of the advanced stage cancers must have “looked” like the early stage cancers at some point. However, if “early” stage cancers represent cancers which are biologically different, rather than an early, necessary step in the development of ovarian cancer,136 then identification of markers of early stage cancer may not result in substantial reductions in mortality.
There are no factors other than ovarian cancer itself which can explain observed differences between cancers and controls. However, the effect of potential confounders such as age, menopausal status, or other factors on gene or protein patterns would affect test specificity substantially.
Ultimately, the ideal approach is to use prospectively collected sera to attempt to identify markers for those patients who subsequently developed advanced stage ovarian cancer, an approach which may be achievable in some of the large ongoing studies of ovarian cancer screening.137
The search for better screening tests for ovarian cancer has been based on the implicit assumption that ovarian cancer progresses through a series of stages in a fashion analogous to that of cervical cancer. Alternative models are biologically plausible, and, as demonstrated by our simulation models, mathematical models can be “fitted” to match reported data under both alternatives. Our modeling suggests that, even under a model that assumes that all cancers progress through Stage II, screening at intervals more frequent than every 12 months is needed to reduce mortality by greater than 50 percent, even with a highly sensitive test, and that such screening would have a very low positive predictive value, even with a highly specific test.
If this is the case, then alternative methods for reducing ovarian cancer morbidity and mortality, such as improved methods for primary prevention and improved therapies, may ultimately offer more promise than the search for the Pap test equivalent in ovarian cancer.
Question 1 (Analytic Validity)
Question 2 (Sensitivity and Specificity)
Question 3 (Clinical Management of Asymptomatic Women)
Question 4 (Clinical Management of Diagnosed Women)
Question 5 (Potential Harms)
Question 6 (Direct Marketing)
We were able to closely approximate reported ovarian cancer incidence and mortality using a simulation model. This model can be used to identify test and treatment characteristics that would result in substantial reductions in ovarian cancer mortality. The most striking finding of the model is that the effect of screening frequency in achieving large-scale reductions in ovarian cancer mortality is greater than that of test sensitivity; achieving mortality reductions greater than 50 percent requires screening intervals of less than 12 months. This is problematic for several reasons. First, a pilot study suggests that women are unlikely to comply with more frequent screening.138 Second, more frequent screening results in lower overall positive predictive value, even with a highly specific test. Finally, if effective primary prevention strategies are identified which lower the incidence of ovarian cancer, the positive predictive value of screening will be lowered to an even larger extent.
This chapter outlines research priorities identified through the review, both in terms of fundamental gaps in knowledge and in addressing methodological issues in existing studies.
We suggest that future studies relevant to screening and diagnosis provide data on, and present results stratified by, the following minimal subject characteristics:
Subject age and/or menopausal status;
Subject race and ethnicity;
Presence or absence of known risk factors for ovarian cancer, particularly family history;
For subjects with cancer or adnexal masses, the means by which the mass was initially diagnosed;
For subjects with cancer or adnexal masses, the reason for the initial examination which led to diagnosis of a mass: symptoms referable to a mass or ovarian cancer, evaluation for other symptoms, asymptomatic screening for ovarian cancer, or asymptomatic screening for other conditions.
We recognize that, when using large databases for initial analysis, such as those used in many early proteomics studies, such detail may not be available; however, researchers should recognize and discuss the potential biases introduced by these factors.
Data on test reproducibility - such as coefficients of variation, inter- and intra-observer agreement, or concordance of results across laboratories - should be consistently reported or referenced.
Whenever possible, the potential impact of this reproducibility on test characteristics should be estimated. For example, given a coefficient of variation of some percent, what proportion of test results will fall on the other side of the threshold between positive and negative due to chance alone?
The potential impact of reproducibility on interpretation of serial test results should also be estimated where appropriate.
The effect of variation with time, either randomly or in relation to cyclic changes such as the menstrual cycle, should also be reported for tests which have potential use as serial markers.
Any variability due to age, menopausal status, or other biological processes should be tested for and noted.
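The question of how often assay imprecision alone moves a result across the positive/negative threshold can be estimated directly once a coefficient of variation is reported. The Python sketch below assumes, as a simplification, that measurement error is normally distributed around the true value with a standard deviation equal to the coefficient of variation times the true value. The function name and the numeric values (in arbitrary units) are hypothetical.

```python
from statistics import NormalDist


def misclassification_probability(true_value, threshold, cv):
    """Probability that a single measurement falls on the opposite side of
    the threshold from the true value, due to assay imprecision alone.

    Assumes normally distributed error with SD = cv * true_value (cv > 0).
    """
    sd = cv * true_value
    p_below = NormalDist(mu=true_value, sigma=sd).cdf(threshold)
    # A truly negative value is misclassified when it is measured at or above
    # the threshold; a truly positive value when it is measured below it.
    return 1.0 - p_below if true_value < threshold else p_below


# Hypothetical example: threshold of 35 units, coefficient of variation of 10 percent.
for true_value in (30.0, 40.0):
    p = misclassification_probability(true_value, 35.0, 0.10)
    print(f"true value {true_value}: {p:.1%} chance of misclassification")
```

In this hypothetical case, a true value 5 units below the threshold would be reported as positive about 5 percent of the time, and a true value 5 units above it would be reported as negative about 11 percent of the time. The same calculation can be extended to serial measurements by treating each measurement as an independent draw.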
Since in many studies “control” patients never undergo the reference standard (histological examination of the ovaries), there is the potential for verification bias. Although, given the relatively low incidence of ovarian cancer, the probability of misclassification is fairly low, studies should ideally have some followup on test-negative subjects to ensure that ovarian cancer has not developed within a short time after the test was performed.
Ultimately, tests need to be evaluated based on their intended use and at the stage in the clinical pathway where they will be used. Therefore, potential screening tests must be evaluated in screening settings, with a realistic underlying prevalence of cancer. Similarly, potential diagnostic tests must be tested in settings where there is uncertainty about the diagnosis of ovarian cancer.
Ideally, test characteristics for a variety of tests will be compared within the same study population, in order to avoid the inherent difficulties of comparing results across studies. At a minimum, given that the performance characteristics of cancer antigen 125 (CA-125) are well established, new tests should be directly compared to CA-125.
Although retrospective studies based on sera or other tissues are useful for establishing estimates of test performance for sample size considerations, new screening and diagnostic tests need to be evaluated prospectively. For example:
For screening tests, prospective demonstration of at least one important outcome, such as (a) reduced ovarian cancer-specific mortality, or (b) improved quality of life as documented by a validated instrument. Ideally, this would be done via randomized trials; however, alternative study designs (such as prospective cohort studies with appropriate adjustment for potential confounders) are reasonable for rarer primary outcomes (such as ovarian cancer mortality). In the screening context, given the relatively low positive predictive value of any screening test, the effect of the test on overall quality of life at the population level should be readily demonstrable within the context of a randomized trial.
Evaluation of the use of tests in predicting outcomes must ultimately be linked to some change in patient outcomes; at the least, there should be some measure of whether the information gained from the test result is helpful in some way to the patient. Ideally, the effect of changes in management based on test results should be evaluated in properly designed studies. For example:
For tests which appear to reliably predict failure to respond to conventional therapies, studies should prospectively document improved patient outcomes based on this knowledge (such as improved quality of life based on more precise prognosis, or improved quality of life due to avoidance of side effects from ineffective therapy). Ideally, this would be based on randomized trials - patients could be randomized to testing with treatment based on test results, versus no testing; alternatively, testing could be done, with randomized allocation to usual care versus no care for those with test results predicting poor response.
For tests which predict greater response to specific agents, improved survival and quality of life need to be documented using randomized trials of those agents in those with specific test results.
Underlying assumptions about the natural history of ovarian cancer can have a large effect on the estimated impact of screening compared to other strategies for prevention of ovarian cancer morbidity and mortality. Every effort should be made towards a better understanding of whether ovarian cancer “behaves” like cervical cancer in the sense of progressing through different stages, or whether rapid progression is the most common biological behavior.
The implications of these assumptions for the relative efficacy of screening compared to other strategies need to be evaluated by more sophisticated simulation models.
Ovarian cancer remains a significant cause of morbidity and mortality in women, and efforts at reducing its toll have been relatively unsuccessful, especially when compared with other causes of cancer death in women. Unlike lung cancer or cervical cancer, there does not appear to be a common causal exposure which can be addressed through various public health interventions; unlike cervical, breast, or colorectal cancer, effective screening methods have not yet been identified; unlike breast cancer, markers of response to specific treatments have not yet been discovered and proven to improve patient outcomes.
The ever-increasing knowledge of the role of genes in health and disease offers the promise of greater understanding of the biology of ovarian cancer, and evidence-based strategies for prevention based on that understanding. Understanding of the causal mechanisms could potentially lead to population-based primary prevention strategies which preserve ovarian function, while identification of markers of increased risk in addition to breast cancer genes 1/2 (BRCA1/2) offers the potential for more radical preventive measures such as prophylactic oophorectomy. Improved understanding of the molecular changes leading to cancer may lead to screening tests of very high sensitivity and specificity. Identification of markers of response to therapy could lead to improved survival, or reduced side effects from current treatment.
Unfortunately, our review found that there is limited evidence for the utility of genomic tests other than cancer antigen 125 (CA-125) or BRCA1/2 in the prevention of ovarian cancer. Other than commercially approved radioimmunoassay tests for single gene products, there is little available literature on the analytic validity of potential genomic tests in typical clinical laboratories. There are almost no data on the sensitivity and specificity of genomic tests for screening or diagnosis in clinically realistic settings. Although results of some genomic tests have been shown to be associated with certain outcomes of treatment, there are no data on how changes in management based on those test results would lead to improved patient outcomes.
New genomic tests do not appear to have any qualitative risks beyond those of other tests for inherited susceptibility for cancer, or other tests used in screening, management, and treatment. Depending on the ultimate sensitivity and specificity of the tests in typical practice, the quantitative probability of these harms may differ from existing tests.
The use of direct-to-consumer advertising has the potential to increase utilization of these tests, but, in the absence of criteria for “appropriate use,” it is unclear how to evaluate this increased utilization.
Ultimately, the clinical utility of genomic tests in the prevention of morbidity and mortality from ovarian cancer will depend not only on the sensitivity and specificity of a given test in a specific clinical situation, but on the underlying natural history of ovarian cancer. If the biological features of ovarian cancer predispose most cancers to rapid dissemination within the abdominal cavity, then strategies which emphasize primary prevention and/or improved treatment efficacy may ultimately be more effective than the most sensitive and specific test.
| Abbreviation | Definition |
|---|---|
| AHRQ | Agency for Healthcare Research and Quality |
| β-hCG | Beta human chorionic gonadotropin |
| Bcl-2 | (Anti-apoptosis protein) |
| BRCA1/2 | Breast cancer gene 1/2 |
| CA-125 | Cancer antigen 125 |
| CA-15-3 | Cancer antigen 15-3 |
| CA-19-9 | Cancer antigen 19-9 |
| CA-27-29 | Cancer antigen 27-29 |
| CA-72-4 | Cancer antigen 72-4 |
| CASA | Cancer-associated serum antigen |
| CDC | Centers for Disease Control and Prevention |
| CEA | Carcinoembryonic antigen |
| c-erb-B2 | (Same as HER-2) |
| c-erb-2 | (Same as HER-2) |
| CI | Confidence interval |
| CK19 | Cytokeratin 19 |
| CYFRA 21-1 | Cytokeratin fragment 21 |
| EGAPP | Evaluation of Genomic Applications in Practice and Prevention |
| EGFR | Epidermal growth factor receptor |
| FAS | Fatty acid synthase |
| FIGO | International Federation of Gynecology and Obstetrics |
| GAT | Galactosyltransferase associated with tumor |
| G-CSF | Granulocyte-colony stimulating factor |
| HE4 | Human epididymis protein 4 |
| HER-2 | Human epidermal growth factor receptor 2 |
| hK6 | Human kallikrein 6 |
| hK10 | Human kallikrein 10 |
| IL-2 | Interleukin 2 |
| IL-6 | Interleukin 6 |
| IL-8 | Interleukin 8 |
| LASA | Lipid-associated sialic acid |
| LMP | Low malignant potential |
| LRP | Low-density lipoprotein receptor-related protein |
| LSA | Lysophospholipids |
| M-CSF | Macrophage colony stimulating factor |
| Mdm2 | Murine double minute protein |
| MDR-1 | Multidrug resistance gene 1 |
| MeSH | Medical Subject Headings |
| MMP | Matrix metalloproteinases |
| MRP1/2 | Multidrug resistance protein 1/2 |
| MW | Molecular weight |
| M/Z | Mass-to-charge |
| nm23 | (Metastasis suppressor) |
| OVX1 | (Monoclonal antibody raised against a human ovarian carcinoma cell line) |
| p53 | (Transcription factor) |
| p55, p75 | (Tumor necrosis factor receptors) |
| Pgp | P-glycoprotein |
| PLCO | Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (National Cancer Institute) |
| RCT | Randomized controlled trial |
| RIA | Radioimmunoassay |
| ROC | Receiver operating characteristic |
| SAX2 | Strong anionic exchanger |
| SEER | Surveillance, Epidemiology, and End Results |
| SELDI-TOF | Surface-enhanced laser desorption ionization time-of-flight |
| SLL | Second-look laparotomy |
| SVM-GA | Support vector machine with genetic algorithm |
| SVM-ST | Support vector machine statistical testing |
| TATI | Tumor-associated trypsin inhibitor |
| TN | Tetranectin |
| TP53 | (Same as p53) |
| TPA | Tissue plasminogen activator |
| TPS | Tissue polypeptide-specific antigen |
| USCS | United States Cancer Statistics |
| USPSTF | U.S. Preventive Services Task Force |
| VEGF | Vascular endothelial growth factor |
| WCX2 | Weak cationic exchanger |
Database: Ovid MEDLINE® <1966 to May Week 2 2006>
Search Strategy:
1. liotta l$.au.
2. Ovarian Neoplasms/
3. 1 and 2
4. exp Ovarian Neoplasms/
5. exp Genomics/
6. exp Genetic Phenomena/
7. ovacheck.mp.
8. myriad.mp.
9. Chorionic Gonadotropin, beta Subunit, Human/
10. GENES, BRCA1/ or BRCA1 PROTEIN/
11. GENES, BRCA2/ or BRCA2 PROTEIN/
12. CA-125 Antigen/
13. Antigens, Tumor-Associated, Carbohydrate/
14. Carcinoembryonic Antigen/
15. Receptor, erbB-2/
16. Tumor Markers, Biological/
17. Antigens, Neoplasm/
18. 4 and (or/5–17)
19. correlogic.mp.
20. 4 and (or/5,7–17)
21. 18 not 20
22. limit 20 to (humans and english language and abstracts)
23. exp Diagnosis/
24. exp “Sensitivity and Specificity”/
25. di.fs.
26. 22 and (or/24–25)
27. 3 and 26
28. 3 and 22
29. 3 not 28
30. 28 not 27
31. *“Proteome”/
32. oligonucleotide array sequence analysis/ or protein array analysis/
33. 4 and (or/5,7–17,32)
34. 33 not 20
35. 3 and 34
36. 2 and (or/5,7–17,32)
37. *ovarian neoplasms/
38. 37 and (or/5,7–17,32)
39. 2 and (or/7–8,19,32)
40. 3 and 39
41. 39 not 26
42. 3 not 26
43. 41 or 42
44. limit 43 to (humans and english language)
45. limit 44 to abstracts
46. from 45 keep 1–10
47. from 45 keep 1–167
48. “Reproducibility of Results”/
49. reference standards/
50. quality control/
51. reference values/
52. or/48–51
53. 52 and (or/5,7–17,19,32) and 4
54. 52 and (or/5,7–17,19,32) and 4
55. 52 and (or/5,7–17,19,32)
56. Genetic Screening/
57. Genetic Counseling/
58. 4 and (or/56–57)
59. limit 58 to (humans and english language and abstracts)
60. 59 not (26 or 45)
61. 54 not (26 or 45)
62. limit 61 to (humans and abstracts)
63. genes, brca1/ or genes, brca2/
64. 60 not 63
65. 60 not 64
66. from 64 keep 1–155
67. from 62 keep 1–76
All excluded studies listed below were reviewed in their full-text version. Following each reference, in italics, is the reason for exclusion. Reasons for exclusion signify only the usefulness of the articles for this study and are not intended as criticisms of the articles.

| Study | Study Design | Description of Test(s) | Patient Characteristics | Results | Comments/Quality Scoring |
|---|---|---|---|---|---|
| StudyID | Geographical location | Genomic test(s) used: | Age: | [For each test reported, please provide a 2×2 table and report or calculate sensitivity, specificity, NPV, and PPV (all with confidence intervals); alternatively, for continuous variables, report the correlation coefficient or other measure of association. Also include data on reproducibility (inter- and intra-assay coefficient of variation, kappa, etc.).] | [IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE] |
| | [city & state (U.S.) or city & country (foreign)]: | Type(s) of samples | Mean (SD): | 1) [2×2 table - use this header space to provide information needed for reader to interpret “Test +,” “Test -,” “Ref stand +,” and “Ref stand -” headings in following table.] | [COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION] |
| | Study dates [month & year]: | [delete all that do not apply]: | Median: | 2) Data on other test accuracy measures (correlation coefficient, interclass correlation, etc.) | Quality assessment: |
| | Size of population | Blood or tissue | Range: | | [+ if appropriate quality, - if not; add text to describe] |
| | [give num/denom for screening studies]: | Cyst fluid | Race/ethnicity (n [%]): | | Reference standard: |
| | Type of laboratory | Ascites | Diagnoses (n [%]): | | Verification bias: |
| | [delete all that do not apply]: | | Ovarian cancer: | | Test reliability/variability: |
| | Clinical lab | | Borderline: | | Sample size: |
| | Commercial lab | | Benign ovarian mass: | | Statistical tests: |
| | Hospital-based clinical samples | | Other (specify): | | Blinding: |
| | Research lab | | Healthy controls: | | Definition of +/- on screening test: |
| | Not reported | | Inclusion criteria: | | Grade: |
| | | | Exclusion criteria: | | This article is also relevant to: |
| | | | | | [delete all that do not apply] |
| | | | | | Question 2 |
| | | | | | Question 3 |
| | | | | | Question 4 |
| | | | | | Question 5 |
| | | | | | Question 6 |

| Study | Study Design | Patients | Clinical Presentation | Results | Comments/Quality Scoring |
|---|---|---|---|---|---|
| StudyID | Geographical location | Age: | Screening only (n [%]): | [For each test reported, please provide a 2×2 table and report or calculate sensitivity, specificity, NPV, and PPV (all with confidence intervals). Also include data on reproducibility (inter- and intra-assay coefficient of variation, kappa, etc.).] | [IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE] |
| | [city & state (U.S.) or city & country (foreign)]: | Mean (SD): | Diagnosis of mass: | 1) [2×2 table - use this header space to provide information needed for reader to interpret Test +, Test -, Disease +, and Disease - headings in following table.] | [COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION] |
| | Study dates [month & year]: | Median: | | 2) Data on other test accuracy measures (correlation coefficient, interclass correlation, etc.) | Quality assessment: |
| | Size of population | Range: | Additional data used for diagnosis: | | [+ if appropriate quality, - if not; add text to describe] |
| | [give num/denom for screening studies]: | Menopausal status (n [%]): | | | Reference standard: |
| | Type of population | Pre (< 45): | | | Verification bias: |
| | [delete all that do not apply]: | Peri (45–55): | | | Test reliability/variability: |
| | Screening | Post (> 55): | | | Sample size: |
| | Adnexal mass | Race/ethnicity (n [%]): | | | Statistical tests: |
| | Other (specify) | Risk factors (n [%]): | | | Blinding: |
| | Genomic test(s) used: | Family history: | | | Definition of +/- on screening test: |
| | Reference standard | Genotype: | | | Grade: |
| | [delete all that do not apply]: | Other (specify): | | | This article is also relevant to: |
| | Surgical pathology | Diagnoses (n [%]): | | | [delete all that do not apply] |
| | Clinical outcome (specify) | Ovarian cancer: | | | Question 1 |
| | Reference standard applied to all test negatives?: | Borderline: | | | Question 3 |
| | Test reliability established?: | Benign ovarian mass: | | | Question 4 |
| | Statistical tests used: | Other (specify): | | | Question 5 |
| | Blinding: | Healthy controls: | | | Question 6 |
| | Definition of positive and negative on screening test: | Inclusion criteria: | | | |
| | | Exclusion criteria: | | | |

| Study | Study Design | Patients | Outcome Measures | Results | Comments/Quality Scoring |
|---|---|---|---|---|---|
| StudyID | Geographical location | Age: | Use of test results: [e.g., change in screening test or frequency] | For each outcome measured, report outcomes based on test result; include 95% confidence intervals if available. | [IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE] |
| | [city & state (U.S.) or city & country (foreign)]: | Mean (SD): | Outcomes measured: | | [COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION] |
| | Study dates [month & year]: | Median: | [delete all that do not apply] | | Quality assessment: |
| | Study type [delete all but one]: | Range: | Cancer incidence | | [+ if appropriate quality, - if not; add text to describe] |
| | RCT | Menopausal status (n [%]): | Cancer mortality | | For RCT: |
| | Cohort | Pre (< 45): | Quality of life | | Randomization method: |
| | Case-control | Peri (45–55): | Other (specify) | | Blinding: |
| | Other (specify) | Post (> 55): | | | Dropout rate < 20%: |
| | Size of population: | Race/ethnicity (n [%]): | | | Adequacy of randomization concealment: |
| | Genomic test(s) used: | Risk factors (n [%]): | | | For cohort study: |
| | Reference standard: | Family history: | | | Unbiased selection of the cohort (prospective recruitment of subjects): |
| | [delete all that do not apply] | Genotype: | | | Large sample size: |
| | Surgical pathology | Other (specify): | | | Adequate description of the cohort: |
| | Clinical outcome (specify) | Inclusion criteria: | | | Use of validated method for genomic test: |
| | Test reliability established?: | Exclusion criteria: | | | Use of validated method for ascertaining clinical outcomes: |
| | Statistical tests used: | | | | Adequate follow-up period: |
| | Definition of positive and negative on screening test: | | | | Completeness of follow-up: |
| | | | | | Analysis (multivariate adjustments) and reporting of results: |
| | | | | | For case-control study: |
| | | | | | Valid ascertainment of cases: |
| | | | | | Unbiased selection of cases: |
| | | | | | Appropriateness of the control population: |
| | | | | | Verification that the control is free of cancer: |
| | | | | | Comparability of cases and controls with respect to potential confounders: |
| | | | | | Validated dietary assessment method: |
| | | | | | Appropriateness of statistical analyses: |
| | | | | | Grade: |
| | | | | | This article is also relevant to: |
| | | | | | [delete all that do not apply] |
| | | | | | Question 1 |
| | | | | | Question 2 |
| | | | | | Question 4 |
| | | | | | Question 5 |
| | | | | | Question 6 |

| Study | Study Design | Patients | Outcome Measures | Results | Comments/Quality Scoring |
|---|---|---|---|---|---|
| StudyID | Geographical location | Age: | Use of test results: [e.g., change in screening test or frequency] | [For each outcome measured, report outcomes based on test result. Note that you should only abstract data when 2×2 tables can be constructed. Articles that report only Kaplan Meier curves or Hazard Ratios should not be abstracted.] | [IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE] |
| | [city & state (U.S.) or city & country (foreign)]: | Mean (SD): | Outcomes measured: | 1) [2×2 table - use this space to provide information needed for reader to interpret Test +, Test -, Outcome +, and Outcome - headings in following table.] | [COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION] |
| | Study dates [month & year]: | Median: | [delete all that do not apply] | 2) [2×2 table - use this space to provide information needed for reader to interpret Test +, Test -, Outcome +, and Outcome - headings in following table.] | Quality assessment: |
| | Study type [delete all but one]: | Range: | Cancer incidence | 3) Hazard Ratio or other relevant information: | [+ if appropriate quality, - if not; add text to describe] |
| | RCT | Menopausal status (n [%]): | Cancer mortality | | For RCT: |
| | Cohort | Pre (< 45): | Quality of life | | Randomization method: |
| | Case-control | Peri (45–55): | Other (specify) | | Blinding: |
| | Other (specify) | Post (> 55): | | | Dropout rate < 20%: |
| | Size of population: | Race/ethnicity (n [%]): | | | Adequacy of randomization concealment: |
| | Genomic test(s) used: | Risk factors (n [%]): | | | For cohort study: |
| | Reference standard: | Family history: | | | Unbiased selection of the cohort (prospective recruitment of subjects): |
| | [delete all that do not apply] | Genotype: | | | Large sample size: |
| | Surgical pathology | Other (specify): | | | Adequate description of the cohort: |
| | Clinical outcome (specify) | Diagnoses (n [%]): | | | Use of validated method for genomic test: |
| | Test reliability established?: | Ovarian cancer: | | | Use of validated method for ascertaining clinical outcomes: |
| | Statistical tests used: | Borderline: | | | Adequate follow-up period: |
| | Definition of positive and negative on screening test: | Benign ovarian mass: | | | Completeness of follow-up: |
| | | Other (specify): | | | Analysis (multivariate adjustments) and reporting of results: |
| | | Healthy controls: | | | For case-control study: |
| | | Treatment (n [%]): | | | Valid ascertainment of cases: |
| | | Surgery: | | | Unbiased selection of cases: |
| | | Chemotherapy: | | | Appropriateness of the control population: |
| | | Platinum: | | | Verification that the control is free of cancer: |
| | | Taxol: | | | Comparability of cases and controls with respect to potential confounders: |
| | | Other (specify): | | | Validated dietary assessment method: |
| | | Other (specify): | | | Appropriateness of statistical analyses: |
| | | Inclusion criteria: | | | Grade: |
| | | Exclusion criteria: | | | This article is also relevant to: |
| | | | | | [delete all that do not apply] |
| | | | | | Question 1 |
| | | | | | Question 2 |
| | | | | | Question 3 |
| | | | | | Question 5 |
| | | | | | Question 6 |

| Study | Study Design | Patients | Outcome Measures | Results | Comments/Quality Scoring |
|---|---|---|---|---|---|
| StudyID | Geographical location | Age: | Use of test results: [e.g., change in screening test or frequency] | For each outcome measured, report outcomes based on test result. | [IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE] |
| | [city & state (U.S.) or city & country (foreign)]: | Mean (SD): | Outcomes measured: | 1) | [COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION] |
| | Study dates [month & year]: | Median: | [delete all that do not apply] | 2) | Quality assessment: |
| | Study type [delete all but one]: | Range: | Complications | 3) | [+ if appropriate quality, - if not; add text to describe] |
| | RCT | Menopausal status (n [%]): | Quality of life | 4) | For RCT: |
| | Cohort | Pre (< 45): | Other (specify) | 5) | Randomization method: |
| | Case-control | Peri (45–55): | | 6) | Blinding: |
| | Other (specify) | Post (> 55): | | 7) | Dropout rate < 20%: |
| | Size of population: | Race/ethnicity (n [%]): | | | Adequacy of randomization concealment: |
| | Genomic test(s) used: | Risk factors (n [%]): | | | For cohort study: |
| | Reference standard: | Family history: | | | Unbiased selection of the cohort (prospective recruitment of subjects): |
| | [delete all that do not apply] | Genotype: | | | Large sample size: |
| | Surgical pathology | Other (specify): | | | Adequate description of the cohort: |
| | Clinical outcome (specify) | Diagnoses (n [%]): | | | Use of validated method for genomic test: |
| | Test reliability established?: | Ovarian cancer: | | | Use of validated method for ascertaining clinical outcomes: |
| | Statistical tests used: | Borderline: | | | Adequate follow-up period: |
| | Definition of positive and negative on screening test: | Benign ovarian mass: | | | Completeness of follow-up: |
| | | Other (specify): | | | Analysis (multivariate adjustments) and reporting of results: |
| | | Healthy controls: | | | For case-control study: |
| | | Inclusion criteria: | | | Valid ascertainment of cases: |
| | | Exclusion criteria: | | | Unbiased selection of cases: |
| | | | | | Appropriateness of the control population: |
| | | | | | Verification that the control is free of cancer: |
| | | | | | Comparability of cases and controls with respect to potential confounders: |
| | | | | | Validated dietary assessment method: |
| | | | | | Appropriateness of statistical analyses: |
| | | | | | Grade: |
| | | | | | This article is also relevant to: |
| | | | | | [delete all that do not apply] |
| | | | | | Question 1 |
| | | | | | Question 2 |
| | | | | | Question 3 |
| | | | | | Question 4 |
| | | | | | Question 6 |
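
Several of the abstraction forms above ask the abstractor to report or calculate sensitivity, specificity, NPV, and PPV with confidence intervals from a 2×2 table. The sketch below shows one way this could be done. It is illustrative only: the names `TwoByTwo`, `wilson_interval`, and `accuracy_measures` are hypothetical, the counts are invented, and the Wilson score interval is an assumed choice, since the forms do not specify an interval method.

```python
import math
from typing import Dict, NamedTuple, Tuple


class TwoByTwo(NamedTuple):
    tp: int  # test +, reference standard +
    fp: int  # test +, reference standard -
    fn: int  # test -, reference standard +
    tn: int  # test -, reference standard -


def wilson_interval(successes: int, total: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (about 95% when z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_measures(t: TwoByTwo) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Return each measure as (point estimate, (lower, upper))."""
    pairs = {
        "sensitivity": (t.tp, t.tp + t.fn),
        "specificity": (t.tn, t.tn + t.fp),
        "ppv": (t.tp, t.tp + t.fp),
        "npv": (t.tn, t.tn + t.fn),
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in pairs.items()}


# Invented counts, for illustration only.
table = TwoByTwo(tp=40, fp=30, fn=10, tn=920)
for name, (estimate, (low, high)) in accuracy_measures(table).items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

PPV and NPV computed this way reflect the proportion of disease in the study sample, so they carry over to practice only when that proportion resembles the prevalence in the intended population; in case-control samples they should be interpreted with caution.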
The Duke Evidence-based Practice Center is grateful to the following peer reviewers who read and commented on a draft version of this report:
Ralph J. Coates, PhD, Associate Director for Science, Division of Cancer Prevention and Control, Centers for Disease Control and Prevention, Atlanta, Georgia
Tracy G. Lively, PhD, Associate Chief, Diagnostics Evaluation Branch, Cancer Diagnosis Program, National Cancer Institute, Rockville, Maryland
Edward E. Partridge, MD, Professor, Obstetrics and Gynecology, and Interim Director, UAB Comprehensive Cancer Center, Birmingham, Alabama
Gurvaneet Randhawa, MD, MPH, Center for Outcomes and Evidence, Agency for Healthcare Research and Quality, Rockville, Maryland
Edward (Ted) Trimble, MD, MPH, Head of Gynecologic Cancer Therapeutics and Quality of Cancer Care Therapeutics, National Cancer Institute, Rockville, Maryland
Combined comments from the Evaluation of Genomic Applications in Practice and Prevention (EGAPP)/Centers for Disease Control and Prevention (CDC) Discussion Group
Nominations for peer reviewers were solicited from several sources, including the project's technical expert panel and interested federal agencies. The list of nominees was vetted and approved by the Agency for Healthcare Research and Quality (AHRQ).