Chapter 130: Management of Adnexal Mass

Prepared for:

Agency for Healthcare Research and Quality

U.S. Department of Health and Human Services

540 Gaither Road

Rockville, MD 20850

www.ahrq.gov

Contract No. 290-02-0025

Prepared by:

Duke Evidence-based Practice Center

Durham, North Carolina

Investigators

Evan R. Myers, MD, MPH

Lori A. Bastian, MD, MPH

Laura J. Havrilesky, MD

Shalini L. Kulasingam, PhD

Mishka S. Terplan, MD

Kathryn E. Cline, MHS

Rebecca N. Gray, DPhil

Douglas C. McCrory, MD, MHS

AHRQ Publication No. 06-E004

February 2006

This document is in the public domain and may be used and reprinted without permission except those copyrighted materials noted for which further reproduction is prohibited without the specific permission of copyright holders.

Suggested Citation:

Myers ER, Bastian LA, Havrilesky LJ, Kulasingam SL, Terplan MS, Cline KE, Gray RN, McCrory DC. Management of Adnexal Mass. Evidence Report/Technology Assessment No. 130 (Prepared by the Duke Evidence-based Practice Center under Contract No. 290-02-0025.) AHRQ Publication No. 06-E004. Rockville, MD: Agency for Healthcare Research and Quality. February 2006.

This report is based on research conducted by the Duke Evidence-based Practice Center (EPC) under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-02-0025). The findings and conclusions in this document are those of the author(s), who are responsible for its content, and do not necessarily represent the views of AHRQ. No statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.

The information in this report is intended to help clinicians, employers, policymakers, and others make informed decisions about the provision of health care services. This report is intended as a reference and not as a substitute for clinical judgment.

This report may be used, in whole or in part, as the basis for the development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.

None of the investigators has any affiliations or financial involvement that conflicts with the material presented in this report.

Preface

The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. This report was requested and funded by the Division of Cancer Prevention and Control, National Center for Chronic Disease Prevention and Health Promotion at the Centers for Disease Control and Prevention (CDC). The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.

To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.

AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.

We welcome comments on this evidence report. They may be sent by mail to the Task Order Officer named below at: Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850, or by e-mail to .

Acknowledgments

The authors gratefully acknowledge Jane Kolimaga and Alison Lee for assistance in study initiation; Margaret Jamison, PhD, for assistance with analysis of the Nationwide Inpatient Sample data set; and Karen Hoffman, MD, for initial work on developing the ovarian cancer natural history model. They also thank Mona Saraiya, MD, MPH, and Christie Eheman, PhD, from the Centers for Disease Control and Prevention for their valuable input.

Structured Abstract

Objectives: To assess diagnostic strategies for distinguishing benign from malignant adnexal masses.

Data Sources: MEDLINE® and reference lists of recent reviews; discharge data from the Nationwide Inpatient Sample.

Review Methods: The major diagnostic methods evaluated were bimanual pelvic examination, ultrasound (morphology and Doppler velocimetry), MRI, CT, FDG-PET, CA-125, and scoring systems that incorporated multiple clinical, laboratory, and radiologic findings. Meta-analysis using a random-effects model was used to estimate pooled sensitivity and specificity for discriminating benign from malignant. We reviewed evidence for followup strategies for masses considered benign, and for adverse outcomes of diagnostic surgery. We also reviewed published models of the natural history of ovarian cancer and compared the impact of assumptions about natural history on outcomes.

Results: The majority of studies did not describe whether patients presented with asymptomatic masses detected through screening or with symptoms. Prevalence of malignant masses in a U.S. postmenopausal screening population was approximately 0.1 percent, while benign masses were found in 0.8 to 1.8 percent of women. Pooled (a) sensitivity and (b) specificity were: bimanual exam (a) 0.45, (b) 0.90; ultrasound morphology scores (a) 0.86 to 0.91, (b) 0.68 to 0.83; Doppler resistive index (a) 0.72, (b) 0.90; pulsatility index (a) 0.80, (b) 0.73; maximum systolic velocity (a) 0.74, (b) 0.81; presence of vessels (a) 0.88, (b) 0.78; combined morphology and Doppler (a) 0.86, (b) 0.91; MRI (a) 0.91, (b) 0.88; CT (a) 0.90, (b) 0.75; FDG-PET (a) 0.67, (b) 0.79; and CA-125 (a) 0.78, (b) 0.78. Both sensitivity and specificity of CA-125 were better in postmenopausal than in premenopausal women. In modeled outcomes, combinations of imaging and CA-125 were both more sensitive and more specific than either alone. Performance of scoring systems in validation studies was consistently worse than in development studies; the highest demonstrated specificity observed was 0.91, with a concurrent sensitivity of 0.74. Evidence on followup strategies was sparse, although one large study provided good evidence for safely following unilocular cysts less than 10 cm in diameter. Overall complication rates in studies of surgically managed adnexal masses were low, but important clinical information was not reported.

Conclusions: All diagnostic modalities showed trade-offs between sensitivity and specificity, but the available literature does not provide sufficient detail on relevant characteristics of study populations to allow confident estimation of the results of alternative diagnostic strategies. Although modeling studies may prove useful in evaluating diagnostic algorithms, further work is needed to explore the implications of uncertainty about the natural history of ovarian cancer.

Executive Summary

Introduction

Ovarian cancer is the leading cause of cancer death from gynecologic malignancies in the United States, with an annual incidence of over 25,000 and an annual mortality of approximately 14,000. Cancer incidence increases dramatically with age.

The high case-fatality rate has largely been attributed to the fact that most ovarian cancers are diagnosed in advanced stages (Stage III, where the cancer has spread beyond the pelvis to organs of the upper abdominal cavity, and Stage IV, where the cancer has spread outside of the peritoneal cavity), when survival is poor. Stage I cancer (limited to the ovaries) has a survival rate of over 90 percent. Thus, there has long been an emphasis on early detection of ovarian cancer in the belief that detection in early stages will lead to decreases in morbidity and mortality. The detection of a mass in the area of the ovaries and fallopian tubes (the uterine adnexae) raises the possibility of ovarian cancer, which necessitates further study to rule out malignancy.

There are two main clinical routes by which an adnexal mass may be detected: (1) women with symptoms may have an adnexal mass detected as part of their evaluation for those symptoms, either by physical exam or radiographic imaging; (2) the mass may be detected during bimanual pelvic examination or radiologic imaging as part of a routine health maintenance examination.

For the purposes of this evidence report, we define an adnexal mass as an enlarged structure in the uterine adnexa that can either be palpated on a bimanual pelvic examination or visualized using radiographic imaging.

There are a number of conditions that can be associated with an adnexal mass. These include malignancies arising from the ovary and fallopian tube, or metastatic disease from another site (such as the breast or gastrointestinal tract), as well as a wide range of benign conditions. For the purposes of this evidence report, “management” of the adnexal mass refers to the process by which a mass is ultimately classified as benign or malignant.

The clinical significance of discriminating benign from malignant masses differs depending on the clinical setting in which the mass is initially detected. For women with symptoms, in whom surgical management may be appropriate whether or not the mass is malignant, the main reason to discriminate between benign and malignant lesions is to facilitate referral and management by clinicians who have specialized training and experience in managing ovarian malignancy, with improved outcomes. For asymptomatic women, discriminating benign from malignant disease is important both to ensure appropriate management in the setting of malignancy and to avoid unnecessary diagnostic procedures, including surgery, in women with asymptomatic, nonmalignant conditions.

The prevalence of malignancy may differ between women with symptomatic and asymptomatic masses, which may in turn affect the positive and negative predictive value of a test, and, potentially, sensitivity and specificity as well. Prevalence also varies with age and with family history.
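The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below is not part of the report's analysis: it applies Bayes' theorem to the pooled CA-125 sensitivity and specificity quoted in the Structured Abstract (0.78 each), and the three prevalence values are hypothetical, chosen only to show the direction of the effect.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a single binary test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical prevalences of malignancy among women with a mass
# (illustrative values only, not estimates from the report).
for prevalence in (0.01, 0.10, 0.30):
    ppv, npv = predictive_values(0.78, 0.78, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, the positive predictive value rises and the negative predictive value falls as prevalence increases, which is why the clinical setting matters when interpreting a test result.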

This report focuses on the evidence relevant to establishing the most appropriate way to distinguish benign from malignant adnexal masses in both symptomatic and asymptomatic women. A key consideration throughout the report will be the underlying likelihood of malignancy in the populations studied, and the impact of this prevalence on the interpretation of the results of the reviewed studies. The results of this report are intended primarily to (a) provide a resource for clinicians and policymakers developing guidelines on management of adnexal masses, and (b) provide a resource for researchers and funding agencies in identifying gaps in our knowledge and research priorities.

Methods

Working with the Agency for Healthcare Research and Quality (AHRQ), the Centers for Disease Control and Prevention (CDC), and members of the technical expert panel, we developed seven questions to be addressed, using an analytic framework which incorporated prior probability of disease, test results, and outcomes of diagnostic surgery.

We searched MEDLINE® (1966-September 2004) and the Cochrane Database of Systematic Reviews. Searches of these databases were supplemented by reviews of reference lists contained in all included articles and in relevant review articles and meta-analyses. The searches yielded a total of 1,023 citations. Pairs of readers reviewed each abstract and selected 445 articles for full text review. Specific inclusion criteria were developed for each question, and both readers were required to agree on inclusion.

We developed tables to abstract each article, and quality criteria for each question. For studies of diagnostic tests, 2-by-2 tables were constructed for each included article, and sensitivity, specificity, and positive and negative predictive values, with 95 percent confidence intervals (CIs) for each, were calculated. For articles about prevalence and adverse event rates during diagnostic surgery, we calculated 95 percent CIs where the authors did not provide them. For diagnostic tests, pooled estimates of sensitivity and specificity were calculated using a random-effects model.
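The 2-by-2 calculation described above can be sketched as follows. This is a minimal illustration, not the report's method: the report does not state which confidence interval formula was used, so the Wilson score interval here is an assumption, and the cell counts in the example are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


def two_by_two(tp, fp, fn, tn):
    """Point estimates and Wilson intervals for the four standard measures."""
    cells = {
        "sensitivity": (tp, tp + fn),  # malignant masses correctly called positive
        "specificity": (tn, tn + fp),  # benign masses correctly called negative
        "ppv": (tp, tp + fp),          # positives that are truly malignant
        "npv": (tn, tn + fn),          # negatives that are truly benign
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in cells.items()}


# Hypothetical counts for one study (not taken from any reviewed article).
results = two_by_two(tp=45, fp=20, fn=5, tn=180)
for name, (estimate, (low, high)) in results.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The Wilson interval is used in this sketch because it stays within 0 and 1 even when a proportion is close to either bound, a situation that arises often with small diagnostic studies.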

We performed three supplemental analyses. First, we used the Nationwide Inpatient Sample (NIS), a nationally representative database containing discharge data from approximately 20 percent of U.S. hospitals. Using International Classification of Diseases, Ninth Revision (ICD-9) codes and the provided corrections for sample weighting, we estimated the number of cases of women 15 and older undergoing diagnostic laparoscopy and exploratory laparotomy in 2000 and 2001 for diagnoses consistent with an adnexal mass. Mortality and morbidity rates for each type of procedure within each diagnosis were also estimated.

Second, we constructed a simple decision model based on serial or parallel testing, using the pooled sensitivity and specificity of various tests to predict outcomes.
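For readers unfamiliar with serial and parallel testing, the sketch below shows the standard textbook combination rules. It is an assumption-laden illustration rather than the report's decision model: it treats the two tests as conditionally independent given disease status, which the report does not state, so its output should not be expected to reproduce the modeled results reported here. The inputs are the pooled values from the Structured Abstract for combined morphology and Doppler ultrasound (0.86, 0.91) and CA-125 (0.78, 0.78).

```python
def serial(test_a, test_b):
    """Call the mass malignant only if both tests are positive.

    Assumes conditional independence of the two tests given disease status.
    Each test is a (sensitivity, specificity) pair.
    """
    (sens_a, spec_a), (sens_b, spec_b) = test_a, test_b
    sensitivity = sens_a * sens_b
    specificity = 1.0 - (1.0 - spec_a) * (1.0 - spec_b)
    return sensitivity, specificity


def parallel(test_a, test_b):
    """Call the mass malignant if either test is positive (same assumption)."""
    (sens_a, spec_a), (sens_b, spec_b) = test_a, test_b
    sensitivity = 1.0 - (1.0 - sens_a) * (1.0 - sens_b)
    specificity = spec_a * spec_b
    return sensitivity, specificity


ultrasound = (0.86, 0.91)  # pooled combined morphology and Doppler
ca125 = (0.78, 0.78)       # pooled CA-125

for label, rule in (("serial", serial), ("parallel", parallel)):
    sens, spec = rule(ultrasound, ca125)
    print(f"{label}: sensitivity={sens:.3f} specificity={spec:.3f}")
```

Under the independence assumption, the "both positive" rule trades sensitivity for specificity and the "either positive" rule does the reverse. Models that define the testing sequence differently, or that allow correlated test results, can behave differently.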

Finally, we used a previously developed Markov model of the natural history of ovarian cancer to explore the implications of alternative possible pathways in the development of advanced disease - specifically, that some cancers limited to the ovaries (Stage I) may spread to the upper abdomen (Stage III) without first spreading to other pelvic organs (Stage II).
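The structural question posed here, whether Stage I disease can progress directly to Stage III, can be illustrated with a toy Markov chain. The sketch below is not the previously developed model referred to above: the four-state structure, the cycle length, and every transition probability are hypothetical placeholders chosen only to show how adding a direct Stage I to Stage III pathway changes the stage distribution over time.

```python
STATES = ("Stage I", "Stage II", "Stage III", "Stage IV")


def transition_matrix(p_i_to_ii, p_i_to_iii, p_ii_to_iii, p_iii_to_iv):
    """Build a per-cycle transition matrix; each row sums to 1."""
    return [
        [1.0 - p_i_to_ii - p_i_to_iii, p_i_to_ii, p_i_to_iii, 0.0],
        [0.0, 1.0 - p_ii_to_iii, p_ii_to_iii, 0.0],
        [0.0, 0.0, 1.0 - p_iii_to_iv, p_iii_to_iv],
        [0.0, 0.0, 0.0, 1.0],  # Stage IV treated as absorbing in this sketch
    ]


def step(distribution, matrix):
    """Advance the stage distribution by one cycle."""
    n = len(distribution)
    return [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def run(matrix, cycles, start=(1.0, 0.0, 0.0, 0.0)):
    """Return the stage distribution after the given number of cycles."""
    distribution = list(start)
    for _ in range(cycles):
        distribution = step(distribution, matrix)
    return distribution


# Hypothetical per-cycle probabilities (illustrative only).
sequential = transition_matrix(0.05, 0.00, 0.10, 0.05)  # I -> II -> III only
direct = transition_matrix(0.05, 0.03, 0.10, 0.05)      # adds I -> III pathway

for label, matrix in (("sequential", sequential), ("with direct I->III", direct)):
    shares = run(matrix, cycles=12)
    summary = ", ".join(f"{s}={p:.3f}" for s, p in zip(STATES, shares))
    print(f"{label}: {summary}")
```

In a structure like this, the direct pathway shortens the time a tumor spends in stages limited to the pelvis, which is the kind of assumption that could change the apparent benefit of early detection.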

Results

Question 1: What is the prevalence of various tumor types among women with an adnexal mass, stratified by cancer status (malignant vs. benign), age, menopausal status, and size of tumor?

Question 2: What are the sensitivity, specificity, and reliability of the bimanual pelvic examination?

Question 3: Among women with a palpable adnexal mass on exam or a mass identified by ultrasound/imaging, what is the sensitivity/specificity of various evaluation modalities including ultrasound (transvaginal ultrasound, transabdominal ultrasound, color Doppler, two-dimensional [2D] vs. three-dimensional [3D] ultrasound), computed tomography (CT) scan, magnetic resonance imaging (MRI) scan, and CA-125 levels for distinguishing benign from malignant masses?

Question 4: What is the accuracy of explicit scoring systems which incorporate various combinations of imaging findings, patient risk factors, and/or CA-125 levels for detecting malignancy? Have these scoring systems been applied to a population of women before laparoscopy or laparotomy?

Question 5: Among women with suspected benign masses on initial investigation, what are the sensitivity and specificity of monitoring with periodic CA-125 and/or interval ultrasound examinations for detecting malignant masses? How does the interval of testing/definition of change affect sensitivity and predictive value?

Question 6: Among women with adnexal masses, what are the morbidity and mortality from diagnostic surgery (laparoscopy or laparotomy)? At what point does the risk of surgery outweigh the risk of detecting malignancy?

Question 7: What are the estimated trade-offs resulting from various strategies for evaluation of the adnexal mass?

Discussion

Limitations of the Literature

The main limitation in the literature was the failure to adequately describe relevant patient characteristics, including the presence or absence of symptoms, and variable reporting of menopausal status. Inadequate sample size, lack of blinding, and failure to account for observer variability were also common limitations.

Limitations of the Report

The report did not include non-English publications. We did not include non-U.S. studies in our review of the prevalence of different types of adnexal mass. Given the heterogeneity of studies, pooled estimation of sensitivity and specificity may not be appropriate. The NIS does not include outpatient procedures, and our coding algorithm may have missed some complications.

Future Research

Research priorities include: a minimal consensus data set on key patient characteristics (with results presented stratified by those characteristics); better estimates of prevalence and surgical outcomes using data sources that capture inpatient and outpatient encounters, such as Medicare or health maintenance organizations; better characterization of patient characteristics in all studies; better evidence on the value of the pelvic exam as part of routine health maintenance; and development of additional models for simulating the natural history of ovarian cancer and evaluating screening, diagnosis, and treatment strategies.

Conclusions

Developing an effective and efficient algorithm for the evaluation of any condition requires good evidence on the prevalence of the condition at the first diagnostic encounter, and the sensitivity and specificity of the potential diagnostic tests to be used. Unfortunately, the overwhelming majority of the literature we reviewed did not provide sufficient detail on important patient characteristics to allow estimation of the outcomes of different diagnostic strategies, either in the context of detecting adnexal masses or distinguishing benign from malignant masses.

All of the diagnostic tests and scoring systems we evaluated exhibited a trade-off between sensitivity and specificity - studies of a given test that reported higher sensitivity had lower specificity, and vice versa. The bimanual pelvic examination has low sensitivity for both detection of adnexal masses and discriminating benign from malignant masses, raising doubts about its utility as a screening test in asymptomatic women. In pooled analysis, the combination of ultrasound morphology and Doppler blood flow had the best combination of sensitivity and specificity, with MRI comparable. In a preliminary model, serial testing with imaging followed by CA-125 was both more sensitive and more specific than either test alone; parallel testing using both tests incorporated into the Risk of Malignancy Index resulted in fewer missed cancers (greater sensitivity) but more surgeries (lower specificity), with twice as many tests.

Studies of surgical management suffered from the same limitations in terms of description of patient characteristics, making estimation of the risks of false positive diagnostic testing impossible.

Ultimately, evaluation of potential strategies for reducing morbidity and mortality from ovarian cancer may require use of simulation models, a technique that has proven helpful in evaluating prevention strategies for other cancers. Because the natural history of ovarian cancer is relatively unknown, testing of alternative models is critical. Although a few sophisticated models exist, development of additional models would be helpful, especially in the context of evaluating results from ongoing trials of screening. If any of these trials show a benefit from screening, then the need for better evidence on the diagnostic evaluation of adnexal masses will become even more critical.

Chapter 1. Introduction

Ovarian Tumors

Figure 1. U.S. ovarian cancer incidence by age and race, 1992-2002

Source: Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov).2

Cancer of the ovaries is the leading cause of cancer death from gynecologic malignancies in the United States, with an annual incidence of over 25,000 and an annual mortality of approximately 14,000.1 Cancer incidence increases dramatically with age, being relatively rare prior to age 50 (Figure 1).

Table 1

Age-adjusted annual U.S. incidence and mortality per 100,000 women by race and ethnicity, 1992-2002

            White   African-American   Asian/Pacific Islander   Native American   Hispanic
Incidence   15.1    10.3               10.4                     8.9               11.9
Mortality   9.3     7.6                4.8                      5.1               6.2

Source: Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov).2

Ovarian cancer incidence varies by race and ethnicity. Both incidence and mortality are highest for white women (Table 1).

Malignant tumors of the ovary can either arise in the ovary (primary ovarian cancer) or be the result of metastasis from another site, such as the breast or colon. Primary ovarian tumors, whether benign or malignant, can arise from three broad types of cells: the cells on the surface (epithelial cells); the cells that form eggs (germ cells); and the cells surrounding the eggs, including the cells that produce ovarian hormones (sex cord-stromal cells). Epithelial tumors are the most common type, accounting for 60 percent of all ovarian tumors and up to 90 percent of primary cancers. Sex cord-stromal tumors account for 10 to 15 percent of all tumors, while germ cell tumors account for 25 percent of tumors. In general, sex cord-stromal tumors and germ cell tumors are relatively more common in younger premenopausal women. Thus, although ovarian cancer is relatively rare in younger women, when it does occur it is more likely to be a non-epithelial cancer than is the case in postmenopausal women.3

Within the broad classification of epithelial, sex cord-stromal, and germ cell tumors, tumors are further classified by the individual cell types from which they are derived. For example, the most common epithelial tumors are serous and mucinous tumors, the most common sex cord-stromal tumors are fibromas (arising from the connective tissue surrounding eggs), and the most common germ cell tumors are teratomas. Within each histological class, tumors can be benign or malignant, based on their ability to metastasize.3

Some epithelial tumors are classified as “borderline” or “low malignant potential” (LMP) tumors. These are tumors in which there is no invasion into the ovarian stroma, but for which histologic evidence of proliferation exists (increased cell division, changes in the appearance of the cell nucleus). There is controversy over whether these tumors represent pre-invasive cancer, and, if untreated, would go on to become a cancer, or whether they represent a subtype of tumor that has a relatively small chance of becoming a cancer.3 In estimating the diagnostic accuracy of tests for determining whether a mass is benign or malignant, whether LMP tumors are classified as benign or malignant can have an effect on the estimates of test performance, as we will discuss later in the report.

Ovarian cancer spreads primarily by dissemination throughout the peritoneal cavity; common sites of metastasis are the small and large bowel, the omentum, the liver, and the diaphragm. Spread to retroperitoneal lymph nodes is also common.

Treatment for ovarian cancer consists of surgical removal of the ovaries, fallopian tubes, and uterus (if present), along with as much metastatic disease as possible; if there is no obvious spread beyond the ovaries, the lymph nodes are sampled to determine if there has been lymphatic metastasis. Surgery is followed by chemotherapy, with responsiveness to chemotherapy depending on the amount of tumor left after surgical removal and the cell type of tumor, among other factors.3

The high case-fatality rate observed in ovarian cancer has largely been attributed to the fact that most ovarian cancers are diagnosed in advanced stages (Stage III, where the cancer has spread beyond the pelvis to organs of the upper abdominal cavity, and Stage IV, where the cancer has spread outside of the peritoneal cavity), when survival is poor. Stage I cancer (limited to the ovaries) has a survival rate of over 90 percent. Thus, there has long been an emphasis on early detection of ovarian cancer, in the belief that detection in early stages will lead to decreases in morbidity and mortality, just as cervical cancer screening has resulted in substantial reductions in morbidity and mortality from cervical cancer. The detection of a mass in the area of the ovaries and fallopian tubes (the uterine adnexae) raises the possibility of ovarian cancer, which necessitates further study to rule out malignancy.

This evidence report was prepared by the Duke Evidence-based Practice Center, in partnership with the Centers for Disease Control and Prevention (CDC) and the Agency for Healthcare Research and Quality (AHRQ). The purpose of the report is to provide followup data regarding key issues identified at two conferences sponsored by CDC, one in November 2000 on broad issues in preventing morbidity and mortality from ovarian cancer,4 and one in May 2002 on the use of ultrasound in the diagnosis of ovarian cancer.5

Definition of an Adnexal Mass

For the purposes of this report, we define an adnexal mass as an enlarged structure in the uterine adnexa which can either be palpated on a bimanual pelvic examination or visualized using radiographic imaging. The normal ovary is approximately 3 cm in length, decreasing in size after menopause.6 In terms of physical examination, the precise size definition used in the literature is quite variable and, in practice, may also vary depending on the ease with which the examination is performed, the patient's body habitus, the examiner's experience, the time taken during the exam, and the presence of other abnormalities such as uterine fibroids. Historically, because of the decrease in size after menopause, any palpable mass in a postmenopausal woman has been considered abnormal (the “palpable postmenopausal ovary syndrome”).7 As discussed below, some masses may ultimately prove to not be ovarian in origin.

The definition of an abnormal structure on radiologic imaging is also quite variable. Small fluid-filled cysts are quite common in both pre- and postmenopausal women. For the purposes of this report, we consider any structure observed during radiologic imaging that prompts additional evaluation (such as measurement of serologic markers or further imaging) as a mass.

Detection of an Adnexal Mass

There are three main clinical routes by which an adnexal mass may be detected. First, women with symptoms may have an adnexal mass detected as part of their evaluation for those symptoms, either by physical exam or radiographic imaging. Because ovarian cancer often presents with vague abdominal symptoms, we would consider any evaluation for symptoms to be in symptomatic women. Second, the mass may be detected as part of a routine health maintenance examination. Finally, it is possible that an asymptomatic mass could be detected during imaging done for another indication. In premenopausal women, the most likely scenario where this would occur would be during ultrasound evaluation during pregnancy. Another common scenario in peri- or postmenopausal women would be evaluation for uterine bleeding; because uterine bleeding is not a common symptom of ovarian cancer, a finding of an adnexal mass during evaluation for bleeding could be considered as an incidental finding. Because malignancy is rare during pregnancy, and because the technical considerations for both diagnosis and management are different, the most appropriate management of masses detected during pregnancy, especially if detected serendipitously by ultrasound, is outside of the scope of this report.

We did not identify any literature that would allow an estimate of the proportions of women with adnexal masses presenting by each route; as we will discuss, this is a major deficiency of the literature. The proportions are likely to vary by setting, referral patterns, patient thresholds for seeking care, physician thresholds for diagnostic tests, and other factors. For example, one gynecologic oncologist estimated that well over half of the referrals for evaluation in a large health maintenance organization were for incidentally detected masses (W. Kinney, personal communication).

Types of Adnexal Mass

Conditions that can present as an adnexal mass include:

  • Benign primary ovarian tumors - epithelial, sex cord-stromal, and germ cell;

  • Borderline and malignant ovarian tumors - epithelial, sex cord-stromal, and germ cell;

  • Metastatic malignant tumors - most commonly breast and gastrointestinal tract;

  • Masses arising from the fallopian tube - most commonly benign, including hydrosalpinx (a large, fluid-filled fallopian tube) and pyosalpinx (an infected, pus-filled fallopian tube); primary fallopian tube malignancies can occur, but are relatively rare.

  • Masses arising from the uterus - most commonly benign leiomyomas (fibroids);

  • Masses arising from the gastrointestinal tract - diverticula of the colon, large colonic tumors, tumors of the appendix;

  • Masses arising from the urinary tract - pelvic kidneys, diverticula of the ureter;

  • Masses arising from remnants of embryological development;

  • Endometriosis;

  • Pelvic inflammatory disease;

  • Cysts arising from normal ovarian functions, such as development of eggs (follicular cysts) and ovulation (corpus luteum cysts).

Management of the Adnexal Mass

With such a wide range of potential causes, and with a wide range of appropriate therapeutic options, precise diagnosis of a mass, especially in symptomatic women, is important. Once diagnosed, a mass may be managed in a variety of ways, ranging from observation to surgical removal and chemotherapy. However, a review of the test characteristics of various methods for obtaining precise diagnoses of specific conditions, and of the range of medical and surgical treatment options for each condition, is beyond the scope of this report. For our purposes, “management” of the adnexal mass refers to the process by which a mass is ultimately classified as benign or malignant.

Importance of Discriminating Benign from Malignant Masses

The clinical significance of discriminating benign from malignant masses differs depending on the clinical setting in which the mass is initially detected.

In women who initially present with symptoms, diagnosis of the underlying cause of the mass is important since it may help define available treatment options. Although medical therapy may relieve symptoms in some cases, surgical management is the treatment of choice for many conditions. Because surgery may ultimately be the most appropriate management for symptomatic adnexal masses, the main reason to discriminate between benign and malignant lesions is to facilitate referral to, and management by, clinicians with specialized training and experience in managing ovarian malignancy, which is associated with improved outcomes.8-10

The other main group of women with adnexal masses consists of those without symptoms who have a mass detected through either physical examination or imaging. No organization currently recommends routine screening with serum markers or imaging for ovarian cancer.11, 12 The U.S. Preventive Services Task Force gives screening (including serum markers, imaging, or pelvic examination) a “D” recommendation (fair evidence against screening).13 However, because an annual pelvic examination continues to be recommended by professional organizations such as the American College of Obstetricians and Gynecologists (ACOG),11, 14 many asymptomatic women may have an adnexal mass detected during a periodic health maintenance examination. In this setting, discriminating benign from malignant disease is important not only to ensure appropriate management in the setting of malignancy, but also to avoid unnecessary diagnostic procedures, including surgery, and anxiety in women with asymptomatic, nonmalignant conditions. In some cases, there may be a rationale for removing certain asymptomatic benign lesions, including prevention of malignant transformation; prevention of ovarian torsion (a condition where the ovary twists and occludes its blood supply, causing abdominal pain and possibly resulting in loss of ovarian function); prevention of rupture, which might lead to acute symptoms or a worse prognosis (for example, in the case of endometriosis); prevention of more advanced or complicated surgery for a larger mass or more extensive pathologic process after the development of symptoms; and, for premenopausal women, possible enhancement of fertility. A review of the evidence (or lack of evidence) supporting these rationales is beyond the scope of this report.

Significance of Clinical Presentation in Evaluation of Management Strategies

As discussed above, the results of tests used to distinguish benign from malignant disease have different implications depending on whether the patient is symptomatic or asymptomatic. However, clinical presentation also has implications for interpretation of test results.

Diagnostic or screening tests are most commonly characterized by their sensitivity and specificity. The sensitivity of a test is the probability that, given the underlying presence of the disease, the test result will be positive; 100 percent minus the sensitivity is commonly called the false negative rate. The specificity of the test is the probability that, given the underlying absence of disease, the test result will be negative; 100 percent minus the specificity is commonly called the false positive rate. In an ideal evaluation, the sensitivity and specificity of the test are independent of the underlying probability, or prevalence, of disease.
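As a minimal sketch of these definitions, the following Python snippet computes sensitivity and specificity from the four cells of a 2-by-2 table. The class name, function names, and counts are hypothetical and chosen only for illustration; they are not taken from any study reviewed in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2-by-2 table of test result against true disease status."""
    true_pos: int   # disease present, test positive
    false_neg: int  # disease present, test negative
    false_pos: int  # disease absent, test positive
    true_neg: int   # disease absent, test negative


def sensitivity(t: TwoByTwo) -> float:
    """Probability of a positive test given that disease is present."""
    return t.true_pos / (t.true_pos + t.false_neg)


def specificity(t: TwoByTwo) -> float:
    """Probability of a negative test given that disease is absent."""
    return t.true_neg / (t.true_neg + t.false_pos)


# Hypothetical counts for illustration only.
example = TwoByTwo(true_pos=45, false_neg=5, false_pos=30, true_neg=120)

if __name__ == "__main__":
    sens = sensitivity(example)
    spec = specificity(example)
    print(f"sensitivity = {sens:.2f}, false negative rate = {1 - sens:.2f}")
    print(f"specificity = {spec:.2f}, false positive rate = {1 - spec:.2f}")
```

Neither function uses the proportion of diseased subjects in the table, which is the sense in which sensitivity and specificity are, ideally, independent of prevalence.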

Clinically, the more common scenario is that the clinician is aware of the test result and needs to know the probability of the presence or absence of disease. In this setting, the positive and negative predictive values of the test are more important.

The negative predictive value of a test is the probability that, given a negative test result, the patient truly does not have disease. It is a function of three parameters: the pretest probability (prevalence) of the disease, the sensitivity of the test, and the specificity of the test:

$$\text{NPV} = \frac{(1 - \text{Prevalence}) \times \text{Specificity}}{(1 - \text{Prevalence}) \times \text{Specificity} + \text{Prevalence} \times (1 - \text{Sensitivity})}$$

As can be seen in the equation, the negative predictive value is much more dependent on test sensitivity than test specificity. Negative predictive value will be high when test sensitivity is high, and when prevalence is low (i.e., disease is rare).

Similarly, the positive predictive value is the probability that, given a positive test result, the patient actually has the disease. It is also a function of prevalence, sensitivity, and specificity:

$$\text{PPV} = \frac{\text{Prevalence} \times \text{Sensitivity}}{\text{Prevalence} \times \text{Sensitivity} + (1 - \text{Prevalence}) \times (1 - \text{Specificity})}$$

Positive predictive value is high when a test has high specificity, or when prevalence is high (disease is common).

For any given test, the positive predictive value will be higher and the negative predictive value lower in populations where the disease is common than in populations where it is rare; conversely, the positive predictive value decreases and the negative predictive value increases as the disease becomes less common. This effect of prevalence on predictive values occurs even when test sensitivity and specificity are unchanged. The prevalence of disease in the population in which test characteristics are being evaluated is even more critical because, under some types of study design, disease prevalence can also affect estimates of sensitivity and specificity.15
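The dependence of predictive values on prevalence can be illustrated with a short Python sketch. The function names, the sensitivity and specificity values, and the prevalences below are hypothetical; the two functions simply implement the definitions of positive and negative predictive value given above.

```python
def positive_predictive_value(prevalence: float, sens: float, spec: float) -> float:
    """P(disease | positive test)."""
    true_pos = prevalence * sens
    false_pos = (1 - prevalence) * (1 - spec)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(prevalence: float, sens: float, spec: float) -> float:
    """P(no disease | negative test)."""
    true_neg = (1 - prevalence) * spec
    false_neg = prevalence * (1 - sens)
    return true_neg / (true_neg + false_neg)


# Hypothetical test characteristics, held fixed while prevalence varies.
SENS, SPEC = 0.90, 0.80

if __name__ == "__main__":
    for prevalence in (0.01, 0.10, 0.30):
        ppv = positive_predictive_value(prevalence, SENS, SPEC)
        npv = negative_predictive_value(prevalence, SENS, SPEC)
        print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With sensitivity and specificity fixed, the positive predictive value rises and the negative predictive value falls as the assumed prevalence increases, which is the pattern described in the text.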

Therefore, variations in the prevalence of malignancy among women with different clinical presentations will affect at least predictive values, and possibly sensitivity and specificity estimates. The prevalence of ovarian cancer clearly rises with age, so age and/or menopausal status are important considerations in evaluating management strategies in both the symptomatic and asymptomatic patient with an adnexal mass.

The prevalence of malignancy among asymptomatic women with an adnexal mass will be a function of the underlying prevalence or incidence of malignancy and the test characteristics of the initial test used to detect the mass. Evaluation of the different screening tests and strategies for early detection of ovarian cancer is beyond the scope of this report, especially since there are at least three large trials still ongoing.16-18 However, in order to properly interpret the results of tests performed in asymptomatic women with pelvic masses, some estimate of the underlying probability of malignancy among these women is needed. Since many of these women are likely identified through a bimanual pelvic examination, deriving this estimate requires an assessment of the sensitivity and specificity of the pelvic examination. Symptomatic patients may be more likely to have an underlying adnexal malignancy, especially among postmenopausal women.19 In any series of women with adnexal masses, the proportion of women who are symptomatic and asymptomatic will likely determine the prevalence, and thus the predictive values of the diagnostic tests used to evaluate the mass.

Summary

In summary, this report focuses on the evidence relevant to establishing the most appropriate way to distinguish benign from malignant adnexal masses in both symptomatic and asymptomatic women. A key consideration throughout the report will be the underlying likelihood of malignancy in the populations studied, and the impact of this prevalence on the interpretation of the results of the reviewed studies. The results of this report are intended primarily to (a) provide a resource for clinicians and policymakers developing guidelines on management of adnexal masses, and (b) provide a resource for researchers and funding agencies in identifying gaps in our knowledge and research priorities.

Chapter 2. Methods

This section of the report describes the basic methodology used to develop the evidence report, including topic assessment and refinement, analytic framework, literature search strategies and results, literature screening and grading process and criteria, data abstraction and analysis methods, and quality control procedures.

Topic Assessment and Refinement

The Centers for Disease Control and Prevention (CDC) and the Agency for Healthcare Research and Quality (AHRQ) originally identified five key questions to be addressed by the report, focused on management of adnexal masses in peri- and postmenopausal women. The Duke research team clarified and refined the overall research objectives and key questions by first consulting with the two study sponsors, AHRQ and CDC, at which time two questions were added, and then by convening a panel of national experts who would serve as advisors to the project. These experts were selected to represent relevant specialties including radiology, obstetrics-gynecology, and gynecologic oncology, as well as national professional societies, including the American College of Obstetricians and Gynecologists (ACOG), the Society of Gynecologic Oncologists (SGO) and the American College of Radiology (ACR). Members of the technical expert panel were:

Susan Ascher, MD; Department of Radiology, Georgetown University Hospital; Washington, DC (ACR)

Michael L. Berman, MD; Division of Gynecologic Oncology, UCI Medical Center; Orange, CA (SGO)

Barry B. Goldberg, MD; Department of Radiology, Thomas Jefferson University Hospital; Philadelphia, PA (ACR)

Edward E. Partridge, MD; Department of Obstetrics and Gynecology, University of Alabama, Birmingham; Birmingham, AL (American Cancer Society)

George F. Sawaya, MD; Department of Obstetrics and Gynecology, University of California, San Francisco; San Francisco, CA

Howard T. Sharp, MD; University of Utah Hospitals and Clinics; Salt Lake City, UT (ACOG)

Stanley Zinberg, MD, MS; ACOG; Washington, DC

As a result of an initial conference call with the technical experts, AHRQ, and CDC, the Duke research team modified the key research questions originally proposed in the Task Order in two fundamental ways: (1) The questions were expanded to include women of all ages, and (2) Question 6 would include laparotomy data, where available. After review of a draft version of the report by the technical experts and additional reviewers, the order of the questions was also changed to allow a more logical flow.

The key questions addressed by this report are:

Question 1: What is the prevalence of various tumor types among women with an adnexal mass, stratified by cancer status (malignant vs. benign), age, menopausal status, and size of tumor?

Question 2: What are the sensitivity, specificity, and reproducibility of the bimanual pelvic examination?

Question 3: Among women with a palpable adnexal mass on exam or a mass identified by ultrasound/imaging, what is the sensitivity/specificity of various evaluation modalities including ultrasound (transvaginal ultrasound, transabdominal ultrasound, color Doppler, two-dimensional [2D] vs. three-dimensional [3D] ultrasound), computed tomography (CT) scan, magnetic resonance imaging (MRI) scan, and cancer antigen 125 (CA-125) levels for diagnosing malignant masses?

Question 4: What is the accuracy of explicit scoring systems which incorporate various combinations of imaging findings, patient risk factors, and/or CA-125 levels for detecting malignancy? Have these scoring systems been applied to a population of women before laparoscopy or laparotomy?

Question 5: Among women with suspected benign masses on initial investigation, what are the sensitivity and specificity of monitoring with periodic CA-125 and/or interval ultrasound examinations for detecting malignant masses? How does the interval of testing/definition of change affect sensitivity and predictive value?

Question 6: Among women with adnexal masses, what are the morbidity and mortality from diagnostic surgery (laparoscopy or laparotomy)? At what point does the risk of surgery outweigh the risk of detecting malignancy?

Question 7: What are the estimated trade-offs resulting from various strategies for evaluation of the adnexal mass?

Analytic Framework

   Figure 2. Analytic framework for evidence report (numbers refer to key questions)

Based on the original proposal and discussions with CDC, AHRQ, and the technical expert panel, we developed the following analytic framework to structure our review and synthesis (Figure 2).

Comments on this analytic framework are as follows:

  • Separate consideration of age or menopausal status is important, since several factors that may affect the probability that a given adnexal mass is malignant may vary with age and/or menopausal status: the underlying incidence of various conditions that result in an adnexal mass, the frequency of contact with clinicians, the type and length of followup, and the prevalence of other conditions that may cause symptoms similar to those caused by ovarian malignancy or other symptomatic pelvic pathology. Race/ethnicity may also play a role, both in the relative likelihood of malignancy and the likelihood of other conditions and contact with clinicians.

  • A variety of conditions, both benign and malignant, can cause a mass in the adnexa. The underlying prevalence of each type of condition, along with the sensitivity and specificity of the initial diagnostic test, will determine the proportion of patients with a given test result who are truly disease-free, or who truly have disease. The evidence on the prevalence of these conditions is reviewed in Question 1.

  • Women can present with an adnexal mass in one of two ways - through presentation with symptoms and subsequent detection of a mass through a physical examination, or through detection of a mass in an asymptomatic woman during physical examination or an imaging study. The ultimate probability of malignancy may vary based on how an adnexal mass is initially detected, since the prevalence of malignancy at this stage will drive the positive and negative predictive values of all subsequent tests. Because many women will initially have their masses detected through a bimanual pelvic examination, we review the evidence on the sensitivity and specificity of this component of the physical examination in Question 2.

  • After the initial diagnosis of an adnexal mass, the choice of the next test will provide a revised estimate of the probability of a given disease. Although determining this probability is important in the symptomatic patient so that she may receive appropriate therapy, it is even more important in the asymptomatic patient, who runs the risk of undergoing unnecessary surgery for a benign condition if the test is falsely positive. Question 3 addresses the sensitivity and specificity of tests commonly used as “next step” diagnostic procedures.

  • Frequently, a combination of various test results and patient characteristics can provide better discrimination between diseased and non-diseased, or benign and malignant, than any single test parameter. Question 4 addresses the performance of various multivariate scoring systems in discriminating benign from malignant masses.

  • Because 100 percent sensitivity is difficult to achieve, some tests will be falsely negative. One strategy to minimize the consequences of a false negative test would be to monitor the patient with a specified test or tests, at a specified frequency, for a specified duration. Question 5 addresses the evidence for the effectiveness of such an approach, and which combination of test, test frequency, and duration of followup offers optimal performance.

  • The ultimate diagnosis of ovarian malignancy requires surgical exploration, either through laparoscopy or laparotomy. Although an adverse outcome of surgery is not desirable under any circumstances, patients who undergo surgery because of a symptomatic mass have the possibility of improvement in symptoms, while, for patients who ultimately prove to have an ovarian malignancy, surgical management with adequate staging and reduction in tumor bulk appears to improve outcomes. However, for patients with some asymptomatic benign masses, the benefits of surgery may be less clear, while the risks remain substantial. Question 6 addresses the risks of diagnostic surgery, both laparoscopy and laparotomy, for women with adnexal masses.

  • Finally, estimating the benefits, harms, and costs of various management strategies, including screening, for ovarian cancer is complex. Synthesizing the wide range of data and incorporating uncertainty, as well as missing data, can often be done using simulation models. Question 7 presents an initial attempt at summarizing the likely outcomes of several different diagnostic strategies. Because modeling the natural history of ovarian cancer will ultimately be important for comprehensive analyses of different screening and diagnostic strategies, we also review existing models for the natural history of ovarian cancer with special attention paid to underlying assumptions.

Literature Search and Review

Sources

The primary sources of literature were MEDLINE® (1966-September 2004) and the Cochrane Database of Systematic Reviews. Searches of these databases were supplemented by reviews of reference lists contained in all included articles and in relevant review articles and meta-analyses.

Search Strategies

The basic search strategy used the National Library of Medicine's Medical Subject Headings (MeSH) key word nomenclature developed for MEDLINE® and was adapted for use in the other databases. The searches were limited to the English language. The texts of the three major search strategies are given in Appendix A.* The searches yielded a total of 677 citations, whose records are maintained in a ProCite database.20

Abstract and Full-text Screening

Paired researchers from the Duke research team independently reviewed a set of abstracts and classified each as “include” or “exclude” according to study-specific criteria, which they developed. An abstract was included if at least one of the paired reviewers recommended that it be included. A total of 445 abstracts were included for the further “full-text review” stage. Interrater reliability for include/exclude decisions was tested by having 10 pairs of readers review 138 abstracts. Agreement was good to excellent (kappa 0.66 to 0.95).
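To illustrate how agreement between two reviewers can be summarized, the sketch below computes a kappa statistic from paired include/exclude decisions. It assumes the two-rater Cohen's kappa; the report does not state which variant was used, and the counts are hypothetical.

```python
def cohens_kappa(both_include: int, a_only: int, b_only: int, both_exclude: int) -> float:
    """Cohen's kappa for two reviewers making include/exclude decisions.

    a_only: reviewer A included, reviewer B excluded
    b_only: reviewer B included, reviewer A excluded
    Undefined (division by zero) if chance agreement equals 1.
    """
    n = both_include + a_only + b_only + both_exclude
    observed = (both_include + both_exclude) / n
    a_include_rate = (both_include + a_only) / n
    b_include_rate = (both_include + b_only) / n
    # Agreement expected by chance if the reviewers decided independently.
    expected = (a_include_rate * b_include_rate
                + (1 - a_include_rate) * (1 - b_include_rate))
    return (observed - expected) / (1 - expected)


# Hypothetical decisions on 20 abstracts.
counts = dict(both_include=8, a_only=1, b_only=1, both_exclude=10)

if __name__ == "__main__":
    print(f"kappa = {cohens_kappa(**counts):.2f}")
    # Under the inclusion rule described above, an abstract advances if
    # at least one reviewer recommends inclusion.
    advanced = counts["both_include"] + counts["a_only"] + counts["b_only"]
    print(f"abstracts advanced to full-text review: {advanced}")
```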

At the full-text review stage, the paired researchers independently reviewed a set of the articles, and indicated a decision to “include” or “exclude” the article for the data abstraction stage. When a pair of reviewers arrived at a different opinion about whether to include an article, they were asked to reconcile the difference. Detailed inclusion and exclusion screening criteria were developed by research question and are listed below.

Full-text Screening Criteria

Initially, the patient population was limited to peri- and postmenopausal women, and only articles that provided data specifically by age or menopausal status were included. After initial discussion with the expert panel, the search was expanded to include premenopausal women.

Question 1. Background clarifications were as follows:

  • (1) The search should be limited to (a) screening studies and (b) case series of women with an undiagnosed mass (not just women who went to laparoscopy/path diagnosis).

  • (2) Pathology list:

    a. Benign
      i. Uterine leiomyoma
      ii. Nonneoplastic cysts, such as:
        1. Follicular (functional) cysts
        2. Corpus luteal (functional) cysts
        3. Theca lutein cysts
        4. Simple cysts
        5. Peritoneal inclusion cysts
        6. Paraovarian cysts
        7. Hemorrhagic cysts
        8. Endometrial cyst
      iii. Polycystic ovary disease
      iv. Cystic teratoma (dermoid cyst)
      v. Hydrosalpinx
      vi. Cystadenoma
      vii. Fibroma

    b. Malignant ovarian neoplasms
      i. Adenocarcinoma
      ii. Others

    c. Tumors of low malignant potential

Screening criteria for Question 1 were:

  • (1) undiagnosed mass (regardless of whether symptomatic or asymptomatic; detected by palpation or ultrasound imaging);

  • (2) exclude if n < 50; if n ≥ 50, write n on decision sheets;

  • (3) histology diagnosis;

  • (4) screened women without mass (case series or cohort) or women with adnexal mass (case series).

Question 2. Screening criteria were as follows:

  • (1) comparison of bimanual pelvic examination to a reference standard;

  • (2) n ≥ 20;

  • (3) able to construct 2-by-2 table for test characteristics.

Question 3. Screening criteria were as follows:

  • (1) undiagnosed mass (regardless of whether symptomatic or asymptomatic; detected by palpation or ultrasound imaging) or screening population;

  • (2) disease status distinguishes malignant from non-malignant;

  • (3) must have 20 or more subjects;

  • (4) disease status must be verified by histology or negative surgery (laparoscopy/laparotomy);

  • (5) test is ultrasound, CT, MRI, PET, serum CA-125, or bimanual pelvic exam;

  • (6) able to construct 2-by-2 table for test characteristics.

Question 4. Screening criteria were as follows:

  • (1) patients with cancer;

  • (2) studies with scoring, risk score, combined modality approach;

  • (3) assesses predictive value of two or more variables (radiographic, patient characteristics or CA-125) using multivariable model;

  • (4) screening studies;

  • (5) n ≥ 50.

Question 5. Screening criteria were as follows:

  • (1) n ≥ 50;

  • (2) histology or followup interval = at least 9 months;

  • (3) outcome = continued negative test with no clinical evidence of developing ovarian cancer.

Question 6. Screening criteria were as follows:

  • (1) procedure = operative laparoscopy for adnexal mass, with or without biopsy;

  • (2) addresses complications of procedure (morbidity or mortality);

  • (3) n ≥ 100 for morbidity.

Question 7. Screening criterion was as follows: article described mathematical or computer model of natural history of ovarian cancer.

Table 2. Results of abstract screening and full-text review

Articles identified | 1,023
Abstracts reviewed | 1,023
  Included | 445
  Excluded | 578
Full-text articles reviewed | 445
  Included | 204
  Excluded | 269

The combined number of included (204) and excluded (269) articles exceeds the total 445 reviewed at the full-text level because 28 articles were considered excluded for one question, but included for another question.

Table 3. Included full-text articles by research question

Question | Number of articles
Question 1: Prevalence of tumor types | 20
Question 2: Bimanual pelvic examination | 14
Question 3: Single modality tests | 153
Question 4: Explicit scoring systems | 36
Question 5: Monitoring women with suspected benign masses | 9
Question 6: Surgical morbidity and mortality | 24
Question 7: Modeling diagnostic strategies | 4
Total number of included articles | 204

Some articles were included for more than one question.

Summaries of the results of the abstract screening and full-text review are provided in Tables 2 and 3. A list of excluded articles by reason for exclusion is found in Appendix B.*

Data Abstraction and Development of Evidence Tables

The Duke research team developed and piloted evidence table formats for abstracting data to answer each of the seven research questions (see Appendix C *). Based on clinical expertise, a pair of researchers was assigned to one of the seven research questions to abstract the data from the eligible articles. One of the paired researchers abstracted the data into the evidence tables, and the second researcher over-read the article and accompanying evidence table to check for accuracy and completeness. The completed evidence tables are provided in Appendix D.*

Quality Assessment Criteria

At the data abstraction stage, the researcher was asked to evaluate each included article for factors affecting internal and external validity. The quality assessment criteria varied by question and are listed below. Researchers were instructed to assign a + or - to each item, and provide a brief rationale for each decision.

Quality criteria were as follows:

Question 1: What is the prevalence of various tumor types among women with an adnexal mass, stratified by cancer status (malignant vs. benign), age, menopausal status, and size of tumor?

Question 2: What are the sensitivity, specificity, and reliability of the bimanual pelvic examination?

Question 3: Among women with a palpable adnexal mass on exam or a mass identified by ultrasound/imaging, what is the sensitivity/specificity of various evaluation modalities including ultrasound (transvaginal ultrasound, transabdominal ultrasound, color Doppler, 2D vs. 3D ultrasound), CT scan, MRI scan, and CA-125 levels for diagnosing malignant masses?

Question 4: What is the accuracy of explicit scoring systems which incorporate various combinations of imaging findings, patient risk factors, and/or CA-125 levels for detecting malignancy? Have these scoring systems been applied to a population of women before laparoscopy or laparotomy?

Question 5: Among women with suspected benign masses on initial investigation, what are the sensitivity and specificity of monitoring with periodic CA-125 and/or interval ultrasound examinations for detecting malignant masses? How does the interval of testing/definition of change affect sensitivity and predictive value?

Question 6: Among women with adnexal masses, what are the morbidity and mortality from diagnostic surgery (laparoscopy or laparotomy)? At what point does the risk of surgery outweigh the risk of detecting malignancy?

Question 7: What are the estimated trade-offs resulting from various strategies for evaluation of the adnexal mass?

Additional Analyses

Test Characteristics and Confidence Intervals

For test characteristics, a Microsoft Excel® spreadsheet was developed which calculated appropriate test characteristics (sensitivity, specificity, negative predictive value, positive predictive value) for individual studies if studies provided enough data to input (a) values for individual cells of a 2-by-2 table, (b) the prevalence of disease and values for sensitivity and specificity, or (c) sufficient data to solve for two equations involving sensitivity, specificity, or predictive values. Ninety-five percent confidence intervals were automatically estimated using the approximate formula for proportions:

$$p \pm 1.96 \sqrt{\frac{p(1 - p)}{N}}$$

where p = point estimate of proportion, N = total sample size.
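As a worked illustration, the sketch below assumes that the approximate formula referred to here is the usual normal-approximation (Wald) interval for a proportion. The function name, the clipping of the bounds to the range 0 to 1, and the example values are assumptions made for illustration, not details taken from the report's spreadsheet.

```python
import math


def wald_interval(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95 percent confidence interval for a proportion.

    p: point estimate of the proportion
    n: total sample size
    z: standard normal quantile (1.96 for a 95 percent interval)
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clipping to [0, 1] is an added convenience, not part of the formula.
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Hypothetical example: sensitivity of 0.90 estimated from 50 diseased subjects.
    low, high = wald_interval(0.90, 50)
    print(f"95% CI: {low:.3f} to {high:.3f}")
```

When p is 0 the half-width is 0 and the interval collapses to a single point, so proportions with a zero numerator need separate handling.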

Prevalence and Event Rates and Confidence Intervals

For Questions 3 and 6, prevalence of different mass types, and morbidity and mortality rates, were also calculated using the above formula. For studies where the numerator of a particular proportion was 0, the upper bound was estimated using the formula:

[equation graphic not reproduced], where p = 2/(N + 2).

Estimation of Pooled Sensitivity and Specificity

For Questions 2, 3, and 4, we used two complementary methods for assessing diagnostic test performance: (1) summary receiver operating characteristic (ROC) analysis; and (2) independently combined sensitivity and specificity values. We calculated pooled sensitivity and specificity estimates, along with 95 percent confidence intervals and summary ROC curves, using Meta-Stat 0.6, a shareware program for performing meta-analyses of diagnostic tests.22 In this software, logits of sensitivity and specificity values are pooled, using a random-effects model weighted by the inverse of the variance.23

We combined the sensitivity and specificity values of the tests across studies using a random-effects model to estimate the average values. A random-effects model incorporates both the within-study variation (sampling error) and between-study variation (true treatment-effect differences) into the overall treatment estimate. It gives a wider confidence interval than the fixed-effect model (which considers only within-study variability) when estimates are based on heterogeneous results.
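The pooling step can be sketched as follows. This is an illustrative reconstruction, not the Meta-Stat implementation: it assumes a DerSimonian-Laird estimate of the between-study variance and a 0.5 continuity correction on each study's counts, neither of which is specified in the report. The study counts are hypothetical.

```python
import math


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    Adds 0.5 to each cell (an assumed continuity correction) so that
    proportions of 0 or 1 remain finite.
    """
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1 / a + 1 / b


def pool_random_effects(studies: list[tuple[int, int]]) -> tuple[float, float, float]:
    """Pool proportions on the logit scale with inverse-variance weights.

    studies: (events, total) per study, e.g. (true positives, diseased) for
    sensitivity. Returns (estimate, lower 95% bound, upper 95% bound).
    """
    logits, variances = zip(*(logit_and_variance(e, n) for e, n in studies))
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    # DerSimonian-Laird between-study variance (assumed estimator).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def inv_logit(x: float) -> float:
        return 1 / (1 + math.exp(-x))

    return inv_logit(pooled), inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)


if __name__ == "__main__":
    # Hypothetical (true positives, diseased) counts from three studies.
    estimate, low, high = pool_random_effects([(45, 50), (30, 50), (80, 100)])
    print(f"pooled sensitivity {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

When the studies disagree more than sampling error would predict, the between-study variance is positive, the weights become more equal, and the interval widens relative to a fixed-effect analysis.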

When each is combined separately, sensitivity and specificity tend to underestimate the true test sensitivity and specificity; however, they can provide an indication of the approximate test operating point for most of the studies.

Summary ROC curves are a potentially useful graphical summary of the diagnostic test performance data. In brief, each study provides a pair of sensitivity and specificity values to the analysis. After logistic transformation of data, a linear model is fitted to the observed studies using regression analysis. This best-fit model can then be transformed back to ROC space and plotted as a curve. A summary ROC curve can be thought of as an ROC curve that describes joint changes in sensitivity and specificity with changes in cutoff values. The ideal position of an ROC curve is near the upper left corner. The area under the curve (AUC) is another summary measure of the degree of discrimination of a test.

The summary ROC method assumes that the variability in the reported sensitivity and specificity values from different studies is due to different cutoff values (explicit or implicit) being applied.24 However, the summary ROC curve can summarize studies whose variability may be due to other sources of variation, since the summary ROC curve no longer ties specific cutoff values to specific intervals of the curve. One can think of a summary ROC curve as an overall estimate of the discrimination ability of a test.

When there is little variability in the test results - i.e., when studies appear to be operating at similar thresholds and report similar results - summary ROC analysis provides little additional information. In this case, separately averaged sensitivity and specificity values across studies will give similarly useful summary information. However, where there is substantial variability in test results, the separately averaged sensitivity and specificity values tend to have wide confidence intervals and have means that do not characterize any of the studies. In this case, summary ROC curves provide a more suitable analysis framework.
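The summary ROC fit described above (logistic transformation, linear regression, back-transformation to ROC space) can be sketched as follows. The sketch assumes the commonly used parameterization in which D = logit(sensitivity) - logit(false positive rate) is regressed on S = logit(sensitivity) + logit(false positive rate) by unweighted least squares; the report does not specify the exact regression used, and all names and data points are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def fit_sroc(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit D = a + b * S by ordinary least squares.

    points: (sensitivity, specificity) pairs, each strictly between 0 and 1.
    Requires at least two studies with different values of S.
    """
    d = [logit(se) - logit(1 - sp) for se, sp in points]
    s = [logit(se) + logit(1 - sp) for se, sp in points]
    s_mean = sum(s) / len(s)
    d_mean = sum(d) / len(d)
    sxx = sum((x - s_mean) ** 2 for x in s)
    b = sum((x - s_mean) * (y - d_mean) for x, y in zip(s, d)) / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_sensitivity(a: float, b: float, false_pos_rate: float) -> float:
    """Back-transform the fitted line to ROC space (requires b != 1)."""
    return expit(a / (1 - b) + (1 + b) / (1 - b) * logit(false_pos_rate))


if __name__ == "__main__":
    # Hypothetical (sensitivity, specificity) pairs from four studies.
    studies = [(0.95, 0.60), (0.90, 0.75), (0.80, 0.85), (0.70, 0.93)]
    a, b = fit_sroc(studies)
    for fpr in (0.05, 0.10, 0.20, 0.40):
        print(f"FPR {fpr:.2f} -> summary sensitivity {sroc_sensitivity(a, b, fpr):.3f}")
```

Each study contributes one point, and the fitted curve traces the trade-off between sensitivity and false positive rate across the implied cutoffs rather than tying any study to a specific threshold.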

Estimates of National Rates of Surgery for Adnexal Mass

The Nationwide Inpatient Sample (NIS) is a public access database maintained by AHRQ. The NIS represents a stratified sample of approximately 20 percent of all discharges from U.S. hospitals; data for the year 2000 contain administrative discharge data from hospitals in 28 states, while 2001 contains data from 33 states.25 Weights are provided in order to allow estimation of national data based on this sample. We used data from 2000 and 2001 to provide supplemental data on the frequency of diagnostic laparoscopy and exploratory laparotomy for Question 6. Because previous work has shown that administrative data may lack sufficient clinical detail to compare outcomes,26 we did not attempt to directly compare complication rates between these procedures, or between diagnoses.
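As a simple illustration of how sampling weights produce national estimates, the sketch below sums the weight attached to each sampled discharge within a procedure group. The field names and values are hypothetical and do not reflect the actual NIS file layout.

```python
from collections import defaultdict


def national_estimates(sampled_discharges: list[dict]) -> dict[str, float]:
    """Estimate national counts by summing discharge weights per group."""
    totals: dict[str, float] = defaultdict(float)
    for record in sampled_discharges:
        totals[record["procedure_group"]] += record["discharge_weight"]
    return dict(totals)


# Hypothetical sampled records; a weight near 5 corresponds to a sample of
# roughly 20 percent of discharges.
sample = [
    {"procedure_group": "laparoscopy", "discharge_weight": 5.1},
    {"procedure_group": "laparoscopy", "discharge_weight": 4.9},
    {"procedure_group": "laparotomy", "discharge_weight": 5.0},
]

if __name__ == "__main__":
    for group, estimate in national_estimates(sample).items():
        print(f"{group}: about {estimate:.0f} discharges nationally")
```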

The search was limited to women 15 years and older who had one of the following International Classification of Diseases, Ninth Revision (ICD-9) diagnostic codes: 183.x (malignant neoplasm of the ovary and other uterine adnexa); 220.x (benign neoplasms of the ovary); 620.x (ovarian cysts); 752.11 (para-ovarian cysts); 614.0, 614.1, 614.2, 614.6 (adnexal masses secondary to pelvic inflammatory disease); 789.33, 789.34, 789.39 (abdominal masses arising in the left or right lower quadrant, or other nonspecified site); and V655 (normal findings after diagnostic evaluation).

In order to avoid overestimation of complication rates due to other procedures, we then excluded patients who had an ICD-9 procedure code for hysterectomy (68.x). Procedures were then classified as laparoscopy only (54.21), laparoscopy with conservative ovarian surgery (65.0x, 65.2x), laparoscopy with oophorectomy (65.3x, 65.4x, 65.5x, 65.6x), or laparotomy (54.11) alone, with conservative ovarian surgery (same codes), or with oophorectomy (same codes).

A discharge status of “Dead” indicated in-hospital mortality. Complications of surgery or hospitalization were indicated by diagnosis codes of E870 through E876.
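The weighting and exclusion logic described in this section can be sketched as below. This is a simplified illustration, not the report's actual program: the record layout, field names, and weights are hypothetical, only the surgical approach (54.21 versus 54.11) is classified, and the rule that laparotomy takes precedence when both codes are present is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Discharge:
    procedure_codes: List[str]  # ICD-9 procedure codes, e.g. "54.21"
    weight: float               # hypothetical discharge weight
    died: bool = False          # discharge status of "Dead"

def surgical_approach(d: Discharge) -> Optional[str]:
    """Return 'laparoscopy', 'laparotomy', or None if excluded or unclassified."""
    codes = d.procedure_codes
    if any(c.startswith("68.") for c in codes):
        return None  # hysterectomy: excluded from the analysis
    if "54.11" in codes:
        return "laparotomy"  # assumed to take precedence if both codes appear
    if "54.21" in codes:
        return "laparoscopy"
    return None

def weighted_totals(discharges):
    """Sum discharge weights to estimate national counts by approach."""
    totals = {}
    for d in discharges:
        approach = surgical_approach(d)
        if approach is None:
            continue
        row = totals.setdefault(approach, {"discharges": 0.0, "deaths": 0.0})
        row["discharges"] += d.weight
        if d.died:
            row["deaths"] += d.weight
    return totals

# Hypothetical sample records, for illustration only.
sample = [
    Discharge(["54.21"], 5.0),
    Discharge(["54.11", "65.41"], 4.5, died=True),
    Discharge(["54.11", "68.4"], 5.0),   # excluded: hysterectomy
    Discharge(["54.21", "65.25"], 5.5),
]
estimates = weighted_totals(sample)
```

Each sampled discharge stands in for roughly `weight` discharges nationally, so summing weights rather than counting records gives the national estimate.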

Model of Natural History of Ovarian Cancer

We used a Markov state-transition model to explore the impact of alternate assumptions about the natural history of ovarian cancer. The original model was developed as a graduate school project by Karen Hoffman, MD, and further refined in collaboration with two of the authors of this report (Drs. Kulasingam and Myers).


   Figure 3. Schematic of natural history model

Table 4

Model variables
| Variable description | Model abbreviation of variable | Value | Range varied |
|---|---|---|---|
| Probability of clinical diagnosis for each stage (I, II, III, or IV) if no screening test or if screening produces a false negative | pClinDxStageI | 0.261 | Calibrated |
| | pClinDxStageII | 0.446 | |
| | pClinDxStageIII | 0.837 | |
| | pClinDxStageIV | 0.950 | |
| Probability of dying from diagnostic exploratory laparotomy | pLapDeath | 0.00023 | 0.00 to 0.0010 |
| Probability of dying from each stage of cancer, based on 5-year survival rates | pDieStageI | 0.051 | Not varied |
| | pDieStageII | 0.187 | Not varied |
| | pDieStageIII | 0.691 | Not varied |
| | pDieStageIV | 0.691 | Not varied |
| Probability of developing Stage I cancer, based on ovarian cancer incidence rates | tCompInc | Varies with age | |
| Probability of dying from a cause other than ovarian cancer | tMortCaAdj | Varies with age | |

The model simulates a cohort beginning at age 40 distributed across cancer stages. Subjects progress from no cancer through the stages of ovarian cancer to death. Each cycle is 12 months long. The original model design is illustrated in Figure 3; subsequent modifications include removal from the at-risk population by undergoing oophorectomy for another cause, and allowing some Stage I cancers the possibility of progressing directly to Stage III. Model variables and the ranges over which they were varied are outlined in Table 4.

Probability of progressing from no cancer to Stage I cancer varies by age and is based on age-adjusted ovarian cancer incidence rates. Because the probability of progression (or duration of time within a stage) is unknown, probability of progression from Stage I to II, from Stage II to III, and from Stage III to Stage IV was adjusted to reflect incidence distribution across stages. Within the model, subjects may die from causes other than ovarian cancer. The probability of dying from a cause other than ovarian cancer varies by age and was constructed from CDC National Vital Statistics reports and Surveillance, Epidemiology, and End Results (SEER) data.27, 28

Probability of clinical diagnosis is based on the annual report of the International Federation of Gynecology and Obstetrics (FIGO).29 Five-year survival rates gathered by SEER 1992-98 were used to predict probability of dying from ovarian cancer.27 SEER localized disease corresponds to Stage I cancer, regional disease corresponds to Stage II cancer, and distant disease corresponds to Stage III/IV ovarian cancer. The model was constructed in DATA 4.0.30
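A heavily simplified sketch of this kind of annual-cycle Markov cohort model is shown below. It is not the report's model: it omits clinical diagnosis, oophorectomy for other causes, and direct progression from Stage I to Stage III; it starts the whole cohort in the no-cancer state; and it uses constant rather than age-specific probabilities. The incidence, progression, and other-cause mortality values are hypothetical placeholders. Reading the Table 4 stage-specific values as 5-year cumulative probabilities and converting them to annual probabilities is also an assumption.

```python
def annual_from_five_year(p5):
    """Convert a 5-year cumulative probability to a constant annual probability."""
    return 1.0 - (1.0 - p5) ** (1.0 / 5.0)

# Table 4 values, read here as 5-year cumulative probabilities (an assumption).
FIVE_YEAR_DEATH = {"StageI": 0.051, "StageII": 0.187,
                   "StageIII": 0.691, "StageIV": 0.691}
ORDER = ["NoCancer", "StageI", "StageII", "StageIII", "StageIV"]

def run_cycle(cohort, p_incidence, p_progress, p_die_other):
    """Advance the cohort by one 12-month cycle."""
    new = {state: 0.0 for state in ORDER}
    new["Dead"] = cohort["Dead"]
    for i, state in enumerate(ORDER):
        alive = cohort[state]
        p_cancer = annual_from_five_year(FIVE_YEAR_DEATH.get(state, 0.0))
        # Death from other causes first, then cancer death among survivors.
        dying = alive * (p_die_other + (1.0 - p_die_other) * p_cancer)
        survivors = alive - dying
        if state == "NoCancer":
            p_move = p_incidence
        elif state == "StageIV":
            p_move = 0.0  # no further stage to progress to
        else:
            p_move = p_progress
        moving = survivors * p_move
        new["Dead"] += dying
        new[state] += survivors - moving
        if moving > 0.0:
            new[ORDER[i + 1]] += moving
    return new

def simulate(years, p_incidence, p_progress, p_die_other):
    """Run the model from a cohort that starts entirely cancer-free."""
    cohort = {state: 0.0 for state in ORDER}
    cohort["NoCancer"] = 1.0
    cohort["Dead"] = 0.0
    for _ in range(years):
        cohort = run_cycle(cohort, p_incidence, p_progress, p_die_other)
    return cohort

# Hypothetical parameter values, for illustration only.
result = simulate(10, p_incidence=0.0005, p_progress=0.3, p_die_other=0.003)
```

Because each cycle redistributes the previous cycle's cohort without creating or removing subjects, the state proportions always sum to 1, which is a useful check when modifying the transition structure.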

Abbreviations for probabilities are described in Table 4.

Peer Review Process

We employed internal and external quality-monitoring checks through every phase of the study to reduce bias, enhance consistency, and verify accuracy. Examples of internal monitoring procedures include: three progressively stricter screening opportunities for each article (abstract screening, full-text article review, data abstraction review); involvement of three individuals (two clinicians and a copy editor) in each data abstraction; and agreement of at least two clinicians on all included studies.

Our principal external quality-monitoring device was the peer-review process. Nominations for peer reviewers were solicited from several sources, including a technical expert panel and interested federal agencies. The list of nominees was forwarded to the Agency for Healthcare Research and Quality (AHRQ) for vetting and approval. A final list of peer reviewers is provided in Appendix E.*

Chapter 3. Results

Question 1: Prevalence of Tumor Types

Question 1 is: What is the prevalence of various tumor types among women with an adnexal mass, stratified by cancer status (malignant vs. benign), age, menopausal status, and size of tumor?

Approach

We included studies in the U.S. population with more than 50 women and limited the literature search to screening studies and case series where results were provided for all women with an undiagnosed mass, not just those with subsequent positive additional tests.21 Studies of adnexal mass in which the gold standard is applied only to those with positive test results would underestimate the prevalence of disease and cause a substantial bias.
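The direction of this bias can be shown with simple arithmetic. The sketch below uses hypothetical numbers (a cohort of 10,000 women, a true prevalence of 2 percent, and a test sensitivity of 45 percent) and assumes that only test-positive women receive the reference standard, so that cases missed by the test are never counted.

```python
def verified_case_counts(n_women, true_prevalence, sensitivity):
    """Return (true cases, cases found when only test positives are verified)."""
    true_cases = n_women * true_prevalence
    detected_cases = true_cases * sensitivity  # false negatives go unverified
    return true_cases, detected_cases

# Hypothetical inputs, for illustration only.
n_women = 10_000
true_cases, detected_cases = verified_case_counts(n_women, 0.02, 0.45)
apparent_prevalence = detected_cases / n_women
```

Under these assumptions the apparent prevalence equals the true prevalence multiplied by the sensitivity, so the less sensitive the initial test, the greater the underestimate.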

Results

Twenty articles met the inclusion criteria and are described in Evidence Table 1 (Appendix D *).31–50

Table 5

Prevalence of tumor types in screening studies*
| Study | N | % Menopausal | Malignant | Borderline | Benign |
|---|---|---|---|---|---|
| DePriest et al., 1993 [36] | 3,220 | 100; most had positive family history of breast, ovarian, or colorectal cancer | 0.09% | Not reported | 1.3% |
| DePriest et al., 1997 [34] | 6,470 | Either menopausal or had positive family history of breast (30%), ovarian (24%), or colorectal cancer (15%) | 0.11% | Not reported | 1.2% |
| Modesitt et al., 2003 [40] | 15,106 | 100 | 0.18% | Not reported | 0.8% |
| Van Nagell et al., 2000 [49] | 14,469 | Either menopausal or had positive family history of breast (34%), ovarian (23%), or colorectal cancer (23%) | 0.1% | 0.02% | 1.1% |

* Note: All four publications represent the same screening study at different times.

Detailed prevalences for specific tumor types are provided in Evidence Table 1. The included studies can be divided into two groups. The first group includes four reports from a large screening study in Kentucky (Table 5). The prevalence of malignancy ranged from 0.09 to 0.18 percent. In postmenopausal women, the prevalence of malignancy was 0.09 to 0.18 percent, borderline tumors were not reported, and the prevalence of benign tumors was 0.8 to 1.3 percent. In a population that included either postmenopausal women or those with a family history of breast, ovarian, or colorectal cancer, the prevalence of malignancy was 0.10 to 0.11 percent, of borderline tumors 0.02 percent, and of benign tumors 1.1 to 1.2 percent.

The most common malignant tumor types were primary ovarian carcinomas, such as serous and mucinous cystadenocarcinoma, granulosa cell tumors, and undifferentiated adenocarcinoma. Borderline tumors, such as serous tumors of low malignant potential (0.02 percent), were less common. The most common benign tumors were serous cystadenoma (0.4 to 0.7 percent), paratubal cyst (0.1 to 0.16 percent), endometrioma (0.03 to 0.3 percent), and mature teratoma (0.02 to 0.08 percent).

Table 6

Case series and retrospective medical record reviews
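| Study | Denominator | Location | Age, menopausal status, race | Malignant | Borderline | Benign |
|---|---|---|---|---|---|---|
| Childers et al., 1996 [32] | 138 | AZ | 52 | 13.8% | Not reported | 86.2% |
| Dottino et al., 1999 [37] | 160 | NY | 52.2; 53% post; 91% white | 8.1% | 5% | 86.9% |
| Fleischer et al., 1996 [38] | 62 | TN | 50; >50% post | 50% | Not reported | 50% |
| Lin et al., 1993 [39] | 80 | NY | 56; 76% post; 90% white | 57.5% | 2.5% | 40% |
| Parker et al., 1994 [41] | 61 | Multi-site | 65; 100% post | None | None | 100% |
| Roman et al., 1997 [42] | 226 | CA | 20% post | 11.5% | 7.5% | 81% |
| Schneider et al., 1993 [43] | 55 | AZ | 53; 60% post | 25.5% | 3.6% | 70.9% |
| Scoutt et al., 1994 [44] | 109 | CT | 40 | 20.2% | Not reported | 79.8% |
| Shen-Gunther et al., 2002 [45] | 125 | OK/NV | 58; 82% white | 44.8% | 9.6% | 45.6% |
| Smikle et al., 1995 [46] | 195 | TX | 40% post | 13.3% | Not reported | 86.7% |
| *Chalas et al., 1992 [31] | 241 | NY | Not reported | 50.2% | 7.5% | 42.3% |
| Cohen et al., 2001 [33] | 71 | IL | 22–80; 44% post | 18.3% | 1.4% | 80.3% |
| DePriest et al., 1993 [35] | 121 | KY | 3–74; 49% post | 10.7% | Not reported | 89.3% |
| Troiano, 1997 [47] | 144 | CT | 45; 29% post | 11.8% | 2.1% | 86.1% |
| Twickler et al., 1999 [48] | 244 | TX | 38.6 | 5.7% | 6.6% | 87.7% |
| Vasilev et al., 1988 [50] | 182 | CA | Not reported | 8.2% | 1.6% | 90.1% |

* Retrospective chart review

Abbreviation: post = postmenopausal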

The majority of U.S. studies with histological diagnosis of all masses (n = 16) were case series of women with undiagnosed adnexal mass undergoing laparotomy (Table 6). The prevalence of malignancy ranged from 5.7 to 57.5 percent, the range of borderline tumors was 1.4 to 11.2 percent, and the prevalence of benign tumors was 40 to 100 percent. All tumor types were over-represented because patients had an undiagnosed adnexal mass, and the clinical presentation was not well described in the majority of studies. Most studies included both premenopausal women and postmenopausal women and did not provide results separately. The one study that included only postmenopausal women41 found only benign tumors.

Discussion

Estimating the age-specific prevalence of specific adnexal tumor types from the available literature is difficult. The best data come from a series of reports from a large screening study; overall prevalence of masses was 1 to 2 percent, with benign masses outnumbering malignant by 4- to 10-fold. Because patients with negative screening test results did not undergo definitive diagnostic procedures in these studies, the prevalence estimates depend on the sensitivity of the screening tests used (and on the completeness of followup among test negatives). There is also a potential selection bias: premenopausal women enrolling in the screening study were at higher risk than average because of family history, and postmenopausal women may have been more likely to enroll because of concerns based on family history, vague symptoms, or other reasons, which would affect relative prevalence compared to the general population.

Estimates of prevalence in studies with 100 percent histologic diagnosis are inevitably biased by the clinical factors that determine which patients ultimately undergo surgery. These can include the presence and nature of symptoms (patients with symptoms referable to a mass would likely undergo surgery sooner than those with asymptomatic masses, all other things being equal); other findings (for example, the presence of ascites); patient anxiety; the diagnostic algorithms used (for example, the duration of followup for persistence); and the nature of the practice (malignancies will be more frequent in a gynecologic oncology practice compared to a general gynecology practice).

As mentioned previously, we did not include studies from outside the United States. Given differences in ethnic backgrounds (affecting genetic risks), observed differences in cancer incidence, and differences in clinical practice between countries, and the almost universal failure of studies to describe the clinical history leading to the diagnosis of adnexal mass, inclusion of these studies would not have allowed a more precise estimate of prevalence of different types of adnexal masses in the U.S. population.

Summary

In four reports from a large U.S. screening study, the prevalence of adnexal masses detected by ultrasound among postmenopausal women was 0.8 to 1.3 percent, and the prevalence of malignancy 0.09 to 0.18 percent (i.e., 9 to 18 per 10,000). Prevalence of different pathologies varies widely among case series. There are no data on the relative prevalence of different pathologies among women with asymptomatic masses compared to women with symptomatic masses.

Question 2: Bimanual Pelvic Examination

Question 2 is: What are the sensitivity, specificity, and reliability of the bimanual pelvic examination?

Approach

Articles were sought which evaluated the ability of the bimanual examination to detect adnexal masses, and/or to discriminate benign from malignant masses. Preference was given to studies where there was histological confirmation of the diagnosis, but an alternative reference standard (such as followup) was allowed for screening studies. Data allowing calculation of sensitivity and specificity had to be provided.

Our rationale for including the pelvic examination was based on its role in the initial evaluation of adnexal masses. Some asymptomatic women will have a mass detected as part of a “routine” physical examination; others will have a mass detected as part of an examination performed because of symptoms. The postexamination probability of malignancy is a function of the prevalence of cancer and the sensitivity and specificity of the bimanual examination; these probabilities, in turn, will affect the positive and negative predictive values of additional tests such as cancer antigen 125 (CA-125) and imaging studies. Because the pelvic examination will be the first test performed, either as a screening test or as a diagnostic test, knowledge of its test characteristics is important for evaluating subsequent diagnostic tests.
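The dependence of post-examination probability on prevalence, sensitivity, and specificity follows from Bayes' theorem and can be sketched as below. The function name is hypothetical. The sensitivity and specificity are the pooled estimates reported later in this section for discriminating benign from malignant masses (0.72 and 0.92); the two prevalence values are illustrative choices for a screening setting and a preoperative setting, not figures from a specific study.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1.0 - sensitivity)
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    true_neg = (1.0 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Illustrative prevalences: 0.1 percent (screening) and 30 percent (preoperative).
ppv_screening, npv_screening = predictive_values(0.001, 0.72, 0.92)
ppv_preop, npv_preop = predictive_values(0.30, 0.72, 0.92)
```

With these inputs, fewer than 1 in 100 abnormal examinations in the screening setting would correspond to a malignancy, compared with roughly 8 in 10 in the preoperative setting, even though the test characteristics are held constant.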

Results of Literature Search and Screening

We identified 14 studies that met the inclusion criteria.42, 51–63 Nine studies provided data on discrimination between benign and malignant masses,42, 51, 53–57, 62, 63 four on the ability of the bimanual examination to detect any adnexal mass,52, 59–61 and one provided data on both discrimination between benign and malignant and ability to detect masses.58 All 14 studies are summarized in Evidence Table 2 (Appendix D *).

Study Characteristics

Types of data incorporated. Two of the studies54, 56 included history or clinical impression as part of the “test;” results were not provided separately for examination alone.

Types of study population. Ten of the 14 studies were performed prior to surgery for an adnexal mass, while four were from screening studies.51, 52, 57, 58

Reporting of study populations. Of the screening studies, Andolf et al.52 was performed in women over 40 considered at high risk of ovarian cancer because of symptoms or risk factors; Grover and Quinn57 was performed in asymptomatic volunteers 25 and older, but described menopausal status; Adonakis et al.51 was performed in women over 45; and Jacobs et al.58 was done entirely in a postmenopausal population.

Seven of the 11 preoperative studies reported menopausal status, but only two reported on test characteristics specifically by menopausal status.55, 56 None reported race/ethnicity, and none reported the clinical route by which patients had come to surgery (detection of an asymptomatic mass, symptoms, etc.).

Methodology. The methodological quality of the included studies was as follows:

Reference standard. Of the preoperative studies, all but one42 had operative confirmation of findings. Ultrasound was used as the reference standard in the four screening studies, with 12-month followup examinations or questionnaires.

Verification bias. In the study by Roman et al.,42 26 women with non-palpable masses did not undergo definitive diagnosis.

Test reliability. Only one study60 provided direct data on test reliability. Grover and Quinn,57 Ong et al.,59 Schutter et al.,63 and Buckshee et al.54 used a single examiner. The other studies did not address the issue of test reliability.

Sample size. None of the reports had a priori sample size calculations.

Use of appropriate statistical tests. All reports used appropriate techniques for calculating test characteristics.

Blinding. Only two studies54, 60 explicitly stated whether examiners were blinded to prior history or other findings.

Definition of positive and negative test. Nine of 14 studies reported their definitions of a positive test, although the precision of the definitions was quite variable (from “a mass 5 cm or more in diameter” to “larger than normal”); others relied on “clinical impression.”

Results

Table 7

Sensitivity and specificity of pelvic examination in detecting the presence of an adnexal mass
| Study | N | Sensitivity (95% CI) | Specificity (95% CI) | % with confirmed mass | Notes |
|---|---|---|---|---|---|
| Jacobs et al., 1988 [58] | 1,010 | 84.6% (65.0 to 100%) | 98.3% (97.5 to 99.1%) | 1.3% (0.1% malignant) | Reference standard: ultrasound; screening study |
| Andolf et al., 1990 [52] | 801 | 33.7% (26.5 to 41.0%) | 92.0% (89.9 to 94.1%) | 20% (0.1% malignant) | Reference standard: ultrasound by midwife; screening in women considered at high risk for ovarian cancer; no ovarian cancers detected: 2 endometrial cancers, 1 LMP detected |
| Padilla et al., 2005 [61] | 252 | 15.6% (8.1 to 23.0%) | 93.8% (90.1 to 97.5%) | 35.7% (unclear if any malignancies) | Exam under anesthesia prior to surgery for pelvic mass; examiners blinded to radiology findings. Likelihood of not detecting an adnexal mass increased with less experience (OR for resident 1.13, student 1.36 compared to attending, although 95% CIs cross 1). Statistically significant increase in missed diagnosis if subject with BMI > 30 (OR 2.57; 95% CI, 1.36 to 4.87), and significant decrease in presence of enlarged uterus (OR 0.48; 95% CI, 0.25 to 0.93). Final diagnoses not presented, reasons for surgery not systematically presented |
| Padilla et al., 2000 [60] | 140 (82 masses) | Left adnexa (attending exam): 32.7% (19.5 to 45.8%); right adnexa (attending exam): 21.2% (7.3 to 35.2%) | Left adnexa (attending exam): 88.5% (81.4 to 95.6%); right adnexa (attending exam): 78.7% (70.4 to 87.0%) | 58% (0 malignancies) | Exam under anesthesia prior to surgery for pelvic mass; examiners blinded to radiology findings; no clear relationship to experience |
| Ong et al., 1996 [59] | 86 | 71.9% (60.9 to 82.9%) | 59.1% (38.5 to 78.6%) | 74.4% (0 malignant) | Pre-surgical exam |

Abbreviations: BMI = body mass index; CI = confidence interval; LMP = low malignant potential tumor; OR = odds ratio


   Figure 4. Performance of bimanual pelvic examination for detecting the presence of an adnexal mass

Key to Figure 4: 1 = Jacobs et al., 1988 [58]; 2 = Andolf et al., 1990 [52]; 3 = Padilla et al., 2005 [61]; 4 = Roman et al., 1997 [42]; 5 = Padilla et al., 2000 [60]; 6 = Ong et al., 1996 [59]

Table 7 and Figure 4 present the results of studies that evaluated the sensitivity of the bimanual examination for detecting an adnexal mass. The studies of Padilla et al.60, 61 are particularly striking for the low sensitivity, since the examinations were performed under anesthesia, when, presumably, patient discomfort would not be a limiting factor. Both studies suggested a relationship between experience and accuracy; medical students performed worse than residents, who performed worse than attending physicians. Although these differences were not statistically significant, the studies were underpowered to detect significant differences. Obesity, defined as a body mass index greater than 30, had a significant negative impact on sensitivity, while increasing uterine size increased sensitivity, possibly by elevating the adnexae out of the pelvis.
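Several of the confidence intervals in Table 7 are consistent with a normal-approximation (Wald) interval truncated at 0 and 100 percent, although the report does not state which method was used. The sketch below illustrates that calculation; the function name is hypothetical, and the count of 11 detected masses out of 13 is inferred from the reported 84.6 percent sensitivity and 1.3 percent mass prevalence among 1,010 women in Jacobs et al., not taken directly from the source.

```python
import math

def wald_ci(successes, n, z=1.959964):
    """Normal-approximation 95% CI for a proportion, truncated to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Inferred counts (an assumption): 11 of 13 confirmed masses detected on exam.
sens, lower, upper = wald_ci(11, 13)
```

With so few confirmed masses, the untruncated upper limit exceeds 100 percent, which illustrates how imprecise sensitivity estimates from low-prevalence screening studies can be.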

When sensitivity and specificity were combined separately using a random-effects model, the pooled sensitivity was 0.45 (95% confidence interval [CI], 0.28 to 0.68), and the pooled specificity was 0.90 (0.80 to 0.96).

Table 8

Sensitivity and specificity of pelvic examination in discriminating benign from malignant adnexal masses
| Study | N | Sensitivity (95% CI) | Specificity (95% CI) | % Malignant | Notes |
|---|---|---|---|---|---|
| Adonakis et al., 1996 [51] | 2,000 | 66.7% (13.3 to 100%) | 97.2% (96.5 to 97.9%) | 0.15% | Screening study; threshold of “abnormal or ambiguous exam;” CA-125 used in conjunction to proceed to ultrasound |
| Grover et al., 1995 [57] | 2,623 | 0% (0 to 100%) | 98.5% (98.0 to 98.9%) | 0.05% | Screening study; ultrasound and clinical followup |
| Jacobs et al., 1988 [58] | 1,010 | 100% (0 to 100%) | 97.3% (96.3 to 98.3%) | 0.1% | Screening study; followup with ultrasound |
| Roman et al., 1997 [42] | 200 | 51.2% (36.3 to 66.1%) | 83.6% (77.8 to 89.4%) | 21% | Results for 26 patients with non-palpable masses not included; no substantial difference based on menopausal status |
| Buckshee et al., 1998 [54] | 34 | 77.8% (50.6 to 100%) | 88.9% (77.0 to 100%) | 25% | One examiner; non-consecutive patients prior to surgery |
| Balbi et al., 2001 [53] | 72 | 90% (77.5 to 100%) | 74% (61.8 to 86.2%) | 31% | 18 patients with “clearly benign masses” and 2 with “clearly malignant” excluded; clinical impression |
| Finkler et al., 1988 [56] | 106 | 43.2% (27.3 to 59.2%); premenopausal: 16.7% (0 to 33.9%); postmenopausal: 68.4% (47.5 to 89.3%) | 90.8% (83.7 to 97.8%); premenopausal: 92.3% (85.1 to 99.6%); postmenopausal: 84.6% (65.0 to 100%) | 36%; premenopausal: 26%; postmenopausal: 59% | “Clinical impression” included exam plus history; results not calculated for exam alone |
| Schutter et al., 1998 [63] | 155 | 91.5% (84.4 to 98.6%) | 73.9% (64.9 to 82.9%) | 39% | All postmenopausal; high prevalence of cancer; single examiner; inclusion/exclusion criteria not described |
| Schutter et al., 1994 [62] | 222 | 92.6% (87.4 to 97.9%) | 63.0% (54.6 to 71.4%) | 43% | Preoperative patients |
| Dowd et al., 1993 [55] | 225 | 51.0% (41.7 to 60.3%); premenopausal: 31%; postmenopausal: 59% | 87.0% (80.8 to 93.2%); premenopausal: 95%; postmenopausal: 75% | 49% | Preoperative patients |

Abbreviations: CA-125 = cancer antigen 125; CI = confidence interval


   Figure 5. Performance of bimanual pelvic exam for distinguishing benign from malignant adnexal masses

Key to Figure 5: 1 = Grover and Quinn, 1995 [57]; 2 = Adonakis et al., 1996 [51]; 3 = Jacobs et al., 1988 [58]; 4 = Dowd et al., 1993 [55]; 5 = Schutter et al., 1994 [62]; 6 = Finkler et al., 1988 [56]; 7 = Balbi et al., 2001 [53]; 8 = Buckshee et al., 1998 [54]

Table 8 and Figure 5 show the test characteristics for discriminating benign from malignant masses. Using a random-effects model, pooled sensitivity was 0.72 (95% CI, 0.49 to 0.88) and specificity was 0.92 (0.80 to 0.97). When only the three screening studies were included, pooled sensitivity was 0.58 (95% CI, 0.21 to 0.88), pooled specificity 0.98 (0.97 to 0.98).
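One common way to obtain random-effects pooled estimates such as these is to pool logit-transformed proportions with the DerSimonian-Laird estimator of between-study variance. The sketch below illustrates that approach; it is an assumption that this resembles the method used in the report, which does not specify the estimator. The function name is hypothetical, the example counts are hypothetical, and a continuity correction of 0.5 is applied to every study so that proportions of 0 or 100 percent remain finite on the logit scale.

```python
import math

def pool_logit_random_effects(events_totals):
    """Pool proportions from two or more studies (DerSimonian-Laird, logit scale).

    events_totals: list of (events, total) per study, e.g. true positives
    and number of diseased subjects when pooling sensitivity.
    Returns (pooled proportion, lower 95% limit, upper 95% limit).
    """
    y, v = [], []
    for events, total in events_totals:
        e_c, n_c = events + 0.5, total + 1.0  # continuity correction
        p = e_c / n_c
        y.append(math.log(p / (1.0 - p)))
        v.append(1.0 / e_c + 1.0 / (n_c - e_c))  # variance of the logit
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)

# Hypothetical (true positives, diseased) counts from three studies.
pooled, lower, upper = pool_logit_random_effects([(11, 13), (27, 80), (14, 90)])
```

When studies disagree as much as in this hypothetical input, the between-study variance widens the interval well beyond what a fixed-effect analysis would give, which mirrors the wide intervals reported for pooled sensitivity.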

For both types of studies, there appears to be a trend towards decreased specificity as prevalence increases, although the number of studies is small and the confidence intervals are wide. The extreme differences in sensitivity in the two largest studies (0 and 100 percent) prevent even a qualitative assessment of any relationship between prevalence and sensitivity.

The two studies that stratified results by menopausal status55, 56 found lower sensitivity and higher specificity for discriminating benign from malignant masses in premenopausal women compared to postmenopausal women (Table 8).

Discussion

Despite the common recommendation for routine pelvic examination, we found surprisingly little literature on its accuracy. Based on the literature we did identify, its sensitivity for detecting adnexal masses appears fairly low. Sensitivity for detecting normal adnexa is also low, as demonstrated in a recent study of examinations under anesthesia.64 Although sensitivity for distinguishing a malignant mass from a benign one is somewhat better, these results need to be interpreted with caution, since most of the studies were done in preoperative patients, who would already have a higher probability of having a malignancy. In the four large screening studies, there was a total of only five malignancies, with the bimanual examination detecting 0 percent, 66 percent, and 100 percent in the three individual studies where ovarian cancer was detected; the fourth had one case of a low malignant potential tumor and two endometrial cancers. Pooled sensitivity for the three screening studies that addressed discrimination between benign and malignant masses was considerably lower than for all studies combined (and was similar to the pooled sensitivity of the studies that examined the ability to detect any adnexal mass).

Both types of studies show a trend toward decreased specificity as the prevalence of abnormality increases - this may reflect a greater degree of suspicion on the part of the examiner, based on other findings, and a greater likelihood of calling an examination abnormal. This is supported by the finding of the two studies which stratified results by menopausal status, which found higher sensitivity and lower specificity in postmenopausal women compared to premenopausal women.55, 56 Because examiners were unblinded, and were likely aware of the higher prevalence of malignancy among postmenopausal women, they may have been more likely to assign a diagnosis of malignancy among those patients. Future studies need to pay stricter attention to blinding examiners to other information. In theory, this bias should also result in higher sensitivity as prevalence increases, although, because of the small number of studies, the small numbers of subjects in most studies, and the diametrically opposed findings of the two largest studies, we were unable to recognize any relationship.

In the two studies that addressed the effect of experience on test characteristics,60, 61 there appeared to be a relationship between increasing experience and increased sensitivity (specificity did not change); however, even attending physicians achieved a sensitivity of only 28 percent. Based upon the available literature, the bimanual examination does not appear to be a sensitive test for detecting the presence of adnexal masses and appears to have limited ability to discriminate benign from malignant masses. Although specificity was somewhat better, positive predictive values will still be quite low in low prevalence settings, as discussed under Question 7. This will, in turn, lower the positive predictive value of diagnostic tests performed in patients referred on the basis of a pelvic examination. These tests are discussed in detail in the next section.

Question 3: Single Modality Tests

Question 3 is: Among women with a palpable adnexal mass on exam or a mass identified by ultrasound/imaging, what is the sensitivity/specificity of various evaluation modalities including ultrasound (transvaginal ultrasound [TVUS], transabdominal ultrasound, color Doppler, two-dimensional [2D] versus three-dimensional [3D] ultrasound), computed tomography (CT) scan, magnetic resonance imaging (MRI) scan, and CA-125 levels for distinguishing benign from malignant masses?

Approach

This section considers the various evaluation modalities that are described in the literature and would be available to a clinician to aid in the work-up of an adnexal mass after it has been diagnosed. We focused our search on articles whose primary reference standard was histopathology. Ideally this reference standard would be applied to all test negatives. However, we accepted a repeat negative test (such as imaging) conducted at least 6 months later as an acceptable alternative. We did include some studies that were from population-based screening samples, and these will be considered in a separate section below. The evaluation modalities investigated can be divided into several general categories. Imaging studies will be divided by technological mode (ultrasound, MRI, etc.). Ultrasound studies will be divided into those that evaluate adnexal morphology (either by an explicit scoring system or by descriptive standards), those that measure vascular flow in the mass (Doppler), and those that evaluate these modalities in combination. Serum studies will focus primarily on CA-125, as this is the most common marker in both the literature and in clinical practice. However, other serum markers will be discussed as well. Finally, the studies for which it was possible to stratify by menopausal status will be discussed where appropriate.

Results of Literature Search and Screening

Two hundred and five articles were identified for abstraction. Of these, 153 met the inclusion criteria and were abstracted into Evidence Table 2 (Appendix D *).31, 33–36, 39, 42–44, 46, 47, 49–56, 58, 62, 63, 65–195

Ultrasound Morphology

Conventional grey scale ultrasonography is the most common imaging modality used to differentiate benign from malignant adnexal masses. Especially with the advent of high-frequency transvaginal probes, the quality of the images allows description of the gross anatomic features of the lesion. This is, however, limited by the great variability of macroscopic characteristics of both benign and malignant masses. Furthermore, the technique is operator dependent. To overcome these limitations, morphologic scoring systems have been developed. Such scoring systems are based on specific ultrasound parameters each with several scores according to determined features and with a cutoff value to categorize masses as either malignant or benign.

Table 9

Detailed description of ultrasound scoring systems
Scoring system: Sassone et al., 1991 [159]

| Morphology | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 |
|---|---|---|---|---|---|
| Inner wall structure | Smooth | Irregularities ≤ 3 mm | Papillarities > 3 mm | Not applicable, mostly solid | - |
| Wall thickness (mm) | Thin (≤ 3) | Thick (> 3) | Not applicable, mostly solid | - | - |
| Septa (mm) | None | Thin (≤ 3) | Thick (> 3) | - | |
| Echogenicity | Sonolucent | Low echogenicity | Low echogenicity with echogenic core; mixed echogenicity | - | High echogenicity |

Scoring system: DePriest et al., 1993 [36]

| Morphology | Score 0 | Score 1 | Score 2 | Score 3 | Score 4 |
|---|---|---|---|---|---|
| Cystic wall structure | Smooth (< 3 mm thick) | Smooth (> 3 mm thick) | Papillary projection (< 3 mm) | Papillary projection (≥ 3 mm) | Predominately solid |
| Volume (cm3) | < 10 | 10–50 | > 50–200 | > 200–500 | > 500 |
| Septum structure | No septa | Thin septa (< 3 mm) | Thick septa (3 mm to 1 cm) | Solid area (≥ 1 cm) | Predominately solid |

Scoring system: Ferrazzi et al., 1997 [93]

| Morphology | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 |
|---|---|---|---|---|---|
| Wall | ≤ 3 mm | > 3 mm | - | Irregular, mostly solid | Irregular, not applicable |
| Septa | None | ≤ 3 mm | > 3 mm | | |
| Vegetations | None | - | - | ≤ 3 mm | > 3 mm |
| Echogenicity | Sonolucent | Low echogenicity | - | With echogenic areas | With heterogeneous echogenic areas, solid |

Scoring system: Lerner et al., 1994 [131]

| Morphology | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Wall structure | Smooth or small irregularities < 3 mm | - | Solid or not applicable | Papillarities ≥ 3 mm |
| Shadowing | Yes | No | - | - |
| Septa | None or thin (< 3 mm) | Thick (≥ 3 mm) | - | - |
| Echogenicity | Sonolucent or low-level echo or echogenic core | - | - | Mixed or high |

Table 9 describes the details of the most commonly used scoring systems. Briefly, the following scores are suggestive of malignancy: Sassone159 greater than 9, DePriest36 greater than or equal to 5, Ferrazzi93 greater than 9, and Lerner131 greater than or equal to 3. Although all of the scoring systems were developed to improve the reproducibility of morphological measurements, only the scoring system by Lerner et al. based the categories on a multivariate logistic analysis.
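The way such a scoring system turns ultrasound findings into a classification can be sketched as below. The cutoffs are those stated in the text; summing the component scores into a total is the usual way these systems are applied but is an assumption here, since the report does not spell it out. The function and variable names are hypothetical, and the two example masses are invented for illustration.

```python
# Cutoffs suggestive of malignancy, as stated in the text.
CUTOFFS = {
    "Sassone": lambda total: total > 9,
    "DePriest": lambda total: total >= 5,
    "Ferrazzi": lambda total: total > 9,
    "Lerner": lambda total: total >= 3,
}

def suggests_malignancy(system, component_scores):
    """Sum the component scores (an assumption) and apply the system's cutoff.

    component_scores: dict mapping each morphologic feature to its score.
    Returns (total score, True if the total is suggestive of malignancy).
    """
    total = sum(component_scores.values())
    return total, CUTOFFS[system](total)

# Hypothetical mass scored with the Sassone categories in Table 9.
sassone_mass = {
    "inner_wall_structure": 3,  # papillarities > 3 mm
    "wall_thickness": 2,        # thick (> 3 mm)
    "septa": 3,                 # thick (> 3 mm)
    "echogenicity": 2,          # low echogenicity
}
sassone_total, sassone_flag = suggests_malignancy("Sassone", sassone_mass)

# Hypothetical mass scored with the DePriest categories in Table 9.
depriest_mass = {
    "cystic_wall_structure": 0,  # smooth, < 3 mm thick
    "volume": 1,                 # 10 to 50 cm3
    "septum_structure": 0,       # no septa
}
depriest_total, depriest_flag = suggests_malignancy("DePriest", depriest_mass)
```

Because the systems use different score ranges and cutoffs, a total from one system cannot be compared directly with a total from another.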

Reproducibility of tests. Timmerman et al.196 evaluated the subjective assessment of ultrasonographic images for discriminating between malignant and benign masses. Three hundred consecutive patients were evaluated with TVUS by six different operators, and both diagnostic accuracy and interassessor agreement were calculated. The operators had varied experience in TVUS - from approximately 300 to 15,000 scans. The two most experienced operators agreed 92 percent of the time. The accuracy of the least experienced operators ranged from 82 to 87 percent (p = 0.0001). Overall, 65 percent of all the masses were correctly classified by all six operators. Interassessor agreement was greater between the most experienced operators as well (kappa = 0.852). When comparing experienced with less experienced operators, the kappa ranged from 0.581 to 0.737. This is similar to the kappa reported by Yamashita et al.192 among five operators, 0.62 (± 0.02) with TVUS. Interassessor agreement was not calculated between the less experienced operators. None of the included articles described operator experience, and only a few addressed interobserver variability. Although operator experience appears to correlate with accuracy, the specialty training of the ultrasonographer does not. In a meta-analysis of both morphologic and color Doppler tests in the evaluation of adnexal masses, Kinkel et al.197 found no difference between radiologists and gynecologists in the performance of ultrasound.
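For readers unfamiliar with the kappa statistic, the sketch below shows how agreement between two operators can be computed with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. It is an assumption that the cited studies used this pairwise form. The function name is hypothetical and the ratings are invented for illustration ("M" for malignant, "B" for benign).

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical classifications of the same 10 masses by two operators.
operator_1 = ["M", "M", "M", "B", "B", "B", "B", "B", "B", "B"]
operator_2 = ["M", "M", "B", "B", "B", "B", "B", "B", "B", "M"]
kappa = cohens_kappa(operator_1, operator_2)
```

In this example the operators agree on 8 of 10 masses, but because both call most masses benign, much of that agreement is expected by chance and the kappa is only about 0.52.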

TVUS versus abdominal ultrasound. Of the 122 articles that evaluated adnexal masses via ultrasound (through either ultrasound morphology or Doppler measurements), only five articles exclusively used transabdominal imaging.52, 58, 116, 133, 198 Fifty-nine articles used TVUS exclusively and 51 used a combination of TVUS and abdominal ultrasound. There were seven articles for which the ultrasound modality was unknown. In the majority of the articles that used a combination of TVUS and abdominal ultrasound, TVUS was the “method of choice.” The most common reasons cited for also including abdominal ultrasound were patient refusal of transvaginal scans, virginity, poor image quality, and very large masses. Although a few articles reported how many women had which type of ultrasound, none of the articles reported their results in a way that permitted stratification by TVUS or abdominal ultrasound. We therefore elected to group all ultrasound studies together regardless of TVUS or abdominal imaging.

Table 10

Sensitivity and specificity of ultrasound morphology
Scoring system | Pooled sensitivity (95% CI) | Pooled specificity (95% CI) | Range of sensitivity in individual studies | Range of specificity in individual studies | References
Sassone | 0.86 (0.79 to 0.91) | 0.77 (0.73 to 0.81) | 0.65 to 1.00 | 0.65 to 0.93 | 43,54,68,69,83,93,130,131,154,159,160,163,179,193,199
DePriest | 0.91 (0.84 to 0.95) | 0.68 (0.49 to 0.82) | 0.88 to 1.00 | 0.40 to 0.81 | 35,36,69,83,93,115
Ferrazzi | 0.87 (0.80 to 0.92) | 0.81 (0.62 to 0.91) | 0.84 to 0.87 | 0.67 to 0.88 | 69,75,93
Finkler | 0.82 (0.65 to 0.91) | 0.78 (0.59 to 0.91) | 0.52 to 0.88 | 0.55 to 0.70 | 56,62,63
Other (note: significant heterogeneity in criteria used for diagnosis - see ROC curve) | 0.86 (0.82 to 0.89) | 0.83 (0.76 to 0.88) | 0.43 to 1.00 | 0.29 to 1.00 | 33,34,39,42,43,67,69,74,76-80,87,90,95,97,101,102,104,106,108,112,117,118,122,124-127,133-135,138-140,142,144,146,147,155,161,166,168,169,171,180,181,185,187,188,192,195

Abbreviations: CI = confidence interval; ROC = receiver operating characteristic

Trials identified. We identified 69 articles comprising 73 ultrasound morphology assessments. Despite the availability of published scoring systems, most of the studies based their diagnoses on either descriptive assessments of adnexal masses or used a modified or unique scoring system. Only 13 studies explicitly used Sassone's criteria, six used DePriest's, and three each used Ferrazzi's, Finkler's, Lerner's, and Valentin's. When a scoring system other than an established criterion was used, it was not always clear how it had been developed or modified. Details of the tests and their evaluative performance are provided in Table 10. Assessments of adnexal morphology by ultrasound that used a unique, modified, or unclear scoring system are labeled “other” with a brief description when possible. It is also important to note that not all of the established scoring systems were employed using the original cutpoints. For example, Caruso et al.83 and Itakura et al.115 both used a cutpoint of > 7 for the DePriest scoring system, where the original description used ≥ 5.


   Figure 6. Performance of ultrasound scoring according to Sassone's criteria (1991)

Key to Figure 6: 1 = Lerner et al., 1994;131 2 = Ferrazzi et al., 1997;93 3 = Sawicki et al., 2001;160 4 = Rehn et al., 1996;154 5 = Sassone et al., 1991;159 6 = Caruso et al., 1996;83 7 = Leeners et al., 1996;130 8 = Alcazar and Lopez-Garcia, 2001;68 9 = Alcazar et al., 2003;69 10 = Timor-Tritsch et al., 1993;179 11 = Zanetta et al., 1994;193 12 = Alcazar et al., 1996;199 13 = Schneider et al., 1993;43 14 = Buckshee et al., 1998;54 15 = Sengoku et al., 1994163


   Figure 7. Performance of ultrasound scoring according to DePriest's criteria (1993)

Key to Figure 7: 1 = Ferrazzi et al., 1997;93 2 = Caruso et al., 1996;83 3 = DePriest et al., 1993;35 4 = Alcazar et al., 2003;69 5 = Itakura et al., 2003;115 6 = DePriest et al., 199336


   Figure 8. Performance of ultrasound scoring according to Ferrazzi's criteria (1997)

Key to Figure 8: 1 = Ferrazzi et al., 1997;93 2 = Berlanda et al., 2002;75 3 = Alcazar et al., 200369


   Figure 9. Performance of ultrasound scoring according to Finkler's criteria (1988)

Key to Figure 9: 1 = Schutter et al., 1994;62 2 = Schutter et al., 1998;63 3 = Finkler et al., 198856


   Figure 10. Performance of ultrasound scoring according to various other unvalidated criteria

Key to Figure 10: 1 = DePriest et al., 1997;34 2 = Marchetti et al., 2002;140 3 = Tailor et al., 2003;171 4 = Ekerhovd et al., 2001;90 5 = Canis et al., 1997;80 6 = Wakahara et al., 2001;187 7 = Maggino et al., 1994;135 8 = Schelling et al., 2000;161 9 = Roman et al., 1997;42 10 = Brown et al., 1998;77 11 = Granberg et al., 1990;101 12 = Hermann et al., 1987;108 13 = Kurjak and Predanic, 1992;125 14 = Tingulstad et al., 1996;180 15 = Stein et al., 1995;168 16 = Torres et al., 2002;181 17 = Manjunath et al., 2001;139 18 = Ma et al., 2003;134 19 = Valentin et al., 2001;185 20 = Franchi et al., 1995;95 21 = Merce et al., 1998;146 22 = Davies et al., 1993;87 23 = Morgante et al., 1999;147 24 = Benjapibal et al., 2003;74 25 = Gadducci et al., 1988;97 26 = Buy et al., 1996;79 27 = Strigini et al., 1996;169 28 = Luxman et al., 1991;133 29 = Kurjak et al., 1994;127 30 = Huber et al., 2002;112 31 = Reles et al., 1997;155 32 = Mancuso et al., 2004;138 33 = Kurjak et al., 2000;124 34 = Alcazar et al., 2003;69 35 = Kurjak et al., 1992;126 36 = Komatsu et al., 1996;122 37 = Yamashita et al., 1995;192 38 = Sohaib et al., 2005;166 39 = Cohen et al., 2001;33 40 = Medl et al., 1995;144 41 = Hata et al., 1992;106 42 = Schneider et al., 1993;43 43 = Weiner et al., 1992;188 44 = Jain, 1994;117 45 = Buist et al., 1994;78 46 = Alcazar et al., 2003;67 47 = Lin et al., 1993;39 48 = Jain et al., 1993;118 49 = Bromley et al., 1994;76 50 = Zimmer et al., 2003195

Results. The results of pooled sensitivity and specificity using a random-effects model, along with the range of sensitivity and specificity reported in individual studies, are shown in Table 10. Included studies are shown in Figures 6-10. There was a great range in test results, especially in the studies not using established scoring systems. This most likely reflects the heterogeneity of the tests themselves. There was little concrete difference among the established scoring systems. Overall the tests achieved relatively higher levels of sensitivity and negative predictive value (NPV) in the diagnosis of malignancy than specificity or positive predictive value (PPV). With the exception of four studies, the NPV was above 0.80, with the majority of tests above 0.90. The PPV in the majority of studies was below 0.50. In general, there was a trade-off between sensitivity and specificity, both in the individual studies of a specific scoring system, and in pooled results of all studies of a scoring system - as sensitivity increases, specificity decreases.
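The four test characteristics discussed here all derive from the same 2 x 2 table of test result against histologic diagnosis. The sketch below shows the arithmetic. The function name and the counts are hypothetical, chosen only to illustrate how a test can combine a high NPV with a PPV below 0.50 when benign masses outnumber malignant ones.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from 2 x 2 counts.

    tp = malignant masses with a positive test, fn = malignant with a negative test,
    fp = benign masses with a positive test,   tn = benign with a negative test.
    """
    def ratio(numerator, denominator):
        # Return NaN rather than raising when a margin of the table is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical study: 20 malignant and 180 benign masses.
example_metrics = diagnostic_metrics(tp=18, fp=30, fn=2, tn=150)
```

With these hypothetical counts, sensitivity is 0.90 and NPV is about 0.99, while PPV is 0.375 because false positives among the many benign masses outnumber true positives.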

Comparing the figures, studies using the Sassone criteria show greater variability in sensitivity compared to variability in specificity (Figure 6), while those using the DePriest criteria (Figure 7) show greater variability in specificity and a relatively narrow range of sensitivity. Figure 10, which depicts a variety of other studies, suggests trade-offs between sensitivity and specificity; different morphology methods for discriminating benign from malignant have different thresholds, resulting in the sensitivity/specificity trade-off.

Three articles compared different scoring systems within the same study population. Caruso et al.83 examined 112 women with adnexal masses comparing Sassone, DePriest, and Valentin scores. All performed similarly, displaying a sensitivity and NPV of 1.00, a range of specificity of 0.61 to 0.75, and a range of PPV of 0.35 to 0.48. Alcazar et al.69 also compared the performance of Sassone, DePriest, and Ferrazzi. There were no significant differences between these scoring systems when receiver operating characteristic (ROC) curves were compared. The area under the curve (AUC) was 0.89 for Sassone, 0.92 for DePriest, and 0.90 for Ferrazzi. Ferrazzi et al.93 evaluated 261 masses collected in three different centers. They compared ROC curves for scores based on Sassone, Granberg, DePriest, and Lerner's criteria and compared them with a scoring system they developed. The AUC ranged from 0.72 to 0.75 for the previously established systems. Their new scoring system (Ferrazzi) performed better, with an AUC of 0.84 (p < 0.0001). However, subsequent comparisons have not reaffirmed its superior functioning. When the Ferrazzi scoring system was compared to both Sassone and DePriest,69 its performance was almost identical.

In spite of different designs, all the scoring systems performed similarly when compared within the same study population. It has been suggested that the poor performance of scoring systems with regard to their PPV is due to the misclassification of dermoid tumors.197 Dermoids share many of the features that are characterized as “malignant” in scoring systems. The Alcazar study proposes a scoring system that was developed in part to correct this. Although this scoring system does perform well in its initial application, it has not been independently verified. The authors conclude, “a completely reliable differentiation of malignant masses cannot be obtained by sonographic imaging alone.”69

Table 11

Ultrasound morphology assessment comparing pre- and postmenopausal status
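Study | Scoring system | Premenopausal Sens | Premenopausal Spec | Premenopausal PPV | Premenopausal NPV | Postmenopausal Sens | Postmenopausal Spec | Postmenopausal PPV | Postmenopausal NPV
Finkler et al., 1988 (ref 56) | Finkler | 0.50 | 0.96 | 0.50 | 0.77 | 0.78 | 0.92 | 0.94 | 0.75
Franchi et al., 1995 (ref 95) | Descriptive | 0.73 | 0.86 | 0.44 | 0.95 | 0.89 | 0.75 | 0.82 | 0.83
Guerriero et al., 2002 (ref 105) | Descriptive | 0.98 | 0.89 | 0.44 | 1.00 | 1.00 | 0.51 | 0.52 | 1.00
Reles et al., 1997 (ref 155) | Modified score | 1.00 | 0.79 | 0.46 | 1.00 | 0.87 | 0.89 | 0.77 | 0.94
Roman et al., 1997 (ref 42) | Descriptive | 0.93 | 0.92 | 0.66 | 0.99 | 0.81 | 0.62 | 0.54 | 0.86
Schelling et al., 2000 (ref 161) | Descriptive | 0.91 | 0.84 | 0.29 | 0.99 | 1.00 | 0.73 | 0.62 | 1.00
Alcazar et al., 2003 (ref 69) | Sassone | 1.00 | 0.88 | 0.50 | 1.00 | 0.61 | 0.88 | 0.81 | 0.73
Alcazar et al., 2003 (ref 69) | DePriest | 1.00 | 0.80 | 0.38 | 1.00 | 1.00 | 0.82 | 0.82 | 1.00
Alcazar et al., 2003 (ref 69) | Ferrazzi | 1.00 | 0.84 | 0.43 | 1.00 | 0.82 | 0.82 | 0.79 | 0.85
Alcazar et al., 2003 (ref 69) | Alcazar | 1.00 | 0.96 | 0.75 | 1.00 | 1.00 | 0.94 | 0.93 | 1.00
Menon et al., 2000 (ref 145) | Descriptive | - | - | - | - | 1.00 | 0.94 | 0.24 | 1.00
Schutter et al., 1994 (ref 62) | Finkler | - | - | - | - | 0.88 | 0.64 | 0.65 | 0.88
Bromley et al., 1994 (ref 76) | Unique scoring | - | - | - | - | 0.91 | 0.52 | 0.52 | 0.92
Schutter et al., 1998 (ref 63) | Finkler | - | - | - | - | 0.86 | 0.70 | 0.65 | 0.89
Luxman et al., 1991 (ref 133) | Descriptive | - | - | - | - | 0.93 | 0.55 | 0.45 | 0.95
Kurjak et al., 1992 (ref 126) | Unique scoring | - | - | - | - | 0.48 | 0.98 | 0.93 | 0.78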

Abbreviations: NPV = negative predictive value; PPV = positive predictive value; Sens = sensitivity; Spec = specificity

Stratification by menopausal status. Of the 69 articles identified that addressed the assessment of adnexal morphology by ultrasound, only 13 contained data that either directly reported test characteristics by menopausal status or contained enough information to enable the stratification of results. Six were studies in a 100 percent postmenopausal patient population. Seven were studies that allowed comparison by menopausal status within the study population. They are presented in Table 11. The only significant difference in test performance appears to be in the PPV. With the exception of Roman et al.,42 the PPV is higher in postmenopausal women. This likely reflects the higher prevalence of ovarian malignancy after menopause. Aside from the PPV, the performance of ultrasound in the morphological assessment of adnexal masses does not appear to be significantly changed by menopausal status.
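The dependence of PPV on prevalence follows directly from Bayes' theorem. As a minimal sketch, the function below holds sensitivity and specificity fixed and varies only the prevalence of malignancy. The sensitivity and specificity are the pooled Sassone values from Table 10; the two prevalence values are hypothetical and are not estimates from the included studies.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by sensitivity, specificity, and prevalence (Bayes' theorem)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled Sassone sensitivity and specificity (Table 10) at two hypothetical prevalences.
ppv_low_prev, npv_low_prev = predictive_values(0.86, 0.77, prevalence=0.10)
ppv_high_prev, npv_high_prev = predictive_values(0.86, 0.77, prevalence=0.40)
```

Under these assumptions the PPV rises from about 0.29 at 10 percent prevalence to about 0.71 at 40 percent prevalence, with no change in the test itself.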

Ultrasound Doppler Studies

Color Doppler scanning allows the assessment of tumor vascularity. Malignant neoplasms have more active blood vessel formation (angiogenesis) than normal tissue or benign neoplasms due, in part, to their increased metabolic activity. Overall, malignancies display an increased vascularity with decreased peripheral blood flow resistance and increased blood flow velocity compared with benign tissue.152, 200 Doppler signal analysis can separate high-resistance and low-resistance vessels and has therefore been investigated as a separate test modality, as well as in combination with ultrasound morphological evaluation in the evaluation of adnexal masses.

The most common flow criteria are the resistance index (RI), the pulsatility index (PI), and the maximum systolic velocity. RI is defined as the difference between peak systolic and maximum end-diastolic flow velocity, divided by peak systolic flow velocity. Usually the lowest RI from a series of measurements in different arteries is reported. PI is defined as the difference between peak systolic and end-diastolic flow velocity, divided by the time-averaged flow velocity. The maximum systolic velocity is the maximum flow recorded in any visualized artery.
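The two index definitions above translate directly into arithmetic. The following is a minimal sketch; the function names and the velocity values (in cm/second) are hypothetical and serve only to show the calculation.

```python
def resistance_index(peak_systolic, end_diastolic):
    """RI = (peak systolic - end-diastolic velocity) / peak systolic velocity."""
    if peak_systolic <= 0:
        raise ValueError("peak systolic velocity must be positive")
    return (peak_systolic - end_diastolic) / peak_systolic


def pulsatility_index(peak_systolic, end_diastolic, time_averaged):
    """PI = (peak systolic - end-diastolic velocity) / time-averaged velocity."""
    if time_averaged <= 0:
        raise ValueError("time-averaged velocity must be positive")
    return (peak_systolic - end_diastolic) / time_averaged


# Hypothetical waveform measurements in cm/second.
example_ri = resistance_index(peak_systolic=30.0, end_diastolic=18.0)
example_pi = pulsatility_index(peak_systolic=30.0, end_diastolic=18.0, time_averaged=22.0)
```

Because both indices are ratios of velocities, they are dimensionless, and a low-resistance vessel (high diastolic flow relative to systolic flow) yields low values of both.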

In order to make a measurement of either RI or PI or maximum systolic velocity, an artery must be identified on ultrasound. The inability to identify an artery in the mass means that the test cannot be performed. Therefore, not every individual included in the study population is captured with the assessment of these color Doppler modalities. Another limitation of these measurements is that the range observed in malignant masses overlaps with that observed in benign masses. For example, in Lin et al.,132 discussed in more detail below, the RI for malignant masses ranged from 0.23 to 0.82. Although they did not report a range for the benign masses, there were eight benign tumors with an RI < 0.4. This overlap limits the effectiveness of any threshold and, perhaps, contributes to the different thresholds reported in the literature.

Reproducibility of tests. Timmerman et al.196 (discussed above under ultrasound morphology) included Doppler measurements in their analysis of interobserver variability and experience. In short, operators with more experience (approximately 15,000 versus 300 scans) had greater accuracy (92 percent versus 82 to 87 percent, p = 0.0001). Interassessor agreement was also greater between the most experienced operators (kappa = 0.852) compared with the less experienced operators (range 0.581 to 0.737). None of the articles evaluating color Doppler described operator experience, nor did any address interobserver variability specifically with regard to Doppler measurement.

Trials identified. Fifty-six articles were identified that described color Doppler analysis, comprising a description of 65 tests. Thirty-two articles evaluated RI, 20 PI, and six the maximum systolic velocity. These are the most common flow criteria measured in the literature and presumably in clinical practice as well. Other Doppler parameters were described in the literature, sometimes in conjunction with either RI or PI or maximum systolic velocity, but were not included in this table. The other articles included 10 that involved the visualization of flow within the mass,70, 71, 104, 105, 119, 137, 160, 161, 168, 182 two that involved counting the total number of arteries (either more than four152 or more than three199), and one that measured the absence of a diastolic notch.137

Table 12

Sensitivity and specificity of Doppler studies
Doppler method | Pooled sensitivity (95% CI) | Pooled specificity (95% CI) | Range of sensitivity in individual studies | Range of specificity in individual studies | References
Resistance index | 0.76 (0.68 to 0.73) | 0.89 (0.84 to 0.92) | 0.19 to 1.00 | 0.53 to 1.00 | 43,68,70,75,76,79,81,86,88,95,106,107,117,124-126,128,130,132,141,146,152,168,172,175,176,179,184,190,193,199,201
Pulsatility index | 0.79 (0.73 to 0.83) | 0.74 (0.64 to 0.81) | 0.57 to 0.95 | 0.32 to 0.97 | 73,79,81,94,103,109,115,120,154,155,158,163,168,169,179,182,184,188,199,201
Maximum systolic velocity | 0.76 (0.61 to 0.86) | 0.83 (0.66 to 0.93) | 0.48 to 0.94 | 0.43 to 0.97 | 68,79,107,109,152,199

Results. Table 12 details the test characteristics of RI, PI, and the maximum systolic velocity in the evaluation of an adnexal mass, again using pooled values from a random-effects model. For RI the reported thresholds ranged from ≤ 0.8 to < 0.4, with < 0.4 being the most common. For PI the range was relatively narrower, from < 1.5 to < 1.0, with the majority of studies using either ≤ 1.0 or < 1.0. The reported range was greatest for maximum systolic velocity (from > 30 cm/second to > 10 cm/second), which also had the fewest studies. As the threshold for RI decreases from ≤ 0.8 to < 0.4, the sensitivity and NPV decrease, and the specificity and PPV increase. This is seen most clearly in studies that evaluated a series of RI cutpoints with the same study population.132, 176

Lin et al.132 evaluated 370 women with adnexal masses who were scheduled for surgery at a single institution. They reported outcomes based on RI cutpoints of 0.4, 0.5, and 0.6. For RI < 0.4, the sensitivity, specificity, PPV, and NPV were 0.69, 0.97, 0.89, and 0.91, respectively. For RI < 0.5, they were 0.79, 0.92, 0.77, and 0.93. And for < 0.6, they were 0.91, 0.86, 0.68, and 0.98. The authors conclude that the 0.4 cutpoint yields the highest concordance rate between Doppler prediction and histopathologic diagnosis. This conclusion, however, is based more on clinical impression, as ROC curve analysis was not performed.
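Because no ROC analysis was performed, one simple way to compare the three reported operating points is a single-number summary of each. The sketch below applies Youden's J (sensitivity + specificity - 1) to the sensitivity and specificity values quoted above. This index is an assumption introduced here for illustration only: it was not used by Lin et al. or in this report, it weights sensitivity and specificity equally, and it is not the concordance rate on which the authors based their conclusion.

```python
# Sensitivity and specificity reported by Lin et al. for each RI cutpoint.
lin_cutpoints = {
    "RI < 0.4": (0.69, 0.97),
    "RI < 0.5": (0.79, 0.92),
    "RI < 0.6": (0.91, 0.86),
}


def youden_j(sensitivity, specificity):
    """Youden's J statistic: 0 for an uninformative test, 1 for a perfect one."""
    return sensitivity + specificity - 1.0


# J for each cutpoint, rounded to the precision of the reported inputs.
youden_by_cutpoint = {
    label: round(youden_j(sens, spec), 2)
    for label, (sens, spec) in lin_cutpoints.items()
}
```

A summary of this kind depends on how false negatives and false positives are weighted, so different summaries can favor different cutpoints; this is one reason a formal ROC analysis is informative when several thresholds are compared.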


   Figure 11. Performance of Doppler ultrasound resistance index

Key to Figure 11: 1 = Kurjak et al., 1991;128 2 = Wu et al., 1994;190 3 = Lin et al., 1993;132 4 = DePriest et al., 1994;88 5 = Prompeler et al., 1996;152 6 = Tepper et al., 1995;176 7 = Kurjak and Predanic, 1992;125 8 = Valentin, 2000;184 9 = Stein et al., 1995;168 10 = Anandakumar et al., 1996;70 11 = Valentin, 1999;201 12 = Franchi et al., 1995;95 13 = Merce et al., 1998;146 14 = Carter et al., 1995;81 15 = Takac, 1998;172 16 = Buy et al., 1996;79 17 = Leeners et al., 1996;130 18 = Chou et al., 1994;86 19 = Hata et al., 1995;107 20 = Marret et al., 2004;141 21 = Kurjak et al., 2000;124 22 = Kurjak et al., 1992;126 23 = Timor-Tritsch et al., 1993;179 24 = Zanetta et al., 1994;193 25 = Tekay and Jouppila, 1992;175 26 = Alcazar et al., 1996;199 27 = Hata et al., 1992;106 28 = Schneider et al., 1993;43 29 = Berlanda et al., 2002;75 30 = Alcazar and Lopez-Garcia, 2001;68 31 = Jain, 1994;117 32 = Bromley et al., 199476


   Figure 12. Performance of Doppler ultrasound pulsatility index

Key to Figure 12: 1 = Rehn et al., 1996;154 2 = Guerriero et al., 1998;103 3 = Valentin, 2000;184 4 = Stein et al., 1995;168 5 = Itakura et al., 2003;115 6 = Valentin, 1999;201 7 = Valentin, 1997;182 8 = Carter et al., 1995;81 9 = Buy et al., 1996;79 10 = Benjapibal et al., 2002;73 11 = Kawai et al., 1994;120 12 = Strigini et al., 1996;169 13 = Hillaby et al., 2004;109 14 = Salem et al., 1994;158 15 = Timor-Tritsch et al., 1993;179 16 = Reles et al., 1997;155 17 = Alcazar et al., 1996;199 18 = Fleischer et al., 1992;94 19 = Weiner et al., 1992;188 20 = Sengoku et al., 1994163


   Figure 13. Performance of Doppler ultrasound velocity indices

Key to Figure 13: 1 = Prompeler et al., 1996;152 2 = Buy et al., 1996;79 3 = Hillaby et al., 2004;109 4 = Alcazar et al., 1996;199 5 = Alcazar and Lopez-Garcia, 200168

The range of Doppler study performance is listed in Table 12 and shown in Figures 11-13. Overall there was great heterogeneity of performance results. The range of sensitivity was largest for RI. This range did not appear to be secondary to differences in RI thresholds; however, the < 0.4 threshold did appear to narrow specificity results. In spite of the large variation in thresholds described for maximum systolic velocity, the range of test characteristics was somewhat narrower than that for RI, probably because there were fewer studies identified that used this measurement. Again, there is a trade-off between sensitivity and specificity, although this appears greatest for maximum velocity.

Table 13

Study characteristics of simple Doppler visualization
Study (N) | Test | Sensitivity | Specificity
Prompeler et al., 1996 (ref 152); N = 212 | Total number of arteries > 4 (postmenopausal women only) | 0.82 | 0.92
Valentin, 1997 (ref 182); N = 151 | Color lakes visible on Doppler | 0.88 | 0.67
Maly et al., 1995 (ref 137); N = 102 | Demonstrable blood vessels | 0.95 | 0.30
Schelling et al., 2000 (ref 161); N = 257 | Central vascularity on Doppler in solid component | 0.93 | 0.94
Stein et al., 1995 (ref 168); N = 170 masses | Internal flow within solid component or septation | 0.77 | 0.69
Guerriero et al., 2002 (ref 105); N = 826 masses | Arterial flow visualized in an echogenic structure or irregular solid portion | 0.95 | 0.92
Anandakumar et al., 1996 (ref 70); N = 146 | “Continuously fluctuating” vessels with turbulent flow | 0.77 | 0.68
Antonic and Rakar, 1995 (ref 71); N = 71 | Color flow present | 0.89 | 0.47
Guerriero et al., 2005 (ref 104); N = 424 | Color flow present in “echogenic structure” | 1.00 | 0.91
Juhasz et al., 1990 (ref 119); N = 147 | Color flow present in mass | 0.96 | 0.84

   Figure 14. Performance of Doppler ultrasound for intratumoral blood flow

Key to Figure 14: 1 = Guerriero et al., 2005;104 2 = Schelling et al., 2000;161 3 = Prompeler et al., 1996;152 4 = Stein et al., 1995;168 5 = Valentin, 1997;182 6 = Maly et al., 1995;137 7 = Anandakumar et al., 1996;70 8 = Antonic and Rakar, 199571

Table 13 compares the characteristics of Doppler studies that did not use measurement or calculation of Doppler waveforms. They relied instead on either the presence of vascularity within the mass (yes/no) or on a direct count of vessels seen. These tests seem to perform as well as the RI or PI in terms of sensitivity, although specificity varies quite widely (Figure 14). Valentin182 measured both the PI (< 1.0) and the presence of color lakes visible on Doppler in the same study population. Of 151 patients, PI was measured in 135, indicating that for 16 individuals, no artery was visualized within the mass. The sensitivity reported for the PI was 0.83, specificity 0.34, PPV 0.20, and NPV 0.91. Simply documenting the presence or absence of visible color lakes on Doppler yielded a sensitivity of 0.88, a specificity of 0.67, a PPV of 0.33, and an NPV of 0.97. Not only did the direct visualization test perform better, but because its outcome was binary (present or absent), the results included the entire study population (n = 151). Prompeler et al.152 measured RI, maximum systolic velocity, and the number of arteries visualized in the mass. Their data for the simple counting of arteries also perform as well as, if not better than, the calculated tests such as RI or PI. In a random-effects model, pooled sensitivity for the presence or absence of blood flow within a mass was 0.88 (95% CI, 0.80 to 0.92) and pooled specificity 0.78 (95% CI, 0.65 to 0.87).

Table 14

Doppler studies stratified by menopausal status
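Study (N) | Test | Premenopausal Sens | Premenopausal Spec | Premenopausal PPV | Premenopausal NPV | Postmenopausal Sens | Postmenopausal Spec | Postmenopausal PPV | Postmenopausal NPV
Franchi et al., 1995 (ref 95); N = 129 | RI < 0.65 | 0.82 | 0.72 | 0.31 | 0.96 | 0.86 | 0.75 | 0.82 | 0.83
Guerriero et al., 2002 (ref 105); N = 826 masses | Arterial flow visualized in echogenic structure or irregular solid portion | 0.94 | 0.96 | 0.67 | 1.00 | 0.96 | 0.77 | 0.69 | 0.97
Reles et al., 1997 (ref 155); N = 98 | PI ≤ 1.1 | 0.80 | 0.67 | 0.36 | 0.93 | 0.93 | 0.83 | 0.76 | 0.91
Schelling et al., 2000 (ref 161); N = 257 | Presence of central vascularization on Doppler | 0.91 | 0.94 | 0.53 | 0.99 | 0.93 | 0.92 | 0.84 | 0.97
Prompeler et al., 1996 (ref 152); N = 212 | Total number of arteries > 4 | 0.85 | 0.71 | 0.36 | 0.96 | 0.82 | 0.82 | 0.76 | 0.86
Prompeler et al., 1996 (ref 152); N = 212 | RI > 0.5 | 0.84 | 0.47 | 0.23 | 0.94 | 0.82 | 0.69 | 0.66 | 0.84
Prompeler et al., 1996 (ref 152); N = 212 | Maximum systolic velocity > 30 cm/s | 0.92 | 0.65 | 0.33 | 0.98 | 0.76 | 0.88 | 0.82 | 0.84
Strigini et al., 1996 (ref 169); N = 109 | PI < 1 | 0.83 | 0.73 | 0.21 | 0.98 | 0.85 | 0.81 | 0.73 | 0.90
Salem et al., 1994 (ref 158); N = 109 masses | PI < 1 | 1.00 | 0.84 | 0.20 | 1.00 | 0.73 | 0.71 | 0.47 | 0.88
Szpurek et al., 2004 (ref 170); N = 464 | Doppler subjective index ≥ 4 | 0.82 | 0.93 | 0.79 | 0.94 | 0.92 | 1.00 | 1.00 | 0.82
Kurjak et al., 1992 (ref 126); N = 83 | RI < 0.41 | - | - | - | - | 0.96 | 0.95 | 0.90 | 0.98
Kurjak et al., 1992 (ref 126); N = 83 | Randomly separate vessels | - | - | - | - | 0.90 | 0.98 | 0.96 | 0.95
Bromley et al., 1994 (ref 76); N = 33 | RI < 0.6 | - | - | - | - | 0.66 | 0.81 | 0.67 | 0.81
Antonic and Rakar, 1995 (ref 71); N = 71 | Presence of color flow | 1.00 | 0.36 | 0.11 | 1.00 | 0.87 | 0.79 | 0.81 | 0.85
Guerriero et al., 1998 (ref 103); N = 192 masses | PI ≤ 1 | 0.86 | 0.46 | 0.08 | 0.98 | 0.88 | 0.52 | 0.66 | 0.81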

Abbreviations: NPV = negative predictive value; PI = pulsatility index; PPV = positive predictive value; RI = resistance index; Sens = sensitivity; Spec = specificity

Stratification by menopausal status. Out of a total of 56 studies identified that evaluated color Doppler, only 11 contained data that either directly reported test characteristics by menopausal status or contained enough information to enable the stratification of results. Two of these studies were in a 100 percent postmenopausal population, and nine enabled comparison by menopausal status within the same study population (Table 14). When comparing test performance within the same study population stratified by menopausal status, the PPV of the test is significantly increased in the postmenopausal group. In Salem et al.,158 the PPV increased from 0.20 in the premenopausal group to only 0.47 in the peri- and postmenopausal group. This may be a reflection of how they defined peri- and postmenopause (which was not clearly stated by the authors). After stratifying the reported results by age (> 45), the PPV is 0.73. This increase in PPV among postmenopausal women appears to be greater in the context of Doppler studies than that observed with ultrasound morphology. This finding differs from the one meta-analysis on the subject. Kinkel et al.197 did a systematic review of both ultrasound morphology and Doppler in the detection of malignant masses. Although they noted a difference in outcomes dependent on menopausal status, this difference did not reach statistical significance. Interestingly, there was a difference in terms of Doppler test performance by year of publication, with better results demonstrated by earlier studies (p = 0.005), a result that was independent of sample size.

Combined Ultrasound Morphology and Doppler

A limiting feature of ultrasound morphologic assessments has been felt to be the high rate of false positive test results.196 Color Doppler, in contrast, has displayed a slightly higher PPV, especially in the earlier studies.197 There have, therefore, been attempts to combine ultrasound morphology and Doppler studies in a single test.

Trials identified. Of all the articles that investigated the use of either ultrasound morphology or color Doppler in the evaluation of an adnexal mass, nine articles containing a total of 13 tests described a combination ultrasound morphology and Doppler modality.65, 79, 91, 100, 123-125, 130, 201


   Figure 15. Performance of combined ultrasound morphology and color Doppler

Key to Figure 15: 1 = Kurjak and Predanic, 1992;125 2 = Valentin, 1999;201 3 = Kurjak and Kupesic, 1999;123 4 = Buy et al., 1996;79 5 = Leeners et al., 1996;130 6 = Grab et al., 2000;100 7 = Fenchel et al., 2002;91 8 = Kurjak et al., 2000;124 9 = Alcazar and Castillo, 200565

Results. There is a large range in the reported study performance (sensitivity ranges from 0.71 to 0.98, specificity from 0.6 to 1.0). The relevant studies are shown in Figure 15; all but two had both sensitivity and specificity above 0.80. Pooled sensitivity in a random-effects model was 0.89 (95% CI, 0.81 to 0.93) and pooled specificity 0.91 (95% CI, 0.80 to 0.96). Both of these values were higher than the pooled values for any morphology or Doppler method alone.
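The pooled estimates in this chapter come from a random-effects model, but the specific estimator is not stated in this section. As a rough sketch of how one common approach works, the function below pools study-level proportions on the logit scale with the DerSimonian-Laird between-study variance. The choice of estimator, the continuity correction, the function name, and the study counts are all assumptions made for illustration; the sketch also pools sensitivity alone and ignores its correlation with specificity.

```python
import math


def pool_proportions_dl(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events[i] / totals[i] is the proportion in study i (for example, true
    positives over all malignant masses for sensitivity).
    Returns (pooled, ci_lower, ci_upper, tau_squared).
    """
    logits, variances = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:
            # Continuity correction keeps the logit finite (an assumption here).
            x, n = x + 0.5, n + 1.0
        logits.append(math.log(x / (n - x)))
        variances.append(1.0 / x + 1.0 / (n - x))

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    k = len(logits)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add the between-study variance to each study.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled_logit = sum(wi * yi for wi, yi in zip(w_star, logits)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    def inv_logit(z):
        return 1.0 / (1.0 + math.exp(-z))

    return (inv_logit(pooled_logit),
            inv_logit(pooled_logit - 1.96 * se),
            inv_logit(pooled_logit + 1.96 * se),
            tau2)


# Hypothetical studies: true positives and total malignant masses in each.
pooled_sens, ci_low, ci_high, tau_squared = pool_proportions_dl(
    events=[45, 38, 52, 27], totals=[50, 45, 55, 38])
```

When the studies disagree more than sampling error would explain, the between-study variance is positive, the weights become more nearly equal, and the confidence interval widens relative to a fixed-effect analysis.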

Stratification by menopausal status. There were two studies that analyzed combined ultrasound morphology and Doppler in 100 percent postmenopausal patient populations. Kurjak et al.126 reported a combined sensitivity, specificity, PPV, and NPV of 0.90, 0.94, 0.90, and 0.94, respectively. Their combined test consisted of RI < 0.41 and an ultrasound morphology scoring system unique to them. Vuento et al.186 in a population-based screening study reported a sensitivity, specificity, PPV, and NPV of 1.00, 0.83, 0.006, and 1.00, respectively. Given that these two studies are of greatly different design, it is hard to compare them directly. Comparing Kurjak et al. to the range of combined ultrasound and Doppler studies, it appears that in the postmenopausal group, the test has a better performance. However, this test performance may reflect patient selection criteria for the study that were not clearly explained. Combination modalities as a screening tool for ovarian cancer had a high false positive rate (as seen in the PPV of 0.006).186

3D Versus 2D Ultrasound

Table 15

3D versus 2D ultrasound
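Study (number of persons) | Test | Sensitivity | Specificity | PPV | NPV
Alcazar et al., 2003 (ref 67); 41 masses | 2D | 0.90 | 0.61 | 0.68 | 0.88
Alcazar et al., 2003 (ref 67); 41 masses | 3D | 1.00 | 0.78 | 0.81 | 1.00
Note: Presence of one of the following fulfilled criteria for mass: > 3 mm wall, > 3 mm septum, > 3 mm papillary projections, solid areas or echogenicity
Kurjak and Kupesic, 1999 (ref 123); 120 | 2D | 0.91 | 0.97 | 0.77 | 0.99
Kurjak and Kupesic, 1999 (ref 123); 120 | 3D | 1.00 | 0.99 | 0.92 | 1.00
Note: Both used a unique scoring system that included Doppler measurements
Kurjak et al., 2000 (ref 124); 90 | 2D morphology | 0.67 | 0.94 | 0.55 | 0.96
Kurjak et al., 2000 (ref 124); 90 | 2D Doppler | 0.89 | 0.95 | 0.67 | 0.99
Kurjak et al., 2000 (ref 124); 90 | 2D combined | 0.89 | 0.98 | 0.80 | 0.99
Kurjak et al., 2000 (ref 124); 90 | 3D morphology | 0.78 | 0.98 | 0.78 | 0.98
Kurjak et al., 2000 (ref 124); 90 | 3D Doppler | 0.89 | 0.98 | 0.80 | 0.99
Kurjak et al., 2000 (ref 124); 90 | 3D combined | 1.00 | 0.99 | 0.99 | 1.00
Note: Both used a unique scoring system for morphological assessment. Doppler for 2D was RI ≤ 0.42, for 3D it was “complex” “chaotic” vessel arrangement
Alcazar and Castillo, 2005 (ref 65); 69 masses | 2D | 0.98 | 0.88 | 0.94 | 0.96
Alcazar and Castillo, 2005 (ref 65); 69 masses | 3D | 0.98 | 0.79 | 0.90 | 0.95
Note: Presence of at least one of the following fulfilled criteria for “complex mass”: > 3 mm wall, > 3 mm papillary projection, solid areas or purely solid echogenicity. Doppler flow in mass also used in test but unclear how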

Abbreviations: 2D = two-dimensional; 3D = three-dimensional; NPV = negative predictive value; PPV = positive predictive value

We identified five studies that analyzed 3D ultrasound. Four are listed in Table 15. The fifth, by Cohen et al.,33 was not included because it compared 2D ultrasound with 2D plus some component of a 3D exam (possibly 3D Doppler) that was not clearly stated in the article. Overall, 3D ultrasound appears superior to 2D, especially with regard to sensitivity and PPV. We were unable to stratify these results by menopausal status. Test reliability and variability were not addressed specifically in terms of 3D ultrasound.

Other Imaging Modalities

Although ultrasound remains the most common imaging modality in the evaluation and diagnosis of adnexal masses, newer technologies such as MRI, CT, and positron emission tomography (PET) have been studied as well. These modalities may not be as readily available to the clinician as ultrasound, and there is less literature devoted to them than to ultrasound; however, they are included in this review because of growing clinical and research interest in their use. Further, despite refinements in ultrasound morphology scoring systems or Doppler measurements, the overall performance of ultrasound in the evaluation of the adnexal mass may be relatively fixed by the technology itself. Therefore it is necessary to investigate other imaging modalities and see how they compare with ultrasound.

Reproducibility of tests. Unlike ultrasound, MRI, CT, and PET are not operator dependent in terms of obtaining the images. There is, however, the potential for interobserver variability in their analysis. There are no standardized morphological scoring systems for any imaging modality other than ultrasound. We identified two articles that directly addressed the issue of test reproducibility for MRI and/or CT in the evaluation of adnexal masses. Buist et al.78 reported a series of 64 women who were evaluated by both MRI and CT, with the images reviewed by two different radiologists. They reported a kappa value for interobserver reliability in distinguishing between benign and malignant disease of 0.28 for CT and 0.41 for MRI. Yamashita et al.192 also calculated kappa values for interobserver variability, in their case among five radiologists. They showed far greater agreement: for precontrast MRI, kappa = 0.71 (± 0.02); for contrast-enhanced MRI, kappa = 0.73 (± 0.02).
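For readers less familiar with the kappa statistic, the following minimal sketch shows how Cohen's kappa could be computed for two readers who each classify the same masses as benign or malignant. The function name and the counts are hypothetical and are not taken from Buist et al. or Yamashita et al.; agreement among more than two readers (as in the five-radiologist study) would need a multi-rater method that is not shown here.

```python
def cohen_kappa(both_malignant, a_only, b_only, both_benign):
    """Cohen's kappa for two readers making a benign/malignant call.

    both_malignant, both_benign: cases on which the readers agree.
    a_only: reader A says malignant, reader B says benign.
    b_only: reader B says malignant, reader A says benign.
    """
    n = both_malignant + a_only + b_only + both_benign
    observed = (both_malignant + both_benign) / n
    # Agreement expected by chance, from each reader's own rate of
    # calling a mass malignant.
    a_rate = (both_malignant + a_only) / n
    b_rate = (both_malignant + b_only) / n
    expected = a_rate * b_rate + (1 - a_rate) * (1 - b_rate)
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 64 cases (illustrative only, not study data).
kappa = cohen_kappa(both_malignant=30, a_only=10, b_only=6, both_benign=18)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 indicates perfect agreement and a kappa of 0 indicates agreement no better than chance, which is why the values of 0.28 and 0.41 reported for CT and MRI reflect only modest reproducibility.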

Trials identified. We identified 17 articles comprising 22 tests. There were 15 articles for MRI, three for CT, three for PET, and one that used a combined CT/MRI test. Two articles investigated nuclear medicine technologies in the evaluation of adnexal masses; these were not included in the review, given the experimental nature of such tests at this time. The PET studies were all performed using the tracer 18-Fluorodeoxyglucose (FDG), with the test measuring uptake of FDG in the lesion.

Table 16

Sensitivity and specificity of other imaging modalities
| Imaging modality | Pooled sensitivity (95% CI) | Pooled specificity (95% CI) | Range of sensitivity in individual studies | Range of specificity in individual studies | References |
|---|---|---|---|---|---|
| MRI | 0.91 (0.86 to 0.94) | 0.87 (0.83 to 0.90) | 0.67 to 1.00 | 0.77 to 1.00 | 44, 78, 91, 100, 106, 111, 112, 118, 121, 122, 129, 144, 156, 166, 192 |
| CT | 0.90 (0.83 to 0.94) | 0.75 (0.36 to 0.94) | 0.86 to 0.96 | 0.35 to 0.89 | 39, 78, 129 |
| FDG-PET | 0.67 (0.52 to 0.79) | 0.79 (0.70 to 0.85) | 0.58 to 0.78 | 0.76 to 1.00 | 91, 100, 121 |

Abbreviations: CI = confidence interval; CT = computed tomography; FDG = 18-Fluorodeoxyglucose; MRI = magnetic resonance imaging; PET = positron emission tomography

Results. The results of MRI, CT, and PET modalities are summarized in Table 16. All of the articles describing CT and PET and most of the articles describing MRI either used descriptive criteria for differentiating malignant-appearing from benign-appearing lesions or did not report the criteria used. Only two MRI articles used a scoring system, and the two systems differed slightly from each other, which increases the difficulty of comparing studies. To date, there are no standardized scoring systems for any imaging modality other than ultrasound.

Table 17

Comparison of MRI, CT, FDG-PET, and ultrasound
| Study (N) | Test | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| Medl et al., 1995 [144] (73) | Ultrasound morphology (descriptive) | 0.81 | 0.73 | 0.79 | 0.76 |
| | MRI descriptive | 0.97 | 0.83 | 0.88 | 0.96 |
| Yamashita et al., 1995 [192] (72 women, 80 masses) | Ultrasound morphology (unique score) | 0.89 | 0.84 | 0.63 | 0.96 |
| | MRI precontrast | 0.78 | 0.93 | 0.79 | 0.93 |
| | MRI contrast enhanced | 0.91 | 0.93 | 0.81 | 0.97 |
| Fenchel et al., 2002 [91] (99) | Ultrasound combined morphology and Doppler | 0.92 | 0.60 | 0.24 | 0.98 |
| | MRI | 0.83 | 0.83 | 0.40 | 0.97 |
| | FDG-PET | 0.58 | 0.76 | 0.25 | 0.93 |
| Jain et al., 1993 [118] (32) | Ultrasound morphology (descriptive) | 1.00 | 0.60 | 0.18 | 1.00 |
| | MRI | 0.67 | 1.00 | 1.00 | 0.97 |
| Kawahara et al., 2004 [121] (38) | MRI descriptive | 0.91 | 0.87 | 0.91 | 0.87 |
| | FDG-PET | 0.78 | 1.00 | 1.00 | 0.75 |
| Komatsu et al., 1996 [122] (82) | Ultrasound morphology (unique score) | 1.00 | 0.46 | 0.57 | 1.00 |
| | MRI descriptive (n = 59) | 0.91 | 0.88 | 0.91 | 0.88 |
| Lin et al., 1993 [39] (80) | Ultrasound morphology (descriptive) | 0.83 | 0.50 | 0.58 | 0.79 |
| | CT descriptive | 0.86 | 0.36 | 0.74 | 0.56 |
| Buist et al., 1994 [78] (64) | CT reviewer a | 0.96 | 0.44 | 0.72 | 0.89 |
| | CT reviewer b | 0.89 | 0.83 | 0.89 | 0.83 |
| | MRI reviewer a | 0.96 | 0.33 | 0.68 | 0.86 |
| | MRI reviewer b | 0.96 | 0.94 | 0.96 | 0.94 |
| | Ultrasound morphology (NR) | 0.89 | 0.44 | 0.71 | 0.73 |
| Grab et al., 2000 [100] (101) | Ultrasound combination morphology and Doppler | 0.92 | 0.60 | 0.23 | 0.98 |
| | MRI descriptive | 0.83 | 0.84 | 0.42 | 0.97 |
| | FDG-PET | 0.58 | 0.80 | 0.28 | 0.93 |
| Hata et al., 1992 [106] (63) | Ultrasound (NR) | 0.85 | 0.69 | 0.68 | 0.86 |
| | MRI score | 0.67 | 0.97 | 0.95 | 0.80 |
| Huber et al., 2002 [112] (93) | Ultrasound morphology (descriptive) | 0.85 | 0.73 | 0.87 | 0.71 |
| | MRI descriptive | 0.89 | 0.86 | 0.93 | 0.79 |
| Reuter et al., 1998 [156] (65) | Ultrasound morphology (descriptive) | 1.00 | 0.66 | 0.40 | 1.00 |
| | MRI descriptive | 1.00 | 0.78 | 0.50 | 1.00 |
| Sohaib et al., 2005 [166] (72) | Ultrasound morphology (descriptive) | 1.00 | 0.40 | 0.53 | 1.00 |
| | MRI descriptive | 0.97 | 0.84 | 0.80 | 0.97 |

Abbreviations: CT = computed tomography; FDG = 18-Fluorodeoxyglucose; MRI = magnetic resonance imaging; NR = not reported; PET = positron emission tomography

The ranges of test performance for MRI, CT, and PET are shown in Table 16. Table 17 includes, for comparison, the test performance for ultrasound morphology, color Doppler (all the modalities), and ultrasound morphology and Doppler combined. Tian et al.177 was excluded from this table because there was no description of how CT and MRI were combined for a single test result (in series versus in parallel). Overall, the sensitivity of MRI, CT, and PET is similar to that of combined ultrasound morphology and Doppler and less heterogeneous than that of either ultrasound modality used separately. The range of specificity, however, is equivalent to that of either test used separately and wider than that of the tests combined, with the exception of FDG-PET. However, the comparatively narrow range of both CT and PET results could be secondary to the relatively few studies that use these modalities. There is a large range of results for PET PPVs and a small range for CT, again possibly reflecting the paucity of studies. The range of NPVs for MRI is comparable to that for combined ultrasound morphology and Doppler and better than that for either CT or PET. Overall, MRI appears similar in performance to combined ultrasound. More research is needed to accurately assess the performance range of CT and PET.

Another way to compare imaging modalities is by looking at studies that compare imaging modalities within the same study population. These are listed in Table 17. There may be a small benefit in performance of MRI over ultrasound, especially in terms of PPV. There is no evidence to support the superiority of any single modality, although FDG-PET appears inferior to the rest.

Only two studies compared pre- and postcontrast enhancement with MRI.111, 192 Contrast enhancement improved evaluative performance in both studies, particularly sensitivity. In Hricak et al., the sensitivity increased from 0.87 to 0.95, specificity from 0.75 to 0.79, PPV from 0.78 to 0.83, and NPV from 0.84 to 0.94.111 These results are similar to those of Yamashita et al.192 in Table 17.

Stratification by menopausal status. None of the studies describing MRI, CT, or PET reported results by menopausal status or provided data that would allow stratification by menopausal status.

Serum Markers: CA-125

The concept of using tumor markers as either screening or diagnostic tests for ovarian cancer is dependent upon identifying an abnormal level of a particular marker in serum, reflecting a systemic effect of disease in the ovary. The most extensively investigated ovarian cancer-associated antigen is CA-125. This antigen is recognized by a murine monoclonal antibody produced using an ovarian cancer cell line as an immunogen. Elevated levels are detected in approximately 80 percent of ovarian carcinomas at the time of diagnosis;136, 167 however, elevated serum levels have also been reported in a variety of benign conditions, potentially affecting specificity. In addition, CA-125 is not as commonly elevated in non-epithelial ovarian cancers. Because these stromal and germ cell tumors are proportionately more common in premenopausal women, CA-125 may not be as sensitive in premenopausal women.3

Reproducibility of tests. Only one study included specific information regarding the inter- and intra-assay coefficients of variation.66 They were < 7.5 percent and < 5.3 percent, respectively. The sensitivity of the assay in this study was < 5 U/ml.

Trials identified. We identified 66 studies that investigated the use of CA-125 as a serum marker in the evaluation of an adnexal mass. One study was a population-based screening study that employed CA-125 as part of the screening triage.51 Forty-six studies in total used 35 U/ml as a threshold; in 37 it was the only threshold used, whereas in five, both 35 U/ml and another threshold were reported for the same patient population. There were 24 studies that reported a threshold other than 35 U/ml, ranging from >20 U/ml to >100 U/ml. In addition to the five studies that reported 35 U/ml and an additional level, there were four other studies that reported two threshold levels within the same study population. All but one of the studies were case series. Although there were a few studies that compared CA-125 results from operative cases with normal controls, only the data from the operative series were included in the 2-by-2 tables. The clinical presentation of the cases was rarely described. Some of the series were drawn from oncology clinics.


   Figure 16. Performance of CA-125

Key to Figure 16: 1 = Adonakis et al., 1996 [51]; 2 = Woolas et al., 1995 [189]; 3 = Gadducci et al., 1992 [98]; 4 = Wakahara et al., 2001 [187]; 5 = Maggino et al., 1994 [135]; 6 = Dowd et al., 1993 [55]; 7 = Schutter et al., 2002 [162]; 8 = Patsner and Mann, 1988 [151]; 9 = Roman et al., 1997 [42]; 10 = Schutter et al., 1994 [62]; 11 = Gadducci et al., 1991 [99]; 12 = Chen et al., 1988 [85]; 13 = Vasilev et al., 1988 [50]; 14 = Timmerman et al., 1999 [178]; 15 = Hogdall et al., 2000 [110]; 16 = Malkasian et al., 1988 [136]; 17 = Torres et al., 2002 [181]; 18 = Schutter et al., 1998 [63]; 19 = Manjunath et al., 2001 [139]; 20 = Troiano et al., 1997 [47]; 21 = Chalas et al., 1992 [31]; 22 = Mancuso et al., 2004 [138]; 23 = Gadducci et al., 1988 [97]; 24 = Finkler et al., 1988 [56]; 25 = Tay and Chua, 1994 [174]; 26 = Soper et al., 1990 [167]; 27 = Smikle et al., 1995 [46]; 28 = Hurteau et al., 1995 [113]; 29 = Asif et al., 2004 [72]; 30 = Einhorn et al., 1986 [89]; 31 = Hillaby et al., 2004 [109]; 32 = Alcazar et al., 1999 [66]; 33 = Balbi et al., 2001 [53]; 34 = Antonic and Rakar, 1995 [71]; 35 = Hata et al., 1992 [106]; 36 = O'Connell et al., 1987 [148]; 37 = Schneider et al., 1993 [43]; 38 = Weiner et al., 1992 [188]; 39 = Tian et al., 2000 [177]; 40 = Berlanda et al., 2002 [75]; 41 = Sengoku et al., 1994 [163]

Results. At the most commonly used threshold of 35 U/ml, the pooled sensitivity of CA-125 for discriminating benign from malignant lesions was 0.78 (95% CI, 0.75 to 0.81), and the pooled specificity was 0.78 (95% CI, 0.71 to 0.82). Individual study sensitivities ranged from 0.45 to 1.0, and specificities from 0.46 to 0.99; see Figure 16, where the trade-off between sensitivity and specificity resulting from different thresholds is clearly seen. Not including the one screening study in this series,51 the studies ranged in size from 52 to 429 individuals. Unlike ultrasound morphology assessments, the range of CA-125 performance is not influenced by heterogeneity of evaluative modalities. However, the performance results have a similarly broad range overall. This most likely reflects heterogeneity of study populations. As very few studies actually reported how patients were diagnosed with masses, it is impossible to accurately stratify these results by patient characteristics. As with ultrasound measurements (both morphology and Doppler), the narrowest range of CA-125 test performance was for NPV, making this, perhaps, the most reliable part of the test itself.

The only screening study identified for CA-125 in our literature search51 included 2000 women. The sensitivity in this study was 1.00, specificity 0.99, PPV 0.17, and NPV 1.00. Few of the other studies achieved this degree of sensitivity, specificity, or NPV, although overall the PPV was higher. In the presence of an adnexal mass, the false positive rate increases compared with a screened population, reflecting the fact that benign gynecologic disease can cause elevation of CA-125.
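To make the relationship between these four measures concrete, the sketch below derives them from a 2-by-2 table. The function name and the counts are hypothetical: the counts were chosen only so that the rounded results match the figures quoted above for a 2000-woman screening cohort, and they are not the actual data from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2-by-2 counts."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased who test positive
        "specificity": tn / (tn + fp),   # non-diseased who test negative
        "ppv": tp / (tp + fp),           # test positives who are diseased
        "npv": tn / (tn + fn),           # test negatives who are non-diseased
    }


# Hypothetical counts for 2000 screened women (illustrative only).
metrics = diagnostic_metrics(tp=4, fp=20, fn=0, tn=1976)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With so few true cancers in a screened population, even a 1 percent false positive rate produces far more false positives than true positives, which is why the PPV stays low despite excellent sensitivity and specificity.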

The most common threshold other than 35 U/ml was 65 U/ml. Most of the studies using 65 U/ml as a threshold were from Asia. The probable heterogeneity of study populations limits comparisons between these thresholds. In the studies that reported results for different CA-125 thresholds within the same study population,87, 98, 134, 136, 147, 148, 162, 167, 180 the specificity and PPV are higher at the higher threshold, the sensitivity is lower, and the NPV is only slightly lower.

Table 18

CA-125 results stratified by menopausal status
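| Study | Threshold (U/ml) | Premenopausal: Sens | Premenopausal: Spec | Premenopausal: PPV | Premenopausal: NPV | Postmenopausal: Sens | Postmenopausal: Spec | Postmenopausal: PPV | Postmenopausal: NPV |
|---|---|---|---|---|---|---|---|---|---|
| Malkasian et al., 1988 [136] | > 100 | 0.60 | 0.95 | 0.67 | 0.93 | 0.77 | 0.97 | 0.98 | 0.72 |
| | > 35 | 0.60 | 0.73 | 0.29 | 0.91 | 0.81 | 0.91 | 0.94 | 0.74 |
| Gadducci et al., 1996 [96] | > 65 | 0.67 | 0.91 | 0.67 | 0.91 | 0.80 | 1.00 | 1.00 | 0.69 |
| Gadducci et al., 1992 [98] | > 64 | 0.50 | 0.26 | 0.05 | 0.86 | 0.81 | 0.86 | 0.88 | 0.78 |
| Franchi et al., 1995 [95] | > 39 | 0.73 | 0.64 | 0.24 | 0.94 | 0.77 | 0.85 | 0.87 | 0.74 |
| Patsner and Mann, 1988 [151] | > 35 | 0.63 | 0.78 | 0.66 | 0.76 | 0.77 | 0.81 | 0.85 | 0.72 |
| Dowd et al., 1993 [55] | > 35 | 0.74 | 0.73 | 0.60 | 0.84 | 0.86 | 0.82 | 0.90 | 0.76 |
| Finkler et al., 1988 [56] | > 35 | 0.50 | 0.69 | 0.35 | 0.81 | 0.84 | 0.92 | 0.94 | 0.80 |
| Schutter et al., 1998 [63] | > 35 | | | | | 0.69 | 0.84 | 0.73 | 0.81 |
| Antonic and Rakar, 1995 [71] | > 35 | 0.67 | 0.92 | 0.40 | 0.97 | 0.87 | 0.93 | 0.93 | 0.87 |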

Abbreviations: CA-125 = cancer antigen 125; NPV = negative predictive value; PPV = positive predictive value; Sens = sensitivity; Spec = specificity

Stratification by menopausal status. Of the 59 studies we identified that examined CA-125, only nine either directly reported test characteristics by menopausal status or contained enough information to enable the stratification of results. One study was conducted exclusively in a postmenopausal population.63 The studies are listed in Table 18.

In postmenopausal women, the incidence of ovarian cancer is higher relative to that of benign gynecologic conditions that also increase CA-125 levels. This should translate into greater accuracy of CA-125 in this population. Indeed, all test parameters except NPV are higher, and their ranges narrower, in postmenopausal women. The lowest PPV was 0.73, with the remainder at 0.85 or above, which is significantly higher than the range of PPV observed in studies that did not stratify their results by menopausal status. The NPV is lower in the postmenopausal population, despite the higher sensitivity, because of a greater prevalence of cancer in this population. CA-125 is consistently more helpful in discriminating benign from malignant lesions in postmenopausal women compared with premenopausal women.

Other Serum Markers

The fact that CA-125 is < 35 U/ml in 20 percent of women with early stage ovarian cancer has motivated research into other serum-based tests. We identified 13 articles that described a total of 17 different serum studies in women with an adnexal mass. Some studies investigated the performance of other tumor-associated antigens such as tumor-associated glycoprotein 72 (TAG-72) or CA-19-9. Although most of the tumor-associated antigens achieved specificities of approximately 0.82 to 0.92, the sensitivity, PPV, and NPV were overall lower than those reported for CA-125. Two studies investigated carcinoembryonic antigen (CEA),114, 157 and although they employed slightly different thresholds, the sensitivities reported (0.16 and 0.22) were so poor that both sets of authors concluded that assessment of CEA in the evaluation of an adnexal mass is not helpful. Roman et al.42 investigated whether the addition of human chorionic gonadotropin (hCG), alpha-fetoprotein (AFP), and lactate dehydrogenase (LDH) to CA-125 improved test performance. In their series, the sensitivity of CA-125 alone was 0.67, the specificity 0.71, the PPV 0.35, and the NPV 0.90. The addition of the other three tests changed the results very little: the sensitivity of the combined test (defined as any of the markers positive) was 0.72, its specificity 0.70, PPV 0.36, and NPV 0.94. AFP, hCG, and LDH do not appear to improve the diagnostic performance of CA-125.

Gadducci et al. investigated the role of D-Dimer in a series of 121 women with adnexal masses.96 The sensitivity for D-Dimer alone was 0.91, the specificity 0.83, the PPV 0.82, and the NPV 0.92, making D-Dimer one of the best performing tests identified in our review. Stratifying by menopausal status showed better performance in premenopausal women (n = 57), in whom the sensitivity, specificity, PPV, and NPV were 1.00, 0.91, 0.75, and 1.00, respectively. For postmenopausal women they were 0.89, 0.65, 0.85, and 0.72, respectively. Chalas et al. investigated the role of elevated platelets in 241 women.31 The specificity and PPV were similar to those reported for D-Dimer (0.84 and 0.83, respectively), but the sensitivity and NPV were substantially lower (0.56 and 0.59). These two studies are intriguing, but the results need to be confirmed in future studies to better assess the possible contribution of these markers to the evaluation of adnexal masses.

Aside from the D-Dimer study, none of the studies contained information making stratification by menopausal status possible. In conclusion, none of the serum markers investigated in this review appears to perform better than CA-125, with the possible exception of D-Dimer in the premenopausal population.

Population-based Studies

Table 19

Population-based screening studies
| Study | N | Test | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|
| Marchetti et al., 2002 [140] | 4350 | Ultrasound screening: criteria NR | 1.00 | 0.37 | 0.07 | 1.00 |
| | | Operative cases only (n = 45) | | | | |
| | | Assuming all negatives were truly negative (n = 4359) | 1.00 | 0.96 | 0.01 | 1.00 |
| Menon et al., 2000 [145] | 1027 | Ultrasound | | | | |
| | | Volume > 8.8 ml | 0.90 | 0.94 | 0.21 | 1.00 |
| | | Abnormal morphology | 1.00 | 0.94 | 0.24 | 1.00 |
| | | Complex morphology | 0.84 | 0.97 | 0.37 | 0.98 |
| Vuento et al., 1995 [186] | 1364 | Combined ultrasound morphology and Doppler (PI < 1.0) | 1.00 | 0.88 | 0.006 | 1.00 |
| DePriest et al., 1993 [36] | 24/3220 | Ultrasound morphology (DePriest) | 1.00 | 0.71 | 0.33 | 1.00 |
| | | Operative cases only (n = 24) | | | | |
| Kurjak et al., 1992 [126] | 83/1000 | RI < 0.41 | 0.96 | 0.95 | 0.90 | 0.98 |
| | | Ultrasound morphology (unique score) | 0.48 | 0.98 | 0.93 | 0.78 |
| | | Presence of random vessels | 0.90 | 0.98 | 0.96 | 0.95 |
| | | Combined ultrasound and Doppler | 0.90 | 0.94 | 0.90 | 0.94 |
| Kurjak et al., 1994 [127] | 32/5013 | Ultrasound “persistent mass” | 1.00 | 0.97 | 0.80 | 1.00 |
| | | Ultrasound assuming all test negatives true negatives | 1.00 | 0.99 | 0.80 | 1.00 |
| Kurjak et al., 1991 [128] | 680/8620 | RI < 0.4 | 0.96 | 0.99 | 0.98 | 1.00 |
| DePriest et al., 1997 [34] | 90/6470 | Ultrasound morphology (DePriest) (n = 90) | 1.00 | 0.59 | 0.17 | 1.00 |
| | | Assuming all test negatives true negatives (n = 6470) | 0.86 | 0.99 | 0.07 | 1.00 |
| Adonakis et al., 1996 [51] | 2000/2000 | CA-125 > 35 | 1.00 | 0.99 | 0.17 | 1.00 |
| | | PE “palpable mass” | 0.67 | 0.97 | 0.03 | 1.00 |
| Andolf et al., 1990 [52] | 801 | Combined ultrasound and BME (both positive for test to be positive) | 1.00 | 0.94 | 0.11 | 1.00 |
| | | Ultrasound and BME criteria not well described | | | | |
| Jacobs et al., 1988 [58] | 1010 | CA-125 > 30 U/ml | 1.00 | 0.97 | 0.03 | 1.00 |
| | | BME | 1.00 | 0.97 | 0.04 | 1.00 |
| | | Ultrasound (ovarian volume > 8.8 ml) (n = 58 for ultrasound) | 1.00 | 0.74 | 0.08 | 1.00 |
| Tailor et al., 2003 [171] | 2500 | Ultrasound morphology (descriptive), N = 2500 | 0.86 | 0.97 | 0.07 | 1.00 |
| | | Ultrasound for second screening episode (n = 998) | 1.00 | 0.99 | 0.21 | 1.00 |
| | | Ultrasound for ≥ third screening episode (n = 733) | 1.00 | 0.99 | 0.25 | 1.00 |
| van Nagell et al., 2000 [49] | 14469 | Ultrasound (ovarian volume > 20 cm3 for premenopausal women, > 10 cm3 for postmenopausal women) | 0.81 | 0.99 | 0.09 | 1.00 |

Abbreviations: BME = bimanual examination; CA-125 = cancer antigen 125; NR = not reported; PE = pelvic examination; PI = pulsatility index

Almost all of the studies identified were case series. There were, however, 13 population-based screening studies included in this review. They are listed in Table 19. Although the women included in these studies did not have a diagnosis of an adnexal mass at the time of enrollment, these studies are included here because they highlight some important issues about test performance. The strongest studies from a methodological perspective were those by Marchetti et al.,140 Vuento et al.,186 DePriest et al.,34 Adonakis et al.,51 and Tailor et al.171 Marchetti, Vuento, Tailor, and DePriest all used ultrasound as a screening modality. In all of these studies, the PPV was low, ranging from 0.006 to 0.07. Screening with CA-125 yielded a slightly higher PPV of 0.17.51 Tailor et al.171 offered followup screening within the same population. In the first screening episode, which captured the total study population of 2,500 women, the test characteristics were similar to those reported in the other screening studies. The test characteristics improved, however, with subsequent screening. Women who had a negative screen were offered a repeat ultrasound at either 12 or 6 months (depending on individual risk factors). Nine hundred and ninety-eight women received a second ultrasound screening. For this subset, the PPV improved to 0.21. For women screened more than two times, the PPV was 0.25. However, not all women offered additional screening returned for the ultrasound. This potential bias was not discussed by the authors, and it is unclear how it may have influenced the performance of repeat screening. The three studies by Kurjak et al. each had various biases that could have accounted for their markedly different reported test performances. One did not report followup on test negatives and therefore included no false negatives in the series,126 another study population was an undescribed subset of a larger, still incomplete screening series,127 and the last study did not describe inclusion criteria.128 Van Nagell et al.49 screened 14,469 women with ultrasound. They reported their results 12 months from the time of screening. However, they note that four women were diagnosed with cancer more than 12 months after screening. These women had all screened negative and were included in their analysis as true negatives. Reclassifying these individuals as false negatives changes the sensitivity from 0.81 to 0.68.
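The effect of reclassifying late-diagnosed cancers can be illustrated with a short calculation. In the sketch below, the counts of detected and missed cancers are assumptions chosen to be consistent with the rounded sensitivities quoted above (0.81 and 0.68); only the four late-diagnosed cases come from the text, and the variable names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of cancers that the screening test detected."""
    return true_positives / (true_positives + false_negatives)


# Assumed counts (not reported in this section): cancers detected by
# screening and cancers missed within 12 months of screening.
detected = 17
missed_within_12_months = 4
# From the text: four cancers diagnosed more than 12 months after a
# negative screen, originally counted as true negatives.
late_diagnosed = 4

original = sensitivity(detected, missed_within_12_months)
reclassified = sensitivity(detected, missed_within_12_months + late_diagnosed)
print(f"original: {original:.2f}, reclassified: {reclassified:.2f}")
```

Because the number of cancers in a screened population is small, moving only four cases from the true negative cell to the false negative cell shifts sensitivity substantially while leaving specificity essentially unchanged.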

Methodological Issues

In reviewing the literature on evaluation modalities, numerous methodological problems consistently reduced our ability to draw conclusions about the performance of various tests, both individually and in comparison with each other. Some of these problems concerned study design; others related to statistical issues.

Patient population. With the exception of the 13 population-based screening studies, all of the articles were case series. Some were consecutive and others non-consecutive. Some were based on operative cases within a specific time frame at one or several institutions, whereas others were referral series, often located in oncology clinics. The path to diagnosis was almost never described, making it difficult to assess the generalizability of the results. Further, age was the only patient characteristic that was reliably documented. Other characteristics, such as family history, were almost never included. This has several implications. The overrepresentation of operative cases, especially from academic facilities, likely inflates the prevalence of malignancy in the study populations when compared with the population of women with adnexal masses in general. It also exaggerates the performance of the evaluative modalities, especially with regard to sensitivity and PPV. Finally, it limits the generalizability of the evidence.

Table 20

Effect of classification of LMP tumors as malignant or benign on diagnostic test characteristics
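| Study | Test | LMP as malignant: Sens | LMP as malignant: Spec | LMP as malignant: PPV | LMP as malignant: NPV | LMP as benign: Sens | LMP as benign: Spec | LMP as benign: PPV | LMP as benign: NPV |
|---|---|---|---|---|---|---|---|---|---|
| Roman et al., 1998 [157] | CEA | 0.16 | 0.93 | 0.35 | 0.83 | 0.19 | 0.93 | 0.25 | 0.90 |
| Wakahara et al., 2001 [187] | Ultrasound morphology | 0.82 | 0.82 | 0.65 | 0.92 | 0.86 | 0.78 | 0.54 | 0.95 |
| | CA-125 | 0.45 | 0.86 | 0.74 | 0.63 | 0.77 | 0.61 | 0.37 | 0.90 |
| Timmerman et al., 1999 [178] | CA-125 | 0.80 | 0.82 | 0.63 | 0.91 | 0.77 | 0.79 | 0.56 | 0.91 |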

Abbreviations: CA-125 = cancer antigen 125; CEA = carcinoembryonic antigen; LMP = low malignant potential (tumors); NPV = negative predictive value; PPV = positive predictive value; Sens = sensitivity; Spec = specificity

Definition of malignant. There was inconsistency between studies regarding whether the malignant classification included any malignancy or whether it included only ovarian malignancies. The inclusion of all malignancies would exaggerate the test's specificity and PPV at the expense of its sensitivity and NPV. From a practical standpoint, this difference may not be that problematic, as all malignancies are important. However, this classification bias increased the heterogeneity of test performance and limited generalizability. Finally, almost all of the articles that reported series containing tumors of low malignant potential (LMP) (also called borderline) classified these tumors as malignant. This changes the reported performance of the various evaluative modalities in these studies. There were three studies identified where stratification by LMP was possible. These are listed in Table 20. Classifying LMP tumors as malignant generally increases the specificity and PPV relative to classifying them as benign, while decreasing the sensitivity and NPV. Overall, PPV tended to be somewhat low (even in populations with a high prevalence of disease). The inclusion of LMP tumors in the malignant category inflated this measurement somewhat. Because of uncertainty about the natural history of LMP tumors, the most appropriate way of classifying them as part of diagnostic test evaluation is also uncertain. Given this uncertainty, investigators would ideally report results using alternative methods of classifying LMP tumors.
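A small worked example may help show why the choice of LMP classification moves the test characteristics. The sketch below recomputes the 2-by-2 table under both classifications. All counts and names are hypothetical and are not taken from the studies in Table 20; the example assumes that LMP tumors test positive less often than invasive cancers but more often than benign masses.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2-by-2 counts."""
    return {
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def metrics_by_lmp_class(counts, lmp_as_malignant):
    """counts maps pathology group -> (test positive, test negative)."""
    diseased = ["invasive"] + (["lmp"] if lmp_as_malignant else [])
    healthy = ["benign"] + ([] if lmp_as_malignant else ["lmp"])
    tp = sum(counts[group][0] for group in diseased)
    fn = sum(counts[group][1] for group in diseased)
    fp = sum(counts[group][0] for group in healthy)
    tn = sum(counts[group][1] for group in healthy)
    return diagnostic_metrics(tp, fp, fn, tn)


# Hypothetical counts (illustrative only, not study data).
counts = {"invasive": (40, 10), "lmp": (5, 5), "benign": (20, 80)}
as_malignant = metrics_by_lmp_class(counts, lmp_as_malignant=True)
as_benign = metrics_by_lmp_class(counts, lmp_as_malignant=False)
for name in ("sens", "spec", "ppv", "npv"):
    print(f"{name}: LMP malignant {as_malignant[name]:.2f}, "
          f"LMP benign {as_benign[name]:.2f}")
```

Under these assumed counts, counting LMP tumors as malignant yields higher specificity and PPV and lower sensitivity and NPV, the same direction seen in most rows of Table 20. A different assumed positivity rate for LMP tumors could change the direction for individual measures.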

Variability in test criteria. Of the 69 articles that evaluated ultrasound morphology, only 31 used established scoring criteria; 38 used a novel method. This resulted in a great heterogeneity of tests for ultrasound morphology and contributed to the range in performance noted. Many of the studies employed purely descriptive analysis to arrive at a benign versus malignant diagnosis. This limits the reproducibility of those results. Many of the scoring systems and descriptive categories had never been independently verified, and the paucity of details regarding what constituted a positive test makes such verification impossible. In terms of ultrasound evaluation by color Doppler, there was also a range of reported thresholds. Some of the variability in test criteria reflects the limitations of ultrasound technology. However, such differences limited the comparability between studies.

Masses as numerator. While most studies examined persons as the unit of 2-by-2 analysis, there were many studies that analyzed their data by masses. Even though the number of persons in the study was usually reported, it was often impossible to reconfigure the 2-by-2 table to refer to persons not masses. This was especially true in the radiology literature. This influenced the comparability between studies.

Menopausal status. Most of the studies did describe the patient population in terms of age. We were able to calculate the proportion of menopausal patients in most studies. However, the results were rarely reported in a way that allowed stratification by menopausal status. Where stratification was possible, a difference in test performance was seen. The heterogeneity in test performance was magnified by the different proportions of pre- and postmenopausal women in the different study populations.

Sample size. Few studies discussed sample size issues, potentially leading to inappropriate conclusions, especially regarding comparability of test characteristics.

Failure to account for observer variability. No studies attempted to account for the effects of observer variation on the precision of estimates, although a few did calculate interobserver coefficients. For tests where the thresholds for normal and abnormal were based on either qualitative assessments (such as descriptions of ultrasound morphology) or quantitative measures (such as ultrasound morphology scores), this variability will have implications for the precision of sensitivity and specificity.

Prevalence and predictive value. We did not limit our analysis of test characteristics to studies from the United States. As the incidence of ovarian cancer is different in different countries, this influences the range of predictive values reported in the literature. Locations with low disease prevalence will have low PPVs compared with higher prevalence areas. The heterogeneity of study locations influenced the range of reported test characteristics and somewhat limits the comparability of the results.
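The dependence of predictive values on prevalence follows directly from Bayes' theorem and can be sketched as below. The sensitivity and specificity of 0.78 are the pooled CA-125 estimates at a threshold of 35 U/ml reported in this chapter; the prevalence values and the function name are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Pooled CA-125 estimates from this chapter; prevalences are illustrative.
results = {}
for prevalence in (0.01, 0.10, 0.30, 0.50):
    ppv, npv = predictive_values(0.78, 0.78, prevalence)
    results[prevalence] = (ppv, npv)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With sensitivity and specificity held fixed, the PPV rises and the NPV falls as prevalence increases, which is the pattern described above for study locations and populations with differing disease prevalence.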

Summary

Table 21

Pooled sensitivity and specificity estimates
| Diagnostic test | Pooled sensitivity (95% CI) | Pooled specificity (95% CI) |
|---|---|---|
| ULTRASOUND: MORPHOLOGY | | |
| Scoring system: Sassone | 0.86 (0.79 to 0.91) | 0.77 (0.73 to 0.81) |
| Scoring system: DePriest | 0.91 (0.84 to 0.95) | 0.68 (0.49 to 0.82) |
| Scoring system: Ferrazzi | 0.87 (0.80 to 0.92) | 0.81 (0.62 to 0.91) |
| Scoring system: Finkler | 0.82 (0.65 to 0.91) | 0.78 (0.59 to 0.91) |
| Other | 0.86 (0.82 to 0.89) | 0.83 (0.76 to 0.88) |
| ULTRASOUND: DOPPLER | | |
| Resistive index | 0.72 (0.61 to 0.82) | 0.90 (0.84 to 0.94) |
| Pulsatility index | 0.80 (0.74 to 0.85) | 0.73 (0.62 to 0.81) |
| Maximum systolic velocity | 0.74 (0.56 to 0.86) | 0.81 (0.59 to 0.83) |
| Presence of vessels | 0.88 (0.80 to 0.92) | 0.78 (0.65 to 0.87) |
| MORPHOLOGY PLUS DOPPLER | 0.86 (0.79 to 0.91) | 0.91 (0.80 to 0.97) |
| MRI | 0.91 (0.86 to 0.94) | 0.87 (0.83 to 0.90) |
| CT | 0.90 (0.83 to 0.94) | 0.75 (0.36 to 0.94) |
| FDG-PET | 0.67 (0.52 to 0.79) | 0.79 (0.70 to 0.85) |
| CA-125 (threshold > 35) | 0.78 (0.75 to 0.81) | 0.78 (0.71 to 0.82) |

Abbreviations: CA-125 = cancer antigen 125; CI = confidence interval; CT = computed tomography; FDG = 18-Fluorodeoxyglucose; MRI = magnetic resonance imaging; PET = positron emission tomography

Table 21 summarizes the pooled sensitivity and specificity estimates for CA-125 and the various imaging modalities.

The use of established scoring systems in the evaluation of an adnexal mass by ultrasound morphology appears to perform slightly better than simple descriptive assessment. However, there does not appear to be a benefit of one scoring system over another. Based on small numbers of studies, 3D ultrasound shows some improvement over 2D. Although the pooled sensitivity and specificity of MRI were the highest of any imaging modality, its performance was less consistent in studies where it was directly compared with other modalities such as CT and ultrasound.

Color Doppler assessment by RI, PI, or maximum systolic velocity is not superior to the simpler assessment of the presence or absence of arterial vessels within the mass. The efficacy of RI, PI, and maximum systolic velocity is hampered by the overlap in values of these measurements between benign and malignant masses.

Combined ultrasound morphology and color Doppler assessments have higher sensitivity and specificity than either alone. Although ultrasound morphologic evaluation by a gynecologist appears to be as reliable as that performed by a radiologist, there was no evidence regarding Doppler measurements performed outside the context of a radiology referral.

In postmenopausal women, an elevated CA-125 is useful for helping rule in ovarian cancer.

Qualitatively, there was a consistent trade-off across all tests between sensitivity and specificity.

The relatively low PPVs in all of the tests are particularly striking given that many of the included studies were done in preoperative patients; the likely “screening” done prior to a decision for surgery suggests that the PPV of a particular test in the initial evaluation of an adnexal mass is likely to be even lower.

Question 4: Explicit Scoring Systems

Question 4 is: What is the accuracy of explicit scoring systems which incorporate various combinations of imaging findings, patient risk factors, and/or CA-125 levels for detecting malignancy? Have these scoring systems been applied to a population of women before laparoscopy or laparotomy?

Approach

Explicit scoring systems were sought in the medical literature from among all studies of diagnostic assessment of adnexal or pelvic masses. We considered only scoring systems that combined data from more than one of the following categories of information: (1) imaging findings; (2) patient risk factors; and (3) laboratory data. Clinical prediction rules that utilized data entirely from only one category (for example, ultrasound-based morphological indices56) are described as part of Question 3.

Imaging findings could include: (1) ultrasound-based tests, such as transabdominal or transvaginal 2D ultrasound or Doppler ultrasound; (2) radiographic tests, such as CT; or (3) other imaging studies, such as MRI or PET scans.

Patient risk factors include menopausal status, age, or other risk factors.

Laboratory data was primarily CA-125, but we recorded data on other serum tumor markers as well.

Results of Literature Search and Screening

We identified 36 studies that met the inclusion criteria.42, 48, 51–53, 55, 62, 63, 66, 72, 86, 87, 97, 103, 105, 116, 134, 135, 138, 139, 147, 169, 178, 180, 181, 185, 202–211 These are described in Evidence Table 4 (Appendix D *).

Study Characteristics

Scoring systems identified. The scoring systems were of several types. The most common were developed using statistical modeling techniques, such as logistic regression or artificial neural networks, to estimate the predicted probability of malignancy. Such estimates were then used to construct clinical prediction rules (e.g., the Risk of Malignancy Index [RMI], which calculates a numeric score based on CA-125 level multiplied by a menopausal score and an ultrasound morphology score) and decision thresholds (e.g., for RMI, the most common threshold is 200). Other scoring systems used simple criteria based on individual modalities, which were then combined using Boolean “and” or “or” (e.g., CA-125 > 65 U/ml and ultrasound morphology score > 10 points). Some models were validated in populations separate from the data set used to develop them, either as part of the initial development report or in subsequent publications by the original developers or others.

Types of data incorporated. The most common scoring systems used ultrasound, CA-125 and menopausal status. Some type of ultrasound data was used in all 36 publications; studies varied with regard to the type of ultrasound technology that was used. All used 2D ultrasound to evaluate morphology, some using transabdominal and many using transvaginal probes. Studies that used Doppler ultrasound used a variety of parameters, including measures as simple as detection of flow, or as complex as specific indices derived from Doppler-measured flow rates, such as the RI or PI. Many described scoring rules based on combinations of features of morphology (Finkler score) or combined morphology and blood flow.

CA-125 was a component of the scoring system in 30 reports. Other serum tumor markers included CA-72-4, incorporated into two reports,53, 63 and the markers AFP, LDH, and hCG, which were used in one report.42 All studies that used these other serum markers also used CA-125.

Menopausal status was incorporated into scoring systems of 19 reports. The definition of menopausal status varied across studies, and in a few cases age was used as a proxy for clinically determined menopausal status. Three studies included only postmenopausal women,62, 63, 135 and thus could not use this variable in the scoring system.

Physical examination was a component of scoring systems in six reports.42, 51–53, 62, 63

Type of study populations. Most study populations were case series assembled at the time of referral for surgery and collected either at the point of preoperative ultrasound imaging or preoperative surgical evaluation. No studies were based in primary care clinical populations. One study described evaluation of adnexal masses detected during an ovarian cancer screening program.51

Reporting of study populations. Menopausal status of the study populations was described in 28 of the 36 reports; three reports included only postmenopausal women.62, 63, 135

Age was reported for the study population as a mean or median in 18 of 36 studies; it was reported in categories in one additional study. Symptom status was seldom described in the candidate reports.

Race/ethnicity was not reported in any of the studies.

Risk factors for ovarian cancer (besides menopausal status and age, described above) were not reported, except in one study that reported the proportion of the study population that was nulliparous versus multiparous.138

Methodology. The methodological quality of the included studies may be described as follows:

Reference standard (handling of borderline). Some studies, particularly those assembled at the time of ultrasound investigation rather than surgery, encountered women with masses due to simple cysts with low risk of malignancy. Two studies allowed use of an operative report in lieu of histopathology as a reference standard,87, 116 and one used clinical followup without surgery as an alternate reference standard.48

Verification bias. Fourteen studies failed to verify disease status for all or a significant sample of test-negative women.

Test reliability. Only nine studies provided data on the reliability of test assessments.

Sample size. Only 11 of the reports described a priori recruitment targets or sample size calculations. We excluded studies with fewer than 50 women; however, some studies report subgroup analyses with fewer than 50 women, for example, the subset of postmenopausal women in Strigini et al.169

Use of appropriate statistical tests. The majority of reports (n = 28) used appropriate statistical analysis of the diagnostic data; however, seven reports presented inadequate analyses.

Blinding. None of the reports described the use of techniques to blind investigators to the disease status of study patients.

Definition of positive and negative test. Most studies (n = 24) provided a priori definitions of a positive and negative test result; studies failed to meet this criterion most often when the threshold was not set a priori but was instead chosen based on study data.

Explicit validation method. Half of the reports (18/36) used some explicit validation method; many of the reports replicated previously described scoring systems in a new population. In many cases, these studies described new scoring systems which were not always validated.

The most common validation method was replication in a separate population. Two studies used validation techniques within a single study population: one split-sample,209 and one bootstrap.205

Diagnostic Accuracy of Scoring Systems

This section considers the diagnostic accuracy of the RMI (Jacobs 1990) and subsequent replications and refinements (RMI2, RMI3, Jacobs 1993, and Timmerman models).

   Figure 17. Performance of RMI model of Jacobs et al. (1990)116 in development set (study 6) and subsequent validation studies using cutoff score of 200

Key to Figure 17: 1 = Tingulstad et al., 1996;180 2 = Timmerman et al., 1999;210 3 = Mol et al., 2001;207 4 = Lu et al., 2003;206 5 = Manjunath et al., 2001;139 6 = Jacobs et al., 1990;116 7 = Davies et al., 1993;87 8 = Morgante et al., 1999;147 9 = Obeidat et al., 2004;208 10 = Asif et al., 2004;72 11 = Aslam et al., 2000;203 12 = Dowd et al., 199355

RMI. The first scoring system based on a statistical model was published in 1990;116 it has been replicated in 11 subsequent clinical populations.55, 72, 87, 139, 147, 180, 204, 206–208, 210 The diagnostic performance in these 12 studies is shown in Figure 17.

The RMI is a clinical prediction rule based on ultrasound, CA-125, and menopausal status data defined as follows:

RMI = U × M × CA-125

where ultrasound (transabdominal) is scored 1 point for each of the following characteristics: multilocular cyst, evidence of solid areas, evidence of metastases, presence of ascites, and bilateral lesions.

U = 0 for ultrasound score of 0

    = 1 for ultrasound score of 1

    = 3 for ultrasound score ≥ 2

CA-125 = Serum CA-125 in U/ml

Menopausal status

M = 1 if premenopausal

     = 3 if postmenopausal

In the initial report, Jacobs et al.116 used the cutoff value of 200. At this cutpoint, sensitivity was 85 percent and specificity was 97 percent among a population of 143 women undergoing surgical investigation for an adnexal mass. The performance of the initial model (study 6 in Figure 17) has, in most studies, failed to be equaled in subsequent attempts at validation. Three of the subsequent 11 studies have similar performance (studies 7, 9, 10 in Figure 17).72, 87, 208 It is notable that these three studies have fewer quality features (n ≤ 4) than the other eight studies (n ≥ 5 of 7 quality features).
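The RMI definition above can be sketched in a few lines of code. This is an illustrative sketch only: the function and parameter names are hypothetical, and treating a score strictly above 200 as test-positive is an assumption, since the text states the cutoff value but not whether a score of exactly 200 counts as positive.

```python
def rmi_score(ultrasound_features: int, postmenopausal: bool,
              ca125_u_per_ml: float) -> float:
    """Risk of Malignancy Index as defined above: U x M x CA-125.

    ultrasound_features is the number of the five listed characteristics
    present (multilocular cyst, solid areas, metastases, ascites,
    bilateral lesions), so it ranges from 0 to 5.
    """
    if not 0 <= ultrasound_features <= 5:
        raise ValueError("ultrasound_features must be between 0 and 5")
    if ultrasound_features == 0:
        u = 0
    elif ultrasound_features == 1:
        u = 1
    else:
        u = 3
    m = 3 if postmenopausal else 1
    return u * m * ca125_u_per_ml


CUTOFF = 200  # cutoff value used in the initial report


def rmi_positive(score: float) -> bool:
    # Assumption: strictly greater than the cutoff counts as positive.
    return score > CUTOFF


# Hypothetical patient: two ultrasound features, postmenopausal, CA-125 of 30 U/ml.
score = rmi_score(2, True, 30.0)
print(score, rmi_positive(score))
```

Because U = 0 when no ultrasound feature is present, the original RMI is zero for such a patient regardless of the CA-125 level.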

When sensitivity and specificity are combined separately using a random-effects model, the pooled sensitivity is 0.78 (95% CI, 0.72 to 0.84) and the pooled specificity is 0.90 (0.81 to 0.95).

RMI2. In 1996, Tingulstad et al.180 reported a refinement to the original RMI scoring system, commonly referred to as RMI2. RMI2 is defined identically to RMI except that new weights were used for the ultrasound and menopause components as follows:

U = 1 for ultrasound score of 0–1

    = 4 for ultrasound score ≥ 2

M = 1 if premenopausal

     = 4 if postmenopausal

   Figure 18. Performance of RMI2 model of Tingulstad et al. (1996)180 in development set (2) and subsequent validation studies using cutoff score of 200

Key to Figure 18: 1 = Andersen et al., 2003;202 2 = Tingulstad et al., 1996;180 3 = Manjunath et al., 2001;139 4 = Ma et al., 2003;134 5 = Morgante et al., 1999;147 6 = Aslam et al., 2000203

A cutoff value of 200 was also recommended for RMI2. Like the RMI, the RMI2 scoring system has been replicated.134, 139, 147, 207 The original report of RMI2 found sensitivity of 0.8 and specificity of 0.92. Subsequent validation studies have performed no better. These validation studies all exhibited five or more quality features. The pooled sensitivity of all five studies is 0.77 (0.71 to 0.82), and pooled specificity 0.89 (0.85 to 0.91). The summary ROC curve is shown in Figure 18.

RMI3. Subsequently, a further refinement to the RMI and RMI2 was reported by Tingulstad et al.211 This third scoring system is defined identically to RMI and RMI2 except that new weights were used for the ultrasound and menopause components as follows:

U = 1 for ultrasound score of 0–1

    = 3 for ultrasound score ≥ 2

M = 1 if premenopausal

     = 3 if postmenopausal

A cutoff value of 200 was also recommended for RMI3. The RMI3 scoring system has been replicated in one additional study.139 The original report of RMI3 found sensitivity of 0.71 and specificity of 0.92, while the validation study reported very similar performance, with sensitivity of 0.74 (0.65 to 0.83) and specificity of 0.91 (0.83 to 0.99).
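Since the three indices differ only in their weights, they can be compared side by side. The sketch below encodes the weights listed above in a lookup table; the names and the example patient are hypothetical and serve only to show how the same findings score under each variant.

```python
# Weights as listed in the text: U for an ultrasound score of 0, 1, and >= 2,
# then M for premenopausal and postmenopausal women.
WEIGHTS = {
    "RMI":  {"u": (0, 1, 3), "m_pre": 1, "m_post": 3},
    "RMI2": {"u": (1, 1, 4), "m_pre": 1, "m_post": 4},
    "RMI3": {"u": (1, 1, 3), "m_pre": 1, "m_post": 3},
}


def risk_index(variant: str, ultrasound_score: int, postmenopausal: bool,
               ca125_u_per_ml: float) -> float:
    """Compute U x M x CA-125 using the weights of the chosen variant."""
    weights = WEIGHTS[variant]
    u = weights["u"][min(ultrasound_score, 2)]  # scores >= 2 share one weight
    m = weights["m_post"] if postmenopausal else weights["m_pre"]
    return u * m * ca125_u_per_ml


# Hypothetical patient: no ultrasound features, postmenopausal, CA-125 of 100 U/ml.
for variant in WEIGHTS:
    print(variant, risk_index(variant, 0, True, 100.0))
```

For this hypothetical patient the original RMI is 0, while RMI2 and RMI3 give 400 and 300, because the later variants assign U = 1 rather than 0 when no ultrasound feature is present.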

   Figure 19. Performance of model of Tailor et al. (1999)209 in development set (4) and subsequent validation studies

Key to Figure 19: 1 = Mol et al., 2001;207 2 = Valentin et al., 2001;185 3 = Aslam et al., 2000;203 4 = Tailor et al., 1999;209 5 = Aslam et al., 2000204

Tailor and subsequent replications. Tailor et al.209 reported a scoring system based on an artificial neural network method that was based on a small population of 67 women total, 15 of whom had malignancies. Unlike the RMI family of systems described above, this system did not include CA-125, but considered age, menopausal status, and a variety of ultrasound morphological features and Doppler indices. While this system reported using 52 cases as a training set and 15 cases as a test set, the performance of the system was reported only for the study population as a whole: sensitivity 0.93 (95% CI, 0.81 to 1.0) and specificity 1.0 (0.94 to 1.0). Subsequently four studies have replicated this system showing markedly poorer diagnostic performance (Figure 19) when applied to separate populations, consistent with over-fitting in the initial model development.185, 203, 204, 207

Table 22

Performance of other scoring systems at initial derivation and subsequent replication in another population

| Initial description | Subsequent validation | Sensitivity (95% CI): initial development | Sensitivity (95% CI): replication | Specificity (95% CI): initial estimate | Specificity (95% CI): replication |
|---|---|---|---|---|---|
| Timmerman LR1 (ref. 210) | Valentin 2001 (ref. 185) | 0.87 (0.79 to 0.97) | 0.62 (0.44 to 0.80) | 0.92 (0.87 to 0.97) | 0.79 (0.68 to 0.90) |
| Timmerman AAN1 (ref. 178) | Mol et al. 2001 (ref. 207) | 0.94 (0.87 to 1.0) | 0.90 (0.79 to 1.0) | 0.90 (0.85 to 0.96) | 0.60 (0.52 to 0.68) |
| Timmerman AAN2 (ref. 178) | Mol et al. 2001 (ref. 207) | 0.96 (0.90 to 1.0) | 0.90 (0.79 to 1.0) | 0.94 (0.89 to 0.98) | 0.46 (0.38 to 0.54) |
| Timmerman LR2 (ref. 178) | Mol et al. 2001 (ref. 207) | 0.96 (0.90 to 1.0) | 0.90 (0.79 to 1.0) | 0.86 (0.79 to 0.92) | 0.56 (0.48 to 0.64) |
| Jacobs 1993 (ref. 212) | Mol et al. 2001 (ref. 207) | 0.85 (0.74 to 0.96) | 0.90 (0.79 to 1.0) | 0.97 (0.94 to 1.0) | 0.61 (0.53 to 0.69) |

Twenty other scoring systems have been described, none of which has been as extensively replicated as the systems described above. Five of these other scoring systems have been validated in one other population as shown in Table 22; each of the systems was based on ultrasound morphology, CA-125, Doppler, and menopausal status. The models were: Timmerman LR1,178, 210 Timmerman AAN1,178, 207 Timmerman AAN2,178, 207 Timmerman LR2,178, 207 and Jacobs 1993.207, 212

   Figure 20. Performance of various other scoring systems in development and validation studies in separate populations

Arrows indicate change in performance estimate from development (start of arrow) to validation (end of arrow) for paired studies of each scoring system.

Key to Figure 20: 1–4 = Timmerman et al., 1999;210 5–8 = Mol et al., 2001;207 9 = Jacobs et al., 1993;212 10 = Valentin et al., 2001185

In each case, the initial diagnostic performance described by the system significantly degrades on replication in another population (Figure 20).
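The size of this degradation can be summarized from the point estimates in Table 22. The sketch below is illustrative: the Youden index (sensitivity + specificity − 1) is a summary measure added here for convenience and is not used in the report, and confidence intervals are ignored.

```python
# Point estimates from Table 22:
# name -> ((initial sensitivity, initial specificity),
#          (replication sensitivity, replication specificity))
TABLE_22 = {
    "Timmerman LR1":  ((0.87, 0.92), (0.62, 0.79)),
    "Timmerman AAN1": ((0.94, 0.90), (0.90, 0.60)),
    "Timmerman AAN2": ((0.96, 0.94), (0.90, 0.46)),
    "Timmerman LR2":  ((0.96, 0.86), (0.90, 0.56)),
    "Jacobs 1993":    ((0.85, 0.97), (0.90, 0.61)),
}


def youden(sensitivity: float, specificity: float) -> float:
    """Youden index: 0 for an uninformative test, 1 for a perfect one."""
    return sensitivity + specificity - 1.0


youden_change = {
    name: youden(*replication) - youden(*initial)
    for name, (initial, replication) in TABLE_22.items()
}

for name, delta in youden_change.items():
    print(f"{name}: change in Youden index {delta:+.2f}")
```

On these point estimates the combined index falls for all five systems. Most of the loss is in specificity; for Jacobs 1993 the replication sensitivity is slightly higher than the initial estimate, while specificity is much lower.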

   Figure 21. Performance of various other unvalidated models

Key to Figure 21: 1 = Twickler et al., 1999;48 2–3 = Biagiotti et al., 1999;205 4 = Torres et al., 2002;181 5 = Schutter et al., 1998;63 6 = Balbi et al., 2001;53 7 = Roman et al., 199742

Ten additional systems were described in seven reports.42, 48, 53, 63, 181, 203, 205 Most of these studies used logistic regression or artificial neural network modeling methods to derive a new model. One used bootstrap validation techniques,205 but none was validated in another study population. One of these studies203 reported on newly fitted logistic regression models created by forcing variables that were included in previously described scoring systems.178, 209, 213 Aslam et al.204 constructed three separate models based on each possible pairwise combination of the three previously described models. The diagnostic performance of these miscellaneous unvalidated models is shown in Figure 21.

Thirteen further reports describe the diagnostic performance of simple rules for combining single tests or single modalities into a decision rule.42, 51, 52, 62, 63, 66, 86, 97, 103, 105, 135, 138, 169 None of these criteria has been validated in another population. Each of these studies used dichotomous rules for two or more tests (or modalities) and then combined them using a simple rule like “malignant if any test positive” (Boolean or) or “malignant if all tests positive” (Boolean and). Some of the studies reported diagnostic performance of several different simple rules.

Twelve of these studies used ultrasound and CA-125, five incorporated physical exam, two included other serum tumor markers42, 63 and one used age over 50 years.138

   Figure 22. Performance of unvalidated simple combination rules in postmenopausal women only

Key to Figure 22: 1–4 = Maggino et al., 1994;135 5–8 = Schutter et al., 1998;63 9–11 = Schutter et al., 1994;62 12–13 = Strigini et al., 1996;169 14–17 = Guerriero et al., 1998;103 18–19 = Guerriero et al., 2002105

Six of these studies reported results for postmenopausal women separately: in three studies, the entire study population was postmenopausal62, 63, 135 while three studies reported diagnostic performance for the postmenopausal subgroup separately.103, 105, 169 The diagnostic performance of 18 simple combination rules in these six studies is shown in Figure 22.

   Figure 23. Performance of unvalidated simple combination rules in mixed pre- and postmenopausal populations

Key to Figure 23: 1–2 = Andolf et al., 1990;52 3–4 = Adonakis et al., 1996;51 5–9 = Mancuso et al., 2004;138 10 = Gadducci et al., 1988;97 11–12 = Chou et al., 1994;86 13 = Alcazar et al., 1999;66 14–15 = Roman et al., 1997;42 16–17 = Strigini et al., 1996169

In contrast, the diagnostic performances of 17 simple combination rules in studies that include both premenopausal and postmenopausal women in the study population are shown in Figure 23.

The results show a wide range of sensitivity and specificity. This variation reflects differences in decision thresholds (e.g., CA-125 > 35 U/ml versus CA-125 > 65 U/ml) and in the rules for combining tests (e.g., use of Boolean or versus and when combining results of two or more tests).
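How the choice of threshold and combining rule moves sensitivity and specificity can be shown on a toy data set. The sketch below is purely illustrative: the ten patients are invented, and the function names are hypothetical.

```python
# Each hypothetical patient: (CA-125 in U/ml, ultrasound positive?, malignant?)
PATIENTS = [
    (20, True, True), (80, True, True), (50, False, True), (300, True, True),
    (15, False, False), (40, False, False), (10, True, False),
    (25, False, False), (70, True, False), (30, False, False),
]


def evaluate(rule: str, ca125_threshold: float):
    """Return (sensitivity, specificity) of a two-test combination rule."""
    tp = fn = tn = fp = 0
    for ca125, ultrasound_positive, malignant in PATIENTS:
        ca125_positive = ca125 > ca125_threshold
        if rule == "or":      # malignant if any test positive
            test_positive = ca125_positive or ultrasound_positive
        elif rule == "and":   # malignant if all tests positive
            test_positive = ca125_positive and ultrasound_positive
        else:
            raise ValueError("rule must be 'or' or 'and'")
        if malignant:
            tp += test_positive
            fn += not test_positive
        else:
            fp += test_positive
            tn += not test_positive
    return tp / (tp + fn), tn / (tn + fp)


for rule in ("or", "and"):
    for threshold in (35, 65):
        sens, spec = evaluate(rule, threshold)
        print(f"{rule:>3} rule, CA-125 > {threshold}: "
              f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy example the “or” rule is more sensitive and less specific than the “and” rule, and raising the CA-125 threshold under the “or” rule trades sensitivity for specificity.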

Discussion

No scoring systems were both developed and validated expressly for evaluating adnexal masses in postmenopausal women. Existing scoring systems that have been validated have all been developed in mixed pre- and postmenopausal populations. Those scoring systems that have been described in populations of postmenopausal women were neither rigorously developed (they consist of simple combination rules) nor validated in other populations.

The highest demonstrated specificity obtained with these scoring systems appears to be in the range of 90 to 95 percent and, at this range of specificity, the sensitivity appears to be in the range of 65 to 80 percent. However, as suggested by the performance in the few populations of postmenopausal women studied, the same degree of sensitivity and specificity is unlikely to be achievable in postmenopausal women. Reliable estimates of the diagnostic performance of scoring systems in this group cannot be determined from these studies.

This review of scoring systems demonstrates several important limitations of predictive models and has important implications for the clinical usefulness of these models and for future research in this area. First, validation in an external population is critical to obtain accurate estimates of diagnostic performance, because all modeling techniques lead to overestimation of diagnostic performance in the data from which the model was derived. This overestimation is clearly demonstrated by comparing the development and validation studies described for RMI, Tailor, and other scoring systems (Figures 17–20). Second, the studies described here suffer from being relatively small for modeling; reliable variable selection and parameter estimation require at least 10 to 15 cases (in this case, ovarian malignancies) for every term selected in a predictive model. Few, if any, met this statistical rule of thumb. This limitation is particularly apparent in the case of the Tailor model, where subsequent studies demonstrated a high degree of overestimation by the original model. Third, these studies used populations that were identified following referral for surgery in most cases, after some filtering had already occurred. Furthermore, these studies failed to describe the initial presentation (symptomatic or asymptomatic, palpable or non-palpable mass) of women eventually enrolled. Thus, the applicability of these studies to women in primary care, where an adnexal mass is often first noted, is uncertain.
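The rule of thumb of 10 to 15 cases per model term translates directly into a ceiling on model complexity. The following minimal sketch applies it to the Tailor data set described earlier (15 malignancies among 67 women); the function name is hypothetical.

```python
def max_model_terms(n_cases: int, cases_per_term: int) -> int:
    """Largest number of model terms supported by the rule of thumb."""
    return n_cases // cases_per_term


TAILOR_MALIGNANCIES = 15  # malignancies in the Tailor et al. data set

for cases_per_term in (10, 15):
    terms = max_model_terms(TAILOR_MALIGNANCIES, cases_per_term)
    print(f"{cases_per_term} cases per term -> at most {terms} term(s)")
```

Under either end of the rule of thumb, 15 malignancies support only a single model term, whereas the Tailor system considered age, menopausal status, and several ultrasound and Doppler variables.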

Question 5: Monitoring Women with Suspected Benign Masses

Question 5 is: Among women with suspected benign masses on initial investigation, what are the sensitivity and specificity of monitoring with periodic CA-125 and/or interval ultrasound examinations for detecting malignant masses? How does the interval of testing/definition of change affect sensitivity and predictive value?

Approach

For each study we sought to identify a population of patients with a screening abnormality which was “probably benign” and which the authors felt did not meet criteria for immediate surgical intervention. We then attempted to define the outcomes of further testing in the defined population, including the results of subsequent testing and final clinical outcome as defined by a pathology report or extended clinical followup. The interpretation of results is limited by the narrow scope of Question 5. Specifically, it is often difficult to identify a subgroup of patients with a screening abnormality which could be defined as a “suspected benign lesion” within larger screening studies. Often, results are not stratified with respect to these sub-populations, making it difficult to calculate sensitivity and specificity of the followup testing. In addition, by definition, it is also difficult to estimate the “sensitivity” of a followup regimen. We assumed that this refers to detection of cancer as part of the followup regimen, and that women with cancer diagnosed outside of the followup were “false negatives.”

Results

Table 23

Studies of followup regimens for benign-appearing lesions

| Study | Population | N | Followup interval | Length of followup | Loss to followup | True/false positives detected during followup | Cancers missed |
|---|---|---|---|---|---|---|---|
| Population-based studies (followup of “benign” masses identified in screening) | | | | | | | |
| Menon et al., 2001 (ref. 145) | Followup of scans considered “equivocal” | 17 | “Equivocal” scans followed every 6 weeks until clearly normal or abnormal; normal scans followed with CA-125 every 3 months | Median 6.8 years | Not reported | 1 cancer/5 benign lesions | 0 (1 within 6 weeks of initial test, before first followup scan) |
| Modesitt et al., 2003 (ref. 40) | Followup of simple cysts < 10 cm | 2,763 | TVUS every 3–6 months for simple cysts | Mean 6.3 years | Not reported | 7 cancers/0 benign lesions | 3 cancers, none developed in the original cyst |
| Schincaglia et al., 1994 (ref. 217) | Followup of post-menopausal ovaries > 9 cc, or with simple cyst | 347 | If cyst: followed with ultrasound every 6 months; if change, referred; others: referral if unchanged at 3 and 6 months | “At least” 1 year | Not reported, but all had “at least 1 year” | 2 cancers/96 benign lesions | None in 249 not referred |
| Kurjak et al., 1994 (ref. 127) | Followup of post-menopausal women with simple cyst (> 2.5 cm but < 5 cm, resistive index ≥ 0.41) | 88 (of 404 with simple cysts) | Repeat ultrasound every 6 months | 6 months | Not reported | 1/17 with benign lesions | 0 |
| Castillo et al., 2004 (ref. 214) | Followup of post-menopausal women with simple cyst < 10 cm | 215 | Repeat ultrasound and CA-125 in 3 months, then every 6 months | Median 27 months | 30.6% | 0/44 benign masses | 1 |
| Case series (clinical history prior to identification of mass not routinely described) | | | | | | | |
| Valentin and Akrawi, 2002 (ref. 218) | Followup of post-menopausal women with low score on ultrasound malignancy risk scale | 162 | Repeat ultrasound 3, 6, 9, and 12 months, then every 12 months; test positive if increase in size or cyst more complex | Median 3 years | 0 (cancer and mortality tracked through registry) | 0 cancers/7 patients underwent surgery for change | 0 |
| Maggino et al., 1994 (ref. 135) | Followup of post-menopausal women with cysts < 5 cm, thin wall, no septae, no free fluid | 45 | Details on followup strategy not reported | Not reported | 4.4% | 0/0 | 0 |
| Levine et al., 1992 (ref. 216) | Followup of voluntary screening of post-menopausal women with unilocular simple cyst | 32 | Repeat ultrasound every 3 months × 1 year, then every 6 months | “Over half at least one year” | 22.2% | 0/0 | 0 |
| Goldstein et al., 1989 (ref. 215) | Followup of post-menopausal women with simple cysts ≤ 5 cm | 16 | Repeat ultrasound (abdominal) | Mean 29 months | 6 (12% of original 48) | 0/2 with benign lesions | 0 |

Abbreviations: CA-125 = cancer antigen 125; TVUS = transvaginal ultrasound

We identified nine articles meeting the criteria for this question;40, 127, 135, 145, 214–218 these are summarized in Table 23, with details in Evidence Table 5 (Appendix D *). Five were population-based screening studies of asymptomatic, postmenopausal patients without known ovarian masses;40, 127, 145, 214, 217 one was a voluntary screening program.216 All addressed to some degree the use of interval ultrasound for detecting malignant masses. Although several used CA-125 as part of their followup, none reported any results based on the use of interval CA-125 in a population with adnexal lesions. None addressed the effects of changing the interval of testing on sensitivity and predictive value; the disparate nature of the studies prohibited any inferences on the effect of test interval on sensitivity.

Menon et al.145 performed a large prospective screening study of 22,000 postmenopausal women older than 45 years. Initial screening consisted of CA-125; patients with CA-125 ≥ 30 underwent endovaginal ultrasound evaluation. Results were interpreted as normal (ovarian volume < 8.8 ml/normal morphology), equivocal (volume < 8.8 ml, abnormal morphology), or abnormal (volume ≥ 8.8 ml). Normal morphology was defined as uniform hypoechogenicity and smooth outline. Abnormal morphology was defined as simple cysts or complex lesions. Patients with normal scans were triaged to repeat CA-125 every 3 months for a year and subsequently returned to yearly screening; median followup was 6.8 years, with loss to followup not reported. Patients with abnormal scans were referred to a gynecologist for consideration of surgical intervention. Patients with equivocal scans were triaged to repeat ultrasound at 6-week intervals until a scan could be classified as normal or abnormal. Of 741 patients who were triaged to ultrasound, 20 (2.7 percent) index cancers were identified. We focused on the group of patients with “equivocal” scans who were triaged to interval testing in an attempt to answer the study question. There were 17 equivocal scans. Of these, nine had simple cysts which were followed and did not result in a cancer diagnosis (true negatives). One patient died of pneumonia prior to her first repeat ultrasound, and one died of advanced ovarian cancer prior to her first repeat ultrasound; this cancer death could possibly be considered a false negative for the followup strategy, although it could also be considered a false negative from the original study since the death occurred within 6 weeks of the initial scan. Six patients were scheduled for surgery following an equivocal scan, presumably due to abnormal followup ultrasound. One of these had ovarian cancer (true positive), and the other five had benign disease (false positive). 
Because the number of equivocal scans was so small, and because the classification “equivocal” does not necessarily imply that the lesions were felt to be “suspected benign” as designated in Question 5, it is not possible to calculate the sensitivity and specificity of prolonged monitoring strategies using this study. The authors do not draw any conclusions regarding the appropriateness of interval testing.

Modesitt et al.40 performed a large screening study of 15,106 asymptomatic women at least 50 years old without a history of ovarian cancer. Patients were screened with TVUS. Criteria for abnormality were ovarian volume > 10 ml and any morphologic abnormality, including simple or complex cysts. Patients with abnormal TVUS were triaged to repeat TVUS in 4 to 6 weeks, with Doppler flow ultrasound, CA-125 level, and tumor morphology indexing performed at the second visit. Patients with simple unilocular cysts which were considered likely benign were triaged to repeat TVUS every 3 to 6 months. Mean followup was 6.3 years. A total of 2,763 women were diagnosed with 3,259 unilocular cysts. Spontaneous resolution occurred in 2,261 (69.4 percent) of these cysts. Ten patients subsequently developed ovarian cancer. Seven of these had additional abnormal areas which subsequently developed on TVUS (considered true positives because they were identified by interval testing). Two developed ovarian cancer after the cyst in question had resolved on sonographic followup (these might be considered false negatives). One patient developed cancer in the ovary opposite the cyst being followed (this might also be considered a false negative). Calculated on a per-patient basis, the sensitivity and specificity of followup testing in the population with a simple unilocular ovarian cyst are 70 percent (95% CI, 41.6 to 98.4 percent) and 100 percent (99.9 to 100 percent), respectively. Because none of the unilocular cysts subsequently developed into a cancer, the sensitivity and specificity improve to 100 percent (57.1 to 100 percent) and 100 percent (99.9 to 100 percent), respectively, when calculated on a per-lesion basis. Followup time is a major strength of this study. The authors conclude that unilocular ovarian cysts are associated with a very low risk of malignancy and can be safely followed with serial ultrasound.
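The per-patient sensitivity estimate can be reproduced from the counts given above. The sketch below is a hedged reconstruction: the report does not state how its confidence intervals were computed, and the normal-approximation (Wald) interval is used here only because it happens to reproduce the reported 41.6 to 98.4 percent. The variable names are hypothetical.

```python
import math


def proportion_with_wald_ci(successes: int, total: int, z: float = 1.96):
    """Return (estimate, lower, upper) using the normal-approximation interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Per-patient counts as read from the paragraph above and Table 23.
women_followed = 2763
cancers_total = 10
cancers_flagged_by_followup = 7   # true positives; the other 3 are false negatives
false_positives = 0

sens, sens_lo, sens_hi = proportion_with_wald_ci(
    cancers_flagged_by_followup, cancers_total)

women_without_cancer = women_followed - cancers_total
specificity = (women_without_cancer - false_positives) / women_without_cancer

print(f"sensitivity {sens:.1%} (95% CI {sens_lo:.1%} to {sens_hi:.1%})")
print(f"specificity {specificity:.1%}")
```

The Wald interval collapses to zero width when the observed proportion is exactly 100 percent, so the reported specificity interval must rest on a different method; none is assumed here.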

Schincaglia et al.217 performed a screening study of 3,541 asymptomatic postmenopausal patients. All patients underwent transabdominal ultrasonography with assessment of ovarian volume and morphology. Patients were divided into four groups based on the results of the initial ultrasound. All patients with ovarian volume > 15 ml (Group 4) were referred for repeat “level II” ultrasonography for morphologic assessment and fine needle aspiration (FNA) when feasible. Patients with ovarian volume between 9 and 15 ml (Group 3) were triaged to followup ultrasound at 3 and 6 months. Patients with ovarian volume < 9 ml but a cystic appearance (Group 2) were triaged to followup ultrasound in 6 months. Patients with ovarian volume < 9 ml and homogeneous appearance (Group 1) were considered negative and had no further intervention. Clinical followup at 1 year and pathology results if surgery was performed were considered the reference standard. A total of 283 patients (Groups 2 and 3) were deemed appropriate for followup using repeat ultrasound at 3- to 6-month intervals without the need for immediate referral for FNA/surgery. Of these 283 patients, 34 subsequently developed concerning ultrasound findings and were referred for a level II scan and/or possible FNA. The clinical results of this group of 34 are not given separately. Of the 249 who had non-concerning followup scans, none developed cancer with followup of at least 1 year (“true negatives”). Therefore, the specificity of ultrasound followup is 100 percent (95% CI, 98.8 to 100 percent) for patients with an initial abnormal but “probably benign” ultrasound. Sensitivity within this group cannot be calculated with the information given in the publication. The ability to answer Question 5 would be enhanced if specific outcomes of each of the four groups defined by the authors had been given.
The study was also limited by the fairly short followup interval and the lack of prior or concurrent validation of the ultrasonographic groups defined in the study.
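When every observation falls in one category, as with the 249 women above who had no cancer on followup, a lower confidence bound can still be computed. The sketch below uses the one-sided exact (Clopper-Pearson) bound, which for n successes out of n reduces to alpha^(1/n). This method is an assumption: the report does not say how the interval was derived, although the result agrees with the reported lower limit of 98.8 percent.

```python
def exact_lower_bound_all_successes(n: int, alpha: float = 0.05) -> float:
    """One-sided exact lower bound for a proportion when all n trials succeed.

    Solves p**n = alpha for p, i.e. the smallest true proportion under
    which observing n out of n would still have probability alpha.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return alpha ** (1.0 / n)


lower = exact_lower_bound_all_successes(249)
print(f"lower bound for 249/249: {lower:.1%}")
```

The bound tightens as the number of observations grows, which is why the same 100 percent point estimate carries more weight in a larger series.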

Kurjak et al.127 screened 5,013 women 40 years old or older (30.6% postmenopausal), of whom 404 had simple cysts with a diameter between 2.5 and 5 cm and a resistive index greater than or equal to 0.41. These women received a followup scan in 6 months. Investigators reported the results of 88 women for whom the 6-month scan results were available. The definition of change prompting further diagnosis was not explicitly described. Of the 88 women, 18 ultimately underwent surgery based on the findings at 6 months, with one cancer detected and 17 benign lesions. Results stratified by menopausal status were not provided. This study was limited by lack of details on clinical decision rules and by short followup.

Castillo et al.214 screened 8,794 postmenopausal women; 215 had simple unilocular cysts less than 10 cm in diameter. Twelve percent of these masses were asymptomatic. These women underwent repeat ultrasound and CA-125 in 3 months, with subsequent followup studies every 6 months. Progression was defined as an increase in diameter of 1 cm or more, regression as a decrease of 1 cm, and resolution as absence of the cyst at 2 consecutive visits 6 months apart. Median followup was 27 months. There was one interval ovarian cancer between studies, and 44 women had benign masses removed. Although this study was among the highest quality studies in terms of reporting of relevant data, it is limited by the relatively small size and the high loss to followup (30.6%).

The remaining four studies135, 215, 216, 218 were all small (fewer than 200 patients) and of variable quality (Table 23). None reported any interval cancers in patients receiving followup, or cancers detected during followup. The study of Valentin et al.218 was notable for length of followup (median 3 years) and complete ascertainment of followup status using Swedish cancer and death registry data.

Discussion

There are limited data available to support a global definition of “probably benign” ovarian lesions or to support a specific method of interval testing to identify ovarian malignancy among patients in whom such lesions have been identified. For the most part, studies are limited by small size, variable length of followup, variable definitions of significant change and thresholds for intervention, and methods for followup.

The question of how best to define and evaluate “sensitivity” of followup regimens is a difficult one. Several factors need to be considered. First, interval cancers presenting between the initial study and the first followup visit may well be considered false negatives of the initial study; alternatively, they may reflect a too-long followup interval. Second, given the lack of data on the natural history of ovarian cancer, it is unclear whether cancers developing in benign-appearing lesions represent subclinical cancers present at the time of the initial diagnosis, or new cancers representing malignant transformation of a benign cyst. If the latter, then the ultimate success of any followup regimen may depend as much on the natural history of a given malignancy as on the sensitivity and specificity of the tests used for followup. Finally, cancers identified during followup should ideally have high survival rates (although whether such high survival rates would reflect the efficacy of the followup or the biology of cancers which are associated with benign-appearing cysts is unclear). The number of cancers identified in the reviewed studies was too small to draw any inferences about relative survival.

Overall, only two interval cancers occurred during followup in the studies identified (one prior to the first followup scan), and 10 cancers were identified during followup. As noted, an additional three cancers developed after resolution of a cyst or in the contralateral ovary. The highest quality study40 provides good evidence for the safety of prolonged followup with interval TVUS at 3- to 6-month intervals for patients with unilocular ovarian cysts of up to 10 cm in diameter, and the findings of the other studies are consistent with this conclusion.

Question 6: Surgical Morbidity and Mortality

Question 6 is: Among women with adnexal masses, what are the morbidity and mortality from diagnostic surgery (laparoscopy or laparotomy)? At what point does the risk of surgery outweigh the risk of detecting malignancy?

Approach

We searched the literature for studies that reported the morbidity and mortality of surgical management of adnexal masses. We also used the Nationwide Inpatient Sample (NIS) discharge database, maintained by the Agency for Healthcare Research and Quality (AHRQ), to obtain estimates of morbidity and mortality associated with diagnostic laparoscopy or exploratory laparotomy for a range of diagnoses associated with adnexal masses. The NIS is limited to inpatient procedures and does not cover ambulatory surgical centers, where some adnexal masses are likely to be managed, especially those masses thought to have a low likelihood of cancer. In addition to surgical complications, we also examined articles that provided data on the test characteristics of frozen section pathologic diagnosis. Especially in the setting of minimally invasive procedures, false negative results on frozen section might lead to suboptimal surgical management and delayed therapy, while false positive results might lead to more extensive surgery than necessary, with possible implications for increased surgical morbidity and effects on ovarian function.

Results of Literature Search and Screening

We identified 24 articles that met our inclusion criteria;32, 37, 41, 219–239 these are summarized in Evidence Table 6 (Appendix D *). Twenty-two articles reported on the morbidity and mortality of surgical management of adnexal masses.32, 37, 41, 219–234, 237–239 In addition, two of the included articles reported on the sensitivity and specificity of frozen section;220, 236 false negative frozen section results could lead to inadequate surgical management and delayed treatment, while false positive results could lead to more extensive surgery than necessary. Finally, one of the included articles addressed the potential effect of conservative surgery for removal of an ovarian cyst resulting from endometriosis (endometrioma) on subsequent fertility.235

Methodological Quality of Included Studies

Size of population. None of the papers provided a description of the referral base; two32, 37 were limited to gynecologic oncology practices. Lack of information on the referral base prevents assessment of generalizability. Since all of these studies were performed in centers experienced in laparoscopic surgery, the generalizability may well be limited.

Number of cases. Five studies had fewer than 200 cases, with correspondingly wide confidence intervals for reported event rates. Two studies had larger numbers of cases, 683230 and 757.219 However, the study by Marana et al.230 was limited to women under 40.

Patient selection. None of the studies reported how patients were referred to the surgical practices. All provided criteria for laparoscopic management of masses, based on various indicators of high or low risk of malignancy. We found two trials where patients were randomized to laparoscopy or laparotomy,224, 225 but randomization methods were not well described.

Application of reference standard. In this sense, “reference standard” refers to the method by which a complication was diagnosed. Only two studies described followup beyond 8 weeks, but they did not detail whether all patients underwent similar followup protocols.

Results

There were three deaths in one study of 146 patients (all undergoing laparoscopy), and none in any of the other studies (a total of 5,599 patients). Pooling all patients, the mortality was 0.05 percent, with a 95% CI of 0.01 to 0.17 percent.
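
The pooled mortality estimate is simple arithmetic on the counts given above. The sketch below assumes that 5,599 is the pooled denominator and uses a Wilson score interval; the report does not state how its interval was derived, so the Wilson method is a substitute chosen for illustration and its bounds need not match the published 0.01 to 0.17 percent exactly.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (two-sided, ~95% for z=1.96)."""
    p = events / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


deaths, patients = 3, 5599  # counts as reported in the text; pooled denominator assumed
rate = deaths / patients
low, high = wilson_interval(deaths, patients)
print(f"Pooled mortality: {100 * rate:.2f} percent")
print(f"Wilson interval: {100 * low:.3f} to {100 * high:.3f} percent")
```

With so few events, the choice of interval method noticeably affects the bounds, which is one reason the upper limit of the mortality rate remains uncertain.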

Table 24

Morbidity in series of patients undergoing surgical management of adnexal masses
| Study | N | Patient population | Complication rate (95% CI) | Notes |
| --- | --- | --- | --- | --- |
| Randomized trials of laparoscopy versus laparotomy | | | | |
| Deckardt et al., 1994224 | 192 | 22.4% laparoscopy, 26.4% laparotomy postmenopausal | Laparotomy: 30.3% (21.8 to 42.3%); Laparoscopy: 11.2% (6.8 to 18.7%) | “Randomized,” but some differences between two arms; 3.5% conversion |
| Fanfani et al., 2004225 | 100 | Laparoscopy: 10% postmenopausal; Laparotomy: 20% postmenopausal | Laparotomy: 6% (1.8 to 17.5%); Laparoscopy: 0% (0 to 10.6%) | No malignancies; small sample size |
| Non-randomized comparisons | | | | |
| Hidlebaugh et al., 1997227 | 405 | 199 laparoscopy; 206 laparotomy; 20.2% postmenopausal | Laparotomy: 27.2% (21.8 to 34.0%); Laparoscopy: 2.5% (1.0 to 6.0%) | Selection criteria for laparoscopy not defined; potential other risk factors for complications not described |
| Yuen et al., 1997239 | 110 | Laparotomy: 6% postmenopausal; Laparoscopy: 3.8% postmenopausal | Laparotomy: 28% (18.5 to 43.1%); Laparoscopy: 9.6% (4.2 to 21.8%) | Difference between complication rates attributable to higher number of postoperative complications in laparotomy group |
| Carley et al., 2002221 | 106 | 44 laparotomy; 62 laparoscopy; menopausal status not reported | Laparotomy: 4.6% (0.7 to 16.7%); Laparoscopy: 0% (0 to 8.6%) | |
| Chapron et al., 1997222 | 186 | 121 laparoscopy, 65 laparotomy; 43% postmenopausal | Laparotomy: 15.4% (8.9 to 27.0%); Laparoscopy: 8.3% (4.6 to 15.0%) | Patients with high suspicion of malignancy went directly to laparotomy; results not analyzed by “intention to treat” (19 of laparotomy patients started as laparoscopy); 13.6% of laparoscopies converted to laparotomy |
| Laparoscopy only | | | | |
| Childers et al., 199632 | 138 | Not described in detail; age range 9–91 | 10.1% (6.2 to 16.7%) | Length of followup not given for benign cases; gynecologic oncology service; results not stratified by age or menopausal status; 8.0% conversion to laparotomy |
| Canis et al., 1994219 | 757 | 11.4% postmenopausal | 1.1% (0.53 to 2.1%) | Mean followup 42 months (range 3–153 months) |
| Dottino et al., 199937 | 160 | 53% postmenopausal | 7.5% (4.3 to 12.9%) | Gynecologic oncology service |
| Marana et al., 2004230 | 620 | All less than 40 years old | 0.9% (0.4 to 2.0%) | Mean followup 30 months; single surgeon |
| Parker et al., 199441 | 61 | 100% postmenopausal | 3.3% (0.4 to 12.3%) | Masses “presumptively benign” based on imaging, exam, clinical history; 4.9% conversion |
| Sadik et al., 1999232 | 220 | 3.2% postmenopausal | 0.9% (0.06 to 3.5%) | Malignant masses “excluded from study” |
| Chi et al., 2004223 | 146 | Menopausal status not reported; median age 54 | Mortality: 2.5% (0.5 to 6.3%); Morbidity: 22.1% (15.1 to 32.7%) | Clinical history not described; not clear if other conditions besides adnexal mass included |
| Havrilesky et al., 2003226 | 396 | 37.2% postmenopausal | Laparoscopy: 8.3% (6.0 to 11.6%) | Risk of complication increased with concurrent hysterectomy |
| Lok et al., 2000228 | 513 | 5.5% postmenopausal | Laparoscopy: 13.3% (10.6 to 16.6%) | No malignancies; 75.% symptomatic |
| Mann and Reich, 1992229 | 44 | 100% postmenopausal | Laparoscopy: 4.6% (0.7 to 16.7%) | 1/44 had cancer |
| Parker and Proietto, 1997231 | 86 | Menopausal status not reported | Laparoscopy: 22.1% (15.1 to 32.7%) | 1/86 had cancer |
| Serur et al., 2001233 | 100 | 49% postmenopausal | Laparoscopy: 10% (5.6 to 19.0%) | - |
| Shalev et al., 1994234 | 55 | 100% postmenopausal | Laparoscopy: 10.9% (5.2 to 22.9%) | - |
| Tarik and Fehmi, 2004237 | 1478 | Menopausal status not reported (but mean age 30) | Laparoscopy: diagnostic procedures 1.8% (0.8 to 3.8%); minor procedures 1.4% (0.8 to 2.3%) | Proportion with preoperative diagnosis of adnexal mass not reported |
| Van Herendael et al., 1995238 | 121 | Menopausal status not reported | Laparoscopy: 1.7% (0.1 to 6.4%) | - |

Abbreviation: CI = confidence interval

Table 24 shows the results from individual studies. The two randomized studies224, 225 both showed lower morbidity with laparoscopy compared to laparotomy, although only one of them224 had sufficient power to show a statistically significant difference. Although the study of Deckardt and colleagues224 was randomized, there were substantial differences in the procedures performed in each arm. Laparoscopy patients tended to undergo more conservative procedures: they were significantly more likely to have cystectomy (60.0 vs. 20.2 percent), less likely to have oophorectomy (0.8 vs. 20.2 percent), and less likely to have bilateral salpingo-oophorectomy (4.0 vs. 21.4 percent). Both studies where laparoscopy was directly compared to laparotomy showed increased complication rates (primarily postoperative complications) among the laparotomy patients. The four non-randomized studies all showed higher morbidity rates with laparotomy, but there were substantial differences in patient selection criteria.

In series of laparoscopy cases, morbidity rates ranged from 0.9 percent to 22.1 percent (Table 24); series differed widely in their selection criteria for laparoscopic management of the mass. Few stratified results based on menopausal status; in some cases, postmenopausal patients were explicitly excluded. In one study where multivariate analysis was performed to assess for risks of morbidity, performance of additional procedures (hysterectomy) significantly increased the risk of morbidity, while a history of hysterectomy increased the likelihood of conversion to laparotomy (presumably because of increased technical difficulty secondary to postoperative adhesions).226

Nationwide Inpatient Sample

Table 25

Estimated U.S. discharges for exploratory laparotomy and diagnostic laparoscopy with discharge diagnoses consistent with adnexal mass, with mortality and complication rates
| Diagnosis / procedure | Number of discharges | Died | Mortality rate | Complications | Complication rate |
| --- | --- | --- | --- | --- | --- |
| OVARIAN CANCER | 118,042 | 7099 | 6.0% | 515 | 0.4% |
| Laparoscopy (no ovarian procedures) | 222 | 5 | 2.3% | 0 | 0.0% |
| Laparoscopy plus conservative ovarian procedure | 27 | 0 | 0.0% | 0 | 0.0% |
| Laparoscopy plus oophorectomy | 16 | 0 | 0.0% | 0 | 0.0% |
| Laparotomy (no ovarian procedure) | 566 | 11 | 1.9% | 5 | 0.9% |
| Laparotomy plus conservative ovarian procedure | 68 | 0 | 0.0% | 0 | 0.0% |
| Laparotomy plus oophorectomy | 36 | 0 | 0.0% | 0 | 0.0% |
| OTHER ADNEXAL CANCER | 780 | 15 | 1.9% | 5 | 0.6% |
| Laparoscopy (no ovarian procedures) | 0 | 0 | 0.0% | 0 | 0.0% |
| Laparoscopy plus conservative ovarian procedure | 0 | 0 | 0.0% | 0 | 0.0% |
| Laparoscopy plus oophorectomy | 0 | 0 | 0.0% | 0 | |
| Laparotomy (no ovarian procedure) | 15 | 15 | 100.0% | 0 | 0.0% |
| Laparotomy plus conservative ovarian procedure | 0 | 0 | 0% | 0 | |
| Laparotomy plus oophorectomy | 0 | 0 | 0% | 0 | |
| BENIGN OVARIAN NEOPLASM | 145,024 | 255 | 0.2% | 964 | 0.7% |
| Laparoscopy (no ovarian procedures) | 1,560 | 5 | 0.3% | 35 | 2.2% |
| Laparoscopy plus conservative ovarian procedure | 75 | 0 | 0.0% | 0 | 0.0% |
| Laparoscopy plus oophorectomy | 24 | 0 | 0.0% | 0 | 0.0% |
| Laparotomy (no ovarian procedure) | 700 | 4 | 0.6% | 16 | 2.3% |
| Laparotomy plus conservative ovarian procedure | 72 | 0 | 0.0% | 0 | 0.0% |
| Laparotomy plus oophorectomy | 31 | 0 | 0.0% | 0 | 0.0% |
| PELVIC MASS | 13,625 | 30 | 0.2% | 60 | 0.4% |
| Laparoscopy (no ovarian procedures) | | | | | |
| Laparoscopy plus conservative ovarian procedure | 41 | | | | |
| Laparoscopy plus oophorectomy | | | | | |
| Laparotomy (no ovarian procedure) | 35 | 5 | 14.3% | | |
| Laparotomy plus conservative ovarian procedure | | | | | |
| Laparotomy plus oophorectomy | | | | | |
| OVARIAN CYSTS | 474,485 | 376 | 0.08% | 3045 | 0.6% |
| Laparoscopy (no ovarian procedures) | 5,508 | | 0.00% | 65 | 1.2% |
| Laparoscopy plus conservative ovarian procedure | 274 | | 0.00% | | 0.0% |
| Laparoscopy plus oophorectomy | 173 | | 0.00% | | 0.0% |
| Laparotomy (no ovarian procedure) | 1,429 | 79 | 5.53% | 19 | 1.3% |
| Laparotomy plus conservative ovarian procedure | 99 | | 0.00% | | 0.0% |
| Laparotomy plus oophorectomy | 86 | | 0.00% | | 0.0% |
| PARA-OVARIAN CYST | 21,807 | 5 | 0.0% | 92 | 0.4% |
| Laparoscopy (no ovarian procedures) | 271 | | 0.0% | | 0.0% |
| Laparoscopy plus conservative ovarian procedure | 24 | 0 | 0.0% | 0 | 0.0% |
| Laparoscopy plus oophorectomy | 9 | 0 | 0.0% | 0 | 0.0% |
| Laparotomy (no ovarian procedure) | 61 | 10 | 16.4% | 0 | 0.0% |
| Laparotomy plus conservative ovarian procedure | 5 | 0 | 0.0% | 0 | 0.0% |
| Laparotomy plus oophorectomy | 5 | | 0.0% | | 0.0% |
| PELVIC INFLAMMATORY DISEASE | 430,027 | 439 | 0.1% | 4793 | 1.1% |
| Laparoscopy (no ovarian procedures) | 7,184 | 4 | 0.1% | 150 | 2.1% |
| Laparoscopy plus conservative ovarian procedure | 445 | 0 | 0.0% | 9 | 2.0% |
| Laparoscopy plus oophorectomy | 159 | 0 | 0.0% | 5 | 3.1% |
| Laparotomy (no ovarian procedure) | 2,129 | 10 | 0.5% | 53 | 2.5% |
| Laparotomy plus conservative ovarian procedure | 160 | 0 | 0.0% | 0 | 0.0% |
| Laparotomy plus oophorectomy | 45 | 0 | 0.0% | 0 | 0.0% |
| NORMAL PELVIS | 108.8 | 0 | 0 | 0 | 0 |

Table 25 shows the estimated numbers of discharges in the United States in 2000-2001 under each diagnostic class and procedure (standard errors not shown for simplicity). The results illustrate the difficulty in using discharge data to attempt to estimate morbidity and mortality rates for surgical procedures. Both morbidity and mortality are highest for cancer diagnoses, but there is no way to determine the extent to which the underlying disease process contributed to either complications or death; for example, “exploratory laparotomy” or “diagnostic laparoscopy” in many ovarian cancer patients likely represents a “second-look” procedure done to determine response to chemotherapy. Outcomes of these procedures are not relevant to estimating the risks of a primary diagnostic procedure. The laparoscopies that are included in the NIS are likely not representative of all laparoscopies for adnexal masses; since the NIS does not capture surgeries performed at ambulatory surgery centers, the cases within the NIS may represent those for which surgeons had a higher index of suspicion of malignancy, or anticipated higher technical difficulty. Another major limitation is the inability to distinguish between the initial indication for surgery and the final diagnosis. Finally, in order to try to eliminate confounding by additional procedures, we excluded cases in which hysterectomy was performed - however, because hysterectomy is part of the standard initial surgical treatment of ovarian cancer, many cases of initial management are excluded.

Other Outcomes

We identified two studies that reported on the sensitivity and specificity of intraoperative frozen section done to determine pathologic diagnosis.220, 236 They reported similar findings. Both studies defined low malignant potential tumors as cancer. Canis et al.220 reported a sensitivity of 92.2 percent and a specificity of 92.2 percent in 141 women (29.8 percent postmenopausal, 35 percent with cancer or low malignant potential tumors). Tangjitgamol et al.236 estimated similar values, with a reported sensitivity of 91.3 percent and specificity of 93.3 percent in 212 women (menopausal status not reported, cancer prevalence 77 percent). Defining low malignant potential tumors as benign decreased sensitivity in both cases.

We identified only one article that addressed the potential impact of surgical management of benign cysts on fertility. Somigliana et al.235 followed 32 women who received ovarian stimulation after removal of an endometriotic cyst. The mean number of follicles observed in the ovary where the cyst had been removed (2.0 ± 1.5) was significantly lower than in the contralateral ovary (4.2 ± 2.5), suggesting that the surgical procedure may have led to decreased ovarian reserve. An alternative explanation is that the cyst itself had an adverse effect on ovarian reserve.

Discussion

Ideally, reports of adverse outcomes of diagnostic surgery for adnexal masses would be divided into four separate categories, based on preoperative symptoms and postoperative findings: (1) women with symptomatic masses which ultimately proved malignant; (2) women with symptomatic masses which ultimately proved benign; (3) women with asymptomatic masses which ultimately proved malignant; and (4) women with asymptomatic masses which ultimately proved benign. For the first three groups, the operative procedure could be considered appropriate even in the event of morbidity, since there is some benefit (primary surgical therapy for malignancy, or management of symptomatic nonmalignant adnexal pathology) to be gained from surgical diagnosis and treatment. For women with asymptomatic benign masses, there are theoretical benefits for detecting some benign masses, including (1) prevention of subsequent malignant transformation, (2) avoidance of rupture which, for certain benign masses (endometrioma and mature teratoma) could cause acute symptoms, (3) easier surgical management, with fewer complications, compared to management of a larger symptomatic mass, (4) avoidance of torsion (twisting of the adnexa) and emergent surgical management and (5) avoidance of effects on fertility, either from the underlying condition itself or from more extensive surgery for a larger mass. However, we did not identify any evidence for these benefits; the probabilities of these potential benefits also would differ widely depending on the underlying pathology and natural history of a particular mass, the patient's age and reproductive status, and other comorbidities.

Unfortunately, neither the literature nor available discharge data allow estimates of the probabilities of outcomes based on initial presentation. In the case of the literature, this is because of a lack of reporting of the clinical path by which patients come to undergo surgery. In the case of discharge data, it is because of the inherent limitations of the International Classification of Diseases, Ninth Revision (ICD-9) coding. Even if more recent data on ambulatory surgery were available, it would still be limited by coding.

Summary

Mortality for laparoscopic management of adnexal masses at experienced centers appears to be quite low, although the upper bound of this low rate is unclear.

Patient characteristics that determine risk of morbidity are unclear, although the need for more extensive procedures appears to increase the risk. Laparoscopy may have a lower morbidity rate than laparotomy, but this appears to be due, at least in part, to different patient selection criteria and surgical procedures performed.

Two small studies suggest that the false negative rate of intraoperative frozen section diagnosis is approximately eight percent, and the false positive rate is approximately five to seven percent. Whether either type of false result has a significant impact on outcome is unclear.

There is suggestive evidence that removal of a cyst in premenopausal women may affect ovarian reserve, potentially affecting fertility and/or age of menopause, but the underlying pathologic process may also play a role. More data are needed.

There are no data to allow estimation of the risks of a diagnostic procedure in the patient with an asymptomatic mass, or to assess the benefits of surgery in that patient compared to the risk of malignancy.

Question 7: Modeling Diagnostic Strategies

Question 7 is: What are the estimated trade-offs resulting from various strategies for evaluation of the adnexal mass?

Approach

A formal decision analytic approach is often quite helpful for synthesizing evidence coming from a range of sources, of varying quality, and of varying degrees of precision in estimates. Such models are also helpful in identifying which parameters are most important, in order to prioritize future research. Ideally, the underlying natural history of the disease in question can be modeled, with the impact of subsequent clinical interventions estimated based on test characteristics, effectiveness and morbidity from treatment, patient preferences, etc. The effect of varying both the incidence and natural history of ovarian cancer based on risk factors such as genetic predisposition can also be taken into account if adequate data are available. For example, such models have been quite helpful in exploring the impact of various interventions for cervical cancer prevention.240 Data from currently ongoing trials of ovarian cancer screening will also provide valuable information on natural history.241

Because of the methodological limitations of the literature on management of adnexal masses cited in the previous sections, a formal decision analysis does not seem appropriate at this time. In order to illustrate some of the key areas for future research, we did a simple estimate of the expected outcomes of several strategies for evaluation of the adnexal mass based on the findings of this review. Because models will ultimately need to incorporate the natural history of ovarian cancer, either to evaluate screening or to estimate the consequences of false negative diagnoses, we also performed a literature review of existing models of the natural history of ovarian cancer and the impact of screening or testing and developed an alternative model.

Predicting Outcomes of Management Strategies

Table 26

Estimates for key model parameters
| Parameter | Value |
| --- | --- |
| Prevalence of adnexal masses in postmenopausal women (Question 1) | Malignant: 0.1%; Benign: 1.0% |
| Sensitivity of the pelvic examination to detect adnexal masses (Question 2) | 0.45 |
| Specificity of the pelvic examination to detect adnexal masses (Question 2) | 0.90 |
| Sensitivity of combined morphology and Doppler (Question 3) | 0.86 |
| Specificity of combined morphology and Doppler (Question 3) | 0.91 |
| Sensitivity of CA-125 in postmenopausal women (Question 3) | 0.80 |
| Specificity of CA-125 in postmenopausal women (Question 3) | 0.87 |
| Sensitivity of RMI (Question 4) | 0.74 |
| Specificity of RMI (Question 4) | 0.91 |

Note: We assumed that the specificity of ultrasound for determining the absence of pelvic mass was 100%.

Abbreviation: RMI = Risk of Malignancy Index

As an example, we can consider one clinical scenario: an asymptomatic postmenopausal woman undergoing a routine bimanual pelvic examination. If the bimanual examination is abnormal, she can undergo a variety of additional tests. We compared several strategies: (1) performing CA-125 only, then operating on women with values greater than or equal to 35; (2) performing an ultrasound with Doppler velocimetry (the strategy with highest sensitivity and specificity in our review) and operating on women with positive results both morphologically and with Doppler; (3) performing CA-125, then performing ultrasound with Doppler on women with elevated CA-125 and operating on women with positive ultrasounds; (4) performing ultrasound with Doppler first, then performing CA-125 on women with positive ultrasound results, and operating on women with elevated CA-125; and (5) performing both ultrasound and CA-125 and combining these results with menopausal status to use the RMI (discussed in detail under Question 4); women with RMI scores above the threshold undergo surgery. Strategies 3 and 4 are examples of serial testing; Strategy 5 is an example of parallel testing. Table 26 provides estimates for key parameters based on the previous chapters; estimates for test characteristics are taken from the point estimates of the pooled random-effects models.

At the initial pelvic examination, the probability of detecting a mass equals:

Probability of true positive test + Probability of false positive test, or (Prevalence of mass * Test Sensitivity) + (1-Prevalence of mass)*(1-Test Specificity)

Similarly, the probability of a negative test equals:

Probability of true negative + Probability of false negative, or (1-Prevalence)*Test Specificity + Prevalence*(1-Sensitivity)

At the time of ultrasound, the “prevalence” of disease is equal to the positive predictive value of the preceding test, the pelvic examination, or:

Probability of true positive pelvic/(Probability of true positive pelvic + Probability of false positive pelvic)

Similar calculations were made for each test or combination of tests.
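
As a worked illustration of these calculations, the sketch below applies the pelvic examination parameters from Table 26 to the hypothetical cohort of 100,000 postmenopausal women. It assumes, as the cohort tables appear to, that the examination's sensitivity applies equally to benign and malignant masses; the function and variable names are illustrative and not taken from the report.

```python
def apply_test(groups: dict, sensitivity: float, specificity: float):
    """Split a cohort into test-positive and test-negative counts.

    'cancer' and 'benign' both count as true masses (detected with the
    given sensitivity); 'normal' women test positive at 1 - specificity.
    """
    positive = {
        "cancer": groups["cancer"] * sensitivity,
        "benign": groups["benign"] * sensitivity,
        "normal": groups["normal"] * (1 - specificity),
    }
    negative = {key: groups[key] - positive[key] for key in groups}
    return positive, negative


# Baseline cohort: 0.1% malignant, 1.0% benign masses (Table 26)
cohort = {"cancer": 100, "benign": 1000, "normal": 98900}
pos, neg = apply_test(cohort, sensitivity=0.45, specificity=0.90)

total_pos = sum(pos.values())
total_neg = sum(neg.values())
ppv_malignancy = pos["cancer"] / total_pos        # "prevalence" at the next test
npv_malignancy = 1 - neg["cancer"] / total_neg    # cancer-free share of negatives

print(f"Positive exams: {total_pos:,.0f} ({100 * total_pos / 100000:.1f}% of cohort)")
print(f"Cancers among positive exams: {pos['cancer']:.0f} ({100 * ppv_malignancy:.1f}%)")
print(f"Cancers missed by the exam: {neg['cancer']:.0f}")
print(f"Negative predictive value for malignancy: {100 * npv_malignancy:.2f}%")
```

The same function can be applied again to the test-positive group with the characteristics of the next test, which is how each subsequent step of a strategy would be evaluated.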

Table 27

Predicted outcomes of ultrasound plus Doppler or CA-125 testing to determine surgical management in a hypothetical cohort of 100,000 postmenopausal women*
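| Test result | Cancer | Benign mass | Normal | Total | Prevalence of malignancy among test positives | Proportion of all tests positive | Missed cancers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline cases | 100 | 1,000 | 98,900 | 100,000 | 0.1% | | |
| Pelvic exam | | | | | | | |
| Positive | 45 | 450 | 9,890 | 10,385 | | | |
| Negative | 55 | 550 | 89,010 | 89,615 | 0.4% | 10.4% | 55 |
| STRATEGY: CA-125 only | | | | | | | |
| CA-125 | | | | | | | |
| Positive | 36 | 59 | 1,286 | 1,380 | | | |
| Negative | 9 | 392 | 8,604 | 9,005 | 2.6% | 15.3% | 9 |
| Surgery | | | | | | | |
| Positive | 36 | | | 36 | | | |
| Negative | | 59 | 1,286 | 1,345 | 2.6% | | |
| STRATEGY: Morphology/Doppler only | | | | | | | |
| Morphology/Doppler | | | | | | | |
| Positive | 39 | 41 | 0 | 80 | | | |
| Negative | 6 | 410 | 9,890 | 10,306 | 49.8% | 0.8% | 6 |
| Surgery | | | | | | | |
| Positive | 39 | 0 | 0 | 39 | | | |
| Negative | 0 | 41 | 0 | 41 | 49.8% | | |

The first four numeric columns give the underlying pathology.

* Some numbers may not add up correctly because of rounding.

Abbreviation: CA-125 = cancer antigen 125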

Table 27 shows the predicted outcomes (in terms of detected and missed cancers) of testing with either ultrasound morphology with Doppler velocimetry or CA-125 alone in a hypothetical cohort of 100,000 postmenopausal women.

Table 28

Predicted outcomes of serial testing or parallel testing with ultrasound plus Doppler or CA-125 testing to determine surgical management in a hypothetical cohort of 100,000 postmenopausal women*
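| Test result | Cancer | Benign mass | Normal | Total | Prevalence of malignancy among positive tests | Proportion of all tests positive | Missed cancers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline cases | 100 | 1,000 | 98,900 | 100,000 | 0.1% | | |
| Pelvic exam | | | | | | | |
| Positive | 45 | 450 | 9,890 | 10,385 | | | |
| Negative | 55 | 550 | 89,010 | 89,615 | 0.4% | 10.4% | 55 |
| STRATEGY: CA-125, followed by morphology/Doppler | | | | | | | |
| CA-125 | | | | | | | |
| Positive | 36 | 59 | 1,286 | 1,380 | | | |
| Negative | 9 | 392 | 8,604 | 9,005 | 2.6% | 13.2% | 9 |
| Morphology/Doppler | | | | | | | |
| Positive | 32 | 5 | 0 | 37 | | | |
| Negative | 4 | 53 | 1,286 | 1,343 | 86.5% | 2.7% | 4 |
| Surgery | | | | | | | |
| Positive | 32 | 0 | 0 | 32 | | | |
| Negative | 0 | 5 | 0 | 5 | | | |
| STRATEGY: Morphology/Doppler followed by CA-125 | | | | | | | |
| Morphology/Doppler | | | | | | | |
| Positive | 40 | 41 | 0 | 81 | | | |
| Negative | 5 | 410 | 9,890 | 10,305 | 49.4% | 0.8% | 5 |
| CA-125 | | | | | | | |
| Positive | 32 | 5 | 0 | 37 | | | |
| Negative | 8 | 35 | 0 | 43 | 86.5% | 45.7% | 8 |
| Surgery | | | | | | | |
| Positive | 32 | | | 32 | | | |
| Negative | 0 | 5 | 0 | 5 | 86.5% | | |
| STRATEGY: RMI (morphology + CA-125 + menopausal status) | | | | | | | |
| RMI | | | | | | | |
| Positive | 33 | 41 | 0 | 74 | | | |
| Negative | 12 | 410 | 9,890 | 10,312 | 44.6% | 13.2% | 9 |
| Surgery | | | | | | | |
| Positive | 33 | 0 | 0 | 33 | | | |
| Negative | 0 | 41 | 0 | 41 | 44.6% | | |

The first four numeric columns give the underlying pathology.

* Some numbers may not add up correctly because of rounding.

Abbreviations: CA-125 = cancer antigen 125; RMI = Risk of Malignancy Index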

Table 28 shows the predicted outcomes of the serial and parallel testing strategies.

Table 29

Estimated numbers of tests, missed cancers, and surgeries for each strategy
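| Outcome | Single test: CA-125 | Single test: Ultrasound* | Serial tests: CA-125 followed by ultrasound | Serial tests: Ultrasound followed by CA-125 | Parallel tests: Risk of Malignancy Index |
| --- | --- | --- | --- | --- | --- |
| Total tests | 10,385 | 10,385 | 11,765 | 10,466 | 20,770 |
| Total missed cancers | 9 | 9 | 13 | 13 | 9 |
| Total surgeries | 1,380 | 80 | 37 | 37 | 74 |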

Abbreviation: CA-125 = cancer antigen 125

Table 29 summarizes the outcomes of the five strategies in terms of total number of tests, total number of missed cancers, and total number of surgeries.
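
The test and surgery counts can be approximated directly from the Table 26 parameters. The following is a rough sketch, assuming that every woman with a positive pelvic examination enters the strategy, that test results are conditionally independent, that ultrasound and RMI are never positive in women without a mass (the 100 percent specificity assumption noted under Table 26), and that RMI counts as two tests. Names are illustrative. Because the report appears to round at each step, the unrounded results here agree with Table 29 only to within about two.

```python
# Women with a positive pelvic examination in the 100,000-woman cohort
EXAM_POSITIVE = {"cancer": 45.0, "benign": 450.0, "normal": 9890.0}

# (sensitivity for cancer, positive rate in benign masses, positive rate in normal pelvis)
CA125 = (0.80, 1 - 0.87, 1 - 0.87)
ULTRASOUND = (0.86, 1 - 0.91, 0.0)  # assumed never positive without a mass
RMI = (0.74, 1 - 0.91, 0.0)         # same assumption as ultrasound


def positives(group, test):
    """Expected number of positive results in each pathology group."""
    sens, rate_benign, rate_normal = test
    return {
        "cancer": group["cancer"] * sens,
        "benign": group["benign"] * rate_benign,
        "normal": group["normal"] * rate_normal,
    }


def total(group):
    return sum(group.values())


def single(test):
    """One test; all positives go to surgery. Returns (tests, surgeries)."""
    return total(EXAM_POSITIVE), total(positives(EXAM_POSITIVE, test))


def serial(first, second):
    """Second test only for positives on the first; surgery if both positive."""
    first_pos = positives(EXAM_POSITIVE, first)
    return (total(EXAM_POSITIVE) + total(first_pos),
            total(positives(first_pos, second)))


def parallel(test, component_tests=2):
    """Every woman gets all component tests; surgery if the combined index is positive."""
    return (component_tests * total(EXAM_POSITIVE),
            total(positives(EXAM_POSITIVE, test)))


results = {
    "CA-125 only": single(CA125),
    "Ultrasound only": single(ULTRASOUND),
    "CA-125 then ultrasound": serial(CA125, ULTRASOUND),
    "Ultrasound then CA-125": serial(ULTRASOUND, CA125),
    "RMI": parallel(RMI),
}

for name, (tests, surgeries) in results.items():
    print(f"{name:<24} tests={tests:>9,.0f}  surgeries={surgeries:>7,.0f}")
```

Keeping each strategy as a small function makes the structural difference explicit: serial testing reduces both tests and surgeries by restricting the second test to first-test positives, while parallel testing applies every test to every woman.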

Table 30

Estimated numbers of tests, missed cancers, and surgeries for each strategy in 1,100 women with known adnexal mass and underlying prevalence of ovarian cancer 10%
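| Outcome | Single test: CA-125 | Single test: Ultrasound* | Serial tests: CA-125 followed by ultrasound | Serial tests: Ultrasound followed by CA-125 | Parallel tests: Risk of Malignancy Index |
| --- | --- | --- | --- | --- | --- |
| Total tests | 1,100 | 1,100 | 1,317 | 1,287 | 2,200 |
| Total missed cancers | 20 | 15 | 32 | 32 | 26 |
| Total surgeries | 197 | 184 | 90 | 90 | 155 |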

Abbreviation: CA-125 = cancer antigen 125

Table 30 illustrates the effect of increasing the prevalence of cancer (for example, in symptomatic women with a known mass) from 0.1 percent to 10 percent. The size of the cohort here is 1,100 women with masses (the same as in the screening cohort).
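
The effect of prevalence on predictive values can be seen with a single test. The sketch below uses the CA-125 point estimates from Table 26 (sensitivity 0.80, specificity 0.87) and assumes, for illustration only, that these test characteristics stay fixed as prevalence changes; the function name is hypothetical.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


SENSITIVITY, SPECIFICITY = 0.80, 0.87  # CA-125, Table 26

ppv_low, npv_low = predictive_values(0.001, SENSITIVITY, SPECIFICITY)   # 0.1 percent
ppv_high, npv_high = predictive_values(0.10, SENSITIVITY, SPECIFICITY)  # 10 percent

for label, ppv, npv in [("0.1%", ppv_low, npv_low), ("10%", ppv_high, npv_high)]:
    print(f"Prevalence {label:>4}: PPV = {100 * ppv:.1f}%, NPV = {100 * npv:.2f}%")
```

With identical test characteristics, raising prevalence increases the positive predictive value and lowers the negative predictive value, so more cancers are missed among test-negative women.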

This simple “model” illustrates several key points:

  • The prevalence of malignancy increases as additional diagnostic tests are performed. This is certainly clinically appropriate and reflects the effects of sequential testing strategies. However, specificity and, to some extent, sensitivity for many of the tests reviewed appear to vary with underlying disease prevalence. Thus, estimates for test characteristics calculated at one point in the clinical pathway may not be appropriate for other points.

  • Despite a poor sensitivity of 45 percent, the negative predictive value of a negative pelvic examination for malignancy is quite high (99.94 percent). The reassurance provided by a “normal” exam reflects the low prevalence of ovarian cancer in the population, rather than the intrinsic value of the test in discriminating benign from malignant. Conversely, the positive predictive value is only 0.4 percent, despite a specificity of 90 percent.

  • In order to judge the trade-offs between detection of masses that ultimately prove malignant compared with the risks of diagnostic surgery, we would need better estimates of morbidity and mortality within different diagnostic categories - as noted previously, these do not exist.

  • The most “efficient” strategy in terms of number of tests and surgeries is serial testing with ultrasound followed by CA-125; however, this results in four more missed cancers than parallel testing using the RMI. Parallel testing, in turn, doubles the number of tests to be performed. A formal cost-effectiveness analysis would require significantly more data on test characteristics and ovarian cancer natural history, as well as the morbidity of surgical management.

  • Modeling parallel testing beyond the data in scoring systems is difficult. Besides requiring specific assumptions about how results that were positive for one test but negative for another would be managed, one would also need to know if the sensitivity and specificity of each test were independent or correlated in some way. For example, it seems likely that the sensitivity of both ultrasound and CA-125 would be greater for larger masses than for smaller masses.

  • In scenarios where the likelihood of ovarian cancer is higher, the negative predictive value of any diagnostic strategy will decrease (more missed cancers), and the positive predictive value will increase (the proportion of surgical cases where cancer is found will be higher). This is seen clearly by comparing Tables 29 and 30. The number of women with adnexal masses is the same, but the number of missed cancers is substantially higher with each strategy.

  • In addition, for any screening modality, there needs to be evidence that early detection reduces disease-specific morbidity and mortality. Data on the natural history of ovarian cancer are also needed in order to judge the impact of false negative results. Since data from large trials are still pending, one way to examine the potential impact of different testing strategies for both initial screening and subsequent testing is through the development of simulation models.

    We next review published models of the natural history of ovarian cancer.

Models of Ovarian Cancer: Literature Review

Four articles were identified from the literature review that used modeling to determine the effectiveness and cost-effectiveness of different screening strategies for the detection and treatment of ovarian cancer. These are described in Evidence Table 7 (Appendix D *). Studies were included if they were directly relevant to Question 7,242-244 or provided natural history information that could be used in the construction of a model.245

Table 31

Key input parameters and ranges for the Schapira model242
| Parameter | Value | Range | Source |
| --- | --- | --- | --- |
| Prevalence of ovarian cancer | 28.6/100,000 | 20 to 200/100,000 | NCI monograph No. 41; 1975 |
| Percentage of prevalent cases in early stage | 50% | 20 to 80% | Assumed |
| Percentage of early stage disease diagnosed clinically | 25% | 20 to 80% | ACS Cancer Statistics 1990 |
| Sensitivity of CA-125 and TVUS (combined) for early stage disease | 45% | 20 to 80% | Literature review |
| Sensitivity of CA-125 and TVUS (combined) for late stage disease | 81% | 50 to 100% | Literature review |
| Specificity of CA-125 and TVUS | 99.95% | 96 to 100% | Literature review |
| Probability of post-laparotomy death | 0.23% | 0 to 10% | National Halothane Study JAMA 1966 |

Abbreviations: ACS = American Cancer Society; NCI = National Cancer Institute; TVUS = transvaginal ultrasound

Schapira et al.242 conducted a decision analysis comparing a one-time screen using transvaginal sonography and CA-125, either alone or in combination, to determine life-expectancy gains in a cohort of 40-year-old women in the United States. In the model, women could be either screened or unscreened. Probabilities were derived from the literature for the following: prevalence of disease in 40-year-old women, percentage of early stage disease, clinical detection of disease, sensitivity of the screening test for detection of early stage disease, specificity of the screening test, and the mortality rate associated with diagnostic laparotomy. Life expectancy was calculated for women who had no disease, early stage disease, and late stage disease. Table 31 summarizes key input parameters and ranges.

Assumptions in the model were that survival time for clinically and screen-detected early stage disease is the same; morbidity and mortality rates associated with diagnostic laparotomy are the same for people with and without the disease; and there is no benefit gained from identifying benign disease.

The results of the analysis suggested that use of the combined strategy would result in a gain in life expectancy (compared to no screening) of one third of a day of life. No screening was preferred if the postoperative mortality rate exceeded 7.32 percent or the specificity of the test was less than 98.53 percent. An additional analysis, examining the use of testing for women aged 65 and older, suggested that the combined strategy would result in an average gain in life expectancy of approximately three quarters of a day of life.
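The sensitivity of this result to specificity and postoperative mortality can be illustrated with simple arithmetic on the base-case values in Table 31. The sketch below is not the Schapira decision tree and does not estimate life expectancy; it only counts expected screen-detected cancers, false positives, and laparotomy deaths among false positives per 100,000 women screened once. It assumes that every positive screen leads to laparotomy and that the stage-specific sensitivities can be weighted by the share of prevalent cases in each stage. The function and key names are illustrative, not taken from the original article.

```python
def one_time_screen_outcomes(cohort=100_000,
                             prevalence=28.6 / 100_000,
                             frac_early=0.50,
                             sens_early=0.45,
                             sens_late=0.81,
                             specificity=0.9995,
                             p_death_laparotomy=0.0023):
    """Expected counts for a single combined CA-125/TVUS screen.

    Defaults are the base-case values from Table 31. Assumption: all
    screen-positive women undergo diagnostic laparotomy.
    """
    cases = cohort * prevalence
    non_cases = cohort - cases

    # Overall sensitivity, weighting early and late stage disease
    # by their assumed share of prevalent cases.
    sensitivity = frac_early * sens_early + (1 - frac_early) * sens_late

    detected = cases * sensitivity
    false_positives = non_cases * (1 - specificity)
    deaths_fp = false_positives * p_death_laparotomy

    return {
        "detected_cancers": detected,
        "false_positives": false_positives,
        "deaths_among_false_positives": deaths_fp,
    }


if __name__ == "__main__":
    for name, value in one_time_screen_outcomes().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, roughly 50 women without cancer would undergo surgery for every 18 cancers detected, which is consistent with the finding that the preferred strategy changes with small shifts in specificity or surgical mortality.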

Skates and Singer244 developed a stochastic model to evaluate screening with CA-125. Key assumptions in this model included:

  • Stepwise progression from Stage I through Stage II through Stage III through Stage IV;

  • Log-normal distributions of progression rates;

  • Stage at clinical detection independent of duration of disease;

  • The coefficient of variation in stage length is constant across all stages;

  • Estimates for the duration of each stage were provided by two gynecologic oncologists.

In the base case, the model predicted that screening would save 3.4 years of life per detected case; of note, estimates for the gains in life expectancy for the entire population undergoing screening were not provided.

Table 32

Key assumptions and data sources used to derive values for parameters in the Urban model243

| Parameter | Estimate | Source |
| --- | --- | --- |
| Stage of ovarian cancer | FIGO | |
| Relative stage lengths (relative to Stage 1) | 0.5, 1.333, 0.333 | Skates et al.244; FIGO stages III and IV assumed to comprise SEER stage 3 |
| Geometric mean stage length in months | 9; 4.5, 12 and 3 months | |
| Probability of disease during testing period | 0.0121 | Not stated |
| Probability of age at clinical detection | Age 50–54: 0.153; Age 55–59: 0.184; Age 60–64: 0.202; Age 65–69: 0.179; Age 70–74: 0.150; Age 75–80: 0.132 | SEER |
| Probability of stage at clinical detection | Stage 1: 0.223; Stage 2: 0.153; Stage 3/4: 0.624 | SEER |
| Point in stage at clinical detection | 0.5 of stage length | Assumed |
| Stage length distribution | Log normal (9, 4.5) | Assumed |
| TVUS sensitivity | 100% | van Nagell, CA 1990; van Nagell, CA 1991 |
| TVUS - false positive | 1st screen 0.019; 2nd screen 0.010; 3rd screen 0.006 | Campbell, Br J Obstet and Gynecol 1990 |
| CA-125 level in cases | Refer to page 254 of article for formula | Skates et al.244; Einhorn, Proc Am Soc Clin Oncol 1990 |
| % of false negatives for CA-125 | 5% | Assumption |
| CA-125 specificity in women with false positive TVUS | 0.85 | Bast, Gyn Onc 1985; Woolas, JNCI, 1993 |
| Return to normal life-expectancy post-diagnosis | 15 years | Assumption |
| Probability of death in surgery among false-positive | 0.001 | Assumption |

Abbreviations: FIGO = International Federation of Gynecology and Obstetrics; SEER = Surveillance, Epidemiology, and End Results; TVUS = transvaginal ultrasound

Urban et al.243 examined the cost-effectiveness of screening using CA-125 and TVUS alone or in combination in a cohort of 1 million 50-year-old women using a stochastic simulation model, building on the model of Skates and Singer (Table 32). Screening and case ascertainment were assumed to occur over a 3-year period; women were assumed to be followed until age 80 or death.

Six screening strategies using TVUS and CA-125, either alone or in combination, were compared: annual TVUS; annual CA-125, elevated (a level of 35 U/ml used as the threshold for referral to laparoscopy); annual CA-125, rising or elevated (rising defined as a CA-125 level that has doubled since the last screen); annual TVUS conditional on rising or elevated CA-125; 6-month TVUS conditional on rising or elevated CA-125; and 2-year TVUS conditional on rising or elevated CA-125. Of these, the strategy of annual TVUS conditional on rising or elevated CA-125 was identified as efficient, meaning that it saved an equivalent or greater amount of life at lower cost than the other strategies. The model was especially sensitive to assumptions about the duration of Stage I disease.
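The stage-duration assumptions shared by the Skates and Urban models (stepwise progression, log-normal stage lengths, and a constant coefficient of variation across stages) can be sketched as follows. This is an illustration under stated assumptions, not a reimplementation of either model. The geometric mean stage lengths are those listed in Table 32; the coefficient of variation of 0.5 is a hypothetical value chosen for the example, and the second parameter of the “Log normal (9, 4.5)” entry in Table 32 is not interpreted here.

```python
import math
import random

# Geometric mean stage lengths in months, from Table 32.
GEOMETRIC_MEAN_MONTHS = {"I": 9.0, "II": 4.5, "III": 12.0, "IV": 3.0}


def sigma_from_cv(cv):
    """Log-scale standard deviation for a log-normal with the given
    coefficient of variation: CV^2 = exp(sigma^2) - 1."""
    return math.sqrt(math.log(1.0 + cv * cv))


def sample_stage_durations(rng, cv=0.5):
    """Draw one duration (months) per stage for a single simulated tumor.

    A constant CV across stages implies the same sigma for every stage.
    cv=0.5 is a hypothetical value, not one reported in the source.
    """
    sigma = sigma_from_cv(cv)
    return {
        stage: rng.lognormvariate(math.log(gm), sigma)
        for stage, gm in GEOMETRIC_MEAN_MONTHS.items()
    }


def relative_stage_lengths():
    """Stage II-IV geometric means relative to Stage I."""
    base = GEOMETRIC_MEAN_MONTHS["I"]
    return [round(GEOMETRIC_MEAN_MONTHS[s] / base, 3) for s in ("II", "III", "IV")]


if __name__ == "__main__":
    rng = random.Random(0)
    print(relative_stage_lengths())  # ratios of the Table 32 geometric means
    print(sample_stage_durations(rng))
```

Because the median of a log-normal distribution equals its geometric mean, the sampled Stage I durations centre on 9 months; the spread around that value, and hence the window in which screening could detect Stage I disease, depends entirely on the assumed coefficient of variation.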

Discussion

Secondary prevention of cancer mortality through screening has been remarkably effective in the case of cervical cancer. Mammography has also reduced mortality from breast cancer, although there remains some controversy. To date, although survival in early stage ovarian cancer is considerably higher than survival in later stage cancers, trials of screening have not yet demonstrated reduction in disease-specific mortality. Although the relative lack of effectiveness of ovarian cancer screening to date may reflect the lack of an appropriate test, differences in the biology and natural history of the different cancers may also account for some of the difference.

As outlined in a recent review,246 the most critical criterion for an effective screening strategy for ovarian cancer is that there is a period of sufficient duration during the development of ovarian cancer when the cancer is detectable but still at a stage when treatment effectiveness is high. As shown in the two most sophisticated models reviewed, estimates of the effectiveness of screening are highly dependent on assumptions about the duration of Stage I cancer. The basis for the estimates used in both models was the opinion of two clinicians; the methods used to derive these estimates were not described.

Cervical cancer is, in the majority of cases, a squamous carcinoma, which spreads primarily through direct extension and secondarily through lymphatic invasion. The most common type of ovarian cancer, on the other hand, is typically an adenocarcinoma, which spreads by dissemination of tumor cells throughout the peritoneal cavity.

   Figure 24. Schematic of Markov or stochastic model of ovarian cancer natural history

One assumption commonly made in the models of ovarian cancer we identified is that ovarian cancer staging represents the natural history. Figure 24 illustrates a simplified schematic model used in all three of the reviewed papers. Patients can develop ovarian cancer, die of other causes, or remain healthy. Those who develop ovarian cancer can present with symptoms or through testing to become an incident case, or remain undetected, and can either remain within the same stage or progress to the next.

   Figure 25. Alternative model of ovarian cancer history

Although this stepwise progression through stages holds for cervical cancer, there is no evidence to suggest that tumors limited to the ovary (Stage I) must necessarily spread first to adjacent pelvic organs (Stage II) prior to spread throughout the peritoneal cavity (Stage III). Although staging systems represent the extent of disease, they are developed to help with prognosis and to allow comparison of treatment effectiveness; there is no explicit assumption that each stage must necessarily be preceded by the next lower one. Figure 25 depicts an alternative model, which allows some Stage I cancers to progress directly to Stage III.

Using the Markov model described in Chapter 2, we performed sensitivity analyses on progression rates and type of progression to determine if this second “model” of progression could result in similar stage distributions to observed data.

   Figure 26. Model predictions of ovarian cancer incidence (black triangles) compared to SEER incidence rates (closed circles)

Figure 26 compares the predicted incidence of ovarian cancer derived from the model with incidence rates reported in the Surveillance, Epidemiology, and End Results (SEER) data set, under the assumption that there was a stepwise progression from Stage I through Stage IV.

Table 33

Inputs and outputs of ovarian cancer models
| | Model 1 (Stage 1 must progress through Stage II) | Model 2 (some Stage I can progress directly through Stage III) | Stage distribution: FIGO (local data from Skates et al.244) | Stage distribution: SEER (1995-2001) |
| --- | --- | --- | --- | --- |
| Parameter estimate | | | | |
| Annual probability of presenting with symptoms: Stage I | 0.095 | 0.1 | | |
| Annual probability of presenting with symptoms: Stage II | 0.095 | 0.15 | | |
| Annual probability of presenting with symptoms: Stage III | 0.7 | 0.9 | | |
| Annual probability of presenting with symptoms: Stage IV | 1 | 1 | | |
| Proportion of Stage I progressing directly to Stage III | 0 | 0.25 | | |
| Model output: stage distribution | | | | |
| FIGO: | | | | |
| Stage I | 19.1% | 19.6% | 25% | |
| Stage II | 8.2% | 9.3% | 8% | |
| Stage III | 54.2% | 65.2% | 52% | |
| Stage IV | 18.6% | 5.9% | 15% | |
| SEER/WHO: | | | | |
| Local | 19.1% | 19.6% | 25% | 19% |
| Regional | 8.2% | 9.3% | 8% | 7% |
| Distant and unstaged | 72.8% | 71.1% | 67% | 75% |

Abbreviations: FIGO = International Federation of Gynecology and Obstetrics; SEER = Surveillance, Epidemiology, and End Results; WHO = World Health Organization

We then allowed a proportion of Stage I cancers to proceed directly to Stage III and calibrated underlying progression rates. Table 33 compares the model input parameters and resulting stage distribution of the two models.

With relatively small changes in the probability of presenting with symptoms, a model that allows 25 percent of Stage I tumors to progress directly to Stage III results in stage distributions similar to observed data and in a lifetime risk of ovarian cancer similar to that of the Urban model.243 In a model with multiple input parameters, a huge number of combinations of parameters can result in similar outputs. Given that estimates of the duration of the different stages of ovarian cancer are based on little empirical data, and that there are no empirical data on the natural history of ovarian cancer, further exploration of the implications for screening, and the evaluation of masses detected through screening, is warranted.
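The structural difference between the two models can be made concrete with a minimal cohort sketch. This is not the Markov model described in Chapter 2: it follows a cohort of newly developed Stage I cancers in annual cycles, ignores death from other causes and detection through testing, and reports only the stage at which cancers present with symptoms. The annual probabilities of presenting with symptoms are the Model 2 values from Table 33; the annual progression probabilities are hypothetical placeholders, because the calibrated rates are not reported, so the output should not be expected to reproduce Table 33.

```python
STAGES = ["I", "II", "III", "IV"]

# Annual probability of presenting with symptoms (Model 2, Table 33).
P_SYMPTOMS_MODEL_2 = {"I": 0.1, "II": 0.15, "III": 0.9, "IV": 1.0}

# Hypothetical annual probability of leaving each stage if still undetected.
HYPOTHETICAL_P_PROGRESS = {"I": 0.4, "II": 0.6, "III": 0.5}


def stage_at_detection(p_symptoms, p_progress, direct_i_to_iii=0.0,
                       max_cycles=200):
    """Distribution of stage at symptomatic presentation.

    direct_i_to_iii is the share of progressing Stage I tumors that skip
    Stage II. Setting it to 0 gives the strictly stepwise model.
    """
    undetected = {"I": 1.0, "II": 0.0, "III": 0.0, "IV": 0.0}
    detected = dict.fromkeys(STAGES, 0.0)

    for _ in range(max_cycles):
        nxt = dict.fromkeys(STAGES, 0.0)
        for s in STAGES:
            mass = undetected[s]
            found = mass * p_symptoms[s]          # presents with symptoms
            detected[s] += found
            remaining = mass - found
            moving = remaining * p_progress.get(s, 0.0)
            nxt[s] += remaining - moving          # stays in the same stage
            if s == "I":
                nxt["III"] += moving * direct_i_to_iii
                nxt["II"] += moving * (1.0 - direct_i_to_iii)
            elif s == "II":
                nxt["III"] += moving
            elif s == "III":
                nxt["IV"] += moving
        undetected = nxt

    total = sum(detected.values())
    return {s: detected[s] / total for s in STAGES}


if __name__ == "__main__":
    for share in (0.0, 0.25):
        dist = stage_at_detection(P_SYMPTOMS_MODEL_2, HYPOTHETICAL_P_PROGRESS,
                                  direct_i_to_iii=share)
        print(share, {s: round(v, 3) for s, v in dist.items()})
```

With all other inputs held fixed, allowing direct progression lowers the Stage II share and leaves the Stage I share unchanged. Recovering a target stage distribution therefore requires recalibrating the other parameters, which is the step described above.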

Summary

The evidence is insufficient to develop a comprehensive model to estimate the relative benefits and risks of different management strategies for evaluating the adnexal mass.

Based on summary estimates of pooled sensitivity and specificity, management strategies that use imaging as the first step for evaluating an adnexal mass detected on examination (as opposed to CA-125) are more efficient, since they exclude false positive results from further examination. Serial testing with imaging followed by CA-125 results in the fewest number of surgeries, but misses more cancers than parallel testing. Parallel testing greatly increases the number of tests required, but results in fewer missed cancers. Additional data are needed to evaluate cost-effectiveness.
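The direction of this trade-off follows from how two tests combine. The sketch below assumes that the two test results are conditionally independent given disease status, which, as noted earlier in this chapter, has not been established for imaging and CA-125. It also treats “parallel” testing as a simple either-positive rule, whereas the parallel strategy evaluated in this report used a scoring system. The sensitivity, specificity, and prevalence values are hypothetical and are not the pooled estimates.

```python
def serial_both_positive(sens_a, spec_a, sens_b, spec_b):
    """Test B is done only after a positive test A; surgery if both positive."""
    sensitivity = sens_a * sens_b
    specificity = 1 - (1 - spec_a) * (1 - spec_b)
    return sensitivity, specificity


def parallel_either_positive(sens_a, spec_a, sens_b, spec_b):
    """Both tests are done on everyone; surgery if either is positive."""
    sensitivity = 1 - (1 - sens_a) * (1 - sens_b)
    specificity = spec_a * spec_b
    return sensitivity, specificity


def expected_counts(n_women, prevalence, sensitivity, specificity):
    """Missed cancers and surgeries when every positive result leads to surgery."""
    cancers = n_women * prevalence
    benign = n_women - cancers
    missed = cancers * (1 - sensitivity)
    surgeries = cancers * sensitivity + benign * (1 - specificity)
    return {"missed_cancers": missed, "surgeries": surgeries}


# Hypothetical (sensitivity, specificity) pairs for illustration only.
IMAGING = (0.85, 0.85)
CA125 = (0.80, 0.75)

if __name__ == "__main__":
    for label, rule in (("serial", serial_both_positive),
                        ("parallel", parallel_either_positive)):
        sens, spec = rule(*IMAGING, *CA125)
        print(label, round(sens, 4), round(spec, 4),
              expected_counts(1000, 0.10, sens, spec))
```

If the tests are positively correlated, for example because both perform better on larger masses, the gain in sensitivity from parallel testing and the gain in specificity from serial testing would both be smaller than this calculation suggests.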

Alternative assumptions about the natural history of ovarian cancer can result in modeled outcomes similar to those of published models; the implications of these assumptions should be explored further.

Chapter 4. Discussion

Limitations of the Report

There are several limitations to this evidence report:

  • We did not review articles published in languages other than English because of a lack of resources for translation. It is possible that this led to the failure to include some relevant articles.

  • For our review of prevalence studies (Question 1), we excluded studies performed outside the United States. Because the report was requested by the Centers for Disease Control and Prevention (CDC) to help with development of their policies and research agenda into ovarian cancer prevention strategies, we focused on U.S. populations and reasoned that the underlying prevalence of different conditions in women with adnexal masses could well differ in potentially important ways due to differences in racial/ethnic distribution and/or environmental exposures. As discussed in Chapter 2, this is supported by wide international variation in the incidence of cancer. Variations in screening, diagnosis, and surgical management could also lead to differences in the prevalence of various conditions among women with adnexal masses. It is possible that this reasoning was incorrect, and that some relevant articles were excluded. However, some non-U.S.-based articles were reviewed for other questions, and the majority shared the same biases as U.S.-based studies (i.e., most were done immediately preoperatively).

  • There was considerable heterogeneity in design and patient populations among studies, and our use of a random-effects model to perform meta-analyses for some questions may have led to inaccurate estimates of pooled sensitivity and specificity. We also did not weight the results by anything other than sample size; it is possible that different results might have been obtained by weighting for study quality, for example.

  • In our review of data from the Nationwide Inpatient Sample, we used only specific International Classification of Diseases, Ninth Revision (ICD-9) “E” class codes to identify complications. A more exhaustive strategy (e.g., identifying procedures not typically performed at the time of diagnostic surgery, identifying blood transfusions through procedure or charge codes, including patients with cancer who underwent hysterectomy) might have revealed more complications,26 but would have required additional assumptions about the original indication for the surgery and the likely potential contribution of different aspects of the procedure to the complication (e.g., hysterectomy vs. oophorectomy).

  • Our exploration of alternative models for the natural history of ovarian cancer did not directly compare estimated outcomes of screening strategies to other models. However, a comprehensive evaluation of screening for ovarian cancer was beyond the scope of this report. We are currently developing the model further to conduct these analyses.

Methodological Issues in the Literature

Description of the Patient Population

The main shortcoming of many of the papers reviewed was a failure to adequately describe the patient population, including the manner in which the adnexal mass was originally detected and subsequent evaluation. In Chapter 1, we described the importance of understanding the clinical presentation of the subjects in studies of management of adnexal masses. Because prevalence directly affects predictive values and may indirectly affect estimates of sensitivity and specificity, the probability that a patient is a true or false positive, or true or false negative, is dependent on the prevalence. In addition, the presence or absence of symptoms can affect the probability that a patient will undergo surgery if test findings indicate a benign mass, since surgery may still be the treatment of choice for the underlying condition. We were disappointed that the overwhelming majority of the studies we reviewed, relevant to all of the questions, did not adequately describe their population, so that the proportions of patients who presented with asymptomatic masses versus those with symptoms could be compared.

To be fair, there is an inherent feasibility issue in studies of diagnostic test accuracy for ovarian cancer: the ideal reference standard is histological confirmation, yet this confirmation requires surgery. Although this is a limitation of all cancer screening tests, the surgery required for a definitive diagnosis of ovarian cancer is more extensive than that for many cancers (for example, cervical, breast, and colon cancer can all be diagnosed without a requirement for general anesthesia). With screening, or early in the diagnostic evaluation, the risks of surgery may be difficult to justify, particularly since the low prevalence of malignancy makes the positive predictive value of tests early in the diagnostic evaluation quite low. From a research ethics perspective, it is certainly reasonable to limit diagnostic test studies to patients already scheduled for surgery. However, readers of these studies should recognize that the prevalence of malignancy will be substantially higher in preoperative patients than in patients at the time of the initial diagnosis of adnexal mass. Because test performance may be affected by prevalence, the outcomes (in terms of true and false test results) may be quite different in these two patient populations.

The same caveats hold for studies of the outcomes of surgery. Morbidity and mortality related to surgical diagnosis are influenced by the underlying diagnosis, as well as the extent of the disease (such as size of the mass, presence of adhesions from the disease process or prior unrelated surgery, or cancer stage). Interpreting surgical outcomes from studies that do not provide relevant clinical information is difficult; at the least, generalizability is a major concern. Lack of relevant clinical information is a particular problem with administrative databases, which otherwise have the attraction of large sample size and better generalizability.26

An even more basic shortcoming was the failure to describe potential differences in study results stratified by age or menopausal status. Given the clear and widely recognized relationship between age and ovarian cancer risk, all studies in this area should present results in a way that allows separate estimation of outcome by age/menopausal status.

Sample Size

Few of the studies we reviewed included a priori sample size calculations. Use of confidence intervals for parameter estimates was uncommon. In studies of scoring systems, there were often too few cases of cancer for the number of variables included in the original models.
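To illustrate why interval estimates matter at the sample sizes typical of this literature, the sketch below computes a Wilson score confidence interval for a proportion such as sensitivity. The Wilson method is one common choice and is used here only as an example; it is not the method used in this report or in the studies reviewed, and the counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: e.g., cancers correctly identified by the test
    n:         e.g., all histologically confirmed cancers
    z:         normal quantile (1.96 gives an approximate 95% interval)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Same observed sensitivity (45 percent) with different numbers of cancers.
    for detected, cancers in ((9, 20), (45, 100)):
        low, high = wilson_interval(detected, cancers)
        print(f"{detected}/{cancers}: {low:.3f} to {high:.3f}")
```

With only 20 cancers, an observed sensitivity of 45 percent is compatible with true values spanning a range too wide to separate a poor test from a moderately useful one, which is why point estimates without intervals are difficult to interpret.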

Blinding

Relatively few of the diagnostic studies reported whether those interpreting test results were blinded to either clinical presentation or ultimate diagnosis. This could clearly have an impact, particularly in studies of the bimanual pelvic examination; the finding that specificity decreased as prevalence increased suggests that the threshold for identifying a mass as cancer is lower if the clinical suspicion (based on other factors such as patient age, menopausal status, or history) is higher. Although this may be appropriate clinically, it results in biased estimates of test performance.

Observer Variability

Few studies addressed the potential impact of observer variability on the precision of test characteristics.

Natural History of Ovarian Cancer

As discussed in more detail in the section on Question 7, ovarian cancer has been implicitly assumed to progress through a series of stages in a way analogous to cervical cancer. Alternative models are biologically plausible, and mathematical models can be “fitted” to match reported data under a variety of scenarios. Since existing models already show that the effectiveness of screening is dependent on assumptions about the length of Stage I, further exploration of the impact of varying assumptions about natural history is warranted.

The most important parameter in these models, stage duration, is inherently unknowable, and the source for the parameter estimates in the two most sophisticated models was “personal communications” with two gynecologic oncologists. At the least, more formal methods of eliciting expert opinion are probably warranted for future modeling studies.

Implications of Findings

Question 1

The prevalence of malignancy, even in postmenopausal women, is low: approximately 0.1 percent (1 in 1,000) in large screening studies in the United States. The potential for screening to reduce morbidity and mortality is currently being tested in at least three large trials; these trials should also provide valuable data on disease prevalence and the effectiveness of various followup strategies.

Question 2

Until the results of the large screening trials are available, many, if not most, women with asymptomatic adnexal masses will have had the mass detected as part of a routine health maintenance examination.

The bimanual pelvic examination appears to have a sensitivity of less than 60 percent, whether for detecting adnexal masses in general or for distinguishing benign from malignant masses. Based on the best pooled estimate of sensitivity (45 percent) and a prevalence of 0.1 percent, a normal-risk, asymptomatic, postmenopausal woman with a normal pelvic examination has a 99.94 percent chance of not having cancer, even though over half of the cancers would be missed. This is due to the low prevalence of ovarian cancer, since, even without the test, her probability of not having cancer is 99.9 percent. Given these test characteristics, the value of the pelvic examination in reducing ovarian cancer morbidity and mortality appears to be extremely limited, at best. Although there may be some rationales for an annual bimanual examination (discussed in Chapter 5), ovarian cancer screening is not one of them.
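The dependence of predictive values on prevalence can be checked directly from Bayes' theorem. The sketch below uses the sensitivity (45 percent) and prevalence (0.1 percent) given above, together with the specificity of 92 percent quoted for the pelvic examination in Chapter 3; applying that specificity here is an assumption made for illustration, and the function name is not taken from the report.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    ppv, npv = predictive_values(sensitivity=0.45,
                                 specificity=0.92,
                                 prevalence=0.001)
    print(f"NPV: {npv * 100:.2f} percent")
    print(f"PPV: {ppv * 100:.2f} percent")
    print(f"Probability of no cancer before testing: {(1 - 0.001) * 100:.1f} percent")
```

The negative predictive value rounds to 99.94 percent, only marginally higher than the probability of not having cancer before any examination, while the positive predictive value remains well below 1 percent. Rerunning the function with a higher prevalence shows the negative predictive value falling and the positive predictive value rising, as described in Chapter 3.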

Question 3

Of the various diagnostic imaging modalities, either a combination of ultrasound morphology and Doppler velocimetry, or magnetic resonance imaging (MRI), had the best combination of sensitivity and specificity for distinguishing benign from malignant disease. If confirmed by direct comparison, cost-effectiveness might be the most important determinant of which would be the optimal diagnostic procedure. Because the specificity of cancer antigen 125 (CA-125) is high in postmenopausal women, it is helpful in ruling in disease.

Question 4

Additional validation of scoring systems in new populations is required before widespread adoption can be recommended.

Question 5

The most effective and efficient method for following patients who have been classified as having a benign mass is unclear, although unilocular cysts less than 10 cm appear to have a very low risk of malignancy.

Question 6

The risks of diagnostic laparoscopy or laparotomy, particularly in asymptomatic women who ultimately prove to have a benign lesion, are unclear. Overall morbidity appears to be low in reported series, but these are subject to numerous biases, particularly regarding selection for laparoscopy. Two small randomized trials suggest higher short-term morbidity with laparotomy compared to laparoscopy, but differences between the two groups raise the possibility of confounding.

Question 7

Based on our pooled estimates of sensitivity and specificity, serial testing of postmenopausal women with an adnexal mass detected by pelvic examination with either ultrasound morphology plus Doppler imaging, or MRI (which had similar sensitivities and specificities), followed by CA-125, resulted in the most efficient combination of number of tests, missed cancers, and surgeries. Parallel testing and using a scoring system such as the Risk of Malignancy Index resulted in fewer missed cancers than serial testing, but more overall tests and more surgeries. Additional data are needed to refine these estimates, to include the morbidities of the tests and surgeries, and to perform cost-effectiveness analyses. Either combined strategy is preferable to using imaging alone or CA-125 alone.

We cannot directly compare these results to the joint guidelines of the Society of Gynecologic Oncologists (SGO) and American College of Obstetricians and Gynecologists (ACOG) on which patients to refer to a gynecologic oncologist247 because the data were not available to replicate their findings. However, our results are consistent with the guidelines, which recommend referral of postmenopausal women with a CA-125 level above 35, the presence of ascites, or evidence of adnexal or distant metastasis.

Alternative assumptions and parameter estimates can be used to generate predicted cancer incidences similar to those seen in published models of the natural history of ovarian cancer. In order to better estimate the potential impact of different strategies for ovarian cancer screening, and for managing masses detected through screening or presenting with symptoms, additional models that explore the implications of alternative natural history assumptions are needed. Data from ongoing screening trials may provide estimates of many of the currently unknown parameters.

Chapter 5. Future Research

This section outlines research priorities identified through the review, both in terms of fundamental gaps in knowledge and in addressing methodological issues of existing studies.

Minimal Data Reporting

Our ability to stratify results by relevant patient characteristics, or to compare the potential effect of patient characteristics on different results from different studies, was limited by the lack of information in most studies. We would suggest that future studies relevant to the diagnosis and management of adnexal masses provide data on, and present results stratified by, the following minimum characteristics:

  • Patient age and/or menopausal status

  • Patient body mass index

  • Patient race and ethnicity

  • Presence or absence of risk factors for ovarian cancer, particularly family history

  • Means by which the adnexal mass was initially diagnosed—pelvic examination or imaging

  • Reason for the initial examination which led to diagnosis of mass: symptoms referable to pelvic mass or ovarian cancer, examination for other symptoms, asymptomatic screening for ovarian cancer, or asymptomatic screening for other conditions

Prevalence of Different Types of Adnexal Masses

  • Large scale screening trials will provide some data on the prevalence of different types of masses.

  • Administrative data from surgical procedures may provide crude estimates, but some important information (like stage and grade of cancer, or histologic subtype) will likely be missing. In addition, relevant clinical data on presence or absence of symptoms and the diagnostic pathway leading to diagnosis will likely be missing. The best resource for obtaining the necessary data would likely be a large health maintenance organization (HMO) or third-party payer, which would allow comparison of inpatient and outpatient records, and followup of patients after diagnosis. Medicare data would provide similar information for women 65 and older.

  • Separate reporting of the prevalence of different types of masses among women with and without symptoms would be helpful for clinical decisionmaking.

Diagnostic Testing

  • Ideally, tests would be evaluated at the stage in the clinical pathway in which they are to be used.

  • Since this means that many women who have a negative test will not undergo the reference standard, careful attention should be paid to development of alternative reference standards, including definitions of appropriate length of followup.

  • More direct comparisons of alternative tests should be performed; existing studies are frequently underpowered to detect clinically meaningful differences, or to establish equivalence. Based on pooled analyses, either magnetic resonance imaging (MRI) or combined ultrasound evaluation of morphology and Doppler velocimetry have attractive sensitivity and specificity. Only two studies, with a total of 200 subjects, have directly compared these modalities in the same patient population.91, 100 In both of these studies, MRI was less sensitive but more specific than combined morphology/Doppler. More precise comparative estimates should be obtained.

  • There is a paucity of studies on positron emission tomography (PET) compared to other imaging modalities. Given that the Centers for Medicare & Medicaid Services (CMS) is now reimbursing for PET scans done within the setting of a clinical trial, there is an excellent opportunity for high-quality studies which avoid the deficiencies outlined in this report.

  • Although discriminating between benign and malignant lesions is the highest priority in most clinical situations, estimates of the sensitivity and specificity of various imaging modalities for specific nonmalignant lesions (endometriomas, mature teratomas, etc.) would be helpful for developing comprehensive management strategies, particularly in conjunction with good data on prevalence in premenopausal women. We identified multiple articles relevant to this question during our search, which were excluded because they were not relevant to the main study questions. Although many of the methodological issues identified here would be issues with these studies, a systematic review of this literature would have value.

  • New tumor markers should continue to undergo evaluation as diagnostic tests as they are identified, using appropriate methodological standards.

Scoring Systems

  • Validation studies in new populations are needed.

  • Attention should be paid to adequate sample size.

Followup Studies

  • Additional studies, with clear definitions for “benign” lesions and clear protocols for followup, with documentation of loss to followup, are needed. Because by definition these types of studies will not have histological confirmation of all test results, estimates of test performance from such studies may have some bias.

Adverse Outcomes of Surgery

  • As with studies of prevalence, both currently published studies (mostly case series) and administrative data have significant deficiencies. Case series would be improved by clearer description of the clinical pathway by which patients ended up undergoing surgery, as well as by providing relevant clinical data (such as body mass index, history of prior surgeries, and extent of disease).

  • Data on outcomes from a variety of settings, including community settings, are needed.

  • Again, as with studies of prevalence, data from sources able to provide both inpatient and outpatient data over time, such as HMOs, third-party payers, and Medicare, are likely to provide the best combination of sample size, generalizability, and clinical detail.

Sensitivity and Specificity of the Pelvic Examination

  • The annual bimanual pelvic examination appears to have little, if any, benefit for reducing ovarian cancer morbidity and mortality in asymptomatic women. Given that many organizations now recommend less frequent cervical cancer screening in many women, that no screening test has ever been shown to reduce morbidity and mortality from endometrial cancer, and that other gynecological cancers are too rare to justify population-based screening, it would appear that annual bimanual pelvic exams do not have a substantial benefit in reducing mortality. Therefore, evidence on the benefits of the exam would be helpful for patients, clinicians, and policymakers. Possible research areas include:

    • Many clinicians argue that the annual exam provides a “cue” for women to interact with a clinician and receive other preventive services.

      • Would women be less likely to see a health professional on a regular basis if they would not get a pelvic examination?

      • If the exam does provide a “cue” for some women, what is its effectiveness and cost-effectiveness compared to alternative methods of improving adherence to periodic health maintenance schedules?

      • Are there some women who do not regularly see a health professional because of embarrassment/fear/discomfort regarding a pelvic exam who would be more likely to see one if they could be assured they would not get an exam?

    • Others have argued that, after long experience, women expect to receive a pelvic examination (and Pap test) on an annual basis and will continue to demand the examination, despite evidence that the test has little benefit, or does not need to be performed on an annual basis.

      • How have patients reacted to other changes or paradigm shifts in medicine? Can patient expectations be changed in the face of new evidence? Do patient responses differ between changes in which one intervention is replaced by another, versus changes in which an intervention is no longer performed at all?

    • Although the pelvic examination does not appear to have significant benefit as a screening test, does it have more value as a diagnostic test?

    • Assuming the pelvic examination does have value as a diagnostic test, is there a relationship between volume/experience and test accuracy, as suggested by two of the studies we reviewed? If so, can routine examinations in asymptomatic women be justified as a method for maintaining exam skills?

    • If there is a relationship between volume and accuracy, what are the implications for the performance of diagnostic bimanual examinations by generalists (e.g., internists, pediatricians, family practitioners, generalist nurse practitioners) versus specialists (e.g., obstetrician/gynecologists, nurse-midwives)?

Modeling the Outcomes of Different Screening Strategies

  • Our modeling of the likely outcomes of different screening strategies was limited by the quantity and quality of data available for key parameters. Because this limited direct comparison of different testing strategies, we were not able to do a comprehensive comparison. The lack of data on patient characteristics, particularly symptom status, also prevented extensive analysis of the effects of different strategies in different clinical scenarios. Improving the evidence base for the other questions considered in the evidence report will substantially improve the ability to meaningfully model outcomes.

  • Data on relevant patient preferences for different outcomes are needed.

  • Data on relevant cost parameters are needed for cost-effectiveness analysis.

  • Data on relative test reproducibility can help determine the effect of observer variability on effectiveness and cost-effectiveness.

Modeling the Natural History of Ovarian Cancer

  • We identified only three models, one of which was an updated version of another. Having several groups working on simulation modeling, using different assumptions, software, model structure, etc., has proven quite helpful in the case of cervical cancer. Additional work should be strongly encouraged.

  • In particular, models should explore alternative disease natural history parameters, and the implications for various strategies, including screening and primary prevention.

Chapter 6. Conclusions

Developing an effective and efficient algorithm for the evaluation of any condition requires good evidence on the prevalence of the condition at the first diagnostic encounter, and the sensitivity and specificity of the potential diagnostic tests to be used. With this information, one can estimate the outcomes, in terms of true and false positive and negative results, of each test. Various combinations of tests can be compared, and, ideally, the consequences of each test's results in terms of benefits, harms, and costs can be estimated.
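The estimation step described above can be sketched in a few lines. This is a minimal illustration, assuming a single test applied once to a cohort; the function name and the input values (cohort size, prevalence, sensitivity, specificity) are hypothetical and are not estimates from this report.

```python
def expected_outcomes(n, prevalence, sensitivity, specificity):
    """Expected 2x2 counts when one test is applied to n women."""
    malignant = n * prevalence           # women who truly have cancer
    benign = n - malignant               # women who do not
    tp = malignant * sensitivity         # cancers correctly flagged
    fn = malignant - tp                  # cancers missed
    tn = benign * specificity            # benign masses correctly cleared
    fp = benign - tn                     # benign masses flagged (possible surgery)
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Hypothetical inputs: 1,000 women, 10% prevalence, sensitivity 0.90, specificity 0.80
counts = expected_outcomes(1000, 0.10, 0.90, 0.80)
for name, value in counts.items():
    print(f"{name}: {value:.0f}")
```

Repeating the calculation for each candidate test, or for each combination of tests, gives the counts of missed cancers and unnecessary surgeries that a comparison of strategies would weigh.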

In the setting of an adnexal mass, the primary issue is discriminating benign from malignant masses; ideally, all women with an underlying ovarian malignancy would receive appropriate surgical management (perfect sensitivity), and no woman with an asymptomatic benign mass would undergo unnecessary surgery (perfect specificity). The optimal strategy may well differ based on whether or not the patient presents with symptoms, both because the prevalence of disease is likely to be higher in the patient with symptoms (making the positive predictive value higher and the negative predictive value lower), and because surgical management may ultimately be appropriate for a symptomatic patient, and some asymptomatic patients, even if the mass is benign. Age and/or menopausal status are also important considerations, primarily because ovarian cancer is rare prior to age 50, but also because some of the risks of surgery may increase with age.
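The dependence of predictive values on prevalence follows directly from Bayes' rule. The sketch below holds sensitivity and specificity fixed and varies prevalence; the function name and every number are hypothetical, chosen only to show the direction of the effect rather than to represent symptomatic or asymptomatic women in any study reviewed here.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test at a given disease prevalence."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    ppv = tp / (tp + fp)   # probability of cancer given a positive test
    npv = tn / (tn + fn)   # probability of no cancer given a negative test
    return ppv, npv

# Hypothetical prevalences for a lower-risk and a higher-risk group
for prev in (0.05, 0.30):
    ppv, npv = predictive_values(prev, sensitivity=0.90, specificity=0.80)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the test characteristics unchanged, the higher-prevalence group has the higher positive predictive value and the lower negative predictive value.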

Unfortunately, the overwhelming majority of the literature we reviewed did not provide sufficient detail on these important patient characteristics to allow confident estimation of the outcomes of different diagnostic strategies, so that we are unable to conclude that any of the strategies achieve the aims of maximizing appropriate treatment and minimizing unnecessary surgery. Outside of studies that were explicitly designed to evaluate screening, few articles described whether patients were symptomatic or asymptomatic, or what testing had been done prior to the diagnostic test being evaluated. Surprisingly few studies reported results separately for premenopausal and postmenopausal women. Future studies need to provide this information.

All of the diagnostic tests and scoring systems we evaluated exhibited a trade-off between sensitivity and specificity: studies of a given test that reported higher sensitivity had lower specificity, and vice versa. In pooled analysis, either the combination of ultrasound morphology and Doppler blood flow, or magnetic resonance imaging (MRI), had the best combination of sensitivity and specificity. Simple modeling of series and parallel tests suggests that, in postmenopausal women, imaging using ultrasound morphology and Doppler blood flow, or MRI, followed by CA-125, is both more sensitive (misses fewer cancers) and more specific (avoids more surgery) than either test alone. A strategy in which both tests were performed and used in a scoring system, the Risk of Malignancy Index (RMI), prevented additional cancers but with twice as many tests and more surgeries. More data on key parameters are needed to determine if, in certain settings, alternative combinations of tests, performed in parallel or series, might have better outcomes or be more efficient.
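For readers unfamiliar with series and parallel test combinations, the sketch below gives the textbook formulas under the assumption that the two tests are conditionally independent given disease status. It is not the model used in this report, which may apply different decision rules and inputs, and it will not necessarily reproduce the results summarized above. The function names and the sensitivity and specificity values are hypothetical.

```python
def series(sens1, spec1, sens2, spec2):
    """Both tests must be positive to call the combination positive.

    Assumes conditional independence of the two tests given disease status.
    """
    sens = sens1 * sens2
    spec = 1 - (1 - spec1) * (1 - spec2)
    return sens, spec

def parallel(sens1, spec1, sens2, spec2):
    """Either test being positive makes the combination positive.

    Assumes conditional independence of the two tests given disease status.
    """
    sens = 1 - (1 - sens1) * (1 - sens2)
    spec = spec1 * spec2
    return sens, spec

# Hypothetical values for an imaging test and a serum marker
imaging = (0.85, 0.85)   # (sensitivity, specificity)
marker = (0.80, 0.75)

print("series:  ", series(*imaging, *marker))
print("parallel:", parallel(*imaging, *marker))
```

Under these simple rules, a series combination gains specificity at the cost of sensitivity and a parallel combination does the reverse, so the net effect of any real strategy depends on the decision rule and on how correlated the tests are.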

Studies of surgical management suffered from the same limitations in terms of description of patient characteristics, making estimation of the risks of false positive diagnostic testing impossible. Similarly, administrative data that include only discharge information do not provide important clinical information.

The bimanual pelvic examination has low sensitivity both for detecting adnexal masses and for discriminating benign from malignant masses, raising doubts about its utility as a screening test in asymptomatic women.

Ultimately, evaluation of potential strategies for reducing morbidity and mortality from ovarian cancer may require use of simulation models, a technique that has proven helpful in evaluating prevention strategies for other cancers. Because the natural history of ovarian cancer is relatively unknown, testing of alternative models is critical. Although a few sophisticated models exist, development of additional models would be helpful, especially in the context of evaluating results from ongoing trials of screening. If any of these trials show a benefit from screening, then the need for better evidence on the diagnostic evaluation of adnexal masses will become even more critical.

List of Acronyms/Abbreviations

2D: Two-dimensional
3D: Three-dimensional
ACOG: American College of Obstetricians and Gynecologists
ACR: American College of Radiology
AFP: Alpha-fetoprotein
AHRQ: Agency for Healthcare Research and Quality
AUC: Area under the curve
CA-125: Cancer antigen 125
CDC: Centers for Disease Control and Prevention
CEA: Carcinoembryonic antigen
CI: Confidence interval
CMS: Centers for Medicare & Medicaid Services
CT: Computed tomography
FDG: 18-Fluorodeoxyglucose
FIGO: International Federation of Gynecology and Obstetrics
FNA: Fine needle aspiration
hCG: Human chorionic gonadotropin
ICD-9: International Classification of Diseases, Ninth Revision
LDH: Lactate dehydrogenase
LMP: Low malignant potential
MeSH: Medical Subject Heading
MRI: Magnetic resonance imaging
NIS: Nationwide Inpatient Sample
NPV: Negative predictive value
PET: Positron emission tomography
PI: Pulsatility index
PPV: Positive predictive value
RI: Resistance index
RMI: Risk of Malignancy Index
ROC: Receiver operating characteristic
SEER: Surveillance, Epidemiology, and End Results
SGO: Society of Gynecologic Oncologists
TAG-72: Tumor-associated glycoprotein 72
TVUS: Transvaginal ultrasound

Appendix A: Exact Search Strings

Search Strategy 1: pelvic exam performance

(developed and run by McCrory and Myers on September 10, 2004)

Database: Ovid MEDLINE(R) <1966 to September Week 1 2004>

Search Strategy:
————————————————————————————————————————

***************************

Search Strategy 2: test performance

Developed and run by McCrory on September 28, 2004

Database: Ovid MEDLINE(R) <1966 to September Week 3 2004>

Search Strategy:
————————————————————————————————————————

***************************

Search Strategy 3: predictive models

(strategy developed and run by McCrory on September 29, 2004)

Database: Ovid MEDLINE(R) <1966 to September Week 3 2004>

Search Strategy:
————————————————————————————————————————

***************************

Appendix B: List of Excluded Studies

All excluded studies listed below were reviewed in their full text version. Following each reference, in italics, is the reason(s) for exclusion and the Question (Q) for which the article was considered. If no Q is indicated, then the article was excluded a priori from the study for the reason given. An article can be considered (and therefore excluded) for more than one question, and all questions for which the article was excluded are identified. Reasons for exclusion signify only the usefulness of the articles for this study and are not intended as criticisms of the articles.

For reference, the questions are:

Question 1: What is the prevalence of various tumor types among women with an adnexal mass, stratified by cancer status (malignant vs. benign), age, menopausal status, and size of tumor?

Question 2: What are the sensitivity, specificity, and reliability of the bimanual examination?

Question 3: Among women with a palpable adnexal mass on exam or a mass identified by ultrasound/imaging, what is the sensitivity/specificity of various evaluation modalities including ultrasound (transvaginal ultrasound, transabdominal ultrasound, color Doppler, 2D vs. 3D ultrasound, CT scan, MRI scan, and CA-125 levels) for diagnosing malignant masses?

Question 4: What is the accuracy of explicit scoring systems which incorporate various combinations of imaging findings, patient risk factors, and/or CA-125 levels for detecting malignancy? Have these scoring systems been applied to a population of women before laparoscopy?

Question 5: Among women with suspected benign lesions on initial investigation, what are the sensitivity and specificity of monitoring with periodic CA-125 and/or interval ultrasound examinations for detecting malignant masses? How does the interval of testing/definition of change affect sensitivity and predictive value?

Question 6: Among women with adnexal masses, what are the morbidity and mortality from diagnostic surgery (laparoscopy or laparotomy)? At what point does the risk of laparoscopy outweigh the risk of detecting malignancy?

Question 7: What are the estimated trade-offs resulting from various strategies for evaluation of the adnexal mass?

List of Excluded Studies

Abu-Rustum NR, Rhee EH, Chi DS. et al. Subcutaneous tumor implantation after laparoscopic procedures in women with malignant disease. Obstet Gynecol. 2004; 103(3): 4807. Exclude no mass. [PubMed]
Adonakis GL, Paraskevaidis E, Tsiga S. et al. A combined approach for the early detection of ovarian cancer in asymptomatic women. Eur J Obstet Gynecol Reprod Biol. 1996; 65(2): 2215. Exclude Q5-wrong pt population. [PubMed]
Alcazar JL, Jurado M. Using a logistic model to predict malignancy of adnexal masses based on menopausal status, ultrasound morphology, and color Doppler findings. Gynecol Oncol. 1998; 69(2): 14650. Exclude Q3-unable to construct 2×2. [PubMed]
Alcazar JL, Jurado M. Prospective evaluation of a logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med. 1999; 18(12): 83742. Exclude Q3-unable to construct 2×2. [PubMed]
Alcazar JL, Laparte C, Jurado M. et al. The role of transvaginal ultrasonography combined with color velocity imaging and pulsed Doppler in the diagnosis of endometrioma. Fertil Steril. 1997; 67(3): 48791. Exclude Q1-sample size. [PubMed]
Alcazar JL, Ruiz-Perez ML, Errasti T. Transvaginal color Doppler sonography in adnexal masses: which parameter performs best? Ultrasound Obstet Gynecol. 1996; 8(2): 1149. Exclude Q3-unable to construct 2×2. [PubMed]
Alexander-Sefre F, Menon U, Jacobs IJ. Ovarian cancer screening. Hosp Med. 2002; 63(4): 2103. Exclude review. [PubMed]
Ali N, Jan H, Van Trappen P. et al. Radioimmunoscintigraphy with Tc-99m-labelled SM3 in differentiating malignant from benign adnexal masses. BJOG. 2003; 110(5): 50814. Exclude Q3-experimental or non-standard test. [PubMed]
Alvarez RD, Kilgore LC, Partridge EE. et al. Staging ovarian cancer diagnosed during laparoscopy: accuracy rather than immediacy. South Med J. 1993; 86(11): 12568. Exclude review. [PubMed]
Alvarez-Sanchez F, Brache V, de Oca VM. et al. Prevalence of enlarged ovarian follicles among users of levonorgestrel subdermal contraceptive implants (Norplant). Am J Obstet Gynecol. 2000; 182(3): 5359. Exclude Q3-no histol. dx. [PubMed]
American College of Obstetricians and Gynecologists. ACOG Committee Opinion: number 280, December 2002. The role of the generalist obstetrician-gynecologist in the early detection of ovarian cancer. Obstet Gynecol. 2002; 100(6): 14136. Exclude review. [PubMed]
Anderiesz C, Quinn MA. Screening for ovarian cancer. Med J Aust. 2003; 178(12): 6556. Exclude review. [PubMed]
Andersen WA, Nichols GE, Avery SR. et al. Cytologic diagnosis of ovarian tumors: factors influencing accuracy in previously undiagnosed cases. Am J Obstet Gynecol. 1995; 173(2): 45763. discussion 463–4. Exclude Q3-wrong test. [PubMed]
Anderson MM, Irwin CE Jr, Snyder DL. Abnormal vaginal bleeding in adolescents. Pediatr Ann. 1986; 15(10): 697701. Exclude Q1-no histol. dx. [PubMed]
Andolf E, Jorgensen C, Astedt B. Ultrasound examination for detection of ovarian carcinoma in risk groups. Obstet Gynecol. 1990; 75(1): 1069. Exclude Q7-not descrip of sim model. [PubMed]
Angeid-Backman E, Coleman BG, Arger PH. et al. Comparison of resistive index versus pulsatility index in assessing the benign etiology of adnexal masses. Clin Imaging. 1998; 22(4): 28491. Exclude no mass. [PubMed]
Aslam N, Tailor A, Lawton F. et al. Prospective evaluation of three different models for the pre-operative diagnosis of ovarian cancer. BJOG. 2000; 107(11): 134753. Exclude Q3-inconsistent data. [PubMed]
Aubel S, Wozney P, Edwards RP. MRI of female uterine and juxta-uterine masses: clinical application in 25 patients. Magn Reson Imaging. 1991; 9(4): 48591. Exclude Q3-sample size. [PubMed]
Bandera CA, Ye B, Mok SC. New technologies for the identification of markers for early detection of ovarian cancer. Curr Opin Obstet Gynecol. 2003; 15(1): 515. Exclude review. [PubMed]
Baron AT, Cora EM, Lafky JM. et al. Soluble epidermal growth factor receptor (sEGFR/sErbB1) as a potential risk, screening, and diagnostic serum biomarker of epithelial ovarian cancer. Cancer Epidemiol Biomarkers Prev. 2003; 12(2): 10313. Exclude no mass. [PubMed]
Bast RC Jr, Feeney M, Lazarus H. et al. Reactivity of a monoclonal antibody with human ovarian carcinoma. J Clin Invest. 1981; 68(5): 13317. Exclude no mass. [PubMed] [Free Full Text in PMC]
Bast RC Jr, Knauf S, Epenetos A. et al. Coordinate elevation of serum markers in ovarian cancer but not in benign disease. Cancer. 1991; 68(8): 175863. [PubMed]
Bast RC Jr, Urban N, Shridhar V. et al. Early detection of ovarian cancer: promise and reality. Cancer Treat Res. 2002; 107: 6197. Exclude review. [PubMed]
Bell R, Petticrew M, Sheldon T. The performance of screening tests for ovarian cancer: results of a systematic review. Br J Obstet Gynaecol. 1998; 105(11): 113647. Exclude no mass. [PubMed]
Benacerraf BR, Finkler NJ, Wojciechowski C. et al. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med. 1990; 35(5): 4915. Exclude Q3-distguish malignant versus nonmalignant. [PubMed]
Berlanda N, Ferrari MM, Mezzopane R. et al. Impact of a multiparameter, ultrasound-based triage on surgical management of adnexal masses. Ultrasound Obstet Gynecol. 2002; 20(2): 1815. Exclude Q6-no M&M data. [PubMed]
Biran G, Golan A, Sagiv R. et al. Conversion of laparoscopy to laparotomy due to adnexal malignancy. Eur J Gynaecol Oncol. 2002; 23(2): 15760. Exclude Q6-no M&M data/Exclude Q4-unable to construct 2×2/ Exclude Q3-unable to construct 2×2. [PubMed]
Blend MJ, Ostrowski GJ. Recent advances in the detection of ovarian cancer: a review. J Am Osteopath Assoc. 1994; 94(4): 30518. Exclude review. [PubMed]
Bohm-Velez M, Mendelson E, Bree R. et al. Ovarian cancer screening. American College of Radiology. ACR Appropriateness Criteria. Radiology. 2000; 215(Suppl): 86171. Exclude review. [PubMed]
Bohm-Velez M, Mendelson E, Bree R. et al. Suspected adnexal masses. American College of Radiology. ACR Appropriateness Criteria. Radiology. 2000; 215(Suppl): 9318. Exclude review. [PubMed]
Boll D, Geomini PM, Brolmann HA. et al. The pre-operative assessment of the adnexal mass: the accuracy of clinical estimates versus clinical prediction rules. BJOG. 2003; 110(5): 51923. Exclude Q4-partial dupl new data not relevant. [PubMed]
Bossuyt PM, Reitsma JB, Bruns DE. et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Radiol. 2003; 58(8): 57580. Exclude review. [PubMed]
Bourne TH, Campbell S, Reynolds KM. et al. Screening for early familial ovarian cancer with transvaginal ultrasonography and colour blood flow imaging. BMJ. 1993; 306(6884): 10259. Exclude no mass. [PubMed] [Free Full Text in PMC]
Bourne TH, Hampson J, Reynolds K. et al. Screening for early ovarian cancer. Br J Hosp Med. 1992; 48(8): 4549. Exclude review. [PubMed]
Brown DL, Frates MC, Laing FC. et al. Ovarian masses: can benign and malignant lesions be differentiated with color and pulsed Doppler US? Radiology. 1994; 190(1): 3336. Exclude Q3-sample size. [PubMed]
Brown DL, Zou KH, Tempany CM. et al. Primary versus secondary ovarian malignancy: imaging findings of adnexal masses in the Radiology Diagnostic Oncology Group Study. Radiology. 2001; 219(1): 2138. Exclude no mass. [PubMed]
Buist MR, Golding RP, Burger CW. et al. Comparative evaluation of diagnostic methods in ovarian carcinoma with emphasis on CT and MRI. Gynecol Oncol. 1994; 52(2): 1918. Exclude Q6-no M&M data. [PubMed]
Buquet RA, Amato AR, Huang GB. et al. Is preoperative selection of patients with cystic adnexal masses essential for laparoscopic treatment? J Am Assoc Gynecol Laparosc. 1999; 6(4): 47781. Exclude Q6-no M&M data. [PubMed]
Buy JN, Ghossain MA, Mark AS. et al. Focal hyperdense areas in endometriomas: a characteristic finding on CT. AJR Am J Roentgenol. 1992; 159(4): 76971. Exclude Q3-distguish malignant versus nonmalignant. [PubMed]
Buy JN, Ghossain MA, Sciot C. et al. Epithelial tumors of the ovary: CT findings and correlation with US. Radiology. 1991; 178: 81118. Exclude Q3-unable to construct 2×2. [PubMed]
Campbell S, Bhan V, Royston P. et al. Transabdominal ultrasound screening for early ovarian cancer. BMJ. 1989; 299(6712): 13637. Exclude no mass. [PubMed] [Free Full Text in PMC]
Canis M, Bassil S, Wattiez A. et al. Fertility following laparoscopic management of benign adnexal cysts. Hum Reprod. 1992; 7(4): 52931. Exclude Q6-review. [PubMed]
Canis M, Pouly JL, Wattiez A. et al. Laparoscopic management of adnexal masses suspicious at ultrasound. Obstet Gynecol. 1997; 89(5 Pt 1): 67983. Exclude Q6-no M&M data. [PubMed]
Caoili EM, Hertzberg BS, Kliewer MA. et al. Refractory shadowing from pelvic masses on sonography: a useful diagnostic sign for uterine leiomyomas. AJR Am J Roentgenol. 2000; 174(1): 97101. Exclude no mass. [PubMed]
Cappelleri JC, Ioannidis JP, Schmid C. Large trials vs meta-analysis of smaller trials: how do their results compare? JAMA. 1996; 276: 13328. Exclude review. [PubMed]
Carlson KJ, Skates SJ, Singer DE. Screening for ovarian cancer. Ann Intern Med. 1994; 121(2): 12432. Exclude review. [PubMed]
Carter J. An update on ovarian cancer screening. Aust N Z J Obstet Gynaecol. 1994; 34(2): 16974. Exclude no mass. [PubMed]
Carter J, Fowler J, Carson L. et al. How accurate is the pelvic examination as compared to transvaginal sonography? A prospective, comparative study. J Reprod Med. 1994; 39(1): 324. Exclude Q3-distguish malignant versus nonmalignant/ Exclude Q2-unable to construct 2×2. [PubMed]
Carter J, Saltzman A, Hartenbach E. et al. Flow characteristics in benign and malignant gynecologic tumors using transvaginal color flow Doppler. Obstet Gynecol. 1994; 83(1): 12530. Exclude Q3-distguish malignant versus nonmalignant. [PubMed]
Chadha P, Puri M, Gupta R. A comparative evaluation of clinical examination, pelvic ultrasound and laparoscopy in the diagnosis of pelvic masses. Indian J Med Sci. 1994; 48(7): 15860. Exclude Q2-unable to construct 2×2. [PubMed]
Chalas E, Constantino J, Wickerham L. et al. Benign gynecologic conditions among participants in the breast cancer prevention trial. Am J Obstet and Gynecol. 2005; 192: 12309. Exclude Q1-wrong pt population. [PubMed]
Cherry C, Vacchiano SA. Ovarian cancer screening and prevention. Semin Oncol Nurs. 2002; 18(3): 16773. Exclude no mass. [PubMed]
Childers JM, Aqua KA, Surwit EA. et al. Abdominal-wall tumor implantation after laparoscopy for malignant conditions. Obstet Gynecol. 1994; 84(5): 7659. Exclude no mass. [PubMed]
Close RJ, Sachs CJ, Dyne PL. Reliability of bimanual pelvic examinations performed in emergency departments. West J Med. 2001; 175(4): 2404. Exclude Q2-unable to construct 2×2. [PubMed]
Cohen L, Fishman DA. Ultrasound and ovarian cancer. Cancer Treat Res. 2002; 107: 11932. Exclude review. [PubMed]
Cooper BC, Ritchie JM, Broghammer CL. et al. Preoperative serum vascular endothelial growth factor levels: significance in ovarian cancer. Clin Cancer Res. 2002; 8(10): 31937. Exclude Q3-experimental or non-standard test. [PubMed]
Crade M, Yiu-Chiu V, Kincaid K. Color Doppler and ovarian masses: familiarity breeds confidence. Ultrasound Obstet Gynecol. 1995; 6(5): 3734. Exclude Q3-no histol. dx. [PubMed]
Crawford RA, Gore ME, Shepherd JH. Ovarian cancers related to minimal access surgery. Br J Obstet Gynaecol. 1995; 102(9): 72630. Exclude no mass. [PubMed]
Crayford TJ, Campbell S, Bourne TH. et al. Benign ovarian cysts and ovarian cancer: a cohort study with implications for screening. Lancet. 2000; 355(9209): 10603. Exclude Q5-wrong pt population. [PubMed]
Creasman WT, Soper JT. The undiagnosed adnexal mass after the menopause. Clin Obstet Gynecol. 1986; 29(2): 44652. Exclude Q1-study design not case series or cohort. [PubMed]
Crump C, McIntosh MW, Urban N. et al. Ovarian cancer tumor marker behavior in asymptomatic healthy women: implications for screening. Cancer Epidemiol Biomarkers Prev. 2000; 9(10): 110711. Exclude review. [PubMed]
Crvenkovic G, Karlan BY, Platt LD. Current role of ultrasound in ovarian cancer screening. Clin Obstet Gynecol. 1996; 39(1): 25967. Exclude review. [PubMed]
de Bruijn HW, van der Zee AG, Aalders JG. The value of cancer antigen 125 (CA 125) during treatment and follow-up of patients with ovarian cancer. Curr Opin Obstet Gynecol. 1997; 9(1): 813. Exclude Q3-distguish malignant versus nonmalignant. [PubMed]
De Vries SO, Hunink MG, Polak JF. Summary receiver operating characteristic curves as a technique for meta-analysis of the diagnostic performance of duplex ultrasonography in peripheral arterial disease. Acad Radiol. 1996; 3: 3619. Exclude review. [PubMed]
Decloedt J, Berteloot P, Vergote I. The feasibility of open laparoscopy in gynecologic-oncologic patients. Gynecol Oncol. 1997; 66(1): 13840. Exclude no mass. [PubMed]
Demirkiran F, Kumbak B, Bese T. et al. Vascular endothelial growth factor in adnexal masses. Int J Gynaecol Obstet. 2003; 83(1): 538. Exclude Q3-experimental or non-standard test. [PubMed]
Dgani R, Shani A, Elchalal U. et al. The leukocyte adherence inhibition test (LAI) in preoperative diagnosis of epithelial ovarian cancer. Gynecol Oncol. 1993; 49(3): 34953. Exclude Q3-wrong test. [PubMed]
Dietrich M, Osmers RG, Grobe G. et al. Limitations of the evaluation of adnexal masses by its macroscopic aspects, cytology and biopsy. Eur J Obstet Gynecol Reprod Biol. 1999; 82(1): 5762. Exclude Q3-wrong test. [PubMed]
Dixon JG, Bognar BA, Keyserling TC. et al. Teaching women's health skills: confidence, attitudes and practice patterns of academic generalist physician. J Gen Intern Med. 2003; 18(6): 4118. Exclude review. [PubMed] [Free Full Text in PMC]
Dogan MM, Ugur M, Soysal SK. et al. Transvaginal sonographic diagnosis of ovarian endometrioma. Int J Gynaecol Obstet. 1996; 52(2): 1459. Exclude no mass. [PubMed]
Domar AD. Psychological aspects of the pelvic exam: individual needs and physician involvement. Women Health. 1985-1986. pp. 75–90. Exclude no mass.
Dordoni D, Zaglio S, Zucca S. et al. The role of sonographically guided aspiration in the clinical management of ovarian cysts. J Ultrasound Med. 1993; 12(1): 2731. Exclude Q3-wrong test. [PubMed]
Dorum A, Blom G, Ekerhovd E. et al. Prevalence and histologic diagnosis of adnexal cysts in postmenopausal women: an autopsy study. Am J Obstet Gynecol. 2005; 192: 4854. Exclude non U.S. [PubMed]
Dueholm M, Lundorf E, Hansen ES. et al. Magnetic resonance imaging and transvaginal ultrasonography for the diagnosis of adenomyosis. Fertil Steril. 2001; 76(3): 58894. Exclude no mass. [PubMed]
Ehlen T. Management of low malignant potential tumour of the ovary: a policy statement. SOGC/GOC/SCC Policy and Practice Guideline Committee 2000;(85). Exclude review.
Einhorn N. Ovarian cancer. Early diagnosis and screening. Hematol Oncol Clin North Am. 1992; 6(4): 84350. Exclude no mass. [PubMed]
Eisen A, Rebbeck TR, Wood WC. et al. Prophylactic surgery in women with a hereditary predisposition to breast and ovarian cancer. J Clin Oncol. 2000; 18(9): 198095. Exclude review. [PubMed]
Elg S, Lee RB, Stones C. et al. Evaluation of serum haptoglobin levels in patients with adnexal masses. Mil Med. 1989; 154(5): 2346. Exclude Q3-experimental or non-standard test. [PubMed]
Elit L. Surgical management of an adnexal mass suspicious for malignancy. SOGC Clinical Practice Guidelines 2000;(97). Exclude review.
Elwood M. Proteomic patterns in serum and identification of ovarian cancer. Lancet. 2002; 360(9327): 1701. Exclude no mass. [PubMed]
Emery J, Yaphe J, Priest P. et al. Screening for ovarian cancer. Lancet. 1999; 354(9177): 50910. Exclude no mass. [PubMed]
Fadare O, Mariappan MR, Wang S. et al. The histologic subtype of ovarian tumors affects the detection rate by pelvic washings. Cancer. 2004; 102(3): 1506. Exclude no mass. [PubMed]
Fayed ST, Ahmad SM, Kassim SK. et al. The value of CA 125 and CA72-4 in management of patients with epithelial ovarian cancer. Dis Markers. 1998; 14(3): 15560. Exclude Q3-no verification test negative. [PubMed]
Fedele L, Bianchi S, Dorta M. et al. Transvaginal ultrasonography versus hysteroscopy in the diagnosis of uterine submucous myomas. Obstet Gynecol. 1991; 77(5): 7458. Exclude no mass. [PubMed]
Fedele L, Bianchi S, Dorta M. et al. Transvaginal ultrasonography in the differential diagnosis of adenomyoma versus leiomyoma. Am J Obstet Gynecol. 1992; 167(3): 6036. Exclude no mass. [PubMed]
Finkler NJ. Clinical utility of CA 125 in preoperative diagnosis of patients with pelvic masses. Eur J Obstet Gynecol Reprod Biol. 1993; 49(12): 1057. Exclude review. [PubMed]
Finkler NJ, Benacerraf B, Lavin PT. et al. Comparison of serum CA 125, clinical impression, and ultrasound in the preoperative evaluation of ovarian masses. Obstet Gynecol. 1988; 72(4): 65964. Exclude Q1-no histol. dx. [PubMed]
Fishman DA, Cohen L, Blank SV. et al. The role of ultrasound evaluation in the detection of early-stage epithelial ovarian cancer. Am J Obstet Gynecol. 2005; 192(4): 121421. discussion 1221–2. Exclude Q3-unable to construct 2×2. [PubMed]
Fleischer AC, Cullinan JA, Jones HW 3rd. et al. Correlation of histomorphology of ovarian masses with color Doppler sonography. Ultrasound Med Biol. 1996; 22(5): 5559. Exclude Q3-unable to construct 2×2. [PubMed]
Fleischer AC, Cullinan JA, Kepple DM. et al. Conventional and color Doppler transvaginal sonography of pelvic masses: a comparison of relative histologic specificities. J Ultrasound Med. 1993; 12(12): 70512. Exclude Q3-unable to construct 2×2. [PubMed]
Fleischer AC, Jones HW 3rd. Color Doppler sonography of ovarian masses: the importance of a multiparameter approach. Gynecol Oncol. 1993; 50(1): 12. Exclude review. [PubMed]
Fleischer AC, McKee MS, Gordon AN. et al. Transvaginal sonography of postmenopausal ovaries with pathologic correlation. J Ultrasound Med. 1990; 9(11): 63744. Exclude no mass. [PubMed]
Fleischer AC, Rodgers WH, Rao BK. et al. Assessment of ovarian tumor vascularity with transvaginal color Doppler sonography. J Ultrasound Med. 1991; 10(10): 5638. Exclude Q3-publ duplicate. [PubMed]
Flynn MK, Niloff JM. Outpatient minilaparotomy for ovarian cysts. J Reprod Med. 1999; 44(5): 399404. Exclude Q6-no M&M data. [PubMed]
Foxall MJ, Barron CR, Houfek JF. Ethnic influences on body awareness, trait anxiety, perceived risk, and breast and gynecologic cancer screening practices. Oncol Nurs Forum. 2001; 28(4): 72738. Exclude review. [PubMed]
Franchi M, Beretta P, Ghezzi F. et al. Diagnosis of pelvic masses with transabdominal color Doppler, CA 125 and ultrasonography. Acta Obstet Gynecol Scand. 1995; 74(9): 7349. Exclude Q4-unable to construct 2×2. [PubMed]
Frederick JL, Paulson RJ, Sauer MV. Routine use of vaginal ultrasonography in the preoperative evaluation of gynecologic patients. An adjunct to resident education. J Reprod Med. 1991; 36(11): 77982. Exclude no mass. [PubMed]
Frenkel Y, Oelsner G, Ben-Baruch G. et al. Major surgical complications of laparoscopy. Eur J Obstet Gynecol Reprod Biol. 1981; 12(2): 10711. Exclude Other All Premenopausal. [PubMed]
Gadducci A, Baicchi U, Marrai R. et al. Pretreatment plasma levels of fibrinopeptide-A (FPA), D-dimer (DD), and von Willebrand factor (vWF) in patients with ovarian carcinoma. Gynecol Oncol. 1994; 53(3): 3526. Exclude Q3-wrong test. [PubMed]
Gadducci A, Marrai R, Baicchi U. et al. Preoperative D-dimer plasma assay is not a predictor of clinical outcome for patients with advanced ovarian cancer. Gynecol Oncol. 1997; 66(1): 858. Exclude no mass. [PubMed]
Geomini P, Bremer G, Kruitwagen R. et al. Diagnostic accuracy of frozen section diagnosis of the adnexal mass: a metanalysis. Gynecol Oncol. 2005; 96(1): 19. Exclude review. [PubMed]
Goff BA, Mandel L, Muntz HG. et al. Ovarian carcinoma diagnosis. Cancer. 2000; 89(10): 206875. Exclude review. [PubMed]
Grab D, Flock F, Stohr I. et al. Classification of asymptomatic adnexal masses by ultrasound, magnetic resonance imaging, and positron emission tomography. Gynecol Oncol. 2000; 77(3): 4549. Exclude Q4-not scoring system. [PubMed]
Granberg S, Wikland M. A comparison between ultrasound and gynecologic examination for detection of enlarged ovaries in a group of women at risk for ovarian carcinoma. J Ultrasound Med. 1988; 7(2): 5964. Exclude no mass. [PubMed]
Gryspeerdt S, Clabout L, Van Hoe L. et al. Intraperitoneal contrast material combined with CT for detection of peritoneal metastases of ovarian cancer. Eur J Gynaecol Oncol. 1998; 19(5): 4347. Exclude no mass. [PubMed]
Guerriero S, Ajossa S, Lai MP. et al. The diagnosis of functional ovarian cysts using transvaginal ultrasound combined with clinical parameters, CA125 determinations, and color Doppler. Eur J Obstet Gynecol Reprod Biol. 2003; 110(1): 838. Exclude other all premenopausal. [PubMed]
Guerriero S, Ajossa S, Lai MP. et al. Transvaginal ultrasonography associated with colour Doppler energy in the diagnosis of hydrosalpinx. Hum Reprod. 2000; 15(7): 156872. Exclude Q3-no cancer outcome. [PubMed]
Guerriero S, Mais V, Ajossa S. et al. The role of endovaginal ultrasound in differentiating endometriomas from other ovarian cysts. Clin Exp Obstet Gynecol. 1995; 22(1): 202. Exclude Other All Premenopausal. [PubMed]
Guerriero S, Mais V, Ajossa S. et al. Transvaginal ultrasonography combined with CA-125 plasma levels in the diagnosis of endometrioma. Fertil Steril. 1996; 65(2): 2938. Exclude other all premenopausal. [PubMed]
Guerriero S, Mallarini G, Ajossa S. et al. Transvaginal ultrasound and computed tomography combined with clinical parameters and CA-125 determinations in the differential diagnosis of persistent ovarian cysts in premenopausal women. Ultrasound Obstet Gynecol. 1997; 9(5): 33943. Exclude Q4-all premenopausal/ Exclude Q3-all premenopausal. [PubMed]
Guidozzi F. Screening for ovarian cancer. Obstet Gynecol Surv. 1996; 51(11): 696701. Exclude no mass. [PubMed]
Hakama M, Stenman UH, Knekt P. et al. CA 125 as a screening test for ovarian cancer. J Med Screen. 1996; 3(1): 402. Exclude no mass. [PubMed]
Hall DJ, Hurt WG. The adnexal mass. J Fam Pract. 1982; 14(1): 13540. Exclude no mass. [PubMed]
Hamm B, Kubik-Huch RA, Fleige B. MR imaging and CT of the female pelvis: radiologic-pathologic correlation. Eur Radiol. 1999; 9(1): 315. Exclude Q3-sample size. [PubMed]
Hamper UM, Sheth S, Abbas FM. et al. Transvaginal color Doppler sonography of adnexal masses: differences in blood flow impedance in benign and malignant lesions. AJR Am J Roentgenol. 1993; 160(6): 12258. Exclude Q3-unable to construct 2×2. [PubMed]
Hartge P, Hayes R, Reding D. et al. Complex ovarian cysts in postmenopausal women are not associated with ovarian cancer risk factors. Am J Obstet Gynecol. 2000; 183(5): 12327. Exclude Q1-no histol. dx. [PubMed]
Hata K, Hata T, Collins WP. Association of thymidine phosphorylase concentration with ultrasound-derived indices of blood flow in ovarian masses. Cancer. 1997; 80(6): 107984. Exclude Q3-sample size. [PubMed]
Hata K, Miyazaki K, Collins WP. Value of end-points from multiple or worst case Doppler spectra for the assessment of ovarian masses. Ultrasound Obstet Gynecol. 1999; 13(4): 284. Exclude no mass. [PubMed]
Hata K, Nagami H, Iida K. et al. Expression of thymidine phosphorylase in malignant ovarian tumors: correlation with microvessel density and an ultrasound-derived index of angiogenesis. Ultrasound Obstet Gynecol. 1998; 12(3): 2016. Exclude Q3-no histol. dx. [PubMed]
Hefler L, Mayerhofer K, Nardi A. et al. Serum soluble Fas levels in ovarian cancer. Obstet Gynecol. 2000; 96(1): 659. Exclude no mass. [PubMed]
Helzlsouer KJ, Bush TL, Alberg AJ. et al. Prospective study of serum CA-125 levels as markers of ovarian cancer. JAMA. 1993; 269(9): 11236. Exclude no mass. [PubMed]
Hensley ML, Castiel M, Robson ME. Screening for ovarian cancer: what we know, what we need to know. Oncology (Huntingt) 2000;14(11):1601–8, 1613–6. Exclude no mass.
Hricak H, Chen M, Coakley FV. et al. Complex adnexal masses: detection and characterization with MR imaging—multivariate analysis. Radiology. 2000; 214(1): 3946. Exclude Q1-denom is masses. [PubMed]
Hulka JF, Hulka CA. Preoperative sonographic evaluation and laparoscopic management of persistent adnexal masses: a 1994 review. J Am Assoc Gynecol Laparosc. 1994; 1(3): 197205. Exclude review. [PubMed]
Im S, Gordon A, Buttin B. et al. Validation of referral guidelines for women with pelvic masses. Obstet Gynecol. 2005; 105(1): 3541. Exclude Q4-not scoring system/Exclude Q3-unable to construct 2×2.
Irwig L, Tosteson A, Gatsonis C. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med. 1994; 120: 66776. Exclude review. [PubMed]
Jacobs I. Genetic, biochemical, and multimodal approaches to screening for ovarian cancer. Gynecol Oncol. 1994; 55(3 Pt 2): S227. Exclude no mass. [PubMed]
Jacobs I, Davies AP, Bridges J. et al. Prevalence screening for ovarian cancer in postmenopausal women by CA 125 measurement and ultrasonography. BMJ. 1993; 306(6884): 10304. Exclude no mass. [PubMed]
Jacobs I, Oram D. Screening for ovarian cancer. Biomed Pharmacother. 1988; 42(9): 58996. Exclude review. [PubMed]
Jacobs I, Stabile I, Bridges J. et al. Multimodal approach to screening for ovarian cancer. Lancet. 1988; 1(8580): 26871. Exclude Q4-unable to construct 2×2. [PubMed]
Jacobs IJ, Oram DH, Bast RC Jr. Strategies for improving the specificity of screening for ovarian cancer with tumor-associated antigens CA 125, CA 15-3, and TAG 72.3. Obstet Gynecol. 1992; 80(3 Pt 1): 3969. Exclude no mass. [PubMed]
Jacobs IJ, Rivera H, Oram DH. et al. Differential diagnosis of ovarian cancer with tumour markers CA 125, CA 15-3 and TAG 72.3. Br J Obstet Gynaecol. 1993; 100(12): 11204. Exclude Q4-partial dupl new data not relevant. [PubMed]
Jain KA. Prospective evaluation of adnexal masses with endovaginal gray-scale and duplex and color Doppler US: correlation with pathologic findings. Radiology. 1994; 191(1): 637. Exclude Q1-denom is masses. [PubMed]
Kadar N. Port-site recurrences following laparoscopic operations for gynaecological malignancies. Br J Obstet Gynaecol. 1997; 104(11): 130813. Exclude no mass. [PubMed]
Karlan BY. Screening for ovarian cancer: what are the optimal surrogate endpoints for clinical trials? J Cell Biochem. 1995; 23: 227–32. Exclude review.
Karlan BY, Platt LD. The current status of ultrasound and color Doppler imaging in screening for ovarian cancer. Gynecol Oncol. 1994; 55(3 Pt 2): S2833. Exclude review. [PubMed]
Kerpsack JT, Finan MA. Thrombocytosis as a predictor of malignancy in women with a pelvic mass. J Reprod Med. 2000; 45(11): 92932. Exclude Q3-wrong test. [PubMed]
Kim JH, Skates SJ, Uede T. et al. Osteopontin as a potential diagnostic biomarker for ovarian cancer. JAMA. 2002; 287(13): 16719. Exclude Q3-unable to construct 2×2. [PubMed]
Kinkel K, Hricak H, Lu Y. et al. US characterization of ovarian masses: a meta-analysis. Radiology. 2000; 217(3): 80311. Exclude Q3-review. [PubMed]
Klaren HM, van't Veer LJ, van Leeuwen FE. et al. Potential for bias in studies on efficacy of prophylactic surgery for BRCA1 and BRCA2 mutation. J Natl Cancer Inst. 2003; 95(13): 9417. Exclude review. [PubMed]
Kramer BS, Gohagan J, Prorok PC. et al. A National Cancer Institute sponsored screening trial for prostatic, lung, colorectal, and ovarian cancers. Cancer. 1993; 71(2 Suppl): 58993. Exclude review. [PubMed]
Kruitwagen RF, Swinkels BM, Keyser KG. et al. Incidence and effect on survival of abdominal wall metastases at trocar or puncture sites following laparoscopy or paracentesis in women with ovarian cancer. Gynecol Oncol. 1996; 60(2): 2337. Exclude Q6-sample size. [PubMed]
Kupesic S, Kurjak A. Contrast-enhanced, three-dimensional power Doppler sonography for differentiation of adnexal masses. Obstet Gynecol. 2000; 96(3): 4528. Exclude Q3-sample size. [PubMed]
Kurjak A, Kupesic S, Anic T. et al. Three-dimensional ultrasound and power Doppler improve the diagnosis of ovarian lesions. Gynecol Oncol. 2000; 76(1): 2832. Exclude Q3-publ duplicate. [PubMed]
Kurjak A, Kupesic S, Babic MM. et al. Preoperative evaluation of cystic teratoma: what does color Doppler add? J Clin Ultrasound. 1997; 25(6): 30916. Exclude no mass. [PubMed]
Kurjak A, Predanic M. Ovarian cancer screening. Curr Opin Obstet Gynecol. 1994; 6(1): 6774. Exclude review. [PubMed]
Lang F. Resident behaviors during observed pelvic examinations. Fam Med. 1990; 22(2): 1535. Exclude no mass. [PubMed]
Larsen T, Torp-Pedersen ST, Ottesen M. et al. Abdominal ultrasound combined with histological and cytological fine needle biopsy of suspected ovarian tumors. Eur J Obstet Gynecol Reprod Biol. 1993; 50(3): 2039. Exclude Q3-experimental or non-standard test. [PubMed]
Layfield LJ, Heaps JM, Berek JS. Fine-needle aspiration cytology accuracy with palpable gynecologic neoplasms. Gynecol Oncol. 1991; 40(1): 703. Exclude Q3-sample size. [PubMed]
Lee JH, Jeong YK, Park JK. et al. “Ovarian vascular pedicle” sign revealing organ of origin of a pelvic mass lesion on helical CT. AJR Am J Roentgenol. 2003; 181(1): 1317. Exclude Q3-distguish malignant versus nonmalignant. [PubMed]
Lehner R, Wenzl R, Heinzl H. et al. Influence of delayed staging laparotomy after laparoscopic removal of ovarian masses later found malignant. Obstet Gynecol. 1998; 92(6): 96771. Exclude no mass. [PubMed]
Lerner JP, Timor-Tritsch IE, Federman A. et al. Transvaginal ultrasonographic characterization of ovarian masses with an improved, weighted scoring system. Am J Obstet Gynecol. 1994; 170(1 Pt 1): 815. Exclude Q1-denom is masses. [PubMed]
Levine D, Feldstein VA, Babcook CJ. et al. Sonography of ovarian masses: poor sensitivity of resistive index for identifying malignant lesions. AJR Am J Roentgenol. 1994; 162(6): 13559. Exclude Q3-sample size. [PubMed]
Levine D, Gosink BB, Wolf SI. et al. Simple adnexal cysts: the natural history in postmenopausal women. Radiology. 1992; 184(3): 6539. Exclude Q6-sample size. [PubMed]
Levy G, Levine P, Brennan J. et al. Color flow-directed Doppler studies of ovarian masses. Computer analysis. J Reprod Med. 1998; 43(10): 8658. Exclude Other. [PubMed]
Lieberman G, Buscombe JR, Hilson AJ. et al. Preoperative diagnosis of ovarian carcinoma with a novel monoclonal antibody. Am J Obstet Gynecol. 2000; 183(3): 53440. Exclude Q3-experimental or non-standard test. [PubMed]
Lieberman G, MacLean AB, Buscombe JR. et al. The clinical application of a dual head gamma camera with coincidence detection in 20 women with suspected ovarian cancer. BJOG. 2001; 108(12): 122936. Exclude no mass. [PubMed]
Liede A, Karlan B, Baldwin RL. et al. Cancer incidence in a population of Jewish women at risk of ovarian cancer. J Clin Oncol. 2002; 20(6): 157077. Exclude review. [PubMed]
Lin P, Falcone T, Tulandi T. Excision of ovarian dermoid cyst by laparoscopy and by laparotomy. Am J Obstet Gynecol. 1995; 173(3 Pt 1): 76971. Exclude Q6-sample size. [PubMed]
Lynch HT, Albano WA, Lynch JF. et al. Surveillance and management of patients at high genetic risk for ovarian carcinoma. Obstet Gynecol. 1982; 59(5): 58996. Exclude no mass. [PubMed]
MacDonald ND, Rosenthal AN, Jacobs IJ. Screening for ovarian cancer. Ann Acad Med Singapore. 1998; 27(5): 67682. Exclude review. [PubMed]
Mackey SE, Creasman WT. Ovarian cancer screening. J Clin Oncol. 1995; 13(3): 78393. Exclude review. [PubMed]
Maggino T, Gadducci A. Serum markers as prognostic factors in epithelial ovarian cancer: an overview. Eur J Gynaecol Oncol. 2000; 21(1): 649. Exclude review. [PubMed]
Mais V, Ajossa S, Piras B. et al. Treatment of nonendometriotic benign adnexal cysts: a randomized comparison of laparoscopy and laparotomy. Obstet Gynecol. 1995; 86(5): 7704. Exclude Q6-sample size. [PubMed]
Mais V, Guerriero S, Ajossa S. et al. Transvaginal ultrasonography in the diagnosis of cystic teratoma. Obstet Gynecol. 1995; 85(1): 4852. Exclude Q3-all premenopausal. [PubMed]
Markman M. Limitations to the use of the CA-125 antigen level in ovarian cancer. Curr Oncol Rep. 2003; 5(4): 2634. Exclude review. [PubMed]
Masson V. Bodies of knowledge. Nurs Health Care Perspect. 1997; 18(6): 291. Exclude no mass. [PubMed]
McIntosh MW, Urban N. A parametric empirical Bayes method for cancer screening using longitudinal observations of a biomarker. Biostatistics. 2003; 4(1): 2740. Exclude Q7-not descrip of sim model. [PubMed]
Mendilcioglu I, Zorlu CG, Trak B. et al. Laparoscopic management of adnexal masses. Safety and effectiveness. J Reprod Med. 2002; 47(1): 3640. Exclude Q6-sample size. [PubMed]
Menon U, Jacobs IJ. Ovarian cancer screening in the general population. Curr Opin Obstet Gynecol. 2001; 13(1): 614. Exclude review. [PubMed]
Menon U, Talaat A, Rosenthal AN. et al. Performance of ultrasound as a second line test to serum CA125 in ovarian cancer screening. BJOG. 2000; 107(2): 1659. Exclude Q4-wrong pt population. [PubMed]
Mensah LG. Gynecology in the generalist's office. Ethn Dis 2003;13(3 Suppl 3):S3-50-1. Exclude Q1-no histol. dx.
Merz E, Miric-Tesanic D, Bahlmann F. et al. Sonographic size of uterus and ovaries in pre- and postmenopausal women. Ultrasound Obstet Gynecol. 1996; 7(1): 3842. Exclude no mass. [PubMed]
Mettler L, Semm K, Shive K. Endoscopic management of adnexal masses. J Soc Laparoendosc Surg. 1997; 1(2): 10312. Exclude Q6-no M&M data.
Midgette AS, Stukel TA, Littenberg B. A meta-analytic method for summarizing diagnostic test performances: receiver-operating-characteristic-summary point estimates. Med Decis Making. 1993; 13: 2537. Exclude review. [PubMed]
Milad MP, Cohen L. Preoperative ultrasound assessment of adnexal masses in premenopausal women. Int J Gynaecol Obstet. 1999; 66(2): 13741. Exclude other all premenopausal. [PubMed]
Mills GB, Bast RC Jr, Srivastava S. Future for ovarian cancer screening: novel markers from emerging technologies of transcriptional profiling and proteomics. J Natl Cancer Inst. 2001; 93(19): 14379. Exclude review. [PubMed]
Misawa T, Asai M, Higashide K. How to decrease false-positive cases of ovarian cancer screening by transvaginal sonography. J Exp Clin Cancer Res. 1997; 16(2): 21720. Exclude no mass. [PubMed]
Modesitt SC, Pavlik EJ, Ueland FR. et al. Risk of malignancy in unilocular ovarian cystic tumors less than 10 centimeters in diameter. Obstet Gynecol. 2003; 102(3): 5949. Exclude Q3-no cancer outcome. [PubMed]
Morgan A. Adnexal mass evaluation in the emergency department. Emerg Med Clin North Am. 2001; 19(3): 799816. Exclude review. [PubMed]
Moses LE, Shapiro D. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993; 12: 1293316. Exclude review. [PubMed]
Muto MG, Cramer DW, Brown DL. et al. Screening for ovarian cancer: the preliminary experience of a familial ovarian cancer center. Gynecol Oncol. 1993; 51(1): 1220. Exclude Q3-sample size. [PubMed]
Nagarsheth NP, Rahaman J, Cohen CJ. et al. The incidence of port-site metastases in gynecologic cancers. J Soc Laparoendosc Surg. 2004; 8(2): 1339. Exclude no mass.
Nevin J, Denny L, Soeters R. et al. Ultrasonography of pelvic masses. Br J Obstet Gynaecol. 1998; 105(2): 1379. Exclude review. [PubMed]
Nezhat C, Santolaya J, Nezhat FR. Comparison of transvaginal sonography and bimanual pelvic examination in patients with laparoscopically confirmed endometriosis. J Am Assoc Gynecol Laparosc. 1994; 1(2): 12730. Exclude Q3-distguish malignant versus nonmalignant. [PubMed]
Nezhat F, Nezhat C, Welander CE. et al. Four ovarian cancers diagnosed during laparoscopic management of 1011 women with adnexal masses. Am J Obstet Gynecol. 1992; 167(3): 7906. Exclude other. [PubMed]
Nichols M, Morgan E, Jensen JT. Comparing bimanual pelvic examination to ultrasound measurement for assessment of gestational age in the first trimester of pregnancy. J Reprod Med. 2002; 47(10): 8258. Exclude Q1-no histol. dx. [PubMed]
O'Rourke J, Mahon SM. A comprehensive look at the early detection of ovarian cancer. Clin J Oncol Nurs. 2003; 7(1): 417. Exclude no mass. [PubMed]
Ong S, Duffy T, Murphy J. Transabdominal ultrasound and its correlation with clinical findings in gynaecology. Ir J Med Sci. 1996; 165(4): 26870. Exclude Q3-unable to construct 2×2. [PubMed]
Onsrud M, Shabana A, Austgulen R. et al. Comparison between soluble tumor necrosis factor receptors and CA125 in peritoneal fluids as a marker for epithelial ovarian cancer. Gynecol Oncol. 1995; 57(2): 1837. Exclude Q3-wrong test. [PubMed]
Opala T, Drews K, Rzymski P. et al. Evaluation of soluble intracellular adhesion molecule-1 (sICAM-1) in benign and malignant ovarian masses. Eur J Gynaecol Oncol. 2003; 24(34): 2557. Exclude Q3-experimental or non-standard test. [PubMed]
Opala T, Rzymski P, Wilczak M. et al. Evaluation of soluble tumour necrosis factor alpha receptors p55 and p75 in ovarian cancer patients. Eur J Gynaecol Oncol. 2005; 26(1): 436. Exclude Q3-wrong test. [PubMed]
Oram DH, Jacobs IJ, Brady JL, et al. Early diagnosis of ovarian cancer. Br J Hosp Med 1990;44(5):320, 322, 324. Exclude Q3-review.
Paley PJ. Screening for the major malignancies affecting women: current guidelines. Am J Obstet Gynecol. 2001; 184(5): 102130. Exclude review. [PubMed]
Papasakelariou C, Saunders D, De La Rosa A. Comparative study of laparoscopic oophorectomy. J Am Assoc Gynecol Laparosc. 1995; 2(4): 40710. Exclude Q6-no M&M data. [PubMed]
Pardo J, Kaplan B, Yitzhak M. et al. Ultrasonographic evaluation of hysterectomized patients with and without concomitant adnexectomy? Clin Exp Obstet Gynecol. 1998; 25(4): 1334. Exclude no mass. [PubMed]
Parker WH. Management of adnexal masses by operative laparoscopy. Selection criteria. J Reprod Med. 1992; 37(7): 6036. Exclude no mass. [PubMed]
Parker WH, Berek JS. Management of selected cystic adnexal masses in postmenopausal women by operative laparoscopy: a pilot study. Am J Obstet Gynecol. 1990; 163(5 Pt 1): 15747. Exclude Q2-no physical exam. [PubMed]
Patel MD, Feldstein VA, Chen DC. et al. Endometriomas: diagnostic performance of US [erratum appears in Radiology 1999 Dec;213(3):930]. Radiology. 1999; 210(3): 73945. Exclude Q3-distguish malignant versus nonmalignant. [PubMed]
Patel MD, Feldstein VA, Lipson SD. et al. Cystic teratomas of the ovary: diagnostic value of sonography. AJR Am J Roentgenol. 1998; 171(4): 10615. Exclude Q3-distguish malignant versus nonmalignant. [PubMed]
Patsner B, Mann WJ, Chalas E. Predictive value of CA 125 for ovarian carcinoma in patients presenting with pelvic masses. Obstet Gynecol. 1988; 71(6 Pt 1): 94950. Exclude review. [PubMed]
Pauler DK, Menon U, McIntosh M. et al. Factors influencing serum CA125II levels in healthy postmenopausal women. Cancer Epidemiol Biomarkers Prev. 2001; 10(5): 48993. Exclude Q1-no histol. dx. [PubMed]
Pearson VA. Screening for ovarian cancer: a review. Public Health. 1994; 108(5): 36782. Exclude review. [PubMed]
Petricoin EF, Ardekani AM, Hitt BA. et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet. 2002; 359(9306): 5727. Exclude no mass. [PubMed]
Predanic M, Vlahos N, Pennisi JA. et al. Color and pulsed Doppler sonography, gray-scale imaging, and serum CA 125 in the assessment of adnexal disease. Obstet Gynecol. 1996; 88(2): 2838. Exclude Q1-sample size. [PubMed]
Prefontaine M, Kroft T, Monck M. et al. Evaluation of a simple line width test involving magnetic resonance spectroscopy of plasma in carcinoma of the ovary. Cancer. 1991; 67(2): 40611. Exclude Q3-wrong test. [PubMed]
Pugh CM, Youngblood P. Development and validation of assessment measures for a newly developed physical examination simulator. J Am Med Inform Assoc. 2002; 9(5): 44860. Exclude review. [PubMed]
Qureshi IA, Ullah H, Akram MH. et al. Transvaginal versus transabdominal sonography in the evaluation of pelvic pathology. J Coll Physicians Surg Pak. 2004; 14(7): 3903. Exclude Q3-no histol. dx. [PubMed]
Ramirez PT, Frumovitz M, Wolf JK. et al. Laparoscopic port-site metastases in patients with gynecological malignancies. Int J Gynecol Cancer. 2004; 14(6): 10707. Exclude review. [PubMed]
Rifkin JI, Shapiro H, Regensteiner JG. et al. Why do some women refuse to allow male residents to perform pelvic exams? Acad Med. 2002; 77(10): 10348. Exclude no mass. [PubMed]
Rollins G. Developments in cervical and ovarian cancer screening: implications for current practice. Ann Intern Med. 2000; 133(12): 10214. Exclude review. [PubMed]
Roman LD, Muderspach LI, Stein SM. et al. Pelvic examination, tumor marker level, and gray-scale and Doppler sonography in the prediction of pelvic cancer. Obstet Gynecol. 1997; 89(4): 493500. Exclude Q7-not descrip of sim model/Exclude Q5-wrong pt population. [PubMed]
Rosenthal AN, Jacobs IJ. The role of CA 125 in screening for ovarian cancer. Int J Biol Markers. 1998; 13(4): 21620. Exclude review. [PubMed]
Sari R, Buyukberber S, Sevinc A. et al. The effects of abdominal and bimanual pelvic examination and transvaginal ultrasonography on serum CA-125 levels. Clin Exp Obstet Gynecol. 2000; 27(1): 6971. Exclude review. [PubMed]
Sassone AM, Timor-Tritsch IE, Artner A. et al. Transvaginal sonographic characterization of ovarian disease: evaluation of a new scoring system to predict ovarian malignancy. Obstet Gynecol. 1991; 78(1): 706. Exclude Q1-denom is masses. [PubMed]
Schwartz LB, Seifer DB. Diagnostic imaging of adnexal masses. A review. J Reprod Med. 1992; 37(1): 6371. Exclude review. [PubMed]
Seewaldt VL, Cain JM, Greer BE. et al. Reviving the pelvic examination for evaluating the status of ovarian carcinoma. J Clin Oncol. 1995; 13(3): 799. Exclude Q1-no histol. dx. [PubMed]
Sengoku K, Satoh T, Saitoh S. et al. Evaluation of transvaginal color Doppler sonography, transvaginal sonography and CA 125 for prediction of ovarian malignancy. Int J Gynaecol Obstet. 1994; 46(1): 3943. Exclude Q4-sample size. [PubMed]
Shaharabany Y, Akselrod S, Tepper R. A sensitive new indicator for diagnostics of ovarian malignancy, based on the Doppler velocity spectrum. Ultrasound Med Biol. 2004; 30(3): 295302. Exclude Q3-unable to construct 2×2. [PubMed]
Shalev E, Eliyahu S, Peleg D. et al. Laparoscopic management of adnexal cystic masses in postmenopausal women. Obstet Gynecol. 1994; 83(4): 5946. Exclude Q3-not complete series. [PubMed]
Shapiro I, Friedman Z, Lysyansky P. et al. The instantaneous measurement of multiple Doppler spectra in the investigation of ovarian masses. Ultrasound Obstet Gynecol. 1998; 11(5): 3536. Exclude Q3-no histol. dx. [PubMed]
Shen-Gunther J, Mannel RS. Ascites as a predictor of ovarian malignancy. Gynecol Oncol. 2002; 87(1): 7783. Exclude Q3-unable to construct 2×2/ Exclude Q2-only 1 patient had ascites. [PubMed]
Sheppard R, Fry A, Rush R. et al. Women at risk of ovarian cancer: attitudes towards and expectations of the familial ovarian cancer clinic. Fam Cancer. 2001; 1(1): 317. Exclude no mass. [PubMed]
SOGC/GOC/SCC Policy and Practice Guideline Committee. Guidelines for the Laparoscopic Management of the adnexal mass: a policy statement. SOGC Clinical Practice Guidelines 1998;(76). Exclude review.
Soriano D, Yefet Y, Seidman DS. et al. Laparoscopy versus laparotomy in the management of adnexal masses during pregnancy. Fertil Steril. 1999; 71(5): 95560. Exclude Other All Premenopausal. [PubMed]
Sparks JM, Varner RE. Ovarian cancer screening. Obstet Gynecol. 1991; 77(5): 78792. Exclude review. [PubMed]
Stein SM, Laifer-Narin S, Johnson MB. et al. Differentiation of benign and malignant adnexal masses: relative value of gray-scale, color Doppler, and spectral Doppler sonography. AJR Am J Roentgenol. 1995; 164(2): 3816. Exclude Q1-denom is masses. [PubMed]
Tailor A, Jurkovic D, Bourne TH. et al. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol. 1997; 10(1): 417. Exclude Q4-sample size. [PubMed]
Tailor A, Jurkovic D, Bourne TH. et al. Sonographic prediction of malignancy in adnexal masses using an artificial neural network. Br J Obstet Gynaecol. 1999; 106(1): 2130. Exclude Q3-unable to construct 2×2. [PubMed]
Tailor A, Jurkovic D, Bourne TH. et al. A comparison of intratumoural indices of blood flow velocity and impedance for the diagnosis of ovarian cancer. Ultrasound Med Biol. 1996; 22(7): 83743. Exclude Q3-sample size. [PubMed]
Tangjitgamol S, Jesadapatrakul S, Manusirivithaya S. et al. Accuracy of frozen section in diagnosis of ovarian mass. Int J Gynecol Cancer. 2004; 14(2): 2129. Exclude Q3-wrong test. [PubMed]
Tekay A, Jouppila P. Controversies in assessment of ovarian tumors with transvaginal color Doppler ultrasound. Acta Obstet Gynecol Scand. 1996; 75(4): 31629. Exclude Q3-review. [PubMed]
Tempfer C, Hefler L, Heinzl H. et al. CYFRA 21-1 serum levels in women with adnexal masses and inflammatory diseases. Br J Cancer. 1998; 78(8): 110812. Exclude Q3-experimental or non-standard test. [PubMed]
Teneriello MG, Park RC. Early detection of ovarian cancer. CA Cancer J Clin. 1995; 45(2): 7187. Exclude review. [PubMed]
Tepper R, Keselbrener L, Manor M. et al. Decay constant of Doppler flow waveform as a possible indicator of ovarian malignancy. Ultrasound Med Biol. 1997; 23(8): 11717. Exclude Q3-unable to construct 2×2. [PubMed]
Thorvinger B. Diagnostic and interventional radiology in gynecologic neoplasms. Acta Radiol Suppl. 1992; 378(Pt 3): 93108. Exclude Q1-no histol. dx. [PubMed]
Timmerman D, Bourne TH, Tailor A. et al. A comparison of methods for preoperative discrimination between malignant and benign adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol. 1999; 181(1): 5765. Exclude Q3-unable to construct 2×2. [PubMed]
Timmerman D, Schwarzler P, Collins WP. et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol. 1999; 13(1): 116. Exclude Q3-unable to construct 2×2. [PubMed]
Timor-Tritsch IE. Is office use of vaginal ultrasonography feasible? Am J Obstet Gynecol. 1990; 162(4): 9835. Exclude Q1-no histol. dx. [PubMed]
Togashi K, Nishimura K, Kimura I. et al. Endometrial cysts: diagnosis with MR imaging. Radiology. 1991; 180(1): 738. Exclude Q3-no cancer outcome. [PubMed]
Tuxen MK, Soletormos G, Dombernowsky P. Tumor markers in the management of patients with ovarian cancer. Cancer Treat Rev. 1995; 21(3): 21545. Exclude Q3-review. [PubMed]
U.S. Preventive Services Task Force. Screening for ovarian cancer: recommendation statement. Ann Fam Med. 2004; 2(3): 2602. Exclude Q1-no histol. dx. [PubMed]
Ueland F, DePriest P, DeSimone C, et al. The accuracy of examination under anesthesia and transvaginal sonography in evaluating ovarian size. Gynecol Oncol 2005 (in press). Exclude no mass.
Urban N. Screening for ovarian cancer. We now need a definitive randomised trial. BMJ. 1999; 319(7221): 13178. Exclude Q1-no histol. dx. [PubMed]
Urban N, McIntosh MW, Andersen M. et al. Ovarian cancer screening. Hematol Oncol Clin North Am. 2003; 17(4): 9851005. Exclude review.
Usubutun A, Altinok G, Kucukali T. The value of intraoperative consultation (frozen section) in the diagnosis of ovarian neoplasms. Acta Obstet Gynecol Scand. 1998; 77(10): 10136. Exclude Q3-wrong test. [PubMed]
Vaidya AP, Curtin JP. The follow-up of ovarian cancer. Semin Oncol. 2003; 30(3): 40112. Exclude Q1-no histol. dx. [PubMed]
Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol. 1999; 14(4): 27383. Exclude Q3-publ duplicate. [PubMed]
Valentin L, Sladkevicius P, Marsal K. Limited contribution of Doppler velocimetry to the differential diagnosis of extrauterine pelvic tumors. Obstet Gynecol. 1994; 83(3): 42533. Exclude Q3-unable to construct 2×2. [PubMed]
van Dam PA, DeCloedt J, Tjalma WA. et al. Trocar implantation metastasis after laparoscopy in patients with advanced ovarian cancer: can the risk be reduced? Am J Obstet Gynecol. 1999; 181(3): 53641. Exclude no mass. [PubMed]
van der Burg ME, Lammes FB, Verweij J. The role of CA 125 and conventional examinations in diagnosing progressive carcinoma of the ovary. Surg Gynecol Obstet. 1993; 176(4): 3104. Exclude Q3-wrong test. [PubMed]
van Nagell JR Jr, DePriest PD, Gallion HH. et al. Ovarian cancer screening. Cancer. 1993; 71(4 Suppl): 15238. Exclude review. [PubMed]
van Nagell JR Jr, DePriest PD, Reedy MB. et al. The efficacy of transvaginal sonographic screening in asymptomatic women at risk for ovarian cancer. Gynecol Oncol. 2000; 77(3): 3506. Exclude Q5-wrong pt population. [PubMed]
van Nagell JR Jr, Higgins RV, Donaldson ES. et al. Transvaginal sonography as a screening method for ovarian cancer. A report of the first 1000 cases screened. Cancer. 1990; 65(3): 5737. Exclude Q1-pop is subset of larger study (#2730). [PubMed]
van Nagell JR Jr, Ueland FR. Ultrasound evaluation of pelvic masses: predictors of malignancy for the general gynecologist. Curr Opin Obstet Gynecol. 1999; 11(1): 459. Exclude review. [PubMed]
Varpula M. Magnetic resonance imaging of female pelvic masses and local recurrent tumors at an ultra low (0.02 T) magnetic field: correlation with computed tomography. Magn Reson Imaging. 1993; 11(1): 3546. Exclude Q3-sample size. [PubMed]
Varras M. Benefits and limitations of ultrasonographic evaluation of uterine adnexal lesions in early detection of ovarian cancer. Clin Exp Obstet Gynecol. 2004; 31(2): 8598. Exclude review. [PubMed]
Vogl FD, Frey M, Kreienberg R. et al. Autoimmunity against p53 predicts invasive cancer with poor survival in patients with an ovarian mass. Br J Cancer. 2000; 83(10): 133843. Exclude Q3-sample size. [PubMed]
von Schlippe M, Rustin GJ. Circulating tumour markers in ovarian tumours. Forum. 2000; 10(4): 38392. Exclude review. [PubMed]
Voss SC, Lacey CG, Pupkin M. et al. Ultrasound and the pelvic mass. J Reprod Med. 1983; 28(12): 833–7. Exclude Q3-no verification test negative/ Exclude Q2-unable to construct 2×2. [PubMed]
Wakahara F, Kikkawa F, Nawa A. et al. Diagnostic efficacy of tumor markers, sonography, and intraoperative frozen section for ovarian tumors. Gynecol Obstet Invest. 2001; 52(3): 147–52. Exclude Q4-unable to construct 2×2. [PubMed]
Wardle FJ, Collins W, Pern. et al. Psychological impact of screening for familial ovarian cancer. J Natl Cancer Inst. 1993; 85(8): 653–7. Exclude review. [PubMed]
Weerakiet S, Wongkularb A, Rochanawutanon M. et al. Transvaginal ultrasonography combined with pelvic examination in the diagnosis of ovarian endometrioma. J Med Assoc Thai. 2000; 83(5): 523–8. Exclude Q3-no cancer outcome. [PubMed]
Weiner Z, Beck D, Brandes JM. Transvaginal sonography, color flow imaging, computed tomographic scanning, and CA 125 as a routine follow-up examination in women with pelvic tumor: detection of recurrent disease. J Ultrasound Med. 1994; 13(1): 37–41. Exclude no mass. [PubMed]
Westhoff C, Levin B, Ladd G. et al. Sources of variability in normal CA 125 levels. Cancer Epidemiol Biomarkers Prev. 1992; 1(5): 357–9. Exclude review. [PubMed]
Yawn BP, Wollan PC. Ovarian cancer: the neglected diagnosis. Mayo Clinic Proceedings. 2004; 79(10): 1277–82. Exclude review. [PubMed]
Zhang Z, Bast RC Jr, Yu Y. et al. Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer. [see comment]. Cancer Res. 2004; 64(16): 5882–90. Exclude Q3-experimental or non-standard test. [PubMed]
Zurawski VR Jr, Knapp RC, Einhorn N. et al. An initial analysis of preoperative serum CA 125 levels in patients with early stage ovarian carcinoma. Gynecol Oncol. 1988; 30(1): 7–14. Exclude Q3-distinguish malignant versus nonmalignant. [PubMed]
Zygmunt A, Markowska J, Fischer N. Significance of tissue polypeptide specific antigen (TPS) in diagnosis and monitoring of treatment in ovarian cancer. Eur J Gynaecol Oncol. 1998; 19(5): 484–6. Exclude review. [PubMed]

Appendix C: Sample Data Abstraction Forms

Table C1

Question 1: What is the prevalence of various tumor types among peri- and postmenopausal women with an adnexal mass, stratified by cancer status (malignant vs. benign), age, and size of tumor?
StudyStudy DesignPatientsClinical PresentationResultsComments/Quality Scoring
StudyIDGeographical location:Age:Symptomatic (n [%]):[Proportion of each type of finding, stratified by cancer status, age/menopausal status (<45, 45–55, >55 or pre-peri-post-menopausal), and size of tumor. Include individual tumor types where possible.][IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE]
Dates:Mean (SD):Detected by exam (n [%]):Use Excel spreadsheet to calculate confidence intervals for prevalence data from screening studies[COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION]
Size of population:Median:Detected by imaging (n [%]):1)Quality assessment:
[num/denom for screening studies]Range:Combination (n [%]):2)[assign + or - to each item, and provide a brief rationale]
Screening study RegistryMenopausal status (n [%]):Additional data used for diagnosis:3)Size of population from which sample drawn:
OtherPre (< 45):4)Number of cases:
[delete all but one; please specify “Other”]Peri (45–55):5)Patient selection:
Post (> 55):Application of reference standard:
Race/ethnicity (n [%]):This article is also relevant to: [delete as appropriate]
Risk factors (n [%]):Question 2
Family history:Question 3
Genotype:Question 4
Other [specify]:Question 5
Question 6
Question 7

Table C2

Question 2: What are the sensitivity, specificity, and reliability of the bimanual examination?
StudyStudy DesignPatientsClinical PresentationClinical Setting of ExamResultsComments/Quality Scoring
StudyIDGeographical location:Age:Symptomatic (n [%]):[Please provide brief description of clinical setting in which bimanual exam was performed][For bimanual exam, provide reported sensitivity/specificity and provide 2×2 tables (if possible). If possible and appropriate, stratify by age or menopausal status. If data are available on reliability/ reproducibility, report these as well. Include kappa scores if these are reported or can be calculated.][IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE]
Dates:Mean (SD):Detected by exam (n [%]):1) [Use this space to provide information needed for reader to interpret Test +, Test -, Disease +, and Disease - headings in following table.][COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION]
Size of population:Median:Detected by imaging (n [%]):Quality assessment:
[num/denom for screening studies]Range:Combination (n [%]):[assign + or - to each item, and provide a brief rationale]
Screening study RegistryMenopausal status (n [%]):Additional data used for diagnosis:Reference standard:
OtherPre (< 45):Verification bias:
[delete all but one; please specify “Other”]Peri (45–55):Test reliability/variability:
Reference standard:Post (> 55):Sample size:
Reference standard applied to all test negatives?:Race/ethnicity (n [%]):Statistical tests:
Test reliability established?:Risk factors (n [%]):Blinding:
Statistical tests used:Family history:Definition of +/- on screening test:
Blinding:Genotype:This article is also relevant to: [delete as appropriate]
Definition of positive and negative on screening test:Other [specify]:2)Question 1
Inclusion criteria:Question 2
Exclusion criteria:Question 3
Question 4
Question 5
Question 7
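
The Table C2 form asks abstractors to include kappa scores where they are reported or can be calculated. As a minimal sketch of that calculation, assuming two examiners each classify the same patients as mass present or mass absent, Cohen's kappa can be computed from the four agreement counts. The function and argument names below are illustrative and are not taken from the report.

```python
def cohen_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two raters and a binary finding.

    Arguments are counts of patients in each cell of the
    rater-A by rater-B agreement table (hypothetical names).
    """
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    if n == 0:
        raise ValueError("agreement table is empty")

    # Observed agreement: proportion of patients rated the same by both.
    observed = (both_pos + both_neg) / n

    # Marginal proportion of positive ratings for each rater.
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n

    # Agreement expected by chance if the raters were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")

    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(round(cohen_kappa(20, 5, 10, 15), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is more informative than percent agreement when one category dominates.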

Table C3

Question 3: Among peri- and postmenopausal women with a palpable adnexal mass on exam or a mass identified by ultrasound/imaging, what is the sensitivity/specificity of various evaluation modalities including ultrasound (transvaginal ultrasound, transabdominal ultrasound, color Doppler, 2-D vs 3D ultrasound, CT scan, MRI scan, and CA-125 levels) for diagnosing malignant masses?
StudyStudy DesignPatientsClinical PresentationResultsComments/Quality Scoring
StudyIDGeographical location:Age:Symptomatic (n [%]):[For each test reported, please provide a 2×2 table and report or calculate sensitivity, specificity, NPV, and PPV (all with confidence intervals). If possible and appropriate, stratify by age or menopausal status.][IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE]
Dates:Mean (SD):Detected by exam (n [%]):1) [Use this space to provide information needed for reader to interpret Test +, Test -, Disease +, and Disease - headings in following table.][COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION]
Size of population:Median:Detected by imaging (n [%]):Quality assessment:
[num/denom for screening studies]Range:Combination (n [%]):[assign + or - to each item, and provide a brief rationale]
Screening study RegistryMenopausal status (n [%]):Additional data used for diagnosis:Reference standard:
OtherPre (< 45):Verification bias:
[delete all but one; please specify “Other”]Peri (45–55):Test reliability/variability:
Reference standard:Post (> 55):Sample size:
Reference standard applied to all test negatives?:Race/ethnicity (n [%]):Statistical tests:
Test reliability established?:Risk factors (n [%]):Blinding:
Statistical tests used:Family history:Definition of +/- on screening test:
Blinding:Genotype:2)This article is also relevant to: [delete as appropriate]
Definition of positive and negative on screening test:Other [specify]:Question 1
Inclusion criteria:Question 3
Exclusion criteria:Question 4
Question 5
Question 6
Question 7
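
The Table C3 form asks for a 2×2 table and for sensitivity, specificity, NPV, and PPV with confidence intervals. The following is a minimal sketch of that calculation. It assumes a Wilson score interval; the form does not say which interval method the abstractors used, and the function names are illustrative.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total == 0:
        return (float("nan"), float("nan"))
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def two_by_two_metrics(tp, fp, fn, tn):
    """Point estimates and intervals from a 2x2 table.

    tp: test +, disease +    fp: test +, disease -
    fn: test -, disease +    tn: test -, disease -
    Returns {name: (estimate, (lower, upper))}.
    """
    cells = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    results = {}
    for name, (num, den) in cells.items():
        estimate = num / den if den else float("nan")
        results[name] = (estimate, wilson_interval(num, den))
    return results


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, (est, (lo, hi)) in two_by_two_metrics(80, 10, 20, 90).items():
        print(f"{name}: {est:.3f} ({lo:.3f} to {hi:.3f})")
```

PPV and NPV computed this way depend on the prevalence of malignancy in the study sample, so they do not transfer directly to populations with a different prevalence.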

Table C4

Question 4: What is the accuracy of explicit scoring systems which incorporate various combinations of imaging findings, patient risk factors, and/or CA-125 levels for detecting malignancy? Have these scoring systems been applied to a population of peri-/postmenopausal women before laparoscopy?
StudyStudy DesignPatientsClinical PresentationItems Included in Scoring SystemResultsComments/Quality Scoring
StudyIDGeographical location:Age:Symptomatic (n [%]):1)[For each reported scoring system (and individual components, if reported), provide reported sensitivity/specificity and provide 2×2 table; if multivariate analysis, provide area under ROC curve or c-statistic, if reported. If possible and appropriate, stratify by age or menopausal status.][IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE]
Dates:Mean (SD):Detected by exam (n [%]):2)1) [Use this space to provide information needed for reader to interpret Test +, Test -, Disease +, and Disease - headings in following table.][COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION]
Size of population:Median:Detected by imaging (n [%]):3)2)Quality assessment:
[num/denom for screening studies]Range:Combination (n [%]):4)Results were reported, but have not been abstracted, for the following combinations: [list][assign + or - to each item, and provide a brief rationale]
Screening study RegistryMenopausal status (n [%]):Additional data used for diagnosis:5)Reference standard:
OtherPre (< 45):6)Verification bias:
[delete all but one; please specify “Other”]Peri (45–55):7)Test reliability/variability:
Reference standard:Post (> 55):8)Sample size:
Reference standard applied to all test negatives?:Race/ethnicity (n [%]):9)Statistical tests:
Statistical tests used:Risk factors (n [%]):10)Blinding:
Blinding:Family history:Definition of +/- on screening test:
Definition of positive and negative on screening test:Genotype:Explicit validation method?:
Other [specify]:This article is also relevant to: [delete as appropriate]
Inclusion criteria:Question 1
Exclusion criteria:Question 2
Question 4
Question 5
Question 6
Question 7

Table C5

Question 5: Among women with suspected benign lesions on initial investigation, what is the sensitivity and specificity of monitoring with periodic CA-125 and/or interval ultrasound examinations for detecting malignant masses? How does the interval of testing/definition of change affect sensitivity and predictive value?
StudyStudy DesignPatientsClinical PresentationMonitoring StrategyResultsComments/Quality Scoring
StudyIDGeographical location:Age:Symptomatic (n [%]):Monitoring test:[For each reported monitoring strategy, provide reported sensitivity/specificity and provide 2×2 table; if multivariate analysis, provide area under ROC curve or c-statistic, if reported. If possible and appropriate, stratify by age or menopausal status.][IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE]
Dates:Mean (SD):Detected by exam (n [%]):Interval of testing:1) [Use this space to provide information needed for reader to interpret Test +, Test -, Disease +, and Disease - headings in following table.][COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION]
Size of population:Median:Detected by imaging (n [%]):Definition of change:2)Quality assessment:
[num/denom for screening studies]Range:Combination (n [%]):3)[assign + or - to each item, and provide a brief rationale]
Screening study RegistryMenopausal status (n [%]):Additional data used for diagnosis:Reference standard:
OtherPre (< 45):Verification bias:
[delete all but one; please specify “Other”]Peri (45–55):Test reliability/variability:
Reference standard:Post (> 55):Sample size:
Reference standard applied to all test negatives?:Race/ethnicity (n [%]):Statistical tests:
Test reliability established?:Risk factors (n [%]):Blinding:
Statistical tests used:Family history:Definition of +/- on screening test:
Blinding:Genotype:Explicit validation method?:
Definition of positive and negative on screening test:Other [specify]:This article is also relevant to: [delete as appropriate]
Length of follow up:Inclusion criteria:Question 1
Type of follow up:Exclusion criteria:Question 2
Follow-up interval:Loss to follow up:Question 3
Question 5
Question 6
Question 7

Table C6

Question 6: Among women with adnexal masses, what is the morbidity and mortality from diagnostic laparoscopy? At what point does the risk of laparoscopy outweigh the risk of detecting malignancy?
StudyStudy DesignPatientsClinical PresentationResultsComments/Quality Scoring
StudyIDGeographical location:Age:Symptomatic (n [%]):[For each, provide reported rate and 95% CI, if appropriate. If possible and appropriate, stratify results by age or menopausal status.][IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE]
Dates:Mean (SD):Detected by exam (n [%]):Use Excel spreadsheet to calculate confidence intervals for morbidity/mortality[COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION]
Size of population:Median:Detected by imaging (n [%]):1) Mortality:Quality assessment:
[num/denom for screening studies]Range:Combination (n [%]):2) Morbidity (total all complications):[assign + or - to each item, and provide a brief rationale]
Single center RegistryMenopausal status (n [%]):Additional data used for diagnosis:3) Specific complications:Size of population from which sample drawn:
[delete one]Pre (< 45):4) Rate of conversion to laparotomy:Number of cases:
Morbidity definitions:Peri (45–55):5)Patient selection:
Length of follow up after surgery:Post (> 55):6)Application of reference standard:
Race/ethnicity (n [%]):This article is also relevant to: [delete as appropriate]
Risk factors (n [%]):Question 1
Family history:Question 2
Genotype:Question 3
Other [specify]:Question 4
Loss to follow up:Question 6
Question 7

Table C7

Question 7: What are the estimated trade-offs resulting from various strategies for evaluation of the adnexal mass?
StudyStudy DesignStudy OutcomesSources for Model ProbabilitiesSources for Model OutcomesResultsComments
StudyIDType of model:[Life expectancy, quality of life, cancer incidence, cancer death, etc. Include costs, but we will not be using them here][In particular, sources for transition probabilities between different stages of pre-cancer/cancer][For each strategy compared, compare results for different outcomes; also, report results of significant sensitivity analyses.][IF ARTICLE SHOULD BE EXCLUDED, PLEASE EXPLAIN WHY HERE]
Population modeled (age, range):Simplifying assumptions:1)[COMMENT ON BIASES, ETC. AFFECTING CLINICAL INTERPRETATION]
Strategies compared:2)This article is also relevant to: [delete as appropriate]
3)Question 1
4)Question 2
5)Question 3
6)Question 4
Question 5
Question 6

Appendix D: Evidence Tables

Abbreviations used in the Evidence Tables

2DTwo-dimensional
3DThree-dimensional
AFPAlpha-fetoprotein
AHRQAgency for Healthcare Research and Quality
AUCArea under the curve
BMEBimanual examination
BMIBody mass index
CA-19-9Cancer antigen 19-9
CA-72–4Cancer antigen 72–4
CA-125Cancer antigen 125
CEACarcinoembryonic antigen
CIConfidence interval
CPPChronic pelvic pain
CTComputed tomography
F-FDG18-Fluorodeoxyglucose
FNAFine needle aspiration
FSHFollicle-stimulating hormone
GIGastrointestinal
hCGHuman chorionic gonadotropin
ICD-9International Classification of Diseases, Ninth Revision
LDHLactate dehydrogenase
LMPLow malignant potential
MRIMagnetic resonance imaging
NISNationwide Inpatient Sample
NANot applicable
NPVNegative predictive value
NRNot reported
OROdds ratio
PEPelvic examination
PETPositron emission tomography
PIPulsatility index
PIDPelvic inflammatory disease
PPSPapillary projection score
PPVPositive predictive value
PSVPeak systolic velocity
RIResistance index
RMIRisk of Malignancy Index
ROCReceiver operating characteristic
SDStandard deviation
SeSensitivity
SEMStandard error of the mean
SpSpecificity
TAG-72Tumor-associated glycoprotein 72
TAMXVTime-averaged maximum velocity
TATITumor-associated trypsin inhibitor
TVUSTransvaginal ultrasound
USUltrasound
UTIUrinary tract infection

Appendix E: Peer Reviewers

The Duke Evidence-based Practice Center is grateful to the following peer reviewers who read and commented on a draft version of this report:

Susan M. Ascher, MD; Department of Radiology, Georgetown University Hospital; Washington, DC

Andrew Berchuck, MD; Division of Gynecologic Oncology, Duke University Medical Center; Durham, NC

Michael L. Berman; Division of Gynecologic Oncology, UCI Medical Center; Orange, CA

Christie R. Eheman, MD; Centers for Disease Control and Prevention; Atlanta, GA

Barbara Goff, MD; University of Washington School of Medicine; Seattle, WA

Walter Kinney, MD; Department of Obstetrics and Gynecology, The Permanente Medical Group; Sacramento, CA

Ann Kolker; Ovarian Cancer National Alliance; Washington, DC

Herschel Lawson, MD; Centers for Disease Control and Prevention; Atlanta, GA

Saralyn Mark, MD; Department of Health and Human Services; Washington, DC

Susan Meikle, MD, MSPH; Agency for Healthcare Research and Quality; Rockville, MD

Valerie McGuire, PhD; Department of Health Research and Policy, Stanford University; Stanford, CA

Edward E. Partridge, MD; Department of Obstetrics and Gynecology, University of Alabama; Birmingham, AL

Mona Saraiya, MD, MPH; Centers for Disease Control and Prevention; Atlanta, GA

George F. Sawaya, MD; Department of Obstetrics and Gynecology, UCSF; San Francisco, CA

Howard T. Sharp, MD; University of Utah Hospitals and Clinics; Salt Lake City, UT

Edward L. Trimble, MD, MPH; National Cancer Institute; Rockville, MD

John R. van Nagell Jr., MD; Department of Obstetrics and Gynecology, University of Kentucky Medical Center; Lexington, KY

Nominations for peer reviewers were solicited from several sources, including the project's technical expert panel and interested federal agencies. The list of nominees was vetted and approved by the Agency for Healthcare Research and Quality (AHRQ).

References Cited in the Evidence Report
1.
Jemal A, Murray T, Samuels A. et al. Cancer statistics, 2003. CA Cancer J Clin. 2003; 53(1): 5–26. [PubMed]
2.
Anonymous. SEER Stat Databases: Incidence - SEER Regs Public - Use, Nov 2004 Sub for Expanded Races (1992-2002) and Incidence - SEER 13 Regs excluding AK Public - Use Nov 2004 Sub for Hispanics (1992-2002), National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2005, based on November 2004 submission. Available at: www.seer.cancer.gov. Accessed: May 25, 2005.
3.
Russell P, Robboy S, Anderson M. Ovarian tumors: classification and clinical perspective. In: Robboy SJ, Anderson MC, and Russell P, editors. Pathology of the female reproductive tract. London: Churchill Livingstone; 2002. p. 527–38.
4.
Eheman C, Bobo J, Lawson H, et al. Identifying public health opportunities to reduce the burden of ovarian cancer: workshop summary. Atlanta, GA: Centers for Disease Control and Prevention, 2001. Available at: www.cdc.gov/cancer/ovarian/workshop.htm#conclusion. Accessed: May 25, 2005.
5.
Eheman C, Brustrom J, Lawson H. The use of ultrasound in diagnosing ovarian cancer: can we improve on current practice? Workshop summary. Atlanta, GA: Centers for Disease Control and Prevention, 2002. Available at: http://www.cdc.gov/cancer/ovarian/workshop.htm. Accessed: May 25, 2005.
6.
Russell P, Robboy S, Anderson M. The ovary: normal appearances and non-neoplastic conditions. In: Robboy SJ, Anderson MC, and Russell P, editors. Pathology of the female reproductive tract. London: Churchill Livingstone; 2002. p. 475–526.
7.
Barber H, Graber E. The PMPO syndrome (postmenopausal palpable ovary syndrome). Obstet Gynecol. 1971; 38(6): 921–3.
8.
Munstedt K, von Georgi R, Misselwitz B. et al. Centralizing surgery for gynecologic oncology—a strategy assuring better quality treatment? [see comment]. Gynecol Oncol. 2003; 89(1): 4–8. [PubMed]
9.
Junor E, Hole D, McNulty L. et al. Specialist gynaecologists and survival outcome in ovarian cancer: a Scottish national study of 1866 patients. Br J Obstet Gynaecol. 1999; 106(11): 1130–6. [PubMed]
10.
Tingulstad S, Skeldestad FE, Hagen B. The effect of centralization of primary surgery on survival in ovarian cancer patients. Obstet Gynecol. 2003; 102(3): 499–505. [PubMed]
11.
American College of Obstetricians and Gynecologists. ACOG Committee Opinion: number 280, December 2002. The role of the generalist obstetrician-gynecologist in the early detection of ovarian cancer. Obstet Gynecol. 2002; 100(6): 1413–6. [PubMed]
12.
American Academy of Family Physicians. Summary of recommendations for periodic health examinations. Leawood (KS): American Academy of Family Physicians. 2004.
13.
Geomini P, Bremer G, Kruitwagen R. et al. Diagnostic accuracy of frozen section diagnosis of the adnexal mass: a metaanalysis. Gynecol Oncol. 2005; 96(1): 1–9. [PubMed]
14.
American College of Obstetricians and Gynecologists. ACOG Practice Bulletin 45. Cervical cytology screening. August 2003. 2003.
15.
Whiting P, Rutjes A, Reitsma JB. et al. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004; 140(3): 189–202. [PubMed]
16.
Gohagan J, Prorok P, Hayes R. et al. The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute: history, organization and status. Control Clin Trials. 2000; 21(6 Suppl): 251S–72S. [PubMed]
17.
Prorok P, Andriole G, Bresalier R. et al. Design of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Control Clin Trials. 2000; 21(6 Suppl): 273S–309S. [PubMed]
18.
Jacobs I. European randomized trial of ovarian cancer screening (protocol). London: Wolfson Institute of Preventive Medicine, Department of Environmental and Preventive Medicine. 1995.
19.
Goff B, Mandel L, Melancon C. et al. Frequency of symptoms of ovarian cancer in women presenting to primary care clinics. [see comment]. JAMA. 2004; 291(22): 2705–12. [PubMed]
20.
Thomson ISI ResearchSoft. ProCite [computer program]. Berkeley, CA: Thomson ISI ResearchSoft.
21.
Nanda K, McCrory D, Myers E. et al. Accuracy of the Papanicolaou test in screening for and follow-up of cervical cytologic abnormalities: a systematic review. Ann Intern Med. 2000; 132(10): 810–9. [PubMed]
22.
Lau J. Meta-Stat 0.6. Shareware program for performing meta-analyses of diagnostic tests. Available at: www.medepi.net/meta/software/MetaTest/Mtreadme.txt. Accessed August 26, 2005.
23.
DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986; 7(3): 177–88. [PubMed]
24.
Moses LE, Shapiro D. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993; 12: 1293–316. [PubMed]
25.
Agency for Healthcare Research and Quality. Overview of the HCUP Nationwide Inpatient Sample. Rockville, MD. 2003.
26.
Myers E, Steege J. Risk adjustment for complications of hysterectomy: utility of routinely collected administrative data. Am J Obstet Gynecol. 1999; 181: 567–75. [PubMed]
27.
Ries L, Eisner M, Kosary C, et al. SEER Cancer Statistics Review, 1973-1999. National Cancer Institute: Bethesda, MD. 2002.
28.
Anderson R, DeTurk P. National vital statistics reports: United States life tables, 1999. National Center for Health Statistics: Hyattsville, MD. 2002.
29.
Jones H. Ovarian cancer: the clinical problem. In: Jones HW, editor. Early detection of ovarian carcinoma with transvaginal sonography: potentials and limitations. New York: Raven Press, Ltd.; 1993. p. 3–9.
30.
TreeAge Software. DATA 4.0 [computer program]. Williamstown, MA: TreeAge Software, Inc.
31.
Chalas E, Welshinger M, Engellener W. et al. The clinical significance of thrombocytosis in women presenting with a pelvic mass. Am J Obstet Gynecol. 1992; 166(3): 974–7. [PubMed]
32.
Childers JM, Nasseri A, Surwit EA. Laparoscopic management of suspicious adnexal masses. Am J Obstet Gynecol. 1996; 175(6): 1451–7; discussion 1457–9. [PubMed]
33.
Cohen LS, Escobar PF, Scharm C. et al. Three-dimensional power Doppler ultrasound improves the diagnostic accuracy for ovarian cancer prediction. Gynecol Oncol. 2001; 82(1): 40–8. [PubMed]
34.
DePriest PD, Gallion HH, Pavlik EJ. et al. Transvaginal sonography as a screening method for the detection of early ovarian cancer. Gynecol Oncol. 1997; 65(3): 408–14. [PubMed]
35.
DePriest PD, Shenson D, Fried A. et al. A morphology index based on sonographic findings in ovarian cancer. Gynecol Oncol. 1993; 51(1): 7–11. [PubMed]
36.
DePriest PD, van Nagell JR Jr, Gallion HH. et al. Ovarian cancer screening in asymptomatic postmenopausal women. Gynecol Oncol. 1993; 51(2): 205–9. [PubMed]
37.
Dottino PR, Levine DA, Ripley DL. et al. Laparoscopic management of adnexal masses in premenopausal and postmenopausal women. Obstet Gynecol. 1999; 93(2): 223–8. [PubMed]
38.
Fleischer AC, Cullinan JA, Jones HW 3rd. et al. Correlation of histomorphology of ovarian masses with color Doppler sonography. Ultrasound Med Biol. 1996; 22(5): 555–9. [PubMed]
39.
Lin JY, Angel C, DuBeshter B. et al. Diagnoses after laparotomy for a mass in the pelvic area in women. Surg Gynecol Obstet. 1993; 176(4): 333–8. [PubMed]
40.
Modesitt SC, Pavlik EJ, Ueland FR. et al. Risk of malignancy in unilocular ovarian cystic tumors less than 10 centimeters in diameter. Obstet Gynecol. 2003; 102(3): 594–9. [PubMed]
41.
Parker WH, Levine RL, Howard FM. et al. A multicenter study of laparoscopic management of selected cystic adnexal masses in postmenopausal women. J Am Coll Surg. 1994; 179(6): 733–7. [PubMed]
42.
Roman LD, Muderspach LI, Stein SM. et al. Pelvic examination, tumor marker level, and gray-scale and Doppler sonography in the prediction of pelvic cancer. Obstet Gynecol. 1997; 89(4): 493–500. [PubMed]
43.
Schneider VL, Schneider A, Reed KL. et al. Comparison of Doppler with two-dimensional sonography and CA 125 for prediction of malignancy of pelvic masses. Obstet Gynecol. 1993; 81(6): 983–8. [PubMed]
44.
Scoutt LM, McCarthy SM, Lange R. et al. MR evaluation of clinically suspected adnexal masses. J Comput Assist Tomogr. 1994; 18(4): 609–18. [PubMed]
45.
Shen-Gunther J, Mannel RS. Ascites as a predictor of ovarian malignancy. Gynecol Oncol. 2002; 87(1): 77–83. [PubMed]
46.
Smikle CB, Lunt CC, Hankins GD. Clinical predictors in the evaluation of a pelvic mass. Mil Med. 1995; 160(5): 233–5. [PubMed]
47.
Troiano RN, Quedens-Case C, Taylor KJ. Correlation of findings on transvaginal sonography with serum CA 125 levels. AJR Am J Roentgenol. 1997; 168(6): 1587–90. [PubMed]
48.
Twickler DM, Forte TB, Santos-Ramos R. et al. The Ovarian Tumor Index predicts risk for malignancy. Cancer. 1999; 86(11): 2280–90. [PubMed]
49.
van Nagell JR Jr, DePriest PD, Reedy MB. et al. The efficacy of transvaginal sonographic screening in asymptomatic women at risk for ovarian cancer. Gynecol Oncol. 2000; 77(3): 350–6. [PubMed]
50.
Vasilev SA, Schlaerth JB, Campeau J. et al. Serum CA 125 levels in preoperative evaluation of pelvic masses. Obstet Gynecol. 1988; 71(5): 751–6. [PubMed]
51.
Adonakis GL, Paraskevaidis E, Tsiga S. et al. A combined approach for the early detection of ovarian cancer in asymptomatic women. Eur J Obstet Gynecol Reprod Biol. 1996; 65(2): 221–5. [PubMed]
52.
Andolf E, Jorgensen C, Astedt B. Ultrasound examination for detection of ovarian carcinoma in risk groups. Obstet Gynecol. 1990; 75(1): 106–9. [PubMed]
53.
Balbi GC, Musone R, Menditto A. et al. Women with a pelvic mass: indicators of malignancy. Eur J Gynaecol Oncol. 2001; 22(6): 459–62. [PubMed]
54.
Buckshee K, Temsu I, Bhatla N. et al. Pelvic examination, transvaginal ultrasound and transvaginal color Doppler sonography as predictors of ovarian cancer. Int J Gynaecol Obstet. 1998; 61(1): 51–7. [PubMed]
55.
Dowd JR, Quinn MA, Rome R. et al. Women with a pelvic mass—when to perform an ultrasound. Aust N Z J Obstet Gynaecol. 1993; 33(4): 404–7. [PubMed]
56.
Finkler NJ, Benacerraf B, Lavin PT. et al. Comparison of serum CA 125, clinical impression, and ultrasound in the preoperative evaluation of ovarian masses. Obstet Gynecol. 1988; 72(4): 659–64. [PubMed]
57.
Grover SR, Quinn MA. Is there any value in bimanual pelvic examination as a screening test. Med J Aust. 1995; 162(8): 408–10. [PubMed]
58.
Jacobs I, Stabile I, Bridges J. et al. Multimodal approach to screening for ovarian cancer. Lancet. 1988; 1(8580): 268–71. [PubMed]
59.
Ong S, Duffy T, Murphy J. Transabdominal ultrasound and its correlation with clinical findings in gynaecology. Ir J Med Sci. 1996; 165(4): 268–70. [PubMed]
60.
Padilla LA, Radosevich DM, Milad MP. Accuracy of the pelvic examination in detecting adnexal masses. Obstet Gynecol. 2000; 96(4): 593–8. [PubMed]
61.
Padilla LA, Radosevich DM, Milad MP. Limitations of the pelvic examination for evaluation of the female pelvic organs. Int J Gynaecol Obstet. 2005; 88(1): 84–8. [PubMed]
62.
Schutter EM, Kenemans P, Sohn C. et al. Diagnostic value of pelvic examination, ultrasound, and serum CA 125 in postmenopausal women with a pelvic mass. An international multicenter study. Cancer. 1994; 74(4): 1398–406. [PubMed]
63.
Schutter EM, Sohn C, Kristen P. et al. Estimation of probability of malignancy using a logistic model combining physical examination, ultrasound, serum CA 125, and serum CA 72-4 in postmenopausal women with a pelvic mass: an international multicenter study. Gynecol Oncol. 1998; 69(1): 56–63. [PubMed]
64.
Ueland F, DePriest P, DeSimone C, et al. The accuracy of examination under anesthesia and transvaginal sonography in evaluating ovarian size. Gynecol Oncol. 2005 (in press).
65.
Alcazar JL, Castillo G. Comparison of 2-dimensional and 3-dimensional power-Doppler imaging in complex adnexal masses for the prediction of ovarian cancer. Am J Obstet Gynecol. 2005; 192(3): 807–12. [PubMed]
66.
Alcazar JL, Errasti T, Zornoza A. et al. Transvaginal color Doppler ultrasonography and CA-125 in suspicious adnexal masses. Int J Gynaecol Obstet. 1999; 66(3): 255–61. [PubMed]
67.
Alcazar JL, Galan MJ, Garcia-Manero M. et al. Three-dimensional sonographic morphologic assessment in complex adnexal masses: preliminary experience. J Ultrasound Med. 2003; 22(3): 249–54. [PubMed]
68.
Alcazar JL, Lopez-Garcia G. Transvaginal color Doppler assessment of venous flow in adnexal masses. Ultrasound Obstet Gynecol. 2001; 17(5): 434–8. [PubMed]
69.
Alcazar JL, Merce LT, Laparte C. et al. A new scoring system to differentiate benign from malignant adnexal masses. Am J Obstet Gynecol. 2003; 188(3): 685–92. [PubMed]
70.
Anandakumar C, Chew S, Wong YC. et al. Role of transvaginal ultrasound color flow imaging and Doppler waveform analysis in differentiating between benign and malignant ovarian tumors. Ultrasound Obstet Gynecol. 1996; 7: 280–4. [PubMed]
71.
Antonic J, Rakar S. Colour and pulsed Doppler US and tumour marker CA 125 in differentiation between benign and malignant ovarian masses. Anticancer Res. 1995; 15: 1527–32. [PubMed]
72.
Asif N, Sattar A, Dawood MM. et al. Pre-operative evaluation of ovarian mass: risk of malignancy index. J Coll Physicians Surg Pak. 2004; 14(3): 128–31. [PubMed]
73.
Benjapibal M, Sunsaneevitayakul P, Boriboonhirunsarn D. et al. Color Doppler ultrasonography for prediction of malignant ovarian tumors. J Med Assoc Thai. 2002; 85(6): 709–15. [PubMed]
74.
Benjapibal M, Sunsaneevitayakul P, Phatihattakorn C. et al. Sonographic morphological pattern in the pre-operative prediction of ovarian masses. J Med Assoc Thai. 2003; 86(4): 332–7. [PubMed]
75.
Berlanda N, Ferrari MM, Mezzopane R. et al. Impact of a multiparameter, ultrasound-based triage on surgical management of adnexal masses. Ultrasound Obstet Gynecol. 2002; 20(2): 181–5. [PubMed]
76.
Bromley B, Goodman H, Benacerraf BR. Comparison between sonographic morphology and Doppler waveform for the diagnosis of ovarian malignancy. Obstet Gynecol. 1994; 83(3): 434–7. [PubMed]
77.
Brown DL, Doubilet PM, Miller FH. et al. Benign and malignant ovarian masses: selection of the most discriminating gray-scale and Doppler sonographic features. Radiology. 1998; 208(1): 103–10. [PubMed]
78.
Buist MR, Golding RP, Burger CW. et al. Comparative evaluation of diagnostic methods in ovarian carcinoma with emphasis on CT and MRI. Gynecol Oncol. 1994; 52(2): 1918. [PubMed]
79.
Buy JN, Ghossain MA, Hugol D. et al. Characterization of adnexal masses: combination of color Doppler and conventional sonography compared with spectral Doppler analysis alone and conventional sonography alone. AJR Am J Roentgenol. 1996; 166(2): 38593. [PubMed]
80.
Canis M, Pouly JL, Wattiez A. et al. Laparoscopic management of adnexal masses suspicious at ultrasound. Obstet Gynecol. 1997; 89(5 Pt 1): 679-83. [PubMed]
81.
Carter JR, Lau M, Fowler JM. et al. Blood flow characteristics of ovarian tumors: implications for ovarian cancer screening. Am J Obstet Gynecol. 1995; 172(3): 901-7. [PubMed]
82.
Carter PG, Iles RK, Neven P. et al. The measurement of urinary beta core fragment in conjunction with serum CA125 does not aid the differentiation of malignant from benign pelvic masses. Gynecol Oncol. 1993; 51(3): 368-71. [PubMed]
83.
Caruso A, Caforio L, Testa AC. et al. Transvaginal color Doppler ultrasonography in the presurgical characterization of adnexal masses. Gynecol Oncol. 1996; 63(2): 184-91. [PubMed]
84.
Chen DX, Schwartz PE, Li FQ. Saliva and serum CA 125 assays for detecting malignant ovarian tumors. Obstet Gynecol. 1990; 75(4): 701-4. [PubMed]
85.
Chen DX, Schwartz PE, Li XG. et al. Evaluation of CA 125 levels in differentiating malignant from benign tumors in patients with pelvic masses. Obstet Gynecol. 1988; 72(1): 23-7. [PubMed]
86.
Chou CY, Chang CH, Yao BL. et al. Color Doppler ultrasonography and serum CA 125 in the differentiation of benign and malignant ovarian tumors. J Clin Ultrasound. 1994; 22: 491-6. [PubMed]
87.
Davies AP, Jacobs I, Woolas R. et al. The adnexal mass: benign or malignant? Evaluation of a risk of malignancy index. Br J Obstet Gynaecol. 1993; 100(10): 927-31. [PubMed]
88.
DePriest PD, Varner E, Powell J. The efficacy of a sonographic morphology index in identifying ovarian cancer: a multi-institutional investigation. Gynecol Oncol. 1994; 55: 174-8. [PubMed]
89.
Einhorn N, Bast RC Jr, Knapp RC. et al. Preoperative evaluation of serum CA 125 levels in patients with primary epithelial ovarian cancer. Obstet Gynecol. 1986; 67(3): 414-6. [PubMed]
90.
Ekerhovd E, Wienerroith H, Staudach A. et al. Preoperative assessment of unilocular adnexal cysts by transvaginal ultrasonography: a comparison between ultrasonographic morphologic imaging and histopathologic diagnosis. Am J Obstet Gynecol. 2001; 184(2): 48-54. [PubMed]
91.
Fenchel S, Grab D, Nuessle K. et al. Asymptomatic adnexal masses: correlation of FDG PET and histopathologic findings. Radiology. 2002; 223(3): 780-8. [PubMed]
92.
Ferdeghini M, Gadducci A, Prontera C. et al. Serum soluble interleukin-2 receptor assay in epithelial ovarian cancer. Tumour Biol. 1993; 14(5): 303-9. [PubMed]