NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Institute of Medicine (US) Committee on New Approaches to Early Detection and Diagnosis of Breast Cancer; Herdman R, Norton L, editors. Saving Women's Lives: Strategies for Improving Breast Cancer Detection and Diagnosis: A Breast Cancer Research Foundation and Institute of Medicine Symposium. Washington (DC): National Academies Press (US); 2005.

Saving Women's Lives: Strategies for Improving Breast Cancer Detection and Diagnosis: A Breast Cancer Research Foundation and Institute of Medicine Symposium.

2 Plenary Session

Introduction to the Symposium and of the Founder and Chairman of The Breast Cancer Research Foundation

Edward E. Penhoet, Ph.D.

Chair, Committee on Saving Women's Lives: Strategies for Improving Breast Cancer Detection and Diagnosis; and Director, Science and Higher Education Programs, Gordon and Betty Moore Foundation

Good morning and welcome to this symposium to discuss the product of almost two years' work on saving women's lives. We are delighted to see so many who have joined us. Before we begin the formal presentations, however, we have a special guest, Evelyn H. Lauder, who has come from New York to say a few words to us. Mrs. Lauder is the Founder and Chairman of The Breast Cancer Research Foundation, which has supported breast cancer work at the National Cancer Policy Board of the Institute of Medicine (IOM) and National Research Council (NRC) for several years, and in particular has been an important supporter of this project and co-sponsor of this symposium.

Introductory Remarks

Evelyn H. Lauder, Founder and Chairman.

The Breast Cancer Research Foundation

Thank you, Dr. Penhoet, and good morning, everyone. I am very flattered to be introducing this symposium and pleased, on behalf of The Breast Cancer Research Foundation, to welcome all of you. We are all here because we share a common goal: to save women's lives from breast cancer. As founder and chairman of an organization that has raised over $95 million since 1993 to support innovative research in preventing and curing cancer, I know and appreciate the critical role of early detection and diagnosis.

In 1992, along with Alexandra Penney, who was then editor of Self magazine, we introduced to people all over the world the pink ribbon which has come to be recognized as a universal sign of breast health and awareness. I'm proud to say that since that time, the Estee Lauder Companies alone have distributed over 45 million of these ribbons at our counters worldwide.

In 1993, I led a delegation of Estee Lauder executives and editors from Self magazine to Washington, D.C., and we raised a window shade on which had been pinned 250,000 names. Through coverage of this event in the national press and television and our visit to Hillary Clinton at the White House, we drew attention to the fact that the federal government desperately needed to give more funds for breast cancer research. It was then that Major General Travis was designated to head a study that resulted in substantial new funds being made available for breast cancer research. So from the outset, we have been dedicated to supporting clinical and genetic research into the causes and treatment of breast cancer.

Our Foundation's grants for stellar research projects have really grown. For example, and of particular interest today, we provided major funding for the 2001 IOM and NRC report, Mammography and Beyond (Institute of Medicine and National Research Council, 2001). After its publication, Dr. Larry Norton, our scientific director, who is with us today, addressed the Foundation's board and told us that the IOM was appointing a new committee to embark on a study to expand on that report. Dr. Herdman and Dr. Joy can attest to the enthusiasm with which Dr. Norton's suggestions were greeted. We called the IOM right after the meeting to say that $100,000 had been pledged on the spot by members of our board for the new committee's work.

Since then, the Foundation has provided steady financial support for the project and has eagerly awaited the committee's findings, which were released to the public last week, and are being expanded and presented in greater detail at this symposium today.

I could not be more proud of the research that Dr. Norton and his colleagues on our Medical Advisory Board have recommended for support by The Breast Cancer Research Foundation. In fact, the first scientific presenter this morning will be our dear friend, Dr. Laura Esserman, whose research we have also supported for 10 years.

The work that you have all done in assessing the state of breast cancer and early detection in this country and in identifying ways to improve detection and diagnosis is of major importance. Your research will make a huge difference in women's lives, and for that, I want to thank you personally. You have fueled the determination of volunteers like myself and Peg Mastrianni, the deputy director of the Foundation, and her colleague, Anna DeLuca, who directs public affairs. You encourage us to work toward increasing public awareness and support through magazine editorials and newspaper reporters, as well as through fundraising. So I can't thank you enough.

DR. PENHOET: Thank you, Mrs. Lauder, for your comments and especially for the opportunity you have provided all of us on this committee to work on this project, to join you in the fight against breast cancer. We would also like to acknowledge the other sponsors of our work and the symposium: the Broad Reach Foundation, the Apex Foundation,1 and the National Cancer Institute.

In addition, I would like to extend our warm thanks to Dr. Janet Joy, who was our report's study director. It is hard to imagine anybody working harder for a period of 18 months; she has produced a very fine, readable report. I also recognize the vice chair of this committee, Dr. Diana Petitti, who worked closely with me and with Janet throughout the entire process. Diana, thank you so much for your help with this work today. Due to the shortness of time, I won't go through the entire roster of participants in this committee, who are listed in the front of the report. This was an extraordinary group of experts who worked really hard to achieve the tasks assigned to this committee. We are grateful to all the participants, especially the sponsors and the staff of the IOM, for bringing us to this point.

The report that we are here to discuss is the result of an 18-month study charged with examining existing and evolving approaches—that doesn't mean just technology—that hold the greatest promise for improving the early detection and diagnosis of breast cancer. The committee focused on identifying which approaches are likely to save the most lives in the near term. This includes technology in the broadest sense—from specific tools such as digital mammography, MRI, biomarkers, and proteomics to how these tools and strategies can be most efficiently deployed in clinical practice. The committee's recommendations address what we thought were the most important steps that could be taken to improve outcomes of breast cancer in the near term.

First, we have not yet optimally used the most powerful tool at our disposal, that is, mammography. So a number of our recommendations relate to the improvement of the practice of screen-film mammography and to better access to mammography.

Second, the committee believes we need technology and procedures to develop individually tailored screening strategies so that high-, medium-, and low-risk individuals can receive the type of screening that is most appropriate for them. This poses a difficult task: stratifying risk in the population as a whole. We think that the promise of genomic technology has already been realized in a few instances in breast cancer, for example, the BRCA family of genes, and that in the future this might be expanded significantly. Our ultimate objective would be to customize and optimize screening strategies for individual women.

Third, we need to address the weakest link in the pathway of technology development, that is, demonstration that a new technology or procedure truly improves health outcomes. Here, we recommend the formation of centers in the United States, either real or virtual, to integrate new technologies, particularly to integrate the basic research findings in biomarkers and proteomics, among other advances that we discuss, with clinical practice, and then once those things have become integrated, to make sure that clinical utility is demonstrated in a convincing way.

We believe these recommendations are fully consistent with the new initiative at the National Institutes of Health (NIH), the road map, which seeks to better integrate basic research with clinical practice.

The purpose of this symposium then is to discuss the implications of the recommendations in this report, as well as how they might be implemented. Several members of the committee are here today. You will hear from some of them, as well as other experts who have not been directly involved in the report. This symposium also will provide an opportunity to discuss the issues and complexities surrounding the early detection of breast cancer in much greater depth than is possible in a press conference. This morning we will hear from a series of speakers who will be addressing different themes following the outline in the report. In the afternoon, there will be two concurrent group discussions, ending with a plenary wrap-up discussion.

The Pros and Cons of Screening Mammography: What Women Need to Know—An Overview of the Report's Findings on Mammography

Laura Esserman, M.D., M.B.A.

Director, Carol Franc Buck Breast Care Center and Professor of Surgery and Radiology University of California, San Francisco

The first thing that women need to know is that mammography is early detection, not prevention. The World Health Organization has set out principles for detection through population-based screening. The disease should be serious and prevalent, like breast cancer. There should be a detection test that is sensitive and specific, well tolerated, inexpensive, and that changes therapy or outcome. This is important, because population-based screening is very different from individual screening. We are going to talk about what that means, and also how our new understanding of the biology of breast cancer should affect our approach to screening. The goal for breast cancer screening is not to detect every possible breast disease or abnormality but rather to prevent deaths from breast cancer. That is the ultimate test of whether screening is of value.

What are the pros and cons of mammography? There have now been seven randomized trials that demonstrate 20 to 30 percent reductions in the relative risk of breast cancer mortality, depending on age at screening. Mammography finds cancers at an earlier stage than can be detected by physical examination. Small cancers are less likely to metastasize and are therefore more likely to have good outcomes. Small cancers are also more amenable to breast-conserving treatment, with better cosmetic results. There really is no other technology that has been shown to systematically find tumors at an earlier stage, and 12 countries have implemented systematic screening programs.

Participants in the global mammography summit in June 2002 reviewed the available evidence and unanimously decided that there was no reason to change screening programs. But they also noted that mammography is only one part of the total management of a woman with breast cancer; integration with further diagnosis and treatment is critical.

So, why should there be any controversy? Well, mammography doesn't find all cancers. Its sensitivity is 83 to 97 percent, depending on age and probably also on breast density, which is related to age. Mammography also has a relatively high false positive rate, which is important: a rate of ten percent is high for population-based screening. Mammography is resource intensive. And quality is quite variable, depending on how mammography is performed.

The absolute number of women benefited by mammography is very different from the relative risk reduction. There is perhaps a four to six percent reduction in absolute mortality as opposed to the 20 to 30 percent reduction in relative risk. The absolute reduction also depends on age, and the value is quite a bit lower for young women, perhaps in the 1 to 2 percent range.
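The gap between relative and absolute benefit comes down to simple arithmetic: the relative reduction is the drop in risk divided by the baseline risk, while the absolute reduction is the drop itself. The Python sketch below computes both. The function name risk_reductions and the input risks are hypothetical, chosen only so the output falls within the ranges quoted above; they are not data from the report.

```python
def risk_reductions(risk_without: float, risk_with: float) -> tuple[float, float]:
    """Return (relative_reduction, absolute_reduction).

    Both arguments are risks (proportions between 0 and 1) measured in
    comparable groups over the same follow-up period.
    """
    if not (0 < risk_without <= 1 and 0 <= risk_with <= 1):
        raise ValueError("risks must be proportions between 0 and 1")
    absolute = risk_without - risk_with   # drop in risk, in proportion units
    relative = absolute / risk_without    # drop as a share of the baseline risk
    return relative, absolute


# Hypothetical risks, used only to illustrate the arithmetic.
rrr, arr = risk_reductions(0.20, 0.15)
print(f"relative reduction: {rrr:.0%}")
print(f"absolute reduction: {arr * 100:.1f} percentage points")
```

Because the absolute reduction scales with the baseline risk, the same relative reduction yields a smaller absolute benefit in groups, such as younger women, whose baseline risk is lower.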

Finally and importantly, finding cancers early does not guarantee cure. Biology can trump detection, and better understanding biology is an important theme of this report. Mammography has also led to a very large increase in the detection of in situ cancers; some call this the friendly fire in the war against cancer, an apt description, I think.

Do we just need a more sensitive screening tool? Maybe we should be using ultrasound or magnetic resonance imaging (MRI), even though they are three- and fifteen-fold, respectively, more expensive than screen-film mammography. MRI, for which there has been considerable enthusiasm, can identify tumors that don't form masses and tumors in dense breast tissue. MRI certainly is useful when we know someone is at extremely high risk for developing breast cancer. For women with genetic susceptibility, some of whom are known to have an 80 percent lifetime risk of developing breast cancer, MRI, although expensive, has been shown to be much more sensitive than screen-film mammography, particularly because the women being screened are young, have dense breast tissue, and are more likely to develop breast tumors (Kriege et al., 2004).

The problem is that MRI is too sensitive. It finds all kinds of things that aren't cancer but whose significance is unclear, and its performance in detecting cancer is no better in women with fatty breast tissue, who make up the majority of women ages 50 to 70 for whom screening is designed. Furthermore, tools for biopsying lesions seen on MRI are not readily available. Lastly, MRI is also too expensive for a general population screening test. That is probably also true for ultrasound, which is very labor intensive.

These factors bear on how we want to think about population-based screening with the objective of saving women's lives. Screening has an enormous impact. Economically, the aggregate U.S. cost of mammography is somewhere in the range of six to ten billion dollars. Emotionally, women who are called back for low risk lesions, or women with indolent disease who assume they have life threatening disease, pay an enormous price if these findings would not otherwise have come to clinical attention.

The sensitivity and specificity of mammography depend to some extent on who interprets the image. High sensitivity, that is, the chance that a mammogram of a woman with breast cancer will be correctly interpreted as positive for cancer, is clearly desirable. But if you find absolutely everything, at some point you are going to pay the price of a high false positive fraction, or 1 minus the specificity (the specificity being the chance that the mammogram of a woman without breast cancer will be correctly interpreted as negative for cancer).
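The definitions above can be written directly in terms of the four possible outcomes of a reading. The sketch below is a minimal Python illustration; the class name ReadingCounts and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadingCounts:
    """Outcomes of interpreting a set of screening mammograms."""
    true_pos: int   # cancer present, read as positive
    false_neg: int  # cancer present, read as negative
    true_neg: int   # no cancer, read as negative
    false_pos: int  # no cancer, read as positive

    def sensitivity(self) -> float:
        # Chance that a mammogram of a woman with cancer is read as positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Chance that a mammogram of a woman without cancer is read as negative.
        return self.true_neg / (self.true_neg + self.false_pos)

    def false_positive_fraction(self) -> float:
        # 1 minus the specificity.
        return 1 - self.specificity()


# Hypothetical: 10,000 screens, 50 of them in women who have cancer.
counts = ReadingCounts(true_pos=45, false_neg=5, true_neg=8955, false_pos=995)
print(counts.sensitivity(), counts.specificity(), counts.false_positive_fraction())
```

In this made-up example the reader catches 45 of 50 cancers (sensitivity 0.9) but also calls back 995 of 9,950 women without cancer (false positive fraction 0.1), so when cancer is rare, even a modest false positive fraction produces far more false alarms than true cancers.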

Historically, it was thought that this was just a simple tradeoff, but it is not. Sensitivity and specificity are highly variable. Some breast imagers are more experienced and/or skilled than others, and, as exemplified in Figure 2.1, these imagers (represented by the curve for “better” interpreters) have better ratios between true positive and false positive readings of mammograms. In short, the chance that a breast cancer will be detected depends in part on who reads the mammogram, and the chance of having a false positive also depends on the quality of the interpretation. (Of course, other factors are important, as well, such as the quality of the image, positioning and compression of the breast, among others.)

FIGURE 2.1. Better interpretation means a better ratio between true and false positive cancer readings as shown in this ROC (receiver operating characteristic) schematic.
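An ROC curve like the one in Figure 2.1 is traced by sweeping a reader's threshold for calling a mammogram positive and recording the true positive fraction against the false positive fraction at each setting. The following Python sketch assumes readers score each case on an ordinal suspicion scale; the function names roc_points and auc and the two sets of reader scores are hypothetical, and the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores: list[int], has_cancer: list[bool]) -> list[tuple[float, float]]:
    """Return (false_positive_fraction, true_positive_fraction) pairs,
    moving from the strictest threshold to the most lenient one."""
    positives = sum(has_cancer)
    negatives = len(has_cancer) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, c in zip(scores, has_cancer) if s >= threshold and c)
        fp = sum(1 for s, c in zip(scores, has_cancer) if s >= threshold and not c)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical suspicion scores (1 = least, 5 = most suspicious) for 8 cases.
has_cancer = [True, True, True, True, False, False, False, False]
better_reader = [5, 5, 4, 4, 2, 2, 1, 1]
weaker_reader = [5, 3, 2, 1, 4, 2, 1, 1]
print(auc(roc_points(better_reader, has_cancer)))
print(auc(roc_points(weaker_reader, has_cancer)))
```

In this toy example the better reader separates every cancer from every non-cancer (area 1.0), while the weaker reader must accept more false positives for each additional cancer found (area about 0.66).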

In the U.S., over 75 percent of biopsies following mammography are not cancer. Biopsy rates vary internationally, but the variation may be greater in the U.S. than in countries with more focused screening programs.

The organization of care clearly affects the quality of screening. High volume, experienced mammographers find the most cancers and miss the fewest cancers. Furthermore, coordinated teams make sure that, in the trajectory of care, the right procedure is done at the right time. We think that integration of the various aspects of care and feedback (learning from experience) is critical to optimizing performance.

Where there are focused, organized screening programs, the fraction of positive operative breast biopsies can be 80 to 90 percent (UK) or 85 to 95 percent (Sweden) compared to the U.S. fraction of 20 to 70 percent. Data such as these suggest the value of high volume screening programs and support the conclusion that in some countries, mammographic interpretation is more consistent and of greater specificity than in the U.S. Such effective, highly organized programs can also be more efficient, with the potential for significant cost savings, as indicated in Figure 2.2, which shows the high aggregate cost differences between high recall, low cancer to biopsy ratio (CBR) and low recall, high CBR programs. Higher quality can also be much more cost-effective (Burnside et al., 2001).

FIGURE 2.2. Cost differences between programs of varying efficiency. CBR = cancer to biopsy ratio.
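The cancer to biopsy ratio links directly to the number of women biopsied: for a fixed number of cancers found, the total number of biopsies is the number of cancers divided by the ratio. The Python sketch below makes that explicit. The function name and the example inputs (6 cancers per 1,000 screens and ratios of 0.25 and 0.75) are hypothetical, chosen only to contrast a program in which most biopsies are benign with one in which most find cancer.

```python
def biopsies_per_1000(cancers_per_1000: float, cancer_to_biopsy_ratio: float) -> tuple[float, float]:
    """Return (total_biopsies, benign_biopsies) per 1,000 screens.

    cancer_to_biopsy_ratio is the fraction of biopsies that find cancer.
    """
    if not 0 < cancer_to_biopsy_ratio <= 1:
        raise ValueError("cancer-to-biopsy ratio must be in (0, 1]")
    total = cancers_per_1000 / cancer_to_biopsy_ratio
    return total, total - cancers_per_1000


# Hypothetical comparison: same cancers found, different biopsy efficiency.
for cbr in (0.25, 0.75):
    total, benign = biopsies_per_1000(6, cbr)
    print(f"CBR {cbr:.2f}: {total:.0f} biopsies, {benign:.0f} benign")
```

With these inputs the low-ratio program performs 24 biopsies per 1,000 screens, 18 of them benign, versus 8 biopsies, 2 of them benign, for the high-ratio program; each avoidable biopsy adds cost and anxiety.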

We are also beginning to understand, through molecular fingerprinting, that all breast cancers are not the same. We can tell the types of cells from which breast cancers arise—where they are in the milk duct—and these different sources affect the outcomes for patients. As the survival curves in Figure 2.3 show, some types of breast cancer have a much higher risk of progression to distant metastasis and death (Sorlie et al., 2003).

FIGURE 2.3. All breast cancers are not the same.

For example, new data from the Women's Health Initiative, shown in Figure 2.4, indicate that some populations, such as African American women, have different types of breast cancer: although the frequency may be lower, more aggressive, poorly differentiated, estrogen receptor negative tumors are much more frequent.

FIGURE 2.4. African American women have significantly more aggressive, poorly differentiated, estrogen receptor (ER) negative breast cancer. * Significantly increased.

Breast cancer from milk duct luminal cells is more frequent in older women and is more likely to be well differentiated and amenable to treatment. Breast cancer from basal cells, which is more frequent in younger women, is often more aggressive and less likely to be amenable to treatment. Patient populations which are more likely to have aggressive, fast-growing tumors that are discovered when they are larger or have spread and are, therefore, less responsive to treatment, are not as likely to benefit from screening. That is relevant to what Dr. Penhoet was saying about screening and stratifying risk.

So the lessons from biology are that tumors grow at different rates. Fast growing tumors may not be caught in time by mammography. Other slow growing tumors provide time to do interval screening and detection before they are a threat to the patient. And there are still other cancers that may grow so slowly, particularly if they are in older women, that they may never come to clinical attention. So part of our strategy has to be designed in concert with our new understanding of the biology of breast cancer.

Returning to the implications of less frequent but more aggressive breast cancer originating in basal cells in African American women, one would expect perhaps that screening would make more of a difference in Caucasian women—more cases to find, more chance of making a difference through early detection. This does appear to be true, at least in part, as indicated by Figure 2.5 from the IOM report. Breast cancer incidence in African American women remains lower than in Caucasian women over time, but mortality is greater and has not been as affected by increasing rates of screening. While other factors may explain some of these effects, such as access and treatment variables, it is also important for us to integrate our understanding of biology when interpreting data like these. We should not expect too much from mammography. We know that mammography is not going to make a difference for all breast cancers, and breast cancer is not one disease. And mammography cannot be expected to find all cancers either.

FIGURE 2.5. Consider biology when interpreting mammography data. SOURCE: Surveillance Epidemiology and End Results Program, NCI.

The cost of mammography often exceeds reimbursement, and this is also important in thinking about strategies. Having run a digital mobile van service for underserved women, I know just how significant reimbursement is. If you want to provide mammography for the underserved, you will need to raise a lot of money. Figure 2.6 shows that poverty, not race, is the greatest barrier to screening.

FIGURE 2.6. Poverty is the greatest barrier to mammography screening.

Age is another important consideration in screening strategy. Mammography is most effective in women 50 to 70 years of age. Returning to the biology, we know that this is a population of women who are at greater risk for breast cancer and have cancers with slower growth rates. Unfortunately, screening rates decline in older populations. Fewer than two-thirds of women 65 to 70 are getting screened. In late age, other competing risks of death perhaps make breast cancer less important. But the cost-effectiveness of screening mammography is at least three to five times higher in women aged 50 to 70 compared to those aged 40 to 50. In the younger women, the risk of having cancer is lower, the sensitivity of mammography is lower, the recall and biopsy rates are higher, and there are probably more tumors that metastasize early. Some of this information is summarized in Figure 2.7.

FIGURE 2.7. Advancing age and mammography. SOURCE: Breast Cancer Facts and Figures, updated to 2003-2004, American Cancer Society.

What do we do with the information from mammography? The image interpretations are reported as BI-RADS® scores (Breast Imaging Reporting and Data System™), which is the standard way of classifying mammograms. A BI-RADS® 5 means probable cancer (>75 percent chance), and a BI-RADS® 3 means probably benign (1-2 percent chance of cancer, 6 month follow-up recommended). A BI-RADS® 4 is called suspicious, and women are sent a letter to that effect. But there is enormous variation in the odds of having cancer in this category, from 3 to 75 percent. It might be an in situ pre-cancer, or it might be an invasive cancer, and it often leads to biopsy, contributes to the false positive rate, and creates quite a bit of alarm. Some of this could be avoided by framing the information properly and having some self-control. A new classification is going to have BI-RADS® 4A, 4B, and 4C, which is going to stratify the suspicious category, but it will take some time to disseminate a new system and determine what effect it will have.
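The categories and cancer probabilities quoted above can be held in a small lookup table, which makes the width of category 4 easy to compare with its neighbors. The Python sketch below includes only the three categories described in this talk, with the probability ranges as quoted. The names BiRadsCategory, BI_RADS, and probability_spread are hypothetical, and the upper bound of 1.0 for category 5 is an assumption standing in for ">75 percent".

```python
from typing import NamedTuple


class BiRadsCategory(NamedTuple):
    label: str
    min_cancer_prob: float  # lower bound of the quoted chance of cancer
    max_cancer_prob: float  # upper bound of the quoted chance of cancer


# Only the categories described in the talk; other categories are omitted.
BI_RADS = {
    "3": BiRadsCategory("probably benign", 0.01, 0.02),
    "4": BiRadsCategory("suspicious", 0.03, 0.75),
    "5": BiRadsCategory("probable cancer", 0.75, 1.00),  # 1.00 is an assumed cap
}


def probability_spread(category: str) -> float:
    """Width of the quoted cancer-probability range for a category."""
    c = BI_RADS[category]
    return c.max_cancer_prob - c.min_cancer_prob


for code, cat in BI_RADS.items():
    print(f"BI-RADS {code} ({cat.label}): spread {probability_spread(code):.2f}")
```

The spread for category 4 (0.72) dwarfs that of category 3 (0.01), which is the motivation for subdividing it into 4A, 4B, and 4C.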

My friend Gilbert Welch, in his book Should I Be Tested for Cancer, which I strongly recommend to anyone who is interested in this topic, points out that in our zeal to make a difference through screening, we need to be sure that we are making the right difference (Welch, 2004). Screening is a good thing if the incidence of detected late stage cancers is going down and the incidence of detected early stage cancers is going up, even if the total incidence of cancers stays the same. But a stable incidence of late stage cancers with an increasing incidence of early stage cancers may indicate a problem. We may be seeing a misleadingly higher survival rate that is due only to the inclusion of more early stage cancers, reflecting an increase in ductal carcinoma in situ (DCIS), that is, tumors that might not have had any consequences for the patient. In fact, presumably as a result of screening, the incidence of invasive breast cancer has increased and the incidence of late stage cancer has decreased somewhat, but the incidence of DCIS has gone up dramatically. That is why biology becomes such an important part of our recommendations: we need to think about what it is that we are detecting and how to treat it, and make sure that we use the screening information judiciously.
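The survival artifact described above can be shown with a few lines of arithmetic: adding newly detected cases that would never have caused death raises the proportion of patients who survive, even though the number of deaths does not change. The Python sketch below uses entirely hypothetical counts, and the function name is likewise illustrative.

```python
def survival_proportion(cases: int, deaths: int) -> float:
    """Proportion of diagnosed cases still alive at the end of follow-up."""
    return (cases - deaths) / cases


# Hypothetical cohort before and after screening adds indolent cases.
cases, deaths = 1000, 300
extra_indolent_cases = 500  # e.g. DCIS that might never have caused harm

before = survival_proportion(cases, deaths)
after = survival_proportion(cases + extra_indolent_cases, deaths)
print(f"survival before: {before:.0%}, after: {after:.0%}, deaths unchanged: {deaths}")
```

Survival appears to improve from 70 to 80 percent, yet not a single death has been prevented, which is why rising survival alone cannot show that screening is working.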

Ductal carcinoma in situ is a disease in which the cells lining the lumen of the milk duct look like cancer, but they have not developed the capacity to invade. As such, they are 99 percent curable, but they comprise a lot of the breast cancer cases that are detected through mammography, almost 50,000 women a year. These are healthy women at high risk for invasive cancer in the future who don't have life threatening disease at the moment. We need to be careful about what we do with this information.

The mammography controversy was partly fueled by the fact that finding more DCIS was generating increasing mastectomy rates. These have subsequently decreased because more patients are being treated with breast conservation. But I feel that we have an incredible opportunity to use DCIS, now that we have this biomarker, to figure out how to prevent breast cancer, if we had the patience to stop operating right away on these patients. As a surgeon, I feel I can lead the charge in this area. As technology finds evidence of cancer earlier and earlier, we should accompany that with appropriate parallel treatment and prevention strategies.

What are the solutions that the IOM report recommends? The first is that if you are going to screen in the first place, then screen well. Screening mammography can definitely be improved. The chance of being called back for an abnormal mammogram is as high as one in ten. The chance that a woman will have a breast biopsy or some other workup for an abnormal screen after 10 years of screening is one in two. At any given screening, breast cancer is a low frequency event: 1 or 2 cancers per 1,000 mammograms in women ages 40 to 50, and 5 to 7 per 1,000 in women ages 50 to 70. And the quality of mammography is variable.
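Two calculations help connect these figures. If roughly one in ten screens leads to a callback while only a handful of cancers are present per 1,000 screens, most callbacks are false alarms; and a modest per-screen chance of a workup compounds over many screens. The Python sketch below shows both under stated simplifications: it assumes every detected cancer is among the women called back, and that screens are independent of one another. The function names and the 7 percent per-screen rate in the example are hypothetical, not figures from the report.

```python
def recall_ppv(cancers_per_1000: float, recall_rate: float) -> float:
    """Fraction of recalled women who have cancer.

    Simplification: assumes every detected cancer is among the recalls.
    """
    return cancers_per_1000 / (recall_rate * 1000)


def cumulative_probability(per_screen: float, screens: int) -> float:
    """Chance of at least one event over several screens,
    assuming the screens are independent (a simplification)."""
    return 1 - (1 - per_screen) ** screens


# 6 cancers per 1,000 screens and a one-in-ten recall rate.
print(f"share of recalls with cancer: {recall_ppv(6, 0.10):.0%}")
# Hypothetical 7 percent per-screen chance of a workup over 10 screens.
print(f"chance of at least one workup: {cumulative_probability(0.07, 10):.0%}")
```

Under these assumptions only about 6 percent of recalled women have cancer, and a 7 percent per-screen chance grows to roughly one in two over ten screens, in line with the figure quoted above.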

At the same time, our ability to screen well is threatened. The number of radiologists willing to read mammograms is declining. Well trained mammographers are in short supply. The cost of mammography exceeds reimbursement. Even though we know that well-trained mammographers find more cancers and have fewer false positives, probably less than a third of our mammograms are being read by such individuals.

Some of our solutions involve leveraging emerging technologies, better organizing services, and improving quality. Risk stratification of women might reduce the volume of mammograms and allow us to focus on women with the greatest frequency of cancer. Mammographic services and interpretation concentrated in centers of excellence that are integrated into multidisciplinary care are an important part of the solution. New technologies such as digital mammography create opportunities for change, such as adding computerized decision aids, which can narrow the gap between the average mammographer and the expert mammographer.

We recommend considering adoption in the United States of elements of successful breast cancer screening programs from other countries, including centralized expert interpretation, regionalization of programs, outcomes analysis, and benchmarking. We think it is important to collaborate with health-care providers and payors to improve quality; to develop and adopt practices that promote self improvement; to develop and disseminate technologies, such as computer aided diagnosis, that will improve quality; and to expand the capacity of breast imaging specialists through specially trained allied health personnel.

We recommend integrating biology, technology, and risk models to develop new screening strategies for breast cancer. This will involve harnessing emerging molecular applications to not just find cancer, but to determine what type is present or likely to develop, and enable reasonable predictions about treatment and outcomes. Some of these applications may emerge from proteomics. Examination of patterns of proteins in the blood as a way of screening raises exciting possibilities that will require careful development and application.

The two major drivers of cost in mammography are the percent of women screened and the cost per mammogram. A cheap, easy test that would allow us to limit mammography only to those women considered to be at risk would be a home run. This challenges us to think not just of having competing technologies for mammography, but to integrate them in a better way. In the future, tests might be layered for optimal effect: screening by proteomics; susceptibility testing; inherited genetic variations (BRCA mutations, SNPs); or nipple fluid aspirate analysis to identify the risk of having or getting breast cancer, then using imaging techniques, like mammography, MRI, or ultrasound, as localizing tools or secondary screens; and finally using some of these techniques (MRI, PET, various probes and biomarkers) further to monitor response to therapeutic interventions. Harnessing risk discriminators and biomarkers together in a multidisciplinary way would give us the power to make progress in guiding screening and intervention strategies. Expression profiles, looking at circulating tumor cells, and characterizing these tumors might help us understand their prognosis and how they might respond to therapies, or help us generate new ideas for targeting therapies.
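The layered approach described above is a form of serial testing: a second test is applied only to women who are positive on the first, and a woman is flagged only if both are positive. The Python sketch below shows the standard formulas, assuming the two tests err independently given whether cancer is present (an assumption that often does not hold exactly). The function names and every accuracy and prevalence value in the example are hypothetical, not performance figures for proteomics, mammography, or any other real test.

```python
def serial_accuracy(sens1: float, spec1: float, sens2: float, spec2: float) -> tuple[float, float]:
    """Combined (sensitivity, specificity) when both tests must be positive.

    Assumes conditional independence of the two tests given disease status.
    """
    sensitivity = sens1 * sens2                  # cancer must be caught by both
    specificity = 1 - (1 - spec1) * (1 - spec2)  # a false alarm must pass both
    return sensitivity, specificity


def fraction_sent_to_second_test(prevalence: float, sens1: float, spec1: float) -> float:
    """Share of the population that is positive on the first test."""
    return prevalence * sens1 + (1 - prevalence) * (1 - spec1)


# Hypothetical first-layer test followed by hypothetical imaging.
sens, spec = serial_accuracy(0.95, 0.80, 0.90, 0.90)
share_imaged = fraction_sent_to_second_test(0.006, 0.95, 0.80)
print(f"combined sensitivity {sens:.3f}, specificity {spec:.3f}")
print(f"share of women needing imaging: {share_imaged:.1%}")
```

The trade-off is visible in the numbers: combined sensitivity falls below that of either test alone, combined specificity rises above both, and only about a fifth of women go on to imaging, which bears directly on the share of women screened as a cost driver.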

Of course, to move in this direction, we will also need to ensure that these concepts of risk are better taught. Decision aids are going to be needed. We have recommended that research funders help develop tools that facilitate communication regarding breast cancer risk to the public and health care providers, so that we really understand the various risks and benefits, including the risks associated with emerging biomarkers. That means finding ways to teach women about their risks and the benefits of interventions. All development, of course, should be in the context of what is currently clinically practiced, because otherwise, you may find that what you are developing is not terribly useful.

Our final recommendations involve improving the environment for research and development of new technologies for breast cancer detection. In the report, we have highlighted the need to try and create centers where people from the research, imaging, and screening communities are working as a team to put all the pieces together. We know that the optimal use of new technology requires specific attention to implementation and dissemination. We were fortunate to have had members of our committee who were very much involved with operations management, or the dissemination of technology, and we understood that perhaps the hardest thing of all is changing practice. So paying attention to how we re-engineer care, help people redefine their jobs, and how we monitor that change, is absolutely essential.

In conclusion, the goal of early detection needs to be integrated with subsequent strategies. Understanding risk, applying early methods to find out if a cancer is present, understanding the likelihood of disease progression or whether someone can be left alone, understanding what tests can tell about a woman's responsiveness to systemic therapies or whether new strategies are needed, developing the idea of targeted prevention and therapy, we believe these are among the strategies that will lead to saving women's lives.

Challenges to Expanding Mammography: Better Quality for Women in Screening Sites

, M.D., Professor and Chair.

Department of Radiology, University of Michigan

Today, I want to talk to you about mammography, both screening and diagnostic, and ultrasound. Radiologists also perform a variety of procedures: aspirations, core biopsies, needle localizations, and now ablation techniques. In essence, the breast radiologist is the primary care physician for breast disease.

The challenges before us regarding access to mammography include a human resources shortage, high liability risk associated with mammography, and relatively low reimbursement. The countermeasures include producing more radiologists, improving their productivity (work harder, physician extenders, computer assisted diagnosis), initiating tort reform, and adjusting the payment schedule.

Let's start with the human resources shortage. Demand for radiology services is rising at about one percent per year because of population growth. In addition, demographic changes, like aging of the population, may increase the need for radiology services by a further half a percent a year.

The biggest change however has been in medical practice. The sophistication of imaging technologies has reduced the value of the clinical history, has made the physical examination almost trivial, and has provided a tremendous amount of essential data for diagnosis and treatment. We have seen a shift from plain-film radiography to cross-sectional imaging techniques. Although these techniques are more expensive, in terms of information per dollar spent they are actually less expensive than the older techniques.

As summarized in Box 2.1, demand for radiology services is growing by approximately 6 percent annually. Unfortunately, we are not producing radiologists fast enough to meet this demand. We train approximately a thousand new radiologists each year, but about 500 retire each year, so we have a net annual increase of only approximately 500 radiologists. If we assume that there are 33,000 practicing radiologists, then the growth rate in radiologists is only 1.5 percent per year. Demand up 6 percent, supply up 1.5 percent: we have a gap of 4.5 percentage points each year.
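The supply-and-demand arithmetic above can be checked in a few lines. This is a minimal sketch that uses only the approximate figures quoted in the talk (6 percent demand growth, about 1,000 trainees and 500 retirements per year, an assumed workforce of 33,000); the function name is illustrative. Unrounded, net supply growth comes out slightly above 1.5 percent, so the gap is just under 4.5 percentage points.

```python
# Approximate figures quoted in the talk (Box 2.1 and surrounding text).
DEMAND_GROWTH_PCT = 6.0   # annual growth in demand for radiology services
TRAINED_PER_YEAR = 1_000  # new radiologists entering practice each year
RETIRED_PER_YEAR = 500    # radiologists retiring each year
WORKFORCE = 33_000        # assumed number of practicing radiologists


def net_supply_growth_pct(trained: int, retired: int, workforce: int) -> float:
    """Net annual change in the workforce as a percentage of its current size."""
    return 100.0 * (trained - retired) / workforce


supply_pct = net_supply_growth_pct(TRAINED_PER_YEAR, RETIRED_PER_YEAR, WORKFORCE)
gap_pct_points = DEMAND_GROWTH_PCT - supply_pct

print(f"net supply growth: {supply_pct:.2f}% per year")
print(f"annual gap: {gap_pct_points:.2f} percentage points")
```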

BOX 2.1

Demand for Radiology Services Is Rising.

How did we get into this situation? In the early 1990s, with the spread of managed care, we began to hear predictions that the need for ancillary services in health care systems would decline by 30 to 50 percent. At the University of Michigan, one analysis (Billi et al., 1995) predicted that the 45.2 full-time-equivalent (FTE) faculty that we had in our department in 1992 would drop to somewhere between 12 and 16 FTEs as a result of this reduced utilization of ancillary services secondary to managed care. That prediction was a bit off. We are now at 80.4 FTEs and climbing.

So what can we do? The first thing would be to train more radiologists. The problem here is that the Centers for Medicare and Medicaid Services (CMS) capped the number of positions allowed for graduate medical education payments under Medicare at their December 1996 levels. As a result of the managed care scare, many programs reduced the number of radiologist trainees. Some reductions were voluntary, in anticipation that there would be insufficient radiology jobs, and others were less than voluntary as the institutions converted some radiology positions to primary care positions.

A number of programs were actually eliminated because they were considered unneeded. They may not have been doing a very good job in the first place, and maintaining the residency training programs was not really worth the effort. Most of these were not in our university programs, but in private practice or multi-specialty clinic settings.

So this cap that CMS imposed froze those reimbursable positions at radiology's nadir, and now we are stuck. We can no longer respond as a normal market economy; we are limited by that artificial cap. Box 2.2 shows the number of residency positions offered by the radiology matching program, the precipitous decline in the late 1990s, and the slow, gradual increase thereafter. We have increased the number of residents at the University of Michigan twice, and a number of other programs have also been able to do that, but we are still not back to the levels we had prior to this managed care scare. Over this time period, the percentage of radiology positions filled through the matching program has risen to 99 percent, and the final percent or two fills from the pool of applicants who did not match, so all residencies are filled.

BOX 2.2

Residency Positions Offered. NOTE: Residency positions in radiology offered through the matching program fell dramatically after 1996 and have not recovered. SOURCE: National Residency Matching Program.

International graduates comprise a second potential source of radiologists, but there are difficulties in identifying and assessing the credentials of candidates. They train at institutions and in systems with which we often are not familiar, and it is not easy to determine how good they may be or to interview them. Once identified, there is the increasingly difficult visa problem and the different state medical licensing rules and procedures. And finally, board certification is yet another hurdle that is a particular problem for mammography because of Mammography Quality Standards Act (MQSA) regulations.

If we cannot increase the size of training programs or recruit international graduates, could we retard the retirement of existing radiologists? There are possibilities here. Some radiologists may be ready to retire, but perfectly happy to continue in a part-time rather than a full-time capacity. We can also allow sub-specialization. Many people like to do one thing, but don't like to do another thing. We can make their job description more focused and induce them to stay within the work force.

It is worrisome that the human resource shortage is affecting academic programs more than private practice, and that it affects mammography more than other areas in radiology. These are discouraging trends, because academic medical centers are training the next generation, and if we agree with Dr. Esserman that it would be preferable to have mammography services delivered by experienced, high-volume breast imagers, the academic centers are the places with that expertise and volume.

Mammography is affected disproportionately by the trends mentioned. The problem is that radiologists choose other fields. If there are not enough radiologists to do all the work available, which work would you choose to do? Would you choose a field in which professional liability is high, reimbursement is low, and regulation is significant, or would you choose something else? Radiology help-wanted ads reflect this. Over the seven-year period from 1994 to 2001, the fraction of these ads specifically looking for mammographers grew disproportionately, from 6.4 to 10.2 percent. This happened during a time when want ads for radiologists of all kinds were more than doubling, so the actual number of ads for mammographers grew dramatically (Saketkchoo et al., 2002).

The malpractice issue is very significant for radiologists but particularly for mammographers. I wish the data on this were more up-to-date; studies are primarily from the 1990s. They show that misdiagnosis is one of the most common causes of malpractice litigation in radiology. Although bone disease, typically fractures, was the most common at the time of the study (Berlin and Berlin, 1995), breast cancer was a close second and was rising at a rate that suggests that if data were more recent than 1994, breast cancer would surely be number one in terms of liability.

In discussing mammography and reimbursement, we should be clear about the differences between screening and diagnostic mammography. Screening means the patient is asymptomatic; the exam is performed by a technologist, and the radiologist reads it at a later time. This allows him or her to read a large batch of screening mammograms in a quiet, undisturbed environment, which can be relatively efficient. A diagnostic mammogram, on the other hand, means that the patient has a history of breast cancer, an abnormal physical examination (a mass, bloody discharge, pain), or a prior abnormal mammogram. This is a real-time process. The radiologist must be present to look at the films, may ask for repeat or different views, and will then look at those, too. There is constant interruption in this process, and it is relatively inefficient.

This difference is not really reflected in either the relative value unit assignments to these procedures or the resulting reimbursement. Although Medicare fees vary somewhat in different regions, we can use Michigan as an example. For our carrier, the professional component of the fee for screening mammography is $40 and for diagnostic mammography $49. Our mammography group would say that the difference in effort between these two is three to one, not five to four. So we think there is an imbalance here.

Furthermore, compare this with other radiology options, for example, reading an abdominal CT scan, for which the payment is $72, or reading a head MRI for which the professional component is $134. Clearly, mammography is not economically attractive.
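The mismatch between payment and effort can be made concrete by comparing the two ratios. The sketch below uses the Michigan professional-component fees quoted above and the roughly 3:1 effort ratio reported by the speaker's mammography group. The "effort-proportional" diagnostic fee it prints is a hypothetical illustration of what payment would be if it tracked that perceived effort, not a figure proposed in the talk.

```python
# Professional-component fees quoted for the speaker's Medicare carrier (USD).
FEES = {
    "screening mammogram": 40.0,
    "diagnostic mammogram": 49.0,
    "abdominal CT": 72.0,
    "head MRI": 134.0,
}

# Diagnostic-to-screening effort ratio as judged by the mammography group.
PERCEIVED_EFFORT_RATIO = 3.0

screening_fee = FEES["screening mammogram"]
fee_ratio = FEES["diagnostic mammogram"] / screening_fee

# Hypothetical: the diagnostic fee if payment scaled with perceived effort.
effort_proportional_fee = screening_fee * PERCEIVED_EFFORT_RATIO

print(f"diagnostic/screening fee ratio: {fee_ratio:.3f} (roughly 5:4)")
print(f"effort-proportional diagnostic fee: ${effort_proportional_fee:.0f}")

# Each reading's fee expressed as a multiple of the screening fee.
for study, fee in sorted(FEES.items(), key=lambda item: item[1]):
    print(f"{study:<22} ${fee:>6.2f}  {fee / screening_fee:.2f}x screening")
```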

The next thing we could do to enhance access is increase the radiologists' productivity. Radiologists could work harder. We could use physician extenders. We could take advantage of technology, and we have computer assisted diagnosis systems that might be employed.

In recent times, radiologists' work loads have been increasing (Bhargavan and Sunshine, 2002), most notably in academic radiology (23 percent), but also in multi-specialty practice (17 percent) and even private practice (8 percent). Furthermore, a recent survey (Sunshine et al., 2002) reports that a majority (51 percent) of radiologists believe they already have too much work. These findings do not encourage us to think there is a likelihood that productivity will close the gap between supply and demand.

What kind of physician extenders do we have that might address the access problem? I think that we should have each job done by the person specifically trained to do that job. So I want radiologists spending as close to 100 percent of their time as possible doing work that only radiologists can do. I do not want them hanging films, retrieving old films, filling out quality assurance forms, or looking up data. Breast ultrasound in our institution is done by the radiologist, whereas elsewhere ultrasound is generally done by a technologist. We need to train more technologists to do breast ultrasound, rather than having radiologists do it. And physician extenders could take on the hugely important task of patient education.

What else might these physician extenders do? Could they prescreen screening examinations? Might they identify all of the clearly normal mammograms enabling the radiologist to move even more quickly through them for a final interpretation? Might they flag abnormal examinations, and, by adding another set of eyes, reduce the number of misses by the radiologist? Would this in essence be double reading? There are a number of publications that report that technologists and other non-physicians can be taught to do that (Hillman et al., 1987; Sumkin et al., 2003).

There are technologies that we could employ to make us more efficient. PACS (picture archiving and communication systems), which allow the storage and transmission of images, have great potential. This requires that the images be digital, and we are anxiously awaiting the results of the trial that Dr. Pisano is leading through the American College of Radiology Imaging Network comparing digital and conventional film-screen mammography in almost 50,000 women.

Voice recognition allows you to dictate your report while the examination is fresh in your memory and have it ready right away; there are a number of ways that we could make that more efficient. Electronic medical records allow the use of a computer at the radiologist's side to look up any pertinent information on the patient. In fact, I think we can do even better than that. Relevant literature could be reviewed at the same time as a particular case is seen, including what patients with the same diagnosis look like. Electronic teaching files make it possible, while examining a particular image that might be amyloid of the breast, to review a series of patients with breast amyloid to see if the features are the same.

Computer assisted diagnosis (CAD) is something that many of us are quite optimistic about. These systems mark suspicious areas such as microcalcifications or masses. In one study (Astley et al., 2002), CAD detected 87 of 90 cancers, which is excellent. On the other hand, it also marked 556 of 810 normal cases. Would this help improve sensitivity? My guess is that it might. On the other hand, you still need a radiologist to check what the CAD system is doing, so that you don't call back two-thirds of your patients.
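The two rates from the Astley et al. study can be computed directly. This minimal sketch treats each count as a per-case mark, which is an assumption about how the study tallied them; it shows where the "two-thirds" concern comes from, since 556 of 810 normal cases is about 69 percent.

```python
def proportion(marked: int, total: int) -> float:
    """Fraction of cases in a group that the CAD system marked."""
    if total <= 0:
        raise ValueError("total must be positive")
    return marked / total


# Counts as reported in the talk (Astley et al., 2002).
CANCERS_MARKED, CANCERS_TOTAL = 87, 90
NORMALS_MARKED, NORMALS_TOTAL = 556, 810

sensitivity = proportion(CANCERS_MARKED, CANCERS_TOTAL)
normal_mark_rate = proportion(NORMALS_MARKED, NORMALS_TOTAL)
specificity = 1.0 - normal_mark_rate  # share of normal cases left unmarked

print(f"sensitivity: {sensitivity:.1%}")
print(f"normal cases marked: {normal_mark_rate:.1%}")
print(f"specificity of CAD marks alone: {specificity:.1%}")
```

Because most normal cases carry at least one mark, it is the radiologist's review of each mark that keeps the recall rate far below the marking rate.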

In summary, my recommendations are as follows. To address the problem of too few radiologists, raise the CMS cap on house officer positions. This could either be an institutional increase, in which case radiology would be competing with all of the other subspecialties because the demand for more physicians in many other areas has not decreased since 1996, or it could be specifically targeted to radiology.

Second, to encourage radiologists to provide mammography, the problems of high liability and low reimbursement need to be addressed. We need tort reform and an adjustment of the reimbursement schedule.

Third, to increase the productivity of the radiologists we have, we should continue technology development. There are a number of exciting possibilities. I am optimistic about digital mammography. Our electronic information systems have been a great help in many areas in radiology and of course, I am looking forward to the use of CAD more extensively throughout the country. I think these technologies may help all of us save women's lives.

Better Models for U.S. Mammography Services: Implications for Accuracy and Encouragement of Screening. Better Quality Through Better Organized Mammography

, Ph.D., Director of Cancer Screening.

American Cancer Society

What might we achieve, or not achieve, through better organized screening? Much of this presentation grew out of a project supported by NCI, CDC, and the American Cancer Society and led by Dr. Helen Meissner at NCI, that we have been working on for the past two years called Lessons Learned About Cancer Screening, published as a supplement to the journal Cancer (Meissner et al., 2004).

Screening is not a single event, but rather a cascade of events. It begins with an invitation to screening based upon risk, which for average-risk adults is generally determined by age and gender. It continues with the test itself. It is very important that the test be of high technical quality and that it be interpreted accurately. The results will be negative, positive, or indeterminate. Based upon those results, a timely follow-up in the near term or after the recommended screening interval will be required. Most women who undergo breast cancer screening will need repeat screening after one or two years. In this series of events, there is tremendous potential for slippage, and a failure at any one of the steps can nullify the gains made, or the high quality achieved, at any previous step and reduce the value of screening.
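Why a failure at one step can undo the value of earlier steps follows from multiplying step-by-step completion rates. The sketch below is purely illustrative: the step names paraphrase the cascade described above, and the probabilities are hypothetical values chosen for the example, not data from the talk. It also assumes the steps succeed or fail independently, which real programs need not satisfy.

```python
from math import prod

# HYPOTHETICAL completion probabilities for each step of the cascade.
steps = {
    "invited / reached": 0.90,
    "attends screening": 0.80,
    "high-quality image": 0.95,
    "accurate interpretation": 0.85,
    "timely follow-up": 0.90,
    "returns at recommended interval": 0.50,
}


def cascade_yield(probabilities) -> float:
    """Share completing every step, assuming independent steps."""
    return prod(probabilities)


overall = cascade_yield(steps.values())
print(f"overall yield: {overall:.1%}")

# Perfecting one step cannot compensate for a weak later step.
with_perfect_reading = {**steps, "accurate interpretation": 1.0}
print(f"with perfect interpretation: {cascade_yield(with_perfect_reading.values()):.1%}")
```

In this example, perfecting interpretation raises the overall yield only modestly because half of the women are assumed not to return on schedule; improving that last step would matter more.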

A successful screening program requires participation by a target population and health care providers and adherence to recommendations, especially the screening interval. Screening intervals are established based upon estimates of a detectable preclinical phase, the sojourn time. For all testing, we need to have adherence to quality assurance standards. For those women who have positive test results or even those women who have normal results, we need to have timely and thorough follow-up. Women who are diagnosed with cancer need to have state of the art treatment. Finally, we should have a comprehensive surveillance system to measure program performance, and we especially need feedback to participants. Participants include not only all the professionals involved in screening, diagnosis, and treatment, but it is also important that women hear how well we are doing in reducing deaths from breast cancer.

Let's begin with adherence to screening recommendations. In the United States screening is opportunistic, as contrasted with organized screening, which is more common in Europe. This means that it depends upon a coincidence of interest between providers and patients. The lack of population registries or reminder systems in the United States means that most American women do not get regular mammograms. Utilization of mammography is high, but utilization of regular mammography at appropriate intervals is not so high. For example, data from the New Mexico Mammography Project show that in that screening program 30 percent or fewer of women adhere to annual screening recommendations (Gilliland et al., 2000).

Failure to obtain regular mammograms is not harmful for the large majority of women who do not have breast cancer, but out-of-interval mammography does increase the risk of being diagnosed with advanced disease for those women who develop breast cancer during the missed interval.

Data from the Massachusetts General Hospital over almost a ten-year period, on a series of nearly 60,000 women and nearly 200,000 screening mammograms that resulted in the detection of 604 invasive cancers, showed that tumor size was strongly associated with regular screening (Michaelson et al., 2002). Table 2.1 shows that at first screening, mean tumor size was 13.7 millimeters; on subsequent screens in women getting regular screening, mean tumor size was smaller (11.7 mm). During the same time period, the mean size of 206 invasive tumors in women who were never screened was a centimeter and a half. Intervening breast cancers, that is, cancers that were diagnosed at some point after a normal screening mammogram, were 16.8 mm overall, but larger if the interval since the last screening exceeded one year.

TABLE 2.1. Consequences of Screening Patterns for Breast Cancer Detection.

Fifty percent of women in this series did not have their first mammogram until after the age of 50, although 25 percent of the cancers were diagnosed in women under the age of 50. Among women screened for breast cancer, the majority did not return for repeat screening within a 12- to 14-month interval, and by a year and a half after their last mammogram, only 50 percent of women had returned. Twenty-five percent of breast cancers were found in women with no history of a prior mammogram. The median tumor size for these women was a centimeter and a half, compared to a centimeter in women attending regular screening (not shown in Table 2.1). Thirty percent of breast cancers were not found on mammography, and these were also larger than those found in women who followed screening recommendations.

As the data on these intervening cancers in Figure 2.8 show, only three percent were found in the first six months after the last normal mammogram, and only nine percent were found between 6 and 12 months. So just a little over 1 in 10 of these were found in the first 12 months after screening, what we would call interval cancers. Based upon the estimates of doubling time, the majority of these tumors emerged as larger palpable masses, not because they were missed, but because simply too much time had elapsed since the last normal mammogram. In fact, Michaelson and colleagues concluded that since so many of these women did not adhere to screening recommendations, almost 50 percent of the invasive tumors were larger and, thus, potentially more lethal.
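The argument from doubling time can be illustrated with a simple growth model. This sketch assumes a spherical tumor whose volume grows exponentially; because volume scales with the cube of the diameter, each volume doubling multiplies the diameter by the cube root of 2. The 5 mm starting size and the 150-day volume doubling time are hypothetical values for illustration, not estimates from Michaelson et al.

```python
DAYS_PER_MONTH = 365.25 / 12


def diameter_after(d0_mm: float, days: float, volume_doubling_days: float) -> float:
    """Diameter of a spherical tumor after `days` of exponential volume growth.

    Volume is proportional to diameter cubed, so the diameter doubles once
    every three volume doublings: d(t) = d0 * 2 ** (t / (3 * Td)).
    """
    return d0_mm * 2 ** (days / (3 * volume_doubling_days))


# Hypothetical tumor: 5 mm and undetected at the last normal mammogram,
# with a 150-day volume doubling time.
START_MM, DOUBLING_DAYS = 5.0, 150.0
for months in (6, 12, 18, 24):
    size = diameter_after(START_MM, months * DAYS_PER_MONTH, DOUBLING_DAYS)
    print(f"{months:>2} months after last screen: {size:.1f} mm")
```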

FIGURE 2.8. Breast cancer found after mammography. SOURCE: Michaelson et al., 2002.

As noted earlier, opportunistic screening, that is, screening based on encounters rather than on the population, is one of the reasons for poor adherence to mammography screening recommendations. The situational context of encounters, then, is a limiting factor. Box 2.3 summarizes the problems with this approach to preventive health.

BOX 2.3

Deficiencies in Opportunistic Screening. Opportunistic (i.e., coincidental) preventive care is inherently limited: Encounter based, not population based

In contrast, organized screening relies on a systematization of care, that is, institutional policies, population registries, computerization, reminders, chart tools, and audits to meet standards of care. Screening in this context, then, is less dependent on encounters. Outreach is easily modified for individuals and populations. There are fewer demands on providers. The systems for invitations, appointments, and follow-up can be integrated, and there is built-in potential for evaluation, which is more efficient and less expensive than chart audits.

Compared to the systems in Europe, our non-system, or aggregate collection of unconnected small systems, inherently complicates the development of organized screening. Could it be individually based? In other words, should individuals keep their own reminder systems? Some Internet systems provide a calendar that can notify a woman when she is due for screening or some other encounter, or she can use a small booklet, such as those provided by Putting Prevention into Practice (Reynolds, T., 1999). Should systems be office-based, where either chart reminders or office computer systems are possible? Could we construct a central program based on population registries? We have population registries, but they are typically not used for health. Could the state health department take responsibility? Or could a health plan or a consortium of health plans unite around some common goals of tracking and notification for preventive care?

Different outreach models have shown different degrees of success. Encounter based systems can improve cancer screening rates, but they are inherently limited because patients must initiate encounters. If a woman does not return to her doctor in a year or two or three, the physician will not be looking at the chart and seeing that that patient needs screening. Continuity with providers and practices is a major problem today as people cycle in and out of health plans. With increasing age, encounters are more likely to be chronic visits versus preventive visits. Only about one in four adults over the age of 40 gets a regular checkup.

Preventive service office systems depend on establishing practice routines, tools such as flow sheets, or defined responsibilities among clinicians. Paper based systems are effective, but computers measurably increase productivity. Unfortunately, just because you build it doesn't mean it gets used. The literature documents high failure rates among these systems; oftentimes they work very well when first initiated, and then their efficiency decreases.

A meta-analysis of 108 studies of strategies to increase rates of adult immunization and cancer screening through interventions including reminders, organizational change, feedback, education, financial incentives, legislative change, mass media, and even separate preventive care clinics found that organizational change was most effective in improving rates of preventive care. Among such effective changes were use of separate clinics devoted to prevention, use of a planned care visit for prevention, and designation of non-physician staff to do specific prevention activities (Stone et al., 2002). Financial system incentives and reminder systems ranked second in effectiveness, and patient education, of course, was the least effective intervention. It should not be surprising that dedicating a time, place, and staff to preventive health is a more successful strategy than attempting to achieve some preventive health goals during encounters for acute and chronic conditions.

To summarize, both office systems and centralized systems have been shown to improve use of preventive care. When comparing an outreach system with an encounter system, patient-initiated encounters are the major limiting factor. Chart reminders and chart audits are less effective and less cost-effective than computerized outreach systems, and centralized systems provide for continuity and reduced stress on the practice and the individual provider. However, they do not eliminate the need for office routines and policy.

Given the way health care is not organized in the United States, single-disease interventions have greater short-term than long-term potential; these programs, such as single-disease computer software tools, generally may show some benefit, but inherently they will be less attractive to patients and providers than a comprehensive system. In addition, since I think that a centralized system is relatively hopeless at the moment, one reasonable strategy for breast cancer might simply be to encourage radiology departments to manage the call-recall system.
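A call-recall system of the kind suggested here, whether run by a radiology department or a central registry, comes down at its core to comparing each woman's last screening date with a recommended interval. The following sketch is hypothetical throughout: the `Woman` record, the 365-day interval, the starting age of 40, and the sample registry are assumptions for illustration, and a real system would carry far more information.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Woman:
    """Hypothetical registry entry."""
    patient_id: str
    birth_date: date
    last_mammogram: Optional[date]  # None if no screening is on record


def age_on(birth: date, today: date) -> int:
    """Age in completed years on the given date."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - int(before_birthday)


def recall_list(
    registry: List[Woman],
    today: date,
    interval_days: int = 365,
    start_age: int = 40,
) -> List[Tuple[str, str]]:
    """Return (patient_id, reason) for every eligible woman who is due."""
    reminders = []
    for woman in registry:
        if age_on(woman.birth_date, today) < start_age:
            continue  # not yet in the target population
        if woman.last_mammogram is None:
            reminders.append((woman.patient_id, "no mammogram on record"))
            continue
        due = woman.last_mammogram + timedelta(days=interval_days)
        if today >= due:
            reminders.append((woman.patient_id, f"due since {due.isoformat()}"))
    return reminders


registry = [
    Woman("A", date(1950, 3, 1), date(2003, 6, 15)),
    Woman("B", date(1960, 7, 9), None),
    Woman("C", date(1955, 1, 20), date(2004, 2, 1)),
    Woman("D", date(1970, 5, 5), None),
]
for patient_id, reason in recall_list(registry, today=date(2004, 9, 1)):
    print(patient_id, reason)
```

Here A is overdue, B has never been screened, C is not yet due, and D is below the assumed starting age, so only A and B receive reminders.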

Ultimately, a more organized approach to breast cancer screening would monitor population-based access. It would improve standards of screening based on evidence. It would monitor performance in terms of detection of small cancers, and it would implement technological improvements in early detection. However, there is no organization in the United States charged with ensuring that the availability of mammography is adequate to meet screening needs. There is no organization charged with ensuring that American women have access to mammography.

As shown in Figure 2.9, there are over a thousand fewer mammography facilities today than existed in 1994, yet we do not know whether this represents a consolidation of facilities and actually greater efficiency, or markedly less access. We know that the decline has been greater in rural areas, so it is reasonable to suspect that rural women may have less access to mammography than they had previously. We also know that it is hard to reconcile a decline in facilities with an increase in units. Capacity at some of these facilities is clearly increasing, but we have no idea whether it is increasing enough.

FIGURE 2.9. Declining mammography facilities. SOURCE: FDA data.

The GAO report on utilization (GAO, 2002) estimated that, assuming 4,500 exams per 10,000 women per year, there should be approximately 2.2 mammography machines for every 10,000 women in the population if around 90 percent compliance with screening recommendations is the objective. Only five states have a ratio of 2.2 machines per 10,000 women, and 11 states have a ratio of two per 10,000 women, so it seems that the majority of states do not have the capacity to deliver recommended services at the 90 percent adherence rate.
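The GAO benchmark can be turned into a simple capacity check. Reading the two GAO figures together implies a throughput of roughly 2,045 exams per machine per year; that derived figure, the function names, and the 1.5 machines-per-10,000 example are illustrative assumptions of this sketch rather than numbers from the GAO report.

```python
EXAMS_PER_10K_WOMEN = 4_500    # GAO assumption: exams per 10,000 women per year
MACHINES_PER_10K_NEEDED = 2.2  # GAO estimate for roughly 90 percent adherence

# Annual exams per machine implied by the two figures above (derived here).
EXAMS_PER_MACHINE = EXAMS_PER_10K_WOMEN / MACHINES_PER_10K_NEEDED


def meets_benchmark(machines_per_10k: float) -> bool:
    """True if a state's machine density reaches the GAO benchmark."""
    return machines_per_10k >= MACHINES_PER_10K_NEEDED


def unmet_exams_per_10k(machines_per_10k: float) -> float:
    """Exams per 10,000 women per year that the machines could not deliver."""
    capacity = machines_per_10k * EXAMS_PER_MACHINE
    return max(0.0, EXAMS_PER_10K_WOMEN - capacity)


print(f"implied throughput: {EXAMS_PER_MACHINE:,.0f} exams per machine per year")
for ratio in (2.2, 2.0, 1.5):
    print(f"{ratio} machines/10k: meets={meets_benchmark(ratio)}, "
          f"unmet exams/10k={unmet_exams_per_10k(ratio):,.0f}")
```

Under these assumptions, a state at 2.0 machines per 10,000 women would fall about 409 exams per 10,000 women per year short of the benchmark.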

According to an American Hospital Association survey, in 2001 the job vacancy rate for technologists was 18 percent, and almost two-thirds of hospitals reported difficulty recruiting them. Also, fewer technologists were seeking mammography certification. The attractive jobs for radiologists are also the attractive jobs for technologists, and the unattractive jobs for radiologists are also quite unattractive to technologists. So, as mammography has become less appealing to radiologists, it has also become less appealing to technologists for many of the same reasons.

In a telephone survey that examined radiology residents' attitudes towards going into mammography, 63 percent said they would not like to spend 25 percent or more of their time in practice interpreting mammograms. The most common reasons for avoiding mammography were that it was not an interesting field, the risk of litigation was too high, and it was too stressful. Most residents (64 percent) reported that they would not consider a fellowship in breast imaging if offered (Bassett et al., 2003). As noted earlier, mammography is also unappealing to radiologists because earnings are comparatively lower, the financial contribution to the practice is small, and mammographers, therefore, are not as appreciated as some other members of a radiology practice. Work load, stress, and malpractice exposure are high. The procedure is repetitious and low tech, and it is not highly respected by colleagues.

I think each of these complaints can be addressed. We can certainly do something about earnings and the financial contribution to a practice. If we perform mammography more efficiently, and we raise the reimbursement rate, then the contribution is going to be higher. There are a number of ways that we can build solutions for the work load, both for technologists and for radiologists. There are ways that we can reduce stress. Malpractice exposure is high, but, personally, I do not think tort reform is going to solve this problem. I think we need more creative solutions, including thinking very seriously about a no-fault system, much like we have for vaccines. The procedure is repetitious and low tech, but higher tech imaging is coming along. The consideration of physician extenders is also quite reasonable, but here again the threat of increasing malpractice exposure for a supervising physician is a real obstacle to expanding the workforce to include non-physician interpreters. Again, I think the problems can be addressed. Of course, we should not expect all radiologists to want to do mammography. We just have to be sure that there are enough.

In thinking about what we might achieve through high quality, the differences between an unorganized and an organized health care delivery system provide some useful examples. Overall, however, as shown by some of the receiver operating characteristic data recently reported (Beam et al., 2003a), the quality of mammography in the U.S. is good, but it is quite variable. I think that is where the greatest concern lies.

In British Columbia, radiologists who read mammograms must pass an exam to show that they are proficient at finding small cancers. There is a strong interest in reading mammograms in British Columbia, because it represents extra income, and it is an environment where there is not a lot of competition from high tech procedures, certainly not compared with the U.S. The statistics for their program are really quite good as shown in Table 2.2 (British Columbia Cancer Agency, 2003).

TABLE 2.2. A High Performance Mammography Program in British Columbia.

I also want to point out the effect of age. The positive predictive value, that is, the proportion of women with an abnormal mammogram who actually have breast cancer, is lower in younger women, but these numbers improve as women get older. Across the board, the median tumor size is quite small, and the percentage of node-negative tumors approaches 3 out of 4. This is an example of what a high performance program can deliver, and so I would encourage you to think not so much in terms of what our current data about mammography tell us is achievable, but more in terms of what the data tell us we might achieve through organized high performance systems. This should be our objective.
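The age effect on positive predictive value follows from Bayes' rule: for a test with fixed sensitivity and specificity, PPV rises with the prevalence of cancer among the women screened, and prevalence rises with age. The sensitivity, specificity, and prevalence values below are hypothetical and are not taken from the British Columbia program.

```python
def ppv_from_counts(cancers: int, abnormal_exams: int) -> float:
    """Share of abnormal exams that prove to be cancer."""
    if abnormal_exams <= 0:
        raise ValueError("need at least one abnormal exam")
    return cancers / abnormal_exams


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule: true positives / (true positives + false positives)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# HYPOTHETICAL test characteristics, held constant across age groups.
SENS, SPEC = 0.85, 0.93
for label, prevalence in (("younger", 0.002), ("older", 0.006)):
    value = ppv_from_rates(SENS, SPEC, prevalence)
    print(f"{label}: prevalence {prevalence:.1%}, PPV {value:.1%}")
```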

New York State has taken an oversight responsibility for the 292 facilities that participate in their state's Breast and Cervical Cancer Screening Program funded through the CDC National Breast and Cervical Cancer Early Detection Program (Hutton et al., 2004). The State Health Department regularly scans the data looking for outliers that exceed upper or lower bounds established for the program. They monitor the abnormal clinical breast exam rate, the abnormal mammography rate, the positive predictive values for biopsy, the cancer detection rate, and age, race, ethnicity and previous screening history among the participants. They visit facilities that have outliers, evaluate them, review medical records and mammograms for quality and interpretation, and they review their breast exam techniques.
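The outlier screen described above amounts to comparing each facility's measures against a lower and an upper bound. A minimal Python sketch; the measure names, the bounds, and both facilities are hypothetical, since the text does not give the program's actual limits:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FacilityStats:
    name: str
    abnormal_mammogram_rate: float   # share of screens read as abnormal
    biopsy_ppv: float                # share of biopsies that found cancer
    cancer_detection_rate: float     # cancers detected per 1,000 screens


# Hypothetical (lower, upper) program bounds; the actual New York State limits are not given here.
PROGRAM_BOUNDS = {
    "abnormal_mammogram_rate": (0.03, 0.10),
    "biopsy_ppv": (0.20, 0.60),
    "cancer_detection_rate": (2.0, 10.0),
}


def flag_outliers(facility: FacilityStats) -> list[str]:
    """List every monitored measure on which the facility falls outside the program bounds."""
    flags = []
    for measure, (lower, upper) in PROGRAM_BOUNDS.items():
        value = getattr(facility, measure)
        if value < lower or value > upper:
            flags.append(f"{measure}={value} outside [{lower}, {upper}]")
    return flags


facilities = [
    FacilityStats("Facility A", abnormal_mammogram_rate=0.07, biopsy_ppv=0.30, cancer_detection_rate=4.5),
    FacilityStats("Facility B", abnormal_mammogram_rate=0.14, biopsy_ppv=0.12, cancer_detection_rate=4.0),
]
for facility in facilities:
    flags = flag_outliers(facility)
    print(f"{facility.name}: {'; '.join(flags) if flags else 'within bounds'}")
```

A flag here is only a trigger for a site visit and record review, not a verdict, for the reason given in the next paragraph.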

Since the average radiologist, and even the average facility, detects few cancers in any given year, some surveillance measures may be strongly influenced by chance or other factors that could mistakenly portray a facility as performing very well or very poorly. Thus, this kind of program should not be, and is not, punitive. Rather, it is a responsible surveillance program designed to monitor performance and provide feedback, which, as I mentioned at the beginning of my presentation, is critically important to maintaining a high degree of quality assurance. That is why you have a visit. That is why you do surveillance and reviews. Following the data review, if corrective action is necessary, the state collaborates with the facility and the local screening project to implement corrective actions.
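The point about chance can be made concrete with a binomial confidence interval. The sketch below uses the Wilson score interval (one standard choice, not necessarily what New York State uses) to compare the same observed cancer detection rate, 4 per 1,000, at two hypothetical screening volumes:

```python
from __future__ import annotations

import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = events / n
    denom = 1.0 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Same observed detection rate (4 per 1,000) at two hypothetical annual volumes.
for cancers, screens in [(4, 1_000), (400, 100_000)]:
    low, high = wilson_interval(cancers, screens)
    print(f"{cancers} cancers in {screens:,} screens: "
          f"{1000 * low:.1f} to {1000 * high:.1f} per 1,000")
```

At 1,000 screens a year the plausible range runs from roughly 1.6 to 10 per 1,000, wide enough that a competent facility could look like an outlier by chance alone.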

Table 2.3 provides an example that compares statistics from a facility that was identified as an outlier with the same measures from that facility after a visit and a plan of correction. The facility had relatively low volume, but it had a high abnormal mammogram rate, a relatively high rate of BI-RADS® 4 interpretations, a very low rate of additional investigational imaging, a high biopsy rate among women with abnormal mammograms, and a relatively low positive predictive value on biopsy.

TABLE 2.3. An Outlier Mammography Facility Before and After a Collaborative Corrective Program with New York State.

During the state's visit, it was discovered that two staff radiologists had higher biopsy rates than the other three. In-service training for all five radiologists on the use of the BI-RADS® system and double readings for six months were instituted, and the facility was encouraged to obtain accreditation for its stereotactic biopsy program. After corrective action, the proportion of BI-RADS® 4 readings dropped to 4.3 percent; additional imaging to reconcile abnormalities more than doubled; biopsy rates in women after abnormal mammograms declined substantially; and the positive predictive value for biopsies increased.

This sensible surveillance strategy can improve the quality of breast imaging in ways that are difficult for voluntary programs or the Mammography Quality Standards Act to achieve, and it also can detect fraud and dangerously low quality. For example, the state discovered a facility that was reporting clinical breast examinations that were not actually being done, and it determined that the radiologists interpreting films there were doing such a poor job that the state shut the facility down and the American College of Radiology withdrew its accreditation.

I turn now to a more detailed look at the advantages and disadvantages of organized screening. Organization can lead to more formal and uniform decisions about whether, whom, and at what intervals to screen. It also can install or improve uniform call-recall systems and triage, improve the timeliness of follow-up, minimize loss to follow-up, and improve quality assurance, monitoring, and evaluation.

It is also important to understand what organization may not accomplish. First of all, resource issues may take precedence over evidence, both nationally and locally. For example, screening programs in Europe may have elements, such as the recommended 3-year screening interval in the United Kingdom, that are due to resource limitations and are not in keeping with available evidence. Poor or incomplete population registries may limit the effectiveness of any call-recall system.

The screening program may be underfunded; I cannot think of a single program in Europe that isn't stressed by lack of funds. Resource limitations may force compromises that affect program goals, such as how sensitivity, specificity, and positive predictive value are balanced, and they may lead to unequal access or less attention to overcoming barriers. There simply may not be resources for outreach to hard-to-reach groups, even though the commitment is there in principle. And there may be delays in acquiring new technologies. For example, I know that there is quite a bit more digital mammography in the United States than in Europe at present.

Opportunistic screening may be more appealing to some groups in the target population. We have observed that it is sometimes very difficult to impose organized screening on a population that has always had a choice of where to go and whom to see for screening. In some European countries, we observed that some providers were not enthusiastic about the organized program, and participation rates by women in those programs were not as high as elsewhere. At times, participation in organized screening may not exceed that in opportunistic screening. Participation varies in both models by gender, socioeconomic status, perception of risk, rural-urban status, and various attitudes, and is quite a bit higher in the United States than in some organized settings in Europe. Organized screening does not eliminate the need for behavioral interventions over time focused on key target groups, such as groups with low income, groups that are institutionally averse, and individuals who tend to refuse screening.

Nevertheless, based upon what we observe today and upon how much more we might achieve, I believe we must think about delivering organized breast cancer screening in this country. Organized breast cancer screening would increase adherence to regular screening. It would improve accuracy, and we would see an increase in the rate of cancers diagnosed before they become advanced. Data from Dalarna County in Sweden over a period of 40 years, shown in Figure 2.10, document survival over 20 or more years in groups of women from two prescreening periods, 1958-1967 and 1968-1977, and compare it with survival in two populations from the screening period (1978-1998): those who were exposed to screening and those who were not exposed to, or refused, screening. In the modern screening period, survival is strikingly better for women participating in a program that has very high rates of adherence and very high quality (Tabar et al., 2003).

FIGURE 2.10. Dramatic improvements in survival in breast cancer in populations with high-quality organized screening: cumulative survival of breast cancer patients age 40-69 in Dalarna, Sweden. Diagnosis and death 1958-1998. SOURCE: Tabar et al., 2003.

We really should be striving to do the very best we can in mammography, and doing our best means devoting attention to increasing benefits and reducing harms associated with screening. Ultimately, our goal for mammography must be saving women's lives by detecting breast cancers before they become advanced, when the patient has the best chance for successful treatment.

DR. PENHOET: We have time now for a question and answer period for the three speakers of the morning.

DR. SMITH-BINDMAN, University of California, San Francisco: I am inspired by the possibility of having a single organized screening program. I ask both Dr. Dunnick and Dr. Esserman about the feasibility of an organized system from their perspectives. Dr. Dunnick, you presented a very bleak picture of the way mammography is organized now. Financially it is not working at all; I think reimbursements in the academic environment are actually lower than in private practice, and there are a lot of impediments to building a better system. One solution is to throw more money at the system. I am not optimistic about that approach, because the current system is inefficient. Do you see as a possibility having freestanding organized screening programs as a way to give some status to radiologists who do mammography, to increase the finances, to increase the efficiency? Would that be a solution to the current broken system?

DR. DUNNICK: It certainly might help. Of course, as always, the devil is in the details. We need to know exactly what is proposed rather than just a general suggestion, then we would be able to assess that more easily. Mammography, of course, is not the only area that is relatively under-reimbursed. You really could look at the whole spectrum and try to align the reimbursement more closely with the work effort. We also have to support areas in nuclear medicine, pediatrics, as well as mammography.

DR. SMITH-BINDMAN: But sticking to mammography, I have trouble with the concept of mammography being under-reimbursed. Diagnostic mammography clearly involves a lot more work, probably tenfold more than screening. But screening mammography could be very efficient under the current reimbursement scheme if radiologists didn't hang films, call in reports, and basically do clerical jobs. In the study that is widely cited, the radiologists spent only a third of their time doing radiology (Enzmann et al., 2001). So the current $40 reimbursement for the professional component of mammography could be an adequate reimbursement for a good mammographer who could interpret a mammogram in about a minute if all the clerical diversions were eliminated.
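The arithmetic behind this exchange is simple enough to write down. A minimal Python sketch using the $40 professional fee and one-minute read time quoted above, and treating the one-third figure attributed to Enzmann et al. (2001) as the share of time spent reading (an assumption); the three-minute read time is a hypothetical comparison, and the results are gross professional fees before any overhead:

```python
def hourly_professional_fees(fee_per_exam: float, minutes_per_exam: float,
                             fraction_of_time_reading: float) -> float:
    """Gross professional fees per working hour, ignoring overhead and other costs."""
    exams_per_hour = 60.0 / minutes_per_exam * fraction_of_time_reading
    return exams_per_hour * fee_per_exam


FEE = 40.0  # professional component per screening mammogram, as quoted in the discussion
for minutes in (1.0, 3.0):  # 1 minute as proposed; 3 minutes is a hypothetical comparison
    for fraction in (1.0, 1.0 / 3.0):  # all time spent reading vs. the one-third figure cited
        fees = hourly_professional_fees(FEE, minutes, fraction)
        print(f"{minutes:.0f} min/read, {fraction:.0%} of time reading: ${fees:,.0f} per hour")
```

Throughput, and therefore income, scales directly with the share of time spent reading, so removing clerical work changes the economics as much as the fee does.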

DR. DUNNICK: We might not have any good mammographers at Michigan then because a minute seems rather quick to me.2

DR. ESSERMAN: I think that is a very good question. There are opportunities to learn from how breast cancer screening is organized in Sweden and the U.K. Prior to about 1989 or 1990, there was no systematic screening program in the U.K. Obviously it is easier for them to implement changes because they have a centralized organization, but their screening program was not integrated into the National Health Service; it was set up separately. I am encouraged by Dr. Smith's discussion of what is happening in New York showing that there are other ways to impose organization and to set up a program with benchmarks.

I like the idea of thinking about opportunistic versus systematic. A reduction in breast cancer mortality rates is one of the benefits of the screening program in the United Kingdom, but it was not achieved by screening alone. Prior to the program, 90 percent of all breast cancer patients were cared for by general practitioners. Within 7 years of instituting the screening program, 95 percent of all women were treated in organized breast centers. That is not the case today in the U.S. So by taking an organized approach and looking at things systematically, you can make change. Yes, it is harder in our current climate, but I don't think it is impossible.

I think there are organizational examples of ways to leverage the time of the radiologists considerably. There are important ways to leverage biology as well. Some of the work on risk stratification schemes may ultimately allow us to focus more on diagnostic mammograms and to screen a smaller group of patients, which could improve efficiency and solve some of the manpower issues. What is helpful about this report is that it challenges us to think about population registries and state systems as a way to start some systematic programs.

DR. SMITH-BINDMAN: I had a second question directed to you regarding selective screening, which I think is a fantastic idea. We obviously don't have the tools yet to assess risk. We have age and breast density as risk factors, but proteomics is in the future. I am concerned that if we have no system, new tools will be embraced rapidly and, rather than leading to selective screening, we will subject women to all of the tests, which is what we do in medicine, as opposed to thinking about how to use things efficiently.

DR. ESSERMAN: The committee members agree with you. Recommendation C proposes integrated technology centers to help design and evaluate systems. As Dr. Smith said, screening is a cascade, integrating the testing with the treatment approach. Understanding the biology allows us to avoid over-treating as well as under-treating. Programs like the NCI's Early Detection Research Network could be leveraged to help build centers around testing, developing, and deploying. People like Dr. Bohmer of the committee have noted that we are going to have to get to work both on implementing systems for population-based screening and on developing and deploying new technology properly.

DR. WARRICK, University of Texas School of Public Health: Dr. Esserman, what do you mean by regionalized programs and, in terms of understanding risk, are you suggesting that we abandon the Gail model (Gail et al., 1989)?

DR. ESSERMAN: By regionalized programs, I mean what Dr. Smith was describing, not having an opportunistic, but rather a systematic approach. For example, the U.K. program takes the population-based registry for a region, sends letters to all women in the screening age range who are registered, and invites them to screen. There is an organized system that tracks outcomes and monitors benchmarks. Instead of requiring extra work as in New York State, these things are part of the routine. That is an organized approach, and it is why systems are critical to realizing the full benefits of screening mammography.
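The core of a registry-based call-recall system like the one described for the U.K. is a rule for deciding who receives an invitation. A minimal Python sketch; the registry fields, the 50-to-64 age range, and the 3-year interval are hypothetical defaults chosen for illustration, not the U.K. program's specification:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RegistryEntry:
    woman_id: str
    birth_date: date
    last_screen: Optional[date]  # None if never screened


def age_on(birth_date: date, today: date) -> int:
    """Completed years of age on a given date."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def due_for_invitation(registry: list[RegistryEntry], today: date,
                       min_age: int = 50, max_age: int = 64,
                       interval_days: int = 3 * 365) -> list[str]:
    """IDs of women in the target age range who were never screened or are past the interval."""
    due = []
    for woman in registry:
        if not min_age <= age_on(woman.birth_date, today) <= max_age:
            continue
        if woman.last_screen is None or (today - woman.last_screen).days >= interval_days:
            due.append(woman.woman_id)
    return due


registry = [
    RegistryEntry("a", date(1950, 1, 1), None),              # age 54, never screened
    RegistryEntry("b", date(1950, 1, 1), date(2003, 1, 1)),  # age 54, screened recently
    RegistryEntry("c", date(1970, 1, 1), None),              # age 34, outside the age range
    RegistryEntry("d", date(1945, 3, 1), date(2000, 1, 1)),  # age 59, overdue
]
print(due_for_invitation(registry, today=date(2004, 6, 1)))
```

A real system would also record invitations, responses, and outcomes, which is where the tracking and benchmarking described above come in.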

In terms of the Gail model, no, the Gail risk model is not out. At a risk models meeting here a few weeks ago, integration of the approach to prevention and screening with understanding underlying risk was discussed as a way to tailor screening strategies. I think all of the risk models need to be integrated and new biologic tools added to better tailor screening. The Gail model has value in understanding that a five-year risk and a lifetime risk can actually be very important, and might be used as one of the screening strategies.

DR. SMITH: The value of some of these risk models lies in identifying women who are at measurably higher risk. Identifying a woman at low risk still obliges you to ensure that a breast cancer is detected this year or next year. Ultimately, even a woman who is at the lowest risk in the Gail model is still at enough risk to require screening. So these models may be of value in identifying women who need a different screening interval, who should begin at an earlier age, or who should undergo more frequent screening, something tailored to each woman individually. Even in some countries with a third of the risk of the United States, there is interest in exploring how to prevent a late-stage diagnosis.

DR. PENHOET: Perhaps new technology will expand the definition of each of the four components in the Gail model. One of them is family history. Modern genetics should allow you to explore family history in much more precise genetic terms than simply as a broad category. A deeper understanding could expand each of the other factors as well. So I don't think anybody would argue that the Gail model is not relevant, but I think the likely outcome is that there will be additional subcomponents under each of the four categories listed there today.

DR. TAPLIN, National Cancer Institute: Later in this symposium, I will be talking about an organized screening program here in the U.S. beginning back in the 1980s. So there are examples of organized screening from the past in the United States. It is an entirely possible model, but it requires leadership, oversight, and some commitment to an explicit set of guidelines.

Looking at the organized European programs with leadership and explicit guidelines, most of these have a 2-year screening interval. One of the underlying issues that you have raised today is capacity. The math is easy. If we went to a 2-year interval, especially for women 50 and above, you would automatically create more capacity. I ask how the panel feels about that, and why that isn't considered a possibility for addressing capacity and access problems in the U.S.
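Dr. Taplin's capacity point can be written as a one-line steady-state formula: routine exams per year equal the number of participants divided by the screening interval in years. A minimal sketch with a hypothetical population; it counts routine screens only, not recalls or diagnostic work-ups:

```python
def annual_screens_needed(eligible_women: int, participation: float,
                          interval_years: float) -> float:
    """Average routine screening exams per year in steady state (ignores recalls and work-ups)."""
    return eligible_women * participation / interval_years


# Hypothetical population: 1,000,000 women aged 50 and over, 70 percent participating.
annual = annual_screens_needed(1_000_000, 0.70, interval_years=1)
biennial = annual_screens_needed(1_000_000, 0.70, interval_years=2)
print(f"annual: {annual:,.0f} exams/year; biennial: {biennial:,.0f} exams/year")
print(f"capacity freed: {annual - biennial:,.0f} exams/year")
```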

DR. ESSERMAN: I think that is a great comment. It may be that you can tailor that approach by age. The choice that was made in the U.K. was to screen every 3 years, not because they thought that was the optimal interval—they actually thought 2 years was the best interval for screening at ages 50 to 65. But they only had 128,000 pounds sterling annually. They made trade-offs because they decided that putting in a systematic and organized program where they could track outcomes would be their best first investment. In Sweden, they also have done a lot of work in looking at different intervals.

I think your point about capacity is very important. The opportunity to integrate risk models, to understand what kind of cancer you are at risk for, what the right interval should be, to understand the biology of disease, might allow us to tailor the interval much better in the future and to fit capacity to our need.

DR. SMITH: The screening interval should be driven by the estimated sojourn time. Therefore, screening every 2 years for women between ages 40 and 54 would not be advisable. Although we previously recommended that women in this age group be screened every 1 to 2 years, that seems to have been the wrong message. Epidemiologists looked at the trials; some screened every year, some every 2 years; there was a benefit at each interval, but not the same benefit.

The trials and the Swedish experience have shown the need to tailor the screening interval to the detectable preclinical phase. We recommend annual screening for women over the age of 50 because that is what the HIP trial did. As we have learned more about sojourn time, we understand that nothing happens abruptly at the age of 50. Probably as women approach the age of 60 (and this is what the Swedes did), one might extend the screening interval. This was a function of resources in Sweden. The data from San Francisco show better performance with annual screening compared to biennial screening in postmenopausal women, although the improvement is less than it is in premenopausal women (Hunt et al., 1999). So if you can learn more about risk, for example, if a woman is not on hormone replacement therapy, has large breasts, and her mammograms are easy to read, then after the age of 60 it might be reasonable to screen her every 2 years.

DR. TAPLIN: I think this is a question for the panel to consider. We know that the sojourn time for women ages 50 and above is quite long; 3 years is a common estimate. So a 2-year screening interval is something to consider as a way of increasing capacity and access.
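One common textbook way to see why the interval should track sojourn time: assume every screen detects every preclinical cancer, sojourn times are exponentially distributed with mean M, and onsets are spread evenly over time. Then the share of cancers that surface clinically between screens spaced Δ years apart is 1 − (M/Δ)(1 − e^(−Δ/M)). This is a simplified illustration, not the model used by the speakers or the trials; the 3-year mean is the estimate Dr. Taplin quotes, and the 2-year mean is hypothetical:

```python
import math


def interval_cancer_fraction(interval_years: float, mean_sojourn_years: float) -> float:
    """Share of incident cancers that surface clinically between screens.

    Assumes perfect screen sensitivity, exponentially distributed sojourn times,
    and onsets spread uniformly over the screening interval.
    """
    ratio = interval_years / mean_sojourn_years
    return 1.0 - (1.0 - math.exp(-ratio)) / ratio


for sojourn in (3.0, 2.0):  # 3 years as quoted for women 50+; 2 years is hypothetical
    for interval in (1.0, 2.0):
        share = interval_cancer_fraction(interval, sojourn)
        print(f"mean sojourn {sojourn:.0f} y, interval {interval:.0f} y: {share:.0%} interval cancers")
```

Under these assumptions, with a 3-year mean sojourn, moving from annual to biennial screening raises the interval-cancer share from about 15 to about 27 percent; the shorter the sojourn time, the larger the penalty for a longer interval.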

DR. ESSERMAN: It also seems that discontinuing hormone replacement therapy in women over 70 makes more of an impact than screening. So again, it is changing your underlying risk that can help you determine what the right intervals are.

DR. SMITH: But post-menopausal women with a family history do not eventually graduate to the longer sojourn time. These are women that really do need to continue to be screened annually.

DR. ESSERMAN: And if you have a systematic program, you will be tracking outcomes and looking at the data; you can look at variation in care regionally and at interval cancer rates. Because we don't have an organized approach, we are not collecting information routinely and systematically, and we can't answer those questions.

MRS. LAUDER: Given that there are fewer radiologists and breast imagers because of insurance reimbursement, liability, and lawsuits, are airline pilot restrictions on work loads an example of a strategy to improve efficiency and safety that might be applicable? Are there standards that inform radiologists about a limitation on the number of mammograms that they should interpret? Have any eye doctors come up with any recommendations as to what is physically possible? The idea would be to help in reducing the risk of human error, increasing efficiency, protecting the physician from lawsuits, and getting more accurate readings for the patients. I'm looking for ways to make mammography more attractive to radiologists.

DR. DUNNICK: That is an interesting question, and I don't have the answer. To put it another way, at the end of the day does the radiologist miss more cases than at the beginning of the day? Is there an increase in the error rate as one tries to become faster and read more cases? I don't actually know the numbers.

DR. SMITH: I think some of our colleagues in the audience might be able to address this. There is a literature on this that I have only begun to see, some of which evaluated how people interpret other radiographs, not just mammograms, including patterns of eye movements and the like. It would be very interesting to look at fatigue in the context of training and competence. Clearly a radiologist who is not comfortable reading mammograms will be reading under duress and may spend more time and experience fatigue due to uncertainty and anxiety. In addition, there will be the stress caused by the importance to women of detecting breast cancer and the threat of litigation for missed diagnoses. Of course, no matter what the level of competence, there will be fatigue at some point, but I think that radiologists who specialize in this field know when they are tired. They know when to stop reading and take a break, or how many films they can interpret during the day. We have done studies with cytotechnologists to evaluate their patterns of reading Pap smears. That led to limits under regulations of the Clinical Laboratory Improvement Amendments.

DR. BORGSTEDE, American College of Radiology: The question is an excellent one. A recent study analyzed the ratio of true positives to false positives according to the number of mammograms each radiologist read. The optimum was between 2,000 and 4,000. Radiologists reading fewer than 2,000 or more than 4,000 had poorer ratios (Kan et al., 2000).

DR. D'ORSI, Emory University: Numbers are not the only factor in adequate interpretation. You can read any number of mammograms and read them wrong constantly. So, along with numbers you have to review your results. Screening is a test that has built-in high accuracy because 99.5 percent of these exams are negative. If you simply read them all as negative, your accuracy will be 99.5 percent. The task is to pick up malignancy. False positives are the currency that you pay to detect subtle malignancy; as you increase false positives, your false negatives decrease. Obviously, there is a point where this is no longer productive. But before we dwell too much on the undesirability of false positives, we have to think what those false positives are getting for us. With good intelligent readers, they are getting us subtle malignancies.
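Dr. D'Orsi's arithmetic can be checked directly: with 0.5 percent of screens showing cancer, a reader who calls everything negative scores 99.5 percent accuracy and zero sensitivity. A minimal Python sketch on a hypothetical batch of 10,000 screens; the second reader's counts are invented for illustration:

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# 10,000 screens, 0.5 percent (50) with cancer, as in the remark above.
all_negative = screening_metrics(tp=0, fp=0, fn=50, tn=9_950)      # reads every film as negative
careful_reader = screening_metrics(tp=42, fp=500, fn=8, tn=9_450)  # hypothetical counts
for name, m in [("all negative", all_negative), ("careful reader", careful_reader)]:
    print(f"{name}: accuracy {m['accuracy']:.1%}, sensitivity {m['sensitivity']:.0%}, "
          f"specificity {m['specificity']:.1%}")
```

The careful reader finds 42 of 50 cancers yet scores lower on raw accuracy because of 500 false positives, which is why accuracy alone is a poor yardstick for screening and why sensitivity and specificity are reported separately.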

DR. ESSERMAN: Alastair Gale, a Ph.D. in visual psychology, has developed the U.K.'s performance test. This is a series of mammograms that every mammographer is required to read as a training set. The program has organized special training sessions and modules based on the kinds of things that readers miss. We should keep in mind that providing training and feedback are among the important factors in improving mammography.

Risk Stratification for Breast Cancer Detection: Better Quality Mammography for Women Through Better Focusing of Services

, M.D., Professor of Ambulatory Care and Prevention.

Harvard Medical School

I first want to congratulate the committee that produced this report. As a member of the committee that produced the previous report, Mammography and Beyond (Institute of Medicine, 2001), I have some idea of how hard you worked.

In my presentation today, I will discuss the science of risk stratification, breast cancer risk in context, and breast cancer risk perception. I will spend most of my time on the first topic.

The report lays out a hope for risk stratification. The idea is to take all women and assign them to groups that have ultra-low risk, medium risk, and high risk for developing breast cancer. There are many advantages to such a plan. From the individual woman's perspective, those at low risk may be reassured, and they may also require less screening to protect themselves from breast cancer. Those at high risk, although the news is disturbing, at least are informed and can take steps to prevent harm. From society's perspective, concentrating efforts on the women in whom we can prevent most of the adverse effects of breast cancer would be wonderful.

As I delved more deeply into risk stratification, however, I began to think it may be more difficult to achieve than I, at least, previously thought.

Let's start with some risk factors, which the committee summarized in tables in the report. I have listed six major risk factors in Box 2.4. By major, I mean the relative risk is at least three, that is, the risk of breast cancer in women with these factors is at least three times the risk in women without them.

BOX 2.4. Major Risk Factors for Breast Cancer with Relative Risk at Least 3 or Greater.

Increasing age is a well-known risk factor, one which we already use for breast cancer screening. The size of this risk factor depends on the cut points; I have compared women in their early seventies versus women in their early thirties; breast cancer is about 18 times more likely in the older women.

I had not considered genetic mutation as big a risk as the committee reported. However, women under 40 with a deleterious BRCA-1 mutation are 200 times more likely to develop breast cancer than women of similar age without deleterious mutations. The relative risk of the mutation decreases as women age, but is still high, 15, in women from ages 60 to 69 (Singletary, 2003).

The other risk factors, although called major, actually confer far less risk, including: atypical hyperplasia; radiation therapy (for things like Hodgkin's disease, so that is not going to be relevant to most women); increased breast density; and strong family history. Box 2.5 lists many of the risk factors that women often think about and that are frequently discussed in magazine articles; they are actually rather modest, with relative risks under three, and except for family history, they all may be related to exposure to estrogen over time.

BOX 2.5. Moderate Risk Factors for Breast Cancer with Relative Risks 1.0-3.0.

Mother or sister with breast cancer
Increased bone density

For risk stratification, the idea is to use these characteristics to stratify women into low- and high-risk groups. The best-known example we have of a risk stratification tool for breast cancer is that developed by Mitchell Gail and colleagues (Gail et al., 1989, and see NCI website http://bcra.nci.nih.gov/brc/). This tool was developed in 1989 from the Breast Cancer Detection and Demonstration Project based on risk factor information from about 200,000 women and was used to estimate expected breast cancer incidence in the Breast Cancer Prevention Trial. The risk factors in the model, which applies only to women over age 35 and assumes regular screening, include age, age at menarche, age at first birth or nulliparity, number of affected female first-degree relatives up to two, and history of benign breast biopsy or hyperplasia. Missing from this list, however, are major risk factors like breast density or mutations.

How well does this tool work in stratifying women according to their breast cancer risk? A recent report examines this question (Rockhill et al., 2003). The investigators studied a cohort of 82,109 women ages 45 to 71 who were part of the Nurses' Health Study. They followed the women for the period 1992 to 1997, having collected information on their risk factors for breast cancer. Of the 82,109 women, 1,354 developed breast cancer during the study period (1.65 percent). Using the Gail model, risk was estimated for each woman. All these risks were summed, and for the group the model was found to work remarkably well. The ratio of expected to observed cancers was really excellent (0.94) and even better in a high risk subsample (1.03). However, the Gail model did not work as well for individual women, and that is what is needed for risk stratification.

An individual woman either does or does not develop breast cancer, so her risk is either zero or 100 percent. It is only in groups of women that you get a range from zero to 100. Because it was not clear to me how the investigators assessed risks for individual women, I consulted Dr. Rockhill. I learned that she and her colleagues evaluated the accuracy of the Gail model in two different ways. One way was calibration of the model as I have just described, that is, determining the degree to which the percentage of a population actually developing disease is similar to the probability estimate of the model for that population. The Gail model estimated 1.55 percent of women would get breast cancer in this population of women from the Nurses' Health Study, and 1.65 percent actually developed breast cancer. So the expected to observed ratio was 0.94; that is the calibration of the model for the group.
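Calibration of this kind is a ratio of sums: add up each woman's predicted risk to get the expected number of cases, then divide by the number of observed cases. A minimal Python sketch; the function name is illustrative, and the check uses the rounded aggregate figures quoted above rather than individual Nurses' Health Study data:

```python
from __future__ import annotations


def expected_to_observed(predicted_risks: list[float], outcomes: list[int]) -> float:
    """Calibration-in-the-large: sum of predicted risks divided by number of observed cases."""
    return sum(predicted_risks) / sum(outcomes)


# Aggregate figures quoted above: 82,109 women, mean predicted risk about 1.55 percent,
# 1,354 observed cases. Because 1.55 percent is rounded, the result is approximate.
cohort = 82_109
expected_cases = 0.0155 * cohort
observed_cases = 1_354
print(f"expected {expected_cases:.0f}, observed {observed_cases}, "
      f"E/O = {expected_cases / observed_cases:.2f}")
```

Note that a well-calibrated model can still rank individual women poorly; calibration says nothing about which women get the higher scores.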

Stratification requires a model to go one step further, to discriminate. Discrimination is the degree to which the estimated probabilities from the model are consistently higher for persons who develop disease than for those who do not. Do the women who get breast cancer have a higher risk according to the model than the women who do not? This is measured with a concordance statistic, whose values run between 0.5 (a coin flip) and 1 (perfect discrimination). One woman who got breast cancer and one woman who did not are randomly selected, and it is determined whether the woman who got breast cancer had a higher risk score in the model than the woman who did not. When this is repeated over all such pairs, the result is a percentage that represents the probability that, for any randomly selected diseased/non-diseased pair of women, the diseased woman has the higher estimated risk.
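The pairwise procedure just described can be written directly. A minimal Python sketch on toy risk scores (hypothetical, not Nurses' Health Study data); counting ties as one half is a common convention that the text does not specify:

```python
from itertools import product


def concordance(risks_cases: list, risks_noncases: list) -> float:
    """Share of case/non-case pairs in which the case has the higher predicted risk.

    Ties count as one half, a common convention for this statistic.
    """
    score = 0.0
    pairs = 0
    for case_risk, noncase_risk in product(risks_cases, risks_noncases):
        pairs += 1
        if case_risk > noncase_risk:
            score += 1.0
        elif case_risk == noncase_risk:
            score += 0.5
    return score / pairs


# Toy 5-year risks: three women who developed cancer, four who did not.
cases = [0.02, 0.03, 0.015]
noncases = [0.01, 0.02, 0.025, 0.012]
print(f"concordance = {concordance(cases, noncases):.2f}")
```

For the Nurses' Health Study cohort this would mean comparing 1,354 × 80,755, or roughly 109 million, pairs; the loop still works, but rank-based formulas for the same statistic are much faster.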

In the Nurses' Health Study, the resulting percentage was 58 percent. Now, 50 percent is flipping a coin; 58 percent is better, but let's face it, it is not much better.

Another way to demonstrate the concordance of 58 percent is with a graph. Figure 2.11 shows the percentage of women in the two groups with different estimated five-year risks according to the Gail model. Effective stratification should separate these groups; the Gail model did not. There is no place along the horizontal axis to draw a line above which we would offer screening and below which we could reassure individual women. It is also important to remember that the figure shows proportions of women, not absolute numbers. More than 80,000 women did not develop breast cancer in this 5-year period, whereas 1,354 did. If the figure displayed absolute numbers instead of percentages, the group that did not develop breast cancer would swallow up the group that did.

FIGURE 2.11. Discrimination of the Gail model. SOURCE: Rockhill, B, personal communication, reproduced with permission.

For several other medical conditions, we know that a large number of people at low risk may give rise to more cases of disease than a small number who are at high risk. This common situation seems to be true for breast cancer, and it limits the utility of stratifying women according to risk when considering breast cancer prevention strategies. This is a reality that we must understand as we think about risk stratification.
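A small hypothetical calculation shows how a large low-risk group can outnumber a small high-risk group in cases. The population shares and risks below are invented for illustration and are not breast cancer estimates:

```python
from __future__ import annotations


def cases_by_group(population: int, groups: list[tuple[str, float, float]]) -> dict[str, float]:
    """Expected cases per group; each group is (label, share of population, risk over the period)."""
    return {label: population * share * risk for label, share, risk in groups}


# Hypothetical: 5 percent of women have four times the risk of the other 95 percent.
cases = cases_by_group(100_000, [("lower risk", 0.95, 0.002), ("higher risk", 0.05, 0.008)])
total = sum(cases.values())
for label, n in cases.items():
    print(f"{label}: {n:.0f} cases ({n / total:.0%} of all cases)")
```

Here the higher-risk group accounts for only about 17 percent of cases despite a relative risk of 4, so screening that group alone would miss most cancers.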

How big does a risk have to be to be useful for stratification? In the Gail model, the relative risks were small. Except for age, no factors with very large risks were included. So how large a risk will be needed to discriminate between women who will get breast cancer and those who will not? A report in the British Medical Journal (Wald et al., 1999) suggested that a relative risk of about 200 is necessary to discriminate well between groups, and risks of this magnitude are rare; there are a few examples like alpha-fetoprotein and spina bifida (at 242) and perhaps hepatitis B and hepatocellular cancer (at around 200).

In breast cancer, only BRCA-1 or BRCA-2 mutations in young women are in this range. To test BRCA as a discriminator, I made a calculation using the relative risks I mentioned earlier: 200 for women ages 20 to 49 and 15 for women ages 60 to 69. Although we are not really sure about these risks, they will serve for the purposes of this example. I also used the estimate, again not certain, that about 0.25 percent of women carry a deleterious BRCA-1 mutation.

Table 2.4 displays the figures from my “back of the envelope” calculation using the SEER data for the risk (and numbers of cases) in the younger and older general populations. In the young age group with fewer cancers, the high-risk BRCA positive women account for half the total (5,000 of 10,000) which is consistent with the estimate of Wald and colleagues that about half of cases in a population can be accounted for by a group with a relative risk of 200. I emphasize that this is a very rough calculation. There are all sorts of subtleties that are not included, but it is sufficient to make the point for breast cancer.

TABLE 2.4. Discriminating Potential of a High Relative Risk for Breast Cancer, BRCA 1, in Young and Old Populations: Back of Envelope Calculation.

Turning to older women, we have many more breast cancer cases in the general population because the underlying risk in the population is higher. The subset of the population with BRCA mutations accounts for less than four percent (1,400 of 37,000) of all breast cancers in this age group. It appears the risks that work for breast cancer risk stratification are ones with very large relative risks, or perhaps a combination of factors that add up to a very large relative risk.
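The arithmetic behind Table 2.4 can be sketched as follows. Because the table itself is not reproduced here, the sketch assumes the rough calculation applied the relative risk to the overall population rate, so that carrier cases ≈ total cases × relative risk × carrier prevalence; under that assumption it reproduces the quoted figures (5,000 of 10,000, and about 1,400 of 37,000). The second function is added only for comparison: it measures the relative risk against non-carriers, the usual attributable-fraction algebra, which gives a smaller share for the young group (about a third rather than half) while leaving the contrast between the age groups intact.

```python
def carrier_cases_rough(total_cases, carrier_prevalence, relative_risk):
    """Back-of-envelope estimate: apply the relative risk to the population rate."""
    return total_cases * relative_risk * carrier_prevalence


def carrier_share_refined(carrier_prevalence, relative_risk):
    """Share of all cases arising in carriers when the relative risk is
    measured against non-carriers (standard attributable-fraction algebra)."""
    carrier_weight = carrier_prevalence * relative_risk
    return carrier_weight / (carrier_weight + (1.0 - carrier_prevalence))


PREVALENCE = 0.0025  # about 0.25 percent of women carry a deleterious BRCA-1 mutation

# (age group, total cases quoted in the text, assumed relative risk)
for group, cases, rr in [("ages 20-49", 10_000, 200), ("ages 60-69", 37_000, 15)]:
    rough = carrier_cases_rough(cases, PREVALENCE, rr)
    refined = carrier_share_refined(PREVALENCE, rr)
    print(f"{group}: about {rough:,.0f} of {cases:,} cases "
          f"(rough share {rough / cases:.1%}, refined share {refined:.1%})")
```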

In summary, risk stratification may be difficult because most risks for breast cancer are small and because many of these risk factors are spread out over the entire population. As Wald and colleagues point out, some of the risk factors are calculated by using the extremes of the population, but we have to apply them to the entire population, watering down the discriminatory effect. BRCA-1 may be an exception. Exciting developments in breast cancer research may lead to other examples where we can identify other small groups of women with large risks. Regardless of what happens, the take-home message is that risk models should be evaluated not only for their calibration, that is, how well they work for the whole group, but for their discrimination: how well they can separate individual women who are and are not going to develop diseases like breast cancer.

I now want to shift the discussion and talk about breast cancer risk in context. I was delighted that this was covered in the report. American women think that if they know anything at all about breast cancer risk, they know that about one in eight women are going to get this disease. I was glad to hear from Dr. Smith that the American Cancer Society has decided that it was time to downplay that life-long risk, mainly because too many women translate it into a short-term risk. It is, perhaps, one reason why so many women think that there is an epidemic of breast cancer.

In my own view, at least health providers should know the absolute risks displayed in Table 2.5. One of our problems has been that so much of the discussion about breast cancer is in terms of relative risks, which sound so much more threatening. While it is true that breast cancer is the biggest cause of death in women in their forties, it only accounts for 10 percent of the very few deaths that occur at that age (2 breast cancer deaths per 1,000 women). But women also need to understand the increasing incidence of breast cancer with age, and the resulting need for screening as they grow older. They must also begin to understand that a breast cancer diagnosis is not a death sentence.
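The gap between relative and absolute framing can be made concrete with the figures quoted for women in their forties. This sketch assumes the "2 per 1,000" and "10 percent" figures refer to the same group and time window; the relative risk of 2.0 is hypothetical, chosen only to show how a "doubled" risk looks in absolute terms.

```python
def absolute_from_relative(baseline_risk, relative_risk):
    """Return (absolute risk with the factor, absolute excess over baseline)."""
    exposed_risk = baseline_risk * relative_risk
    return exposed_risk, exposed_risk - baseline_risk


PER_1000 = 1000

# Figures quoted in the talk for women in their forties.
breast_cancer_deaths = 2 / 1000   # breast cancer deaths per woman
breast_cancer_share = 0.10        # breast cancer's share of all deaths at that age
all_cause_deaths = breast_cancer_deaths / breast_cancer_share
print(f"All-cause deaths implied: {all_cause_deaths * PER_1000:.0f} per 1,000")  # 20

# Hypothetical factor that "doubles" the risk (relative risk 2.0).
risk, excess = absolute_from_relative(breast_cancer_deaths, 2.0)
print(f"Doubled risk: {risk * PER_1000:.0f} per 1,000, "
      f"an excess of {excess * PER_1000:.0f} per 1,000")  # 4 and 2
```

A "100 percent increase" in relative terms is, in this example, two additional deaths per 1,000 women.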

TABLE 2.5. Absolute Risks Among Women of Developing and Dying of Breast Cancer in 10 Years.

Finally, they have to have some sense of putting this information into context with the rest of their lives, and indeed, so do we. We are spending a whole day on breast cancer quite appropriately at this symposium, but I know as a general internist that women have many other complaints and concerns.

Finding appropriate methods to communicate these facts to women is not easy. When I was at the University of North Carolina, and it became known that I was involved in breast cancer screening and prevention, several college students in their early twenties would come to my practice worried about breast cancer. For the vast majority, I could not find anything that would suggest they were at increased risk. I would tell them that although breast cancer risk does increase as a woman gets older, the risk at their age was somewhere around one in 100,000. This did not seem to be reassuring; they had no context for the numbers. Finally, I began to say, “Look, a 70-year-old man has five times your chance of getting breast cancer in the next year.” You could just see the light bulb go on. So we have to figure out how to communicate the risks of breast cancer to our patients effectively.

That is the reason I was also delighted with the report's recognition that breast cancer risk perception is important. There is a great deal of fear out in the community about breast cancer. Years ago, colleagues and I conducted telephone surveys in two communities in North Carolina and found that about a fifth to a quarter of respondents worried about breast cancer, about half feared finding it, and almost three-quarters thought looking for it made women worry (Fletcher et al., 1993).

The report cites a survey (Black et al., 1995) in which women in their forties overestimated their risk of dying of breast cancer twentyfold and their risk of getting it about sixfold. We clinicians have tended to think that this problem is not medical, and so we do not discuss it or address it very often with our patients. Thank goodness the committee thought risk perception is important and made recommendations to address it.

What is the cultural context of all of this? The ancients thought of the breast as nurturing, as being very important for the survival of the species. They made deities out of women who were able to nurture twins. They got so excited about the breast that in 2000 BC, they gave one of their goddesses about 20 of them on her chest. Then over the centuries, society's connotation of the breast took on more of a sexual importance. Only in the last part of the twentieth century have we begun to think of the breast as also signifying death and mutilation. I think there is too much of that sense in the perception of breast cancer risk in the United States today. Science and medicine obviously are not totally responsible for this perception, nor can we change cultural attitudes completely. But we should accept some responsibility for the modern fear of death and mutilation from breast cancer. It is time to work hard to give women a more realistic understanding of their risk of breast cancer.

In conclusion, the committee made two major recommendations about breast cancer risk. The first was to develop individually tailored risk prediction tools to identify women who would benefit from individualized approaches to breast cancer detection. I agree with this important goal, but I have suggested it may be more difficult than we thought. Problems worthy of attack prove their worth by fighting back. Maybe this is such a case. The other major recommendation was to develop tools that facilitate communication regarding breast cancer risk. This too may be difficult, but is very worthwhile.

The Promise of Biomarkers in Early Detection of Breast Cancer: Better Quality Mammography Through Better Focusing of Services

, M.D.

President and Chair, Human Proteome Initiative Committee, Professor, Department of Pediatrics, University of Michigan

We are now going to shift gears from talking about procedures and approaches that have been tried and tested over many years to talking about the promise of something on the horizon.

We should put the promise of molecular markers for breast cancer or for other cancers into the perspective of what we see them contributing to cancer management in the next 5 or 10 years. Such markers might help us to screen for, and make a diagnosis of, breast cancer by using a blood specimen to make a molecular diagnosis. The same factors, proteins, or molecular elements that are used to make the diagnosis may well help with imaging. If a factor is detectable in the circulation, then presumably it is coming from the tumor itself, and it might be tagged in a way that allows locating the tumor. So there is a continuum from molecular diagnosis in a positive screen, to visualizing tumor cells or locating the tumor, and then to arming the same factors with something toxic that is targeted to those tumor cells. Perhaps, quite a few years down the road, we may not need the imaging component, but at present it is extremely reassuring to be able to localize the tumor after a positive screen before proceeding to any kind of therapy.

How can we come up with novel diagnostics for breast cancer? There are many strategies available, although they have not been implemented to the extent that they should have been, whether because of a lack of funding or lack of resources of other kinds, I'm not sure. I think there is a tremendous opportunity to find the features of breast cancer with the greatest diagnostic promise.

We know a lot about breast tumors and about the genes that are expressed in these cancers. We have an opportunity to look for those genes, and perhaps the proteins, that are expressed in tumors but not in normal tissue, and ask which ones of those could have diagnostic potential, which ones are shed into, and could be detected in, the circulation.

We can also place tumors, or tumor cells, in incubation media to allow them to secrete their products into the media. After identifying those products in the media, we try to detect them in the circulation. And an even more direct approach is to look for the proteins in fluids such as nipple aspirate. With a complete list of those proteins, one can check for their presence and diagnostic potential in the blood.

If the ultimate goal is to be able to take a blood sample and make a diagnosis for breast cancer, why not, as another strategy, simply profile blood proteins and ask which ones can be detected in the blood of subjects with breast cancer and not in controls? Harnessing the immune system could represent yet another strategy. If there are tumor proteins that are aberrantly expressed or are abnormal, then the immune system, which is not compromised early during tumor development, may well react by producing antibodies to those abnormal proteins. If those proteins are identified, they could be put on a chip, and a drop of blood or serum could be reacted with them. A positive reaction would reveal antibodies against those tumor antigens (proteins), an indirect sign that a tumor producing them is present.

These are all logical ways to use current technology to find novel markers for the early diagnosis of breast cancer, although I would say that the entire present effort is really very modest, and a lot more effort has to go into systematically searching for those types of markers that would help us to diagnose breast and other cancers early.

Recall how we started the Human Genome Project 15 years ago. It was with the promise that knowing all the genes in the genome surely would allow us to understand and cure diseases. Of course, we now realize that things are a lot more complex: while there are perhaps only 30,000 genes, those 30,000 genes produce upwards of a million different proteins. I believe we should now develop strategies that allow us to identify and characterize comprehensively all the proteins being produced by breast cancer cells, as a more direct way to find those particular proteins that could be promising diagnostic or therapeutic targets.

We and other groups are doing that type of work. We assume that the proteins that are in the circulation are coming primarily from the cell surface, and we think that in order to image breast cancer, knowing what is on the cell surface that could be targeted with an imaging agent would be very important. So, why not have a strategy to identify all the proteins expressed on the surface of breast cancer cells? Many technologies are available at the present time to do just that.

One strategy is to tag all the proteins on the cell surface with tagging agents like biotin. Then those cells are broken open, and all the biotinylated proteins are captured and systematically characterized. We have been working with lung cancer cells. There are thousands of proteins on the surface of these cancer cells. Although not all of them have diagnostic potential, surely those that are uniquely expressed on the surface of such cancer cells could have diagnostic as well as imaging and therapeutic potential.

An initiative to characterize all proteins expressed on the surface of breast cancer cells would have tremendous benefit through identifying those subsets that are important for diagnosis, molecular imaging, or therapy.

There are two or three groups at the present time that are investigating proteins that are secreted by tumor cells. Tumors that have been excised from patients are divided into small pieces, placed in incubation medium, and incubated. They release proteins or other factors which are then collected from the fluid and analyzed through proteomics. A comprehensive mass-spectrometry-directed approach identifies each protein, and this information is compared with our body of knowledge on gene and protein expression, resulting in the selection of proteins that are candidates as diagnostic or therapeutic markers. So, this approach involves identifying the proteins one by one and, once they are identified, determining whether they are present in the circulation and characterizing them.
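The selection logic described above can be sketched with set operations. This is a deliberately simplified, hypothetical version: the protein names are placeholders, and treating "expressed in normal tissue" as a yes/no exclusion is an assumption; actual comparisons against gene and protein expression data are quantitative.

```python
def select_candidates(secreted, normal_tissue, in_circulation):
    """Hypothetical filter for secretome-derived marker candidates.

    secreted: proteins identified by mass spectrometry in tumor-conditioned medium
    normal_tissue: proteins known to be expressed in normal tissue
    in_circulation: proteins subsequently detected in serum or plasma
    """
    tumor_specific = secreted - normal_tissue        # drop proteins normal tissue also makes
    return sorted(tumor_specific & in_circulation)  # keep those that reach the blood


secreted = {"P1", "P2", "P3", "P4", "P5"}
normal_tissue = {"P1", "P4"}
in_circulation = {"P2", "P3", "P4", "P6"}
print(select_candidates(secreted, normal_tissue, in_circulation))  # ['P2', 'P3']
```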

One approach that seems to have attracted attention recently involves profiling the serum of breast cancer patients compared to controls through proteomics technologies. This approach identifies those proteins that are distinctive in the serum of breast cancer patients and, therefore, have the potential for making an early diagnosis. The challenge in this approach is that the marker proteins of interest are mixed in with 6 or 8 very high abundance proteins that make up about 90 percent of serum, medium abundance proteins that make up nine or ten percent of serum, and the many, many low abundance proteins that are found in the final one percent fraction of serum.

From a technology point of view, finding the very low abundance proteins among all of the more abundant ones represents quite a challenge. Figure 2.12 illustrates this problem. There are at least 5,000 proteins that are more abundant than a marker like prostate specific antigen (PSA), which is present in serum in the picomolar range.
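The size of the problem is easier to see as orders of magnitude. Taking the concentrations given for Figure 2.12 at face value (albumin above about 10^-3 molar, PSA around 10^-12 molar), a short calculation gives the minimum concentration range an assay would have to span:

```python
import math

ALBUMIN_MOLAR = 1e-3  # lower bound: "greater than thousandth molar"
PSA_MOLAR = 1e-12     # "trillionth molar" (picomolar range)

orders_of_magnitude = math.log10(ALBUMIN_MOLAR / PSA_MOLAR)
print(f"Albumin exceeds PSA by at least about 10^{orders_of_magnitude:.0f}-fold")
```

That is a ratio of a billion to one or more between the most abundant serum protein and a clinically useful marker.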

FIGURE 2.12. Disease markers like PSA are present in trillionth molar concentrations, as opposed to albumin, which is present in greater than thousandth molar concentrations.

A complex and precise technology is required (and fortunately available) that allows profiling thousands of very low abundance proteins and identifying those of high value that may represent markers for different disease states, including breast cancer.

The Human Proteome Initiative, with the support of NIH as well as numerous industry groups, is an effort to utilize all of the proteomics technologies to comprehensively quantify and characterize all the proteins in human serum. This initiative will provide an understanding of the range of variation in our serum and plasma protein constituents. It will develop the knowledge base for the normal serum and plasma proteome and how proteins change with age, with ethnicity, with physiologic states, and with dietary habits, among others. This is a major undertaking which is currently in its pilot phase and has already yielded very interesting findings.

Another proteomics approach provides a comprehensive profiling using multiple tool sets. A tube of serum or plasma is separated based on different protein characteristics into thousands of fractions, each amounting to about a microliter in volume. Then each one of those fractions is analyzed separately for its protein content, and the proteins are tagged to allow generation of quantitative data. Pre-treatment and post-treatment, or pre-disease and post-disease samples can be tagged with different agents, mixed together, and compared for changes in protein expression. A strategy like this enables sifting through thousands of proteins to pull out the ones that might be associated with a particular condition or disease state such as breast cancer, for example.
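The comparison step for two differently tagged samples can be sketched as a ratio test per protein. This is a minimal sketch assuming each fraction yields one intensity per protein per tag; the protein names, intensities, and two-fold threshold are hypothetical, and real workflows also normalize intensities and apply statistical tests.

```python
import math


def flag_changes(sample_a, sample_b, min_abs_log2=1.0):
    """Flag proteins whose abundance differs between two differently tagged samples.

    sample_a, sample_b: dicts mapping protein -> measured intensity for each tag.
    min_abs_log2: smallest |log2(b / a)| to report; 1.0 means a two-fold change
    (an arbitrary illustrative threshold).
    """
    flagged = {}
    for protein in sorted(sample_a.keys() & sample_b.keys()):  # quantified in both
        log2_ratio = math.log2(sample_b[protein] / sample_a[protein])
        if abs(log2_ratio) >= min_abs_log2:
            flagged[protein] = round(log2_ratio, 2)
    return flagged


# Hypothetical intensities for one fraction, pre-disease (a) and post-disease (b).
pre = {"P1": 100.0, "P2": 250.0, "P3": 40.0, "P4": 80.0}
post = {"P1": 110.0, "P2": 60.0, "P3": 170.0, "P5": 30.0}
print(flag_changes(pre, post))  # {'P2': -2.06, 'P3': 2.09}
```

Log ratios are used so that a four-fold fall and a four-fold rise have the same magnitude with opposite signs.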

With such approaches, we are confident that we can discover the PSA equivalents for breast cancer, for colon cancer and for lung cancer. Specifically for breast cancer, there is an effort now supported by the Entertainment Industry Foundation that is targeting serum profiling using these technologies to find proteins that may be markers for the early diagnosis of breast cancer. This modest effort may provide the proof of principle that this comprehensive profiling of serum enables us to find early diagnostic markers. Furthermore, as I said earlier, this is only one among numerous strategies that could be followed to find potential markers, such as developing a better understanding of the cancer cell itself and what it is expressing on its cell surface, among others. Wouldn't it be exciting if the BCRF and the Entertainment Industry Foundation and perhaps other foundations were to get together to mount a major effort to develop an understanding of breast cancer cells for the purpose of identifying novel markers for early diagnosis and more effective therapy?

And finally, I want to describe more fully another strategy I mentioned earlier and that we and others are exploring. This approach relies on the immune system to tell us whether or not there are tumor cells in a person. Several technologies are available that allow review of all the proteins that could be expressed in a tumor and discovery of which ones are antigenic, that is, cause the formation of antibodies directed against themselves. We can display all the proteins from cancer cell lines on membranes, or blots, and, using sera from different subjects, explore which of the proteins from a particular cancer are recognized by the immune system, that is, act as antigens and generate antibodies.

In one of our studies of breast cancer proteins published three years ago, a particular group of three related proteins, called RS/DJ-1, from a breast cancer cell line, was recognized strongly by sera from four breast cancer patients but not by sera from healthy controls (Le Naour et al., 2001). Figure 2.13 shows the pattern of proteins reacting as antigens with sera from breast cancer patients as spots on the membrane, or blot. Because we detected the presence of antibodies against this protein group from a few breast cancer patients, we asked whether the protein antigen itself was in the circulation of breast cancer patients and could be a potential marker. When we looked for the antigen in the circulation, we discovered that 37 percent of patients with breast cancer had the antigen detectable in circulation at the time of diagnosis. Some patients had both antigen and antibody, some patients had only antibody, and others had only antigen detectable, and counting those that had either an antigen or an antibody, roughly 50 percent of new breast cancer patients had evidence of this particular molecular marker.

FIGURE 2.13. Western blot showing the presence of protein antigen-antibody reactions as potential markers from patients with breast cancer but not from healthy controls.

We are very early on in this process of discovering and then validating markers. But this is one among potentially dozens of antigens that could be detectable through antibodies or through the detection of the antigen itself. The goal would be to develop panels of such antigens that together would have the requisite sensitivity as well as specificity.
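The trade-off in building such panels can be sketched as follows. The patient and control results are invented (chosen only to echo the pattern of roughly 37 percent antigen-positive and about 50 percent positive for either antigen or antibody), and calling a subject positive when any panel marker is detected is an assumed rule. Adding a marker raises sensitivity but can lower specificity, which is why panels have to be validated against healthy controls.

```python
def panel_performance(panel, case_findings, control_findings):
    """Sensitivity and specificity of an 'any panel marker detected' rule.

    panel: set of marker names in the panel
    case_findings / control_findings: one set per person listing detected markers
    """
    def positive(found):
        return bool(panel & found)  # positive if any panel marker was detected

    sensitivity = sum(positive(f) for f in case_findings) / len(case_findings)
    specificity = sum(not positive(f) for f in control_findings) / len(control_findings)
    return sensitivity, specificity


# Invented results for 8 patients and 6 healthy controls (not study data).
cases = [{"antigen"}, {"antibody"}, {"antigen", "antibody"}, set(),
         set(), set(), set(), {"antigen"}]
controls = [set(), set(), set(), {"antibody"}, set(), set()]

for panel in ({"antigen"}, {"antigen", "antibody"}):
    sens, spec = panel_performance(panel, cases, controls)
    print(sorted(panel), f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```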

The particular technology that I just illustrated works very well, but it is tedious and low throughput. To address these problems, we have been able to take the same proteins from tumor cells and, rather than put them on membranes, dot them on micro-arrays or chips. This allows a more industrialized, higher throughput, higher sensitivity process. Specifically, then, we are working to increase the efficiency with which potential markers can be discovered.

In the protein micro-array approach, we divide the proteins into several thousand fractions, and the fractions are arrayed on a chip to produce micro-arrays, that is, little slides, that contain the entire breast cancer proteome or the colon cancer proteome. For example, our colon cancer chip arrays 4,000 protein fractions originating from colon cancer cells; it can be incubated with serum from a new colon cancer patient. We ask which of the 4,000 fractions the immune system recognizes by an antibody reaction. The answer turns out to be about 50 or so, and we have determined that this is reproducible from array to array. The ultimate goal is to produce chips dotted with protein antigens that can be incubated with one microliter amounts of serum from different subjects, and, based on the pattern of reactivity, can define a molecular signature of colon, breast, lung, or other cancer.
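One way to turn a reactivity pattern into a call is to compare it with reference patterns. The talk does not say how patterns are compared, so the sketch below assumes each pattern is simply the set of reactive fraction numbers on the chip and uses Jaccard similarity with nearest-signature matching; the fraction numbers and reference signatures are invented for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of reactive fractions (0 = none, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(reactive, signatures):
    """Return the name of the reference signature most similar to a serum's pattern."""
    return max(signatures, key=lambda name: jaccard(reactive, signatures[name]))


# Hypothetical reference patterns: reactive fraction numbers on a 4,000-fraction chip.
signatures = {
    "colon": {12, 87, 404, 1290, 3301},
    "breast": {5, 87, 918, 2210, 3999},
}
serum = {12, 87, 404, 1290, 2210}
print(best_match(serum, signatures))  # colon
```

A validated classifier would also need a "no match" threshold and testing on sera from healthy controls; both are omitted here for brevity.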

Although this is potentially far reaching, we are extremely early on in this process. We are beginning to learn what is making these proteins detectable in the circulation, and why the immune system is reacting against them. We think that there is tremendous processing of proteins, that a gene does not just encode for a single form of protein, but for something that the cell turns into many different forms. Some of those forms have associations with cancer and can stimulate immune reactions, but we have a lot more research and validation to do. Nevertheless, I am optimistic that we can find markers that are truly useful, better than PSA, for example, and that even as early in the discovery phase as we are today, a targeted approach toward funding the most informative markers for the early diagnosis of breast cancer could help to save women's lives.

Bringing New Technologies into Service: Better Quality for Women Through New or Improved Technologies

, M.D., M. Sci., Chief Medical Officer and Director.

Office of Clinical Standards and Quality, Centers for Medicare and Medicaid Services

I am going to give you a framework for thinking about Medicare reimbursement policy for new technologies, focusing on some of the complexity that may not be well known, on some very fundamental statutory and regulatory barriers, and on the legal authority the program has to pay for screening and early detection technology.

Medicare reimbursement falls into five components listed in Box 2.6, each about as complex as the whole. Regulatory approval of payment for technology involves first of all approval by the FDA for at least one use or indication. It need not be the use that Medicare pays for. Payors like Medicare can and do routinely (but not always) cover off-label indications, both diagnostic and therapeutic. But if the technology falls under FDA statutory authority, Medicare requires that it be approved for at least one indication. There are ways in which changes in FDA regulatory policy related to technology can influence payment policy. I will discuss this further later, but it is particularly relevant in the case of breast cancer and the coverage of mammography.

BOX 2.6. The Five Components of Medicare Reimbursement.

Regulatory approval (if applicable)
Benefit category determination

The benefit category is the next step, and a major one, in the reimbursement cascade. Medicare is a defined benefits program, meaning the benefit category has to be defined in statute in order for Medicare to pay. Benefit categories include, for example, inpatient treatment, outpatient treatment, or durable medical equipment. And as you know, a prescription drug benefit was added in December 2003. Before the Medicare Prescription Drug, Improvement, and Modernization Act of 2003 (MMA, P.L. 108-173), Medicare could not pay for outpatient prescription drugs, no matter how needed or effective they were.

Diagnostic services are a Medicare benefit category, but screening and preventive services generally require separate legislation. Defining a service as diagnostic is important because Medicare can pay for diagnosis but not for screening. Medicare could therefore pay for breast cancer diagnosis, including diagnostic mammography, using new technologies that are not currently employed.

Screening mammography, however, was added by a change in Medicare law that described screening mammography narrowly, leaving the precise definition to the public health law and the FDA. That is why Medicare can pay for screening mammograms, but it is also why, as I noted earlier, technologies other than screening mammography for early detection of breast cancer could not be covered without a statutory change or a regulatory change at the FDA.

The actual language from Medicare law, section 1861(jj), states that the term screening mammography means a radiologic procedure provided to a woman for the purposes of early detection of breast cancer and includes a physician's interpretation of the result of the procedure. The term radiologic procedure is not defined in the statute or any Medicare regulations. Instead, it is defined in the Public Health Service Act and FDA regulations pursuant to that Act, which limit the term to the standard mammogram: not PET scan, not CT, not MRI, and not ultrasound, among others. There is nothing, then, in the Medicare statute or regulations that would prevent inclusion of a much broader range of imaging technologies under the current statutory authority for paying for screening mammography if the FDA changed the definition of a screening mammogram as embedded in FDA regulations defining a radiologic procedure. Otherwise, it would probably require a statutory change to have any new breast cancer screening techniques paid for beyond the standard mammogram. At least, based on my survey of relevant staff at CMS, this seems to be the correct interpretation.

As I said, the definitions of screening and diagnosis are important in determining Medicare payment. Diagnosis means a test that is done in the presence of signs or symptoms of disease. In other words, if there is an abnormal finding on a mammogram, any technology used to evaluate that abnormal finding is considered diagnostic and a coverable benefit. Medicare may refuse to pay for a diagnostic technology or procedure based on a decision that it is not reasonable and necessary, but at least it is a coverable benefit, unlike the screening procedure in a healthy woman that discovered the abnormal finding (unless that screening procedure had been added by specific statute).

For example, a very strong family history does not qualify any test as diagnostic. A woman could have the strongest possible family history of cancer, yet even what could be considered a diagnostic study, performed in the absence of signs or symptoms of disease or a personal history of cancer, would be considered screening and would not be coverable by Medicare (again, unless that kind of study had been specifically added to coverage by special statute).
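The diagnosis-versus-screening rule described above can be summarized as a small decision function. This is a simplified, hypothetical encoding for illustration, not legal guidance: the class and field names are ours, treating a personal history of cancer as taking a study out of the screening category is our reading of the talk, and the statutory exceptions (such as screening mammography) appear only as the catch-all "unless specifically added by statute".

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    signs_or_symptoms: bool           # includes an abnormal finding on a prior test
    personal_history_of_cancer: bool
    strong_family_history: bool       # recorded, but does not change the category


def benefit_category(p: Presentation) -> str:
    """Simplified, hypothetical encoding of the diagnosis-versus-screening rule."""
    if p.signs_or_symptoms or p.personal_history_of_cancer:
        return "diagnostic: coverable benefit, still subject to 'reasonable and necessary'"
    return "screening: not coverable unless specifically added by statute"


# A woman with a very strong family history but no signs, symptoms, or cancer history.
print(benefit_category(Presentation(False, False, True)))
```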

Last year, CMS considered adding testing for diabetes as a benefit for patients with risk factors, but no signs or symptoms of the disease, and explored a regulatory change to achieve this. We discovered we could not add this benefit without a statutory change, and so diabetes screening was included in the MMA.

I suppose that, in theory, Medicare might change the rules to consider interventions to discover disease in a particularly high-risk situation as diagnostic (and reimbursable), but such a rulemaking process would not necessarily be a more efficient, faster process than a legislative change.

Unlike mammography, the statutory benefit for colorectal cancer screening provides that lab-based fecal occult blood testing and other screening tests as determined by the Secretary in consultation with experts are covered. Therefore, if additional new technologies, virtual colonoscopy, for example, met the standard of reasonable and necessary, they would be potentially coverable under the statutory authority for colorectal cancer screening. But, unfortunately, to emphasize what I have said, the mammography screening benefit's statutory language does not allow Medicare to add other breast cancer screening technologies.

I have talked a little about the reasonable and necessary concept, which is the subject of technology assessment and evaluating medical benefits. The Medicare statute provides payment only for things that are reasonable and necessary for diagnosis and treatment of illness and injury. That term is not further defined in any official legal documents. CMS, however, uses a standard definition, which is that there has to be adequate evidence to conclude that the item or service improves net health outcomes experienced by patients (such as improved function, quality of life, morbidity, or mortality), is generalizable to the Medicare population, and is as good as or better than currently covered alternatives.

CMS uses a standard evidence-based-medicine framework, relying on the usual rules of evidence, no different from those of the U.S. Preventive Services Task Force. No formal economic analysis is done as part of a reasonable and necessary determination. Although there is no legal prohibition against considering costs, it is longstanding Medicare practice not to do so when making coverage decisions. A technology or procedure could cost $5,000, $50,000, or $500,000 per life year saved; in any of these instances it would meet the test of reasonable and necessary if it improved health outcomes.
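The three conditions in the CMS working definition, and the deliberate exclusion of cost, can be expressed as a simple checklist. This is a hypothetical sketch for illustration only: the field names are ours, and a real determination is a structured review of the evidence, not a set of yes/no flags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceSummary:
    improves_net_health_outcomes: bool   # e.g. function, quality of life, morbidity, mortality
    generalizable_to_medicare: bool
    as_good_or_better_than_alternatives: bool
    cost_per_life_year: Optional[float] = None  # recorded, but not used in the decision


def reasonable_and_necessary(ev: EvidenceSummary) -> bool:
    # All three evidentiary conditions must hold; cost plays no role.
    return (ev.improves_net_health_outcomes
            and ev.generalizable_to_medicare
            and ev.as_good_or_better_than_alternatives)


for cost in (5_000, 50_000, 500_000):
    ev = EvidenceSummary(True, True, True, cost_per_life_year=cost)
    print(f"${cost:,} per life year saved -> {reasonable_and_necessary(ev)}")
```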

Coverage decisions can be made at the local and/or national level. Many people think that reasonable and necessary reviews at the local level apply a somewhat lower evidence standard of proof and may rely more on expert opinion. The usual view, therefore, is that, in terms of introducing new technologies, bringing those in through the local contractor process is less burdensome than coming in for a national coverage decision.

The national coverage process is diagrammed in Figure 2.14. It involves a formally defined series of steps for submitting a request for coverage, so that an evidence review can be referred to an outside advisory committee or a formal technology assessment carried out.

FIGURE 2.14. The national process for determining Medicare payment coverage.

In reasonable and necessary decisions for diagnostic tests (diagnostic mammography, for example), CMS looks for studies of test performance (classic sensitivity and specificity) and of the test's impact on patient management and outcomes. That impact may depend on whether a beneficial intervention is available; alternatively, CMS asks whether the information itself provides a benefit. Certainty in diagnosis may be useful by itself, or information about diagnosis or prognosis may influence decisions about the use of other health care services, institutionalization, or other care. If there is empirical evidence that knowledge of the diagnosis influences the care or quality of life of the patient, that would certainly be a potential basis for meeting the reasonable and necessary standard.

In some situations, however, such as PET scanning for Alzheimer's disease, where treatments are relatively ineffective and not particularly toxic, it is preferable simply to go ahead and treat based on the clinical story rather than risk withholding treatment because of a possible false-negative scan. In this situation, therefore, the test does not actually improve a health outcome.

Sometimes a CMS coverage review results in a split decision, such as the coverage decision on PET for breast cancer. PET is covered for diagnosis but not for screening, consistent with the usual rules on benefit categories I discussed earlier, and it is covered as an adjunct to standard staging of loco-regional or distant recurrence and for monitoring response to therapy. It is not covered for evaluating abnormal mammograms or palpable breast masses, or for evaluating axillary lymph nodes to decide on lymph node dissection.

Assuming that CMS could make reasonable and necessary decisions about screening and early detection of breast cancer, we still need to know who is doing the evidentiary studies, what the quality of those studies is, and what methods are required to compare sequences of multiple studies. Handling these questions does not appear to be part of anybody's research agenda. As the IOM report points out, funding for applied clinical research (research on how to use technologies once they are developed) is really nobody's domain. This is what I am calling the systematic gap in evidence from applied clinical research.

Decision makers are interested in using high-quality evidence to support clinical and health policy choices, but the quality of available evidence is inadequate (Tunis et al., 2003). The concept of practical, or pragmatic, clinical trials has been discussed since the original discussion of pragmatic attitudes in therapeutic trials in 1967 (Schwartz and Lellouch, 1967). I am talking about the gap in studies whose hypothesis and design are developed specifically to answer the questions faced by decision makers. Such trials select clinically relevant alternative interventions to compare; include a diverse study population with broad patient eligibility, recruited from heterogeneous “real world” practice settings; allow natural variation and are minimally intrusive on care; and collect data on a broad range of health outcomes, both functional and economic (Tunis et al., 2003). I emphasize that the key is clinical research designed to answer the questions of decision makers such as patients, clinicians, payors, and purchasers. That research looks very different from clinical research designed to answer fundamental questions about etiology, causation, and the like.

Because CMS is a major consumer of the information needed to make decisions about payment and coverage, the agency, under Dr. McClellan, is very interested in finding ways to get into this business. Through Section 1013 of the MMA, we are working in a number of ways to facilitate support and infrastructure for practical clinical research, that is, research on comparative effectiveness. We have new collaborations with NCI, NHLBI, and FDA focused on this issue, exploring alternatives or modifications to clinical trials, registries, and quasi-experimental studies, among other approaches, to improve the evidence base. We are also very interested in exploring coverage under protocol, that is, paying for emerging and promising technologies, but only in the context of a clinical trial or some other clinical study with systematic collection of evidence.

I would like to conclude with some general comments on health care costs. We all know that we are spending a lot of money on health care, so it is important that we know the risks and benefits of the interventions we use. Although I said there is no explicit consideration of costs in Medicare coverage and payment, cost is the universal context of health care decision making. In the IOM report, as in any other discussion, we must think about the economic implications of new or added technologies, their additional clinical or social value, and how that value relates to the investment. There are many other unmet societal needs for health care services that should be balanced against additional spending on improvements in breast cancer technology or on increased payment to improve the quality of mammography.

DR. PENHOET: We have time now for questions and answers from our last three speakers.

DR. DUNNICK: What is the time frame for the suggestion to test technologies in practical clinical trials, given that we cannot test them with classical randomized controlled trials without waiting 10 or 15 years for results?

DR. TUNIS: One of the limitations of the evidence-based framework for coverage and payment policy is the long time it can take to prove that something actually works, time during which introduction into practice should have occurred. The answer to getting relatively quickly from proof of principle at the bench to early experience at the bedside is to have a framework, as I have described, in which a subset of technologies that look promising could be reimbursed in the context of defined protocols in order to find out which actually work and which do not. All too often at present, as technologies are first introduced into practice, there is uncoordinated trial and error that produces no information about whether they are of value, and that wastes a lot of money and time.

DR. ESSERMAN: Our report states that the organization of screening mammography and breast cancer care is important and could make for greater cost-effectiveness and better quality. But there is no funding or infrastructure to support that. I'd like to hear you comment on that.

DR. TUNIS: I think we are actually at an inflection point in terms of payors thinking more about how payment policy might promote the efficient organization of care. Historically, Medicare has been a resource-based payor, that is, it pays for resource consumption. In that context, there is no place for discussing how to pay for delivering a service in a high-quality way. Now, however, there may be an opportunity to make a case to both private payors and CMS for payment for efficient models of care. We can look at our regulatory and statutory authority and find out how we might facilitate that. Medicare now has a new Section 721, which allows a new payment mechanism for coordinated care for chronic illness, moving away from paying for resource consumption. If you can do it better and cheaper, we will find a way to reward that financially.

DR. ESSERMAN: In the committee, we considered whether proteomics would have to go head-to-head with mammography, that is, would have to have equivalent sensitivity and specificity. But that could hamper development of inexpensive and easy tests that have very high sensitivity and low specificity for use in a primary-secondary screen system. That is partly an FDA issue and partly a CMS issue; but what would enable or support this kind of integrated approach to combining tests and looking at more clever ways of harnessing technology?

DR. TUNIS: The standard framework for evaluating new technologies does not easily apply to such staging of tests. I don't have the answer to how to do that. If you came up with a framework by which that kind of question could be answered, we payors would have to look at it, but at the moment we have rather simple-minded evidence standards that do not apply very well.
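Dr. Esserman's question concerns pairing an inexpensive, highly sensitive but nonspecific primary test with a secondary test. The sketch below shows the standard arithmetic for two tests applied in series: only women positive on the first test receive the second, and a result counts as positive only if both tests are positive. It assumes, as a simplification that real tests may violate, that the two tests err independently given true disease status; the function name and all accuracy values are hypothetical and do not come from the report.

```python
"""Combined accuracy of a two-stage (serial) screening strategy.

Hypothetical sketch: assumes the two tests' errors are conditionally
independent given true disease status, which real tests may violate.
"""


def serial_accuracy(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when only stage-1 positives
    receive the stage-2 test and a case is called positive only if
    both stages are positive."""
    # A true case is detected only if both tests detect it.
    sensitivity = se1 * se2
    # A non-case is cleared if either test is negative.
    specificity = 1 - (1 - sp1) * (1 - sp2)
    return sensitivity, specificity


# Illustrative (made-up) values: a cheap, very sensitive but nonspecific
# primary test followed by a more specific secondary test.
se, sp = serial_accuracy(se1=0.98, sp1=0.50, se2=0.85, sp2=0.90)
print(f"combined sensitivity = {se:.3f}, specificity = {sp:.3f}")
```

Under these assumptions, combined sensitivity can never exceed that of either test, while combined specificity is at least as high as either test's, which is why the primary test in such a system would need very high sensitivity.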

DR. SMITH: Dr. Fletcher, I thought you addressed the issue of risk stratification very nicely in terms of placing the Gail model in context and applying it to identify a population for study. The problem is that short-term risk is deceptively small and long-term risk is deceptively large, which makes it very hard for women to get a handle on how to think mathematically about the risk of a disease that means a lot to them.

As risk grows over time, screening needs to be thought of in terms of social insurance. Most women will not develop breast cancer in their lifetime, but screening can measurably reduce the risk of being diagnosed with advanced breast cancer that could result in a premature death. With respect to all-cause mortality, ultimately only about three and a half percent of women die of breast cancer, but that is not nearly as important as the contribution that breast cancer makes to premature mortality. In any given year, deaths of women diagnosed with breast cancer in their forties account for about 16 or 17 percent of all breast cancer deaths.

We looked at the two-county trial to estimate the effect of a reduction of risk of dying of breast cancer on all-cause mortality. As you said, women in their forties have a very low risk of dying of anything. But among the causes of death breast cancer is quite a significant factor. The breast cancer mortality reduction we saw in that group meant a reduction in risk of dying of anything by about 50 percent.

I am wondering how you would reconcile the near-term risk versus the long-term risk against the background of something simple to do that can ensure, even though you have a low probability of dying of anything, that you have significantly reduced your risk of dying of breast cancer by participating in screening.

DR. FLETCHER: I think you are illustrating why this is such a complicated, almost counterintuitive area. The prevention paradox is that for the vast majority of people undertaking an intervention, for example, screening for breast cancer, there is no benefit. Yet, for the women who do benefit, it may be quite substantial.

As you said, in the younger age groups we are talking about 10 to 15 percent of deaths caused by breast cancer. I think what I showed reaches the same conclusion as what you said. Stratification of risk is going to be tougher than I, at least, had previously thought. We need risk groups whose risks are quite a bit more substantial than most of the risks we have identified so far. Furthermore, whatever we come up with in terms of a new model incorporating brand-new technologies, we must validate it not only for calibration but also for its ability to discriminate among women if it is to be used for risk stratification.
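Dr. Smith's point that short-term risk is deceptively small while long-term risk is deceptively large follows from how annual risks compound. The minimal sketch below computes cumulative risk as one minus the product of the annual probabilities of remaining event-free. It assumes hypothetical annual risks that rise linearly with age and ignores death from other causes; both are simplifications for illustration, not figures from the report, and the function name is hypothetical.

```python
"""Why short-term risk looks small and long-term risk looks large.

Hypothetical sketch: the annual risks below are illustrative only, and
competing causes of death are ignored.
"""


def cumulative_risk(annual_risks: list[float]) -> float:
    """Probability of at least one event over consecutive years,
    assuming each year's risk applies to those still event-free."""
    event_free = 1.0
    for r in annual_risks:
        event_free *= 1 - r  # remain event-free through this year
    return 1 - event_free


# Illustrative annual risk rising linearly from 0.1% to 0.4% per year over 40 years.
annual = [0.001 + 0.003 * year / 39 for year in range(40)]
print(f"1-year risk:  {annual[0]:.2%}")
print(f"5-year risk:  {cumulative_risk(annual[:5]):.2%}")
print(f"40-year risk: {cumulative_risk(annual):.2%}")
```

Printing the one-year, five-year, and forty-year figures side by side shows how a per-year risk that looks negligible can accumulate into a substantial long-term risk.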

DR. PETITTI, Kaiser Permanente: The promise of an ultra-low breast cancer risk group is demonstrated by the study of Cummings and colleagues showing that at some age there is a serum marker (serum estradiol level), which is not based on proteomics, that identifies a group of women who have a very low risk of developing breast cancer over some reasonable time frame, say 4 years (Cummings et al., 2002). When you think about it, the ability to decide when a woman has the risk of a 70-year-old man and does not need mammography is very important. There are analogies in other fields of cancer screening: we are now finding that a 55-year-old woman who is human papillomavirus negative might not need a Pap smear every year. There are also analogies from the cardiovascular field, where someone with a low-density lipoprotein level of 80 and a high-density lipoprotein level of 100 probably would not be a candidate for a screening test for early cardiovascular disease. So I think the ultra-low-risk group is as important as the high-risk group.

DR. WARRICK: Are women at low risk who perceive their risk as high disproportionately using mammography capacity? If so, does this explain why 60 percent of women report to the Behavioral Risk Factor Surveillance System having had mammograms, while only a little more than 30 percent of Medicare-eligible beneficiaries are screened? Should the Breast and Cervical Cancer Early Detection Program be modified based on these findings?

DR. FLETCHER: As mammography utilization increased in this country, for a long time women in their 40s utilized it more than women over 50. I think now, the women over 50 have a slightly higher percentage utilization. And the women over 65 do not utilize it nearly as much.

DR. DUNNICK: Women at higher risk do have mammography more frequently, but 30 percent of them still do not have periodic mammography. This difference is present in women aged 41 to 49, but not in women over 50, which is interesting.

I wanted to ask Dr. Hanash about the exciting material he presented: when can we expect a fusion between basic bench science and a clinically usable test, given the problems of looking at probes for breast cancer and the problems we see with PET scanning and with BRCA-positive women?

DR. HANASH: Aside from funding problems, there is a major issue with respect to validation of interesting markers. The initial discovery work is usually done with very contrasting groups, those with overt disease and those who are completely normal, and in the disease group something promising shows up. But if our goal is early detection, the disease group is not representative of an early detection population. Having access to samples from an appropriate early detection population could be highly informative, but those types of samples are scarce and access to them is limited.

For example, we found a few potential lung cancer markers and asked if we could have access to samples taken over a period of time from subjects who were later diagnosed with lung cancer, to see whether our markers worked to identify early disease. We were asked for evidence that our markers were effective for early detection. We answered that we did not have such evidence; we hoped testing the samples would provide it. The people controlling the samples would not allow them to be used without some data on our markers' effectiveness in early detection. So, although we ultimately did get some access, we were initially in a catch-22 and had to do our discovery work with samples from study subjects who were not the most appropriate.

For prospective validation of markers for screening, it is obviously impractical to embark on a validation study in a very-low-risk population. There, I think, you would have to consider strategies demonstrating that a marker builds on existing tests by improving their sensitivity and specificity. It would expedite things if you could demonstrate early on that your panel of markers improved the sensitivity and specificity of CT-based screening for lung cancer or mammography-based screening for breast cancer.

DR. FLETCHER: Randomized trials were mentioned as a way of validating new technologies; such trials, especially for prevention, take decades to complete. But relatively simple evaluations that do not take so long and do not require randomized trials can be carried out for new screening technologies. For example, for any new screening test, it is important to determine the test characteristics, sensitivity and specificity. Sometimes, even when sensitivity and specificity are determined for a new test, the evaluation is carried out in a diagnostic situation, on patients who are symptomatic and/or have an abnormality. Assuming that the results of such an evaluation will generalize to a screening situation is dangerous. It may be reasonable to start by evaluating a new test in a diagnostic situation, in which you quickly know who has cancer and who does not, to learn how the test performs. But the test's accuracy must then be evaluated in a screening situation, because the spectrum of cancers is likely to be very different in that group. Too often, this kind of evaluation is not done systematically for newer screening technologies. I do not want us to think that we cannot know anything without a randomized trial. Systematic evaluation of the accuracy of a screening test does not require a randomized trial.
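Dr. Fletcher's point that sensitivity and specificity measured in a diagnostic setting may not carry over to screening can be illustrated with two 2x2 tables. The counts below are invented for illustration, and the class and variable names are hypothetical; the sketch only shows how the same formulas, applied to a symptomatic group and to an asymptomatic screening group with a different spectrum of cancers, can yield very different accuracy estimates.

```python
"""Test accuracy from a 2x2 table, in two clinical settings.

Hypothetical sketch: the counts are invented to illustrate spectrum
effects; they are not data from any study.
"""
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    true_pos: int   # cancer present, test positive
    false_neg: int  # cancer present, test negative
    false_pos: int  # cancer absent, test positive
    true_neg: int   # cancer absent, test negative

    @property
    def sensitivity(self) -> float:
        # Fraction of women with cancer whom the test flags.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Fraction of women without cancer whom the test clears.
        return self.true_neg / (self.true_neg + self.false_pos)


# Diagnostic setting: symptomatic women, larger and more obvious cancers.
diagnostic = TwoByTwo(true_pos=90, false_neg=10, false_pos=20, true_neg=80)
# Screening setting: asymptomatic women, smaller cancers that are harder to detect.
screening = TwoByTwo(true_pos=30, false_neg=20, false_pos=400, true_neg=9550)

for name, table in [("diagnostic", diagnostic), ("screening", screening)]:
    print(f"{name}: sensitivity={table.sensitivity:.2f}, specificity={table.specificity:.3f}")
```

Sensitivity is computed only among women with cancer and specificity only among women without it, so each estimate depends on the mix of cases in the evaluated group as well as on the test itself.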

Finally, I just want to remind everybody that randomized trials sometimes end up with rather unexpected results. The Women's Health Initiative (WHI) leaps to mind. Here we thought women my age were supposed to be on long-term hormone replacement therapy to prevent several important chronic diseases, and all of a sudden not only the WHI but also the Heart and Estrogen/Progestin Replacement Study and the Million Women Study are giving the lie to that conclusion. I was on the Board of Scientific Advisors at NCI, and there was concern about the high cost of the WHI. In retrospect, that cost was nothing compared with the billions of dollars women spend every year on a prevention therapy that we now see in an entirely different light. This teaches us to persist with randomized trials every once in a while, even if they are expensive.

DR. ESSERMAN: I wonder whether there is a benefit to making many of these test sets public. Is something like the NCI's Early Detection Research Network going to facilitate making the data available not just to the researchers but to the scientific community in general? Will that approach to keeping track of and sharing data early on accelerate discovery and deployment?

DR. HANASH: There is a notion that one gene, one protein, or one marker may not be enough; hundreds of them together might be required to be informative. If this is the case, having all of the data in the public domain would obviously be extremely useful, so that others could mine the same data set using different software tools and different statistical approaches.

Others have suggested that patterns that are not understood could emerge and that these patterns will represent the diagnosis for various cancers. I hesitate to recommend that we rely on patterns that are not understood for a diagnosis. I think the resources and technology are available to decipher unknown features and patterns and link them back to the disease process, so that if a pattern does not look very plausible early on, we know there is a problem.

DR. NORTON, Memorial Sloan-Kettering Cancer Center: I would second that. Whole books have been written about how misleading blind looking at patterns can be and how not understanding the mechanism can lead you far astray in applications of technology developed in one area to other areas.

I want to emphasize that one of the important things about IOM reports is that they can influence policy makers. This report contains recommendations that I think should be publicized and acted on. One of them is the notion that breast cancer screening, when applied in an organized fashion in other countries, has clearly been shown to reduce mortality. The British epidemiologist Richard Peto has shown elegantly over time that as you introduce existing technology and do it in the proper fashion, you see a reduction in mortality. When you deal with a country like ours, where screening is not well reimbursed, you see the response. It is a regulatory issue and a statutory issue. The fact is, we know that people are dying because of the misapplication or lack of application of the technology. This conceptualizes a very important area that we have to address: how do we effect societal change, which means governmental change as well?

I also think that the data needed to answer many of our questions probably already exist, or at least the samples do. But we haven't heard about HIPAA, the Health Insurance Portability and Accountability Act, which prevents us from doing much of the retrospective work needed to correlate data from samples in serum banks with outcomes and so address some of the important questions. There, too, we have a barrier standing between us and the ability to solve problems.

DR. FLETCHER: From the perspective of an epidemiologist, we are running more and more into trouble with HIPAA regulations, too. I certainly hope the research community is going to be able to work to correct some of those.

DR. HANASH: The research community has a vested interest in this, so the push has to come from a third party rather than from researchers making a plea to the regulatory agency. Consumers and the public have to participate.

Footnotes

1

The Apex Foundation support was given in memory of Mabel Frost McCaw and Joan Morgan, and in honor of Sallie Nichols, Beth Weibling, Jane Carson Williams, Bonnie Main, and Amy McGraw.

2

Mean time for screen-film mammography reading was 79.5 seconds (range, 15.8-444.8 seconds) and for digital mammography reading was 159.16 seconds (range, 23.0-587.1 seconds) in a study at the Department of Radiology, Michigan State University, by Aben GR, Bryson HA, and Bryson TC, presented at the International Workshop in Digital Mammography, Chapel Hill, NC (personal communication by Etta Pisano, 2004).

Copyright © 2005, National Academy of Sciences.
Bookshelf ID: NBK83873
