NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Institute of Medicine (US); National Research Council (US); Pignone M, Russell L, Wagner J, editors. Economic Models of Colorectal Cancer Screening in Average-Risk Adults: Workshop Summary. Washington (DC): National Academies Press (US); 2005.


Economic Models of Colorectal Cancer Screening in Average-Risk Adults: Workshop Summary.


MAJOR CHALLENGES TO MODELING THE COST-EFFECTIVENESS OF CRC SCREENING

The workshop benefited from presentations by leading researchers on the current state of knowledge about the natural history of CRC and the effects of screening, follow-up and surveillance. Those presentations took place on the afternoon of the first day and are published in the appendixes to this workshop report. They covered the following topics:

The presentations were intended both to identify the best evidence for improving models and to reveal gaps in knowledge. Together with the collaborative modeling exercise, they prompted workshop participants to confront three major issues that challenge the ability of models to provide useful information to health policy makers.

Uncertainty

Workshop participants spent a good deal of time addressing the uncertainty underlying the costs and effects of colorectal cancer screening. Louise Russell and Michael Pignone argued that pervasive uncertainty makes bottom-line conclusions about the comparative performance of different screening strategies (although not about the overall cost-effectiveness of screening itself) potentially inappropriate, because such conclusions presume a degree of precision that the current state of knowledge cannot support and may never be able to support. Ironically, it is exactly that kind of precise, bottom-line guidance that decision makers seek.

The Extent of the Problem

The workshop presentations underscored how little is known about many aspects of screening or its consequences. Brian Mulhall's review of the wide range of estimates of fecal occult blood test sensitivity and specificity for adenomas in people for whom screening is recommended suggested that uncertainty about test performance is not limited to new, emerging, or uncommon technologies. Fecal occult blood testing, one of the oldest technologies available for CRC screening, has been the focus of several large-scale randomized screening trials, all of which have demonstrated that it can reduce mortality from CRC (Jorgensen et al., 2002; Mandel et al., 1999; Scholefield et al., 2002). However, according to Mulhall, none of those trials has provided definitive evidence on its sensitivity and specificity for adenomas.

The range of uncertainty about the performance of other common screening technologies, such as sigmoidoscopy and colonoscopy, may be narrower, but there is still substantial variation in findings across studies, according to Mulhall. Colonoscopy has long been considered the gold standard for detecting adenomas, with a sensitivity believed to be close to 100 percent, yet in a recent head-to-head comparison with virtual colonoscopy, a new radiological procedure, it missed about 11 percent of advanced adenomas (Pickhardt et al., 2003).
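Sensitivity and specificity are simple ratios from a two-by-two table of test results against a reference standard, so a miss rate of about 11 percent corresponds to a sensitivity of about 0.89. The Python sketch below computes both ratios from hypothetical counts and shows one reason the wide range of single-test estimates matters for a program of repeated tests: under the simplifying assumption (ours, not the workshop's) that results are independent from one round to the next, the chance of flagging a persistent lesion at least once in n rounds is 1 − (1 − s)^n. The function names and all numbers are illustrative only.

```python
"""Test accuracy from a 2x2 table, plus cumulative detection over repeated rounds.

All numbers are hypothetical. The repeated-round formula assumes test results
are independent across rounds, a simplification that real models often relax.
"""


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of people WITH the lesion whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of people WITHOUT the lesion whom the test correctly clears."""
    return true_neg / (true_neg + false_pos)


def detected_over_rounds(per_test_sens: float, rounds: int) -> float:
    """Chance a persistent lesion is flagged at least once in `rounds` tests."""
    return 1.0 - (1.0 - per_test_sens) ** rounds


if __name__ == "__main__":
    # Hypothetical study: 100 people with advanced adenomas, 900 without.
    print("sensitivity:", sensitivity(true_pos=89, false_neg=11))
    print("specificity:", specificity(true_neg=855, false_pos=45))

    # A low and a high single-test sensitivity, carried over three rounds.
    for s in (0.2, 0.5):
        print(f"per-test {s:.1f} -> three rounds {detected_over_rounds(s, 3):.3f}")
```

Because the per-round estimate is raised to a power, a modest disagreement about single-test sensitivity grows into a large disagreement about what a multi-year program achieves.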

Martin Brown emphasized the uncertainty surrounding estimates of CRC treatment cost, which is a major component of the lifetime cost of a screening program. Variation in estimates of this cost, and in its distribution over the years following diagnosis, can produce very large differences among models in estimates of the net cost of screening. A strategy that prevents a large number of cancers is far less costly on net when treatment costs, and thus the savings from early detection, are high than when they are low. Although it might seem easy to make accurate estimates of such costs because billing and claims data are available from health care providers or insurers, prices can vary widely from provider to provider and across payers. Moreover, estimates vary depending on whether they are based on prices charged or on audits of the amount and value of the labor and other inputs required to produce each service. Thus, Brown concluded, a seemingly straightforward element, the cost of treating colorectal cancer, is in practice subject to considerable uncertainty.
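Brown's point can be made concrete with simple arithmetic: the net cost of a program is its screening spending minus the treatment spending it avoids, and the avoided amount depends both on the level of treatment cost and on how that cost is spread over the years after diagnosis (later costs count for less once discounted). The Python sketch below uses made-up figures throughout; the function names and the 3 percent discount rate are assumptions for illustration, not values taken from any of the workshop models.

```python
"""Net cost of screening under low vs. high treatment-cost assumptions.

All dollar figures and counts are hypothetical; the 3 percent discount rate is
an assumption chosen for illustration.
"""


def present_value(costs_by_year, rate=0.03):
    """Discount annual costs, where index 0 is the year of diagnosis."""
    return sum(cost / (1.0 + rate) ** year for year, cost in enumerate(costs_by_year))


def net_screening_cost(program_cost, cancers_prevented, cost_per_case):
    """Screening spending minus the treatment spending it avoids."""
    return program_cost - cancers_prevented * cost_per_case


if __name__ == "__main__":
    program_cost = 10_000_000  # hypothetical lifetime program cost for a cohort
    cancers_prevented = 200    # hypothetical

    # Two hypothetical treatment-cost streams with the same undiscounted total.
    front_loaded = [40_000, 5_000, 5_000, 5_000, 5_000]
    spread_out = [20_000, 10_000, 10_000, 10_000, 10_000]
    for label, stream in (("front-loaded", front_loaded), ("spread out", spread_out)):
        per_case = present_value(stream)
        net = net_screening_cost(program_cost, cancers_prevented, per_case)
        print(f"{label}: per case {per_case:,.0f}, net program cost {net:,.0f}")

    # Low vs. high per-case treatment cost (already in present-value terms).
    for per_case in (20_000, 60_000):
        print(per_case, net_screening_cost(program_cost, cancers_prevented, per_case))
```

With the low treatment cost the hypothetical program has a positive net cost; with the high one it saves money, although nothing about the screening itself has changed.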

In her discussion of the current evidence on compliance with CRC screening, Sally Vernon emphasized the uncertainty about compliance in a screening program that continues over a patient's lifetime. Though much is known about factors that affect patients' adherence to screening, notably insurance coverage, education, and physician recommendations, survey evidence is insufficient to provide accurate estimates of the levels of adherence to periodic screening that could be expected over the long term. William Lawrence observed that we do not know whether compliance rates estimated from one-time consumer surveys represent a combination of some patients who fully adhere to a lifelong screening program and others who never receive any screening, or a more homogeneous population, all of whom adhere to a screening strategy intermittently. In Lawrence's view, these distinctions have important implications for the effectiveness and cost of screening programs.
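Lawrence's distinction can be illustrated with a few lines of arithmetic. The Python sketch below assumes a hypothetical 50 percent compliance rate per screening round and compares two patterns that would both produce that rate in a one-time survey: half the population always screening and half never screening, versus everyone screening independently with probability 0.5 each round. The function names and numbers are illustrative assumptions.

```python
"""Same one-time compliance rate, two very different lifetime patterns.

Illustrates Lawrence's distinction with hypothetical numbers: a 50 percent
compliance rate per screening round can arise from (a) half the population
always screening and half never screening, or (b) everyone screening
independently with probability 0.5 at each round.
"""


def ever_screened_all_or_none(rate: float, rounds: int) -> float:
    # Only the fixed "always" group is ever screened, however many rounds pass.
    return rate


def ever_screened_intermittent(rate: float, rounds: int) -> float:
    # Everyone has the same chance each round; assumes rounds are independent.
    return 1.0 - (1.0 - rate) ** rounds


def expected_tests_per_person(rate: float, rounds: int) -> float:
    # Identical under both patterns, so test volume alone cannot tell them apart.
    return rate * rounds


if __name__ == "__main__":
    rate, rounds = 0.5, 5
    print("tests per person (both patterns):", expected_tests_per_person(rate, rounds))
    print("ever screened, all-or-none:", ever_screened_all_or_none(rate, rounds))
    print("ever screened, intermittent:", ever_screened_intermittent(rate, rounds))
```

The two patterns imply the same screening volume, and hence similar screening costs, but very different shares of the population ever reached by screening, which is why the distinction matters for both cost and effectiveness.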

Uncertainty about compliance is also high because surveys define compliance differently. Sally Vernon observed that surveys that measure the number of patients who receive fecal occult blood test kits from their physicians typically report high compliance, whereas those that measure the number of test kits returned for analysis show much lower rates.

Uncertainty about the natural history of colorectal cancer in average-risk individuals was also a topic of discussion throughout the workshop. T.R. Levin summarized the evidence on several important aspects of that history. His review of one aspect—the proportion of CRCs that arise de novo, without spending time as a pre-cancerous adenoma—illustrates how difficult it can be to resolve uncertainty. Cancers arising from fast-progressing adenomas or from adenomas that are difficult to detect with even the most sensitive screening tests could be mislabeled as de novo. The difference could be important for comparing strategies because, for example, underestimating the proportion of cancers that arise de novo could favor screening technologies that have high sensitivity for adenomas over those that are better at diagnosing early cancer. Brian Mulhall observed that the emergence of molecular assays in the near future may offer new opportunities for definitive research on the question of de novo cancer.
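The way the assumed share of de novo cancers can reorder strategies is easy to see with a deliberately crude index. In the Python sketch below, two hypothetical strategies are compared: "A" detects adenomas well but early cancers less well, and "B" is the reverse. The benefit index weights each sensitivity by the share of cancers that pass through an adenoma stage or arise de novo; the strategy names, sensitivities, and the index itself are assumptions made for illustration, not outputs of any workshop model.

```python
"""How the assumed share of de novo cancers can reorder two strategies.

Hypothetical strategies: A detects adenomas well but early cancers less well;
B is the reverse. The "benefit index" is a deliberately crude weighted average
of the two sensitivities, not an outcome any published model reports.
"""

STRATEGIES = {
    # name: (sensitivity for adenomas, sensitivity for early cancer), hypothetical
    "A": (0.90, 0.40),
    "B": (0.70, 0.95),
}


def benefit_index(adenoma_sens: float, cancer_sens: float, share_de_novo: float) -> float:
    # Cancers that pass through an adenoma stage can be intercepted as adenomas;
    # de novo cancers can only be caught as early cancers.
    return (1.0 - share_de_novo) * adenoma_sens + share_de_novo * cancer_sens


def preferred(share_de_novo: float) -> str:
    return max(STRATEGIES, key=lambda name: benefit_index(*STRATEGIES[name], share_de_novo))


if __name__ == "__main__":
    for share in (0.1, 0.4):
        scores = {n: round(benefit_index(*s, share), 3) for n, s in STRATEGIES.items()}
        print(f"share de novo {share:.1f}: {scores} -> prefer {preferred(share)}")
```

With a low assumed de novo share the adenoma-sensitive strategy ranks first; with a higher share the ranking flips, which is the bias Levin's point warns about.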

How to Think About Uncertainty

A prerequisite to dealing with the effects of uncertainty is to recognize that it comes in different forms. Rocky Feuer offered a three-level classification. The first type, which he referred to as stochastic variation, arises from inherent randomness in disease processes and human behavior across the members of a population. Put simply, not everyone's disease follows the same course, and screening and treatment do not have the same effects from person to person. Stochastic variation by itself is straightforward to handle: modelers simply specify the known distribution of values for an input parameter, and statistical confidence intervals for model outcomes can then be generated through analytic or simulation techniques. Feuer pointed out that the uncertainty discussed by workshop speakers and participants does not fall into this category.
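As a minimal sketch of what "analytic or simulation techniques" can mean when the input distribution is known, the Python code below assumes a hypothetical cohort of 2,000 people, each with a known 5 percent chance of developing CRC, and computes a 95 percent interval for the number of cases two ways: by Monte Carlo simulation and by the normal approximation to the binomial. The cohort size, risk, and function names are illustrative assumptions.

```python
"""Stochastic (first-order) variation when the input distribution is known.

A hypothetical cohort of 2,000 people, each with a known 5 percent chance of
developing CRC. The 95 percent interval for the number of cases is computed
two ways: by Monte Carlo simulation and by the normal approximation to the
binomial. The risk and cohort size are made up for illustration.
"""
import math
import random


def simulate_cases(n: int, risk: float, rng: random.Random) -> int:
    # One simulated cohort: each person independently develops CRC with `risk`.
    return sum(1 for _ in range(n) if rng.random() < risk)


def simulation_interval(n: int, risk: float, reps: int = 1_000, seed: int = 1):
    rng = random.Random(seed)
    draws = sorted(simulate_cases(n, risk, rng) for _ in range(reps))
    # Empirical 2.5th and 97.5th percentiles of the simulated case counts.
    return draws[int(0.025 * reps)], draws[int(0.975 * reps) - 1]


def normal_interval(n: int, risk: float):
    mean = n * risk
    sd = math.sqrt(n * risk * (1.0 - risk))
    return mean - 1.96 * sd, mean + 1.96 * sd


if __name__ == "__main__":
    print("simulation:", simulation_interval(2_000, 0.05))
    print("analytic:  ", tuple(round(x, 1) for x in normal_interval(2_000, 0.05)))
```

Both approaches give similar intervals here because the risk is known exactly; the harder problem, taken up next, arises when the risk itself is uncertain.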

The second level, parameter uncertainty, refers to the far more common situation in which the true values of parameters such as test sensitivity or the cost of treating CRC are not well understood. Estimates of the population distributions of model inputs, drawn from medical and epidemiological research, are the “assumptions” on which models are built. Feuer explained that sensitivity analysis is the most appropriate approach to this kind of uncertainty. In sensitivity analysis, modelers let assumptions vary across a range of likely values and report the resulting range of costs and effects. When many parameters are uncertain, as they are in the case of colorectal cancer screening, experts recommend probabilistic cost-effectiveness analysis to generate a bottom-line “confidence interval” or “credible interval” of cost-effectiveness ratios (Briggs et al., 2002; Gold et al., 1996). Such an interval allows users to understand the simultaneous effect of uncertainty about many parameters on the range of cost-effectiveness ratios that result.
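A probabilistic analysis of this kind can be sketched in a few lines. The Python code below applies it to a toy screening-versus-no-screening model: it draws several uncertain inputs at once, computes a cost per life-year gained for each draw, and reports the median and a 95 percent interval of the simulated ratios. The model formula, parameter names, ranges, and fixed values are all hypothetical, and uniform distributions are used only to keep the sketch short; published analyses usually fit beta, gamma, or lognormal distributions to the evidence.

```python
"""Probabilistic sensitivity analysis (PSA) on a toy screening model.

Every parameter range and model formula below is hypothetical. Uniform
distributions keep the sketch short; published PSAs usually use beta, gamma,
or lognormal distributions fitted to the evidence.
"""
import random
import statistics

# Hypothetical uncertain inputs: (low, high) bounds of a uniform distribution.
RANGES = {
    "sensitivity": (0.60, 0.90),             # share of participants' cancers prevented
    "screen_cost": (2_500.0, 3_500.0),       # lifetime screening cost per participant ($)
    "treatment_cost": (15_000.0, 75_000.0),  # treatment cost per cancer avoided ($)
}


def cost_per_life_year(sensitivity, screen_cost, treatment_cost,
                       cohort=1_000, compliance=0.6, cancers=50, ly_per_case=5.0):
    """Toy ICER for screening vs. no screening in a cohort (fixed values hypothetical)."""
    prevented = cancers * compliance * sensitivity
    life_years = prevented * ly_per_case
    net_cost = cohort * compliance * screen_cost - prevented * treatment_cost
    # Effects are always positive here, so a negative ratio means screening saves money.
    return net_cost / life_years


def psa(ranges, draws=5_000, seed=7):
    rng = random.Random(seed)
    ratios = sorted(
        cost_per_life_year(**{k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()})
        for _ in range(draws)
    )
    # 95 percent interval and median of the simulated ratios.
    return ratios[int(0.025 * draws)], statistics.median(ratios), ratios[int(0.975 * draws) - 1]


if __name__ == "__main__":
    lo, mid, hi = psa(RANGES)
    print(f"median ${mid:,.0f}/life-year; 95% interval ${lo:,.0f} to ${hi:,.0f}")
```

Reporting the interval rather than a single base-case ratio is the point of the approach; with several inputs uncertain at once, the interval can be wide even when each range looks modest.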

The third level of uncertainty is structural, according to Feuer. In that case, researchers may have little evidence about the relationships and interactions among key parameters, so they may model those relationships in different, perhaps even arbitrary, ways. The debate over whether some cancers arise de novo or from pre-existing adenomas is an example of structural uncertainty. To deal with this unknown, some modelers have assumed that such lesions are simply very fast-moving adenomas, while others have assumed that they can never be detected as adenomas. Other examples of structural uncertainty are the choices about whether to include the consequences of detecting non-adenomatous polyps, the cost of a patient's time spent on screening, or the effects of both screening and colorectal cancer on individuals' quality of life. In Feuer's view, the five models highlighted in this workshop represent five different approaches to resolving structural uncertainty.
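The two structural treatments of de novo cancers described above can be run side by side on identical inputs. In the Python sketch below, the structure "fast_adenoma" treats de novo cancers as adenomas with a one-year dwell time, and "undetectable" assumes they can never be found as adenomas. The catch probability is approximated as min(dwell / interval, 1) × sensitivity, which ignores repeat chances within long windows; the structure names, dwell times, sensitivity, and de novo share are all hypothetical.

```python
"""Two structural treatments of 'de novo' cancers, applied to identical inputs.

Structure "fast_adenoma": de novo cancers are adenomas with a very short
dwell time. Structure "undetectable": they can never be found as adenomas.
All numbers are hypothetical, and the chance of catching a lesion is
approximated as min(dwell / interval, 1) * sensitivity, which ignores repeat
chances within long windows.
"""


def catch_probability(dwell_years: float, interval_years: float, sensitivity: float) -> float:
    # Chance that a scheduled screen falls inside the adenoma window, times the
    # test's sensitivity; assumes lesion onset is uniform relative to the schedule.
    return min(dwell_years / interval_years, 1.0) * sensitivity


def preventable_share(structure: str, share_de_novo: float, interval_years: float,
                      sensitivity: float = 0.9, slow_dwell: float = 10.0,
                      fast_dwell: float = 1.0) -> float:
    slow = catch_probability(slow_dwell, interval_years, sensitivity)
    if structure == "fast_adenoma":
        fast = catch_probability(fast_dwell, interval_years, sensitivity)
    elif structure == "undetectable":
        fast = 0.0
    else:
        raise ValueError(f"unknown structure: {structure}")
    return (1.0 - share_de_novo) * slow + share_de_novo * fast


if __name__ == "__main__":
    for interval in (1.0, 10.0):
        results = {s: round(preventable_share(s, 0.2, interval), 3)
                   for s in ("fast_adenoma", "undetectable")}
        print(f"screening every {interval:.0f} year(s): {results}")
```

In this toy setup the gap between the two structures is large for annual screening and small for a 10-year interval, so the structural choice can change not only the level of benefit but also the comparison between strategies.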

Strategies for Managing Uncertainty

As workshop participants grappled with how best to deal with the effects of parameter and structural uncertainty on the ability of CEA models to produce the answers policy makers want, many ideas surfaced on how to reduce, or at least manage, those effects.

Research strategies. To reduce the uncertainty about important assumptions, many participants called for more primary research, particularly on those factors that account for the greatest variation among models. The areas most frequently cited were costs and compliance. Laura Seefe outlined work that the CDC is conducting with several states to enhance the utilization of CRC screening. That research should provide more evidence on the degree of compliance that can be expected from different screening program designs. Alan Gerling endorsed more studies of the impact of public awareness programs and other recruitment strategies on adherence, along the lines of those currently underway in pilot studies in the United Kingdom. Michael Pignone suggested that research aimed at getting better estimates of the lifetime cost of treating colorectal cancer might do more to resolve differences among models than would research on other parameters.
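One way to look for the parameters that account for the greatest variation is a one-way sensitivity analysis ranked by swing, the calculation behind a tornado chart. The Python sketch below applies it to a toy cost-per-life-year model. The parameter names and ranges are hypothetical: the wide treatment-cost range simply reflects Brown's observation that this cost is very uncertain. The resulting ranking therefore illustrates the method and is not evidence about which real parameter matters most. A wide swing is only a rough guide to where research might pay off; formal value-of-information methods go further.

```python
"""One-way sensitivity analysis ranked by swing (the data behind a tornado chart).

Uses a toy cost-per-life-year model. All ranges are hypothetical; the ranking
they produce illustrates the method and is not evidence about which real
parameter matters most.
"""

# name: (low, base, high), hypothetical values
PARAMS = {
    "sensitivity": (0.60, 0.75, 0.90),
    "screen_cost": (2_500.0, 3_000.0, 3_500.0),
    "treatment_cost": (15_000.0, 40_000.0, 75_000.0),
}


def cost_per_life_year(sensitivity, screen_cost, treatment_cost,
                       cohort=1_000, compliance=0.6, cancers=50, ly_per_case=5.0):
    prevented = cancers * compliance * sensitivity
    life_years = prevented * ly_per_case
    net_cost = cohort * compliance * screen_cost - prevented * treatment_cost
    return net_cost / life_years


def tornado(params):
    base = {name: vals[1] for name, vals in params.items()}
    rows = []
    for name, (low, _, high) in params.items():
        # Vary one input at a time, holding the others at their base values.
        at_low = cost_per_life_year(**{**base, name: low})
        at_high = cost_per_life_year(**{**base, name: high})
        rows.append((name, at_low, at_high, abs(at_high - at_low)))
    # Widest swing first: a rough guide to where better data might narrow results most.
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for name, low, high, swing in tornado(PARAMS):
        print(f"{name:15s} {low:>10,.0f} {high:>10,.0f}  swing {swing:,.0f}")
```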

Another approach mentioned by several workshop participants to help understand the effect of uncertainty is to evaluate model predictions against independent results from well-designed trials of screening programs. The presentations by leaders of the five research teams showed that several have used data from large fecal occult blood test screening trials to evaluate the extent to which their models' predictions of cancer incidence and mortality over time agree with the results found in the trials. The ongoing PLCO trial (Schoen et al., 2003; Gohagan et al., 1995), which is testing sigmoidoscopy screening, will soon provide a new dataset to support validation, according to Robert Schoen. NCI's CISNET program acts as a catalyst for sharing useful databases from NCI-sponsored studies among member research teams, said Rocky Feuer.

Karen Kuntz reminded the workshop participants that the paucity of data on important assumptions often leads model builders to use data from trials to inform their choice of values for critical assumptions, such as the sensitivity and specificity of screening tests. She warned that validating a model with data that were also used in part to build model assumptions does not provide a true test of the validity of model predictions.
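Kuntz's warning can be illustrated with a one-parameter toy model. In the Python sketch below, an unknown effective sensitivity is calibrated by grid search so that the model reproduces the mortality reduction in one hypothetical trial, and the calibrated model is then checked against a second, held-out hypothetical trial. Both "trials", the preventable-share constant, and the function names are invented for illustration.

```python
"""Why fitting a model to a trial is not the same as validating it.

A one-parameter toy model predicts the CRC mortality reduction in a screening
trial from participation and an unknown effective sensitivity. Both "trials"
and all numbers are hypothetical.
"""

PREVENTABLE_SHARE = 0.4  # hypothetical share of CRC deaths screening could avert


def predicted_reduction(participation: float, sensitivity: float) -> float:
    return participation * sensitivity * PREVENTABLE_SHARE


def calibrate(participation: float, observed: float) -> float:
    # Grid search for the sensitivity that best reproduces one trial's result.
    grid = [i / 1000 for i in range(1001)]
    return min(grid, key=lambda s: abs(predicted_reduction(participation, s) - observed))


if __name__ == "__main__":
    trial_a = {"participation": 0.60, "observed": 0.15}  # used to build the model
    trial_b = {"participation": 0.45, "observed": 0.12}  # held out

    sens = calibrate(**trial_a)
    error_a = abs(predicted_reduction(trial_a["participation"], sens) - trial_a["observed"])
    error_b = abs(predicted_reduction(trial_b["participation"], sens) - trial_b["observed"])
    print(f"calibrated sensitivity {sens:.3f}")
    print(f"error on calibration trial {error_a:.4f}  (near zero by construction)")
    print(f"error on independent trial {error_b:.4f}  (the informative test)")
```

A small error on the calibration trial is guaranteed by construction; only the error on the held-out trial says anything about predictive validity.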

Short of evaluating the predictive validity of models with independent data sources, research teams can assess other measures of validity,[14] such as whether a model contains all of the components of cost and effect that one would expect to be important. For example, Reid Ness's presentation of his team's reanalysis of the pre-workshop exercise revealed that leaving out the cost of working up non-adenomatous polyps can have a large effect on model outcomes. Louise Russell noted that an important component of cost omitted from all of the CRC models presented at the workshop is the value of patients' time spent in screening. That omission biases model results toward screening technologies that are time intensive for the patient and against technologies that are fast and convenient. Martin Brown explained that because there are no published empirical studies of this cost component, modelers typically exclude it.
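The bias Russell described can be shown with two hypothetical strategies: one with lower direct costs but more patient time (preparation, procedure, recovery), and one with higher direct costs but little patient time. The Python sketch below ranks them by cost with and without a value placed on that time. The strategy labels, costs, hours, and the $20 hourly value are all assumptions for illustration and do not describe any specific test.

```python
"""How omitting the value of patients' time can reorder strategies by cost.

Two hypothetical strategies: one with lower direct costs but more patient time
(preparation, procedure, recovery), one with higher direct costs but little
patient time. The hourly value of time is also hypothetical.
"""

STRATEGIES = {
    # name: (direct lifetime cost per person in $, patient hours per person)
    "time_intensive": (1_600.0, 60.0),
    "quick_and_convenient": (1_800.0, 20.0),
}


def total_cost(direct: float, hours: float, value_per_hour: float) -> float:
    return direct + hours * value_per_hour


def cheapest(value_per_hour: float) -> str:
    return min(STRATEGIES, key=lambda name: total_cost(*STRATEGIES[name], value_per_hour))


if __name__ == "__main__":
    for wage in (0.0, 20.0):  # 0.0 reproduces models that leave time out
        costs = {n: total_cost(*v, wage) for n, v in STRATEGIES.items()}
        print(f"time valued at ${wage:.0f}/h: {costs} -> cheapest {cheapest(wage)}")
```

Setting the value of time to zero, as models implicitly do when they exclude it, makes the time-intensive strategy look cheaper; any positive value large enough reverses the order.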

Several participants expressed skepticism that either research approach (primary research on uncertain assumptions and excluded components, or greater availability of independent data sets for model validation) will fully resolve uncertainty, partly because of the cost of generating new information but also because technological advances in screening and treatment continually create new unknowns. In technical terms, the confidence or credible interval for the cost-effectiveness of one strategy is likely to overlap those of others. Recognizing this reality, they suggested steps that might help decision makers make more appropriate use of the information that CEAs can generate.
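Overlapping intervals do not by themselves mean that two strategies are equally good. One common way to summarize probabilistic results, added here for illustration rather than proposed at the workshop, is to convert each draw into net monetary benefit (NMB = threshold × life-years − cost) and report the share of draws in which one strategy beats the other. In the Python sketch below, the $50,000-per-life-year threshold, the normal distributions for both strategies, and the independence between the two strategies' draws are all simplifying assumptions; in a real analysis the strategies share parameters, so their draws are correlated.

```python
"""Overlapping uncertainty intervals vs. the probability one strategy is better.

Draws hypothetical per-person costs and life-years for two strategies, converts
them to net monetary benefit (NMB = threshold * life-years - cost), and reports
both the 95 percent intervals and the share of draws in which A beats B. The
threshold, the distributions, and the independence between strategies are all
simplifying assumptions; in a real PSA the strategies share parameters and
their draws are correlated.
"""
import random

THRESHOLD = 50_000.0  # hypothetical willingness to pay per life-year ($)


def nmb_draws(cost_mean, cost_sd, ly_mean, ly_sd, n, rng):
    return [THRESHOLD * rng.gauss(ly_mean, ly_sd) - rng.gauss(cost_mean, cost_sd)
            for _ in range(n)]


def interval(values):
    s = sorted(values)
    return s[int(0.025 * len(s))], s[int(0.975 * len(s)) - 1]


def compare(n=10_000, seed=3):
    rng = random.Random(seed)
    a = nmb_draws(2_000.0, 300.0, 0.10, 0.02, n, rng)  # strategy A (hypothetical)
    b = nmb_draws(1_500.0, 300.0, 0.08, 0.02, n, rng)  # strategy B (hypothetical)
    (a_lo, a_hi), (b_lo, b_hi) = interval(a), interval(b)
    overlap = a_lo <= b_hi and b_lo <= a_hi
    prob_a_better = sum(x > y for x, y in zip(a, b)) / n
    return (a_lo, a_hi), (b_lo, b_hi), overlap, prob_a_better


if __name__ == "__main__":
    ia, ib, overlap, p = compare()
    print(f"A: {ia[0]:,.0f} to {ia[1]:,.0f}; B: {ib[0]:,.0f} to {ib[1]:,.0f}; overlap={overlap}")
    print(f"share of draws where A has higher NMB: {p:.2f}")
```

In this made-up example the intervals overlap heavily, yet A has the higher NMB in roughly three draws out of five, a statement decision makers may find more useful than "the intervals overlap."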

User strategies. One line of thinking expressed at the workshop was that policy makers have little choice but to accept the discrepant results from models because those results simply reflect the lack of medical and epidemiological evidence. Policy makers could adopt a message that focuses on the value of CRC screening in general, leaving specific choices of strategies to physicians and patients. Robert Dittus suggested that an appropriate message for providers might be, “here is a collection of approaches that we think are good and they all fit within the general realm of ‘it's a whole lot better than nothing.’” Robert Smith, on the other hand, argued that opinion leaders should advocate a practical screening strategy that offers the greatest protection and best outcomes for patients given what we know today.

Others noted that colorectal cancer screening involves high cost as well as great medical benefits, so choosing one strategy over another can mean differences of billions of dollars and hundreds of thousands of years of life when summed over the entire U.S. population 50 years of age and older. Although Richard Lilford observed that “the complexity of choice is so great in medicine that the questions are unanswerable,” he also recognized that decisions must nevertheless be made based on the best information available. In his view, models offer one type of information to assist in those decisions. The ultimate choice of screening strategy, according to Lilford, will be influenced by political pressures and preferences as well as by models laying out costs and effects as best they can.

Some participants addressed ways to help policy makers better understand the levels of uncertainty represented in models. Judith Wagner commented that editors of medical journals can play a useful role in this regard. The pre-workshop exercise revealed, for example, how uneven and sometimes vague published descriptions of models' assumptions about compliance and diagnostic follow-up protocols were. Clear descriptions of the assumptions in these and other important areas should be a priority. Wagner also observed that published CEAs of CRC screening have often evaluated a single screening strategy not examined in published work by other modeling teams, which makes it difficult for readers to assess the level of agreement across models. Authors seeking to evaluate the cost-effectiveness of a new screening technology might be asked to report the model's outcomes for a common set of well-studied screening strategies, such as the five strategies used in the pre-workshop exercise. Decision makers could then assess whether the cost-effectiveness results for the new technology might differ if assessed by other models.
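Reporting a new technology alongside a common set of comparators lets readers see whether it falls on the efficient frontier or is dominated. The Python sketch below implements the conventional incremental-analysis steps used in cost-effectiveness analysis (a standard technique, not one specified at the workshop): sort strategies by cost, drop strongly dominated options (costlier and no more effective), drop extendedly dominated options (an incremental ratio higher than that of the next, more effective option), and report incremental ratios along what remains. The strategy labels and all cost and life-year values are hypothetical.

```python
"""Incremental analysis against a common set of comparators.

Sorts strategies by cost, drops strongly dominated options (costlier and no
more effective) and extendedly dominated ones (a higher ICER than the next,
more effective option), then reports ICERs along the efficient frontier.
Strategy names and values are hypothetical.
"""
from typing import List, Optional, Tuple

Strategy = Tuple[str, float, float]  # (name, cost per person $, life-years gained)

CANDIDATES: List[Strategy] = [
    ("no screening", 0.0, 0.0),
    ("A", 500.0, 0.020),
    ("B", 800.0, 0.018),    # costs more than A but gains less
    ("D", 1_200.0, 0.025),
    ("C", 1_500.0, 0.040),
    ("E", 3_000.0, 0.045),
]


def icer(prev: Strategy, nxt: Strategy) -> float:
    return (nxt[1] - prev[1]) / (nxt[2] - prev[2])


def efficient_frontier(strategies: List[Strategy]) -> List[Tuple[str, Optional[float]]]:
    # Cheapest first; at equal cost, the more effective option comes first.
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))
    frontier: List[Strategy] = []
    for s in ordered:
        # Strong dominance: costs at least as much as a kept option but adds no effect.
        if frontier and s[2] <= frontier[-1][2]:
            continue
        frontier.append(s)
    # Extended dominance: remove any option whose ICER exceeds the next one's.
    changed = True
    while changed:
        changed = False
        for i in range(1, len(frontier) - 1):
            if icer(frontier[i - 1], frontier[i]) > icer(frontier[i], frontier[i + 1]):
                del frontier[i]
                changed = True
                break
    return [(frontier[0][0], None)] + [
        (frontier[i][0], icer(frontier[i - 1], frontier[i])) for i in range(1, len(frontier))
    ]


if __name__ == "__main__":
    for name, ratio in efficient_frontier(CANDIDATES):
        print(name, "reference" if ratio is None else f"${ratio:,.0f} per life-year")
```

Here B is strongly dominated by A, and D is extendedly dominated because moving from A to D costs more per life-year than moving from D to C; only no screening, A, C, and E remain on the frontier.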

Some participants called for research teams to provide more access to their models, even suggesting that models be placed in the public domain on the Internet to allow decision makers to test the impact of different assumptions or strategies. Others pointed out, however, that models entail a substantial investment in researchers' intellectual capital, which could be compromised by open access. Rocky Feuer suggested as a middle ground the Model Profiler currently under development as part of the CISNET program. The Profiler, described in Feuer's presentation, is expected to provide open web-based access to detailed information on model structure, assumptions and outputs, but not to the models themselves.

Modeling Reality or an Ideal World?

Another issue that threaded its way through the discussion concerned which of the following two questions CRC screening models should seek to answer: “What can be expected to happen under a given strategy?” Or “What could happen if the strategy is implemented under ideal conditions?”

The modeling of compliance is an obvious example of this issue. Perfect adherence to a strategy, including the periodic screening examinations and the specified follow-up and surveillance protocols, is the ideal, but it is not achieved in practice, as Sally Vernon showed in her presentation.

Another example of the tension between modeling the ideal and modeling reality is how models handle diagnostic follow-up of a positive sigmoidoscopy. Some experts recommend that polyps found on sigmoidoscopy be biopsied or removed during that procedure, with referral for a full colonoscopy only if the polyp is found to be a high-risk or advanced adenoma. But Todd Anderson's presentation on the frequency of follow-up and surveillance procedures suggests that the vast majority of polyps removed from patients undergoing sigmoidoscopy are removed in a subsequent colonoscopy. Whether models are based on recommended or actual practice in this regard could affect the costs and effects of sigmoidoscopy.
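The cost side of this choice can be sketched directly. The Python code below compares follow-up cost per 1,000 screening sigmoidoscopies under the recommended protocol (biopsy or removal at sigmoidoscopy, colonoscopy only for high-risk findings) and the pattern Anderson described (referral of patients with polyps to colonoscopy). The polyp rate, high-risk share, unit costs, and protocol labels are hypothetical, and the sketch ignores differences in effectiveness, which a full model would also need to capture.

```python
"""Follow-up cost after sigmoidoscopy: recommended vs. observed practice.

Per 1,000 screening sigmoidoscopies. The polyp rate, the share of biopsied
polyps that prove high risk, and both unit costs are hypothetical.
"""


def follow_up_cost(n_exams, polyp_rate, high_risk_share, biopsy_cost, colonoscopy_cost,
                   protocol):
    with_polyps = n_exams * polyp_rate
    if protocol == "biopsy_then_selective":
        # Biopsy or remove at sigmoidoscopy; colonoscopy only for high-risk findings.
        return with_polyps * biopsy_cost + with_polyps * high_risk_share * colonoscopy_cost
    if protocol == "refer_all":
        # Every patient with a polyp goes on to colonoscopy.
        return with_polyps * colonoscopy_cost
    raise ValueError(f"unknown protocol: {protocol}")


if __name__ == "__main__":
    inputs = dict(n_exams=1_000, polyp_rate=0.2, high_risk_share=0.3,
                  biopsy_cost=100.0, colonoscopy_cost=1_000.0)
    for protocol in ("biopsy_then_selective", "refer_all"):
        print(protocol, round(follow_up_cost(**inputs, protocol=protocol)))
```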

Several participants noted that limits on the supply of screening procedures represent an area in which reality may force departures from ideal conditions. Certain procedures (notably colonoscopy, but perhaps in the future virtual colonoscopy) require trained specialists to perform or interpret them. Like most people, medical specialists respond to economic incentives, and higher reimbursements for screening or surveillance procedures would induce them to do more. However, some strategies may require so many colonoscopies that the supply of gastroenterologists would be completely inadequate.[15] The same may be true of radiologists, should virtual colonoscopy become a routine screening procedure (Herdman and Norton, 2005). Because the supply of specialists cannot be expanded quickly, real constraints on capacity may have to be taken into account. Sandeep Vijan observed that assuming low compliance for certain screening or surveillance procedures is one way models could implicitly account for such constraints.
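Vijan's suggestion can be sketched as a simple cap: if the colonoscopies a strategy demands exceed the procedures specialists can deliver, the model lowers effective compliance to the level capacity supports. The Python code below is a minimal version under hypothetical figures for the eligible population, the exam frequency, desired compliance, and annual capacity; the function name is illustrative.

```python
"""Capping modeled compliance at what specialist capacity can deliver.

A sketch of the idea Vijan described: if demand for colonoscopy exceeds
capacity, a model can lower effective compliance to the level capacity
supports. All figures are hypothetical.
"""


def effective_compliance(desired: float, eligible: int,
                         exams_per_person_year: float, capacity_per_year: float) -> float:
    per_full_compliance = eligible * exams_per_person_year
    demand = desired * per_full_compliance
    if demand <= capacity_per_year:
        return desired
    # Only as many people can comply as there are procedure slots.
    return capacity_per_year / per_full_compliance


if __name__ == "__main__":
    # One colonoscopy per person every 10 years in a hypothetical region.
    for capacity in (100_000, 60_000):
        c = effective_compliance(0.8, 1_000_000, 0.1, capacity)
        print(f"capacity {capacity:,}/year -> effective compliance {c:.2f}")
```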

Seth Glick emphasized the divergence between test performance under ideal quality assurance programs and test performance in current practice. The quality of many screening processes may be poor today, in Glick's view. Estimates of test sensitivity, specificity, and medical risk, usually taken from studies where good quality assurance existed, therefore overestimate the performance of screening in the absence of strong quality assurance programs.

The basic problem, in the view of Martin Brown, is not the choice between the real and the ideal per se, but the failure of researchers to make explicit their choices about it. Moreover, their implicit choices may vary within a model, with optimal assumptions in one area and realistic assumptions in another and no explicit rationale for the differences. He and several other participants argued that both kinds of analysis are useful. For example, analysts could tell patients and physicians what can be expected if consumers fully comply with a strategy, and then show the decrements in effects and costs resulting from less complete compliance. Michael Pignone cautioned that researchers who model ideal conditions need to include the full costs of achieving those conditions. Quality assurance and high rates of compliance do not come free. They are usually the result of intensive programs of behavior change that must continue for the duration of a screening program.
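The reporting approach described above (results at full compliance, then the decrements at lower levels), combined with Pignone's caution that high compliance has its own cost, can be laid out as a small table. The Python sketch below assumes, purely for illustration, that benefit scales linearly with compliance and that an outreach program costs nothing up to 50 percent compliance and rises linearly to $400 per person at 100 percent. The constants and function names are hypothetical.

```python
"""Effects and costs at full and partial compliance, including program costs.

Shows results at full compliance and the decrements at lower levels, with the
cost of achieving high compliance included. Assumes benefit scales linearly
with compliance and that an outreach program costs nothing up to 50 percent
compliance and rises linearly to $400 per person at 100 percent; all numbers
are hypothetical.
"""

COHORT = 1_000
LY_AT_FULL = 100.0     # life-years gained per 1,000 at full compliance
SCREEN_COST = 2_000.0  # screening cost per complier ($)
MAX_OUTREACH = 400.0   # outreach cost per person at full compliance ($)


def outreach_cost_per_person(compliance: float) -> float:
    return 0.0 if compliance <= 0.5 else MAX_OUTREACH * (compliance - 0.5) / 0.5


def cost_and_effect(compliance: float):
    cost = COHORT * (compliance * SCREEN_COST + outreach_cost_per_person(compliance))
    return cost, LY_AT_FULL * compliance


if __name__ == "__main__":
    full_cost, full_ly = cost_and_effect(1.0)
    print(f"{'compliance':>10} {'cost':>12} {'life-years':>10} {'$/LY':>8} {'LY lost':>8}")
    for c in (1.0, 0.75, 0.5):
        cost, ly = cost_and_effect(c)
        print(f"{c:>10.2f} {cost:>12,.0f} {ly:>10.1f} {cost / ly:>8,.0f} {full_ly - ly:>8.1f}")
```

Leaving out the outreach term would make the full-compliance scenario look cheaper per life-year than this sketch suggests, which is the bias Pignone warned against.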

How Complex Should Models Be?

Participants returned repeatedly throughout the workshop to the question of whether models should be capable of evaluating complex screening strategies. This question surfaced often because the workshop participants, including the modelers, are fundamentally interested in the health policy question—what screening strategy is best?—not in modeling for its own sake. When the high lifetime costs of some very effective screening strategies become apparent in all models, a natural next step is to explore how those costs could be reduced, without compromising effectiveness, by fine-tuning strategies. Such fine-tuning drives modelers to add more branches to their strategies, which places even greater demands on the clinical and epidemiological evidence available to support such modeling.

Many ideas for complex screening strategies were put forward at the workshop. Participants suggested strategies such as changing the screening, follow-up, or surveillance schedule as a person ages, offering different screening strategies to different demographic groups with different relative risks, or changing a schedule contingent on the results of a previous screening, follow-up, or surveillance test. For example, Ann Zauber observed that men and women have different profiles of adenoma and CRC incidence, with women developing CRC an average of 10 years later than men do (Chu et al., 1994; Cooper et al., 1995; Devesa and Chow, 1993). Different screening strategies for men and women might make sense and could be explored by models. Donatus Ekwueme raised similar possibilities for tailoring strategies by race in recognition of the systematic racial differences in incidence, prevalence, and location in the colon of adenomas and polyps (Devesa and Chow, 1993; Theuer et al., 2001; Walker Jr et al., 1995). Michael Pignone and Marjolein van Ballegooijen suggested that it might make sense to alter the type of screening test as a person ages, saving more sensitive but more expensive tests until the individual is at higher risk of advanced adenomas or CRC.

To John Inadomi, the most important clinical question is whether surveillance following polypectomy is cost-effective and how often it is needed. He and Reid Ness argued that selective post-polypectomy surveillance strategies, in which high-risk individuals are monitored more often than those at low risk of future polyps or cancer, have the greatest potential to reduce costs. But making such judgments without imperiling outcomes requires accurate data on the factors that matter. Deborah Schrag summarized the results of the National Polyp Study (Winawer et al., 1993), which found that intensive surveillance strategies offer little additional benefit compared with protocols that condition future surveillance on the nature of the polyp removed and on the result of the first surveillance test.

Mark Fendrick raised the possibility of adjusting screening strategies to account for individuals who had already received a colonoscopy for symptoms or other non-screening reasons. David Lieberman commented that in reviewing a large endoscopic database he found that 40 to 50 percent of all colonoscopies were performed either for vague symptoms or for rectal bleeding. The results of most of those procedures either are negative or show benign polyps. If models were adjusted to assume that 40 to 50 percent of individuals have already been screened before a formal screening program starts, rather than assuming, as they currently do, that no one has been screened, the predicted cost of screening would be lower.
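One simple way a model might reflect prior diagnostic colonoscopies is to defer the first program colonoscopy for people who recently had a negative exam. The Python sketch below assumes, hypothetically, a fixed five-year deferral, a $1,000 unit cost, and a 3 percent discount rate, and computes the present value of the first-round colonoscopy cost per person for different shares of previously scoped individuals. The deferral rule and all numbers are illustrative assumptions, not a description of how any workshop model works.

```python
"""Discounted first-round colonoscopy cost when some people were already scoped.

Assumes (hypothetically) that people with a recent negative diagnostic
colonoscopy defer their first program colonoscopy by a fixed number of years,
and that everyone else is scoped at program start. Unit cost, deferral, and
discount rate are all hypothetical.
"""


def first_round_cost_per_person(share_prior: float, unit_cost: float = 1_000.0,
                                deferral_years: int = 5, rate: float = 0.03) -> float:
    deferred = unit_cost / (1.0 + rate) ** deferral_years  # present value of a later exam
    return (1.0 - share_prior) * unit_cost + share_prior * deferred


if __name__ == "__main__":
    for share in (0.0, 0.4, 0.5):
        print(f"{share:.0%} previously scoped -> ${first_round_cost_per_person(share):,.2f}")
```

Starting from the usual assumption that no one has been screened corresponds to a share of zero; any positive share lowers the predicted cost in this sketch.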

Despite the enthusiasm for complex strategies (and for estimating their potential to save costs or increase effectiveness), several participants sounded notes of caution. Mark Fendrick emphasized barriers to making complex screening strategies operational. He described the difficulty one major medical center had in providing same-day follow-up colonoscopy for people with positive screening sigmoidoscopic examinations, even when those patients had been clinically prepped beforehand for a possible colonoscopy. He and Sandeep Vijan also warned that presenting too many options could overwhelm patients and ultimately reduce their willingness to participate in screening or surveillance. Robert Dittus held out hope that new medical information systems, such as automated test ordering and electronic medical records with built-in guidelines, will make it easier for physicians to implement complex strategies.

Several participants held that if complex strategies offer substantial hope for moderating costs without reducing the benefits of screening, then models should stand ready and be capable of assessing them. But, argued Amnon Sonnenberg, given the information requirements of complex models, we may be expecting models to do too much. When models become too complex, they can go off the mark simply because they must make too many assumptions based on too little evidence. Thus, the discussion of complexity ended with a reprise of the first problem for modeling, as for decision making in general: uncertainty.

Footnotes

14

For a description of different kinds of validity, see the research methods web page maintained by William Trochim at http://www.socialresearchmethods.net/kb/index.htm (Trochim, 2004).

15

Recent news accounts suggest that waiting times for colonoscopy are growing in certain areas of the country (Kowalczyk, 2004). New evidence also suggests that some surveillance colonoscopies are being done more frequently than is recommended by professional guidelines (Mysliwiec et al., 2004).

Copyright © 2005, National Academy of Sciences.
Bookshelf ID: NBK83882
