NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Lin JS, Perdue LA, Henrikson NB, et al. Screening for Colorectal Cancer: An Evidence Update for the U.S. Preventive Services Task Force [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2021 May. (Evidence Synthesis, No. 202.)
Overall
We conducted this review to support the USPSTF in updating its recommendation on screening for CRC. Since the previous recommendation was published in 2016, we have included 70 new studies. Among them are 13 studies119, 122, 125, 127, 130–132, 135–137, 139, 149, 150 that assessed the effectiveness or comparative effectiveness of screening on CRC incidence and/or mortality, 28 new studies154, 155, 158, 159, 161, 162, 164, 166, 168, 170, 171, 173, 179, 180, 187, 189, 196–200, 202–204, 206–209 that assessed the diagnostic accuracy of screening tests, and 37 new studies119, 125, 127, 130, 135, 136, 150, 198, 217, 218, 221, 226, 231, 237, 240, 244, 248, 250, 260–262, 270, 271, 281, 282, 287, 290, 298, 302, 303, 307–313 that assessed harms.
Numerous tests have been studied for their use in screening for CRC in average-risk adults, including FS, colonoscopy, CTC, capsule endoscopy, gFOBT, FIT, and sDNA-FIT, as well as serum- and urine-based tests (Table 29). These tests have different levels of evidence to support their use, of proven ability to detect cancer and/or precursor lesions, and of risk of serious adverse events. At this time, most trials comparing screening modalities are limited in their study design and power to evaluate the comparative effectiveness on the reduction of CRC incidence/mortality or comparative harms. Therefore, they cannot answer questions on the relative benefit and harms (tradeoffs) between the tests. Currently seven randomized controlled trials of CRC screening are underway (Appendix I). Two trials have a usual care arm: SCREESCO (n=200,000), comparing FIT and colonoscopy to usual care and NordICC comparing colonoscopy to usual care (n=66,000). The other five trials are comparing various screening strategies: FIT versus colonoscopy (COLONPREV, CONFIRM), FOBT versus FS (Norwegian trial), CTC versus FS (Italian trial), FOBT versus FOBT and colonoscopy (Japanese trial). Three trials have reported baseline detection rates,135, 221, 357 but the primary results from these trials are unlikely to be published over the next few years, due to the long followup time needed to assess differences between groups in CRC incidence and mortality rates. Notably, only one of these trials recruited adults younger than 50 years (Japanese trial); but with a smaller sample size (n=10,000), it is unlikely to inform any decisions about the age to begin screening. With that in mind, this systematic review of the available evidence will be used in tandem with microsimulation modeling conducted by CISNET CRC, which addresses issues around the comparative effectiveness of available tests, as well as decisions around age to start/stop screening.
Robust data from well-conducted, population-based screening RCTs demonstrate that intention to screen with either Hemoccult II or FS can reduce CRC mortality. However, Hemoccult II and FS are no longer routinely used for screening in the United States. Therefore, we have limited empirical data on true programs of CRC screening and screening modalities used in clinical practice today. Expensive, large population-based trials of newer tests may not be necessary, as evidence-based reasoning supports the theory that screening with endoscopy or stool tests with a sensitivity as good as or better than that of existing tests (without a tradeoff in specificity) will result in CRC mortality reductions similar to or better than reductions shown in existing trials.358 Our review reveals that newer stool tests meet those requirements, including single-sample testing via FITs (e.g., OC-Sensor and OC-Light FIT families) and three-sample testing via HSgFOBT (i.e., Hemoccult Sensa). Stool tests that maximize sensitivity, such as FITs that use lower cutoffs or sDNA-FIT (i.e., Cologuard), have lower specificity and therefore require new trials or modeling exercises to understand the tradeoff of more false-positive test results. Other noninvasive testing (i.e., serum or urine tests) with test performance similar to or better than stool tests (i.e., based on test accuracy and adherence to screening) would also be expected to result in CRC mortality reductions similar to or better than reductions in existing trials. Thus, if the spectrum of disease detected by sDNA, serum, or urine testing is similar to that detected by stool testing for occult blood, then large population-based trials may not be necessary to evaluate their effectiveness in screening average-risk adults for CRC.
Although imperfect, colonoscopy remains the criterion standard for assessing the test performance of other screening tests and is widely regarded as the standard for colorectal cancer screening in the United States. However, the mortality benefit of colonoscopy has not been evaluated in trials and the superiority of colonoscopy compared with other tests in a screening program has not been established. Colonoscopy is also significantly more invasive than other available tests, with greater accompanying procedural harms and potential harms of overdetection (unnecessary polypectomy/surveillance). Evidence supports adequate detection of CRC and larger potential precursor lesions by CTC. Although risk of immediate harms from screening CTC (such as bowel perforation from insufflation) is very low, it is unclear what (if any) true harm is posed by cumulative exposure to low doses of radiation or detection of extracolonic findings. Noninvasive serum and urine tests are promising given the potential for better patient acceptability (and therefore adherence) than stool-based testing.359 Serum testing for circulating mSEPT9 in one study appears to have slightly lower sensitivity and lower specificity to detect CRC than commonly evaluated/employed FITs. A metabolomic urine test shows promise for detection of AA similar to that of serum testing, but no evidence yet exists for its test performance to detect CRC in a screening population. Likewise, evidence for use of capsule endoscopy in a screening population is limited to very small test accuracy studies with high incompletion or inadequate study rates. Below we summarize the evidence and implementation concerns for direct visualization tests (FS, colonoscopy, and CTC) and stool tests (gFOBT, FIT, sDNA-FIT) with evidence to support their use in screening.
Direct Visualization Tests
Endoscopy
FS and Colonoscopy Benefits
Four large population-based RCTs evaluating screening FS showed that intention to screen with one-time FS (or, in the PLCO trial, two rounds of FS) was consistently associated with a decrease in CRC incidence (IRR, 0.78 [95% CI, 0.74 to 0.83]) and CRC-specific mortality (IRR, 0.74 [95% CI, 0.68 to 0.80]) compared with no screening at 11 to 17 years of followup. Despite this robust evidence, recent utilization data in the United States suggest that FS (with or without stool testing) is very uncommon (<1%).360 Public and clinician perceptions of accuracy of colonoscopy versus FS, given the reach of endoscopy, also play an important role in the low utilization of FS compared with colonoscopy.361 In the included studies, however, FS was associated with a reduction in CRC incidence and mortality for both proximal and distal cancers, albeit with greater reductions for distal cancers. We found no studies estimating the test accuracy of FS compared with a colonoscopy reference standard. Estimates of FS sensitivity and specificity are based on a limited number of relatively small studies with suboptimal study designs (e.g., tandem FS studies, simulated studies using colonoscopy and assumed FS reach to splenic flexure).90 Sensitivity of FS to detect CRC calculated from the PLCO trial is 69.6 percent362; however, the test accuracy of FS to detect CRC may depend on the referral criteria, as criteria resulting in greater followup colonoscopy may detect a greater number of cancers, particularly proximal cancers. For example, the PLCO trial used nonbiopsy referral-based criteria for followup colonoscopy and had the highest referral rate to colonoscopy (about 33%) of all the trials.
Only one prospective cohort study has evaluated the association of the receipt of screening colonoscopy and CRC mortality in average-risk adults.21 However, this study is part of a larger evidence base of population-based case control studies and retrospective cohort studies demonstrating an association of screening colonoscopy and reduction in CRC incidence and/or mortality.19, 20, 156, 363–367 This included study, using data from the Nurses’ Health Study and the Health Professionals Follow-Up Study, found that CRC mortality was lower in people with at least one screening colonoscopy versus those who never had a screening endoscopy (adjusted HR, 0.32 [95% CI, 0.24 to 0.45]) at 24 years of followup. Another included study conducted among Medicare beneficiaries found that receipt of screening colonoscopy was associated with a lower incidence of CRC after 8 years as compared with no screening colonoscopy in people ages 70 to 74 years; this study did not report CRC mortality outcomes.128 This magnitude of association from observational studies should not be compared with the magnitude of effect on CRC mortality in intention-to-treat analyses from RCTs of screening FS. Currently, one large screening RCT in average-risk adults, NordICC, evaluating the impact of screening colonoscopy compared with usual care on CRC incidence and mortality in Norway, Sweden, Poland, and the Netherlands, is underway.368
We included only four studies for which we could derive community-based relevant estimates of test accuracy, evaluating screening colonoscopy against a criterion standard. However, none of them were designed to estimate the test performance for CRC. Based on three studies, the per-person sensitivity for colonoscopy to detect adenomas 10 mm or larger ranged from 89.1 to 94.7 percent and the per-person sensitivity to detect adenomas 6 mm or larger ranged from 74.6 to 92.8 percent. Colonoscopies in these studies were conducted by experienced endoscopists; test performance will vary in clinical practice based on the adequacy of bowel preparation and colonoscopist performance/experience. A separate body of evidence addressing adenoma miss rates from tandem colonoscopy studies, not included in our review, confirms that colonoscopies can miss adenomas. A 2019 systematic review of 43 studies of over 15,000 tandem colonoscopies demonstrated that miss rates for adenomas and AA are higher than previously appreciated.95 Both the effectiveness and test accuracy of colonoscopy may vary depending on a number of factors including the examiner quality. The American Society for Gastrointestinal Endoscopy, American College of Gastroenterology, and U.S. Multi-Society Task Force have issued guidance and recommendations for the technical performance and quality improvement targets for colonoscopy.369, 370 In addition, there is a growing body of evidence, not included in this review, that evaluates whether technological advancements in colonoscopy to improve adenoma detection, namely chromoendoscopy or digital/virtual chromoendoscopy (e.g., narrow band imaging, flexible spectral imaging color enhancement), endoscopic technologies to increase mucosal surface area inspection (e.g., wide-angle lens or full-spectrum endoscopy, through-the-scope retrograde viewing device), and computer aided detection using artificial intelligence can improve detection, but data are limited to support widespread adoption in screening or average-risk populations.371–375
FS and Colonoscopy Harms
Serious adverse events from screening colonoscopy or colonoscopy in asymptomatic persons are estimated at 14.6 serious bleeds (95% CI, 9.4 to 19.9) and 3.1 perforations (95% CI, 2.3 to 4.0) per 10,000 procedures. This estimate of serious bleeds is higher than appreciated in the prior review to support the 2016 USPSTF recommendation (8.2 serious bleeds per 10,000 procedures, 95% CI, 5.0 to 13.5).1 Overall, it appears the risk of major bleeding and perforation is higher with increasing age. Other serious harms (e.g., infections, other GI events, cardiopulmonary events) were not consistently reported, and four studies evaluating harms in people who received colonoscopy versus those who did not found no increased risk of serious harms (including MI, CVA, or other cardiovascular events) as a result of colonoscopy. Serious adverse events from screening FS are rare (0.5 [95% CI, 0 to 1.3] serious bleeds and 0.2 perforations [95% CI, 0.1 to 0.4] per 10,000 procedures); however, screening FS may require followup colonoscopy. Serious harms of colonoscopy following screening FS are estimated at 20.7 serious bleeds (95% CI, 8.2 to 33.2) and 12.0 perforations (95% CI, 7.5 to 16.5) per 10,000 colonoscopies.
Case reports of fatal or near-fatal outcomes in average-risk people undergoing routine colonoscopy include splenic rupture, retroperitoneal or intra-abdominal hemorrhage, retroperitoneal gas gangrene, bowel infarction or ischemic colitis, small bowel perforation, colonic gas explosion with electrocautery, and appendicitis or appendiceal abscess.82 In addition, there have been case reports of transmission of communicable diseases (i.e., hepatitis C virus, human papillomavirus) using unsanitized colonoscopes and chemical colitis from glutaraldehyde, which is used to disinfect endoscopes.82
We found no studies directly assessing the harms of cancer overdiagnosis (i.e., cancer detected through screening that would not otherwise have clinically manifested during a person’s lifetime). One Markov modeling study, using data from over 4 million screening colonoscopies from Germany’s national screening colonoscopy registry, found that the risk of overdiagnosis was very low in people ages 55 to 79 years and that 28 percent of the overdiagnoses occurred in people older than age 75 years.376 Another potential harm is the overdetection of adenomas (i.e., adenomas detected through screening that would not have developed into cancer and/or otherwise clinically manifested during a person’s lifetime), leading to unnecessary procedures or more intensive colonoscopy surveillance.
CTC
CTC Benefits
While we found no studies examining the impact of screening CTC on cancer incidence or mortality, there is a robust evidence base evaluating the test performance of screening CTC in average-risk adults. However, none of these studies were designed to estimate test performance to detect cancer. Based on seven studies of CTC with bowel preparation, the per-person sensitivity and specificity to detect adenomas 10 mm or larger ranged from 66.7 to 93.5 percent and 86.0 to 97.9 percent, respectively; and to detect adenomas 6 mm or larger ranged from 72.7 to 98.0 percent and 79.6 to 93.1 percent, respectively. It is unclear whether the variation in test performance is due to differences in study design or populations studied or differences in bowel preparation, CTC imaging, reading protocols, and radiologist experience. In the included studies and current practice there is variation in bowel preparation (e.g., full, partial, none) and CTC technical enhancements (e.g., increasing detectors, fecal tagging, electronic cleansing, computer aided detection, insufflation techniques). Because some variation in accuracy is likely due to CTC protocol and/or radiologist ability, both the American College of Radiology and the International Collaboration for CT Colonography Standards have recommended practice guidelines and quality metrics, as well as specifications for training and certification.377–379 In practice, the standard appears to be dry preparation (sodium phosphate, magnesium citrate, bisacodyl) rather than wet preparation (PEG) because of patient preferences and because PEG can leave liquid in the colon that can potentially obscure lesions.380 Fecal tagging now appears to be routinely employed (oral ingestion of high-density oral contrast agent so that residual colonic contents can be differentiated from polyps) and appears to decrease the need for cathartic preparation. 
Additionally, there are different contrast agents, either barium- or iodine-based (ionic and nonionic), and the selection of which to use is largely based on local experience. Current practice centers on multidetector row CT scanners, which use much thinner slices with faster scan times, resulting in better imaging and decreased radiation dose. Finally, there are differences in reading software. Commonly used reading software allows for both two- and three-dimensional display. The selection of the primary method appears to depend on radiologist preference. Other practice variations that influence the impact and implementation of screening CTC include colonoscopy referral or surveillance criteria, as well as coordination with colonoscopy resources. Currently, there is consensus that large lesions (≥10 mm) should be referred to colonoscopy for polypectomy. There is variation in practice for smaller lesions, such that 6- to 9-mm lesions may be referred to colonoscopy for polypectomy or be monitored with CTC surveillance (with a followup CTC in 3 years), and the smallest lesions (≤5 mm) may be ignored or monitored. The American College of Radiology states that people with lesions of 6–9 mm should be offered colonoscopy and lesions smaller than 5 mm need not be reported.205, 378, 381, 382 Preference for CTC over colonoscopy may be, in part, due to differences in bowel preparation. Ideally, while same-day colonoscopy could avoid duplicate preparation, it may result in suboptimal colonoscopy if limited bowel preparation is used for CTC and would require close coordination between radiology and gastroenterology departments/services.
CTC Harms
Immediate serious adverse events from screening CTC appear to be uncommon. Perforations were the most commonly reported harms (estimated at 1.3 per 10,000 examinations [95% CI, 0 to 2.9]); however, these perforations were detected radiographically (not symptomatic) and sustained by room-air manual insufflation which is no longer used in practice. However, like FS, CTC may require followup or therapeutic colonoscopy, and we did not find sufficient evidence to estimate serious adverse events from colonoscopy followup procedures.
Potential harms from CTC include exposure to radiation, especially if used in a program of screening that requires repeated examinations. Radiation dose in our included studies ranged from 0.8 to 5.3 mSv, consistent with a 2012 survey of academic and nonacademic institutions which found that the median radiation dose per screening CTC examination was 4.4 mSv,383–385 and a 2018 narrative review reporting the typical radiation exposure associated with a CTC examination at ≤3 to 6 mSv (which is higher than radiation exposure from digital mammography or CT for lung cancer screening).386
Given that the average amount of radiation exposure from background sources in the United States is about 3.0 mSv per year,387 ionizing radiation from a single CTC examination is low. Even low doses of ionizing radiation, however, may convey a small excess risk of cancer.388, 389 We identified no studies directly measuring the risk for stochastic effects (i.e., cancer) caused by radiation exposure from CTC. We can indirectly estimate these adverse effects, however, based on the range of effective radiation dose for CTC reported in the literature and estimates of lifetime attributable risk of malignancy (i.e., all solid cancers and leukemia) from the National Research Council report “Health Risks From Exposure to Low Levels of Ionizing Radiation.”387 Based on this report, the council predicts that approximately one additional individual per 1,000 would develop cancer (solid cancer or leukemia) from an exposure of 10 mSv above background using the linear no-threshold (LNT) model. In comparison, 420 individuals per 1,000 would be expected to develop cancer from other causes over their lifetimes. Because of limitations in the data used to develop risk models, the risk estimates are uncertain, and variation by a factor of 2 or 3 cannot be excluded.387 Multiple organizations support the LNT model to estimate potential harms of radiation exposure of less than 100 mSv, including the Nuclear Regulatory Commission, the International Commission on Radiological Protection, the U.S. National Council on Radiation Protection and Measurements, the United Nations Scientific Committee on the Effects of Atomic Radiation, and the U.K. National Radiological Protection Board. 
Other organizations, however, believe that the LNT model is an oversimplification and likely overestimates potential harms of low-dose radiation exposure, including the Health Physics Society, the France Academy of Sciences/National Academy of Medicine, and the American Nuclear Society.390 The effective radiation dose in CTC targets the abdomen and would not likely increase the risk of certain prevalent cancers (e.g., cancers of the breast, thyroid, or lung), although the risk for leukemia or abdominal organ cancer may remain. This risk estimate is consistent with other published literature on radiation exposure risk from CT.388, 391
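The linear no-threshold arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the National Research Council's risk model: it takes the quoted anchor point (about one additional cancer per 1,000 people at 10 mSv above background) and scales it in a straight line to the per-examination doses reported in this section. The function names are hypothetical, and the sketch ignores age at exposure, sex, and organ-specific dose, all of which the underlying risk models take into account.

```python
# Naive linear no-threshold (LNT) scaling; an illustration only, not the
# National Research Council's risk model.

# Anchor point quoted in the text: about 1 excess cancer per 1,000 people
# exposed to 10 mSv above background.
REFERENCE_DOSE_MSV = 10.0
REFERENCE_EXCESS_PER_1000 = 1.0


def lnt_excess_per_10000(dose_msv):
    """Excess lifetime cancers per 10,000 people under straight-line scaling."""
    if dose_msv < 0:
        raise ValueError("dose_msv must be non-negative")
    excess_per_1000 = REFERENCE_EXCESS_PER_1000 * dose_msv / REFERENCE_DOSE_MSV
    return excess_per_1000 * 10.0


def uncertainty_band(point_estimate, factor=3.0):
    """Range implied by the stated factor-of-2-or-3 uncertainty."""
    return point_estimate / factor, point_estimate * factor


if __name__ == "__main__":
    # Per-examination doses quoted in the text (mSv): the low and high ends
    # of the included studies and the 2012 survey median.
    for dose in (0.8, 4.4, 5.3):
        point = lnt_excess_per_10000(dose)
        low, high = uncertainty_band(point)
        print(f"{dose:.1f} mSv: {point:.2f} per 10,000 "
              f"(range {low:.2f} to {high:.2f})")
```

Because the scaling is linear, a dose below the 10 mSv anchor simply yields a proportionally smaller excess risk, which can be set against the roughly 420 per 1,000 lifetime cancers expected from other causes.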
Modeled data based on the National Research Council’s assumptions, and using a mean dose of 8 mSv for women and 7 mSv for men per CTC examination, found that the benefits of CTC screening every 5 years (from ages 50 to 80 years) far outweigh any potential radiation risks, with 15 cases of radiation-related cancers per 10,000 persons screened (95% CI, 8 to 28) versus 358 to 519 CRC cases prevented per 10,000 persons screened.392
Extracolonic Findings
CTC also detects extracolonic findings, which could be a benefit (e.g., detection of intervenable extracolonic cancer, abdominal aortic aneurysm) or harm (e.g., overdiagnosis, procedural harms from subsequent testing). Extracolonic findings are very common and increase with age. Approximately 1.3 to 11.4 percent of CTC exams have extracolonic findings that necessitate actual diagnostic followup. Only a small proportion of CTC exams have findings that ultimately require any type of definitive treatment (≤3%). Therefore, judicious handling of the reporting and diagnostic workup of extracolonic findings is crucial to minimize the burden of testing (and associated cost and harms of testing), as many findings ultimately prove to be of no clinical consequence. Additional reading software may allow for repurposing CTC examinations to obtain bone mineral density from the lumbar spine to screen for osteoporosis if desired/indicated.393, 394 It remains unclear whether detection of extracolonic findings represents a true overall benefit or harm based on empirical evidence.
Harms of Bowel Preparation
Common bowel preparation agents for FS include enemas and occasionally oral laxatives, while bowel preparation agents for colonoscopy and CTC include PEG solution, oral sodium phosphate solution, and sodium picosulphate, with or without additional oral laxatives. Common minor adverse events include nausea, vomiting, abdominal pain, abdominal distention/bloating, anal irritation, headache, dizziness, electrolyte abnormalities (e.g., hyponatremia, hypokalemia, hypocalcemia, hyper- or hypophosphatemia), and poor sleep. Therefore, the necessity of bowel preparation can affect adherence to endoscopy or CTC. However, serious adverse events (e.g., severe dehydration, symptomatic electrolyte abnormalities) are generally limited to people with major predisposing illnesses, and the selection of a bowel preparation agent may depend, in part, on underlying comorbidities (e.g., sodium phosphate use is generally avoided in people with renal, cardiovascular and GI motility impairment, sodium picosulfate is generally avoided in older adults).82 Overall, existing systematic reviews on bowel preparation for endoscopy suggest similar tolerability based on the number of minor adverse events, no difference in efficacy of preparation, and no clinically significant adverse events with PEG or sodium phosphate.395, 396 Case reports of serious adverse events from bowel preparation from PEG or sodium phosphate in average-risk people undergoing colonoscopy include acute renal failure and acute phosphate nephropathy, ischemic colitis, symptomatic hypokalemia, seizure secondary to hyponatremia, and Boerhaave syndrome (barogenic esophageal rupture).82
Stool Tests
To date, Hemoccult and Hemoccult II are the only stool CRC screening tests that have been evaluated in RCTs. These trials demonstrate that intention to screen with gFOBT can decrease CRC-specific mortality by 9 to 22 percent (biennial screening, five studies) or by 32 percent (annual screening, one study) in a program of screening after 11 to 30 years of followup compared with no screening. However, only one of these trials demonstrated a reduction in CRC incidence.143 In general, gFOBT has been replaced for the most part by more sensitive stool-based tests (i.e., HSgFOBT or various FITs). In the United States, many health systems and coordinated screening programs now use FITs, as opposed to gFOBT, to screen for CRC.397–401 FITs usually require only one sample and eliminate dietary and medicinal restrictions, which generally improves ease of and adherence to testing.402, 403
We found one prospective cohort study that evaluated a national screening program in Taiwan in which one to three rounds of biennial FIT were associated with lower CRC mortality compared with no screening at up to 6 years followup (adjusted RR 0.90 [95% CI, 0.84 to 0.95]). We excluded one large (n=192,261) RCT conducted in rural China that compared single FIT screening to no screening because of its limited applicability to U.S. practice,404 as well as another, ongoing RCT comparing FIT screening with no screening in Thailand.405 In the trial from China, a single round of FIT testing had no statistically significant impact on CRC mortality (RR, 0.88 [95% CI, 0.72 to 1.07]) at 8 years of followup. Other studies evaluating national FIT screening programs were excluded because they did not have an unscreened contemporaneous comparator arm, they had very limited followup, and/or their analyses were at high risk of bias. In general, studies with a contemporaneous control group demonstrated that an invitation to FIT screening resulted in a greater number of cancers detected than no invitation to screening and/or a higher proportion of early-stage CRC with an invitation to FIT screening compared with no invitation to screening.347, 349, 406 One additional excluded study of a FIT screening program conducted in the United States (Kaiser Permanente Northern California) that had a historical control group found that implementation of organized annual screening with a FIT (OC-Sensor) in people ages 51 to 75 years compared with usual care was associated with higher screening participation and decreased CRC mortality over time.407
Despite the lack of trials on stool tests used in clinical practice, tests that identify the same spectrum of disease as Hemoccult II do not need to be evaluated in large population-based RCTs if they have the same or better sensitivity and specificity. Both Hemoccult Sensa and FITs have higher sensitivity than Hemoccult II without a tradeoff in specificity. However, Hemoccult Sensa has more limited data and significant imprecision around test accuracy, and it requires three stool samples. Based on two studies with colonoscopy as the reference standard, the sensitivity to detect CRC ranged from 0.50 to 0.75 (95% CI range, 0.09 to 1.0) and the specificity ranged from 0.96 to 0.98 (95% CI range, 0.95 to 0.99) for Hemoccult Sensa. Based on 13 studies with colonoscopy as the reference standard, the OC-Sensor FIT family had a sensitivity to detect CRC of 0.74 (95% CI, 0.64 to 0.83) and a specificity of 0.94 (95% CI, 0.93 to 0.96) using the manufacturer-recommended cutoff of 20 μg Hb/g feces. The OC-Light test, by the same manufacturer but with a different methodology, performed similarly in four studies. Findings from comparative effectiveness studies in which Hemoccult II was compared with various FIT assays are consistent with this thinking, as test positivity and CRC detection with FIT were consistently higher than with Hemoccult II. It is possible that the sensitivity of FIT to detect CRC is lower in subsequent rounds of screening, but this is based on a small number of studies with methodologically limited study design and smaller numbers of cancers in subsequent rounds. Although sensitivity and specificity of a screening test should not theoretically vary with disease prevalence, the variation in test accuracy may be due to a change in disease spectrum (e.g., stage of cancer) that occurs alongside a change in prevalence.408
Cologuard (sDNA-FIT) has greater sensitivity but lower specificity than OC-Sensor when the manufacturer-recommended cutoff of 20 μg Hb/g feces is applied to OC-Sensor. Based on four studies, the sensitivity of Cologuard to detect CRC was 0.93 (95% CI, 0.87 to 1.0), and the specificity was 0.85 (95% CI, 0.84 to 0.86). Lowering the threshold of FITs also maximizes sensitivity with a tradeoff in specificity. For example, when a cutoff of 10 or 15 μg Hb/g feces was applied, OC-Sensor had a similar sensitivity and specificity to detect CRC as Cologuard. Our findings are consistent with a 2019 systematic review94 of the test accuracy of FITs. Decision models help in determining the optimal sensitivity and specificity of stool tests (or other noninvasive screening tests) in a program of screening for CRC, and in understanding the tradeoffs of optimizing sensitivity. In addition, the value of current sDNA-FIT testing in practice remains uncertain when compared with FITs using lowered cutoffs to maximize sensitivity, because of the higher rate of unsatisfactory samples and 10-fold higher cost of the sDNA-FIT compared with FITs.
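The sensitivity and specificity tradeoff described above can be made concrete with a short sketch that converts the pooled accuracy estimates into expected counts per 10,000 people screened in a single round. The CRC prevalence used here (0.5 percent) is a hypothetical value chosen only for illustration and does not come from this review; the function name and dictionary keys are likewise illustrative. The sketch treats specificity as applying to everyone without CRC and ignores adenomas, repeat testing, and unsatisfactory samples.

```python
def screen_outcomes(sensitivity, specificity, prevalence, n_screened=10_000):
    """Expected single-round results for a test applied to n_screened people."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    with_crc = n_screened * prevalence
    without_crc = n_screened - with_crc
    true_pos = sensitivity * with_crc             # cancers flagged by the test
    false_pos = (1.0 - specificity) * without_crc  # positives without CRC
    flagged = true_pos + false_pos
    return {
        "true_positives": true_pos,
        "missed_cancers": with_crc - true_pos,
        "false_positives": false_pos,
        "ppv": true_pos / flagged if flagged > 0 else float("nan"),
    }


# Hypothetical CRC prevalence, assumed for illustration only.
ASSUMED_PREVALENCE = 0.005

# Pooled (sensitivity, specificity) estimates for CRC quoted in the text.
TESTS = {
    "OC-Sensor FIT, 20 ug Hb/g cutoff": (0.74, 0.94),
    "sDNA-FIT (Cologuard)": (0.93, 0.85),
}

if __name__ == "__main__":
    for label, (sens, spec) in TESTS.items():
        result = screen_outcomes(sens, spec, ASSUMED_PREVALENCE)
        print(f"{label}: "
              f"{result['true_positives']:.1f} cancers detected, "
              f"{result['missed_cancers']:.1f} missed, "
              f"{result['false_positives']:.1f} false positives, "
              f"PPV {result['ppv']:.3f}")
```

Under any low assumed prevalence, the false-positive count is driven almost entirely by specificity, which is why a modest drop in specificity translates into many more followup colonoscopies for each additional cancer detected.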
Harms of Stool Testing
There are no hypothesized serious adverse events resulting from noninvasive stool testing other than the risk of missed cancers (false negatives). However, serious adverse events may result from followup colonoscopy for abnormal stool testing. Serious harms of colonoscopy following abnormal stool testing are estimated at 17.5 serious bleeds (95% CI, 7.6 to 27.5) and 5.7 perforations (95% CI, 2.8 to 8.7) per 10,000 colonoscopies.
Contextual Issues
Adherence
Overall adherence to CRC screening in the United States has increased but remains suboptimal, and has consistently lagged behind recommended screenings for other cancers.73 Adherence to a single round of serum testing appears to be highest, followed by stool testing (FIT greater than gFOBT), and lowest for a single CTC or colonoscopy, although estimates of adherence to screening vary widely across studies, settings, and populations.75, 82, 359, 409 Although adherence to a single stool test is greater than adherence to a single colonoscopy, stool-based screening requires annual or biennial testing. Adherence to repeated stool-based screening varies widely between studies, generally declines over multiple rounds of screening, and is highest in people who have already completed one initial screening test.410–424 Additionally, completion of colonoscopy following abnormal stool-based screening tests ranges widely, from as low as 50 percent to almost 90 percent in the United States, with variation by health care setting and type of stool test.414, 425–431 Last, adherence is variable by age, sex, and race/ethnicity; however, much of this variation is explained by health insurance generosity and access to preventive care.76, 411, 432–436 The evidence on adherence to initial CRC screening, repeated screening, and colonoscopy following abnormal stool testing is detailed in Appendix G.
Differential adherence to screening tests influences the benefits and harms of screening program and may influence the selection of a preferred strategy. To illustrate the impact of adherence on screening, one microsimulation modeling analysis compared the benefits and life years gained (LYG) assuming 100 percent adherence versus reported adherence to initial screening.437 This analysis evaluated strategies recommended by the USPSTF in 2016 (i.e., flexible sigmoidoscopy every 5 years, colonoscopy every 10 years, annual FIT, annual HSgFOBT, sDNA-FIT every 3 years, CTC every 5 years) and serum testing for mSEPT9 every 1, 2, or 3 years, starting at age 50 years and ending at age 75 years. The analysis assumed a 35 percent adherence to flexible sigmoidoscopy, 38 percent to colonoscopy 42.6 percent to FIT and sDNA-FIT, 33.4 percent to HSgFOBT, 22 percent to CTC, and 85 percent to serum testing. Estimates were derived from the literature, with the exception of the estimate for sDNA-FIT which was assumed to be the same as FIT. This analysis also assumed a 76.2 percent adherence to colonoscopy following an abnormal screening test, but 100 percent adherence to subsequent surveillance colonoscopies. The model was then calibrated to the National Health Interview Survey data that suggests 62.4 percent of individuals are up to date for CRC screening. While this analysis had some limitations, it demonstrated that when reported adherence was taken into account, serum testing averted 23 deaths per 1000 individual screened compared to 20 deaths averted using colonoscopy, and 11 to 16 deaths averted for using flexible sigmoidoscopy, CTC or stool-based testing. This modeling study concluded that adherence rates above 65 to 70 percent would be required for any stool- or serum-based screening tests to match the benefits of colonoscopy with 38 percent adherence.
Tailored Screening
In addition to considering the age to start and stop screening, some current CRC screening recommendations are tailored by race/ethnicity, family history, and multivariable risk assessment (Table 3). No screening recommendations are tailored by sex or gender, although sex is included in multivariable risk assessment.
Age
Because the incidence of CRC in adults under age 50 years has increased over time, in 2018 the ACS issued a qualified recommendation to start screening at age 45 years. Earlier age to initiate screening is primarily based on the epidemiology of disease and modeling studies accounting for the incidence of CRC by age. To date, we have little to no empiric evidence evaluating potential differences in the effectiveness of screening, test performance of screening tests, and the harms of screening in younger age groups (i.e., younger than 50 years vs. 50 years and older). While a few studies of effectiveness (KQ1) recruited adults younger than 50 years, none of these studies report stratified analyses by younger age subgroups. Any age differences in older gFOBT and FS screening trials were not statistically significant. Any differences in the effectiveness of screening in younger ages would be attributable to variation in the underlying risk/incidence of CRC and/or natural history of disease, as well as differences in test accuracy by age. Limited studies demonstrate no difference in test performance (KQ2) of stool testing or harms of colonoscopy in people younger than 50 years. Although we do not hypothesize that colonoscopy or CTC are more harmful in younger adults than older adults, starting screening at younger ages will accrue more procedural harms and ECF, which should be weighed against any incremental benefit of an earlier start to screening.
It is not yet clear whether the spectrum of sporadic CRC in younger adults mimics that seen in a traditionally screened age group. There is evidence to suggest that a large proportion of the increase in CRC in those under age 50 is rectal rather than colon cancer, and that those with earlier onset CRC tend to have distinctive clinical features, a more advanced stage at diagnosis, and poorer overall survival rates, which may be due to a difference in screen- versus symptom-detected disease and/or a more aggressive natural history.438
Current recommendations also differ on the age to stop screening; they range from ages 74 to 85 years. Few studies include enough adults age 75 years and older to conduct robust subgroup analyses for the effectiveness, test accuracy, and harms of screening. Limited empiric evidence suggests that screening colonoscopy may not result in the same benefit in reduction of CRC incidence in adults ages 75 to 79 years compared with those ages 70 to 74 years.128 In addition, limited evidence suggests that CTC has lower sensitivity in older adults350 and that the specificity of sDNA-FIT decreases with advancing age (i.e., more false-positive screening results).351 More robust evidence consistently demonstrates increasing serious harms from colonoscopy (as well as ECF on CTC exams) with advancing age.
Race/Ethnicity
Due to the higher incidence of CRC in Black people compared with white people (and other races/ethnicities), the USMSTF in 2017 recommended screening African Americans at age 45 years, and others at age 50 years. To date, we have little to no empiric evidence evaluating potential differences in the effectiveness of screening, test performance of screening tests, and the harms of screening by race/ethnicity (i.e., Black versus white). While effectiveness studies (KQ1) include nonwhite adults, none report stratified analyses by racial/ethnic subgroups. Again, any differences in the effectiveness of screening would be attributable to varying underlying risk/incidence of CRC and/or natural history of disease. We do not hypothesize that there are any differences in test performance or harms of screening tests by race/ethnicity; as expected, limited studies demonstrate no difference or inconsistent findings in test performance (KQ2) of stool testing or harms of colonoscopy by race/ethnicity.
Most of the evidence to date suggests that differences in risk of CRC and CRC mortality between Black and white adults are primarily driven by differences in receipt of good quality screening and subsequent care of cancer rather than inherent biological differences.439 Furthermore, race is a social construct reflecting much more than heritable disease risk, and is therefore confounded by many other factors that increase the risk for developing cancer and progression of cancer, not limited to behavioral and environmental risk factors.440 However, there remains uncertainty on the differences in carcinogenic mechanisms contributing to observed disparities in outcomes between Black and white adults.441 While there is some evidence for a difference in the distribution of adenomas in the proximal versus distal colon, and in tumor markers in Black versus white people, the clinical significance of this difference on CRC incidence and mortality is unclear.
Sex
Although no recommendations tailor screening by sex, there is evidence to suggest differences in the effectiveness of screening, test performance of screening tests, and harms of screening in men versus women. Screening FS and selected gFOBT trials suggest a greater benefit in CRC mortality reduction in men than women. These results may be explained by the differences in sex-specific CRC incidence and mortality, as well as differences in the distribution of CRC in the colon (i.e., distal versus proximal) between men and women.442, 329 Results were somewhat inconsistent for FIT test accuracy, with some evidence to suggest that sensitivity to detect CRC may be higher (with lower specificity) in men compared with women. A 2019 systematic review evaluating the effect of sex (and age and positive threshold) on FIT test accuracy found that the 95% CIs overlapped between men and women.443 Likewise, results were inconsistent for serious harms from colonoscopy, with some, albeit limited, evidence to suggest slightly higher rates of complications from screening colonoscopy in men compared with women.
Family History
Family history of CRC represents an approximation of genetic risk and is typically characterized in terms of the number of affected relatives, the degree of relatedness, and their age at CRC diagnosis. Individuals at the highest risk are those from families with known genetic syndromes, multiple affected relatives, and/or relatives with early age cancer diagnosis, particularly before age 50 years. At more moderate risk levels are people with one or more FDR or second degree relative (SDR) with later onset cancer. A systematic review of eight large population-based cohorts found that the prevalence of family history of one FDR with early-onset cancer was approximately 0.3 percent, while the prevalence of a single FDR with history of late-onset (after age 60) CRC was more than 3 percent.68 Because our review focuses on the evidence to support screening in generally average risk adults, our discussion about the evidence for screening focuses on those at “moderate risk” as opposed to those with the highest hereditary risk for whom most U.S. guidelines recommend early and more frequent colonoscopy (i.e., colonoscopy is typically recommended at age 40 or 10 years before the relative’s age at diagnosis and repeated at 5–10 year intervals).63, 444 (Appendix H Table 1) The evidence on initiation of earlier screening in people with moderate familial risk for CRC is summarized below and detailed in Appendix H.
A large body of observational evidence spanning multiple countries and populations suggests that CRC risk increases as intensity of family history of CRC increases (more relatives, closer in relation, younger age at diagnosis), providing a plausible hypothesis for a screening benefit at earlier ages in these groups. Pooled risk estimates for a single FDR with CRC over age 60 are elevated compared to people with no family history (1.83, 95% CI, 1.47–2.25).63 A systematic review of risk for CRC associated with family history found that the risk for CRC increased from 1.8 percent for a 50 year old with no family history to 3.4 percent with at least one affected relative and to 6.9 percent with two or more affected relatives.445 A review of reviews conducted for the Canadian guidelines found similar increased levels of risk across nearly all types of studies and populations.444
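The absolute risks quoted above can be restated as simple ratios relative to people with no family history. The short Python sketch below performs only that division on the three percentages reported in the text; the resulting ratios are crude and unadjusted, so they should not be read as the pooled relative risk estimates cited from the literature.

```python
# Absolute CRC risk (percent) for a 50 year old, as quoted in the text
absolute_risk_percent = {
    "no family history": 1.8,
    "at least one affected relative": 3.4,
    "two or more affected relatives": 6.9,
}

baseline = absolute_risk_percent["no family history"]

# Crude ratio and absolute difference versus no family history
risk_ratios = {
    group: risk / baseline for group, risk in absolute_risk_percent.items()
}
risk_differences = {
    group: risk - baseline for group, risk in absolute_risk_percent.items()
}

for group in absolute_risk_percent:
    print(
        f"{group}: ratio {risk_ratios[group]:.2f}, "
        f"difference {risk_differences[group]:.1f} percentage points"
    )
```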
There is limited empiric evidence on the effectiveness of screening, test performance of screening tests, and harms in people at moderately increased risk of CRC due to family history, and no evidence in this group under age 50 years. Although some studies do include people with a family history of CRC, most do not report results stratified by familial risk. One included observational colonoscopy study in health professionals found that in people with an FDR family history of CRC, the association with CRC mortality was no longer statistically significant after 5 years (multivariate HR 0.91; 95% CI, 0.55 to 1.52) compared with a sustained association beyond 5 years in people without a family history (multivariate HR 0.43; 95% CI, 0.32 to 0.58) (p=0.04 for interaction).21 One excluded population-based case-control study found that previous colonoscopy was associated with decreased CRC risk in people with all levels of family history. Regardless of family history status, colonoscopy was associated with a lower CRC risk (OR 0.25 [95% CI, 0.22 to 0.28] for people without family history and OR 0.45 [95% CI, 0.36 to 0.56] for people with family history).446 Neither of these studies reports results for adults under age 50 years. No included studies reported variation of test accuracy or harms by family history.
Multivariable Risk Assessment
Although the concept of individualizing CRC screening recommendations has become more compelling as we have learned more about modifiable and non-modifiable risk factors, multivariate risk assessment for CRC risk is not commonly used in clinical practice70, 447 and currently there is no commonly used/accepted risk assessment tool to help tailor CRC screening.69 In 2019, one international guideline panel, as part of the BMJ Rapid Recommendations series, issued a weak recommendation against screening in asymptomatic adults ages 50 to 79 years with an estimated 15-year CRC risk below 3 percent using a validated multivariate risk assessment tool (QCancer) which includes a number of variables in addition to age, sex, race/ethnicity, and family history.80 In theory, multivariate risk assessment could also identify persons at higher risk for CRC and in whom to initiate screening earlier than age 50 years.
While many risk models or scores have been developed to predict the risk of CRC and/or advanced neoplasia, there are no trials evaluating the benefits and harms of implementing risk assessment to guide CRC screening. Two recent systematic reviews summarize the performance (mainly discrimination) of risk prediction models for CRC and/or advanced neoplasia in asymptomatic general risk adults.447, 448 A 2016 systematic review identified 52 models described in 40 studies for assessing risk of CRC or advanced neoplasia in average-risk populations; in aggregate these 52 models considered 87 different risk factors obtained through medical records, self-reported questionnaires, and laboratory testing inclusive of genetic biomarkers.447 Commonly included factors were age, sex, family history (generally specified as FDR), BMI, and lifestyle factors (e.g., smoking, alcohol, diet, exercise). Overall, the discrimination of the models ranged from an area under the curve (AUC) of 0.65 to 0.70. The authors found that, in general, models including lifestyle behaviors (obtained by questionnaire) and genetic biomarkers did not have better discrimination than models with risk factors that could be routinely obtained through medical records (i.e., age, sex, family history, smoking, +/− alcohol). In external validation studies, 10 of these models showed acceptable discrimination, AUC 0.71 to 0.78. These include two models containing only three variables (age, sex, and BMI or family history).447 A 2018 review focused on multivariate risk tools for advanced neoplasia only and identified 17 original risk scores described in 22 unique studies.448 Findings from this review were consistent with the 2016 review in the commonly included factors and discrimination (AUC) of the risk tools. This review also demonstrated a substantial variation in discrimination even for the same risk score across different studies. 
The review conducted meta-analyses of discrimination for each risk score evaluated in more than one study and found that the most evaluated risk scores (4 or more studies) had less optimal discrimination (AUC 0.61 [95% CI, 0.59 to 0.64] to 0.64 [95% CI, 0.60 to 0.68]). The risk tool with the highest discrimination (AUC 0.70 [95% CI, 0.61 to 0.79]) was only evaluated in two studies.
Two publications externally validated a series of risk models identified in the 2016 review in large population-based cohorts in the United Kingdom and Europe.449, 450 One study externally validated 14 different risk models to predict CRC in a large (n=373,112) population-based cohort in the UK (UK Biobank).449 Another study externally validated 16 different risk models for CRC in two large population-based cohorts, the European Prospective Investigation into Cancer and Nutrition (EPIC) (n=491,992) and United Kingdom Biobank (n=475,629).450 These two studies externally validated overlapping risk models. Overall, these two studies found that the performance of published risk models for CRC varied widely. Both studies concluded that there are several models (including QCancer) with easily identifiable risk factors that possess good calibration and discrimination, and thus are promising for implementation. Both studies call for modeling, with or without clinical impact studies, to further evaluate their promise for clinical practice.
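For readers less familiar with the AUC, it can be read as the probability that a randomly chosen person with the outcome (here, advanced neoplasia) receives a higher risk score than a randomly chosen person without it, with 0.5 indicating chance and 1.0 perfect discrimination. The Python sketch below computes the AUC directly from that definition by comparing every case with every non-case. The scores are hypothetical values invented for illustration and do not come from any of the reviewed risk tools.

```python
def concordance_auc(case_scores, noncase_scores):
    """AUC as the share of case/non-case pairs in which the case scores higher.

    Ties count as half a concordant pair.
    """
    pairs = 0
    concordant = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            pairs += 1
            if case > noncase:
                concordant += 1.0
            elif case == noncase:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one case and one non-case")
    return concordant / pairs


# HYPOTHETICAL risk scores, for illustration only
cases = [0.8, 0.6, 0.4]           # people with advanced neoplasia
noncases = [0.7, 0.5, 0.3, 0.2]   # people without advanced neoplasia

auc = concordance_auc(cases, noncases)
print(f"AUC = {auc:.2f}")
```

On this reading, an AUC in the range of 0.61 to 0.70 means that the score ranks a person with advanced neoplasia above a person without it in roughly 6 or 7 of every 10 such pairs.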
Only four studies examined risk prediction for advanced colorectal neoplasia specifically in adults younger than age 50 years.451–454 These studies were development and initial validation studies in large generally asymptomatic populations in Korea. The models demonstrated that a combination of risk factors similar to those in other models (e.g., age, sex, BMI, family history, smoking, laboratory tests) can identify people at higher risk for advanced neoplasia (AUC from 0.66 to 0.72). These models do not appear to be externally validated. In general, these studies included populations with lower average BMI (when reported) than U.S. populations, and given the 10-fold difference in CRC incidence internationally, there is a need to validate in broader populations applicable to U.S. populations.
Limitations of the Review
Our review focused on the benefit of CRC screening on mortality, the test accuracy of generally available CRC screening tests, and the potential serious harms of these screening tests in average-risk adults. We therefore excluded studies in symptomatic people and people with the highest hereditary risk; these exclusion criteria resulted in very scant evidence for certain technologies such as capsule endoscopy and newer serum- and urine-based testing. We also narrowly included trials or prospective cohort studies designed to evaluate the impact of screening on CRC incidence or mortality. We acknowledge that excluded well-designed nested case-control studies may be at lower risk of bias than included prospective cohort studies (e.g., by more accurately capturing screening history and exam indication). While our review addressed some important contextual issues related to screening (e.g., adherence to testing, risk assessment to tailor screening, test acceptability and availability), we did not include an assessment of the mechanism of benefit of the different screening tests (primary prevention vs. early detection), methods to increase screening adherence, prevalence of interval cancers between screenings, potential harms of overdetection of adenomas or unnecessary polypectomy, technological enhancements to improve the diagnostic accuracy of colonoscopy, and surveillance after screening. Our review was commissioned along with microsimulation decision models from CISNET, which address the comparative effectiveness and tradeoffs of screening strategies that vary in ages to start and stop, interval of screening, and screening modality; therefore, we do not include modeling studies in our review. 
At the request of the USPSTF, we did, however, summarize recent decision modeling analyses, other than those conducted by CISNET, that addressed the comparative effectiveness and tradeoffs of various screening strategies, with particular attention to those analyses evaluating age to start screening (Appendix F). Given our U.S.-centric focus, we limited our review to studies conducted in countries with the highest applicability to U.S. practice, and given resource limitations, only articles published in English were considered for inclusion.
Emerging Issues and Future Research Needs
Screening for CRC is a complex and active area of research. Unlike other routinely recommended and conducted cancer screening, there are multiple viable options for CRC screening, with: 1) varying levels of evidence to support their use, 2) intended aim to detect cancers, potential precursor lesions, or both, 3) test acceptability and adherence, 4) intervals of time to repeat screening, 5) need for followup testing (including surveillance incurred), 6) associated serious harms, 7) availability in practice, 8) associated cost, and 9) advocacy for their use. The best-quality evidence, in terms of robust study design and reduction in mortality, is limited to FS and Hemoccult II, modalities that are no longer routinely used for screening in the United States. For technologies that identify a spectrum of disease similar to that identified by the endoscopy and occult blood stool tests evaluated in trials, rigorous test accuracy studies are likely sufficient to support adoption of newer tests without new screening trials. Ongoing comparative RCTs may also fill this evidence gap for currently used tests (Appendix I), and, assuming tests detect a similar spectrum of disease, modeling studies can provide valuable insight into the comparative net benefit of tests, especially with (rapid) technological advancements that may improve test accuracy and/or reduce harms. Decision modeling can synthesize available data to inform the effectiveness of a wider range of testing modalities than possible in practice, including evaluation of newer tests, different test intervals, and populations with differing risk for CRC. 
Current decision models will be informed by evidence that addresses gaps in our understanding of the clinical importance of smaller lesions (<10 mm), the role of sessile serrated lesions in both the natural history of disease and the performance of screening tests to detect these lesions, variation in the disease process across the large intestine (rectum, distal and proximal colon), and any variation in the natural history of disease or in test accuracy by age, sex, race/ethnicity, and family history. In addition, evidence to address gaps in understanding around test accuracy and adherence to screening over sequential rounds of screening is also important to inform current decision models.
Much-needed future research should include trials or well-designed cohort studies in average-risk populations to evaluate the effects of programs of screening using colonoscopy, the best-performing FITs, CTC, and new serum- and urine-based tests on cancer mortality and incidence. Studies including adequate sampling of adults ages 40 to 49 years, people with moderate family history risk, and different race/ethnicities to allow for robust subgroup analyses, and/or employing multivariate risk assessment to guide screening would also be important in understanding how best to implement screening. In addition, studies to confirm the screening test performance of promising FITs with thus-far limited reproducibility (i.e., only one study) would be helpful to offer FIT options other than OC-Sensor and OC-Light. Likewise, test accuracy studies adequately powered for cancer detection to establish and/or confirm the screening test performance of promising serum- and urine-based tests (e.g., high sensitivity to detect CRC and/or advanced adenomas) are needed to bolster a menu of options for screening that may have greater acceptability and feasibility (and therefore adherence). In particular, promising serum tests include Epi proColon, which has a single adequately powered test accuracy study showing sensitivity at or below, and specificity well below, that of commonly studied FITs, and a novel serum test for circulating tumor DNA (LUNAR-2), which has a large prospective cohort study (ECLIPSE) in progress.455 Serum-based testing to screen for colon cancer is an active field of study, with other tests at various stages of development and testing.456–458 The metabolomic urine test, PolypDx, has a single small study establishing its ability to detect advanced adenomas on par with Epi proColon but thus far no data on test accuracy to detect cancer. 
In general, test accuracy studies to clarify any difference in the detection of proximal versus distal lesions, and the detection of precursor lesions with more potential for malignant transformation (e.g., sessile serrated lesions), would also be informative. It is also important to understand the contribution of technological advancements to existing technology (e.g., enhancements to optical colonoscopy or CTC) on test performance in average-risk adults as well as on reducing harms (e.g., decreasing radiation exposure, less aggressive bowel preparation). Last, the clinical impact of identifying extracolonic findings remains unknown. More complete and consistent reporting of the downstream benefits and harms of the initial detection (subsequent workup and definitive treatment) of C-RADS E3 and E4 findings needs to be published in observational studies or trials with longer-term followup.
Conclusion
CRC screening continues to be a necessary and active field of research. Since the 2016 USPSTF recommendation, we have gained a greater appreciation of the increasing CRC incidence in adults under age 50 years, and we have more evidence on the effectiveness and test accuracy of newer stool tests (FIT and sDNA-FIT) and on the test accuracy of an FDA-approved serum test (Epi proColon) for use in persons declining colonoscopy, FS, gFOBT, or FIT. We have also identified a new metabolomic urine test (PolypDx) with limited test accuracy data, thus far limited to detection of adenomas. We also have more data on colonoscopy harms, demonstrating higher estimates of major bleeding than were appreciated in 2016.
Current screening modalities, including colonoscopy, FS, CTC, various high-sensitivity stool-based tests, and a serum-based test, have different levels of evidence to support their use, different test performance to detect cancer and precursor lesions, and different risks of harms. At this time, comparative studies of the various screening tests cannot answer questions of the relative benefit and harms (tradeoffs) between the tests. The use of accompanying decision analyses will help inform the comparative benefits and harms of the screening strategies. Recommendations regarding which screening tests to use, or whether there is a hierarchy of preferred screening tests, will depend on the decisionmaker’s criteria for sufficiency of evidence and weighing the net benefit. Actual implementation of recommendations will depend on a number of additional factors, including patient preference and available resources.