NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Viswanathan M, Reddy S, Berkman N, et al. Screening to Prevent Osteoporotic Fractures: An Evidence Review for the U.S. Preventive Services Task Force [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2018 Jun. (Evidence Synthesis, No. 162.)


Key Questions and Analytic Framework

The investigators, U.S. Preventive Services Task Force (USPSTF) members, and Agency for Healthcare Research and Quality (AHRQ) Medical Officers developed the scope, key questions (KQs), and analytic framework (Figure 1) that guided the literature search and review. The KQs are as follows.

This figure is an analytic framework depicting the key questions (KQs) within the context of the populations, interventions, comparisons, outcomes, time frames, and settings (PICOTS) relative to the effectiveness and harms of screening and treatment for osteoporosis. The figure illustrates the relationship between osteoporosis for asymptomatic adults age 40 years and older without known reasons for secondary osteoporosis. KQ1 concerns the relationship between screening and risk assessment and reduced fracture-related morbidity and mortality. The population is shown as being at an increased risk or not at an increased risk (KQ2a and KQ2b respectively). If the population is not at increased risk, they are shown as circling back to the beginning of the pathway (KQ2c). KQ2d examines bone measurement testing in screening. For those at an increased risk, the harms of screening are examined (KQ3). The pathway indicates that populations at an increased risk are determined to have normal bone mass or osteoporosis or low bone mass. If the population has osteoporosis or low bone mass, they continue to treatment (KQ4). A pathway from osteoporosis or low bone mass following treatment leads to reduced fractures and harms of treatment are also examined (KQ5). Finally, the outcome of reduced fracture-related morbidity and mortality is depicted.

Figure 1

Analytic Framework. Abbreviations: KQ=key question

Key Questions


  1. Does screening (clinical risk assessment, bone density measurement, or both) for osteoporotic fracture risk reduce fractures and fracture-related morbidity and mortality in adults?
  2a. What is the accuracy and reliability of screening approaches to identify adults who are at increased risk for osteoporotic fracture?
  2b. What is the evidence to determine screening intervals and how do these vary by baseline fracture risk?
  3. What are the harms of screening for osteoporotic fracture risk?
  4a. What is the effectiveness of pharmacotherapy for the reduction of fractures and related morbidity and mortality?
  4b. How does the effectiveness of pharmacotherapy for the reduction of fractures and related morbidity and mortality vary by subgroup, specifically in postmenopausal women, premenopausal women, men, younger age groups (age <65 years), older age groups (age ≥65 years), baseline bone mineral density, and baseline fracture risk?
  5. What are the harms associated with pharmacotherapy?

We include two contextual questions to help inform the report. We do not show these questions in the analytic framework because they were not analyzed using the same rigorous systematic review methodology as the studies that met the report’s inclusion criteria. At the title and abstract and full-text review stages, reviewers flagged studies that were not included to answer the KQs but that related to the contextual questions.

Contextual Questions

  1. What is the evidence from modeling studies about different fracture risk thresholds for identifying patients for further evaluation or treatment?
  2. What is the evidence from modeling studies about the effectiveness of screening strategies (screening, risk assessment, or bone measurement) that use (a) different ages at which to start and stop screening and (b) different screening intervals?

Contextual Question 1 is addressed in the introduction. Contextual Question 2 is addressed in the Results section (for screening intervals, along with other included evidence on screening intervals) and in the discussion section (for starting and stopping ages).

Search Strategies

We searched PubMed, the Cochrane Library, and Embase for English-language articles published from November 1, 2009, through October 1, 2016, with active surveillance through March 23, 2018. We used Medical Subject Headings as search terms when available and keywords when appropriate, focusing on terms to describe relevant populations, screening tests, interventions, outcomes, and study designs. Appendix A describes the complete search strategies. We conducted targeted searches for unpublished literature by searching ClinicalTrials.gov, Drugs@FDA.gov, Cochrane Clinical Trials Registry, and the World Health Organization International Clinical Trials Registry Platform. To supplement electronic searches, we reviewed the reference lists of pertinent review articles and studies meeting our inclusion criteria and added all previously unidentified relevant articles. We included citations from the previous report and from other systematic reviews in our handsearch yield.

Study Selection

Newly Identified Studies

We selected studies on the basis of inclusion and exclusion criteria developed for each KQ for identifying populations, interventions, comparators, outcomes, timing, settings, and study designs (PICOTS) (Appendix B). Appendix C lists studies excluded at the full-text review stage. We imported all citations identified through searches and other sources into EndNote X7.

Two investigators independently reviewed titles and abstracts. Two investigators then independently reviewed the full text of each article whose abstract either reviewer had marked for potential inclusion. Two experienced team members resolved disagreements.


We included studies that focused on adults age 40 years or older. For screening questions (KQs 1–3), we required studies to have included a majority of participants without history of low-trauma fractures, endocrine disorders likely to be related to metabolic bone disease, or chronic use of glucocorticoid medications. If information on the proportion of low-trauma fractures was unavailable in the report, we sent an inquiry to the author. In cases of nonresponse, we planned to include these studies and noted lack of information on prevalent fracture rates. For treatment questions (KQs 4–5), we also required that a majority of included participants had an increased fracture risk (as defined by the study [typically bone mineral density (BMD) status]).


For screening questions (KQs 1–3), we searched for studies on risk assessment tools, bone measurement testing, or a combination of risk assessment and bone measurement testing. Eligible risk assessment tools included any paper-based or electronic instrument that compiled and compared various demographic or clinical characteristics for individuals to establish an absolute or categorical risk estimate. Eligible bone measurement testing included dual-energy X-ray absorptiometry (DXA, centrally or peripherally measured), quantitative ultrasound, dental tests, vertebral fracture assessment, and trabecular bone score (Appendix B). All tests and instruments needed to be feasible for primary care settings (i.e., could be ordered, administered, or interpreted by primary care providers) and be available in the United States; we excluded tests and instruments that were not commercially available. We required instruments to have been externally validated. For tests and instruments that included bone measurement testing (imaging and nonimaging machine-based tests), we required that the investigators measure bone mineral density in participants before the occurrence or identification of the fracture.

For treatment questions (KQs 4–5), we limited eligible interventions to pharmacotherapy approved by the U.S. Food and Drug Administration (FDA) for treating or preventing osteoporosis. These include (a) antiresorptive therapies, specifically bisphosphonates, estrogen agonists/antagonists, hormone therapy, and Receptor Activator of Nuclear Factor kappa-B ligand (RANKL) inhibitors and (b) anabolic therapies, specifically parathyroid hormone. We did not summarize the evidence on calcitonin because it is no longer a first-line therapy for osteoporosis.


For the overarching question on the benefits and harms of screening and health outcomes (KQ 1 and KQ 3), we included studies that compared screened with unscreened groups. For questions on screening accuracy and screening intervals (KQ 2), we included studies that evaluated fracture risk assessments or bone tests. For treatment benefits (KQ 4), we included studies comparing treatment with placebo. For treatment harms (KQ 5), we included studies comparing treatment with placebo or no treatment.


For KQ 1 and KQ 4, we included data on fractures, fracture-related morbidity, fracture-related mortality, or all-cause mortality. Fractures included major osteoporotic fractures defined as fractures of the hip, distal radius, proximal humerus, and vertebrae (clinically presenting). We also included and recorded separately morphometric (asymptomatic) vertebral fractures. For KQ 2, eligible outcomes included test characteristics (e.g., accuracy, reliability) for bone measurement tests and accuracy and reclassification for fracture risk assessment instruments. For KQ 3, we looked for evidence on outcomes such as unnecessary radiation, labeling, anxiety, and false-positive results. We focused our systematic review on studies of risk assessment tools and bone measurement tests that predicted future fracture risk as an outcome, rather than identification of osteoporosis defined operationally by BMD. For KQ 5, eligible harms included serious adverse drug events, discontinuation attributed to adverse events, cardiovascular events, hot flashes, esophageal cancer, gastrointestinal events, osteonecrosis of the jaw, atypical fractures of the femur, and rashes.


Outcomes for KQ 1 studies had to be measured 6 months or more following screening. Although we had planned to limit the outcomes of KQ 4 and KQ 5 studies to those measured 6 months or more after the initiation of treatment, we also included harms (KQ 5) measured at shorter intervals for completeness of reporting. All timings were considered for KQ 2 and KQ 3 (although for studies of fracture prediction, we required that assessments of outcomes occur after fracture risk assessment or machine-based tests).


We required the overarching screening question (KQ 1) to be in primary care settings or other settings similar to primary care. For all other questions, we also included studies in specialist settings. For all KQs, we limited our search to studies conducted in the United States or in countries with very high human development indexes.59

Study Designs

For screening questions (KQs 1–3), we included randomized controlled trials (RCTs), controlled clinical trials, and systematic reviews of trials. For questions on screening accuracy and screening intervals (KQs 2 and 3), we also included systematic reviews of observational studies and observational studies other than case series and case reports. For treatment questions (KQ 4 and KQ 5), we included systematic reviews, RCTs, and controlled trials published since any recent relevant review. For harms (KQ 5), we also included observational studies published since any recent relevant review.

Studies in the 2010 USPSTF Review

We applied, dually and independently, the inclusion and exclusion criteria described above to all studies included in the 2010 USPSTF review. (Note that the review was published in 2010 [refs. 2, 3] and the recommendation statement in 2011 [ref. 1].) We resolved disagreements by discussion and consensus; if necessary, we sought adjudication of conflicts from other experienced team members.

We also conducted a check of the quality ratings of studies included in 2010 to ensure that studies met our current quality rating criteria. If the reviewer did not agree with this earlier assessment, we re-rated the quality of the study through dual review. Among included studies from the 2010 report, one reviewer checked for errors in previously generated abstraction tables and updated them as needed.

Data Abstraction and Quality Rating

We abstracted pertinent information from each newly included study; details included methods and patient PICOTS. A second investigator checked all data abstractions for completeness and accuracy. Two investigators independently evaluated the quality (internal validity) of each study, corresponding to USPSTF predefined methods criteria.60 The criteria by which the USPSTF requires individual study quality to be assessed differ by study design, but ultimately each study is to receive a rating corresponding to good, fair, or poor quality. We selected several tools for developing quality ratings, with specific tools corresponding to the design of the study that was being evaluated.

For studies with treatment outcomes (KQs 1, 3, 4, and 5), we rated quality as good, fair, or poor based on a tool developed by the Cochrane Collaboration for assessing the risk of bias of RCTs.61 When relevant, we also applied supplementary items developed by the RTI-University of North Carolina Evidence-based Practice Center for evaluating additional bias concerns relevant to cohort and case-control study designs.62

For screening studies (KQ 2) assessing diagnostic test accuracy, we used the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool;63 for diagnostic prediction model studies, we used a preliminary version of the in-development Prediction Model Study Risk of Bias Assessment Tool (PROBAST).64 Based on these two tools, we evaluated each study as low, unclear, or high risk of bias. Low corresponds to good quality, high to poor quality, and unclear identifies studies for which we could not make a determination on the risk of bias.

The quality of existing systematic reviews that we integrated into this review was evaluated using ROBIS,63,65 a tool designed to evaluate the risk of bias of systematic reviews. Using this tool, each systematic review was rated as low, unclear or some concerns, or high risk of bias. As with the PROBAST and QUADAS tools, low risk of bias corresponds to good quality, high to poor quality, and unclear represents uncertainty. Appendix C describes the quality rating criteria for each tool. We did not review the quality of individual studies contained within any good-quality systematic reviews that we included.

We resolved disagreements by discussion and consensus. We rated studies with fatal flaws as poor quality. For RCT and cohort studies included to answer KQ 1, 3, 4, or 5, “fatal flaws” that could result in poor-quality (i.e., high risk of bias) ratings included the following: groups assembled initially were not close to being comparable or were not maintained throughout the study; unreliable or invalid measurement instruments were used or not applied equally among groups (including not masking outcome assessment); and key confounders were given little or no attention. For RCTs, lack of intention-to-treat analysis was also a fatal flaw. For case-control studies pertaining to KQ 3 or 5, fatal flaws included major selection or verification (diagnostic workup) bias, a response rate less than 50 percent, or inattention to confounding variables. For KQ 2 screening studies, fatal flaws in at least one domain could lead to poor-quality ratings. Such flaws include cross-sectional design for risk prediction (i.e., predictors measured at same time as incident fracture in cases) and spectrum bias resulting from subgroups created through convenience groupings (such as quintiles) that do not represent a clinically rational categorization of participants.

Data Synthesis and Analysis

In Chapter 3 on results, we describe the yield from newly identified included studies and studies identified in the previous review that continue to meet current inclusion and quality criteria. We then present a synthesis of the last update and current findings.

When at least three similar studies were available, we conducted quantitative synthesis of areas under the receiver operating characteristic curve (AUCs) and event rates in studies with random-effects models using the inverse-variance weighted method (DerSimonian and Laird). For studies presenting multiple doses of medications, we selected the dose closest or equal to the FDA-approved dose, unless otherwise specified. We conducted sensitivity analyses using restricted maximum likelihood estimates to explore whether DerSimonian and Laird random-effects models underestimate variance for small meta-analyses.66
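The DerSimonian and Laird approach named above can be sketched in a few lines. The following Python function is an illustration only, not the OpenMetaAnalyst implementation used in the review: the function name, the pooling of estimates on their raw scale, and the normal-approximation 95 percent confidence interval are assumptions, and the AUC values in the usage lines are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study estimates with a DerSimonian-Laird random-effects model.

    effects   -- per-study point estimates (for example, AUCs)
    variances -- per-study sampling variances (squared standard errors)
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
    }


# Hypothetical AUCs and variances from three studies (not data from the review).
result = dersimonian_laird([0.66, 0.71, 0.64], [0.0004, 0.0009, 0.0006])
print(result)
```

When the between-study variance estimate is zero, the random-effects weights reduce to the fixed-effect weights, so the two pooled estimates coincide.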

For all quantitative syntheses, we calculated the chi-squared statistic and the I² statistic (the proportion of variation in study estimates due to heterogeneity) to assess statistical heterogeneity in effects between studies.67,68 An I² from 0 to 40 percent might not be important, 30 percent to 60 percent may represent moderate heterogeneity, 50 percent to 90 percent may represent substantial heterogeneity, and 75 percent to 100 percent represents considerable heterogeneity.61 The importance of the observed value of I² depends on the magnitude and direction of effects and on the strength of evidence for heterogeneity (e.g., p-value from the chi-squared test or a confidence interval for I²). However, as precision and the number of subjects increase, I² may become inflated toward 100 percent, and may not reflect clinically relevant heterogeneity.69 All quantitative analyses were conducted using OpenMetaAnalyst.70 We additionally conducted sensitivity analyses using Comprehensive Meta Analysis.71
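As a rough illustration of the heterogeneity statistics described above, the sketch below computes Cochran's Q (the chi-squared heterogeneity statistic) and I² from study estimates and their variances, then lists every interpretation band quoted above that contains the result. The function names are hypothetical, and treating the band limits as inclusive is an assumption. Because the bands overlap, a single I² value can fall into two of them.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, I-squared in percent)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, truncated at zero and expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Overlapping interpretation bands as quoted in the text (limits assumed inclusive).
BANDS = [
    (0.0, 40.0, "might not be important"),
    (30.0, 60.0, "moderate"),
    (50.0, 90.0, "substantial"),
    (75.0, 100.0, "considerable"),
]


def interpret_i2(i2):
    """List every band label whose range contains the I-squared value."""
    return [label for low, high, label in BANDS if low <= i2 <= high]


# Hypothetical estimates from two studies (not data from the review).
q, df, i2 = heterogeneity([0.0, 4.0], [1.0, 1.0])
print(q, df, i2, interpret_i2(i2))
```

The labels alone are not a verdict: as the text notes, the importance of an observed I² also depends on the size and direction of effects and on the strength of evidence for heterogeneity.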

We interpret AUCs close to 0.50 as being no better than chance; AUCs of 1.0 represent perfect test accuracy.
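One way to see why 0.50 means chance and 1.0 means perfect accuracy is the pairwise reading of the AUC: the probability that a randomly chosen person who went on to fracture received a higher risk score than a randomly chosen person who did not, with ties counted as one half. The sketch below computes that quantity directly. The function name and the scores are hypothetical and are not taken from any included study.

```python
def auc_from_scores(scores_fracture, scores_no_fracture):
    """AUC as the share of (fracture, no-fracture) pairs ranked correctly.

    A pair counts 1 if the fracture case has the higher score,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    if not scores_fracture or not scores_no_fracture:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for s in scores_fracture:
        for t in scores_no_fracture:
            if s > t:
                concordant += 1.0
            elif s == t:
                concordant += 0.5
    return concordant / (len(scores_fracture) * len(scores_no_fracture))


# Hypothetical risk scores: perfect separation, then no discrimination.
print(auc_from_scores([0.9, 0.8], [0.1, 0.2]))  # every fracture case ranks higher
print(auc_from_scores([0.3, 0.6], [0.4, 0.5]))  # half of the pairs rank correctly
```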

The discussion chapter summarizes conclusions from the previous 2010 review, the 2011 USPSTF statement, and the implications of the new synthesis for previous conclusions. In addition, we assess the overall summary of the body of evidence for each KQ using methods developed by the USPSTF, based on the number, quality, and size of studies; consistency of results among studies (similar magnitude and direction of effect); and applicability of the results to the population of interest.

Expert Review and Public Comment

A draft report was reviewed by content experts, representatives of federal partners, USPSTF members, and AHRQ Medical Officers, and was revised based on comments, as appropriate, to include suggested citations that met our inclusion criteria. Additionally, we updated the report to add details on a newly published trial of screening72 and summarized the accuracy of clinical risk assessment instruments on identifying osteoporosis in younger women.

USPSTF Involvement

This review was funded by AHRQ. Staff of AHRQ and members of the USPSTF participated in developing the scope of the work and reviewed draft manuscripts, but the authors are solely responsible for the content.

