This chapter discusses best practices and recommendations for developing, validating, and using decision-analytic models in general, as well as in the context of systematic reviews that inform the decisionmaking of stakeholders such as the United States Preventive Services Task Force (USPSTF). We took a multipronged approach to gathering information on best practices and recommendations for the development of decision models and their use in practice. First, we sought to identify existing recommendations for best practices in decision and simulation modeling by conducting a literature search to document current best practice recommendations and identify gaps in the literature. To complement our literature search, we conducted a focus group of expert modelers to discuss, characterize, and qualify best practices in decision and simulation modeling in general. Issues covered included model formulation and characterization, model development and construction, handling and presentation of modeling assumptions, definition and presentation of parameters, outcomes to incorporate into the model, model analysis, model testing, validation and implementation, presentation and communication of results, and perceived gaps in the literature. Lastly, we created a profile of potential best practices in coordinating a simultaneous systematic review and modeling exercise, based on responses from interviewees selected because of their involvement in recent decision analyses used to inform the recommendations of the USPSTF. We interviewed modelers involved in those analyses, as well as USPSTF members, about lessons learned from developing decision and simulation models alongside systematic reviews.
Summary of Literature on Best Practices
Systematic Review Methods
A search was conducted in MEDLINE from database inception to March 2010 to locate best practice recommendations for economic analyses and decision analyses. We relied primarily on keyword searches, such as “decision analytic model” or “Markov model,” because MeSH terms are not well designed to facilitate indexing of such literature. We also used the search strategy employed by Philips et al.40 to update, for the period 2005 to March 2010, the search they completed for their review of good practice guidelines. Specific search strings are provided in Appendix 7. Articles were not limited by country of origin, but they were limited to the English language; a trial search without a language limitation did not yield any relevant article titles in a language other than English. To complement the review, we searched the grey literature for published guidelines from professional societies, governmental bodies, and other health-related organizations using Google search engines.
Titles and abstracts were initially screened by one reviewer. Papers were included if they provided either (1) general guidance on key elements that constitute a good decision or simulation model or (2) explicit criteria against which the quality or validity of a decision or simulation model might be assessed. While the goal was to identify articles that discussed best practices applicable to decision-analytic and simulation modeling in general, we also examined modeling practices for specific disease conditions. For the latter, papers were included if they provided either (1) insight on modeling that could be applicable to other conditions or (2) a comprehensive and critical review for a specific clinical domain. Full texts of selected papers were retrieved, and each was examined for inclusion by three reviewers. Disagreements regarding inclusion status were resolved through consensus. Articles not satisfying any of the inclusion criteria were excluded.
Systematic Review Results
Figure 2 depicts the article flow chart of all searches. The initial search produced 616 articles. A total of 42 articles underwent full review, of which 39 were retained for the final set.
The final set of articles, listed in Table 20, was classified into five different categories:
- Articles that propose and/or discuss good modeling practices (N=7).
- Articles that discuss the roles, uses and/or value of modeling in general (N=4).
- Articles that focus on a specific aspect of modeling such as uncertainty or validity (N=20).
- Articles that propose comprehensive guidelines for modeling in a specific clinical area, such as coronary care or screening (N=3).
- Articles that review and compare models in specific clinical areas or that address comparative modeling (N=5).
Discussion of Selected Best Practices Articles
The articles shown in Table 20 provide insight into several key issues pertaining to the establishment of best practice guidelines; these include: (1) model definition, (2) purpose of a model and its appropriate use, (3) model evaluation, and (4) challenges in using models. These core concepts can be integrated into a set of recommendations and guidelines for the use of modeling alongside systematic reviews, and to inform key stakeholders, such as the USPSTF, regarding the employment of models in their studies and recommendations.
Weinstein et al. define a model as an “analytic methodology that accounts for events over time and across populations based on data drawn from primary and/or secondary sources” (p. 350).27 This definition is further developed by Weinstein et al. as “a logical mathematical framework that permits the integration of facts and values and that links these data to outcomes that are of interest to health-care decisionmakers” (p. 9).54 While similar, the two definitions differ in some respects. One difference is the specification of an “analytic framework”27 versus a “logical mathematical framework.”54 The latter definition's explicit reference to mathematics suggests that it may be narrower in scope. Moreover, Weinstein et al. preface the definition with the requirement that a model “synthesize[s] evidence on health consequences and costs from many different sources.”54 This is an important point to consider in defining a model: the synthesis of multiple, disparate data in order to inform or support a decisionmaker. Specifically, the synthesis of multiple data sources distinguishes decision modeling from other modeling methodologies, such as statistical modeling.
Model Purpose and Use
Beyond the definition of a model is the discussion of the purpose of a model as well as the appropriate application of a model to a particular situation. Weinstein et al. propose that the purpose of a model is to “structure evidence on clinical and economic outcomes in a form that can help inform decisions about clinical practices and health care resource allocation” (p. 9).54 They go on to suggest that the “value of a model lies not only in the results it generates, but also in its ability to reveal theoretical connections between inputs (i.e., data and assumptions) and outputs in the form of valued consequences and costs” (p. 10).54
Brennan and Akehurst stress the fundamental cultural difference between biomedical researchers and the health technology assessment/health economics communities.55 The latter have a paradigm of cost-effectiveness and the need to support policy decisions, while the former have a paradigm of experimental data and hypothesis testing. As a result, as stated by Luce, health economists tend to recognize and accept “the necessity of various types of analytical models to enrich and broaden results from experimental research when it is available and to find substitutes for experimental data when it is not available”.80 Brennan and Akehurst propose that decision-analytic modeling plays a role in five different ways: (1) extending results from a single trial, (2) combining multiple sources of evidence to answer policy questions, (3) generalizing results from one context to another, (4) modeling to inform research strategy and design, and (5) modeling and analyzing uncertainties in the knowledge base.55
Models have been shown to be of considerable value in comparing test-and-treat strategies to make recommendations on testing for a wide variety of diseases, helping to establish the links between test results and patient-relevant outcomes.76 Trikalinos and colleagues describe the characteristics of many comparisons of test-and-treat strategies for which modeling is especially helpful.76 These characteristics include: (1) integration of evidence from disparate sources, (2) evaluation of uncertainties and assumptions, (3) analysis and evaluation of tradeoffs, (4) determination of the effect of a succession of technologies, and (5) consideration of hypothetical conditions for diseases with no effective treatment.
Examining both the definition and the purpose of a model sets the stage for exploring the conditions or situations in which models are well suited to serve their purpose, namely, helping decisionmakers make more informed decisions through the synthesis of information.
With the definition, purpose, and uses of models examined, it is then instructive to ask, “What makes a good model?” or “How does one evaluate a model?” The literature offers some answers to these questions. Several baseline conditions are discussed in the literature, which specify the basic requirements for the use of a model; these can be considered a minimum threshold of characteristics. As stated by Weinstein et al., “models should be used only after careful testing to ensure internal accuracy (internal validity), to ensure that their inputs and outputs are consistent with available data (calibration), and to ensure that their conclusions make sense (face validity).”27
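Calibration, as used in the quotation above, means adjusting uncertain inputs until the model's outputs agree with observed data. The sketch below is a minimal illustration under stated assumptions, not a method taken from the cited guidance: it uses a hypothetical one-parameter model in which cumulative incidence after `years` years is `1 - (1 - p)**years` (constant annual incidence `p`, no competing mortality), a hypothetical calibration target, and a simple grid search. The function names and the target value are invented for illustration.

```python
"""Grid-search calibration of a toy one-parameter model (illustrative sketch)."""


def cumulative_incidence(p: float, years: int) -> float:
    """Hypothetical model: probability of developing disease within `years`,
    assuming a constant annual incidence p and no competing mortality."""
    return 1.0 - (1.0 - p) ** years


def calibrate(target: float, years: int, grid_size: int = 1000, p_max: float = 0.2) -> float:
    """Return the grid value of p whose model output is closest to the
    observed target under a squared-error criterion."""
    best_p, best_err = 0.0, float("inf")
    for i in range(grid_size + 1):
        p = p_max * i / grid_size  # evenly spaced candidate values in [0, p_max]
        err = (cumulative_incidence(p, years) - target) ** 2
        if err < best_err:
            best_p, best_err = p, err
    return best_p


if __name__ == "__main__":
    # Hypothetical calibration target: 20% cumulative incidence at 10 years
    p_hat = calibrate(target=0.20, years=10)
    print(f"calibrated annual incidence: {p_hat:.4f}")
```

Real calibrations usually involve several parameters and multiple targets and may rely on more efficient search or Bayesian methods; a grid search is used here only because it makes the "match outputs to data" logic explicit.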
Weinstein et al. summarize the key components of model validation and evaluation: (1) transparency; (2) verification; (3) corroboration; (4) face validity; and (5) accreditation.27 Much attention is placed on validity in this evaluation methodology, with strong emphasis on assessing face validity and on using multiple modeling efforts to establish convergent validity. Regarding model validation, they also warn about important threats such as (1) changes in context that are not accounted for in models, (2) the rapid pace of technological change, and (3) changes in population and subpopulation characteristics that are not included in the model. They caution modelers to be aware of changing contexts and of the applicability of models to populations not initially studied.
In a subsequent article on the assessment of decision models, Weinstein et al. enumerate a more detailed set of criteria for model evaluation.54 The structure of a model must first be assessed to determine two main points: (1) the degree to which the inputs and outputs are relevant to the decisionmaker and (2) whether the model follows the theoretical basis of the disease, especially the causal linkages among variables suggested in the literature. Additional evaluation dimensions include: (1) specific criteria for state-transition (Markov) model structure, (2) an inspection of the data, (3) specific attention to the modeling and quantitative methods used, (4) the inclusion and exclusion of particular data, and (5) a robust validation process.
Sculpher et al. propose that criteria for assessing the quality of models be grouped in a framework that consists of three main categories: (1) structure, (2) data, and (3) validation.28 In a subsequent review and consolidation of existing guidelines on the use of decision modeling, Philips et al. adopted similar categories in their proposed good practice guidelines for decision-analytic modeling in health technology assessment: (1) structure, (2) data, and (3) consistency.51
Beyond the “technical” quality of decision models, the 2000 consensus conference on guidelines on economic modeling in health technology assessment proposed additional characteristics of good decision-analytic models. These characteristics extend beyond technical quality to the requirement that models be: (1) useful for informing the decisions at which they are aimed, (2) clear, interpretable, and readily communicated, and (3) parsimonious and not unnecessarily complex.
Challenges in Modeling
Modelers face many challenges as they seek to assist decisionmakers and improve the quality of decisionmaking. In the context of medical tests, Trikalinos et al. summarize these challenges and offer examples of situations where these challenges are faced.76 These challenges include: (1) insufficient data on key input quantities (such as prevalence, test performance, effectiveness), (2) the potential non-transferability of performance across studies, (3) the choice of modeling outcomes (e.g., event-free survival, survival, QALYs), (4) the methods for meta-analysis, and (5) challenges in the parameterization and appraisal of complex models.
This list echoes Tavakoli et al., who also emphasize that one of the major difficulties in developing decision models lies in identifying data, specifically: (1) epidemiological data on the risk of subsequent outcomes in the natural history of a disease, (2) effectiveness data needed to estimate putative treatment benefits and harms, as well as the probabilities of various outcomes given specific decisions along clinical pathways, and (3) health state valuation data needed to estimate the utilities attached to specific outcomes.75 By definition, models are simplified representations of a real problem and are therefore incomplete and inherently limited. However, that simplification is precisely what makes them useful. They promote transparency by pinpointing the influential components of each problem and by providing systematic uncertainty analysis to fully appreciate the impact of parameter estimates. To capitalize on the potential value of models, it is therefore necessary to clearly identify and communicate their assumptions, challenges, and limitations.
Gaps in the Best Practices in Modeling Literature
There is extensive and fairly consistent guidance for model users, although it is vague at times. For example, while it is recognized that model structure is important, the guidelines are not explicit about how to judge it. Because good modeling practice guidance focuses on the technical aspects of models, it does not tend to address the process of modeling, including the expertise required to conduct a modeling study, the best ways to illustrate and present models and modeling results, and the best ways to develop the capacity to understand decision models and overcome the black box problem. In addition, much of the modeling guidance focuses on Markov models and less on other types of models, such as dynamic models or discrete-event models. Nor is much guidance provided on the optimal approach to choosing the type of model for a particular problem.
The International Society for Pharmacoeconomics and Outcomes Research–Society for Medical Decision Making (ISPOR-SMDM) Joint Modeling Good Research Practices Task Force was recently convened (May 2010) with the goal of providing guidance for: (1) delineating the approach and design of modeling studies and the identification and preparation of required data, (2) selecting a modeling technique, (3) implementing and validating the model, (4) addressing uncertainty around model results, (5) reporting the modeling study results to assure transparency, and (6) using model-based study results to inform decisionmaking. This Task Force will produce several papers, including an overall summary paper, to be submitted for publication in 2011.
Expert Modelers Focus Groups
The goals of conducting a focus group of modeling experts were to elicit, characterize, and precisely qualify best practices in decision and simulation modeling. These include model formulation and characterization, model development and construction, handling and presentation of modeling assumptions, definition and presentation of parameters, outcomes to incorporate into the model, model analysis, model testing, and validation and implementation (including results presentation and communication and perceived gaps in the literature). To complement the systematic review of best modeling practices, we used a focus group methodology to collect more in-depth information on how to characterize and qualify best practices in decision and simulation modeling in general. The focus group, with prior consent of the participants, was recorded, analyzed, and summarized.
The four focus group participants were identified by the Principal Investigators, the Technical Expert Panel (TEP), and the Task Order Officers (TOOs). The focus group was conducted on May 16–17, 2010, in Atlanta, GA, in conjunction with the 15th annual international meeting of ISPOR. Focus group participants and instructions are provided in Appendix H. Prior to the focus group, participants were provided with a summary of preliminary findings from interviews with Evidence-based Practice Center (EPC) members (Appendix I) and with a selection of three articles on best practices from Table 1 (the articles are listed in Appendix I).
In examining best practices in decision and simulation modeling alongside systematic reviews, three aspects of models were identified as the key components that need to be addressed:
- The scientific and technical quality of the model.
- The interaction between the model and the decisionmaker(s) the model is intended to inform.
- The communication of the model and model results to a lay audience, beyond the decisionmaker.
Furthermore, the focus group felt that it is essential to place the discussion of modeling within the proper context of a decisionmaking framework. In that framework, the main goal of modeling is to generate an unbiased synthesis of available evidence, on the basis of clearly stated assumptions, that produces information not otherwise available. That information supports, but does not make, the decisions of the individual(s) charged with making complex but well-defined decisions and/or recommendations.
The focus group noted that, in their opinion, when specifically tasked with making a complex decision, most individuals, such as members of the USPSTF, value the availability of a decision analytic framework and welcome decision models to aid their decisionmaking process. The main issues regarding acceptance of models generally come from stakeholders and broader communities that are affected by the decision and may or may not welcome or accept the resulting decision and/or recommendations.
Regarding the technical quality of models, the focus group felt that existing guidelines identified in the review of literature performed in the first study captured all necessary dimensions of quality fairly well. Additional thoughts are summarized below.
Structure Versus Data
Evaluating the model structure, including the assumptions made, and assessing the completeness and quality of the input data, as described in the literature on best practices, are clearly essential prerequisites for using models alongside systematic reviews. However, it is also important to recognize that the value of a model is not limited to predicting what could happen given its input data and structure. A model that has excellent structure but is impaired by a lack of good parameter estimates for key inputs will not have high predictive ability, but it still has tremendous value in identifying and understanding (through sensitivity analyses) the key drivers of outcomes and in systematically studying the impact of key parameters, thereby, at a minimum, informing the further studies needed to refine estimates of those parameters.
When data with different strengths of evidence, along the spectrum from expert opinion to strong evidence, are used jointly in a model, it is important for modelers to clearly state and analyze the relative weight that different pieces of data carry in driving the outcomes. Equally important is knowing and disclosing what was not included in the model, that is, which factors or other variables are not taken into account.
In all modeling efforts, at a minimum, there should be a clear display and discussion of: (1) testing performed on the model (both structure and results), (2) assumptions and their impact on the results, (3) data inputs and parameters and their joint impact on the results, and (4) key drivers of the results. The latter point is important because a handful of key elements usually drive the results of a model. Those elements need to be made very explicit, and their discussion should make sense to clinicians who are experts in the clinical domain addressed by the model and should be consistent with the underlying theory and natural history of the disease and its progression.
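The emphasis on sensitivity analysis and key drivers can be made concrete with a one-way sensitivity analysis. The following is a minimal sketch, assuming a hypothetical three-state (Well, Sick, Dead) Markov cohort model; the parameter names (`p_sick`, `p_die_sick`, `p_die_well`, `rr_treat`), base-case values, and ranges are invented for illustration and do not come from any model discussed here. Each input is varied across its range while the others are held at base-case values, and inputs are ranked by the resulting swing in life-years, which is the ordering used in a tornado diagram.

```python
"""One-way sensitivity analysis on a toy Markov cohort model (illustrative sketch)."""


def life_years(p_sick, p_die_sick, p_die_well, rr_treat, cycles=20):
    """Expected life-years per person in a hypothetical three-state
    (Well, Sick, Dead) Markov cohort model with one-year cycles.

    rr_treat scales the Well->Sick probability to represent an intervention.
    No half-cycle correction or discounting is applied, for simplicity.
    """
    p_ws = p_sick * rr_treat
    well, sick, total = 1.0, 0.0, 0.0
    for _ in range(cycles):
        # Both new values are computed from the previous cycle's state.
        well, sick = (
            well * (1.0 - p_ws - p_die_well),         # remain Well
            well * p_ws + sick * (1.0 - p_die_sick),  # progress, or remain Sick
        )
        total += well + sick  # person-years alive at the end of the cycle
    return total


# Hypothetical base-case values and ranges (invented, not from any source)
BASE = {"p_sick": 0.10, "p_die_sick": 0.20, "p_die_well": 0.01, "rr_treat": 0.70}
RANGES = {
    "p_sick": (0.05, 0.15),
    "p_die_sick": (0.10, 0.30),
    "p_die_well": (0.005, 0.02),
    "rr_treat": (0.50, 0.90),
}


def one_way_sensitivity(base, ranges):
    """Vary one parameter at a time over its range, holding the others at
    base values; return (name, low_result, high_result, swing) rows sorted
    by swing, largest first (tornado-diagram order)."""
    rows = []
    for name, (lo, hi) in ranges.items():
        out_lo = life_years(**{**base, name: lo})
        out_hi = life_years(**{**base, name: hi})
        rows.append((name, out_lo, out_hi, abs(out_hi - out_lo)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    print(f"base case: {life_years(**BASE):.2f} life-years")
    for name, lo, hi, swing in one_way_sensitivity(BASE, RANGES):
        print(f"{name:<11} low={lo:6.2f} high={hi:6.2f} swing={swing:5.2f}")
```

One-way analyses ignore interactions between parameters; probabilistic sensitivity analysis, which samples all inputs jointly from distributions, is the usual way to examine the joint impact of inputs mentioned in the text.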
The involvement of clinical experts in the development of the model should be evident, especially as it relates to the natural history of the disease, the formalization of the disease progression, the identification of and rationale for relationships between key variables, and other “a priori structuring” tasks. Ideally, a visual depiction of the underlying disease mechanics would enhance the perception of content validity of the resulting model.
Just as the development of a model should integrate modeling and clinical expertise, the evaluation of a model needs to be conducted by both modeling and clinical experts. A modeler unfamiliar with the clinical domain would naturally focus on the technical aspects of the model and would not be in a position to judge face validity. Only a clinical expert can judge whether the clinically important decision points have been captured and whether the underlying disease theory is appropriately integrated into the structure of the model.
Gaps in the Best Practices Literature
The focus group perceived that an important neglected aspect of best practices in modeling alongside systematic reviews resides at the interface between the model and the decisionmaker the model is intended to inform. Four key issues were discussed: (1) nature of evidence produced by models, (2) nature and extent of involvement of the decisionmaker in the modeling effort, (3) transparency versus trust in the model, and (4) communication and visualization.
Not surprisingly, the consensus among expert modelers is that models constitute “inferential” or “carefully manufactured” evidence that would not otherwise have been available and that needs to be incorporated along with other evidence generated through systematic reviews. The nature of the evidence generated may differ and may need to be viewed through different lenses, but it provides information to support decisions that other evidence cannot provide. Furthermore, one could argue that an implicit “mental model” is applied in reviewing and evaluating evidence in systematic reviews and that, in principle, such a mental model should be made more explicit.
One potential obstacle to the acceptance, and therefore the subsequent use and usefulness, of models is the practice of having a technical team develop a model and pass the results on to decisionmakers without any built-in interaction between the two. Such a process may be ineffective and may lead to the wrong model being developed, misunderstanding of the model and its results, and low acceptance and use. As stated above, in a framework of decision support, the development of a model should be a multidisciplinary effort that involves clinical expertise, modeling expertise, and the decisionmakers from inception to completion of the modeling project. A modeling report generally has multiple audiences, and it is necessary to ensure that the model and its results are carefully explained to, and understood by, the relevant stakeholders. Understanding, acceptance, and use of models would be greatly enhanced by built-in interaction between the decisionmakers and the modeling team and by the decisionmakers' involvement in the modeling effort.
The issue of transparency is somewhat of a paradox. Transparency is certainly essential to allow review and evaluation of models by peer expert modelers. However, it is generally not what stakeholders want (i.e., to know every technical detail of the model), even though most say they want transparency. One could argue that when stakeholders say they want transparency, they really mean that they want to be able to “trust” models. While assessing model soundness should be left to peer expert modelers, presentation of results is key in building trust and acceptance among stakeholders and users. Learning from, and researching novel methods and applications of, computer visualization in other fields would be very beneficial and could lead to compelling ways of visualizing disease progression and the comparative impact of alternative interventions. We need more studies, perhaps performed by behavioral psychologists, to better understand how to present models and associated results so as to build such trust. For most people outside the modeling community, and hence for the majority of stakeholders and users of models (with the possible exception of researchers), transparency into the intricacies of a model would not help; in fact, it may even detract. Models are used for many purposes, from weather prediction to economic forecasting, with the focus on the presentation of the model findings rather than on the model specifications. Similarly, focusing on the visualization of the output of decision and simulation models in health care would be a major step forward in increasing trust and acceptance of models by users and stakeholders.
Another issue with respect to the acceptance of models and model results is that a model might be very good, but users may have trouble interpreting its results. In this case, users might reject the model itself to avoid dealing with the tradeoffs it reveals. In addition, one could postulate that the well-known anchoring and adjustment bias1 may play a role in how users “judge” the results of a model and eventually accept or reject them. While it would be worthwhile to test such hypotheses, it is clear that the way models and results are communicated plays a critical role in user and stakeholder trust and acceptance. The ultimate test of how good a model is resides in its usefulness and actual use. Finding individuals who can clearly and simply explain to lay audiences what a model is and does is necessary to increase acceptance of models and their results among the general public. The focus group did not have specific recommendations on how to find or train such individuals but did stress the importance of building this expertise.
Two additional elements were discussed by the focus group: the creation of a model registry and the need to incorporate human behaviors in models.
Creation and Management of a Model Registry
Just as ClinicalTrials.gov was created as an organized registry of federally and privately supported clinical trials conducted in the United States and around the world, a similar registry could be created for decision and simulation models. Such a registry would provide information about a model's purpose and its developers and would offer a location where the model could be peer reviewed and possibly used and more widely disseminated. A number of issues would need to be addressed for such a registry to work, including secure access, intellectual property, and computer code. It would, however, create tremendous value, increase acceptance, and accelerate the dissemination of models.
Incorporating Human Behaviors in Models
A potential area of improvement for models is to capture, as part of the model itself, critical human behaviors that can influence outcomes. For example, few models systematically attempt to incorporate issues such as treatment adherence, patient and provider behaviors, or compliance as part of the modeling of clinical pathways. Again, a model with clearly stated assumptions can inform such issues. For example, a model can provide estimates of the benefits of a new intervention for a population of patients under the assumption that full compliance is achieved. If that assumption is not explicitly stated and subjected to uncertainty analysis, comparing the results of such a model with actual results, should they become available, might lead to the erroneous conclusion that the model is wrong. The model could be right and could help focus efforts on (1) obtaining better estimates of actual compliance or, even better, (2) finding ways to increase compliance in order to reap the benefits of the new intervention.
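The compliance example can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical model in which benefit accrues only to compliant patients, so that population benefit scales linearly with the compliance proportion; the function names and all input values are invented. Treating compliance as an explicit, variable parameter lets a shortfall relative to observed results be read as information about compliance rather than as evidence that the model is wrong.

```python
"""Making a compliance assumption explicit and varying it (illustrative sketch)."""


def events_averted(n_patients, baseline_risk, relative_risk_reduction, compliance=1.0):
    """Expected events averted, assuming (hypothetically) that only compliant
    patients benefit, so benefit is proportional to compliance."""
    if not 0.0 <= compliance <= 1.0:
        raise ValueError("compliance must be a proportion between 0 and 1")
    return n_patients * compliance * baseline_risk * relative_risk_reduction


def implied_compliance(observed_averted, n_patients, baseline_risk, relative_risk_reduction):
    """Compliance level at which this model reproduces an observed result."""
    full = events_averted(n_patients, baseline_risk, relative_risk_reduction, 1.0)
    return observed_averted / full


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 patients, 5% baseline risk, 30% relative risk reduction
    args = (10_000, 0.05, 0.30)
    for c in (1.0, 0.8, 0.6, 0.4):
        print(f"compliance {c:.0%}: {events_averted(*args, compliance=c):.0f} events averted")
    # A hypothetical observed value of 90 events averted is consistent with
    # about 60% compliance under this model, rather than with a "wrong" model.
    print(f"implied compliance: {implied_compliance(90, *args):.2f}")
```

Linear scaling is itself an assumption; partial adherence can have nonlinear effects on benefit, and in a fuller analysis compliance would be one input in the model's uncertainty analysis rather than a single multiplier.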
Interviews of Cancer Modelers and USPSTF Members
The goals of this study were to evaluate the strengths and weaknesses of current approaches to conducting a simultaneous or sequential systematic review and modeling exercise, to evaluate stakeholders' perceived needs and whether those needs were met, and to make recommendations for the process of conducting similar projects in the future. To that end, we interviewed relevant members of the Oregon Health & Science University EPC, modeling groups, and the USPSTF to evaluate the lessons learned from the colorectal cancer, breast cancer, and cervical cancer modeling projects that were conducted alongside systematic reviews, and their impact on USPSTF decisionmaking.
We worked with the TEP and the TOOs to select the most appropriate composition of respondents from among the Oregon Health & Science University EPC, the 16 members of the USPSTF, USPSTF partners, and selected cancer modeling groups and consortia. The final sample consisted of the leaders of each of the three cancer modeling projects (cervical cancer, breast cancer, and colorectal cancer), members of each modeling team, and members of the USPSTF who were involved in the development of the models and/or in voting on recommendations (the evidence for which included modeling). Interviews, lasting approximately one hour each, were conducted by telephone between April 5, 2010, and May 25, 2010. The interview participants are shown in Table 22. The interviews focused on lessons learned from the three cancer modeling efforts and the subsequent recommendations that were made.
The interview guide for this set of interviews focused on strengths and weaknesses of current approaches, perceived needs, degree to which needs are met, lessons learned from the cancer screening modeling projects, and perceived impact of these projects on USPSTF decisionmaking. The interview guide was then tailored to the different groups (modelers vs. USPSTF members). A general outline for the interviews is provided in Appendix J.
This section synthesizes the lessons learned from the interviews within the framework presented in the preceding section. Four key themes emerged from the interviews:
- Modality
- Communication and presentation of model results and rationale to stakeholders
- Modeling literacy of stakeholders
- Recommendations for future projects
Modality refers to the primary design used in, or that emerged from, each of the three cancer modeling efforts. The communication theme was discussed by all interviewees and was perceived as the most critical success factor for future projects. This theme involved issues ranging from written reports and documentation, to discussions with the media, to the visual presentation of results and the rationale for using models to address the key questions. Modeling literacy concerns stakeholders' familiarity with modeling and their ability to interpret the results and use them to make judgments and subsequent recommendations. Finally, all respondents described lessons learned and made specific recommendations for future efforts involving modeling alongside systematic reviews. While there was a high degree of consistency among respondents regarding communication and modeling literacy, their recommendations for future projects differed. Selected verbatim quotes from the interviewees are provided by theme in Appendix K.
Modality
Two dimensions were repeatedly used to describe each of the three cancer projects. The first was coordination: the extent to which the systematic review team and the modeling team coordinated their work for the USPSTF. Within this dimension, a temporal sub-dimension addressed whether the two components, systematic review and modeling, were conducted simultaneously or sequentially (i.e., with the modeling effort following completion of the systematic review). The second dimension was whether the project employed a single modeling team or a modeling syndicate, that is, multiple modeling teams developing independent models to address the same questions while drawing on the same systematic review as their source of information.
The issue of modality framed the discussion of “lessons learned” from each project and came up in every interview. In each case, the interviewees began by mentioning the modality, and most described it in terms of coordination, sequence, and the number of modeling teams used.
There was strong agreement among the interviewees that future projects should always employ multiple modeling teams developing different models. Suggestions ranged from a low of “at least two” to a high of “three to five,” depending on how many people have been studying the disease and on the availability of modeling expertise for that condition. The rationale is simple: multiple modeling groups, using the same parameters from a systematic review, will develop models containing different assumptions, transitions, and representations of the natural history of a disease. To the extent that these differing models generate similar results, the effort has high “convergent validity.” Additionally, this approach allows for detailed sensitivity analyses of the input parameters and assumptions that each model uses. Such an approach was described as the “foundation of CISNET [the Cancer Intervention and Surveillance Modeling Network]” and the “reason why CISNET is so well respected and the quality of its work is so highly regarded.” This practice, also referred to as “comparative modeling,” was unanimously suggested as a best practice and recommended for future projects.
With regard to coordination between the modeling teams and the systematic review team, there was broad agreement among the interviewees. All suggested that future projects employ a coordinated effort between modelers and systematic reviewers, for several reasons. Both parties benefit from participating in refining the questions and defining the scope of evidence to be reviewed, which ensures that evidence useful for the modeling effort is not neglected or ignored by the systematic review. Coordination also allows many important project components, such as definitions, terminology, and units of measure, to be standardized. In several cases, modelers reported that a lack of coordination, and the resulting inability to interact in detail with the systematic review team, isolated the modeling effort and detached the modelers from the USPSTF's key questions, needs, and goals for the modeling. This was reported as a “dissatisfying experience,” and one that “questioned the opportunity to improve the quality and robustness of the models.” Although commitment to coordination varied, with proposed solutions ranging from complete integration of the two groups to “several meetings during the systematic review process between modelers and reviewers,” no interviewee suggested that the efforts not be coordinated. The only rationale mentioned for an uncoordinated effort was that a systematic review may be completed and later used to inform model parameters for a different or subsequent effort. Such cases notwithstanding, future projects should strive to be coordinated efforts between the systematic review team and the modeling team(s).
By contrast, little to no communication is necessary between distinct modeling teams, so as to preserve their independence and maximize the value of having multiple models examine the same questions.
While there was a high degree of agreement on coordination, differences emerged regarding its temporal nature, namely, whether the systematic review and model development should be completed sequentially or simultaneously. Those in favor of a sequential approach cited two main reasons. First, the systematic review needs to be completed so that the key questions and assumptions are established and all key parameters have been identified; the modeling team can then integrate the review findings into the modeling effort. Although sequential, this approach remains coordinated, in that the modeling team is involved in the systematic review, either as formal members of the review team guiding the reviewers on the evidence needs of the model, or through several “readouts” of information and progress shared between reviewers and modelers. Second, models already exist for many diseases. Because these models already have certain assumptions established (e.g., the natural history of disease), a new modeling effort might focus on updating existing models and adding parameters to them rather than developing a new model. In this case, a sequential process may be more efficient, because the modeling effort is limited to new parameters, sensitivity analyses, and inspection of results. Several modelers supported a sequential process for this very reason. That said, for a new model with no existing basis to build on, those same modelers supported a simultaneous process, in which the modelers would work with the reviewers to jointly develop the underlying model structure and assumptions and then use the systematic review findings as input parameters for the modeling effort.
Additional arguments for a simultaneous process extended those for overall coordination: modelers can influence the nature of the review and the key questions, and reviewers can help “identify nuances of the questions material to the model, such as natural history of disease or the identification of sub-populations of interest.”
Communication to Users and Stakeholders
Communication of models and model-based results used as key evidence was cited most frequently as the issue that most needed to be addressed to improve the success and acceptance of these projects in the future. Here we focus on the need for improved communication with and between stakeholders, both for these projects and for the recommendations they generate; the related issue of stakeholders' overall “model literacy” is addressed in the next section. Communication can be divided into three salient issues: (1) USPSTF communication of recommendations whose rationale rests, to some degree, on results from a decision or simulation model rather than on “evidence from more traditional sources”; (2) transparency and understandability of models and their results; and (3) discussion of models with the larger stakeholder population of providers and patients. Regarding USPSTF communication of recommendations that include models, one interviewee captured the essence clearly: “The task force should tune their communications so that the science writer at The Washington Post or New York Times could understand, and then convey an accurate reporting of, the recommendation to the general public.” Many USPSTF members reported that the media training recently provided to Task Force members had helped not only in their interactions with the media but, more broadly, in their communications with a variety of audiences.
USPSTF members identified the models themselves as the largest challenge. “The modelers need to [do] a better job at clearly and simply communicating the results of their models.” Many interviewees mentioned the lack of standardization of model terminology, outputs, and results presentation as a challenge for broader communication of results. One interviewee observed, “In the early 1980s, epidemiology faced this same issue…a group was formed and the science established an encyclopedia of epidemiology, which set the standard and began to allow for broader understanding and acceptance of epidemiology and results from epidemiological studies. Decision modelers should take this template and create their own encyclopedia.” Further, one USPSTF member suggested that every project should begin by specifying what the outputs should look like, describing tables and figures in detail. Once this alignment has taken place, expectations are set and the results are easier to communicate. This improves the presentation and communication of results, but the model's methodology, assumptions, structure, and techniques still have to be described. “Even with a high degree of transparency, it is still very difficult to describe these models to those that are not familiar with the models…technical appendices are indecipherable.”
The question of communication with audiences beyond the USPSTF or other policymakers is even more complex, and many interviewees were unsure how to address it. Some recommendations are captured in the next section on model literacy, but the consensus solution seemed to be reporting standardized results (e.g., quality-adjusted life years, number needed to treat) consistently for each model, along with “an accessible appendix that clearly and simply describes the metrics and the model.” As one interviewee cautioned, “Perhaps this is easier articulated, than actually achieved.”
Modeling Literacy of Stakeholders
After communication, stakeholders' overall modeling literacy and the need for ongoing training and education were identified by every interviewee as a significant challenge for future projects. “Modeling is a unique discipline…it's not generally included in medical training, so even those that have used it for a while, need to have formal training by the experts.” USPSTF members described a “Decision Models 101” tutorial that had been developed during one of the projects and presented to Task Force members in preparation for the discussion of results. Several USPSTF members rated it an excellent session, and one that should be conducted routinely as new members join the Task Force and when projects incorporate techniques and methods not previously addressed. “A short manual needs to be developed as a reference guide for terminology, types of models, and standard outputs…like frontier curves and output tables.” Some suggested that “Decision Models 101” be included at the beginning of every results presentation that uses models.
Training was raised in several of the interviews. First, interviewees noted the disparate nature of the training that many USPSTF members and other stakeholders have received to date: “Most of us didn't say I want to be a modeler, we just started using them in our work and learned.” One solution, albeit a longer-term one, is to “formalize training programs for decision modelers…if AHRQ [Agency for Healthcare Research and Quality] and the USPSTF want to use more models, then we need more resources to train the next modelers.” Beyond training modelers, training for other stakeholders and policymakers was reported as essential: “How do we incorporate this into medical training, or public health policy making, so that when physicians and policy makers see models they can understand them, and also that they can know when to ask for a model, instead of report[ing] not enough data to make or change a policy or recommendation.”
Recommendations for Future Projects
Many of the lessons cited by both USPSTF members and modelers focused on the actual process of conducting the projects and analyses. Interviewees were prompted to reflect on the projects in which they were involved and report their top two to three lessons learned, either “what went right, or opportunity areas for improvement.” Responses fell into five basic categories: (1) goals and objectives for the project, (2) outputs and results, (3) USPSTF interactions with modelers and/or reviewers, (4) leadership from the USPSTF, and (5) interactions among modeling teams.
The goals and objectives that the USPSTF is trying to address with the modeling effort need to be explicit and understood by both the modelers and the Task Force lead. Within the key questions, the areas where modeling is expected to have the most impact and benefit need to be identified so that modelers can tune the analysis to those specific items. Models have the greatest opportunity for impact in determining when to start and stop testing and how often to test under different testing strategies, an essential objective of the USPSTF. Beyond testing strategies, models should be applied to key questions that help the USPSTF assess the net benefit and the magnitude of the effect or benefit. When these goals and objectives are specific, clear, and aligned, the project should deliver the necessary results; lack of clarity has been a problem in the past. Modelers need to be very specific with the Task Force lead and the systematic review team about which key questions, or components of key questions, modeling is likely to inform, and whether the evidence is sufficient to develop a model that addresses each specific issue.
Model outputs need to be designed purposefully and carefully. Essentially, the “outputs are the model,” and as such they need to be constructed so that they answer the questions needed to inform and support decisionmaking. One interviewee suggested that “tables and figures be designed before the start of the project…this makes expectations clear and makes the goals clear, but it also ensures that the results will assist the decisionmaking.” This point is important: it does not suggest that the modeling effort should merely confirm an existing conclusion, but rather that the outputs should be directly usable by decisionmakers to inform the specific decision or recommendation being addressed. Some interviewees noted that modeling efforts often provide too much or too little information and, in some cases, omit information that decisionmakers need, leaving them to interpret or interpolate the results to address the recommendations.
Modelers want an iterative process that allows interim “readouts of results with the USPSTF lead.” “An iterative process is a much better discipline for modelers, especially with complex questions…interaction with the lead would have served us well, and allowed us to develop a better model.” Further, in one case, the syndicate of modelers was completely disconnected from the USPSTF and the key questions; it was asked to perform specific “runs” based on standard parameters and simply report the results. Interviewees were concerned that not informing the modelers of the key questions, and thus denying them the ability to further “tune the model to address the key issues,” could yield less informative model results that require additional analyses. Such an iterative process will “hopefully give the task force members, or just the lead, more confidence in the model and a better ability to communicate the results.”
Informed, model-literate leadership within the USPSTF was cited in two of the three projects reviewed as an essential component of success. In both cases, the modelers and the USPSTF lead reported that the modeling project influenced the USPSTF recommendations and allowed the Task Force to make either a “more detailed recommendation” or “to increase the certainty and/or the magnitude of the effects.” Modelers noted that these USPSTF leads were familiar with models and had used them in their professional work, and thus were able to “be much more specific and answer detailed questions about their request…also they were able to challenge us on some of our logic.” No team reported a lack of leadership; rather, interviewees emphasized the difference that strong, informed leadership can make to a project.
Interactions among modeling groups and with systematic review teams were the source of many recommendations. Regarding interactions among modeling teams, the CISNET structure and operations were consistently cited by the interviewees as a best practice, albeit “an expensive undertaking that would require additional resources from AHRQ, the Task Force, or someone.” CISNET's operations were seen as striking the right balance: the modeling teams interact and collaborate frequently enough, while still maintaining distinct, separate models that represent the disease in genuinely different ways. The CISNET structure was also seen as advantageous for building repositories of expertise in specific diseases: “The Task Force knows where to go to get the best talent to address breast cancer.” The interaction between the modelers and the systematic reviewers, addressed earlier in the discussion of coordination, was seen as essential, and most modelers mentioned the advantages of frequent interaction with the systematic review teams. Interestingly, a few modelers suggested that such interaction made the “team more cohesive and…that improved the project.” Beyond its purely empirical value, interaction may affect team dynamics and feelings of ownership of the project, which in turn improve the overall project and allow for a more integrated systematic review and modeling effort to address the Task Force's issues.
In this chapter we discussed best practices and recommendations for developing, validating, using, and communicating decision-analytic models in general as well as in the context of systematic reviews to inform decisionmaking of stakeholders. Three separate studies were conducted to reach this objective: a systematic review of the literature on best practices, a focus group of expert modelers, and a set of semi-structured interviews with key modelers and stakeholders involved with three recent cancer models used alongside systematic reviews.
All three studies provided rich sets of information regarding the quality of decision and simulation modeling in general. They included issues such as model formulation and characterization, model development and construction, handling and presentation of modeling assumptions, definition and presentation of parameters, outcomes to incorporate into the model, model analysis, model testing, validation, and implementation (including results presentation and communication).
The literature on best practices provided an extensive list of 23 dimensions of quality of decision and simulation models, classified into four main categories: structure, data, consistency/validation, and communication.
As a complement to the summary from the literature, the focus group proposed framing models and systematic reviews within a decisionmaking framework and identified three key issues to be addressed: the scientific and technical quality of the model, the interaction between the model and the decisionmaker(s) the model is intended to inform, and the communication of the model and model results to a lay audience beyond the decisionmaker.
Finally, the interviews capturing lessons learned from the breast, cervical, and colorectal cancer models conducted alongside systematic reviews for the USPSTF provided insights in four key categories regarding the development and use of models alongside systematic reviews: modality, communication and presentation of model results and rationale to stakeholders, modeling literacy of stakeholders, and recommendations for future projects.
The information gathered through these three activities is mutually reinforcing and complementary, and provides a solid basis for establishing guidelines for the successful development of models alongside systematic reviews.
Agency for Healthcare Research and Quality (US), Rockville (MD)
Kuntz K, Sainfort F, Butler M, et al. Decision and Simulation Modeling in Systematic Reviews [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Feb. Best Practices for Decision and Simulation Modeling.