
Institute of Medicine (US) Committee to Advise the Public Health Service on Clinical Practice Guidelines; Field MJ, Lohr KN, editors. Clinical Practice Guidelines: Directions for a New Program. Washington (DC): National Academies Press (US); 1990.


Clinical Practice Guidelines: Directions for a New Program.


3. Attributes of Good Practice Guidelines

Remember the drunk who searched for his keys under the lamp post because that's where the light was? Science is a highly systematic process of creating lamps and then looking under them.

David Warsh, Washington Post

Developing practice guidelines that enlighten practitioners and patients is an exceptionally challenging task. It requires diverse skills ranging from the analysis of scientific evidence to the management of group decisionmaking to the presentation of complex information in useful forms. Although the need for these skills has not always been recognized in the past, the recent focus on guidelines is bringing not only a greater awareness of what is required for their development but also a higher level of expertise to the field. The Office of the Forum for Quality and Effectiveness in Health Care should make every effort to reinforce this trend as it works with contractors, expert panels, and others to develop and disseminate practice guidelines.

This chapter describes eight attributes that the committee believes are essential if a set of guidelines is to serve its intended purposes of assisting practitioners and patients, providing a better foundation for the evaluation of services and practitioners, and improving health outcomes. These attributes are ideal characteristics to which real guidelines are unlikely to conform fully either now or in the future. However, in the committee's judgment, guidelines can approach these ideals to a greater extent than has generally been achieved to date.

The next four sections review the context, working assumptions, principles, and sources that guided the committee in developing its list of attributes, followed by a discussion of the attributes themselves. This chapter, however, is not intended either as an exhaustive description of how guidelines should be developed or as an endorsement of one specific method.1 The discussion in this chapter focuses on attributes of guidelines rather than attributes of medical review criteria, standards of quality, and performance measures. The recent IOM report on quality assurance in the Medicare program (1990d) discusses some attributes that good medical review criteria should have, for example, specificity and sensitivity.

One further introductory point: the committee has urged AHCPR and its Forum to focus their efforts on guidelines for clinical conditions rather than specific treatments or procedures. This focus will undoubtedly make their task more difficult: a consideration of conditions generally involves a broader look at alternatives, evidence, practice settings, and outcomes. The result, however, should be guidelines that are both more broadly and more specifically useful to clinicians and patients. The discussion of attributes in this chapter reflects this emphasis on conditions rather than procedures.

Background and Terminology

OBRA 89 specifies that "the Director [of the Forum] shall establish standards and criteria to be utilized by the recipients of contracts" for "developing and periodically reviewing and updating" guidelines, standards, performance measures, and review criteria. Confusion is likely if "criteria and standards" are used to label both the bases for prospectively assessing practice guidelines and the bases for assessing clinician practice. Consequently, to reduce possible terminological confusion, this report refers to "attributes of guidelines" rather than to "standards and criteria" for "guidelines, standards, performance measures, and review criteria." Synonyms include properties and characteristics.2

The Forum must be able to employ the list of attributes set forth in this chapter in at least two ways. First, it will need to communicate its expectations in advance to the contractors or expert panels that may develop guidelines for the agency. Second, the Forum and potential users of the guidelines must be able to assess the soundness of a given set of guidelines after they are developed. The IOM expects in a second project to prepare a practical assessment instrument that the Forum can use to systematically review guidelines developed by its panels or by other groups (Appendix C).

During the committee's deliberations, a question was raised about whether the Forum has formal authority under OBRA 89 either to reject or approve the guidelines developed by its contractors or expert panels. This report does not speak to that legal point. Nevertheless, regardless of the Forum's statutory authority in this regard, it is reasonable that the agency should examine the soundness of guidelines developed under its auspices. This examination may (1) improve the way the agency works with contractors or panels in the future, (2) contribute to more informed consideration of dissemination options and evaluation strategies, (3) allow more sophisticated consultations with HCFA and other government agencies about their use of the guidelines, and (4) provide feedback about the feasibility of the assessments proposed here.

In this report, assessment means the prospective or initial judgment of the soundness and feasibility of a set of guidelines. In contrast, the empirical evaluation of the cost, quality, and other effects of guidelines occurs after they are published and implemented.

Further, a set of guidelines includes a series of statements or recommendations about appropriate practice and the accompanying descriptions of evidence, methodology, and rationale. A guideline in the singular refers to a discrete statement or recommendation (for example, annual breast physical examination for women aged 40 to 49 with no family or personal history of breast cancer). Each of the appropriateness reports published by the RAND Corporation clearly exemplifies a set of guidelines (Park et al., 1986). Likewise, using this terminology, the report of the U.S. Preventive Services Task Force (1989) contains 60 sets of guidelines and not 60 guidelines.

Working Assumptions

The committee's first working assumption has been that a set of guidelines will be assessed as a whole; that is, its elements will not be assessed individually in isolation. Under this assumption the Forum could judge a set of guidelines acceptable even if individual statements lacked—for legitimate reasons—some essential attributes. Realistically, early guidelines and (especially) existing guidelines are not likely to score well on all eight attributes collectively. However, the committee expects that, as the development process matures, guidelines will exhibit more and more of the attributes.

Second, the committee assumes that the Forum will (in line with OBRA 89 provisions) convene expert panels to assess either existing guidelines or guidelines for which the Forum has contracted. These panels will need to make both objective and subjective assessments guided by instructions from the Forum. This report is a step toward the preparation of an assessment instrument that the expert panels can use in their reviews and deliberations (Appendix C). The AMA has recently taken a similar step by developing a preliminary worksheet to evaluate what it terms practice parameters (AMA, 1990b).

Third, the committee sees the initial assessment of guidelines as part of an evolutionary process of guidelines development, assessment, use, evaluation, and revision. This evolutionary process will involve the government, professional organizations, health service researchers, consumers, and others. As a result, the committee fully expects the set of attributes presented here to be tested, reassessed, and revised, if necessary.


Principles

The identification of attributes of practice guidelines rests on four principles. These principles call for:

  • clarity in the definition of each attribute;
  • compatibility of each attribute and its definition with professional usage;
  • clear rationales or justifications for the selection of each attribute; and
  • sensitivity to practical issues in using the attributes to assess actual sets of practice guidelines ("practicality").

That the definition of an attribute be clear and succinct is obviously desirable, although often difficult when one is working with very abstract or technical concepts. It is also desirable that the term used to label an attribute be recognizable and consistent with customary professional usage. The label should be a single word or short phrase that is carefully chosen to convey the core concept. (Thus, attributes will not be described by number, for example, Attribute No. 1.)

The rationale or justification for each attribute should be clearly described, and it should also be consistent with the professional and technical literature and the legislative mandate. The rationale should describe explicitly any trade-offs between the theoretically ideal attribute and the practical, usable one.

Practicality requires that attributes be definable in operational as well as conceptual terms; that is, it should be possible to devise an instrument that instructs assessors of a set of guidelines on how they can determine whether the guidelines conform to the attributes. Not only is this necessary if the Forum is to judge the soundness of the guidelines that emerge from its expert panels; it is fundamental that the Forum instruct developers of guidelines on the desired properties of guidelines and on the documentation needed as a basis for assessment. As mentioned earlier, the development of a formal instrument for assessing guidelines is an important next step for this committee.

More generally, the number of attributes must be sensible and practical. An appropriate balance must be struck: enough attributes to allow adequate assessment of the guidelines but not so many that the assessment exercise becomes infeasible, confusing, or excessive, given limited resources. It is likely that an instrument for assessing guidelines will need to weight the eight attributes in some manner, specifying which of them are more significant in determining whether a given set of guidelines is sound. Given its time and resource constraints, this committee did not systematically rank the attributes by relative importance, although the discussion below does distinguish some of the more important ones.

A final point: this report differentiates between the priorities for selecting particular targets for guidelines and the desirable attributes of guidelines. The attributes listed in this chapter do not incorporate the OBRA 89 provisions requiring that priorities for the development of guidelines reflect the needs and priorities of the Medicare program and include clinical treatments or conditions accounting for a significant portion of Medicare expenditures.

The legislation also calls on the Secretary of Health and Human Services to consider the extent to which guidelines can be expected "(i) to improve methods of prevention, diagnosis, treatment, and clinical management for the benefit of a significant number of individuals; (ii) to reduce clinically significant variations among physicians in the particular services and procedures utilized in making diagnoses and providing treatment; and (iii) to reduce clinically significant variations in the outcomes of health care services and procedures." In arriving at its eight recommended properties of guidelines, the committee did not incorporate these factors. Priority setting is a crucial but separate task and one that IOM has undertaken as part of other studies (IOM, 1990a,b,c,e).

Past Work on Defining Attributes

This committee considered three primary sources in identifying attributes for practice guidelines: (1) the legislation, (2) the IOM report on quality assurance for Medicare, and (3) work by the AMA. Other important materials, which in some cases were used in the primary sources, include the work of Brook, Chassin, Eddy, Greenfield, and their collaborators, as cited elsewhere in this report.

In addition to describing priorities to guide the Forum in selecting topics for guidelines, OBRA 89 sets forth some characteristics that guidelines should have. The committee distilled the following four points from the legislation.


  • Guidelines should be based on the best available research and professional judgment regarding the effectiveness and appropriateness of health care services and procedures.
  • The Forum director is expected to ensure that appropriate, interested individuals and organizations will be consulted during the development of guidelines.
  • The director has the power to pilot-test the guidelines.
  • Guidelines should be presented in forms appropriate for use in clinical practice, in educational programs, and in reviewing quality and appropriateness of medical care.

A second major source for the committee's work, the IOM report on a quality assurance strategy for Medicare (1990d), included a chapter on attributes of quality of care and appropriateness criteria. These attributes derived from a June 1989 meeting of experts on the construction and use of practice guidelines. Some of the distinctions proposed by the quality panel are not used here. For example, this committee's report emphasizes key attributes of good guidelines but contains relatively little discussion of desirable but less critical attributes. In addition, this report drops the panel's distinction between substantive and implementation guidelines because the committee found it awkward to label every attribute as either one or the other. The point that lay behind the original distinction should, nonetheless, be stressed: the designers of guidelines need to keep implementation in mind—whether and how the guidelines can be used.

A third source considered by the committee was the AMA's booklet, "Attributes to Guide the Development of Practice Parameters" (1990a), which sets forth five attributes. They are (minus their accompanying discussion and more detailed descriptions) as follows: (1) practice parameters should be developed by or in conjunction with physician organizations; (2) reliable methodologies that integrate relevant research findings and appropriate clinical expertise should be used to develop practice parameters; (3) practice parameters should be as comprehensive and specific as possible; (4) practice parameters should be based on current information; and (5) practice parameters should be widely distributed.

Attributes for Assessing Practice Guidelines: Overview

The art of developing practice guidelines is in an early stage, and the strengths and weaknesses of specific approaches are still being debated. As a consequence, the committee recognizes that what is expected of guidelines, in terms of their development and implementation, will need to evolve beyond these initial specifications.

Table 3-1 lists the eight attributes for assessing guidelines that the committee identified. One theme emphasized here, which ties these guideline attributes together, is credibility—credibility with practitioners, with patients, with payers, with policymakers. This theme encompasses the scientific grounding of the guidelines, the qualifications of those involved in the development process, and the relevance of the guidelines to the actual world in which practitioners and patients make decisions.

TABLE 3-1. Eight Attributes of Good Practice Guidelines.


A second and related theme is the importance of accountability, a key element of which is disclosure. That is, the committee expects that procedures, participants, evidence, assumptions, rationales, and analytic methods will be meticulously documented—preferably in an accompanying background paper. This documentation will help those not participating in any given process of guidelines formulation to assess independently the soundness of the developers' work.

Explanations should be provided for any conflict or inconsistency between the guidelines in question and those developed by others. The issue of disagreement or inconsistency among practice guidelines is an important one for patients, practitioners, managers, payers, and policymakers. As discussed in Chapter 5 of this report, merely identifying inconsistencies in guidelines says nothing about the legitimacy of such differences. Careful documentation of the evidence and rationales can help potential users of guidelines judge whether inconsistencies arise from differences in the interpretation of scientific evidence, from differences in the care taken in developing the guidelines, or from other factors.


Validity

In the committee's view, the validity of practice guidelines ranks as the most critical attribute, even though it may be the hardest to define and measure. Conceptually, a valid practice guideline is one that, if followed, will lead to the health and cost outcomes projected for it, other things being equal. In the research literature, validity is commonly defined by three questions. Do the instruments for measuring some concept (for example, quality of care) really measure that concept? Does the relationship or effect that the researchers assert exists (for example, following a set of guidelines improves quality of care) really exist? Can that relationship be generalized (for example, from clinical trials to everyday medical practice)?

Until a guideline is actually applied and the results evaluated, validity must be assessed primarily by reference to the substance and quality of the evidence cited, the means used to evaluate the evidence, and the relationship between the evidence and recommendations.3 In the context of the Forum's practical needs, the committee recommends that an assessment of validity look for 11 elements in a set of guidelines. These elements are listed below:

  • Projected health outcomes
  • Projected costs
  • Relationship between the evidence and the guidelines
  • Preference for empirical evidence over expert judgment
  • Thorough literature review
  • Methods used to evaluate the scientific literature
  • Strength of the evidence
  • Use of expert judgment
  • Strength of expert consensus
  • Independent review
  • Pretesting.

Projected Health Outcomes

A key reason for developing and using practice guidelines is the expectation that they will improve health outcomes. Ideally, a set of guidelines should give practitioners, patients, and policymakers an explicit description of the projected health benefits (for example, a reduction in postoperative infection rates from 4 to 2 percent) and the projected harms or risks (for example, an increase in the risk of incontinence from 10 to 20 percent). If reasonable and technically feasible, the net effects of a course of action—the balance of benefits against risks or harms—also need to be estimated. In addition, projected outcomes should be compared with those for alternative courses of care for the clinical condition in question.

The ideal set of projections just described will often be technically or practically beyond the reach of guidelines developers. In most situations, the assistance of outside consultants or specialized technical advisory panels will be at least helpful, if not essential; yet even with such help, projecting health outcomes is intrinsically a complex and subjective process. The nature of the process makes it particularly important that the methods for projecting outcomes, the limitations of those methods, and the evidence for such projections be described.

When empirical evidence is limited, potential effects may only be listed, not quantitatively compared or weighed. In addition, in cases in which patient preferences about different risks and benefits may differ, practice guidelines will need to be sensitive to such variation, and a comprehensive statement of net effects may have to be omitted (Mulley, 1990). In any event, a systematic effort should be made to provide practitioners, patients, and others with information that will help them make their own judgments of the balance of benefits and risks.

Figure 3-1 provides a simple checklist of outcomes that might be estimated. The particular outcomes to be considered will vary with the clinical conditions and practices under consideration.

FIGURE 3-1. A possible checklist for describing benefits, risks, and costs. SOURCE: This figure is adapted in part from the National Research Council report, Improving Risk Communication (1989).

To support the eventual evaluation of the actual impact of guidelines, guidelines developers should indicate what information related to outcomes will be needed, where it can be obtained, and whether better means for collecting and analyzing data need to be established to permit evaluation.

On this last point, limitations in the data sources and variables used to project outcomes will often point to needed improvements.

Projected Costs

Recent interest in practice guidelines is founded in part on the explicit or implicit expectation that they can help control escalating health care costs. The committee has already cautioned that some guidelines, if followed, may increase short- or long-term costs and that the net cost effects of current initiatives are not clear. These kinds of uncertainty underscore the desirability of including some form of cost projections in the background documentation for guidelines.

Cost estimation, like the projection of health outcomes, has its own special technical complexities and subjective aspects that will often require the services of outside consultants or specialized technical advisory panels. Even with such assistance, the committee recognizes that the results will be imperfect. In general, estimates of the costs associated with a set of guidelines should follow the same principles of documentation and discussion described for the estimation of health outcomes, including comparisons of alternative courses of care (see Figure 3-1). The remainder of this section describes desirable elements of cost projections, elements the committee sees as goals rather than minimum requirements.

Ideally, cost estimates should have two components, one involving projected health care costs and the other relating to administrative costs. The estimated health care costs of following the guidelines should reflect (1) the estimated total number of services that will be added, substituted, or deleted if a guideline is followed and (2) the substantiated charges (or production costs) for these services. For example, for screening services, the expected costs of providing the services and of treating the problems that are detected all need to be included. Depending on the available information and the assumptions used, estimates will often take the form of ranges rather than point estimates.

If health outcomes are projected in terms of additional life expectancy or similar measures, then the cost per unit of each identified outcome should be projected. Again, ranges may be more suitable than point estimates. If the guidelines indicate acceptable alternative courses of care, the total costs of the major alternatives and their cost per unit of each expected benefit should be described.

Cost estimates should also consider the additional expenses that may be associated with administering or using the guidelines. For example, computer hardware or software may be required to support easy access to various sets of guidelines. In the case of medical review criteria, additional staff may be required to handle inquiries.

This report does not take a position on whether costs should be explicitly factored into recommendations, although some committee members have strong views that such a step should be mandatory if guidelines are to control costs. The committee did agree that information on projected health outcomes and costs will help developers and users of guidelines better understand the implications of following or not following the guidelines. One part of this process will be some clarification of both the factual and the value judgments involved for practitioners, patients, health plans, and others in making such decisions. In some cases, a patient may decide that a service is not worth the personal out-of-pocket cost; in others, a provider may choose among clinically acceptable alternatives on the basis of financial considerations, such as the opportunity cost of acquiring new equipment. Similarly, a health benefits plan may opt not to cover a category of service that it is quite appropriate for a practitioner to provide and a patient to receive.4

Relationship Between the Evidence and the Guidelines

Practice guidelines have not always been clearly and consistently related to the scientific and clinical evidence (Eddy and Billings, 1988), but they should be. The link between the base of evidence and a set of guidelines needs to be explicit, preferably with specific citations for specific portions of a set of guidelines. This implies the need for a reference list rather than just a bibliography of literature used in the guidelines development process. A bibliography may, however, indicate sources consulted but not cited.

Preference for Empirical Evidence Over Expert Judgment

Empirical evidence should take precedence over expert judgment in the development of guidelines. When the empirical evidence has important limitations and experts reach conclusions that are not consistent with the evidence, then the conflict and limits of the evidence should be clearly described and the rationale for departing from the evidence, such as it is, should be explained. When expert judgment proceeds in the absence of direct empirical evidence about a particular clinical practice, a frequent circumstance, the general scientific reasoning or normative (ethical, professional) principles supporting the expert judgments should be described.

Thorough Literature Review

A thorough review of the scientific literature should precede the development of practice guidelines and serve as their foundation. This review must be well documented and easily available to those assessing or using a set of guidelines. It should describe all relevant aspects of the scientific research including (1) sponsors of the research, (2) investigators and their institutional affiliations, (3) research settings (for example, academic medical center or public outpatient clinic), (4) research populations, (5) methods (for example, randomized clinical trial), (6) limitations (for example, a research population limited to males when the condition or service under study is not), and (7) findings. The literature search method should also be described (for instance, MEDLARS), and the rules for including and excluding research should be explicitly noted (for example, whether unpublished materials or articles "in press" were used).

Altogether, the thoroughness of the review is a key step in developing valid guidelines, and documentation is a key requirement for later assessments of validity. The task, like those of estimating health outcomes and costs, may require the assistance of outside consultants or technical advisory panels. The qualifications of the individual or individuals responsible for the review should be described.

Methods Used to Evaluate the Scientific Literature

Methods for reviewing, summarizing, and evaluating the literature range from unarticulated and subjective—one person's unsupported synopsis, for instance—to highly formal, quantitative means of information synthesis and techniques of meta-analysis (Eddy, 1990b). The former approach is usually unsatisfactory for developing valid guidelines, and it is certainly no aid to those assessing guidelines independently. At a minimum, the factors considered in "weighing" or evaluating the evidence should be explicitly identified. For example, a reviewer could state that he or she weighed evidence from randomized clinical trials more heavily than evidence from case-control studies. An explicit rating of each entry in the literature used in the guidelines development process may be helpful but is not essential (Canadian Task Force on the Periodic Health Examination, 1979).

The more formal the analytic approach, the more valid the literature review (and hence the resulting guideline) can be expected to be. Formal approaches require that analysts guard against any application of quantitative and other systematic techniques that may disguise the limitations of incomplete or poor literature and thereby distort conclusions. The references to this chapter describe several formal approaches to evaluating evidence.

Strength of Evidence

Inevitably, the evidence for some guidelines will be more abundant, consistent, clear, relevant, and methodologically rigorous than the evidence for others. Consequently, guidelines developers should provide some explicit description of the scientific certainty associated with a set of guidelines (Eddy, 1990a–e). The approach recently taken by the U.S. Preventive Services Task Force (1989) was to rank study designs; randomized controlled trials were ranked highest and expert opinion, lowest. However, this unidimensional scheme would rate a poorly executed randomized clinical trial more highly than a carefully done nonrandomized trial, a questionable result in the committee's view. More complex and statistically based techniques may be more accurate, but specific recommendations are beyond the scope of this committee.

One consequence of a thorough and expert assessment of the evidence may be a decision to defer the effort to develop guidelines for the condition or service in question because the evidence is weaker or less conclusive than expected when the effort was initiated. When it is imperative to go ahead with guidelines on a topic, the alternative is to rely more heavily on expert consensus. Even so, the experts may eventually agree that guidelines should be deferred for lack of either evidence or consensus.

Use of Expert Judgment

Expert or group judgment may come into play in guidelines development in two somewhat different but not incompatible ways. First, groups may be used to evaluate and rate scientific evidence with or without the support of quantitative methods such as meta-analysis. Second, group judgment may be used as the primary basis for a guideline when the scientific evidence is weak or nonexistent. Rather than have expert panels accept a consultant's or other party's review uncritically, the panels should conduct their own careful "review of the review" of the literature.

The methods used to arrive at group judgments must be carefully selected and well described (IOM, 1985). For example, if formal votes are taken, a secret, written ballot should be used, insofar as possible, and a record of the results of each round of voting should be maintained. Any departure from a policy of "one person-one vote" must be justified. If a panel member is absent from active group discussion of the guidelines, that absence should be noted. A recent IOM workshop on group judgment noted that more research needs to be done regarding the validity and reliability of judgments reached using different group judgment techniques (IOM, 1990f; Lomas, 1990).

Strength of Expert Consensus

Expert groups will almost assuredly participate in the literature review and development of guidelines. The extent to which those experts agree on their findings and recommendations is important information. Thus, a set of guidelines should describe the strength and nature of the group consensus or agreement.

In some cases, the experts may strongly agree that clear evidence supports precise statements in a set of guidelines about the appropriateness or inappropriateness of a particular clinical practice. This agreement is powerful support for the validity of those statements. In other situations, experts may strongly agree that no clear evidence exists on which to base precise statements about appropriateness. This, too, is important information. In still other cases, the experts may disagree about what the evidence indicates and what statements about appropriateness are warranted (Park et al., 1986). These three quite different situations have different implications for guidelines developers and users.

The extent of agreement within an expert group should be reported in quantitative terms (for example, simple percentages describing levels of agreement or disagreement). When evidence or professional agreement is very strong, guidelines may be more confidently translated into criteria for evaluating practitioner performance.
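As an illustrative sketch (not part of the original report) of reporting agreement in simple percentages, the fragment below tallies hypothetical panel ratings on a 1-to-9 appropriateness scale of the kind used in the RAND work cited above (Park et al., 1986). The ratings and the nine-member panel are invented.

```python
# Illustrative sketch: summarizing panel agreement in quantitative terms.
# The ratings below are hypothetical; the 1-9 appropriateness scale follows
# the style of RAND panels (Park et al., 1986).

def agreement_summary(ratings):
    """Return the percentage of ratings falling in each region of a 1-9 scale."""
    regions = {"inappropriate (1-3)": 0, "equivocal (4-6)": 0, "appropriate (7-9)": 0}
    for r in ratings:
        if r <= 3:
            regions["inappropriate (1-3)"] += 1
        elif r <= 6:
            regions["equivocal (4-6)"] += 1
        else:
            regions["appropriate (7-9)"] += 1
    n = len(ratings)
    return {k: round(100.0 * v / n, 1) for k, v in regions.items()}

panel_ratings = [8, 9, 7, 8, 6, 9, 8, 7, 9]  # hypothetical nine-member panel
print(agreement_summary(panel_ratings))
# → {'inappropriate (1-3)': 0.0, 'equivocal (4-6)': 11.1, 'appropriate (7-9)': 88.9}
```

A summary of this form makes plain whether a panel strongly agreed, strongly disagreed, or split, which is exactly the information the text asks developers to report.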

Independent Review

In any endeavor involving expert panels and the subjective evaluation and interpretation of data, different groups may well arrive at different conclusions. Replication of guidelines development on the same clinical condition or technology is not likely to be feasible, affordable, or desirable (in terms of the opportunity costs involved). Therefore, at a minimum, some effort should be made to subject guidelines (including the relevant literature reviews) to review and criticism by professionals who are not involved in the original development process. These procedures should be described and the results summarized.


Pretesting a set of guidelines on members of the intended user group (for example, practitioners or patients), using a real organization or a set of prototypical cases, is desirable. (See also the discussions of reliability/reproducibility and clarity, below.) The methods, settings, and results of any pretests of the guidelines should be described. The Forum has been given authority to pretest guidelines, and the committee believes it should exercise that authority.

Reliability/Reproducibility

As conventionally used in a research context, reliability is linked to the measuring, diagnosing, or scoring of some phenomenon such as intelligence or bacterial infection.5 In the context of guidelines, the committee uses the concept to refer to the ability of some method or process to produce consistent results across time or across users, or both. In strictly technical terms, levels of reliability dictate possible (achievable) levels of validity; that is, qualitative and quantitative instruments and tools cannot be valid if they are not reliable.

One kind of reliability is methodological. Ideally, if another group of qualified individuals using the same evidence, assumptions, and methods for guidelines development (for example, the same rules for literature review) were to develop guidelines, then they should produce essentially the same statements. In practice, such replications are almost unknown given the expense of the process,6 but discussion of previous trials of the methodology (for different conditions) and any resulting revisions may be useful. Likewise, review of the guidelines by an outside panel can help in assessing reliability. (Recall that independent review is also important to an assessment of validity, a fact that underscores the link between reliability and validity.)

A second kind of reliability that is important for practice guidelines is clinical reliability. Practice guidelines are reliable if—given the same clinically relevant circumstances—the guidelines are interpreted and applied consistently by practitioners (or other appropriate parties). That is, the same practitioner, using the guidelines, makes the same basic clinical decision under the same circumstances from one time to the next, and different practitioners using the guidelines make the same decisions under the same circumstances. Pretesting of guidelines in actual delivery settings or on prototypical cases can help test this kind of reliability as well as contribute to assessments of validity.

For medical review criteria and other specific tools for evaluating health care actions or outcomes, the concept of reliability (or reproducibility) seems straightforward. Ideally, review criteria and other tools for evaluating performance should be pretested to provide evidence that they meet a specified level of reliability over time for the same user (test-retest reliability) and between users (interrater reliability). Review criteria often run into reliability problems when they use undefined terms—such as "frequent" or "serious" or "presence of comorbid conditions"—that different users may interpret quite differently. Thus, one tactic developers of guidelines and review criteria should use to maximize reliability is to avoid such terms unless precise definitions are provided.

Clinical Applicability

Because of the considerable resources and opportunity costs involved in developing practice guidelines, guidelines should be written to cover as inclusive a patient population as possible, consistent with knowledge about critical clinical and sociodemographic factors relevant for the condition or technology in question. For instance, a guideline should not be restricted to Medicare patients only through age 75 or through age 85 if evidence and expert judgment indicate that the clinical condition or the technology in question is pertinent to those over age 85.

This attribute requires that guidelines explicitly describe the population or populations to which statements apply. These populations may be defined in terms of diagnosis, pathophysiology, age, gender, race, social support systems, and other characteristics. The purpose of such a definition is to help physicians concentrate specific services on classes of patients that can benefit from those services and avoid such services for classes of patients for whom the services might do harm or produce no net benefit. Again, the relevant scientific literature needs to be cited or its absence noted.

Clinical Flexibility

Flexibility requires that a set of guidelines identify, where warranted, exceptions to their recommendations. The objective of this attribute is to allow necessary leeway for clinical judgment, patient preferences, and clinically relevant conditions of the delivery system (including necessary equipment and skilled personnel).7

Operationalizing this attribute may be difficult. In the committee's view, a fairly rigorous approach should be adopted, one that requires a set of guidelines to (1) list the major foreseeable exceptions and the rationale for such exceptions, (2) categorize generally the less foreseeable or highly idiosyncratic circumstances that may warrant exceptions, (3) describe the basic information to be provided to patients and the kinds of patient preferences that may be appropriately considered, and (4) indicate what data are needed to document exceptions based on clinical circumstances, patient preferences, or delivery system characteristics.

The role of patient preferences, whether considered in the context of daily clinical practice or in the context of developing guidelines, is a particularly complex issue. For example, there is much disagreement about the proper behavior for practitioners faced with preferences they believe are unreasonable or unacceptable (Brock and Wartman, 1990). Likewise, the balance between patient preferences and societal resources is the subject of intense debate.

A thorough treatment of this issue was not part of the committee's charge. However, in addition to recommending that patient interests be taken into account at several points in the process of developing guidelines, the committee makes two observations. First, patient preference for a service generally need not be acceded to when the service cannot be expected to provide any benefit or when it can be expected to produce a clear excess of harm over benefit. Second, when a mentally competent patient unreasonably wishes (in a practitioner's view) to forego treatment, the practitioner can try to persuade the patient to accept care but cannot, with rare exceptions, insist on treatment.

Clarity

Clarity means that guidelines are written in unambiguous language. Their presentation is logically organized and easy to follow, and the use of abbreviations, symbols, and similar aids is consistent and well explained. Key terms and those subject to misinterpretation are defined. Vague clinical language, such as "severe bleeding," should be avoided in favor of more precise language, such as "a drop in hematocrit of more than 6 percent in less than eight hours." Similarly, guidelines must be specific about what populations and clinical circumstances are covered and what specific elements of care are appropriate, inappropriate, and (if relevant) equivocal, as those terms were defined earlier.
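To illustrate how the precise phrasing recommended above becomes checkable, the sketch below encodes the hematocrit example as a computable criterion. The function name is ours, and interpreting "6 percent" as six hematocrit percentage points is our assumption.

```python
# Sketch of turning precise guideline language into a computable criterion.
# Thresholds come from the text's example ("a drop in hematocrit of more than
# 6 percent in less than eight hours"); "percent" is read here as hematocrit
# percentage points, which is an assumption, and the function name is ours.

def severe_bleeding(hct_initial, hct_current, hours_elapsed):
    """Apply the text's precise definition of 'severe bleeding'."""
    drop = hct_initial - hct_current
    return drop > 6.0 and hours_elapsed < 8.0

print(severe_bleeding(42.0, 34.0, 5.0))  # 8-point drop in 5 hours → True
print(severe_bleeding(42.0, 38.0, 5.0))  # 4-point drop in 5 hours → False
```

The vague phrase "severe bleeding" admits no such function; the precise criterion does, which is one practical payoff of clarity.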

For practical reasons, assessments of language and modes of presentation may have to be largely subjective. Depending on the audience, somewhat different standards for assessing clarity may be needed. Materials for consumers might be subject to the "readability" measures that have been variously applied to regulations, consumer warranties, and similar materials. Materials for practitioners may be more technical but should not be burdened by needless jargon, awkward writing, or "unfriendly" software. Software itself may soon allow organizations to apply computer-based "style manuals" or "templates" to help standardize writing for different purposes (Frankel, 1990).
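One such "readability" measure can be made concrete. The sketch below (our illustration, with a deliberately crude vowel-group syllable heuristic and an invented sample sentence) computes the Flesch Reading Ease score, where higher values indicate easier text.

```python
# Illustrative sketch of a "readability" measure: the Flesch Reading Ease
# score. The syllable counter is a crude vowel-group heuristic, adequate
# only for demonstration; production tools use pronunciation dictionaries.

import re

def count_syllables(word):
    """Rough syllable count: runs of vowels, with a silent-e adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

sample = "Severe bleeding must be treated at once. Call your doctor."
print(round(flesch_reading_ease(sample), 1))
```

Short sentences of common words score in the "easy" range, which is the standard such measures would impose on consumer materials.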

Multidisciplinary Process

One of the committee's strongest recommendations is that guidelines development include participation by representatives of key affected groups and disciplines.8 The rationale for this position is threefold. First, multidisciplinary participation increases the probability that all relevant scientific evidence will be located and critically evaluated, thereby strengthening the scientific grounding, scope, and flexibility of the guidelines. Second, such participation increases the likelihood that practical problems with using guidelines will be identified and addressed, thus constructing a firmer foundation for successful application of the guidelines in real-world situations. Third, participation helps build a sense of involvement or "ownership" among different audiences for the guidelines, thereby improving the prospect for cooperation in implementing them. Figure 3-2 summarizes these rationales and other key issues in developing or assessing a participation strategy.

Figure 3-2. Multidisciplinary participation in guidelines development.

Among clinicians, multidisciplinary participation may call for the use of clinicians with and without full-time academic ties, for the inclusion of specialists and generalists, and for participation by relevant nonphysician practitioners. Optometrists, for instance, could well have an important role to play on panels to develop guidelines for cataract surgery. Experts in research and analytic methods also need to be represented on guidelines development panels; that is, methodological expertise should not be obtained only on a contractual basis or from specialized technical advisory panels.

User groups—in addition to clinicians—include health care administrators, members of peer review organizations, payers, and patients or consumers. If guidelines are expected to pertain to groups distinguished mainly by sociodemographic characteristics (for example, age or minority ethnic groups), special efforts are warranted to involve representatives of those groups at some early stage of development. Successful involvement of patients or consumers is a challenge that may require multiple strategies, as described below.

Documentation for this attribute will need to describe the parties involved, their credentials and potential biases, and the methods used to solicit their views and arrive at group judgments. The committee does not recommend, however, that the Forum develop detailed, rigid definitions of what constitutes a consumer or other participant category. (The often unproductive troubles such definitions created for federally funded health planning agencies were cited during the committee discussion.)

A frequent although not necessarily valid criticism of guidelines is that their content can be improperly manipulated by selecting group participants for their known opinions rather than on the basis of their expertise. The position taken here is that all participants in the guideline-setting process are likely to have personal opinions, biases, and preferences about the clinical problem or service at issue, and no amount of effort will expunge those factors. What is critical is that those factors be known and balanced insofar as possible.9

The committee discussed at some length the question of who should develop guidelines. Some members felt quite strongly that the Forum should not contract with medical specialty societies for guidelines development services. Others felt that establishing such a blanket prohibition was not the right approach. Instead, decisions should be based on a comparative assessment of potential developers' track records and capacities. These capabilities include, for example, related work that the groups or individual participants have already done, existing documentation of participants' credentials and biases, and the methods and evidence with which they have experience. Although the committee did not reach a specific consensus that the Forum should completely exclude specialty societies as potential direct contractors or subcontractors, the agency should be sensitive to the credibility concerns raised by this question. Physician organizations in any case should be extensively consulted by developers of guidelines, involved in reviewing draft guidelines, and used to help disseminate guidelines.

Another debate arose during the committee's meetings over the question of who should chair a guidelines development group. Again, some felt that a specialist user of a particular technology (for example, a cardiac surgeon who performs coronary artery bypass surgery) should never chair a group developing guidelines on the use of that technology. Others felt that exceptions to the general principle might sometimes be warranted. There was considerable agreement that a physician should chair the development of any clinical practice guidelines. Again, explicit attention to questions of bias is essential.

Participation by affected groups in the process of guidelines development can be achieved in several ways. The strongest form of participation is membership on the panel charged with developing guidelines, but the benefits of this approach have to be balanced against the practical management problems created by too large a panel. Participation may also be achieved through mechanisms other than the panel—for example, public hearings, circulation of draft guidelines for review and comment by a wide variety of groups, and contracts with particular interests for specific analyses. Focus groups and pretests may uncover confusing language or highlight the "hassle factor" associated with draft guidelines and allow practitioners or patients to suggest more acceptable alternatives.

Different types of guidelines are likely to require different mechanisms for participation, and the benefits of participation need to be balanced against resource limitations and other constraints. Therefore, this report stresses the principle and value of participation rather than the specific vehicles. Creativity and experimentation should, in fact, be encouraged.

Scheduled Review

Clinical evidence and judgment are not static. Therefore, guidelines should designate a review date to determine whether they should be updated or, potentially, withdrawn. In a clinical area where technologies are changing rapidly and new research findings can be expected to accumulate quickly, a relatively short timetable may be appropriate. More stable clinical areas may permit a longer period before scheduled review. In every case, however, a guideline should contain a specific review date or time frame for review (for example, within three years of initial publication). The greater the amount of change in a clinical area, the more the revision process will resemble the initial development process in scope, cost, and intensity.

Follow-up on review schedules is part of the implementation process (see Chapter 4) as is determination of whether review is needed before the scheduled date. Unscheduled revisions may be prompted by major new clinical evidence or by emerging or disintegrating professional consensus. To oversee both scheduled and unscheduled reviews, an organization responsible for the development of multiple sets of guidelines should subject all of its guidelines to some kind of yearly examination to flag particular guidelines for either scheduled or unscheduled review. As described in the next chapter, the mechanisms for disseminating and administering guidelines need to provide for guidelines updating or withdrawal.

Documentation

For the purposes of emphasis, the committee lists documentation as a separate attribute even though it has already been referred to repeatedly in the discussion of other attributes. As a practical matter, a documentation checklist, such as the preliminary version presented in Table 3-2, may be helpful for contractors and review panels.

TABLE 3-2. Provisional Documentation Checklist for Practice Guidelines.


This chapter has proposed eight attributes of practice guidelines that the Forum should employ in advising its contractors and expert panels and in assessing the quality of the guidelines it receives. The attributes are validity, reliability (reproducibility), clinical applicability, clinical flexibility, clarity, multidisciplinary process, scheduled review, and documentation. Definitions of these terms and some examples that may aid in their operationalization are also given. Operationalization, that is, turning these eight concepts into a practical instrument for the Forum to use in prospectively assessing guidelines, is one task in a broader project that the IOM is currently conducting (Appendix C).

Several issues about guidelines development need to be kept in mind as the Forum proceeds. First, neither existing guidelines nor those likely to be developed by the agency in the foreseeable future will "score well" on all eight properties simultaneously; indeed, near-perfect scores may always lie in the realm of aspiration rather than attainment. Second, a balance needs to be maintained between an ideal process and a feasible one. For example, this committee, and others, could design a very meticulous process to take into account the views of all interested groups. At some level, that process would consume more resources—in time, professional input, and money—than the outputs would warrant. That is, it would be too slow, too cumbersome to administer, and too costly to meet the needs of providers, third-party payers, or patients. It undoubtedly would not conform to the congressional deadlines of OBRA 89.

The third point to stress is that guidelines development must be an evolutionary process, especially at the national (or federal) level. There is no proven "right way" to conduct this endeavor, even if there clearly are some "better ways." Guidelines that satisfactorily reflect the eight attributes proposed here may not be products of an ideal process, but in the committee's view they will be defensible.

Two other themes should be reiterated: the need for credibility among practitioners, patients, payers, and policymakers, and the need for accountability. The entire practice guidelines enterprise will not fulfill its promise (and certainly the federal program will not) if the products lack solid scientific grounding and widespread understanding and support from the provider and patient communities. The significance accorded such attributes as validity and reliability, clarity, multidisciplinary approach, and documentation reflects the committee's concerns with these needs. Although in the first instance the themes of credibility and accountability apply to the procedures followed in guidelines development, they also carry through to the procedures of implementation and evaluation, which are the subjects of the next chapter.


References

  • American College of Physicians. Clinical Efficacy Assessment Project: Procedural Manual. Philadelphia, Pa.: 1986.
  • American Medical Association. Attributes to Guide the Development of Practice Parameters. Chicago, Ill.: American Medical Association, 1990a.
  • American Medical Association. Preliminary Worksheet for the Evaluation of Practice Parameters. Draft of ad hoc review panel. Chicago, Ill., May 1990b.
  • Battista, R., and Fletcher, S. Making Recommendations on Preventive Practices: Methodological Issues. American Journal of Preventive Medicine 4:53-67 (Supplement), 1988.
  • Brock, D., and Wartman, S. When Competent Patients Make Irrational Choices. New England Journal of Medicine 322:1595-1599, 1990.
  • Canadian Task Force on the Periodic Health Examination. Canadian Medical Association Journal 121:1193-1254, 1979.
  • Chassin, Mark. Presentation to the IOM Committee to Advise the Public Health Service on Practice Guidelines. Washington, D.C., April 2, 1990.
  • Eddy, D. Comparing Benefits and Harms: The Balance Sheet. Journal of the American Medical Association 263:2493-2505, 1990a.
  • Eddy, D. Guidelines for Policy Statements: The Explicit Approach. Journal of the American Medical Association 263:2239-2240, 1990b.
  • Eddy, D. Practice Policies--Guidelines for Methods. Journal of the American Medical Association 263:1839-1841, 1990c.
  • Eddy, D. Practice Policies--What Are They? Journal of the American Medical Association 263:877-880, 1990d.
  • Eddy, D. Practice Policies--Where Do They Come From? Journal of the American Medical Association 263:1265-1275, 1990e.
  • Eddy, D. Designing a Practice Policy: Standards, Guidelines, and Options. Journal of the American Medical Association, forthcoming (a).
  • Eddy, D. A Manual for Assessing Health Practices and Designing Practice Policies. American College of Physicians, forthcoming (b).
  • Eddy, D., and Billings, J. The Quality of Medical Evidence and Medical Practice. Paper prepared for the National Leadership Commission on Health, Washington, D.C., 1988.
  • Fink, A., Kosecoff, J., Chassin, M., et al. Consensus Methods: Characteristics and Guidelines for Use. American Journal of Public Health 74:979-983, 1984.
  • Frankel, S. Hello, Mr. Chips: PCs Learn English. Washington Post, April 29, 1990, p. D3.
  • Gottlieb, L., Margolis, C., and Schoenbaum, S. Clinical Practice Guidelines at an HMO: Development and Implementation in a Quality Improvement Model. Quality Review Bulletin 16:80-86, 1990.
  • Institute of Medicine. Effects of Clinical Evaluation on the Diffusion of Medical Technology. Chapter 4 in Assessing Medical Technologies. Washington, D.C.: National Academy Press, 1985.
  • Institute of Medicine. Acute Myocardial Infarction: Setting Priorities for Effectiveness Research. Washington, D.C.: National Academy Press, 1990a.
  • Institute of Medicine. Breast Cancer: Setting Priorities for Effectiveness Research. Washington, D.C.: National Academy Press, 1990b.
  • Institute of Medicine. Hip Fracture: Setting Priorities for Effectiveness Research. Washington, D.C.: National Academy Press, 1990c.
  • Institute of Medicine. Medicare: A Strategy for Quality Assurance (Lohr, K., ed.). Washington, D.C.: National Academy Press, 1990d.
  • Institute of Medicine. National Priorities for the Assessment of Clinical Conditions and Medical Technologies (Lara, M., and Goodman, C., eds.). Washington, D.C.: National Academy Press, 1990e.
  • Institute of Medicine. Workshop to Improve Group Judgment for Medical Practice and Technology Assessment, Washington, D.C., May 15-16, 1990f.
  • Lomas, J. Words Without Action? The Production, Dissemination and Impact of Consensus Recommendations. Draft paper (dated May 1990) prepared for the Annual Review of Public Health, Vol. 12 (Omenn, G., ed.). Palo Alto, Calif., forthcoming.
  • Mulley, A. Presentation to the Workshop to Improve Group Judgment for Medical Practice and Technology Assessment, Washington, D.C., May 15, 1990.
  • National Research Council. Improving Risk Communication. Washington, D.C.: National Academy Press, 1989.
  • Park, R., Fink, A., Brook, R., et al. Physician Ratings of Appropriate Indications for Six Medical and Surgical Procedures. R-3280-CWF/HF/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986. See also the same authors and same title in the American Journal of Public Health 76:766-772, 1986.
  • U.S. Preventive Services Task Force. Guide to Clinical Preventive Services: An Assessment of the Effectiveness of 169 Interventions. Baltimore, Md.: Williams & Wilkins, 1989.


Footnotes

1. The list of works by Eddy, Gottlieb and associates, and Park and colleagues at the end of this chapter contains more detailed discussions of processes for developing guidelines.


2. This language generally follows the precedent set by the IOM report Medicare: A Strategy for Quality Assurance (1990d). It is also consistent with the booklet "Attributes to Guide the Development of Practice Parameters" (AMA, 1990a).


3. The committee discussed four types of validity: face validity, content validity, criterion validity, and construct validity. These concepts may have the following connotations when applied to practice guidelines. First, the content of guidelines and their development processes need to be plausible, at first pass, to practitioners—to have face validity. Second, content validity has to be assessed by reviewing the scientific evidence on which a set of guidelines are based—how much evidence there is, how clear it is, how directly it relates to the guidelines, how sound its methodology is. Third, for a prospective assessment of criterion validity, one judges whether the guidelines would be likely to produce predicted results when applied in the real world of health care delivery. Construct validity involves the fit of the guideline to broader scientific theories.


4. For example, childhood immunizations and other preventive services have traditionally been excluded from indemnity health plans because insurers believe it is actuarially unwise to cover smaller, more predictable expenses that their subscribers can budget. Competitive pressures from health maintenance organizations may sometimes offset these beliefs, but this may reflect marketing more than clinical considerations.


5. The committee discussed how two common methodological concepts, sensitivity and specificity, applied to practice guidelines. For medical review and other evaluation criteria, these two related terms are fairly straightforward. Sensitivity and specificity refer, respectively, to a high "true positive rate" in detecting deficient or inappropriate care and a high "true negative rate" in passing over cases of adequate care. The concepts can be operationalized by requiring some evidence, drawn, for example, from pretesting of the review criteria on "prototype" cases or through pilot-testing in a specific organization. As described in Chapter 10 of the Medicare quality report, case-finding screens have often been found to be deficient on these two attributes. The committee concluded that these attributes need to be considered for evaluation instruments but do not add anything to the assessment of practice guidelines.
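The two rates in this footnote can be illustrated with a hypothetical case-finding screen compared against chart-review judgments; all data in the sketch below are invented.

```python
# Illustrative sketch of the footnote's two rates: sensitivity (true positive
# rate in flagging deficient care) and specificity (true negative rate in
# passing over adequate care). Screen flags and the chart-review "gold
# standard" below are hypothetical.

def sensitivity_specificity(screen_flags, truly_deficient):
    """Compare a case-finding screen's flags against reference judgments."""
    tp = sum(s and t for s, t in zip(screen_flags, truly_deficient))
    fn = sum((not s) and t for s, t in zip(screen_flags, truly_deficient))
    tn = sum((not s) and (not t) for s, t in zip(screen_flags, truly_deficient))
    fp = sum(s and (not t) for s, t in zip(screen_flags, truly_deficient))
    return tp / (tp + fn), tn / (tn + fp)

flags = [True, True, False, True, False, False, True, False]   # screen output
truth = [True, False, False, True, True, False, True, False]   # chart review
sens, spec = sensitivity_specificity(flags, truth)
print(round(sens, 2), round(spec, 2))  # → 0.75 0.75
```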


6. One effort at replication has been undertaken by those involved with the RAND Corporation's work to develop appropriateness indicators (Chassin, 1990).


7. Clinical applicability and clinical flexibility could be grouped together as one attribute. Keeping them separate emphasizes the distinctions among the populations or settings that are covered by guidelines and those that are not so covered.


8. The term multidisciplinary is used broadly here rather than narrowly; it does not refer only to academic and professional disciplines.


9. The procedures of the National Academy of Sciences might serve as a model for the panel selection process. These procedures require that members of study committees submit bias statements and that an official of the Academy lead each committee through a member-by-member discussion of possible biases. Major funders of a study cannot be represented on a study committee, and every committee report must be reviewed by a panel of outside experts under the oversight of the National Research Council.

Copyright © National Academy of Sciences.
Bookshelf ID: NBK235752

