Hickam DH, Weiss JW, Guise JM, et al. Outpatient Case Management for Adults With Medical Illness and Complex Care Needs [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Jan.

This publication is provided for historical reference only and the information may be out of date.

Topic Development and Refinement

The original topic nomination was submitted to the Agency for Healthcare Research and Quality (AHRQ) by a member of the general public. It proposed a comparative effectiveness review of case management (CM) performed by certified nurse case managers for improving utilization and costs of health services. The original nomination specified a broad population of interest (“all patients”) and did not further specify the outcomes of interest. Because a literature scan identified diverse populations, interventions, and outcomes, the nomination was further scoped during topic refinement to produce more specific Key Questions.

During a topic refinement phase, the scope of the project was refined with input from a panel of Key Informants. Key Informants included representatives of public organizations and societies with an interest in CM, individuals who have performed CM research, experts on the chronic care model, and practicing case managers. This input led to revision of the Key Questions, which were posted for public comment. A Technical Expert Panel (TEP) was then formed to review Key Questions, identify important issues, and define parameters for the review of evidence. The TEP also reviewed the research protocol, which is posted on the AHRQ Web site. Discussions among the project investigators, the AHRQ Task Order Officer (TOO), Key Informants, and the TEP occurred during a series of teleconferences and via email. In addition, input from the TEP was sought during compilation of the report when questions arose about the scope of the review.

The AHRQ Task Order Officer (TOO) was involved throughout this project. The TOO facilitated a common understanding among all parties involved in the project, resolved ambiguities, and advised on the scope and processes of the project. The TOO and other staff at AHRQ reviewed the report for consistency and clarity and to ensure that it conformed to AHRQ standards. AHRQ staff did not participate in the literature search, data analysis, or interpretation of the results.

Three Key Questions are addressed in the present report. One pertains to outcomes in patients and caregivers who receive services from case managers (Key Question 1), one addresses associations between patient factors and the results of CM (Key Question 2), and one addresses comparison among different types and models of CM (Key Question 3).

Search Strategy

To identify articles relevant to each Key Question, we worked with medical librarians who have extensive experience conducting literature searches for comparative effectiveness reviews. We searched MEDLINE® (Ovid), CINAHL® (EBSCO), the Cochrane Central Register of Controlled Trials (Ovid EBM Reviews), the Cochrane Database of Systematic Reviews (Ovid EBM Reviews), and the Database of Abstracts of Reviews of Effects (Ovid EBM Reviews). We searched by broad-level subject terms and keywords. The search was limited to English-language materials and adult populations (see Appendix B for search strings and time spans searched). The choice of specific terms used in the search strings was guided by the need to distinguish between CM as defined for this report and the many other types of nursing-based and focused disease management interventions. The database searches covered the period through August 2011. Retrieved citations were imported into an electronic database, EndNote® X3, for deduplication and tracking.

Other approaches were also used to identify evidence about CM for complex chronic illness care. Additional studies were identified by reviewing the reference lists of published clinical trials and review articles that addressed CM. Gray literature searches covered clinical trial registries, including Current Controlled Trials, Clinical Trial Results, and WHO Trial Registries.

Study Selection

We developed criteria for inclusion and exclusion of studies based on the Key Questions and the populations, interventions, comparators, outcomes, timing, and setting (PICOTS) approach (see Appendix C). To reduce bias and enhance consistency in our study selection process, we initially had three reviewers screen 100 citations for inclusion and calculated kappa values to estimate inter-reviewer reliability. After discussing and reconciling disagreements between reviewers, the same three team members screened an additional 100 citations. We continued this process until the kappa value exceeded 0.50 for each pair of reviewers. Two reviewers then screened each title and abstract against our pre-established inclusion/exclusion criteria to determine potential eligibility for the evidence synthesis. All citations judged potentially eligible by one or both of the reviewers were retrieved as full-text articles.
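The calibration rule described above (repeat screening rounds until kappa exceeds 0.50 for every pair of reviewers) can be illustrated with a short sketch. The report does not say which kappa statistic or software was used; the code below assumes Cohen's kappa computed separately for each reviewer pair, and the function names and sample votes are hypothetical.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two reviewers' decisions on the same citations."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and equal in length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: share of citations with identical decisions.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each reviewer's marginal label rates.
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical label throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappas(ratings):
    """Kappa for every pair of reviewers; ratings maps reviewer -> decisions."""
    return {
        (r1, r2): cohen_kappa(ratings[r1], ratings[r2])
        for r1, r2 in combinations(sorted(ratings), 2)
    }


def calibration_complete(ratings, threshold=0.50):
    """True when every reviewer pair exceeds the kappa threshold."""
    return all(k > threshold for k in pairwise_kappas(ratings).values())


# Hypothetical round of 10 citations (1 = include, 0 = exclude).
round_votes = {
    "reviewer_1": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "reviewer_2": [1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
    "reviewer_3": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
}
print(pairwise_kappas(round_votes))
print(calibration_complete(round_votes))
```

Checking each pair separately, rather than computing one multi-rater statistic, mirrors the wording "for each pair of reviewers" in the text.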

Each full-text article was reviewed independently by two team members using pre-established criteria for inclusion. If there was consensus between the two, then the article was either included or excluded. In cases of disagreement, a senior investigator reviewed the article and made the decision on inclusion and exclusion. A data file of excluded studies with reasons for exclusion was maintained (Appendix D).
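The two screening stages use different decision rules: at the title and abstract stage a single reviewer's vote is enough to retrieve the full text, whereas at the full-text stage agreement decides the outcome and a senior investigator resolves disagreements. A minimal sketch of that logic follows; the function names and boolean encoding are illustrative assumptions, not part of the report.

```python
def retrieve_full_text(vote_a, vote_b):
    """Title/abstract stage: retrieve if one or both reviewers vote to include."""
    return vote_a or vote_b


def full_text_decision(vote_a, vote_b, senior_vote=None):
    """Full-text stage: consensus decides; otherwise the senior investigator does."""
    if vote_a == vote_b:
        return vote_a
    if senior_vote is None:
        raise ValueError("reviewers disagree; senior investigator decision required")
    return senior_vote


# Hypothetical examples (True = include, False = exclude).
print(retrieve_full_text(True, False))         # True
print(full_text_decision(False, False))        # False
print(full_text_decision(True, False, True))   # True
```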

After the draft report was posted for public comment, the search was updated to capture any new publications. Literature identified during the updated search went through the same process of dual review as all other studies considered for inclusion in the report. All new studies identified by this process as meeting the established criteria for inclusion are incorporated in the final report.

PICOTS Framework

Populations of Interest

This review focuses on adults with medical illness and complex care needs in the outpatient setting. A main criterion in choosing studies for inclusion was the existence of complex care needs. We defined complex care needs broadly and included studies with case definitions based on health care resource utilization, patient health outcomes, and/or multifactor assessments that include measures such as socioeconomic status or patient self-efficacy. Appendix E provides examples of similar definitions of complex care needs from a variety of organizations. The included studies sometimes addressed populations in which psychiatric problems, such as depression or dementia, were important comorbid conditions.

The population of interest included all adults with medical illness and complex care needs. To identify the broadest sample of literature relevant to CM for such patients, we did not want to limit the results of the literature search to any particular disease condition or conditions. Our search was designed to include all subpopulations with any medical illness and complex care needs for whom CM had been studied. However, we excluded studies in which the primary clinical problem was a psychiatric disorder (other than dementia) and in which CM was used primarily to manage mental illness or a substance abuse disorder.


Interventions

The definition of CM used to make decisions about inclusion/exclusion is described in detail in the Introduction section of this report. We define CM as a process in which a person (alone or in conjunction with a team) manages multiple aspects of a patient's care. Key components of CM include planning and assessment, coordination of services, patient education, and clinical monitoring.


Comparators

In most studies, CM was compared with usual care (i.e., care without a CM component). Usual care varied considerably across studies and generally consisted of the array of services available to the population studied. When a study compared two or more different types of CM, the comparator was the alternative type of CM. In most cases, however, the comparator was the same milieu of clinical services without a distinct CM component. For clinical trials and other studies having a comparison group, we specifically examined the study's reports for information about contamination (provision of CM or other care coordination services to the control group).

Outcomes of Interest

The outcomes of interest are specified in the Key Questions, as follows:

  1. Patient-centered outcomes, including mortality, quality of life (QOL), disease-specific health outcomes, avoidance of nursing home placement, and patient satisfaction with care.
  2. Quality of care, as indicated by disease-specific process measures, receipt of recommended health care services, adherence to therapy, missed appointments, patient self-management, and changes in health behavior.
  3. Resource utilization, including overall financial cost, hospitalization rates, days in the hospital, emergency department use, and number of clinic visits (including primary care and other provider visits).

These categories were derived from the set of outcomes specified in the published evaluations of CM programs. Clinical programs that utilize CM address the needs of defined patient populations and have discrete clinical goals. These three categories of outcomes reflect the clinical goals of CM programs. In some cases certain reported outcomes were not used for this report if the methods used for the measurement were judged inadequate or were not consistent across studies.

Comparative effectiveness reviews commonly classify outcomes as either benefits or harms. The CM literature has not classified harms of CM. Thus, the outcomes listed above are not classified as either benefits or harms.


Timing

Longitudinal engagement with patients was a criterion for study inclusion. We excluded studies that provided CM for only short durations (30 days or less). This criterion excluded many studies that evaluated short-term posthospitalization programs (often termed “transitional care” programs). Such programs fall into a large category of inpatient discharge planning activities that are beyond the scope of this review.


Setting

We included only studies in the outpatient setting, including primary care, specialty care, and home care settings. No geographic limitations were applied.

Types of Studies

We included trials and observational studies pertinent to the Key Questions. We retrieved all randomized trials and evaluated them for inclusion. We also included studies using nonexperimental designs, including cohort, case-control, and pre/post designs. Previously published systematic reviews were not included as primary evidence. However, systematic reviews that used definitions of CM consistent with the one used in this project were used to identify any additional primary studies that had not previously been identified. Inclusion and exclusion criteria are detailed in Appendix C.

Analytic Framework

We developed an analytic framework (Figure 2) that specifies the relationships between the interventions and outcomes. This analytic framework depicts the chain of logic for using evidence to answer the Key Questions.

This figure depicts the analytic framework that outlines the target population, interventions, and outcomes considered in the review. The target population includes adults with medical illness and complex care needs in outpatient settings. The intervention is case management. Patient health outcomes include overall quality of care, disease-specific quality of care, quality of life, patient satisfaction, morbidity, and mortality. Resource utilization outcomes include overall cost, hospitalization rates, and emergency department use. Quality of care process measure outcomes include adherence to therapy, missed appointments, patient self-management, change in health behavior, disease-specific processes of care, and physician or case manager satisfaction. The breakdown of included articles by condition is: cancer, 8 articles or 6 studies; congestive heart failure, 12 articles; diabetes, 24 articles or 12 studies; dementia, 26 articles or 15 studies; serious chronic infections, 17 articles or 15 studies; frail elderly, 17 articles or 14 studies; chronic disease in older adults, 28 articles or 18 studies; and other, 17 articles or 13 studies.

Figure 2. Analytic framework. Note: Numbers refer to Key Questions.

Data Extraction and Data Management

After studies were selected for inclusion based on the Key Questions and PICOTS, the following data were abstracted and used to assess applicability and quality of the study: study design; inclusion and exclusion criteria; population and clinical characteristics (including sex, age, ethnicity, primary disease, comorbidities, complex care needs, and insurance carrier); CM intervention characteristics (including case manager professional identification and prior training); preintervention training for case managers; caseload and the nature of care provided by the intervention (e.g., patient education, coordination of services, medication monitoring, and adjustment); and results for each outcome, focusing on the outcomes of interest (patient-centered, resource utilization, and process of care outcomes). We also recorded the number of patients randomized relative to the number of patients enrolled, how similar those patients were to the target population, and the funding source. We recorded intent-to-treat results when available. These data are presented in the evidence tables (see Appendix I). All data abstracted from included studies were verified for accuracy and completeness by a second team member.

Quality Assessment of Individual Studies

We assessed the quality of randomized trials and cohort and case-control studies based on the predefined criteria listed in Appendix F. We also adapted criteria from methods proposed by Downs and Black [22, 23] (observational studies) and methods developed by the U.S. Preventive Services Task Force [24]. The criteria used are consistent with the approach recommended by AHRQ in the Methods Guide for Comparative Effectiveness Reviews [25]. We used the term “quality” rather than the alternate term “risk of bias”; both refer to internal validity.

We rated the quality of each controlled trial based on the methods described in the published reports about randomization, allocation concealment, and blinding; the similarity of compared groups at baseline; maintenance of comparable groups; adequate reporting of dropouts, attrition, crossover, adherence, and contamination; loss to followup; the use of intention-to-treat analysis; and ascertainment of outcomes [23].

Individual studies were rated as “good,” “fair,” or “poor” (see Appendix G). Studies rated “good” have the least risk of bias, and results are considered valid. Good-quality studies include clear descriptions of the population, setting, interventions, and comparison groups; a valid method for allocation of patients to treatment; low dropout rates and clear reporting of dropouts; appropriate means for preventing bias; and appropriate measurement of outcomes.

Studies rated “fair” are susceptible to some bias, but it is not sufficient to invalidate the results. These studies do not meet all the criteria for a rating of good quality, but no flaw is likely to cause major bias. The study may be missing information, making it difficult to assess limitations and potential problems. The “fair” quality category is broad, and studies with this rating vary in their strengths and weaknesses: the results of some fair quality studies are likely to be valid, while others are only probably valid.

Studies rated “poor” have significant flaws that imply biases of various types that may invalidate the results. They have a serious or “fatal” flaw in design, analysis, or reporting; large amounts of missing information; discrepancies in reporting; or serious problems in the delivery of the intervention. The results of these studies are at least as likely to reflect flaws in the study design as they are to reflect the true differences between the interventions that were compared. We did not exclude studies rated poor quality a priori, but poor quality studies were considered to be less valid than higher-quality studies when synthesizing the evidence, particularly when discrepancies between studies were present.


Applicability

Applicability is an indicator of the extent to which research included in a review might be useful for informing clinical and/or policy decisions. Applicability depends on the particular question and the needs of the user of the review. Because it depends on context, there is no generally accepted universal rating system for applicability. We based our approach on the guidance described by Atkins et al. [23, 26] to assess applicability of the evidence for the Key Questions addressed in this review. We describe features of the included studies that are relevant to applicability in terms of the elements of PICOTS. We considered the specific clinical and policy questions for CM interventions. For example, CM interventions are often tailored to the needs of particular patient populations (e.g., patients with HIV infection, dementia, or diabetes), so their results are pertinent only to those populations; for this reason we provide detailed results by specific patient populations. Describing results by condition clarifies their applicability and avoids overgeneralizing the results of CM interventions for specific conditions to all uses of CM. Additionally, characteristics of the CM intervention itself may influence applicability. For example, the intensity of the intervention may not be feasible across settings. Therefore, these factors are described within each section when possible.

Data Synthesis

CM has been studied in a wide range of clinical settings and for diverse patient groups. Many CM programs target individuals with particular diseases or clinical needs, and the programs are tailored to those patient needs. Because of the broad range of models of CM, we grouped the studies by the population groups and the clinical problems that were chiefly addressed. For the majority of studies, these groupings were based on particular diagnoses (such as congestive heart failure, diabetes, or dementia). There also were studies of programs that addressed the needs of older adults; these generally fell into one of two groups: older adults with multiple chronic conditions or the frail elderly. We reviewed the findings of the studies for each of these categories and then assessed overall findings (across population groups) as related to the project's Key Questions. For all outcomes, the amount of heterogeneity among the individual studies precluded formal meta-analyses.

Grading the Body of Evidence for Each Key Question

The strength of evidence for each Key Question was initially assessed for the outcomes applicable to each patient category. We used the approach described by Owens et al. [27] to evaluate the body of evidence for each outcome in each patient category. This approach uses the following categories:

  • Quality (good, fair, poor)
  • Consistency (consistent, inconsistent, unknown)
  • Directness (direct or indirect)
  • Precision (precise, imprecise)

Without formal pooled analyses, we were not able to assess publication bias. The strength of evidence was assigned an overall grade of High, Moderate, Low, or Insufficient according to a four-level scale:

  • High—High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect. When the conclusion is that the intervention (in this case, CM) does not have a significant effect on an outcome, the sample size and statistical power of the existing studies are high enough to warrant confidence in the stated conclusion.
  • Moderate—Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate. In the case of negative results, the statistical power of existing studies may be only modest, and the conclusion could be changed by a new study examining a substantially larger patient population.
  • Low—Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate.
  • Insufficient—Evidence either is unavailable or does not permit estimation of effect. This includes situations in which the results of multiple studies are highly heterogeneous.

Because the published studies often examined specific patient populations, the content of the CM interventions was generally tailored to the clinical problems of those patient groups. Thus, there is considerable diversity among programs. Comparisons across programs and populations need to account for differences both in the populations and in the content of the CM programs.

A wide variety of outcomes were included in these studies. After reviewing all of the studies, we categorized the outcomes according to the three parts of Key Question 1. In some cases the patient-centered outcomes were unique to the type of CM programs used for particular patient populations. The following outcomes were evaluated for strength of evidence:

Key Question 1a: Patient-Centered Outcomes

  • Multiple populations
    • Mortality
    • Quality of life (QOL)
    • Functional status
    • Patient satisfaction
  • Frail elderly
    • Nursing home admissions
  • Dementia
    • Ability to remain at home (time to nursing home placement)
    • Caregiver depression and strain (burden)
  • Cancer
    • Symptoms caused by cancer
    • Depression
  • Diabetes
    • Glucose management
    • Cholesterol control
    • Body weight

Key Question 1b: Quality of Care

  • Multiple populations
    • Receipt of guideline-recommended clinical services
    • Patient self-management behaviors
    • Medication adherence
    • Missed appointments
    • Patient perception of care coordination

Key Question 1c: Resource Utilization

  • Multiple populations
    • Hospitalization rates
    • Emergency department (ED) visits
    • Appointments with primary care and specialty providers
    • Overall expenditures

Key Question 2: Variation due to Patient Characteristics

  • Multiple populations
    • Variation among racial/ethnic groups
    • Variation among socioeconomic groups
    • Variation attributable to social support

Key Question 3: Variation due to Intervention Characteristics

  • Multiple populations
    • Variation due to intensity of CM
    • Variation due to duration of CM
    • Variation due to training and supervision of case managers
    • Variation due to integration with other clinical programs

In describing the available evidence about the effects of CM programs on these outcomes, we first summarize the evidence for the three Key Questions. We then provide detailed descriptions of the evidence for the patient populations that fell within this report's scope. In the detailed descriptions provided later in this report, specific citations to individual studies are included. Table 17 (see the Conclusions section) provides the specific evidence statements (with strength of evidence for each) upon which the general summary statements are based. The strength of evidence tables appear in Appendix H.

Peer Review and Public Commentary

Peer review was provided by experts in chronic illness care and CM; representatives of AHRQ and an associate editor also provided comments. The draft report also was posted on AHRQ's Effective Health Care (EHC) Web site for 4 weeks to elicit public comments. We addressed all reviewer comments, revising the text as appropriate, and summarized changes to the report in a disposition of comments document that will be made available 3 months after the final CER is posted on the EHC Web site.


