Results

Jennifer S Lin; Evelyn P Whitlock; Elizabeth Eckstrom; Rongwei Fu; Leslie A Perdue; Tracy L Beil; Rosanne M Leipzig

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Lin JS, Whitlock EP, Eckstrom E, et al. Challenges in Synthesizing and Interpreting the Evidence From a Systematic Review of Multifactorial Interventions to Prevent Functional Decline in Older Adults [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Oct. (Evidence Syntheses/Technology Assessments, No. 94.)


2 Results

We included 70 fair- to good-quality RCTs (n=40,917) published from 1984 to 2010 (Figure 1). The 70 included trials encompassed an enormous amount of clinical and methodological diversity. Although we summarize the results of our systematic review, we primarily focus on how the heterogeneity in available evidence limits the ability to interpret the evidence and present a simple framework on how to approach this heterogeneity (Table 1).

Table 1

Framework for Understanding Heterogeneity Across Trials, With Examples From Evaluating Multifactorial Assessment and Management Interventions in Older Adults.

Challenges in Understanding Population Risk and Complex Interventions

Our review encompassed a broad range of community-dwelling older adult populations and any outpatient multifactorial assessment and management interventions that could prevent functional decline or improve functional ability. The average age of trial participants ranged from 71 to 87 years. Twenty-four of the 70 trials included unselected or general-risk populations.¹⁴–²⁹ While the majority of trials targeted older adults “at risk” for functional decline, high-risk designations were based on widely varying criteria: primary care physician identification as high risk,³⁰–³² recently hospitalized,³³–³⁵ recently in the emergency department,³⁶,³⁷ recent fall or at increased fall risk,³⁸–⁵¹ screened positive for risk for functional decline or hospitalization,⁵,⁵²–⁵⁹ high health care utilizers,⁶⁰,⁶¹ low income,⁶² minimally care-assisted,⁶³,⁶⁴ multiple chronic health conditions,⁶⁵ frail seniors,⁶⁶–⁷¹ mild dementia,⁷² or other multifaceted approaches.⁷³,⁷⁴ These populations represented a heterogeneous group of “at risk” older adults (Table 1).

Likewise, there was no consistent categorization scheme for multifactorial assessment and management interventions, and intervention details were often lacking in published reports. In addition, we focused on outpatient interventions; however, these interventions exist on a continuum, from purely outpatient management, to management that includes transitions of care, to management of both inpatient and outpatient care. Based on our inclusion criteria, we excluded interventions with an inpatient component, which could have resulted in an artificial exclusion of interventions that were otherwise similar to those we did include. Most importantly, because of the broad inclusion criteria for outpatient multifactorial interventions, we encountered the problem of comparing effectiveness across trials evaluating a very heterogeneous group of interventions (i.e., different aims, personnel, settings, intensities, and comprehensiveness) (Table 1).

Of the included trials, only half explicitly sought to reduce or prevent functional decline, while other trials evaluated similarly structured interventions aimed at other purposes (e.g., to prevent falls, decrease health utilization, or manage chronic disease). About half were conducted in the United States, and the other half in countries with different health care systems and social services. Even within general types of interventions, there was sufficient variation in the assessment and management components of these interventions to potentially affect intervention success. For example, about two thirds of the trials had a one-time assessment; the other one third had repeated assessments that varied both in their assessment frequency and intervals between assessments. Assessments also varied substantially in how they were delivered (e.g., individual geriatric assessment by health care professional or self-administered questionnaires). The intensity and comprehensiveness of the management of identified risk factors ranged from a single contact to full management within a single multidisciplinary clinic. About three fourths of the trials evaluated interventions that provided active management of at least some of the risks/problems identified during assessment (as opposed to referring these patients to the primary care physician), half of which provided comprehensive management of all identified problems. Contacts could be in-person (clinic- or home-based) or by phone and could involve different personnel. About half of the included trials did not include geriatric expertise or specify if geriatric expertise was involved in the assessment or management of patients.

In order to synthesize the findings across the broad body of evidence, we attempted various approaches to group similar populations and interventions. To estimate the overall effectiveness of the multifactorial interventions by population risk, we attempted to apply a more standardized definition using risk factors or proxies for functional decline, including age, control group mortality, control group baseline ADL or IADL, and a composite measure of baseline frailty (age, self-rated health, and loss of one or more ADLs). However, only age and control group mortality rate were routinely reported across trials. We developed several categorization schemes, based on our assessment of the variation in key trial attributes, as well as groupings suggested by previous researchers,⁷,³²,⁷⁵–⁸⁰ in order to synthesize and interpret the results (Appendix A). We performed stratified analyses and meta-regressions of groupings based on clinically relevant population and intervention characteristics that were reported in individual studies, including mean age of trial population, percent female population, baseline frailty of the population, baseline functional status of the population (ADL and IADL), control group mortality rate, type of intervention, applicability of trial to current U.S. setting, comprehensiveness of the management delivered following assessment, level of geriatric expertise included in the assessment and management, and intervention intensity as measured by the number and duration of assessment and management contacts with participants. Ultimately, however, we were unable to define truly cohesive bodies of literature, despite multiple categorization schemes based on multiple dimensions of the interventions, limiting the value of pooled analyses.
After consultation with the USPSTF leads, and given the need for some estimation of the net benefit of these interventions, we used two basic dimensions to stratify our analyses: 1) the aim of the trial, because most trials with a primary purpose of preventing functional decline measured outcomes of functional ability, and 2) the country in which the trial was conducted, because trials conducted outside the United States were potentially less applicable to U.S. practice, given the large differences in health care delivery and social services, as well as the variability in standards of care for older adults across different countries.

Challenges in Conducting Outcome Analyses

We defined a set of important outcomes a priori, which included any measure of ADL or IADL (e.g., Katz, Barthel, and Lawton scales) and any measure of HRQL (e.g., 12- and 36-item Short-form Health Survey [SF-36] or EuroQol), in addition to falls, hospitalization, institutionalization, mortality, and serious adverse events. We did not include performance-based measures of function (e.g., gait speed, timed Get Up and Go test, Performance Oriented Mobility Assessment), as these were infrequently reported as an outcome (15 of 70 trials) and never specified as a primary outcome. Gait speed was the most commonly reported performance-based measure of function, but was reported as an outcome measure in only four trials.

Our first challenge in conducting and understanding our outcome analyses was inconsistent reporting of outcomes across trials (Table 1). Although mortality was reported in nearly all of the trials, death was reported as part of the CONSORT flow diagram rather than as an outcome measure. While most (51 of 70) trials reported some measure of ADL and/or IADL, nine trials did not mention the name of the instrument, and the remaining 43 trials used 20 different instruments. The three most commonly used instruments were the SF-36 physical functioning domain (ADL) in nine trials, the Barthel scale (ADL) in eight trials, and the Lawton scale (IADL) in five trials. Although multiple validated patient-reported instruments exist to measure ADL and IADL, scales show only weak and inconsistent relationships, and therefore no single scale has been accepted as the gold standard to measure functioning.⁸¹ Other outcomes were less commonly reported (Table 2): HRQL (21 trials), hospitalizations (21 trials), and institutionalizations (25 trials). As with functional ability, the 21 trials reporting HRQL outcomes used 11 different instruments, with the SF-36 being the most commonly used (eight trials). This variability in patient-reported outcome measures was further complicated by evidence of selective reporting of outcomes (i.e., trials included ADL as part of the assessment but did not report it as an outcome, individual domain scores of HRQL instruments were reported but not overall component scores) and by the inconsistency of reporting a set of outcomes at similar lengths of followup across trials (Tables 3 and 4). Thus, the studies addressing different outcomes represent different bodies of evidence and possibly reflect selective reporting bias.

Table 2

Number of Included Trials for Meta-Analyses by Outcome.

Table 3

Outcomes Reported for Interventions With a Primary Purpose of Preventing Functional Decline.

Table 4

Outcomes Reported for Interventions With a Primary Purpose Other Than Preventing Functional Decline.

Our second challenge was that our quantitative analyses could only include a subset of trials, due to variation in outcome measurement and limitations in reporting of ADL and IADL (Table 2). With expert consultation and audit of the ADL and IADL instruments used in the included trials, we determined that ADL and IADL measured different constructs and that even among different ADL instruments, measured constructs were not identical (Tables 3 and 4). Therefore, we conducted meta-analyses for ADL and IADL separately and only combined ADL and IADL measures as part of our sensitivity analyses. Due to limitations in how outcomes were reported (e.g., continuous versus dichotomous, change from baseline or only followup measurement), only a subset of studies could be included in each meta-analysis. We were also cautious not to pool short- and long-term outcomes given the wide range of followup (6 to 39 months) and the fact that treatment effects are often critically dependent on timing.⁸² After limiting our analyses to pooling outcomes at similar lengths of followup (i.e., intermediate [6 to 18 months] or long-term [24 to 39 months]), meta-analyses for patient-reported outcomes represented at most half of trials reporting this outcome measure (Table 2). Although HRQL measures were more comparable (in terms of measured construct), they were less frequently reported. In addition, limitations in how HRQL outcomes were reported at the individual study level (e.g., continuous versus dichotomous, domain scores versus component/overall scores, and heterogeneity in timing of outcome measurement) prevented meaningful pooling of these outcomes.
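The followup windows described above can be sketched as a small helper. This is an illustrative sketch only: the function name, the example trial labels, and the decision to leave followup lengths outside both windows (e.g., 20 months) unassigned are assumptions, not part of the review's stated methods.

```python
from typing import Optional

def followup_window(months: float) -> Optional[str]:
    """Map a followup length in months to the pooling window named in the text."""
    if 6 <= months <= 18:
        return "intermediate"
    if 24 <= months <= 39:
        return "long-term"
    # Assumption: anything outside both windows is left out of pooling.
    return None

# Hypothetical trials and followup lengths, for illustration only.
trials = {"trial_a": 12, "trial_b": 36, "trial_c": 20}
windows = {name: followup_window(m) for name, m in trials.items()}
```

Keeping the two windows separate reflects the concern stated above that treatment effects can depend strongly on timing.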

Challenges in Interpretation of Results

Although 28 of the 34 trials with a primary purpose of preventing functional decline reported ADL or IADL outcome measures, only 14 trials were included in the meta-analysis of ADL outcomes, 10 trials for IADL outcomes, and 17 trials for either ADL or IADL outcomes at 6 to 18 months (sensitivity analysis) (Table 5). Meta-analysis of change in ADL outcomes at 6 to 18 months showed small but statistically significant differences favoring intervention (SMD, 0.10 [95% CI, 0.04 to 0.17]; I²=0.0%) (Table 5, Figure 2). Pooling only U.S. trials showed a slightly higher point estimate of benefit. The meta-analysis for change in IADL outcomes at 6 to 18 months (Table 5, Figure 3) was consistent with ADL findings. These results, however, were not statistically significant (SMD, 0.10 [95% CI, −0.01 to 0.22]) and the statistical heterogeneity was much higher (I²=50.5%). Sensitivity analysis pooling trials reporting either ADL or IADL outcomes or using combined ADL/IADL outcome measures was also consistent, but still heterogeneous (SMD, 0.09 [95% CI, 0.01 to 0.16]; I²=42.3%) (Table 5, Figure 4). Trials that could not be pooled in the meta-analyses were generally consistent with pooled results in terms of direction of effect, although results from individual trials were often not statistically significant. Longer-term outcome analyses included far fewer studies (Table 2), but results were consistent with 6–18 month outcome analyses, showing small but statistically significant effect sizes (data not shown). We did not find evidence of publication bias based on the funnel plot and Egger’s test for any of the meta-analyses of functional ability in trials with a primary purpose of preventing functional decline.
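For readers unfamiliar with how a pooled SMD and an I² value of this kind are obtained, the following is a minimal sketch. It assumes a DerSimonian-Laird random-effects model, which this section does not confirm as the model used, and the function name and any inputs passed to it are hypothetical rather than trial data from the review.

```python
import math

def pool_random_effects(effects, variances):
    """Pool per-trial effect sizes (e.g., SMDs) with a DerSimonian-Laird
    random-effects model and report Cochran's Q-based I^2 (in percent)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-trial variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

In this formulation, I² is the share of the observed variation in effect sizes that exceeds what sampling error alone would produce, so a value of 0.0% indicates no detectable excess variation while values near 50% indicate that roughly half of the observed variation is attributed to between-trial differences.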

Table 5

Pooled Effect Sizes for Various Outcomes at 6–18 Months Post Baseline.

Figure 2

Meta-Analysis of Activities of Daily Living at 12 Months for Interventions With the Primary Purpose of Preventing Functional Decline. Abbreviations: CI=confidence interval; N=number; SMD=standardized mean difference; US=United States.

Figure 3

Meta-Analysis of Instrumental Activities of Daily Living at 12 Months for Interventions With the Primary Purpose of Preventing Functional Decline. Abbreviations: CI=confidence interval; N=number; SMD=standardized mean difference; US=United States.

Figure 4

Meta-Analysis of Activities of Daily Living/Instrumental Activities of Daily Living, Instrumental Activities of Daily Living, or Activities of Daily Living at 12 Months for Interventions With the Primary Purpose of Preventing Functional Decline.

The trials included in the ADL and IADL meta-analyses had minimal overlap with studies that reported hospitalizations and institutionalizations (Tables 3 and 4) and therefore represent essentially different bodies of evidence. Meta-analyses for hospitalizations (n=7,168; 16 trials) and institutionalizations (n=6,973; 19 trials) showed no detectable effect from multifactorial assessment and management interventions (Table 5, Figures 5 and 6). Overall, event rates were low, particularly for institutionalizations (Figures 5 and 6). Finally, but not surprisingly, since trials were generally not powered to detect a reduction in mortality, pooled results (1,475 deaths; n=28,891) showed no significant reduction in mortality at 12 months (RR, 0.91 [95% CI, 0.82 to 1.00]; I²=0.0%) (Table 5, Figure 7).
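The borderline nature of the mortality result can be illustrated by back-calculating an approximate z statistic from the reported relative risk and confidence interval. This is a rough sketch, not the review's analysis: it assumes a normal approximation on the log scale, the helper name is hypothetical, and because the inputs are rounded to two decimals the result is only approximate.

```python
import math

def z_and_p_from_ratio_ci(point, lower, upper, z_crit=1.96):
    """Back-calculate the log-scale SE, z statistic, and two-sided p value
    from a reported ratio estimate and its 95% CI (normal approximation)."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(point) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p under a standard normal
    return se, z, p

# Reported pooled mortality result: RR, 0.91 (95% CI, 0.82 to 1.00)
se, z, p = z_and_p_from_ratio_ci(0.91, 0.82, 1.00)
```

With these rounded inputs the absolute z value falls just short of 1.96, which is consistent with an upper confidence limit that touches 1.00 and with the description of the result as not statistically significant.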

Figure 5

Meta-Analysis of Hospitalizations at 12 Months. * 1=U.S. setting; 0=non-U.S. setting. Abbreviations: CI=confidence interval; NR=not reported; RR=relative risk; US=United States.

Figure 6

Meta-Analysis of Institutionalizations at 12 Months. * 1=U.S. setting; 0=non-U.S. setting. Abbreviations: CI=confidence interval; RR=relative risk; US=United States.

Figure 7

Meta-Analysis of Mortality at 12 Months. * 1=U.S. setting; 0=non-U.S. setting. Abbreviations: CI=confidence interval; RR=relative risk; US=United States.

Restricting analyses to similar-risk populations and/or more similar interventions substantially limited the number of trials included in the analyses without significantly affecting pooled results or reducing statistical heterogeneity. For example, of the 17 trials included in the ADL/IADL meta-analyses, only four trials evaluated comprehensive multifactorial assessment and management interventions in older adults at risk for functional decline,³¹,³³,³⁷,⁵³ and only three trials evaluated less comprehensive interventions in unselected older adults.¹⁵,²⁶,²⁹ Likewise, only six trials of comprehensive interventions in at-risk adults³⁶,³⁹,⁴⁰,⁵⁶,⁶⁰,⁶⁶ and two trials of less comprehensive interventions in unselected adults¹⁹,²⁶ were included in the meta-analyses for hospitalization outcomes (total 16 trials).

There were numerous challenges in interpreting the clinical significance of the small but statistically significant average changes in patient-reported ADL and IADL. We calculated pooled SMDs using Hedges’ g statistic to quantitatively synthesize functional limitations across many different measurement instruments that were primarily reported as a continuous outcome. Overall, we found an SMD of 0.09 [95% CI, 0.01 to 0.16] for changes in functional ability (ADL or IADL). An effect size of 0.2 to 0.3 represents a small effect, 0.5 a moderate effect, and 0.8 a large effect.⁸³ Thus, these findings represent a small to very small magnitude of effect, even when considering the upper limit suggested by the 95 percent confidence interval. We looked at individual trials whose SMD was similar to the pooled SMD to understand the clinical significance of this change and examined the precise change in score for those trials. For ADLs, four trials had similar effect sizes.⁵,³³,³⁷,⁵⁵ In these trials, the change in score was approximately a 1- to 2-point improvement in the SF-36 physical functioning score (100-point scale),³⁷,⁵⁵ or approximately a 0.2-point improvement on the Katz ADL scale (6-point scale).³³ For IADLs, five trials had similar effect sizes.²⁶,³³,³⁷,⁵⁴,⁶⁴ In one trial, the change was as high as a 9-point improvement on the SF-36 physical functioning score (100-point scale);⁵⁴ however, it was much lower in two other studies: a 0.4-point improvement on the Older American Resources and Services scale (14-point IADL scale),³⁷ or about a 0.8-point improvement on the Lawton and Brody scale (23-point scale).³³ On the basis of this approach, we concluded that overall there would be a very small clinical benefit (at best) to these interventions at a population level.
Although there has been a growing body of literature using anchor-based minimally important differences (MIDs) to interpret the clinical significance of patient-reported outcomes,⁸⁴,⁸⁵ we could not identify established MIDs for these commonly used ADL or IADL instruments. In fact, we identified only one study that established the MID for improvement using the Barthel Index (20-point scale) in stroke patients.⁸⁶ Using a generic threshold of 0.5 on a 7-point scale as an MID, these ADL and IADL changes would not be considered clinically significant.⁸⁵
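The Hedges’ g statistic referred to above can be sketched as follows. This uses the common textbook formulation with the approximate small-sample correction factor; the function name is hypothetical, and the sketch is not a reproduction of the review's own calculations (for example, it does not address change-from-baseline scores).

```python
import math

def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Return Hedges' g and its approximate variance for two independent groups.

    mean_i, sd_i, n_i: intervention group mean, standard deviation, sample size
    mean_c, sd_c, n_c: control group mean, standard deviation, sample size
    """
    df = n_i + n_c - 2
    # Pooled within-group standard deviation
    s_pooled = math.sqrt(((n_i - 1) * sd_i ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_i - mean_c) / s_pooled          # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)          # small-sample correction factor
    g = j * d
    var_d = (n_i + n_c) / (n_i * n_c) + d ** 2 / (2.0 * (n_i + n_c))
    return g, j ** 2 * var_d
```

Because g divides the raw difference by the pooled standard deviation, results from instruments with different score ranges are placed on a common, unitless scale, which is what allows them to be pooled. The cost is that the pooled value must be translated back to points on a specific instrument before its clinical meaning can be judged, as was done with the individual trials described above.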

These findings should be interpreted with caution. That our meta-analyses suggest a small or null effect does not mean that the multifactorial interventions studied are ineffective. First, the ADL and IADL instruments used have important limitations in their measurement properties. The ADL and IADL instruments most commonly used in included studies are not always responsive to clinically important changes in community-dwelling older adults. The Barthel Index (ADL), for example, was developed in institutionalized adults and thus is not necessarily appropriate for use in other populations.⁸⁷ Even in populations for which it was designed, the Barthel Index has been shown to have floor and ceiling effects.⁸⁸,⁸⁹ The Lawton scale (IADL) has weak reliability, validity, and responsiveness.⁹⁰ The overwhelming majority of trials did not report the rationale guiding their selection of ADL or IADL instrument or the validity of the chosen instrument for the population studied.

Further, these average effects likely reflect a mixture of substantial benefits for some older adults and no benefit for many older adults. This heterogeneity of treatment effects is reflected by the individual trials’ relatively large standard deviations in change in functional ability.⁹¹ One major source of this heterogeneity is likely the different baseline risk (or prognosis) of the populations studied. Older adults, compared with middle-aged adults, have more variability in their health trajectories,⁹² such that people with a similar baseline health status may decline at markedly different rates.⁹³ Trials infrequently reported the control group health trajectory (e.g., ADL and IADL at baseline and followup), which might serve as a surrogate for between-trial differences in populations. In the subset of trials that did report these data, about one third showed no decline in the control group’s mean ADL or IADL, despite selecting for trial participants at risk for functional decline. The stable trajectory of ADL or IADL could indicate that the trial participants did not have any functional decline or that the measures used were not responsive to changes in functional ability. Inconsistent (or lack of) reporting of patient risk and use of mean differences without subgroup exploration at the individual study level (i.e., persons who improved or maintained their level of function versus those who declined in function) made it impossible to comment on potentially important differential effects by subgroups (with differing risk). Additionally, the majority of trials did not report dichotomous or categorical outcomes, which would have allowed an estimation of the proportion that might benefit more substantially from these interventions.
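The point that a small average effect can conceal a larger benefit in a subset of participants can be illustrated with simple arithmetic. The proportions and effect sizes below are purely hypothetical and are not estimates from the included trials; the function name is likewise an assumption.

```python
def average_effect(fraction_responding, effect_in_responders, effect_in_others=0.0):
    """Average effect across a population made up of two subgroups."""
    return (fraction_responding * effect_in_responders
            + (1.0 - fraction_responding) * effect_in_others)

# Hypothetical scenario: 20% of participants gain a moderate benefit (SMD 0.5)
# and the remaining 80% gain none.
avg = average_effect(0.2, 0.5)
```

Under these hypothetical values the average works out to 0.10, which is why mean differences alone cannot distinguish a uniformly small benefit from a moderate benefit concentrated in a minority, and why dichotomous or subgroup reporting would have been informative.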

Finally, because of a relative lack of reporting about harms (or a constellation of outcomes), we were unable to ascertain the net benefit of these multifactorial assessment and management interventions in older adults. Very few trials reported or hypothesized about the harms of these interventions, other than falls. Individual trials may not have been sufficiently powered to detect harms with low event rates, although pooled analyses showed no evidence of paradoxical harms (e.g., increased falls, disability, hospitalizations, institutionalizations, or decreased quality of life). The possibility of unintended harms, however, cannot be fully understood given the inconsistent and incomplete outcome reporting. Increased hospitalizations, for example, may not necessarily represent a true harm if they prevent functional decline or institutionalization. In one study (n=539), persons who were randomized to the intervention had increased hospitalizations (not statistically significant) but decreased institutionalizations.²⁴ An increase in falls or fallers (not leading to serious injury) was reported in a few trials (not statistically significant), but may have resulted from increased physical activity that improved quality of life (not reported as an outcome in those studies).

Bookshelf ID: NBK114223
