NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Fordham B, Sugavanam T, Edwards K, et al. Cognitive–behavioural therapy for a variety of conditions: an overview of systematic reviews and panoramic meta-analysis. Southampton (UK): NIHR Journals Library; 2021 Feb. (Health Technology Assessment, No. 25.9.)
The map of the CBT review evidence base included 494 reviews. Of these, 171 reviews included data suitable for inclusion in the PMAs. The majority of the reviews reported in the mapping exercise, but excluded from the PMA, were not suitable because we could not extract the CBT RCT-specific data in isolation for any of the four outcomes (n = 279). This could be for any one of the following reasons:
- The review may not have performed a meta-analysis or reported any quantitative data from single RCTs.
- The review may have looked for RCTs reporting on the outcome but not identified any evidence.
- We may have been unable to extract CBT RCT data in isolation, for example a review that presented a subgroup analysis of 10 CBT trials, but one of these trials was not a RCT. We could not isolate the purely CBT RCT evidence; therefore, the data were not included in the PMA. To have included these RCTs, we would have needed to return to the original RCT and perform a new meta-analysis including only the RCT data.
Of the 126 reviews eligible for the end-point PMA, 71 reviews32,37,39,46,50,59,63,68,89,102,117,126,134,143,149,165,167–169,175,188,193,197–199,205,206,211,219–221,227,231,234–236,246,249,251,259,261,267,275,286,291,299,315,317,340,343,347,357,369,371,373,397,401,405,409,432,445,446,448,450,454,464,469,480,484,507,518 were of higher quality (i.e. ‘moderate’ or ‘high’ on AMSTAR-2); the primary analyses for each outcome were conducted using these 71 higher-quality reviews. The PRISMA flow diagram describing review selection for the PMAs from the mapping stage is presented in Figure 5.

FIGURE 5
The PRISMA diagram from mapping to PMAs.
For reviews reporting data as change scores or dichotomous outcomes (RR, OR), separate PMAs for each outcome were undertaken; these are presented in Appendices 9–12.
Health-related quality of life
We identified 24 higher-quality systematic reviews32,39,63,165,188,193,219,220,231,235,236,275,286,299,317,343,347,371,409,445,446,464,469,518 that met the eligibility criteria to be included in the HRQoL PMA. One review343 included two meta-analyses for different disorders; hence, the number of comparisons is 25. These reviews included 49 RCTs, 4304 participants and represent 12 out of 40 ICD-11 categories (30%), as presented in Box 4. The white rows represent those ICD-11 codes that are represented in the primary analysis, the purple rows represent those conditions that are represented in the sensitivity analyses (see Sensitivity analysis) and the orange rows represent those ICD-11 codes that are not represented.
BOX 4
The ICD-11 categories (not) represented in the primary PMA of higher-quality reviews for the HRQoL outcome
The most commonly used measure of HRQoL was the Short Form questionnaire-36 items (SF-36) (n = 6). Other measurements included the Quality of Life Inventory (n = 5), the EuroQol-5 Dimensions (n = 2), the WHO Quality of Life-BREF (n = 1), the Global Assessment of Functioning (n = 1) and the Modular System for Quality of Life-54 (n = 1). The remaining reviews used population-specific (e.g. KIDSCREEN-27; KINDL-R; Comprehensive Quality of Life Scale, Intellectual Disability) or condition-specific [e.g. Fibromyalgia Impact Questionnaire, ADHD (attention deficit hyperactivity disorder) Impact Module-Adult™, Quality of Life in Alzheimer’s Disease, Diabetes Quality of Life for Youths] quality-of-life measurements.
Some reviews included trials with mixed characteristics, for example one review could include trials with adults and with older adults; in such a case, we would record that there was one review with adult data and one with older adult data. Consequently, the counts do not always add to 24 reviews. The mapping results demonstrate that the majority of these meta-analyses were focused on adults (n = 21), with only three reviews of children/adolescents and one review of older people. More reviews had samples with more female than male participants (12/24) than had samples with more male than female participants (6/24). Only three reviews reported the ethnicity of their samples: two reviews had samples with < 25% non-white participants and one included a sample with > 75% non-white participants.
The majority of reviews reported on the management of clinical conditions (16/24), through high-intensity CBT (17/24), delivered in outpatient settings (16/24), and with short-term follow-up (19/24). Seven reviews shared these three contexts but were conducted across different conditions. The majority of the included RCTs were from Europe, North America and Australasia (21/24).
The number of reviews containing only one trial was 6 out of 25; for some conditions, the numbers in each trial were very small (Figure 6). Comparators were active (8/24), mixed (3/24) and non-active (13/24).
Primary analysis
Within-condition heterogeneity (I2) varied between 0% and 56%, and across-condition heterogeneity was 32%; hence, the criteria for PMA were met. The pooled across-condition SMD between control groups and CBT intervention groups gave a modest effect in favour of CBT on outcomes of HRQoL (SMD 0.23, 95% CI 0.14 to 0.33) (see Figure 6). Variation in effects was observed across conditions; for example, in aggression, the estimated mean effect was almost zero, although it was estimated with considerable uncertainty (SMD –0.02, 95% CI –0.28 to 0.32), whereas, in anxiety disorders, the estimated effect was positive and was estimated with much greater certainty (SMD 0.42, 95% CI 0.20 to 0.64). This heterogeneity is reflected in the prediction interval for the effect within any given condition, which ran from –0.03 to 0.50, indicating, at worst (and with little support in the prediction interval), a small negative effect of CBT for some conditions and, at best, a large positive effect for other conditions.
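The quantities reported in this paragraph (a pooled SMD, I2 and a prediction interval) can be illustrated with a minimal sketch. It assumes a DerSimonian-Laird random-effects model and a normal quantile for the prediction interval; this section does not state which estimator or quantile was used for the PMA, so both are assumptions, and the function name and the input values in the usage note are hypothetical rather than taken from the report.

```python
import math
from statistics import NormalDist


def random_effects_summary(effects, std_errors, level=0.95):
    """Pool effect estimates with a DerSimonian-Laird random-effects model.

    Assumption: a normal quantile is used for the prediction interval to keep
    the sketch stdlib-only; a t quantile with k - 2 degrees of freedom is a
    common alternative and gives a wider interval.
    """
    k = len(effects)
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Between-group variance (tau^2) and I^2 as a percentage.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights, pooled estimate and its standard error.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (pooled - z * se_pooled, pooled + z * se_pooled)
    # The prediction interval adds tau^2 to the variance of the pooled mean.
    half_width = z * math.sqrt(tau2 + se_pooled ** 2)
    pi = (pooled - half_width, pooled + half_width)
    return {"pooled": pooled, "ci": ci, "pi": pi, "tau2": tau2, "i2": i2}
```

Calling `random_effects_summary([0.1, 0.3, 0.5], [0.1, 0.1, 0.1])` with these hypothetical inputs gives a prediction interval that is wider than the confidence interval, which is the pattern described above: the confidence interval describes uncertainty in the average effect, whereas the prediction interval describes the range of effects expected in a single condition.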
No publication bias was detected using funnel plots (Figure 7) and Egger’s test found no evidence of small-study effects (p = 0.18).

FIGURE 7
The HRQoL funnel plot with pseudo-95% confidence limits (end-point data from high-quality reviews).
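Egger’s test for small-study effects can be sketched as an ordinary least-squares regression of each standardised effect (effect divided by its standard error) on its precision (one divided by the standard error); an intercept far from zero suggests funnel-plot asymmetry. The sketch below is an illustration only: the function name is hypothetical, and it returns the t statistic and degrees of freedom rather than a p-value, because the t distribution is not available in the Python standard library.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, se_intercept, t_statistic, df) for Egger's regression.

    Regresses effect / SE on 1 / SE by ordinary least squares. The p-value
    would come from a t distribution with df = k - 2, not computed here.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("Egger's regression needs at least three estimates")
    x = [1.0 / se for se in std_errors]                    # precision
    y = [e / se for e, se in zip(effects, std_errors)]     # standardised effect
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    # Guard against a perfect fit, where the standard error is zero.
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat, n - 2
```

When every estimate shares the same underlying effect, the standardised effects lie on a line through the origin and the intercept is close to zero, which corresponds to a symmetric funnel plot.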
Mean difference in health-related quality of life
We identified a standard deviation (10.93 points) of the SF-36 physical composite score from a trial,525 deemed to have a low risk of bias, in a higher-quality review.464 The SMD translated to an estimated mean difference on the SF-36 of 3 points (95% CI 2 to 4 points).
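The conversion described here amounts to multiplying the SMD and its confidence limits by the chosen standard deviation. A minimal sketch follows, assuming simple rounding to whole points; the function name is hypothetical, and the inputs are the HRQoL values reported above (SMD 0.23, 95% CI 0.14 to 0.33; SD 10.93 points).

```python
def smd_to_mean_difference(smd, ci_low, ci_high, sd):
    """Re-express an SMD and its confidence limits on a scale with the given SD.

    Returns (estimate, lower, upper), each rounded to the nearest whole unit.
    """
    return tuple(round(value * sd) for value in (smd, ci_low, ci_high))


# HRQoL values reported in the text: pooled SMD with 95% CI, and the SF-36 SD.
sf36_points = smd_to_mean_difference(0.23, 0.14, 0.33, sd=10.93)
```

With these inputs the products are 2.51, 1.53 and 3.61, which round to the reported 3 points (95% CI 2 to 4 points).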
Subgroup analysis
The only interaction test that was statistically significant was between reviews of CBT compared with active comparators and reviews of CBT compared with non-active comparators. All other subgroup interaction tests were not statistically significant and are, therefore, consistent with the general effect of CBT on HRQoL outcomes.
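One simple way to test for an interaction between two subgroups is a z-test on the difference between their pooled estimates, with standard errors back-calculated from the 95% CIs. The sketch below illustrates that idea only. This section does not specify which interaction test was used, so the approach, the function names and the example values in the usage note are assumptions, and the sketch should not be expected to reproduce the p-values reported here.

```python
import math
from statistics import NormalDist

# Normal quantile for a two-sided 95% interval (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def se_from_ci(ci_low, ci_high):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (ci_high - ci_low) / (2.0 * Z95)


def subgroup_interaction_p(est_a, ci_a, est_b, ci_b):
    """Two-sided p-value for the difference between two subgroup estimates.

    Assumes the two subgroups are independent and the estimates are
    approximately normally distributed.
    """
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, `subgroup_interaction_p(0.10, (0.00, 0.20), 0.30, (0.10, 0.50))` compares two hypothetical subgroups. Because the test depends on the standard error of the difference, two subgroups can have visibly different point estimates and still yield a non-significant interaction when one or both are imprecisely estimated.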
Cognitive–behavioural therapy intensity
Overall, high- and low-intensity CBT reviews were distributed evenly across the different conditions and characteristics. High- and low-intensity CBT reviews both included populations diagnosed with 6B00-06 anxiety or fear-related disorders, 6A60-80 mood disorders, 6A00-06 neurodevelopmental disorders, 6B20-25 obsessive–compulsive disorders, 21 pain and 6B40-45 disorders specifically associated with stress. They included patients with chronic symptoms (6A60-80 mood disorders and 21 pain). The reviews included children, adolescents and adults, of both sexes, from all care settings in Europe, North America, Australasia and Asia. Reviews of both intensities included long-term follow-up data. Reviews of high-intensity, but not low-intensity, CBT included (1) populations diagnosed with 6C20-21 disorders of bodily distress, 08 diseases of the nervous system or 6A20-25 schizophrenia or other primary psychotic disorders; (2) older adults; and (3) CBT delivered in a preventative context. Reviews of low-intensity, but not high-intensity, CBT were conducted in populations diagnosed with 6C40-4H addiction and MB23 aggressive behaviour.
There was little difference between effect estimates in reviews of high-intensity and low-intensity CBT, although heterogeneity was substantially higher for low-intensity CBT (SMD 0.23, 95% CI 0.03 to 0.42; I2 = 68%) than for high-intensity CBT (SMD 0.21, 95% CI 0.11 to 0.32; I2 = 0%) (Figure 8). The interaction test between high- and low-intensity CBT reviews was not statistically significant (p = 0.99).

FIGURE 8
The HRQoL subgroup analysis (end-point data from higher-quality reviews): CBT intensity. Note that two reviews that combined high- and low-intensity CBT are not included here. GAD, generalised anxiety disorder.
We identified three reviews318,343,521 (four RCTs, 243 participants; two reviews of lower quality and one review of higher quality) that directly compared high- with low-intensity CBT interventions on HRQoL outcomes in 6B00-06 anxiety or fear-related disorders and 6A60-80 mood disorders. One review provided separate data for both 6B01 panic and 6B04 social anxiety disorder populations, and so the PMA included four meta-analyses. In this subset of direct comparisons, there was no difference between high- and low-intensity CBT (SMD 0.15, 95% CI –0.10 to 0.40; I2 = 0%) (Figure 9).

FIGURE 9
Health-related quality of life: high- vs. low-intensity CBT, direct comparison PMA. SAD, social anxiety disorder.
The direct evidence (see Figure 9) comparing high- with low-intensity CBT in 6B00-06: Anxiety and 6A60-80: Mood disorders supports our indirect evidence (see Figure 8) from subgroup analyses of high and low intensity. In summary, we have found no direct or indirect evidence that high- or low-intensity CBT produced different effect sizes.
Type of comparators
The choice of comparator had a significant effect on the treatment estimates. Comparison to an active intervention was associated with a very small effect (SMD 0.09, 95% CI –0.01 to 0.19; I2 = 0%) (Figure 10). The active comparators tested in these reviews were education, exercise, pharmacotherapy, physiotherapy, psychotherapy/counselling and relaxation. Comparison with a non-active control was associated with a larger effect estimate (SMD 0.31, 95% CI 0.18 to 0.45; I2 = 40%). The interaction test was statistically significant (p = 0.04).

FIGURE 10
The HRQoL subgroup analysis (end-point data from high-quality reviews): type of comparators. Note that three reviews with mixed active and non-active comparators are not included here. GAD, generalised anxiety disorder.
Duration of follow-up
Effect estimates were higher in reviews reporting short-term follow-up (SMD 0.29, 95% CI 0.17 to 0.42; I2 = 30%) than in reviews reporting long-term follow-up (SMD 0.11, 95% CI 0.02 to 0.20; I2 = 0%) (Figure 11). However, the interaction test did not find a statistically significant difference between the groups (p = 0.06).

FIGURE 11
The HRQoL subgroup analysis (end-point data from high-quality reviews): duration of follow-up. Note that one review with combined short- and long-term follow-up is not included here. GAD, generalised anxiety disorder.
Age
Effect estimates were similar in reviews of children and adolescents (SMD 0.20, 95% CI –0.15 to 0.56; I2 = 0%) and adults (SMD 0.23, 95% CI 0.14 to 0.33; I2 = 39%) (Figure 12). However, the sample sizes were much smaller in the reviews of children and adolescents, and the consequent CIs crossed zero. The interaction test did not find a statistically significant difference between the children/adolescents and adult groups (p = 0.06). The effect size for older adults was larger (SMD 0.39, 95% CI –0.24 to 1.02), but was generated from one review, with one trial and only 39 participants, and, again, the 95% CIs crossed zero (see Figure 12).

FIGURE 12
The HRQoL subgroup analysis (end-point data from high-quality reviews): age. GAD, generalised anxiety disorder.
Sensitivity analysis
The sensitivity analysis was conducted with an additional 10 reviews that had been rated as low or critically low on the AMSTAR-2. Therefore, the sensitivity analysis was conducted with 34 reviews (76 RCTs, 7466 participants).32,39,63,82,165,188,193,219,220,231,235,236,270,275,276,279,286,299,317,329,343,347,356,371,409,413,445,446,464,467,469,513,518,521 Inclusion of lower-quality reviews increased the estimate of effect (SMD 0.28, 95% CI 0.17 to 0.38) and raised the levels of heterogeneity (I2 = 71%) (see Appendix 9, Figure 15). This analysis included reviews from more physical conditions: 13: Digestive system, 02: Neoplasms, 08: Headaches and epilepsy and 21: Symptoms such as tinnitus and fatigue (see Box 4). All of the additional within-condition group estimates were consistent with the general effect, that is, an absence of inconsistent effects.
We re-ran the PMA replacing the physical component scores with the mental component scores from the SF-12/SF-36 in the two reviews that presented both the physical and the mental component scores.220,464 The replacement did not change the overall effect or heterogeneity rating for the HRQoL outcome (SMD 0.24, 95% CI 0.14 to 0.33; I2 = 38%) (see Appendix 9, Figure 16).
Health-related quality-of-life change scores and risk ratio data
Four reviews (four RCTs, 185 participants), two of higher and two of lower quality, presented HRQoL data as change scores.158,246,406,523 These included reviews of 6B00-06 anxiety and 6A60-80 mood disorders, 13 digestive system, 21 pain and 12 respiratory system disorders. The overall pooling reported acceptable heterogeneity and a moderate effect in favour of CBT (SMD 0.58, 95% CI 0.15 to 1.00; I2 = 66%) (see Appendix 9, Figure 17).
One lower-quality review (two RCTs, 145 participants) presented HRQoL data as a RR.170 This review identified a large effect (SMD 1.57, 95% CI 0.78 to 2.37; I2, not applicable) in favour of CBT (see Appendix 9, Figure 18).
Discussion
From the highest-quality reviews, we found that CBT produced consistent, positive effects on HRQoL across 10 different conditions. Effect estimates suggest a modest, long-term improvement, compared with no intervention. These effects became very small when CBT was compared with other active treatments, including education, exercise, pharmacotherapy, physiotherapy, psychotherapy/counselling and relaxation. We did not find a difference between the effect sizes of reviews conducted with low-intensity CBT and those conducted with high-intensity CBT.
The effect estimates were generated by synthesising data from samples of children, adolescents and adults, of both sexes, mainly living in countries in Europe, North America and Australasia. There is a lack of higher-quality evidence of CBT’s effectiveness for older adults.
We do not know if CBT will be effective when delivered preventatively or when delivered to patients with severe or subclinical symptoms. We do not know if CBT is equally effective across different ethnic groups nor do we know its effect for people living in countries in Africa, Asia or South America.
Depression
We identified 48 higher-quality systematic reviews37,39,46,50,59,63,117,126,143,167–169,175,197–199,205,206,220,221,231,234,235,246,249,261,267,275,286,299,340,357,369,371,373,401,405,409,432,445,446,448,450,454,469,480,484,507 that met the eligibility criteria to be included in the primary depression PMA. One review included six meta-analyses for different disorders; hence, the number of comparisons is 53. These included 130 RCTs and 14,073 participants, and represent 16 out of 40 possible ICD-11 categories (40%). Box 5 includes the ICD-11 codes represented in the primary PMA (white cells) and those codes not represented (shaded cells).
BOX 5
The ICD-11 categories (not) represented in the primary PMA of higher-quality reviews for depression outcome
The most commonly used measure of depression was the Beck Depression Inventory (n = 22). Other measurements included the Hamilton Depression Rating Scale (n = 7), the Hospital Anxiety and Depression Scale (n = 4), the Patient Health Questionnaire-9 items (n = 2), the Montgomery–Åsberg Depression Rating Scale (n = 2), the Center for Epidemiologic Studies Depression Scale (n = 2), the Profile of Mood States (n = 1), the Depression Anxiety Stress Scale (n = 1) and the Hopkins Symptoms Checklist (n = 1). The remaining reviews used population-specific depression measures [Glasgow Depression Scale for People with a Learning Disability (n = 1), Children’s Depression Inventory-revised (n = 3)].
The majority of these meta-analyses were focused on adults (37/48), with seven reviews focusing on adolescents/children and one review focusing on older people. More reviews had samples that included more female than male participants (23/48) than samples with more male than female participants (9/48). Only seven reviews reported the ethnicity of their samples. Of these, four reviews had samples with < 25% non-white participants and one included a sample with > 75% non-white participants.
The majority of reviews reported on the management of clinical conditions (26/48), on interventions of high intensity (26/48), delivered in outpatient settings (27/48), and with short-term follow-up (39/48). The majority of included RCTs in the reviews were from Europe, North America and Australasia (33/48). Many of the reviews contained only one trial (54%, 26/48), and, for some conditions, such as personality disorders, the numbers in those trials were very small (see Appendix 10, Figure 19).
Primary analysis
Within-condition heterogeneity (I2) varied between 0% (6D10-11: Personality disorders) and 86.3% (6A60-80: Mood disorders), and across-condition heterogeneity was 81%. The across-condition heterogeneity was too high for us to pool across the ICD-11 category groups (see Appendix 10, Figure 19). Ten of the within-condition groups reported effects in favour of CBT with some certainty. However, aggression, eating disorders, mixed mental conditions, nervous system disorders and stress-related disorders reported within-condition effects of close to zero.
The heterogeneity was too high to pool across ICD-11 categories in any of the subgroup or sensitivity analyses. There was no evidence of publication bias or of small-study effects (Egger’s test p = 0.87) (see Appendix 10, Figure 20).
Discussion
Depression was the most commonly reported outcome in the review evidence base. The variation between the effect estimates generated for the within-condition subgroups was too wide-ranging to pool across the ICD-11 condition groups. No further subgroup or sensitivity analyses were conducted to compare with the primary analysis.
Anxiety
We identified 34 higher-quality systematic reviews37,39,89,134,143,168,175,205,227,234–236,246,249,251,259,275,286,291,315,340,343,347,371,373,397,409,432,445,446,450,464,469,480 that met the eligibility criteria. Two reviews included meta-analyses for different disorders; hence, the number of comparisons is 36. These included 59 RCTs and 4673 participants, and represent 13 out of 40 possible ICD-11 categories (33%). Box 6 includes the ICD-11 codes represented in the primary PMA (white rows), those conditions represented in the sensitivity analysis only, namely lower-quality reviews (purple rows) and those codes not represented (orange rows).
BOX 6
The ICD-11 categories (not) represented in the primary PMA of higher-quality reviews for anxiety outcome
The most commonly used measure of anxiety was the Beck Anxiety Inventory (BAI) (n = 9). Other measurements included the Hospital Anxiety and Depression Scale (n = 6), the State–Trait Anxiety Inventory (n = 6), the Hamilton Anxiety Rating Scale (n = 1), the Hopkins Symptom Checklist (n = 1) and the Profile of Mood States (n = 1). The remaining reviews used population-specific [Glasgow Anxiety Scale for People with an Intellectual Disability, Revised Children’s Manifest Anxiety Scale (n = 4)] or condition-specific [Generalised Anxiety Disorder-7 (n = 3), Dental Anxiety Scale, Cardiac Anxiety Questionnaire] measurements.
The majority of these meta-analyses focused on adults (23/34), with seven reviews of adolescents/children and two reviews of older people. More reviews had samples that included more female than male participants (14/34) than samples with more male than female participants (9/34). Only five reviews reported the ethnicity of their samples. Of these, four reviews had samples with > 75% white participants and none included a sample with < 25% white participants.
The majority of reviews reported on the management of clinical conditions (20/34), on interventions of high intensity (29/34), delivered in outpatient settings (22/34), and with a short-term follow-up (27/34). The majority of included RCTs were from Europe, North America and Australasia (22/34). Out of 34 reviews, 24 contained only one trial, and, in some cases, the numbers in each trial were very low (Figure 13 presents the data).
These analyses also included reviews with trials conducted in less common contexts and populations, for example patients with subclinical mood (6A60-80) conditions (n = 1), and CBT delivered in preventative contexts to mood disorder (6A60-80), psychosis (6A20-25) (n = 2) and inpatient psychosis patients (n = 2), and to older adults living with stress disorders (6B40-45), obsessive disorders (6B20-25), anxiety (6B00-06) and mood (6A60-80) disorders (n = 2). None of these specific reviews produced effect estimates that were inconsistent with the primary anxiety PMA.
Primary analysis
Within-condition heterogeneity varied between 0% (MG30 pain) and 75% (6B40-45 stress-related disorders) and across-condition heterogeneity was 62%. The pooled across-condition SMD gave a modest effect in favour of CBT on outcomes of anxiety (SMD 0.30, 95% CI 0.18 to 0.43) (see Figure 13). The prediction intervals for the overall effect were –0.28 to 0.88. No inconsistent effects were identified across the conditions.
Once again, variation in effects was observed across conditions. This heterogeneity is reflected in the prediction interval for the effect within any given condition, which ran from –0.28 to 0.88, indicating a possible small negative effect of CBT for some conditions and, at best, a large positive effect for other conditions.
There was no evidence of publication bias or of small-study effects (Egger’s test p = 0.70) (see Appendix 11, Figure 21).
Mean difference in anxiety
We transformed the across-condition SMD into a mean difference of the most commonly reported anxiety outcome, the BAI.526 We identified a standard deviation (13.46 points) of the BAI from a low risk-of-bias trial525 in a higher-quality review.464 The SMD translated to an estimated mean difference on the BAI of 4 points (95% CI 2 to 6 points).
Subgroup analysis
None of the interaction tests between the subgroups was significant and the evidence is consistent with the primary anxiety PMA.
Cognitive–behavioural therapy intensity
Reviews of low- and high-intensity CBT examined similar populations and conditions. The ICD-11 categories that were represented by high-intensity CBT only, and not low-intensity CBT, were 6A20-25 schizophrenia or other primary psychotic disorders, 6D10-11 personality disorders and related traits, 6A00-06 neurodevelopmental disorders and 12 diseases of the respiratory system. The populations who were sampled in reviews of high-intensity CBT but not in reviews of low-intensity CBT were older adults (6B40-45 disorders specifically associated with stress, 6B20-25 obsessive–compulsive disorders, 6B00-06 anxiety or fear-related disorders and 6A60-80 mood disorders) and subclinical populations (6A60-80 mood disorders). High-intensity, but not low-intensity, CBT reviews included trials delivered in preventative contexts (6A20-25 schizophrenia or other primary psychotic disorders and 6A60-80 mood disorders) and to inpatient samples (6A20-25 schizophrenia or other primary psychotic disorders).
The heterogeneity was too high to pool across the low-intensity CBT reviews (I2 = 78%). The low-intensity CBT reviews included trials examining CBT delivered via paraprofessionals (n = 3) or via the internet (n = 4). The heterogeneity was much lower in high-intensity CBT reviews (SMD 0.28, 95% CI 0.15 to 0.42; I2 = 54%). The interaction test between high- and low-intensity CBT reviews was not statistically significant (p = 0.62) (see Appendix 11, Figure 22).
We identified five reviews87,306,343,360,521 (11 RCTs, 503 participants) that directly compared high- with low-intensity CBT interventions on anxiety outcomes in 6B00-06: Anxiety, 6A60-80: Mood and pain [including tinnitus 21 Symptoms and signs not otherwise specified (MG30 pain)] conditions. In this subset of direct comparisons, there was no difference between high- and low-intensity CBT (SMD 0.03, 95% CI –0.14 to 0.21; I2 = 20%) (see Appendix 11, Figure 23). This direct evidence comparing high- with low-intensity CBT in anxiety and mood and pain conditions supports our indirect evidence (see Appendix 11, Figure 22) from the high- and low-intensity CBT subgroup analyses. We have found no direct or indirect evidence that high- or low-intensity CBT produce different effect sizes.
Type of comparators
The effect was larger and significant when CBT was compared with non-active comparators (SMD 0.37, 95% CI 0.19 to 0.55; I2 = 64%) and smaller and non-significant when compared with active comparators (SMD 0.19, 95% CI 0.00 to 0.37; I2 = 49%) (see Appendix 11, Figure 24). However, the interaction test between the two groups was not significant (p = 0.24).
Duration of follow-up
Effect estimates were higher in reviews reporting long-term follow-up (SMD 0.38, 95% CI 0.15 to 0.60; I2 = 66%) than in those reporting short-term follow-up (SMD 0.27, 95% CI 0.12 to 0.43; I2 = 59%) (see Appendix 11, Figure 25). However, the interaction test did not find a statistically significant difference between the groups (p = 0.48).
Age
Effect estimates were similar in reviews of children and adolescents (SMD 0.37, 95% CI 0.12 to 0.62; I2 = 67.1%) and adults (SMD 0.32, 95% CI 0.15 to 0.48; I2 = 63.6%). The estimates in the two reviews of older adults were much lower and the 95% CIs crossed zero (SMD 0.06, 95% CI –0.30 to 0.43; I2 = 0%) (see Appendix 11, Figure 26). The interaction test did not find a statistically significant difference between the three groups (p = 0.69).
Sensitivity analyses
We identified 56 reviews (117 RCTs, 11,409 participants)34,37,39,89,134,143,168,171,175,205,215,216,227,234–236,241,246,249,251,259,266,275,277,286,291,294,306,315,329,340,343,347,356,371,373,377,379,397,398,409,410,413,425,429,432,445,446,450,463,464,466,469,480,497,513 of any quality with data suitable for inclusion in the sensitivity anxiety PMA. Five reviews had separate valid data representing different conditions; therefore, the total number of comparisons in the all-quality anxiety PMA is 64. Inclusion of lower-quality reviews increased the heterogeneity across conditions (I2 = 76%) beyond our threshold for pooling across the conditions (see Appendix 11, Figure 27). All of the ICD-11 category within-condition effects were consistent with the primary analysis for anxiety outcomes.
Anxiety change scores/dichotomous outcomes
Four lower-quality reviews (four RCTs, 255 participants) reported anxiety outcome data as change scores.56,105,336,457 These included reviews of 6B00-06 anxiety, 6A60-80 mood disorders and 08 diseases of the nervous system. The heterogeneity was too high (I2 = 88.1%) to pool across the reviews (see Appendix 11, Figure 28). One lower-quality review433 reported an OR (one RCT, 112 participants) and found a large effect in favour of CBT (SMD 1.01, 95% CI 0.96 to 1.06; I2, not applicable) (see Appendix 11, Figure 29). One lower-quality review332 (one RCT, 27 participants) reported a risk difference and presented a moderate effect in favour of CBT (SMD 0.36, 95% CI 0.01 to 0.71; I2, not applicable) (see Appendix 11, Figure 29). There were no data that were inconsistent with the primary analysis for anxiety outcomes from any of the change score or dichotomous data.
Discussion
The primary PMA reported that CBT produces a small, but meaningful, long-term improvement in anxiety symptoms. Results from the subgroup, sensitivity, change-score and dichotomous-data PMAs are all consistent with the primary PMA. Some individual reviews were conducted in less frequently researched contexts (e.g. trials conducted in Africa), under-represented populations (e.g. older adults) and less frequently researched delivery formats (e.g. preventative CBT). Every review generated effect estimates that were consistent with the overall general effect.
Cognitive–behavioural therapy was effective when it was delivered via high-intensity methods, but there was too much variation to conclude whether or not CBT was effective when it was delivered via low-intensity methods. However, none of the subgroup interaction tests was statistically significant.
The effect estimates were generated by synthesising data from samples of children, adolescents and adults, of both sexes, mainly living in countries in Europe, North America and Australasia. There is a lack of higher-quality evidence of CBT’s effectiveness for older adults.
We do not know if CBT will be effective when delivered preventatively or when delivered to patients with severe or subclinical symptoms. We do not know if CBT is effective across different ethnic groups nor do we know its effect for people living in countries in Africa, Asia or South America.
Pain
We identified 10 higher-quality systematic reviews32,68,102,134,149,188,211,251,317,446 that met the eligibility criteria. These included 22 RCTs (2581 participants) and represent 5 out of 40 possible ICD-11 categories (13%). Box 7 presents the ICD-11 codes represented in the primary PMA (white rows), those codes represented in the sensitivity analysis (i.e. lower-quality reviews) (purple rows) and those codes not represented (orange rows).
BOX 7
The ICD-11 categories (not) represented in the primary PMA of higher-quality reviews for the pain outcome
The most commonly used measure of pain was the 100-mm visual analogue scale (VAS) (n = 6). Other measurements included the numerical rating scale of pain intensity (n = 3), the Wong–Baker Faces Pain Rating Scale (n = 2), the modified von Korff scale (n = 1), the Chronic Pain Grade questionnaire (n = 1) and the McGill Pain Questionnaire (n = 1).
The majority of these meta-analyses were focused on adults (6/10),102,134,149,188,211,317 three reviews focused on adolescents/children32,68,446 and one review did not report the age of the samples.251 All of the reviews included samples that were equally balanced between male and female participants. Only one review63 reported the ethnicity of its samples (> 75% white participants).
Two reviews specified examining CBT in patients with chronic symptoms of pain, anxiety and mood conditions. Half of the reviews examined high-intensity and half of the reviews examined low-intensity CBT. One review examined using CBT to prevent pain developing post orthodontic treatments, but all the others were using CBT in response to diagnosed problems. Two reviews observed the use of CBT for inpatients with pain conditions, whereas the remaining reviews examined CBT in outpatient/community settings. Three reviews included long-term follow-ups (abdominal pain, back pain, anxiety and mood disorders). Four reviews included only one trial, and five reviews included < 100 participants (Figure 14).
Primary analysis
The across-condition heterogeneity was 64% and the pooled across-condition SMD gave a modest effect in favour of CBT on outcomes of pain (SMD 0.23, 95% CI 0.05 to 0.41) (see Figure 14). The prediction intervals for the overall effect were –0.28 to 0.74.
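For readers unfamiliar with the two statistics reported above, the following is a sketch of how they are commonly defined. It assumes the standard I2 definition based on Cochran's Q and the widely used t-distribution-based prediction interval for a random-effects model; this section of the report does not state which formulae were applied, so the notation (Q, k, the pooled estimate, its standard error and the between-estimate variance tau-squared) is an assumption made for illustration.

```latex
% Assumed notation (not taken from the report):
%   Q          Cochran's heterogeneity statistic
%   k          number of pooled effect estimates
%   \hat{\mu}  pooled SMD;  \widehat{\mathrm{SE}}(\hat{\mu}) its standard error
%   \hat{\tau}^2  estimated between-estimate variance
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%
\qquad
\text{95\% prediction interval: }\
\hat{\mu} \pm t_{k-2,\,0.975}\,\sqrt{\hat{\tau}^2 + \widehat{\mathrm{SE}}(\hat{\mu})^2}
```

Under these definitions, the prediction interval is wider than the confidence interval because it adds the between-estimate variance to the uncertainty of the pooled estimate, which is consistent with the interval reported here crossing zero even though the confidence interval does not.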
There was no evidence of publication bias, nor of small-study effects (Egger's test p = 0.19) (see Appendix 12, Figure 30).
Mean difference in pain
We transformed the across-condition SMD into a mean difference of the most commonly reported pain outcome, the 100-mm VAS. We identified a standard deviation (27 mm) of the VAS from a trial527 with a low risk of bias in a higher-quality review.32 The SMD translated to an estimated mean difference on the VAS of 6 mm (95% CI 1 to 11 mm).
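The conversion described above can be sketched as a simple rescaling: an SMD is multiplied by a representative standard deviation to re-express it in the units of the scale. The snippet below is a minimal illustration using the figures quoted in this section (SMD 0.23, 95% CI 0.05 to 0.41; standard deviation 27 mm). The function and variable names are hypothetical, and the assumption that the confidence limits are rescaled in the same way as the point estimate is ours, although it reproduces the reported values.

```python
def smd_to_mean_difference(smd: float, sd: float) -> float:
    """Re-express a standardised mean difference in the units of a scale.

    Assumes the chosen standard deviation is representative of the scale
    across the pooled populations (an assumption of this sketch).
    """
    return smd * sd


# Values quoted in the text above.
SD_VAS_MM = 27.0  # standard deviation of the 100-mm VAS, in mm
pooled_smd = {"estimate": 0.23, "ci_lower": 0.05, "ci_upper": 0.41}

# Rescale the point estimate and both confidence limits.
mean_difference_mm = {
    name: smd_to_mean_difference(value, SD_VAS_MM)
    for name, value in pooled_smd.items()
}

# Round to whole millimetres, as reported in the text.
rounded_mm = {name: round(value) for name, value in mean_difference_mm.items()}
print(rounded_mm)
```

Rounded to whole millimetres, this gives 6 mm (95% CI 1 to 11 mm), matching the estimate reported above.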
Subgroup analysis
There were no statistically significant interaction effects between any subgroups; therefore, the evidence is consistent with the primary analysis.
Cognitive–behavioural intensity
High- and low-intensity CBT was examined in both children/adolescent and adult populations. The only review149 of preventative CBT delivered low-intensity treatment. There were no other differences in the populations or contexts tested with high- and low-intensity CBT.
The heterogeneity was too high to pool separately across the low-intensity CBT reviews (I2 = 84%). The low-intensity CBT reviews included trials examining CBT delivered by paraprofessionals (n = 4) or through self-help tools (n = 6). The heterogeneity was much lower in high-intensity CBT reviews (SMD 0.19, 95% CI 0.01 to 0.37; I2 = 18%) (see Appendix 12, Figure 31). The interaction test between high- and low-intensity reviews was not statistically significant (p = 0.87).
No reviews directly compared low-intensity with high-intensity CBT on pain outcomes. Therefore, no direct evidence is available to compare with our indirect evidence.
Type of comparators
The effect was larger when CBT was compared with non-active comparators (SMD 0.59, 95% CI 0.07 to 1.11; I2 = 69%) and was very small when compared with active comparators (SMD 0.14, 95% CI –0.11 to 0.38; I2 = 73%) (see Appendix 12, Figure 32). However, the difference between the two groups was not statistically significant when tested with the interaction test (p = 0.86).
Duration of follow-up
Effect estimates were higher in reviews reporting short-term follow-up (SMD 0.32, 95% CI 0.04 to 0.59; I2 = 70.5%) than in reviews reporting long-term follow-up (SMD 0.19, 95% CI 0.08 to 0.31; I2 = 0%) (see Appendix 12, Figure 33). However, the interaction test did not find a statistically significant difference between the groups (p = 0.62).
Age
The effect estimates extracted from reviews conducted in child and adolescent populations were too varied (I2 = 87%) to justify pooling across them. Conversely, there was no heterogeneity between the reviews in adult populations and the pooled effect was modest (SMD 0.21, 95% CI 0.12 to 0.31; I2 = 0%) (see Appendix 12, Figure 34). The interaction test between (1) children and adolescents and (2) adults did not find a statistically significant difference between these groups (p = 0.68).
Sensitivity analysis
We identified 16 reviews (19 comparisons, 39 RCTs, 4592 participants)32,68,102,134,149,188,211,222,251,266,306,317,348,413,446,473 of any quality with data suitable for inclusion in the sensitivity pain PMA. This introduced the ICD-11 category 08: Nervous system disorders into the analysis. In the sensitivity analysis, all of the ICD-11 within-condition groups were consistent with the primary analysis for pain outcomes. Compared with the primary PMA, inclusion of lower-quality reviews marginally reduced both the effect estimate and the heterogeneity (SMD 0.21, 95% CI 0.11 to 0.31; I2 = 51%) (see Appendix 12, Figure 35).
Pain change scores/dichotomous data
The reviews that reported dichotomous data were consistent with the primary analysis. One lower-quality review, Palermo et al.,354 presented pain outcome data as ORs. This review demonstrated a non-significant effect for CBT (SMD 7.99, 95% CI –2.72 to 18.70) (see Appendix 12, Figure 36). One lower-quality review, Bernardy et al.,64 reported pain outcome data as risk differences and showed a non-significant effect for CBT (SMD 0.08, 95% CI –0.03 to 0.19) (see Appendix 12, Figure 36).
Discussion
From a smaller data set of the highest-quality reviews, we found that CBT produced consistent improvements in pain outcomes across six different conditions. Effect estimates suggest a modest long-term improvement in comparison with any other comparator intervention. We did not find a difference in the effect sizes between reviews conducted with low-intensity CBT and those conducted with high-intensity CBT.
The effect estimates were generated by synthesising data from samples of children, adolescents and adults, of both sexes, mainly living in countries in Europe, North America and Australasia. To our knowledge, there is no higher-quality evidence of CBT’s effectiveness for older adults.
The included reviews presented evidence to suggest that CBT can improve pain outcomes when CBT is delivered preventatively to patients with chronic or subclinical symptoms, but there is no evidence regarding severe symptoms. We do not know if CBT is equally effective across different ethnic groups, nor do we know its effect for people living in countries in Africa, Asia or South America.