NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

MacPherson H, Vickers A, Bland M, et al. Acupuncture for chronic pain and depression in primary care: a programme of research. Southampton (UK): NIHR Journals Library; 2017 Jan. (Programme Grants for Applied Research, No. 5.3.)

Chapter 4 Towards a cost-effectiveness analysis of acupuncture for chronic pain: developing methods in a case study


Evidence synthesis in health technology assessment

Cost-effectiveness analyses of health technologies have a number of key requirements.202 These analyses should entail (1) a clear definition of the decision problem, which should include all relevant comparators; (2) an appropriate time horizon for the analysis; (3) the systematic identification and consideration of all relevant evidence;203 (4) an appropriate characterisation of all sources of uncertainty; and (5) an assessment of the value of acquiring additional research. It is extremely rare for the evidence base informing a cost-effectiveness analysis to come from a single study.204 Data (typically summary study level) are derived from multiple sources and are often available in multiple formats, for example using different instruments for measurement and using different measures of effect reported at different time points. Evidence synthesis and decision modelling are used extensively in health technology appraisal to meet the challenge of reflecting these disparate sources of evidence within a coherent framework.

Synthesis tools are increasingly used to obtain pooled estimates of the parameters of interest to inform economic decision models. This is particularly the case for treatment effect estimates when multiple relevant RCTs may be available. In many circumstances the synthesis of treatment effect evidence considers only pairwise comparisons through the use of standard meta-analysis. However, frequently there are more than two treatment choices. Network meta-analysis (also known as mixed-treatment comparisons) is a tool that extends standard pairwise meta-analysis, allowing the estimation of the relative effectiveness of multiple treatments by simultaneously synthesising all relevant evidence. This statistical method is a well-established technique and its methods have been described extensively in the literature.85,86,178,205–208 As is the case for standard meta-analysis, most published work using network meta-analysis focuses on the synthesis of aggregate data. These data are usually obtained from published literature and consist of an estimate of treatment effectiveness (e.g. mean difference in the case of continuous outcomes) and an appropriate measure of uncertainty (e.g. the variance or SE).

Network meta-analysis and the use of individual patient data

With the increasing availability of IPD for economic evaluation, together with considerable support for utilising this type of evidence,209,210 meta-analytic methods have emerged to address the challenges of IPD study synthesis.211,212 Most progress has been made in the area of the statistical synthesis of clinical effectiveness, whereas little work has been undertaken to address the challenges of synthesising information on other important decision model parameter types.213 Techniques for pairwise meta-analysis of individual-level evidence exist for most outcome types including binary,214 continuous210,215 and time-to-event data.216,217 Use of IPD to inform decisions creates added value by offering the potential to reduce network heterogeneity, tackle existing evidence inconsistencies172 and examine subgroup effects in patients in whom interventions might have an effectiveness and cost-effectiveness profile which differs from that of the wider population.218 Few methodological studies on the synthesis of IPD in network meta-analysis are available in the published literature and even fewer examples of its use within cost-effectiveness analysis exist.213

Objectives and structure

Given the potential benefits of IPD network meta-analysis as a basis for informing cost-effectiveness modelling and the paucity of examples of this approach in the literature, we present a case study of using IPD network meta-analysis to inform a cost-effectiveness analysis of acupuncture for chronic pain. The objectives of this research were to both develop novel methods for IPD network meta-analysis and demonstrate the application of IPD network meta-analysis for use in economic evaluation.

To our knowledge the current synthesis methods literature does not offer modelling tools for continuous data within an IPD network meta-analysis framework. Using a pairwise meta-analysis framework, Riley et al.219 discuss different approaches to the synthesis of continuous outcome data when IPD are available. Riley et al.219 highlight that modelling the follow-up result, adjusted for the baseline value, commonly called ANCOVA,219,220 is the preferred approach. The availability of IPD is crucial for such models. If IPD are not available, the use of ANCOVA would require all original study authors to have reported appropriate treatment effect estimates, ideally at the same follow-up time. These requirements, in most circumstances, make this option unfeasible. Flexibility is introduced with the availability of IPD as the analyst can apply the same modelling approach across trials and derive consistent outputs. This report describes a novel, methodological framework for IPD network meta-analysis of continuous data within the Bayesian framework, which builds on the work described in Riley et al.219

Two approaches to synthesising data on heterogeneous continuous outcomes are explored. The first involves standardising outcomes by dividing primary outcome scores by study-specific SDs. This creates a dimensionless measure of treatment effect usually termed the standardised mean difference (SMD).221–223 Although commonly used, this approach does not produce results that can directly feed into cost-effectiveness analysis models, as absolute treatment effect estimates are required. Furthermore, health-care policy-makers require a common health outcome measure to be able to make decisions across different conditions and clinical areas. In many jurisdictions, including England and Wales,224 this measure is the QALY.225 The QALY is a composite measure and provides an estimate of an individual’s remaining life expectancy weighted by a preference-based measure of HRQoL. The most popular HRQoL measure for generating quality-of-life weights is the EQ-5D.226 These considerations motivate the second synthesis approach used, which involves translating (or ‘mapping’) the available HRQoL data from the trials to EQ-5D values and synthesising the resulting data.

After describing the motivating data set in Motivating case study: the cost-effectiveness of acupuncture for chronic pain in primary care, the core of Methods outlines the novel statistical models for the IPD network meta-analysis. A variety of modelling approaches are described and discussed. This section also describes how comparable end points suitable for synthesis were obtained, the estimation of costs and the cost-effectiveness modelling methods. The section Application provides the results of the IPD network meta-analysis and cost-effectiveness analysis. The discussion section offers concluding remarks and discusses relevant issues, including extensions to the current work.

Motivating case study: the cost-effectiveness of acupuncture for chronic pain in primary care

Background to acupuncture and acupuncture guidance

There is currently a lack of agreement about the effectiveness of acupuncture as a treatment for chronic pain, as reflected in debates about recent UK guidance surrounding its value.83,84,227 Acupuncture received a positive recommendation from NICE for its use in back pain81 and headache/migraine,80 whereas a negative recommendation was given for its use in osteoarthritis in 2008228 and 2014.82 The methods in this chapter were developed as part of a project to improve evidence regarding the clinical effectiveness and cost-effectiveness of acupuncture for chronic non-specific pain to inform decision-making in the UK NHS.

Data description and network of evidence

Data for the current study were made available by the ATC. To address the lack of good-quality evidence in acupuncture, the ATC undertook a systematic review in which relevant high-quality trials were identified and, for a large proportion, IPD were obtained.196 From 31 eligible RCTs from the ATC database, IPD were obtained from 29. However, data from Cherkin et al.131 were not available to us because of sharing restrictions. The data set analysed here included 28 high-quality RCTs60,61,66–77,117–122,130,132–138 that assessed the effectiveness of acupuncture for three pain conditions: osteoarthritis of the knee (seven trials), headache [including tension-type headache (TTH)] and migraine (six trials), and musculoskeletal pain, encompassing lower back, shoulder and neck pain (15 trials), totalling approximately 17,500 patients from the USA, UK, Germany, Spain and Sweden. These studies are summarised in Table 21.



TABLE 21. Data set main characteristics and study outcomes used for analysis

Nine of these studies were three-arm trials, assessing the three treatments simultaneously, 11 evaluated acupuncture and sham acupuncture only, and eight considered acupuncture and usual care only. Thus, the comparison of acupuncture with usual care is informed by 17 studies, acupuncture with sham acupuncture by 20 studies and sham acupuncture with usual care by nine studies. The resulting evidence network is presented in Figure 11.
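The number of studies informing each pairwise comparison follows directly from the trial designs. The tally can be sketched as below; the arm labels and the function name are illustrative shorthand rather than the report's notation.

```python
from collections import Counter
from itertools import combinations

# Number of trials by design, as reported in the text.
# Arm labels are illustrative shorthand (an assumption of this sketch).
designs = {
    ("acupuncture", "sham", "usual care"): 9,
    ("acupuncture", "sham"): 11,
    ("acupuncture", "usual care"): 8,
}


def comparison_counts(trial_designs):
    """Count how many trials inform each pairwise comparison."""
    counts = Counter()
    for arms, n_trials in trial_designs.items():
        # A three-arm trial contributes to all three pairwise comparisons.
        for pair in combinations(arms, 2):
            counts[pair] += n_trials
    return counts


counts = comparison_counts(designs)
total_trials = sum(designs.values())
```

Each three-arm trial is counted once for every pair of arms it contains, which is why the comparison counts sum to more than the 28 trials.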

FIGURE 11. Network of RCTs. In the network, a unique treatment category is indicated by a box. Arrows between boxes indicate that these treatments had been compared in a trial. Pain groups: H, headache/migraine; MSK, musculoskeletal; OAK, osteoarthritis of the knee.

Resource use information in the data set was limited to five60,61,66,69,71 of the 28 studies. Of these, three are specific to the German health-care system66,69,71 and, given the jurisdiction-specific nature of health-care resource use data, non-UK studies are of limited value to inform decision-making in the UK.229 The remaining two studies that provided resource use evidence60,61 were carried out in the UK, although only one of these60 collected resource use for time points that matched the 3-month time frame of the effectiveness assessment. The study by Thomas et al.61 recorded resource use at 12 and 24 months only. The study by Vickers et al.60 focused on one of the three clinical areas of interest (headache/migraine). It was decided to seek additional external data sets outside the ATC data set instead of assuming that health-care resource use data from headache/migraine trials could be generalised to musculoskeletal conditions and osteoarthritis of the knee. Therefore, IPD were obtained from the UK Back pain Exercise And Manipulation (BEAM) study230 for musculoskeletal pain and the UK Topical or Oral IBuprofen (TOIB) study231 for knee osteoarthritis pain.


Methods

This section describes how individual-level comparable values for the two end points of interest were generated, that is, EQ-5D index values and standardised pain scores. The section then goes on to describe the Bayesian IPD network meta-analysis synthesis modelling framework for both end points. Extensions to the modelling approach are then considered. Following this, methods used to analyse resource use and costs, and to generate cost-effectiveness results, are described.

Overview of the analysis

The ATC data set includes trials comparing acupuncture with sham acupuncture or usual care or comparing all three comparators. All trials were included in the synthesis to maximise use of the available data. In the context of clinical or other health-care decision-making, a sham comparator is not a clinically meaningful intervention as it would not be prescribed; the focus of the cost-effectiveness analysis was therefore on acupuncture compared with usual care. The range of possible treatment options for chronic non-specific pain in primary care is much wider than those considered here. The cost-effectiveness results presented in this report should not therefore be interpreted as providing a definitive answer to the question of whether or not acupuncture is cost-effective for the treatment of chronic non-specific pain. Instead, the cost-effectiveness analysis provides an illustration of how IPD network meta-analysis can be used.

Two outcome measures were used to value benefit in the present analysis: pain and EQ-5D index values. Given the spectrum of pain conditions considered, the availability of multiple instruments with which to measure pain, the lack of agreement about the preferred outcomes with which to measure pain and variable quality in reporting, pain measurement was highly heterogeneous across trials. SMDs were used to synthesise the pain outcomes. The HRQoL data were also obtained using a variety of instruments – some generic [e.g. Short Form questionnaire-12 items (SF-12) and SF-36] and others disease specific (e.g. the WOMAC index). These data were therefore mapped to EQ-5D values using a series of statistical mapping algorithms, which are described below.

End points were measured at a variety of follow-up times across trials. To consistently assess the effect of the treatments of interest across trials, the analysis focused on the time point closest to 3 months from the start of treatment. The 3-month time point was used as this was typically the measurement taken after the end of an acupuncture treatment course in the trials forming the evidence base and was reported for the majority (21/28) of the trials (see Appendix 1).

Generating homogeneous health-related quality-of-life scores and pain outcomes

The EQ-5D index score was the preferred end point for our analysis because of its importance for cost-effectiveness analysis. The conventional three-level version of the EQ-5D questionnaire includes five domains (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), each of which can be at one of three severity levels (no problems, some or moderate problems, or extreme problems), to generate a health status descriptor of one of 243 (3⁵) health states (245 states in total when also considering the ‘unconscious’ and ‘dead’ states). The descriptor is quality adjusted using a score derived from analysing the preferences of approximately 3400 members of the UK public.232 Bounded by full health and by the worst imaginable health state, the score ranges from 1 to –0.594. The distribution of EQ-5D health-state utility data is commonly non-normal. This, among other features, makes statistical modelling of the EQ-5D particularly challenging.233 Only a small number of trials in the data set (see Table 21) provided EQ-5D data.
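The count of 243 descriptive health states can be checked by enumerating every combination of the five domains at three levels. The sketch below does only that; the numeric level coding (1 to 3) is an assumption made for illustration, and no utility values are attached.

```python
from itertools import product

DOMAINS = [
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
]
LEVELS = (1, 2, 3)  # assumed coding: 1 = no, 2 = some/moderate, 3 = extreme problems

# Every possible health-state descriptor, e.g. (1, 1, 1, 1, 1) for no problems.
states = list(product(LEVELS, repeat=len(DOMAINS)))
n_states = len(states)
```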

When EQ-5D data were not available they were predicted using other generic and disease-specific measures (see Table 21) through published mapping algorithms. Mapping algorithms were identified using the University of Oxford’s Health Economics Research Centre (HERC) database of studies mapping from HRQoL or clinical measures to the EQ-5D.234,235 When using this tool and when multiple mapping algorithms were available for a given instrument, the preferred algorithm was selected on the basis of the sample size, adequacy of statistical modelling and relevance of study population. The selection of the outcome to be mapped was not random. Preference was given to generic health status-based instruments (i.e. SF-12 and SF-36) and, in their absence, to condition-specific instruments [i.e. WOMAC, VAS pain and Constant–Murley Score (CMS)], conditional on the existence of a valid and published algorithm. The WOMAC was used in preference to VAS pain and CMS as it covers a broader definition of HRQoL. In 50% of the trials (n = 14) (see Appendix 4 for trial details), well-established published algorithms were used to map from SF-36 dimensions and SF-12 summary scores to the EQ-5D.61,236,237 For the SF-36, a random-effects generalised least-squares algorithm from Rowen et al.236 considering dimensions, dimensions squared and interactions was used (model R² = 0.71). For the SF-12, a multinomial logit model from Gray et al.237 using physical component summary (PCS) and mental component summary (MCS) scores, summary scores squared and interaction terms was used (mean square error = 0.021). In 10 of the 28 trials, published algorithms were used to map VAS pain scores and WOMAC scores to the EQ-5D. For VAS pain, an ordinary least squares (OLS) regression from Maund et al.238 including VAS pain and VAS pain squared as covariates was used (R² = 0.101). For WOMAC, an OLS regression from Barton et al.239 including total WOMAC score, total WOMAC score squared, age and sex as covariates was used (R² = 0.313). For one trial,137 a double mapping was necessary as, to our knowledge, no direct mapping algorithm exists to obtain EQ-5D values from the CMS. Thus, an in-house unpublished mapping algorithm240 was used to derive VAS pain estimates from the CMS, which were used to obtain individual-level EQ-5D predictions using the algorithm mentioned above (available on request from Kamran Khan). For further details on mapping to the EQ-5D, see Chapter 5 (Health-related quality of life for cost-effectiveness analysis).

A high level of unexplained variation was found in the majority of the mapping algorithms used, that is, the proportion of total variation of the outcome(s) explained in these models (quantified by the coefficient of determination, R², in most cases) was low. To account for this source of uncertainty in the mapping process, an additional variance component was included in the EQ-5D predictions.241 This was achieved by drawing from a normal distribution with a mean of zero and variance equal to the study-specific residual variance. The residual variance was calculated as the difference between the total variance (calculated by dividing the variance of the mapped data by the R² for the mapping algorithm) and the mapped outcome variance. One random draw was then added to each individual-level EQ-5D prediction. A mapping process also involves other sources of uncertainty, namely the uncertainty in the mapping function regression coefficients and in the structure of the mapping model. These additional sources of uncertainty are not accounted for in this analysis.
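The variance adjustment can be illustrated with a minimal sketch that follows the verbal description: total variance is the mapped-data variance divided by R², residual variance is the difference, and one normal draw is added per patient. The function name, the made-up EQ-5D predictions, the use of the sample variance and the absence of any truncation of the resulting values are assumptions of this illustration, not details taken from the report.

```python
import numpy as np


def add_mapping_noise(mapped, r_squared, rng):
    """Add residual-variance noise to mapped EQ-5D predictions for one study."""
    mapped = np.asarray(mapped, dtype=float)
    mapped_var = mapped.var(ddof=1)          # variance of the mapped data
    total_var = mapped_var / r_squared       # implied total outcome variance
    residual_var = total_var - mapped_var    # variance left unexplained by the mapping
    # One draw from N(0, residual variance) per individual prediction.
    draws = rng.normal(0.0, np.sqrt(residual_var), size=mapped.shape)
    return mapped + draws, residual_var


rng = np.random.default_rng(1)
mapped_eq5d = np.array([0.52, 0.69, 0.76, 0.62, 0.85, 0.59])  # made-up predictions
noisy_eq5d, residual_var = add_mapping_noise(mapped_eq5d, r_squared=0.313, rng=rng)
```

With a low R² such as 0.313, the residual variance is roughly twice the variance of the mapped values, so the added draws dominate the spread of the adjusted predictions.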

The second outcome measure assessed was standardised pain. Across the 28 trials, the primary outcome of each study was used to generate patient-level standardised pain estimates. Pain measures varied from days with headache in the headache/migraine pain condition to VAS pain in the musculoskeletal group or WOMAC pain in the osteoarthritis of the knee group, as reported in Table 21. Individual-level standardised pain estimates were obtained for each trial by dividing the primary outcome scores by the study-specific SD. Note that, although these estimates were used as inputs in the synthesis models, the outputs of the synthesis are in the SMD format, as differences between treatments were estimated within the modelling [considering $s_{tx,t}$ as the standardised value of the pain measurement p made at the time point t in patients under treatment tx, it can be demonstrated that $(s_{tx_1,t_1} - s_{tx_1,t_0}) - (s_{tx_0,t_1} - s_{tx_0,t_0}) = (s_{tx_1,t_1} - s_{tx_0,t_1}) - (s_{tx_1,t_0} - s_{tx_0,t_0}) = \Delta\mathrm{SMD}$].
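The identity between the difference in changes and the change in differences can be checked numerically. The sketch below uses made-up pain scores for two arms of a single hypothetical study and a single assumed study-specific SD; how that SD is computed in practice is not specified here.

```python
import numpy as np


def standardise(scores, study_sd):
    """Divide raw outcome scores by the study-specific SD."""
    return np.asarray(scores, dtype=float) / study_sd


study_sd = 20.0  # assumed study-specific SD (made-up value)

# Made-up raw pain scores at baseline (t0) and follow-up (t1).
s_tx1_t0 = standardise([60.0, 70.0, 50.0], study_sd).mean()
s_tx1_t1 = standardise([30.0, 45.0, 30.0], study_sd).mean()
s_tx0_t0 = standardise([62.0, 68.0, 53.0], study_sd).mean()
s_tx0_t1 = standardise([50.0, 60.0, 46.0], study_sd).mean()

# Difference between arms in change from baseline ...
diff_of_changes = (s_tx1_t1 - s_tx1_t0) - (s_tx0_t1 - s_tx0_t0)
# ... equals the change over time in the between-arm difference.
change_of_diffs = (s_tx1_t1 - s_tx0_t1) - (s_tx1_t0 - s_tx0_t0)
```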

Health-related quality of life and standardised pain estimates were obtained at baseline and at the follow-up point closest to 3 months following the start of treatment. Changes from baseline were obtained by calculating the difference between values for these two time points.

Statistical methods

This section describes the IPD network meta-analysis models. All analyses were conducted from a Bayesian perspective. Bayesian methods can be considered an alternative to the classical (frequentist) approach to statistical modelling and have been frequently used in the data synthesis and the economic evaluation of health-care technologies.202,207 They provide a more appealing, intuitive and flexible modelling framework as both the data and the model parameters are considered as random quantities. Central to this framework is the likelihood function, which defines how plausible the data are given values of the model parameters. A key feature of this approach is that it allows the model to incorporate external information alongside available data in the format of prior distributions. When very little or no information is accessible, or when wanting the data to dominate the posterior, subjective or ‘vague’ beliefs are set as priors.242 This framework also allows the uncertainty in the relative effect estimates to be translated into probabilities of decision uncertainty, that is, the probability of which treatment is best (most efficacious) out of all treatments being compared. This explicit consideration of decision uncertainty leads naturally into a decision theory framework, which usually also considers costs and utilities, typically used in health-care decision-making.202

A one-step IPD network meta-analysis modelling approach was preferred as, together with relative treatment effect estimates, the estimation of treatment–covariate interactions for patient-level covariates was of interest.210,212 In the following model descriptions a random-effects approach was taken because of the expected between-study heterogeneity. Nonetheless, a fixed-effect framework could be attained with straightforward simplifications.86,243 The models described apply both to the EQ-5D and to the standardised pain outcome.

The main modelling approach considered (model 1) is a variation of the ANCOVA approach, modelling the change score but also adjusting for baseline outcome values.219,220,243,244 Model 1 was used as changes from baseline more closely approximated a normal distribution than absolute outcomes at 3 months. The model included interaction effects for pain type as it was expected that the impact of acupuncture (and sham acupuncture) may differ across pain types. Interaction effects for each pain type were modelled as exchangeable and related245 as it was expected that the impact of each pain type on the treatment effect of acupuncture may be related to the impact of each pain type on sham acupuncture effects.

Individual patient data network meta-analysis considering pain type as a treatment effect modifier

The model considers a set of J studies for which IPD were available. These studies included patients with a specific pain condition, with the pain conditions being headache/migraine, musculoskeletal pain and osteoarthritis of the knee. The set of treatments included in these trials are labelled [A,B,C], where A is the reference treatment, and there are K (= 3) treatments in total. At baseline, patient i in study j allocated to treatment k provides a baseline measurement $Y_{ijk0}$ (where 0 indicates time t at baseline). Each patient provides a follow-up measurement (the assessment closest to 3 months) $Y_{ijk3}$. The change from baseline ($Y_{ijk3} - Y_{ijk0}$) is denoted $\Delta Y_{ijk}$.

Model 1: analysis of covariance variation – change score modelling, adjusted for baseline

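This model can be written as:

\[
\begin{aligned}
\Delta Y_{ijk} &\sim N(\theta_{ijk}, V_j) \\
\theta_{ijk} &=
\begin{cases}
\mu_{jb} + \beta_{0j} Y_{ijk0} & \text{if } k = b;\ b, k \in \{A, B, C\} \\
\mu_{jb} + \beta_{0j} Y_{ijk0} + \delta_{jbk} + \beta_{bkp} X_{jp} & \text{if } k > b
\end{cases} \\
\delta_{jbk} &\sim N(d_{bk}, \sigma^2) \sim N(d_{Ak} - d_{Ab}, \sigma^2) \\
\beta_{bkp} &= \beta_{Akp} - \beta_{Abp} \\
\beta_{Akp} &\sim N(B_p, \sigma_{Bp}^2) \\
d_{AA}, \beta_{AA} &= 0
\end{aligned}
\]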


where $V_j$ represents the study-level variance, the quantity $\mu_{jb}$ represents the outcome for the treatment b in study j for a patient with a baseline utility of 0, the parameter $\beta_{0j}$ represents the impact of the (outcome) baseline on the change outcome for each study j, the term $\delta_{jbk}$ represents the study-specific treatment effect for treatment k relative to treatment b and $X_{jp}$ are p – 1 dummy variables representing pain type p in the jth study. Pain × treatment interaction effects $\beta_{Akp}$ were considered different for each treatment but exchangeable and were assumed to be drawn from a random distribution with a common mean ($B_p$) and between-treatment variance ($\sigma_{Bp}^2$).

Independent prior distributions were defined as follows: $1/V_j \sim \mathrm{Gamma}(0.001, 0.001)$; $\mu_{jb} \sim N(0, 10^6)$; $\beta_{0j} \sim N(0, 10^6)$; $d_{Ak} \sim N(0, 10^6)$; $\sigma \sim \mathrm{Unif}(0, 2)$; $B_p \sim N(0, 10^6)$; $\sigma_{Bp} \sim \mathrm{Unif}(0, 2)$. Correlations in the random effects from trials with three or more arms were accounted for following published methodology.86,222 In this report, k > b indicates that k is after b in the alphabet.

Modelling extensions

Model 1 can be extended to consider covariates. Age and BMI were identified as potential treatment effect modifiers, with the clinical expectation being that older age or higher BMI may make patients more difficult to treat (i.e. reduce the effect of treatment). BMI data were, however, rarely reported and were available in only 10 of the 28 studies. This covariate was not therefore adjusted for in the modelling.

Age was assumed to modify outcomes by the same amount across pain types and to modify treatment effects by the same margin for acupuncture and sham acupuncture (i.e. a single interaction term is assumed to apply to all comparisons with usual care). Squared terms were included for main effects and treatment interaction effects as a non-linear impact of age on outcomes and treatment effects was expected a priori. Age was centred prior to inclusion in the model.

The following model (model 2) extends model 1 by considering the effects of the covariate Z:

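\[
\begin{aligned}
\Delta Y_{ijk} &\sim N(\theta_{ijk}, V_j) \\
\theta_{ijk} &=
\begin{cases}
\mu_{jb} + \beta_{0j} Y_{ijk0} + \phi_0 Z_{ijk} + \varphi_0 Z_{ijk}^2 & \text{if } k = b;\ b, k \in \{A, B, C\} \\
\mu_{jb} + \beta_{0j} Y_{ijk0} + \phi_0 Z_{ijk} + \varphi_0 Z_{ijk}^2 + \delta_{jbk} + \beta_{bkp} X_{jp} & \text{if } k > b \text{ and } b \neq A \\
\mu_{jb} + \beta_{0j} Y_{ijk0} + \phi_0 Z_{ijk} + \varphi_0 Z_{ijk}^2 + \phi Z_{ijk} + \varphi Z_{ijk}^2 + \delta_{jbk} + \beta_{bkp} X_{jp} & \text{if } k > b \text{ and } b = A
\end{cases} \\
\delta_{jbk} &\sim N(d_{bk}, \sigma^2) \sim N(d_{Ak} - d_{Ab}, \sigma^2) \\
\beta_{bkp} &= \beta_{Akp} - \beta_{Abp} \\
\beta_{Akp} &\sim N(B_p, \sigma_{Bp}^2) \\
Z_{ijk} &\sim N(m, \mathrm{prec}) \\
d_{AA}, \beta_{AA} &= 0
\end{aligned}
\]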

Coefficients on the main covariate effect and the effect squared are represented by $\phi_0$ and $\varphi_0$, respectively. Coefficients on the treatment–covariate interaction term and the interaction between treatment and the squared covariate term are represented by $\phi$ and $\varphi$, respectively. No interaction term for comparisons of k and b was included when $b \neq A$ because the common regression coefficient cancels out.

Because of the possibility of missing covariate information for some individuals, $Z_{ijk}$ was represented as a normally distributed random variable with mean m and precision prec, common across all IPD studies. This represents a multiple imputation technique and assumes that the covariate data were missing at random. Additional priors were required for this model: $\phi_0, \phi, \varphi_0, \varphi \sim N(0, 10^6)$; $m \sim \mathrm{Unif}(-50, 50)$; $1/\mathrm{prec} \sim \mathrm{Unif}(0, 30)$.
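To make the structure of the model 2 linear predictor concrete, the following minimal sketch evaluates θ for a single patient and derives a relative effect from the basic parameters through the consistency relation (the effect of k relative to b equals the effect of k relative to A minus the effect of b relative to A). It assumes that the first case of the model applies to the study's baseline arm (k = b). The function names and all numerical values are hypothetical and are not estimates from the report; the report's own analysis was run in WinBUGS.

```python
def relative_effect(basic_effects, b, k):
    """Consistency relation: d_bk = d_Ak - d_Ab, with d_AA fixed at 0."""
    return basic_effects[k] - basic_effects[b]


def theta_model2(mu_jb, beta0_j, y0, z, phi0, varphi0, b, k,
                 delta_jbk=0.0, pain_interaction=0.0, phi=0.0, varphi=0.0):
    """Linear predictor for one patient in arm k of a study with baseline arm b.

    pain_interaction stands for beta_bkp * X_jp for the study's pain type.
    """
    # Terms shared by every arm: study baseline, baseline outcome, covariate.
    theta = mu_jb + beta0_j * y0 + phi0 * z + varphi0 * z ** 2
    if k != b:
        # Non-baseline arms add the treatment effect and pain-type interaction.
        theta += delta_jbk + pain_interaction
        if b == "A":
            # Treatment-covariate interaction only for comparisons against A.
            theta += phi * z + varphi * z ** 2
    return theta


# Hypothetical basic parameters relative to reference treatment A.
basic_effects = {"A": 0.0, "B": 0.05, "C": 0.08}
d_bc = relative_effect(basic_effects, "B", "C")

# Hypothetical patient: baseline outcome 0.6, centred age of +10 years.
theta_control = theta_model2(0.1, -0.5, 0.6, 10.0, 0.001, 0.0, b="A", k="A")
theta_treated = theta_model2(0.1, -0.5, 0.6, 10.0, 0.001, 0.0, b="A", k="C",
                             delta_jbk=0.05, pain_interaction=0.02,
                             phi=-0.001, varphi=0.0)
```

Setting the covariate coefficients to zero reduces this predictor to the baseline-adjusted change-score form of model 1.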

Analysis in the presence of restricted evidence

Although model 1 is the preferred choice, this model would not be feasible in the absence of information at the individual level at the baseline and follow-up time points. Models that do not rely on the availability of IPD were therefore run for comparison purposes. Two options219 are typically available to the analyst when only aggregate data are available: modelling the change score (model 3) or modelling the final outcome score (model 4), both without baseline adjustment. These models represent simplifications of model 1 in which the baseline outcome variable is omitted.

Model selection and implementation

Data management was performed in the freely available software package, R version 3.0.0 (The R Foundation for Statistical Computing, Vienna, Austria). The network meta-analysis was undertaken in WinBUGS version 1.4.3,246 linked to the R software through the packages R2WinBUGS246 and CodaPkg.247 Code for the network meta-analysis is provided in Appendix 4.

In all models the MCMC Gibbs sampler was initially run for 10,000 iterations and these were discarded as ‘burn-in’. Models were run for a further 5000 iterations, on which inferences were based. Chain convergence was checked using autocorrelation and Gelman and Rubin248 diagnostics. Within the network meta-analysis, goodness of fit was assessed using the deviance information criterion (DIC) and residual deviance.180 The DIC is a measure that balances fit and complexity, allowing parsimony to be considered in model choice. The DIC is often used for model comparison, with models that have smaller DIC values being preferred. The residual deviance of each data point may be viewed as a measure of the data point’s contribution to the total residual deviance (or lack of fit) of the model. A posterior mean for the total residual deviance similar to the number of data points will imply that model predictions fit well to the observed data.
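As an illustration of the convergence check, the sketch below computes a basic form of the Gelman and Rubin potential scale reduction factor for one parameter from several parallel chains. The number of chains is not stated in the text, so three chains are assumed here; the simulated draws are stand-ins for real MCMC output, and the formula omits the degrees-of-freedom correction that some software implementations apply.

```python
import numpy as np


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: array of shape (n_chains, n_iterations) of post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)         # between-chain variance B
    within = chains.var(axis=1, ddof=1).mean()    # within-chain variance W
    pooled = (n - 1) / n * within + between / n   # pooled variance estimate
    return np.sqrt(pooled / within)


rng = np.random.default_rng(0)
# Three assumed chains of 5000 draws each, all sampling the same distribution.
converged = rng.normal(size=(3, 5000))
rhat_converged = gelman_rubin(converged)

# The same chains shifted apart mimic a failure to converge.
stuck = converged + np.array([[0.0], [5.0], [10.0]])
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 suggest that the chains are sampling from the same distribution, whereas values well above 1 indicate that more iterations are needed.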

Results are presented as EQ-5D index scores and SMD treatment effect estimates (and associated 95% CrIs), and also the probability of each treatment being the ‘best’ treatment in terms of being the most clinically effective.207
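The probability of being the ‘best’ treatment is typically obtained by ranking treatments within each posterior draw and averaging across draws. A minimal sketch under that assumption follows; the four draws are made-up values, and the column order (usual care, sham acupuncture, acupuncture) is chosen only for illustration.

```python
import numpy as np


def probability_best(samples, higher_is_better=True):
    """Share of posterior draws in which each treatment ranks first.

    samples: array of shape (n_draws, n_treatments).
    """
    samples = np.asarray(samples, dtype=float)
    if higher_is_better:
        best = samples.argmax(axis=1)
    else:
        best = samples.argmin(axis=1)
    return np.bincount(best, minlength=samples.shape[1]) / samples.shape[0]


# Made-up posterior draws of EQ-5D effects relative to usual care.
# Columns: usual care, sham acupuncture, acupuncture (illustrative order).
draws = np.array([
    [0.00, 0.03, 0.06],
    [0.00, 0.07, 0.05],
    [0.00, 0.02, 0.08],
    [0.00, 0.04, 0.09],
])
p_best = probability_best(draws)
```

For an outcome such as pain, where lower values are better, `higher_is_better=False` would be passed instead.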

Modelling resource use

Acupuncture was assumed to be administered during 10 sessions with a physiotherapist. Ten sessions of acupuncture have been recommended by NICE in the context of lower back pain81 and headache/migraine,80 and it was assumed that this duration of therapy could be generalised to other musculoskeletal conditions and osteoarthritis of the knee. The first session was assumed to last for 40 minutes and subsequent sessions for 30 minutes. All sessions were costed using a unit cost for a physiotherapist (£36 per hour; Schema 9.1, with qualifications249).

The NICE recommendations, alongside the above assumptions regarding appointment durations, equate to a total of 5.2 hours of therapist time. A sensitivity analysis using a weighted average of the therapist time observed in the trials was conducted. Data were obtained from the data extractions conducted by Vickers et al.105 Therapist time was calculated as the duration of sessions multiplied by the number of sessions and included only sessions that occurred within the 3-month time horizon considered for efficacy. The sensitivity analysis used total therapist interaction times of 5.6 hours for headache/migraine, 3.9 hours for musculoskeletal and 4.7 hours for osteoarthritis of the knee chronic pain.
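The base-case therapist time can be reproduced from the stated assumptions. The sketch below also multiplies the hours by the hourly unit cost; the resulting cost figures are arithmetic derived here from numbers in the text and are not quoted from the report.

```python
FIRST_SESSION_MIN = 40       # from the text
FOLLOW_UP_SESSION_MIN = 30   # from the text
N_SESSIONS = 10              # from the text
UNIT_COST_PER_HOUR = 36.0    # GBP, physiotherapist unit cost from the text

# Base case: one 40-minute session followed by nine 30-minute sessions.
total_minutes = FIRST_SESSION_MIN + (N_SESSIONS - 1) * FOLLOW_UP_SESSION_MIN
total_hours = total_minutes / 60
base_case_cost = total_hours * UNIT_COST_PER_HOUR  # derived, not a reported figure

# Sensitivity analysis: observed therapist hours by pain condition (from the text).
sensitivity_hours = {
    "headache/migraine": 5.6,
    "musculoskeletal": 3.9,
    "osteoarthritis of the knee": 4.7,
}
sensitivity_costs = {
    condition: hours * UNIT_COST_PER_HOUR
    for condition, hours in sensitivity_hours.items()
}
```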

The potential impact of improved health outcomes on resource use was explored using the three data sets described in Data description and network of evidence.60,230,231 EQ-5D predictions (mapped from the available SF-36 physical and mental summary scores) together with the number of primary care (i.e. GP) and secondary care (i.e. specialist) visits from Vickers et al.60 were used to estimate the relationship between change in HRQoL and change in health resource utilisation for the headache pain group. The relationship estimated from Vickers et al.60 was assumed to apply for the entire headache group of patients (which includes patients with TTH and migraine pain). A simple OLS analysis was used to regress the change in resource use from 0–3 months to 3–12 months on the change in EQ-5D scores between month 3 and month 12. Primary and secondary care visits were analysed separately. Although not aimed at evaluating acupuncture, the UK BEAM study230 (with approximately 1300 patients) and the TOIB study231 (with approximately 280 patients) were used to estimate this relationship for lower back pain and osteoarthritis of the knee patients, respectively, using the same approach. Data from the UK BEAM study were assumed to be applicable to the other patients within the musculoskeletal pain category (i.e. those with neck and shoulder pain).
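The regression of change in resource use on change in EQ-5D can be sketched with a simple least-squares fit. The report's analysis was run in Stata; the version below uses NumPy purely for illustration, and the data are constructed to lie exactly on a line so that the recovered coefficients are known in advance. All values and variable names are hypothetical.

```python
import numpy as np


def ols_fit(x, y):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical per-patient change in EQ-5D between month 3 and month 12.
delta_eq5d = np.array([-0.2, -0.1, 0.0, 0.1, 0.2, 0.3])
# Hypothetical change in GP visits, built as 0.5 - 2.0 * delta_eq5d.
delta_visits = 0.5 - 2.0 * delta_eq5d

intercept, slope = ols_fit(delta_eq5d, delta_visits)
```

In this made-up example the slope is the change in visits associated with a one-unit change in EQ-5D score; primary and secondary care visits would each be fitted separately.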

Resource utilisation at baseline was not collected in these studies (and is generally not collected in clinical trials). Changes in resource use were preferred to absolute resource use estimates as their relationship with EQ-5D changes is less likely to be confounded. To estimate the change in resource use it was therefore necessary to use the change from 0–3 months to 3–12 months. Use of the change from 0–3 months to 3–12 months to infer change in resource use over the 0- to 3-month time horizon of the economic model, however, assumes that a given utility change would drive a given change in resource use regardless of the time frame. Given that this is a strong assumption, a secondary analysis was conducted using the absolute resource use in the period 0–3 months and regressing this on the change in EQ-5D score over this period.

The statistical software Stata 13 was used to model resource use for each pain condition.
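The regression step described above can be sketched as follows. This is a minimal illustration in Python rather than the Stata code used in the study; the function name `ols_fit`, the variable names and the data are hypothetical. The change in visits is taken as already computed, because the text does not state how the two periods of unequal length were put on a common scale.

```python
def ols_fit(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical patient-level data (NOT taken from any of the trials):
# change in EQ-5D between month 3 and month 12, and change in GP visits
# between the 0-3 month and 3-12 month periods.
change_eq5d = [0.10, -0.05, 0.20, 0.00, 0.15]
change_gp_visits = [0.1, 0.7, -0.3, 0.5, -0.1]

intercept, slope = ols_fit(change_eq5d, change_gp_visits)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

A negative slope corresponds to the pattern reported in the results, in which an improvement in EQ-5D score is associated with fewer visits. Primary and secondary care visits would each be fitted separately in the same way.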

The average cost of non-intervention resources used for each pain condition was calculated as the product of the EQ-5D estimates derived from the synthesis models, the coefficients on the EQ-5D estimates from the resource use regressions and the relevant unit costs. Primary care visits were costed at £46.80; this represents a weighted average of GP (£45 per consultation) and nurse (£49 per hour) visits, with weights taken from the UK BEAM study and unit costs taken from Curtis.249 Secondary care visits were costed at £135; this is the weighted average NHS reference cost for all outpatient procedures, taken from Curtis.249 Costs are reported in UK pounds for the financial year 2012–13. Other treatments and health-care interactions that may form a package of ‘usual care’ were assumed to have been provided equally to all patients regardless of comparator. These costs were therefore omitted from the analysis.
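The product described above can be written out as a short calculation. The sketch below is illustrative only: the unit costs are those stated in the text, but the function name and the two regression coefficients are hypothetical placeholders, not values from Table 25.

```python
UNIT_COST_PRIMARY = 46.80     # GBP per primary care visit (stated in the text)
UNIT_COST_SECONDARY = 135.00  # GBP per secondary care visit (stated in the text)


def non_intervention_cost_change(delta_eq5d, coef_primary, coef_secondary):
    """Change in non-intervention cost implied by a change in EQ-5D.

    coef_primary and coef_secondary are the regression coefficients, i.e.
    the change in the number of visits per unit change in EQ-5D score.
    """
    primary = delta_eq5d * coef_primary * UNIT_COST_PRIMARY
    secondary = delta_eq5d * coef_secondary * UNIT_COST_SECONDARY
    return primary + secondary


# Hypothetical coefficients (assumptions for illustration only).
cost_change = non_intervention_cost_change(
    delta_eq5d=0.079, coef_primary=-1.0, coef_secondary=-0.5
)
print(f"Change in non-intervention cost: GBP {cost_change:.2f}")
```

With negative coefficients, a gain in EQ-5D translates into a cost saving, which offsets part of the intervention cost in the incremental cost estimate.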

Estimation of cost-effectiveness outcomes

Quality-adjusted life-years were estimated assuming that the benefit of acupuncture over usual care estimated from the network meta-analysis of EQ-5D index scores was achieved instantaneously, was maintained from 0 to 3 months and was then lost instantaneously, as illustrated in Figure 12 by accrued benefit 2. In terms of QALYs accrued, this is equivalent to assuming that the full benefit was achieved linearly over a specified period and then lost linearly over a period of the same length, which may be viewed as a more realistic scenario (see Figure 12, accrued benefit 1). For example, the benefit could be achieved linearly from the start of treatment until 12 weeks and then lost gradually over the 12 weeks following treatment completion. Costs and effects beyond the 3-month time horizon were not considered in the current model and, given the short time horizon, no discounting was applied.
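The equivalence of the two benefit profiles follows from comparing the areas under them: a rectangle of height b and width T has area bT, and a triangle of height b and base 2T has area ½ × b × 2T = bT. A minimal Python check is sketched below; the function name and the input values are illustrative assumptions, and the period is expressed in years so that the area is in QALYs.

```python
def area_under_profile(times, benefits):
    """Trapezoidal area under a piecewise-linear benefit profile."""
    if len(times) != len(benefits):
        raise ValueError("times and benefits must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (benefits[i] + benefits[i - 1]) / 2
    return area


benefit = 0.079  # illustrative EQ-5D gain
period = 0.25    # 3 months, in years

# Instantaneous gain, held for one period, instantaneous loss.
qaly_rectangle = area_under_profile(
    [0.0, 0.0, period, period], [0.0, benefit, benefit, 0.0]
)

# Linear gain over one period followed by linear loss over one period.
qaly_triangle = area_under_profile(
    [0.0, period, 2 * period], [0.0, benefit, 0.0]
)

print(qaly_rectangle, qaly_triangle)
```

Both profiles accrue the same QALY gain, so the simpler rectangular assumption does not change the estimate as long as the rise and fall are linear and of equal duration.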

FIGURE 12. Illustrative diagram of the treatment benefits over the period of assessment.



Incremental QALY estimates were compared with incremental cost estimates (intervention costs and non-intervention costs) to calculate incremental cost-effectiveness ratios (ICERs). These can be compared with a threshold value of £20,000–30,000 per QALY as conventionally applied in England and Wales.203
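As a worked illustration of this comparison, the sketch below computes an ICER and checks it against a threshold. The function names and the input values are hypothetical. The threshold check is expressed through net monetary benefit, a standard reformulation that is assumed here rather than described in the text.

```python
def icer(incremental_cost, incremental_qaly):
    """Incremental cost-effectiveness ratio (cost per QALY gained)."""
    if incremental_qaly == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return incremental_cost / incremental_qaly


def is_cost_effective(incremental_cost, incremental_qaly, threshold):
    """True when net monetary benefit is positive at the given threshold."""
    return threshold * incremental_qaly - incremental_cost > 0


# Hypothetical inputs: GBP 200 extra cost for 0.02 extra QALYs.
example_icer = icer(200.0, 0.02)
print(f"ICER: GBP {example_icer:,.0f} per QALY")
print(is_cost_effective(200.0, 0.02, threshold=20_000))
```

Using net monetary benefit for the threshold check avoids the ambiguity of ratios when the incremental QALY estimate is negative or close to zero.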

Uncertainty in the estimates was quantified through the use of probabilistic analysis. The 5000 posterior samples from the synthesis of effectiveness [extracted from the Convergence Diagnostic and Output Analysis (CODA) output produced by WinBUGS] were used together with 5000 samples of the cost parameters, generated through Monte Carlo simulation. Uncertainty surrounding the decision to accept/reject acupuncture on the basis of cost-effectiveness was illustrated through cost-effectiveness acceptability curves. The cost-effectiveness modelling was also implemented in R.
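A cost-effectiveness acceptability curve can be derived from paired samples of incremental costs and incremental QALYs as the proportion of samples in which acupuncture has a positive net monetary benefit at each threshold. The sketch below is written in Python for illustration (the study used R); the function name `ceac` and the four sample pairs are hypothetical, standing in for the 5000 paired samples described above.

```python
def ceac(incremental_costs, incremental_qalys, thresholds):
    """Probability of cost-effectiveness at each threshold value.

    Each (cost, QALY) pair is one probabilistic sample; a sample counts as
    favourable when threshold * QALY gain - extra cost is positive.
    """
    if len(incremental_costs) != len(incremental_qalys):
        raise ValueError("cost and QALY samples must be paired")
    n = len(incremental_costs)
    curve = []
    for threshold in thresholds:
        favourable = sum(
            1
            for d_cost, d_qaly in zip(incremental_costs, incremental_qalys)
            if threshold * d_qaly - d_cost > 0
        )
        curve.append(favourable / n)
    return curve


# Hypothetical paired samples (NOT study results).
d_costs = [100.0, 200.0, 350.0, 450.0]
d_qalys = [0.02, 0.02, 0.01, 0.01]
thresholds = [0, 20_000, 40_000, 50_000]

for threshold, prob in zip(thresholds, ceac(d_costs, d_qalys, thresholds)):
    print(f"GBP {threshold:>6}: P(cost-effective) = {prob:.2f}")
```

Keeping the cost and QALY samples paired by iteration preserves any correlation between them, which matters when non-intervention costs are themselves a function of the sampled EQ-5D effect.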


Results of generating homogeneous health-related quality-of-life scores and pain outcomes

Appendix 4 presents the (mapped) EQ-5D data and standardised pain outcomes. In general, patients’ HRQoL increased from baseline to 3 months. Similarly, standardised pain estimates decreased from baseline to 3 months. For both time points, it appears that osteoarthritis of the knee patients had, on average, lower HRQoL (and higher mean values of standardised pain) than patients suffering from headache/migraine or musculoskeletal pain.

For both end points, baseline imbalances between trial arms were observed within trials. For the EQ-5D end point, the biggest within-trial differences at baseline were found in the studies by Carlsson et al.119 and Salter et al.133 For the SMD end point, the largest differences were found in the same two trials and also in the studies by Kleinhenz et al.137 and White et al.132 These large imbalances are not surprising as most of these trials included only a small number of patients (around 50 or fewer). These observations supported the use of a modelling framework that allows for baseline adjustment,219 involving the use of either model 1 or 2 as the appropriate tool to synthesise this evidence.

Analysis of covariance (model 1)

Table 22 shows the parameter estimates obtained from model 1 applied to the EQ-5D and the standardised pain outcome data. For each parameter estimate the median of the MCMC posterior sample and 95% CrI are shown. Relative treatment effect estimates are shown, adjusted for baseline and treatment–pain interaction effects, together with measures of model fit (total residual deviance and DIC). The osteoarthritis of the knee pain group is the reference category for the pain interaction effects.



TABLE 22. Parameter estimates from fitting the novel network meta-analysis ANCOVA synthesis model (model 1) to the EQ-5D preference score and standardised pain end points

For both end points, model 1 indicates that acupuncture treatment increases the HRQoL of patients and/or reduces pain more than usual care and sham acupuncture treatments, irrespective of the pain group they belong to. For the EQ-5D end point, the median treatment effect of acupuncture compared with usual care in the osteoarthritis of the knee population is 0.079 (95% CrI 0.042 to 0.114); for headache/migraine and musculoskeletal pain patients the comparable median treatment effects are 0.056 (95% CrI 0.021 to 0.092) and 0.082 (95% CrI 0.047 to 0.116), respectively. The results also favour acupuncture over sham acupuncture, although with a greater degree of uncertainty, as reflected by the fact that the CrIs include zero for all pain types (osteoarthritis of the knee 0.022, 95% CrI –0.014 to 0.060; headache/migraine 0.004, 95% CrI –0.035 to 0.042; musculoskeletal pain 0.023, 95% CrI –0.007 to 0.053). The probability that acupuncture is the best treatment at improving HRQoL is 0.89 for osteoarthritis of the knee, 0.64 for headache/migraine and 0.95 for musculoskeletal pain.

For the SMD end point the median treatment effect of acupuncture compared with usual care in the osteoarthritis of the knee population is 0.703 (95% CrI 0.399 to 0.984); for headache/migraine and musculoskeletal pain patients the comparable median treatment effects are 0.588 (95% CrI 0.311 to 0.869) and 0.588 (95% CrI 0.334 to 0.863), respectively. The results also favour acupuncture over sham acupuncture. In contrast to the EQ-5D analysis, the CrIs do not include zero in the standardised pain analysis for osteoarthritis of the knee (0.438, 95% CrI 0.121 to 0.715) and musculoskeletal pain (0.527, 95% CrI 0.323 to 0.735), although the CrI for headache/migraine does (0.256, 95% CrI –0.073 to 0.560). The probability that acupuncture is the best treatment at improving standardised pain is 0.96–1.00, depending on pain type. These results are presented as a forest plot in Figure 13.
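The 'probability that acupuncture is the best treatment' is a summary of the posterior samples. In Bayesian network meta-analysis it is commonly calculated as the proportion of MCMC iterations in which a treatment has the most favourable effect; whether the study computed it in exactly this way is an assumption. The sketch below uses a hypothetical function name and made-up draws, with usual care as the reference (effect fixed at zero).

```python
def probability_best(effects_by_treatment, higher_is_better=True):
    """Share of posterior draws in which each treatment ranks first.

    effects_by_treatment maps a treatment name to its list of draws;
    draws at the same position come from the same MCMC iteration.
    """
    names = list(effects_by_treatment)
    n_draws = len(effects_by_treatment[names[0]])
    if any(len(effects_by_treatment[name]) != n_draws for name in names):
        raise ValueError("all treatments need the same number of draws")
    pick = max if higher_is_better else min
    wins = {name: 0 for name in names}
    for i in range(n_draws):
        best = pick(names, key=lambda name: effects_by_treatment[name][i])
        wins[best] += 1
    return {name: wins[name] / n_draws for name in names}


# Hypothetical draws of EQ-5D effects relative to usual care.
draws = {
    "usual care": [0.0, 0.0, 0.0, 0.0],
    "sham acupuncture": [0.05, 0.09, 0.04, 0.06],
    "acupuncture": [0.08, 0.07, 0.09, 0.10],
}
p_best = probability_best(draws)
print(p_best)
```

The `higher_is_better` flag is there because the direction of benefit depends on how an outcome is coded.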

FIGURE 13. Forest plot showing the network meta-analysis results for the standardised pain and EQ-5D outcomes: (a) acupuncture vs. usual care; (b) acupuncture vs. sham acupuncture; and (c) sham acupuncture vs. usual care. OAK, osteoarthritis of the knee.

Some heterogeneity between trials was expected. Possibly as a consequence of the mapping work performed, this expectation was not fulfilled for the EQ-5D end point (the between-study variance estimate is 0.001). For the standardised pain end point, the between-study variance was also small relative to the magnitude of the treatment effects (the between-study variance estimate is 0.09). The total residual deviance suggests that the models provide an adequate fit to the data (see Table 22).

Controlling for patient-level characteristics

Table 21 provides information on age of participants for each of the trials included in the data set. On average, age was lower in the headache/migraine pain group than in the musculoskeletal and osteoarthritis of the knee groups.

Using the change in EQ-5D as the outcome for synthesis, Table 23 presents the results of applying model 2 (an extension of model 1) to include patient-level information on age, a potential treatment effect modifier. The model fit statistics show that this age-adjusted model fits marginally better than model 1, with a lower DIC and a reduced posterior residual deviance. The results of this model are very similar to those of model 1 and do not suggest that age is a strong effect modifier or that non-linear effects of age on the effect of treatments exist.



TABLE 23. Parameter estimates from fitting the synthesis model including age as a covariate

Analysis with restricted evidence (models 3 and 4)

Results for models 3 and 4 are presented in Table 24, together with the model 1 results for comparison. Generally, all three models convey the same message about which treatment provides the larger increase in patients’ HRQoL, that is, acupuncture is found to be better than sham acupuncture and usual care. Nevertheless, given the presence of baseline imbalance, models 3 and 4 (model 3 in particular) provide very different and potentially inappropriate summary estimates of treatment effects when compared with model 1. These two models also fit the data less well than model 1 (DIC of –6420 for model 1 compared with higher values of –69 and –3824 for models 3 and 4, respectively). In the absence of baseline outcome data, if the choice were between modelling change (model 3) and modelling follow-up scores (model 4), the results indicate that the latter would be the better option, as its relative treatment effect estimates and pain interaction effects are closer to those of model 1.



TABLE 24. Parameter estimates from fitting three different network meta-analysis models to the EQ-5D end point

Results of analysing resource use

Table 25 shows the results from regressing change in primary and secondary care health resources on change in EQ-5D index score for each study. Generally, an increase in EQ-5D score over time implies a reduction in health-care resource use. The analysis of secondary care resource use for osteoarthritis of the knee was an exception, with improvements in EQ-5D score being associated with increased secondary care attendances; however, this result was not statistically significant.



TABLE 25. Resource use regression results

Results of the illustrative cost-effectiveness analysis

Illustrative cost-effectiveness results are presented in Table 26. The ICERs in each indication are well below the threshold of £20,000–30,000 per QALY generally considered acceptable in the UK. Results using the 0- to 3-month data for the resource use regressions were very similar to the results in the base case and are therefore not shown here. Acupuncture has close to a 100% probability of being cost-effective in patients with osteoarthritis of the knee and musculoskeletal pain types, and an 86% probability of being cost-effective for the headache/migraine indication, assuming a threshold of £20,000 per QALY. The sensitivity analysis using trial data with a weighted average of the therapist time observed in the trials provided fairly similar results, with musculoskeletal pain now obtaining the lowest estimated ICER compared with the other two pain groups, as shown in Table 26. The results of the probabilistic sensitivity analysis are presented as cost-effectiveness acceptability curves in Figure 14.



TABLE 26. Cost-effectiveness results (probabilistic analysis) for each pain condition

FIGURE 14. Cost-effectiveness acceptability curves for each pain condition. The curves indicate the probability that acupuncture is cost-effective at different values of the ceiling ratio. Separate curves are shown for each pain condition, including osteoarthritis of the knee (OAK) and headache/migraine.


Principal findings

Policy-makers faced with difficult resource allocation decisions require estimates of the costs and effects of alternative treatment options. These estimates should reflect all relevant data and compare treatments using a metric that can be used across clinical areas – in the UK the QALY is typically used. Synthesising all relevant evidence to produce comparable estimates of costs and effects generates a series of challenges as the available evidence base rarely captures all costs and effects of treatment (because of the nature of data collection or the duration of follow-up), and often requires evidence to be generalised from different populations. The available trial evidence may compare different sets of treatments and in many instances the HRQoL data required to estimate QALYs directly are not available.

The National Institute for Health and Care Excellence has recommended acupuncture for the treatment of chronic headache and musculoskeletal pain but not in the context of chronic pain associated with osteoarthritis of the knee.80–82 This decision in part reflected concerns regarding the available evidence. The current study was commissioned as part of a programme intended to improve evidence around the costs and effects of acupuncture. This study synthesised IPD from RCTs of acupuncture in headache/migraine, musculoskeletal and osteoarthritis of the knee chronic pain. Trials compared acupuncture with usual care, sham acupuncture or both control interventions. Bayesian network meta-analysis was therefore used in this study to draw on all available evidence to inform estimates of relative treatment effects. The studies reported heterogeneous and distinct outcome sets. Methods to homogenise outcomes for synthesis were therefore used. The availability of IPD for all studies expanded the set of feasible analyses and allowed development of de novo methods to fully exploit the benefits of access to these data.

Novel methods for network meta-analysis of IPD on continuous outcomes were developed, building on previous work on ANCOVA models for pairwise meta-analysis.219 Analysis of the pain outcome required development of methods for conducting SMD analysis with IPD. Analysis of the EQ-5D data required an extensive mapping exercise whereby separate mapping functions were applied to each study, with choice of mapping dependent on the available outcome data. Access to IPD allowed ANCOVA models to be applied, thus improving precision and adjusting for baseline imbalance. Access to IPD also avoided the use of any assumptions regarding the distribution of HRQoL instrument scores, thus allowing the observed distributions to be adequately reflected in the mapped utilities. Finally, access to IPD provided the opportunity to adjust for covariates based on within- and across-trial information. Given the demonstrable benefits of access to IPD, more effort should be made to share and develop repositories for data. A recent survey indicated a high level of support from reviewers affiliated with Cochrane Collaboration’s IPD Meta-analysis Methods Group for the development of a central repository for storing IPD.250

Analyses were conducted to explore the importance of modelling change scores in the presence of non-normally distributed outcome data and to explore the implications of using non-ANCOVA models, as would be necessary in the absence of IPD. The results showed that modelling final scores or change scores without baseline adjustment produced estimates of treatment effect that differed by up to 26% compared with the baseline adjusted model, emphasising the importance of baseline adjustment and therefore of having access to IPD.

The results of the network meta-analysis show acupuncture to be more effective than usual care with respect to reducing pain and improving HRQoL. There remains uncertainty regarding whether or not the benefit of acupuncture varies across the pain types analysed. The analysis of EQ-5D preference scores suggests that patients with the headache/migraine pain type may benefit from acupuncture, but less so than patients with osteoarthritis of the knee or musculoskeletal pain, although interaction effects are relatively uncertain. A reduced benefit in patients with headache/migraine-related chronic pain could be caused by ceiling effects as individuals with chronic headache/migraine pain had higher baseline EQ-5D index values. Results for the standardised pain analysis were more consistent across indications. Differences between acupuncture and sham acupuncture were relatively small. The large effect of the sham acupuncture intervention compared with usual care may reflect the potency of the sham comparators in the higher-quality trials included in the ATC systematic review. In contrast to the NICE guidelines, our results suggest that if anything the evidence base for acupuncture is stronger in the osteoarthritis of the knee and musculoskeletal conditions (for which acupuncture is recommended only for lower back pain) than in the headache/migraine pain group (for which acupuncture is recommended). The recommendations in the NICE osteoarthritis guidelines were heavily driven by comparisons with sham acupuncture. The network meta-analysis found strong evidence of an effect of acupuncture when compared with sham acupuncture in osteoarthritis of the knee for the standardised pain outcome but not the EQ-5D outcome (for which the CrI contained zero).

Considerable commonalities exist between the methodologies and the results presented in Chapter 2 and this chapter; however, there are some differences. Across pain types, the two chapters report minor differences in effect between acupuncture and usual care, and acupuncture and sham acupuncture. Nevertheless, the results were broadly consistent across the two chapters. The exact magnitude of the treatment effects and their precision inevitably varied given that there are differences in the data and methods being used. The current analysis used 28 trials (rather than 29) and consistently used the 3-month end point rather than the primary end point as in Chapter 2. For example, in Chapter 2 the two headache trials used the primary end point, which was at 6 months. Additionally, a different methodology was used. Chapter 2 used IPD pairwise meta-analysis based on a frequentist approach. In contrast, in this chapter, IPD network meta-analysis was implemented using a Bayesian random-effects framework. The synthesis model implemented in the current analysis considered all evidence and all available treatments of interest in a single analysis, simultaneously deriving relative treatment effects for all comparisons. Finally, Chapter 2 focused on the standardised pain outcome whereas this chapter analysed standardised pain and HRQoL (EQ-5D) estimates.

The cost-effectiveness results suggest that, compared with usual care alone, acupuncture is cost-effective with ICERs ranging from £9000 to £13,000 per QALY. These values fall below the NICE plausible threshold range (i.e. between £20,000 and £30,000 per QALY gained) and do not exceed a more recent empirical threshold estimate of £13,000 per additional QALY obtained.251 These values are comparable to those in other studies in the UK comparing acupuncture with usual care for the same pain indications, which have estimated ICERs of £4000–17,000.62,63,227 These ICERs were derived from individual studies, whereas the ICERs presented here reflect the synthesis of a large number of studies.


The study has a series of limitations. First, synthesis of heterogeneous outcomes relied on imperfect standardisation processes (which assume that any differences in within-trial outcome variability result from the use of different instruments) and mappings, which are typically able to explain only a minority of variation in EQ-5D scores. Clearly, the use of any mapping tool is considered a second best approach to directly eliciting relevant preference-based measures from study participants. The magnitude of bias introduced by using standardisation processes and mapping functions (and different mapping functions across trials) is unknowable. The availability of key outcomes across trials would have reduced these concerns, as would the collection of generic preference-based measures of HRQoL in all trials. A ‘core outcome set’ for osteoarthritis is available, along with the recommendation that future Phase III trials of knee, hip and hand osteoarthritis should evaluate the following domains: pain, physical function, patient global assessment and, for studies of ≥ 1 year, joint imaging.252 Other recommendations have tended to focus on domains rather than specific instruments. Recommendations that go beyond Phase III regulatory trials, and which define the instruments that should be used to measure outcomes in these domains, are warranted.

Second, outcome data closest to 3 months were selected for synthesis. The synthesis therefore requires the assumption that, in the minority of trials not reporting at 3 months, the available data are reflective of the 3-month time point. Some trials reported outcomes at months 1 and 2. If the effect of acupuncture is gradual, these effects may underestimate 3-month outcomes. For the cost-effectiveness analysis, the HRQoL effects observed at 3 months were applied from 0 to 3 months to generate QALYs. Other quality-of-life trajectories may, however, be more plausible. For example, quality of life may increase gradually during treatment and reduce gradually following treatment completion. Moreover, there is some evidence of benefits increasing for some time after the first 3 months when treatment was provided, for example at 12 months for headache/migraine60 and at 24 months for lower back pain.61 Depending on the nature and magnitude of these effects, the incremental benefit of sham acupuncture and acupuncture could be larger or smaller than presented here. Further work analysing repeated outcome measurements in a network meta-analysis could be used to evaluate the importance of these effects.

All sham interventions were assumed to be equivalent in the analysis, as were the usual care controls. Evidence from work recently conducted by the ATC suggests that the effect of sham acupuncture may vary depending on whether penetrating or non-penetrating needles are used and that the effect of usual care may depend on whether or not a treatment protocol for usual care is specified.45 Exploration of a network including more refined comparator definitions may, therefore, be of value.

The impact of each pain condition on treatment effects was assumed to be exchangeable;173 this assumption could be explored further by comparing the fit of models assuming a common pain–treatment effect interaction and models assuming completely separate pain–treatment effect interactions.

The studies analysed here are from a range of countries, which may differ in terms of the method and intensity with which acupuncture is administered. For instance, following NICE recommendations for lower back pain, we assumed that acupuncture treatments are fixed at 10 sessions, irrespective of the pain condition. This assumption might be questionable as the optimum number of treatment sessions may vary according to setting and pain type. Also, acupuncture sessions were costed using a unit cost for a physiotherapist of £36 per hour. This is also an assumption of the current work as unit costs will depend on how the NHS will provide the service. In addition, differences in the nature of health care for chronic pain more generally could have impacted on outcomes.

The analysis of non-intervention resource use assumed that only primary care and specialist visits are impacted on by changes in outcomes following acupuncture, and that the impact of treatment on resource use can be captured through changes in the EQ-5D. It is possible, however, that this did not capture the full impact of treatment on resource use.

Our analysis of standardised pain included the primary end point for each study and, therefore, the outcomes on which we would expect the trials to have been powered. The outcomes included in the analysis ranged from pure pain measures to wider measures of HRQoL (e.g. total WOMAC score). Both pain and functioning outcomes have been highlighted in previous NICE Guidance Development Groups to be of critical importance to decision-making.80–82 Our analysis suggests that, based on the standardised pain outcome, acupuncture is better than usual care and sham acupuncture for all indications, although CrIs include zero for the headache group when acupuncture is compared with sham acupuncture.

Recommendations for future research

First, a key limitation of this work is its reliance on imperfect standardisation processes to combine the available heterogeneous evidence. Thus, we consider it a research priority to identify key outcomes for the conditions considered here and to improve reporting so that consistency exists across the body of evidence. Second, where complete homogeneity of outcomes across the relevant evidence cannot be achieved and mapping tools are therefore required, a worthwhile methodological extension of the current work would be to develop a model that maps the existing evidence to the desired outcome and simultaneously synthesises it together with other relevant evidence. Finally, this work highlighted how important it is to have access to, and analyse, evidence at the individual level. It showed that IPD has clear value over summary data for both the synthesis and the decision modelling aspects of the analysis. Thus, continuing efforts to share this type of data across the research community are highly commended.

Although results from this analysis provide robust estimates of the incremental costs and effects of acupuncture compared with usual care, they are unlikely to provide a suitable basis for decision-making. There is a wide range of alternative treatments for chronic pain and the relative value of these alternatives should be appraised alongside the costs and effects of acupuncture and usual care to reliably inform decision-making. In the context of osteoarthritis of the knee, an evaluation of a broader set of treatment options has been conducted and is presented in the following chapter.


This study presents methods for conducting IPD network meta-analysis of continuous outcomes when the instruments used to measure outcomes differ between trials. Using the example of acupuncture for the treatment of chronic pain, our novel methods show how heterogeneous outcomes can be analysed using standardisation and mapping approaches, and how the resulting outcomes can be translated into cost-effectiveness results to inform resource allocation decisions.

The methods developed allowed all available trials to inform the synthesis. Availability of IPD allowed the true distribution of outcome measures to be reflected in the mapping to EQ-5D and avoided the use of non-baseline-adjusted models, which produced quite different results. Use of baseline-adjusted change score models produced better results than non-adjusted models, suggesting the superiority of the ANCOVA framework in the context of treatment effect estimation.

The analysis found acupuncture to be more effective than usual care with respect to reducing pain and improving EQ-5D preference scores in patients with chronic pain of osteoarthritis of the knee, musculoskeletal and headache/migraine origin. The benefits of acupuncture over sham acupuncture are smaller than those over usual care. The probability that acupuncture is associated with better pain outcomes than sham acupuncture and usual care is high (≥ 0.96) across indications. The probability that acupuncture is associated with higher EQ-5D preference scores than sham acupuncture and usual care is high in osteoarthritis of the knee (0.89) and musculoskeletal chronic pain (0.95). For headache/migraine this probability is 0.64, reflecting the smaller benefit of acupuncture compared with sham acupuncture for this indication. The methods used provide outputs in a format that can be used to directly inform cost-effectiveness considerations once the full set of relevant comparators is considered.

Copyright © Queen’s Printer and Controller of HMSO 2017. This work was produced by MacPherson et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK409496

