NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Claxton K, Martin S, Soares M, et al. Methods for the estimation of the National Institute for Health and Care Excellence cost-effectiveness threshold. Southampton (UK): NIHR Journals Library; 2015 Feb. (Health Technology Assessment, No. 19.14.)


Methods for the estimation of the National Institute for Health and Care Excellence cost-effectiveness threshold.


Appendix 1 Systematic review of the literature on the cost-effectiveness threshold

Systematic review approach


The aim of the systematic review was to inform the development of the conceptual framework, as well as the design, implementation and interpretation of the empirical analyses. Rather than define a set of very specific questions to answer through the review, the objective was to characterise the existing literature in terms of the questions addressed and approaches taken. However, it was hoped that insights would be provided on topics including:

  • general conceptualisation of the cost-effectiveness threshold
  • how NICE’s cost-effectiveness threshold should be defined, characterised and operationalised
  • approaches to estimating cost-effectiveness thresholds in general and the NICE threshold in particular.

In the initial stages of this systematic review, it became clear that the ‘traditional’ method of conducting systematic searches of the existing literature on the cost-effectiveness threshold would be insufficient for the requirements of this particular study. Here we refer to the ‘traditional’ method as the practice of finding key terms and medical subject headings (MeSHs) that most accurately capture the range of literature relevant to the topic, while including as few irrelevant studies as possible (making use of bibliographic databases such as MEDLINE).

The main weakness of such an approach for a systematic review of this topic is that it requires pre-existing knowledge of the terms used and topics covered in the current literature. The process has always required a degree of expertise (and luck) in the strategy taken, including both knowledge of the literature to find likely search terms and skill in constructing the strategies. Excluding a single key term can amount to ignoring vast areas of the literature. In addition, the traditional approach relies on the existence of key terms that suitably encapsulate the relevant literature. Finding common terms in literature with potential relevance to the cost-effectiveness threshold proved a significant problem, as many relevant publications were not specifically aimed at issues relating to the NICE cost-effectiveness threshold (e.g. the Martin et al. publications,57–59 which provide a precursor to this project). Moreover, because terms such as ‘threshold’ and ‘cost-effective’ cover such a wide range of topics, any attempt at a systematic search would either be excessively large or produce a clearly limited snapshot of the existing literature.

As a result, a pragmatic approach was taken to identifying relevant papers: ‘pearl growing’, defined here as the use of existing collections of studies to identify additional relevant parts of the literature. The approach uses a pool of ‘initial pearls’ to grow the literature through both references and citations until all relevant papers have been found. It therefore relies on the expertise of the authors of the existing literature, rather than on the searcher’s potentially limited knowledge, to populate the pool of studies.

Although ‘pearl growing’ was significantly limited by the available software and is time-consuming, it corrects for many of the failings of traditional searches for topics that share the characteristics of the cost-effectiveness threshold.

Systematic review methods

The ‘pearl growing’ method of systematic review can be characterised as five steps for the identification of relevant papers.

  1. Identification and extraction of ‘initial pearls’.
    • ‘Initial pearls’ were identified through consultation with researchers with experience of the cost-effectiveness threshold literature. Fourteen initial pearls were identified through this process. These publications were chosen for their wide-ranging coverage of the topic as well as their anticipated significance.
  2. Extraction of citations and references from ‘initial pearls’.
    • Citations: Web of Knowledge was selected to perform the citation searches. The choice was based partly on expert advice from an information specialist and partly on brief, non-systematic comparisons of citation results from a range of alternative software packages.
    • References: Web of Knowledge was also used for the collection of papers’ references.
    • Both citations and references were exported into an EndNote library (EndNote X6, Thomson Reuters, CA, USA) for the purpose of collection and further analysis (exclusion of repeats, title searching and review of the abstracts).
  3. Identification of further ‘pearls’ from cited and referenced papers.
    • Once citations and references of the ‘initial pearls’ had been collected, they were subjected to a set of investigations to identify further ‘pearls’.
    • Papers were excluded unless their titles or abstracts suggested that they contained information on one or more of five topics of interest. These topics had been identified beforehand from the objectives of the project and a review of the ‘initial pearls’, and included papers were classified by whether they could inform:
      • introduction to the cost-effectiveness threshold topic and policy context
      • discussion and debate around the current value and use of the threshold
      • potential methods suggested to find a suitable threshold value
      • specific values proposed
      • the use of individual and societal valuations of health gains to inform the value of the threshold.
  4. Repetition of citation and reference searches.
    • The process was then repeated for the ‘pearls’ identified in step 3.
    • This process was repeated until no new ‘pearls’ were discovered by additional iterations.
  5. Manual search of references.
    • To make the search as complete as possible, a retrospective manual search of the references of all the ‘pearls’ was conducted. Any potentially relevant references not discovered previously (most likely owing to a mix of user error and limitations of the software used) were added to the analysis at the relevant step, and further pearl growing was applied to them to ensure completeness.
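The iterative core of steps 2 to 4 can be sketched as a fixed-point search over a citation graph. This is a minimal illustration under stated assumptions, not the software used in the review: the `references` and `citations` dictionaries, the `is_relevant` screening predicate and the paper labels are hypothetical stand-ins for the Web of Knowledge exports and the manual title/abstract screening, and the manual reference check of step 5 is not modelled.

```python
from typing import Callable, Dict, List, Set


def pearl_grow(
    initial_pearls: Set[str],
    references: Dict[str, Set[str]],
    citations: Dict[str, Set[str]],
    is_relevant: Callable[[str], bool],
) -> List[Set[str]]:
    """Return the pearls found at each iteration, starting with the initial pearls."""
    screened: Set[str] = set(initial_pearls)  # every paper already looked at (removes repeats)
    frontier: Set[str] = set(initial_pearls)  # pearls whose links have not been followed yet
    rounds: List[Set[str]] = [set(initial_pearls)]
    while frontier:
        # Step 2: gather the references and citations of the latest pearls
        candidates: Set[str] = set()
        for paper in frontier:
            candidates |= references.get(paper, set())
            candidates |= citations.get(paper, set())
        candidates -= screened
        screened |= candidates
        # Step 3: screen the new candidates for the topics of interest
        new_pearls = {paper for paper in candidates if is_relevant(paper)}
        # Step 4: stop once an iteration yields no new pearls
        if not new_pearls:
            break
        rounds.append(new_pearls)
        frontier = new_pearls
    return rounds


# Hypothetical toy data: paper IDs and links are invented for illustration
references = {"A": {"B", "C"}, "B": {"D"}, "C": {"E"}, "E": {"F"}}
citations = {"A": {"G"}, "D": {"H"}}
relevant = {"A", "B", "C", "D", "E", "G"}  # F and H fail the title/abstract screen

rounds = pearl_grow({"A"}, references, citations, lambda p: p in relevant)
all_pearls = set().union(*rounds)
print(len(rounds) - 1, "growing iterations;", len(all_pearls), "pearls")
```

Tracking already-screened papers separately from pearls mirrors the de-duplication done in the EndNote library: an irrelevant paper is screened once and never revisited, so the loop is guaranteed to terminate.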

Systematic review results

The ‘pearl growing’ method of systematic review revealed 76 papers deemed relevant. The results from each stage of the process are reported in Figure 9. The figure highlights that after four iterations no new relevant papers were identified by the systematic process.

FIGURE 9. Graph showing process results from pearl growing systematic review.

Review of the literature

Introduction and policy context

Because the relevant literature covers a broad range of contexts, the review is broken down into several topics, which are discussed separately. The 76 papers identified by the systematic review (see Papers discovered by the literature review for the full list) were assigned to three categories:

  1. literature covering the introduction to the cost-effectiveness threshold topic and policy context
  2. discussion and debate around the current value and use of the threshold
  3. potential methods suggested to find a suitable threshold value.

These categories were chosen to reflect the broad range of relevant topics and areas of discussion covered by the cost-effectiveness threshold literature. It should be noted that the majority of the literature identified by the literature review fell into the first and last categories, with very few covering multiple categories sufficiently to be discussed in more than one section. The final category will only be discussed briefly as it can be seen as a separate, unrelated approach to the threshold required for purposes of decision-making by NICE.

The majority of papers (34 of the 76 papers discovered) identified in the literature review could be characterised as introducing the idea of a cost-effectiveness threshold (these consist of the very early literature pre-dating NICE) or discussing the policy context through the years.1–3,5,7,8,15,33,40,46,47,50,52–55,114–130 This section will characterise the main areas of discussion in the literature and briefly describe the key parts of the literature development.

Definition of the cost-effectiveness threshold

An important place to start is how the literature has defined the cost-effectiveness threshold. Analysing this matters for the review, not only to check that a good definition has been presented, but also to assess whether the existing literature uses a definition that is both consistent and accurate.

One of the earliest definitions of something resembling the modern interpretation of the cost-effectiveness threshold comes from Weinstein and Zeckhauser.15 Their paper identifies a ‘critical ratio’ between monetary costs and a measure of health gains. This critical ratio was argued to represent ‘a cut-off point for allocation’ of an activity in a budget-constrained public sector entity (p. 1).15

A similar, more recent approach to defining the threshold is that taken by Towse et al.,130 who considered a hypothetical budget-constrained health-care sector with a perfectly informed decision-maker who considers only the cost per QALY of health technologies. With perfect information, the decision-maker is able to rank all potential health-care activities by their cost per QALY, and will implement as many of the relatively low cost per QALY activities as possible until the budget is used up. Eventually a point will be reached where society is not willing to pay for a further marginal increase in QALYs and would rather the funding be used for other consumption. The cost per QALY at which this cut-off occurs can be described as the cost-effectiveness threshold, as it represents the switching point between an activity being funded and not. As the budget is assumed to be fully responsive, any new technologies with a cost per QALY below this threshold will be funded in the future.
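The ranking-and-funding logic of this stylised health-care sector can be sketched as a league-table procedure. In this minimal sketch, which assumes indivisible programmes and uses hypothetical names and figures, programmes are funded in ascending order of cost per QALY until the next one no longer fits the budget; the cost per QALY of the last programme funded then plays the role of the threshold. The separate ‘society is not willing to pay more’ stopping condition is not modelled: here the budget alone stops funding.

```python
from typing import List, Optional, Tuple

# (name, cost in £, QALYs gained); figures are hypothetical
Programme = Tuple[str, float, float]


def league_table_threshold(
    programmes: List[Programme], budget: float
) -> Tuple[List[str], Optional[float]]:
    """Fund programmes in ascending cost per QALY until the next one does not fit the budget.

    Returns the funded programme names and the cost per QALY of the last one funded,
    which acts as the threshold in this stylised setting (None if nothing is affordable).
    """
    ranked = sorted(programmes, key=lambda p: p[1] / p[2])
    funded: List[str] = []
    spent = 0.0
    threshold: Optional[float] = None
    for name, cost, qalys in ranked:
        if spent + cost > budget:
            break  # budget used up: this and all less cost-effective programmes go unfunded
        spent += cost
        funded.append(name)
        threshold = cost / qalys
    return funded, threshold


programmes = [
    ("drug_y", 400_000, 8),       # £50,000 per QALY
    ("screening", 100_000, 20),   # £5,000 per QALY
    ("drug_x", 250_000, 10),      # £25,000 per QALY
    ("surgery", 300_000, 20),     # £15,000 per QALY
]
funded, threshold = league_table_threshold(programmes, budget=700_000)
print(funded, threshold)
```

If programmes were instead treated as perfectly divisible, the first unaffordable programme would be part-funded with the remaining budget, and its cost per QALY would become the threshold.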

National Institute for Health and Care Excellence and the cost-effectiveness threshold

The use and valuation of a cost-effectiveness threshold by NICE has been controversial. Williams8 highlighted three events that may be argued to have particularly muddied the water. First, NICE was not given a threshold value by the government at the time of its inception in 1999. This meant that NICE was obliged to come up with a de novo estimate fairly rapidly. On the basis of his discussions with NICE, Williams stated that at the point of inception NICE came up with a value of ‘roughly £30,000 per QALY, plus or minus £5000 depending on the specific circumstances’ (p. 7).8

The second event to which Williams refers was NICE’s initial resistance to acknowledging that any form of threshold value existed. Following analyses such as those of Towse et al.130 and Devlin and Parkin,6 which investigated previous NICE decisions and inferred an implicit threshold, NICE began to publish details of its approach to an ICER threshold. The major step was the 2004 Guide to the Methods of Technology Appraisal,5 which provided these details, although its definition of the £20,000–30,000 threshold range may be considered loose and open to interpretation. Although the 2004 guide was one of the first official references to the threshold, Sir Michael Rawlins did state at the 2001 NICE Annual General Meeting that the Institute would ‘need to be very clear in its reasons for supporting technologies with cost-effectiveness ratios higher than £30,000 per QALY’.130

Williams’ final event is that the often-quoted £20,000–30,000 threshold range has never been scientifically justified. Authors such as Rawlins and Culyer38 have argued that there has never been an empirical basis for the values, or any definitive meaning behind the range, and therefore that the threshold should not be the only tool NICE uses to draw conclusions about new technologies.

The threshold as a range

The idea of such a threshold range has been part of the literature for some time. Kaplan and Bush125 considered the idea of a less abrupt approach than that suggested by Weinstein and Zeckhauser.17 Kaplan and Bush125 investigated a set of early Medicare adoption decisions and presented broad criteria of acceptance based on a set of threshold ranges in terms of cost per additional well-year. These were defined as < $20,000/well-year (cost-effective), $20,000–100,000 (possibly controversial but justifiable), > $100,000 (questionable when compared with other expenditure). However, the authors noted that a $100,000 cut-off was not relevant to the policy decisions at the time and that all results would need significant future investigation. Similarly, Laupacis et al.53 presented five ‘grades of recommendation’ for decisions about technological reimbursement in Canada.
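Kaplan and Bush’s ranges amount to a simple banding rule on cost per well-year. The sketch below restates them as a function; the band labels follow the text, but how values falling exactly on $20,000 or $100,000 are treated is our assumption, since the source gives only open ranges.

```python
def kaplan_bush_band(cost_per_well_year: float) -> str:
    """Classify a programme using Kaplan and Bush's cost per well-year ranges (US$)."""
    # Assumption: exactly $20,000 and exactly $100,000 fall in the middle band
    if cost_per_well_year < 20_000:
        return "cost-effective"
    if cost_per_well_year <= 100_000:
        return "possibly controversial but justifiable"
    return "questionable when compared with other expenditure"


for value in (12_000, 45_000, 150_000):
    print(value, kaplan_bush_band(value))
```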

The conclusions of both of these papers can be represented graphically by Figure 10, which is also described or presented in much of the literature (see Rawlins and Culyer,38 Littlejohn in Towse et al.,130 McCabe et al.2 and Devlin and Parkin6). This graph represents the probability of rejection of a new technology as a function of the technology’s ICER. It shows two points of inflection (A and B in Figure 10), which represent one interpretation of the lower and upper bounds of a cost-effectiveness threshold range.

FIGURE 10. Probability of rejection with a ‘soft’ cost-effectiveness threshold. A and B represent the two points of inflection.
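A minimal way to make Figure 10 concrete is a piecewise function of the ICER. This is a sketch only: the linear rise between A and B is an assumption (the figure does not give a functional form), and the £20,000 and £30,000 values used for A and B are borrowed from the NICE range purely for illustration.

```python
def rejection_probability_soft(icer: float, lower: float, upper: float) -> float:
    """Stylised probability of rejecting a technology under a 'soft' threshold range.

    Below `lower` (point A) rejection is assumed never to happen, above `upper`
    (point B) it is assumed certain, and in between it is assumed to rise linearly.
    """
    if lower >= upper:
        raise ValueError("lower bound must be below upper bound")
    if icer <= lower:
        return 0.0
    if icer >= upper:
        return 1.0
    return (icer - lower) / (upper - lower)


# Illustrative bounds only: the £20,000-30,000 range discussed in the text
for icer in (15_000, 25_000, 35_000):
    print(icer, rejection_probability_soft(icer, 20_000, 30_000))
```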

The literature often makes use of the terms ‘soft’ and ‘hard’ when referring to the threshold. The term ‘soft’ is often used in a similar way to the threshold ‘range’ (alternatively Akehurst’s ‘smudge’130). Although the underlying idea is the same, a ‘soft’ threshold has also been used to refer to a single threshold. For example, McCabe et al.2 argued that it is both feasible and probably desirable to use a single threshold rather than a range, as the threshold should represent the point beyond which factors other than cost-effectiveness are considered. This approach would suggest that all new technologies with an ICER below the threshold should receive funding (regardless of their impact on other factors such as equity of health). It is, however, unclear from this paper what the implications are for technologies with an ICER beyond the single threshold value.

In contrast, a ‘hard’ threshold represents the situation where the ICER is the sole relevant variable in an adoption decision, as demonstrated in Figure 11.119 Importantly, if a ‘hard’ threshold is set, no other factors can enter the decision-maker’s consideration of a new technology. The difference between a ‘hard’ and a ‘soft’ threshold therefore rests largely on whether or not the ICER reflects all relevant considerations. Assuming the decision-maker is optimising health, a hard threshold should represent the most effective allocation of a health-care budget, but it cannot account for any equity concerns (such as the severity of the condition, unmet need and orphan diseases) that are not included in the calculation of the ICER. Authors such as Dolan et al.21 have demonstrated that a ‘hard’ threshold may not be able to reflect suitably the non-linearity of social or political values of QALYs with respect to factors such as quality and length of life, and for those with worse health prospects or dependents.

FIGURE 11. Graph showing a ‘hard’ cost-effectiveness threshold.
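The defining feature of a ‘hard’ threshold, that nothing but the ICER can change the decision, can be shown with a step rule. In this sketch the `Technology` record, its `severe_condition` flag and the figures are hypothetical, and treating an ICER exactly equal to the threshold as acceptable is our assumption.

```python
from dataclasses import dataclass


@dataclass
class Technology:
    name: str
    icer: float             # incremental cost per QALY gained (£)
    severe_condition: bool  # an equity consideration that the ICER does not capture


def hard_threshold_adopt(tech: Technology, threshold: float) -> bool:
    """Adopt if and only if the ICER does not exceed the threshold.

    No other attribute of the technology (such as severe_condition) can change the decision.
    """
    return tech.icer <= threshold


# Hypothetical technologies with identical ICERs but different severity
mild = Technology("treatment_for_mild_condition", 32_000, severe_condition=False)
severe = Technology("treatment_for_severe_condition", 32_000, severe_condition=True)
print(hard_threshold_adopt(mild, 30_000), hard_threshold_adopt(severe, 30_000))
```

Both technologies receive the same decision, which is exactly the property that makes a hard threshold unable to reflect equity concerns outside the ICER.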

What does the threshold represent?

Two broad lines of thought have developed on what the threshold represents: social WTP and shadow pricing.1,2,8,12,16,17 The key difference between the two is the budget that should be considered by those accepting or rejecting health technologies. The social WTP approach (usually implicitly) assumes that the budget of the health-care sector is flexible to the value of health gains determined by society. In this case, the value society places on the health benefits (e.g. in QALYs) generated by new health-care programmes and technologies is estimated first, and the health-care budget is then the sum of society’s WTP for all treatments. In other words, the threshold is set exogenously with no reference to a budget constraint.
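One stylised reading of the social WTP view is that the value of a QALY comes first and the budget follows from it. The sketch below adopts every programme whose cost per QALY is within a given WTP and reports the implied budget as their total cost. The programmes and WTP value are hypothetical, and reading ‘the sum of society’s WTP for all treatments’ as the total cost of the programmes society is willing to pay for is our interpretation.

```python
from typing import List, Tuple

# (name, cost in £, QALYs gained); figures are hypothetical
Programme = Tuple[str, float, float]


def budget_implied_by_wtp(
    programmes: List[Programme], wtp_per_qaly: float
) -> Tuple[List[str], float]:
    """Social WTP view: the value of a QALY is fixed first and the budget follows.

    Every programme whose cost per QALY is within society's WTP is adopted, and the
    implied budget is simply what those programmes cost in total.
    """
    adopted = [(name, cost) for name, cost, qalys in programmes if cost / qalys <= wtp_per_qaly]
    return [name for name, _ in adopted], sum(cost for _, cost in adopted)


programmes = [
    ("screening", 100_000, 20),  # £5,000 per QALY
    ("surgery", 300_000, 20),    # £15,000 per QALY
    ("drug_x", 250_000, 10),     # £25,000 per QALY
    ("drug_y", 400_000, 8),      # £50,000 per QALY
]
adopted, budget = budget_implied_by_wtp(programmes, wtp_per_qaly=30_000)
print(adopted, budget)
```

Raising `wtp_per_qaly` enlarges the implied budget rather than changing which programmes compete for a fixed sum, which is the contrast with the shadow pricing view.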

In contrast, the shadow pricing approach takes the budget as given (at least beyond the control of those who determine the cost-effectiveness threshold).1,2 The threshold is, therefore, endogenously based on the services currently provided within the system. When a new programme or technology is accepted into the system and imposes an additional cost onto the budget, the only way to meet those costs is to remove or down-scale existing services which will incur opportunity costs in terms of population health. Hence the threshold represents the ICER of the least cost-effective existing service covered by the budget. In principle, it is this service which is removed to fund a new programme or technology. In practice, a range of criteria is likely to be used to identify appropriate services for displacement to make room in the budget for new interventions.
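Under the shadow pricing view, the threshold converts the extra cost of a new technology into the health forgone elsewhere. The sketch below takes the threshold as the cost per QALY of the least cost-effective service currently funded and computes the net health effect of adopting a new technology, assuming its cost is met entirely from that marginal service. Service names and figures are hypothetical.

```python
from typing import List, Tuple

# (name, annual cost in £, QALYs produced); figures are hypothetical
Service = Tuple[str, float, float]


def shadow_price_threshold(services: List[Service]) -> float:
    """Threshold = cost per QALY of the least cost-effective service currently funded."""
    return max(cost / qalys for _, cost, qalys in services)


def net_health_effect(new_cost: float, new_qalys: float, threshold: float) -> float:
    """QALYs gained by a new technology minus QALYs lost by displacing marginal services.

    Assumes the extra cost is met entirely from services at the margin, whose cost
    per QALY equals the threshold, so each £ spent forgoes 1 / threshold QALYs.
    """
    return new_qalys - new_cost / threshold


services = [
    ("service_a", 200_000, 40),  # £5,000 per QALY
    ("service_b", 300_000, 15),  # £20,000 per QALY
    ("service_c", 250_000, 10),  # £25,000 per QALY (the margin)
]
k = shadow_price_threshold(services)
print(k)                                    # cost per QALY of service_c
print(net_health_effect(100_000, 5, k))     # new ICER £20,000: below k
print(net_health_effect(150_000, 5, k))     # new ICER £30,000: above k
```

A technology with an ICER below the threshold yields a positive net health effect and one above it a negative effect, which is why, on this view, the threshold is read off existing services rather than chosen.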

In the UK, the main source of debate about which of these concepts of the threshold is the correct one lies in NICE’s remit. Authors such as Culyer et al.1 have discussed NICE’s position as a ‘searcher’ or a ‘setter’ of the threshold. The distinction between these two roles is that a threshold ‘searcher’ does not set a threshold with the motivation of maximising social welfare under the assumption of a flexible NHS budget, but instead investigates the threshold value that is appropriate given current NHS activities and the fixed budget as set down by Parliament.

Much of the literature on this topic is founded in the discussion of the correct constitutional role of NICE, the potential negative implications of setting a threshold and the feasibility of identifying displaced activities. In 2007, Culyer et al.1 argued that it is not appropriate for NICE to be characterised as a threshold setter, since setting a threshold would effectively imply that NICE sets the NHS budget. The setting of the NHS budget, they highlight, is the constitutional responsibility of Parliament, not NICE. Hence the paper argues that NICE should concern itself with being a threshold ‘searcher’, seeking to identify ‘an optimal threshold ICER, at the ruling rate of expenditure, that is consistent with the aim of the health service to maximise population health’ (p. 4).1

In a similar vein, Appleby et al.56 concluded that the threshold used by NICE should be consistent with the decisions made by local commissioners within the NHS. This is important given that NICE provides little guidance to the NHS on interventions suitable for disinvestment to release the funding needed for the new technologies it recommends. If the threshold is set too high, NICE may accept new technologies that are less cost-effective than the services which local commissioners displace to fund them. Conversely, if the threshold is set too low, NICE is likely to reject technologies that are cost-effective relative to existing services delivered from the NHS budget. The authors conclude that, in the short term, NICE has to act as a threshold ‘searcher’ to ensure continuity in the NHS.

Alternative arguments have been put forward which reject the idea of NICE as a threshold ‘searcher’. First, some authors (such as Gafni and Birch7,120) have made the case that an implicit threshold has the potential to lead to spiralling inflation if new cost-effective technologies are funded without sufficient disinvestment. However, McCabe et al.2 argued that Culyer’s characterisation of the NICE threshold could overcome this challenge if the threshold were reviewed regularly, so that it remained flexible over time to changes in the NHS budget and the productivity of the sector, and if the threshold for new activities with a non-marginal budget impact were lower than for those with a marginal impact. The issue of the inflationary pressure of a threshold is discussed further below.

Towse48 raises another concern about Culyer et al.’s1 characterisation of the NICE threshold, arguing that a lack of knowledge of the true opportunity cost of new activities makes it impossible to identify the value of the activities being displaced and, therefore, for NICE to ‘search’ for a threshold relating to activities displaced at the margin. The difficulty of identifying current activities at the margin in terms of cost-effectiveness will be dealt with later in this chapter.

Factors considered by the National Institute for Health and Care Excellence other than the comparison of the incremental cost-effectiveness ratio and threshold

As was discussed in the section The threshold as a range, the appropriate threshold approach depends on the policy context, specifically whether the comparison of the ICER with the threshold is the only relevant piece of information informing an adoption decision (a ‘hard’ threshold) or simply one of many factors considered (a ‘soft’ threshold). In the case of the UK, NICE has openly stated that the ICER of a technology is not the sole consideration of the committee in its adoption decisions.5

Both NICE and a number of other authors have provided overviews of the other factors that NICE considers in the adoption decision; these are summarised in Table 35.



TABLE 35. Factors other than the ICER considered by NICE

Table 35 suggests that the threshold is only one consideration for decision-makers at NICE. However, in principle, these other types of benefits could be added to health benefits and compared with those of potential treatments for displacement, which also have wider social benefits. In other words, this wider set of considerations relating to the benefits of new technologies should arguably also be reflected in the threshold.a

Multiple thresholds

Similarly, some have argued for using different thresholds for different situations.2,17 The two main grounds for using different thresholds are the size of the budgetary impact and whether the decision represents an investment in additional activities or a disinvestment in current activities.

The topic of different thresholds for different budgetary impacts of a proposed technology has received very little analytical attention in the literature. McCabe et al.2 argue that technologies with a large budgetary impact should be evaluated against a lower threshold than those with a relatively small impact. The reason is that a large budgetary impact will require greater displacement of current activities (assuming a fixed overall budget); this may result in the displacement of non-marginal activities, which may have a lower ICER than those at the margin.
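McCabe et al.’s argument can be illustrated by computing the pounds released per QALY lost as the budget impact grows. This sketch rests on two assumptions of ours: services are cut strictly from least to most cost-effective, and they are perfectly divisible, so a service can be partly scaled down. The services and figures are hypothetical, and in practice displacement may be far less orderly.

```python
from typing import List, Tuple

# (name, annual cost in £, QALYs produced); figures are hypothetical
Service = Tuple[str, float, float]


def qalys_forgone(services: List[Service], budget_impact: float) -> float:
    """QALYs lost when `budget_impact` is released by cutting services.

    Assumes services are cut in order from least to most cost-effective and that
    they are perfectly divisible, so a service can be partly scaled down.
    """
    remaining = budget_impact
    lost = 0.0
    for _, cost, qalys in sorted(services, key=lambda s: s[1] / s[2], reverse=True):
        if remaining <= 0:
            break
        cut = min(cost, remaining)
        lost += cut * qalys / cost  # QALYs lost scale with the share of the service cut
        remaining -= cut
    if remaining > 0:
        raise ValueError("budget impact exceeds current spending on these services")
    return lost


def effective_threshold(services: List[Service], budget_impact: float) -> float:
    """Average cost per QALY of the displaced activities (£ released per QALY lost)."""
    return budget_impact / qalys_forgone(services, budget_impact)


services = [
    ("service_a", 200_000, 40),  # £5,000 per QALY
    ("service_b", 300_000, 15),  # £20,000 per QALY
    ("service_c", 250_000, 10),  # £25,000 per QALY (the margin)
]
small = effective_threshold(services, 100_000)  # met entirely from service_c
large = effective_threshold(services, 400_000)  # also eats into service_b
print(small, round(large))
```

Once the budget impact exceeds spending on the marginal service, more cost-effective activities are displaced and the effective threshold falls below the marginal one, which is the rationale for a lower threshold for large-impact technologies.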

Several authors have suggested the use of different threshold values depending on whether the decision represents an investment in additional activities or a disinvestment in current activities. O’Brien et al.’s 2002 paper131 considers the difference between willingness to accept monetary compensation to forgo a health-care programme and WTP for the same benefit, and links it to the cost-effectiveness threshold. This paper came from the perspective of the threshold representing social preferences rather than the shadow price of a fixed budget constraint, and highlights that, from a traditional ‘welfarist’ economics standpoint, a greater threshold value for disinvestment may be welfare maximising. Both Devlin and Parkin6 and Speight and Reaney132 have also suggested that a threshold for disinvestment of currently performed activities could be lower than for new activities; however, neither presents any methodology for calculating the weight of a disinvested activity.

This is in contrast with the view that CEA guides the decisions of health systems with the objective of maximising some measure of health benefit subject to a budget constraint. Hughes and Ferner50 argued that differential thresholds with respect to investment and disinvestment would result in suboptimal levels of population health. This is because a new technology that would improve health may be rejected under a policy of having different thresholds for investment and disinvestment, but not if the threshold values were the same. The authors argue that this failure to maximise population health represents an avoidable inefficiency, unrelated to the aim of the health-care sector to maximise health, and thus make the case for a single threshold value for disinvestment and investment. This point can be seen as a further case for the shadow price approach over the social WTP perspective, as it highlights that, given a fixed NHS budget, the social WTP approach will not lead to the maximisation of health.
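Hughes and Ferner’s point can be illustrated with hypothetical numbers. In this sketch the disinvestment threshold is set above the investment threshold (one of the directions discussed above); a new technology that is more cost-effective than an existing service is then rejected while the existing service is retained, even though moving money from one to the other would gain QALYs. All thresholds and ICERs are invented for illustration.

```python
def invest(new_icer: float, k_invest: float) -> bool:
    """Fund a new technology if its ICER does not exceed the investment threshold."""
    return new_icer <= k_invest


def disinvest(old_icer: float, k_disinvest: float) -> bool:
    """Withdraw an existing service if its ICER exceeds the disinvestment threshold."""
    return old_icer > k_disinvest


def qaly_gain_from_switch(amount: float, new_icer: float, old_icer: float) -> float:
    """QALYs gained by moving `amount` (£) from an existing service to a new technology."""
    return amount / new_icer - amount / old_icer


# Hypothetical values: disinvestment threshold above the investment threshold
k_invest, k_disinvest = 20_000, 40_000
new_icer, old_icer = 25_000, 30_000

blocked = not invest(new_icer, k_invest) and not disinvest(old_icer, k_disinvest)
gain = qaly_gain_from_switch(1_000_000, new_icer, old_icer)
print(blocked, round(gain, 2))
```

With a single threshold anywhere between the two ICERs (for example £27,500), the same rules would fund the new technology and withdraw the existing service, capturing the gain.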

The need for an independent threshold panel

Related to the discussion over the correct role of NICE in determining a suitable cost-effectiveness threshold for the NHS is the literature on the potential for an independent threshold panel. Such a panel has been characterised in a similar manner to the Monetary Policy Committee (the setters of the Bank of England’s interest rate who act independently of the Government of the UK), as an independent committee responsible for the setting and updating of the cost-effectiveness threshold used by NICE.

The papers covering this topic are consistent in their call for an independent threshold panel, with no papers identified arguing against it. The main case provided in the literature for an independent setter is the removal of political influence; Claxton et al.99 argue that political influence may drive the threshold up as politicians seek to use the threshold as a means to encourage investment by pharmaceutical companies. Williams8 suggests that NICE is biased in the setting of a threshold, as its political connections mean a higher threshold makes it more popular with the ‘sellers’ (the author defines sellers as not only the pharmaceutical industry but health-care professionals and patient groups) by allowing more technologies to be approved. Similarly, papers by Appleby et al.56 and Raftery133 call for the creation of an independent threshold setter. The 2008 Health Select Committee14 recommended that a body independent of NICE should be established to set and review the threshold. However, it is unclear if such a body would also be independent of political influence or just of the NICE structure.

Arguments against the use of a cost-effectiveness threshold

A number of authors have argued against the use of a threshold. As mentioned earlier, authors such as Gafni and Birch7,120 have suggested that the threshold approach risks spiralling inflationary pressure on health-care spending, and present an alternative approach based on league tables of cost-effectiveness. The reason, they argue, is that there is no guarantee that the activities displaced are less cost-effective than the new technologies imposing costs on the health system budget. This observation is coupled with the expectation of authors such as Cohen and Looney118 that pharmaceutical firms will price their drugs so that the ICER of a proposed new technology is just below the threshold, ensuring adoption while capturing maximum producer surplus. This implies that providers such as the NHS may be forced to pay above market costs for new technologies by revealing their maximum WTP in the form of the threshold. The proposal by McCabe et al.2 that the threshold be adjusted regularly over time to maintain its efficiency seeks to address both of these arguments.
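The pricing incentive described by Cohen and Looney follows from rearranging the ICER for price. In this sketch, ICER = (price + other incremental costs) / QALYs gained, per patient; the drug, its QALY gain and its other costs are hypothetical, and real submissions involve many more cost components.

```python
def max_price_at_threshold(threshold: float, qalys_per_patient: float, other_costs: float) -> float:
    """Highest price per patient for which the ICER does not exceed the threshold.

    ICER = (price + other_costs) / qalys_per_patient <= threshold
    =>  price <= threshold * qalys_per_patient - other_costs
    """
    return threshold * qalys_per_patient - other_costs


# Hypothetical drug: 0.5 QALYs gained and £2,000 of other incremental costs per patient
for k in (30_000, 35_000):
    print(k, max_price_at_threshold(k, 0.5, 2_000))
```

Every £1 added to the threshold raises the maximum acceptable price by £1 multiplied by the QALYs gained per patient, so a published threshold tells the manufacturer exactly how far the price can be pushed.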

Other authors such as Eichler et al.52 have raised and debated the issues around the theoretical base for the cost-effectiveness threshold, namely the assumption of perfect divisibility of health-care programmes, constant returns to scale and constant marginal opportunity costs.

Bridges et al.119 argue that a unique threshold value imposes impractical assumptions in the case of the US health-care sector and fails to account for supply- and demand-side variations in the market. As an alternative, the authors propose a series of thresholds that reflect regional, dynamic budgeting and general methodological differences. They conclude that the case for abandoning a fixed threshold in the USA outweighs the case for keeping one, and that any threshold should vary across payers, over time, with the true budget impact of interventions and with the measurement of their effectiveness. This argument has clear links to the argument for shadow pricing of the threshold rather than the social WTP approach, as the shadow price approach is based on the view that the threshold is determined by the budget and current efficiency, both of which can differ over time and across payers. The unresolved issue here is the degree to which different subgroups (e.g. by region or budget) require different threshold values.

Identification of activities under the threshold

An important part of the literature is the discussion around the identification of activities with an ICER greater than the proposed threshold. The importance of this discussion stems from the requirement that new activities displace current activities that are at the margin of what is cost-effective. If it is not possible to identify these activities separately from others, then threshold analysis is methodologically flawed, as the funding of a new activity may displace an activity with an ICER below the proposed threshold.

Most literature on this topic focuses on the importance of identifying activities to be displaced rather than on the process and feasibility of doing so. For example, Hughes and Ferner46 and McCabe et al.2 highlight the implications of inconsistent displacement for geographic variations in health-care provision, and argue that the lack of consistency in the displacement process undercuts the use of a single cost-effectiveness threshold for the evaluation of new technologies. Similarly, Buxton51 suggests that, to appreciate fully the opportunity cost of implementing a new technology, we must have clear knowledge of the activities displaced at the cost-effectiveness margin.

Few authors have sought to develop methods to identify the activities that should be displaced to free up budget for new, more cost-effective activities. Elshaug et al.52 outline a set of criteria for identifying existing, potentially non-cost-effective practices, which could then be assessed further using health technology assessment to determine their cost-effectiveness. The suggested criteria include new evidence on safety, efficacy or cost-effectiveness; geographic variation that has become apparent since the technology’s adoption; heterogeneity in the clinical procedure; and technological development.

The current value of the threshold

Since it became evident that decision-making bodies such as NICE use (more or less explicit) cost-effectiveness thresholds, there has been considerable debate over their appropriate value.6,8,38,39,45,48,51,56,130,134–137 In this section we present three areas of the debate:

  1. the lack of empirical basis to the current value
  2. if and how the threshold should change over time
  3. arguments over the value being generally too high or too low.

Lack of empirical basis to the current value

Since NICE made it clear that it uses an explicit threshold,5 there has been little attempt to hide the lack of evidential justification for the £20,000–30,000 range. Indeed, the Health Select Committee14 heard during its inquiry into NICE in 2008 that the NICE threshold has no basis in hard science. Likewise, Appleby et al.56 noted that ‘the uncomfortable truth is that NICE’s threshold has no basis in either theory or evidence’.

The US value of $50,000 per QALY, frequently cited as the cost-effectiveness threshold relevant to resource allocation decisions in that country, is similarly criticised for its lack of empirical foundation.33,45,122,124 Some have suggested that the US figure is rooted in the cost-effectiveness of hospital renal dialysis,122 although why this would make it suitable for more general use is unclear.

The threshold changing over time

Another concern about current NICE practice is the apparent lack of change in the threshold value since the body’s inception. Many authors have argued that factors such as the NHS budget, price inflation, technological developments in the NHS and the discount rate applied to economic evaluations35,122,125,126 have all changed since the cost-effectiveness threshold was first used, and that the threshold should have changed to reflect this. Braithwaite and Roberts45 sought to demonstrate the impact of budget growth and technological growth on the optimal threshold using a computer simulation of the US Medicare system. Although there is broad agreement in the literature that the NICE threshold may need to change over time,a no papers have been identified that model the impact of such changes on the NICE threshold.
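The direction of the effects Braithwaite and Roberts examined can be illustrated with a far simpler toy model than their Medicare simulation. Assuming divisible programmes funded in order of increasing ICER until the budget is exhausted, the threshold is the ICER of the marginal programme funded, i.e. the shadow price of the budget constraint. The programme costs and QALYs below are hypothetical; the sketch shows only that a larger budget raises this threshold, while new programmes that are more cost-effective than the current marginal one, competing for the same budget, lower it.

```python
def shadow_price_threshold(programmes, budget):
    """Return the ICER of the marginal (last, possibly partly) funded programme.

    programmes: list of (cost, qalys) for divisible programmes (hypothetical).
    Programmes are funded in increasing ICER order until the budget is spent.
    If the budget covers everything, the ICER of the last programme is returned.
    """
    threshold = None
    remaining = budget
    for cost, qalys in sorted(programmes, key=lambda p: p[0] / p[1]):
        if remaining <= 0:
            break
        threshold = cost / qalys
        remaining -= min(cost, remaining)
    return threshold


# Hypothetical programmes as (cost, QALYs), with ICERs of 10, 20 and 50 per QALY
programmes = [(100, 10), (100, 5), (100, 2)]

base = shadow_price_threshold(programmes, budget=150)             # 20.0
larger_budget = shadow_price_threshold(programmes, budget=250)    # 50.0
# A new programme with an ICER of 5 now competes for the original budget
with_innovation = shadow_price_threshold(programmes + [(50, 10)], budget=150)  # 10.0
```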

Both Ubel et al.136 and Raftery133 discuss the direction in which the threshold should move over time. Ubel et al.136 have argued that the optimal threshold value needs to fall over time, assuming medical innovation continues at roughly its current rate. Raftery133 has noted that the threshold has in effect been falling in real terms since 1999: to keep its value constant it would have had to rise with inflation (up 40% over the period) and with NHS spending (up 90%). He argues that this decline should be reflected in the value used by NICE in decision-making, and describes the suggestion that the threshold should rise in line with the observed growth of the NHS budget over the last decade as ‘audacious’.133 It is unclear to what extent he disagrees with this interpretation of NHS efficiency as a relevant factor affecting the optimal threshold.
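Raftery’s point can be checked with simple arithmetic. The sketch assumes, for illustration only, a starting value of £20,000 per QALY (the lower end of NICE’s range) and applies the cumulative figures quoted above: prices up 40% and NHS spending up 90%.

```python
def real_value(nominal, cumulative_inflation):
    """Value of a fixed nominal amount expressed in base-year prices."""
    return nominal / (1 + cumulative_inflation)


def value_to_keep_pace(base_value, cumulative_growth):
    """Nominal value needed to keep pace with a cumulative growth rate."""
    return base_value * (1 + cumulative_growth)


THRESHOLD_1999 = 20_000  # assumption: lower end of the £20,000-30,000 range
INFLATION = 0.40         # cumulative price inflation cited by Raftery
NHS_SPENDING = 0.90      # cumulative growth in NHS spending cited by Raftery

# A fixed £20,000 is worth only about £14,286 in 1999 prices
real_1999_prices = real_value(THRESHOLD_1999, INFLATION)
# Nominal threshold needed to match inflation: about £28,000
needed_for_inflation = value_to_keep_pace(THRESHOLD_1999, INFLATION)
# Nominal threshold needed to match NHS spending growth: about £38,000
needed_for_spending = value_to_keep_pace(THRESHOLD_1999, NHS_SPENDING)
```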

Threshold value generally too high or low

The majority of the debate over the current use of the threshold in the UK (and elsewhere) has centred on whether the current value is too low or too high. The papers discussed in this section focus on the direction in which the value should change rather than on a specific value; values proposed in the literature are discussed in more detail in the following section.

Vernon et al.137 analysed the implications of setting the threshold above or below its optimum value in terms of the signals sent to companies involved in the research and development of new medical products. The authors concluded that if the threshold is set too low (below the economic value of the health benefit), research and development investment will be too low relative to its economic value (at the margin), because pharmaceutical companies will not earn sufficient returns on their investments. However, because the NHS accounts for only a small proportion of the world pharmaceutical market, the impact of changes to the NHS threshold on the international pharmaceutical market equilibrium is unknown but likely to be small.

Similarly, thresholds set too high (above the economic value of the health benefit) will result in inefficiently high levels of research and development spending, such that the health-care provider is funding projects that do not have a sufficient impact on social welfare.

The literature arguing that the threshold is too high in the UK can be broadly characterised by three key papers. First, Williams8 argued that, intuitively, the threshold should not be significantly greater than gross domestic product (GDP) per capita (roughly £18,000 in the UK in 2004). He reasoned that, although it may be possible to provide some of the population with health care at a cost per QALY above GDP per capita, it is not possible to do so for much of the population without imposing great hardship on those expected to foot the bill (through taxation or government debt).

Second, Raftery133 argued that, although the UK threshold has historically been too high, it does not need reducing, as its real value has decreased since 1999 owing to inflation and increases in the NHS budget. He also suggests that recent NICE policies, such as giving greater weight to the benefits of treatment accruing to patients at the end of life, need to be offset by reductions in the threshold for all other treatments if expenditure is to remain within the NHS budget. Finally, Raftery cites the opportunity cost analysis of trastuzumab (Herceptin®, Roche),114 which showed that more cost-effective oncology services were being sacrificed to fund trastuzumab in breast cancer. This result directly suggests that, in some cases at least, the threshold value is too high.

Third, work by Martin et al.57–59 investigated the cost per life-year saved in a selection of the 23 PBCs used in the NHS; these results are presented in Table 36. It is important to note that these results are expressed as the cost per YLG rather than the cost per QALY of the least cost-effective current activity. The authors and others have used these results to argue that the threshold used by NICE may be too high.128 Similarly, Collier’s47 report on the Health Select Committee suggests that the threshold used by NICE is higher than that used by PCTs.

TABLE 36. Table showing cost per YLG results of Martin et al.



In contrast, a range of authors have argued that the current NICE threshold is too low. Both Speight and Reaney132 and Towse48 argued that the inclusion of wider social costs and benefits, and full consideration of social WTP for additional health gains, show that the threshold should be significantly higher. Both cite recent NICE work by Mason et al.29,135 which suggested that the threshold should be between £30,000 and £75,000 per QALY, based on attempts to model a WTP-based value of a QALY from observations of the value of avoiding a statistical fatality. Similarly, Ubel et al.136 have argued that, if inflation and WTP valuations are taken into account, the relevant US threshold should be closer to $200,000 per QALY than the regularly cited $50,000.
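The general idea of deriving a value per QALY from the value of preventing a statistical fatality (VPF) can be sketched as dividing the VPF by the discounted QALYs lost per fatality. This is a deliberately simplified illustration, not a reproduction of the model of Mason et al.; the VPF, years of life lost, quality weight and the 3.5% discount rate are all hypothetical inputs.

```python
def discounted_qalys(years, quality_weight=1.0, rate=0.035):
    """Present value of the QALYs lost when `years` of life are lost.

    Assumes a constant quality weight and that QALYs accrue at the start
    of each year (t = 0, 1, ..., years - 1).
    """
    return sum(quality_weight / (1 + rate) ** t for t in range(years))


def value_per_qaly(vpf, years, quality_weight=1.0, rate=0.035):
    """Implied monetary value of one QALY, given a (hypothetical) VPF."""
    return vpf / discounted_qalys(years, quality_weight, rate)


# Hypothetical inputs: a £600,000 VPF and 20 years of life lost in full health
undiscounted = value_per_qaly(600_000, years=20, rate=0.0)   # £30,000 per QALY
discounted = value_per_qaly(600_000, years=20, rate=0.035)
```

Discounting reduces the QALYs lost per fatality, so the implied value per QALY rises; results of this kind are therefore sensitive to assumptions about remaining life expectancy, quality of life and discounting.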

Those analyses which conclude the UK and US thresholds should be significantly higher have, at the core of their argument, the assumption that the respective health-care budget is fully capable of responding to society’s WTP for additional health gains.

Potential methods for threshold estimation

There are broadly three approaches that can be taken to determine the threshold value:51,56 social WTP; non-analytical approaches, such as expert elicitation; and shadow pricing of the budget constraint. This project is concerned with the last of these. This approach is entirely consistent with the remit of the NHS in general and NICE in particular: they do not set the NHS budget but have to allocate those finite resources appropriately.

Papers seeking to elicit social willingness to pay and non-analytical approaches

The majority of the literature that has presented a proposed value for the threshold (in the UK, USA and elsewhere) has done so using valuation methods based on WTP for an additional health benefit.18–32,34–37 However, other approaches have been suggested. For example, the WHO’s 2002 report138 suggested that interventions costing less than three times GDP per capita for each DALY averted represent good value for money.

Lee et al.139 sought to update the US ‘dialysis standard’, often claimed to be the basis of the US Medicare threshold.122 The authors present a valuation of $129,090 per QALY based on current dialysis practice in the USA. Finally, in an appendix to their edited book, Towse et al.130 provide an interesting set of results drawn from participants in the associated workshop (the majority of whom were health economists). The participants were asked to record anonymously their view on what threshold NICE should apply. Eighteen responses were recorded, with an average of £29,000 per QALY.

Papers considering the shadow price of the budget constraint

The systematic review identified only four papers, by three different authors, that fell suitably into the category of shadow pricing of the budget constraint.

Williams8 suggested investigating the cost-effectiveness of the NHS interventions that represent the majority of the budget (he speculated that some 300 interventions accounted for about 90% of the cost incurred by the NHS). The purpose of this would be to identify current NHS activities that might not be cost-effective. He acknowledged the implausibility of conducting full technology appraisals on such a large number of interventions (he estimated that this would take 10 years, at which point it would be necessary to re-evaluate the initial appraisals), and thus suggested relying on expert opinion and existing patient data to speed up the process.
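Williams’ suggestion rests on spending being concentrated in relatively few interventions. Identifying such a subset from cost data is straightforward, as the sketch below shows; the intervention names and costs are hypothetical, and the 90% share is taken from Williams’ speculation.

```python
def smallest_set_covering(costs, share=0.90):
    """Return the fewest interventions whose combined cost reaches `share` of total.

    costs: dict mapping (hypothetical) intervention name -> annual cost.
    Taking the most expensive interventions first yields the smallest such set.
    """
    total = sum(costs.values())
    chosen, running = [], 0.0
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        chosen.append(name)
        running += cost
    return chosen


# Hypothetical annual costs (£m) for five interventions
costs = {"A": 60, "B": 25, "C": 8, "D": 4, "E": 3}
priority_for_review = smallest_set_covering(costs)  # A, B and C cover 93%
```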

While Williams’ recommendations related to identifying current interventions with a high cost per QALY as candidates for disinvestment, the approach could be taken further and used as a method of determining the cost-effectiveness threshold, even at the level of a local decision-maker. This was attempted by Appleby et al.49 who conducted a feasibility study of estimating the appropriate NHS threshold by examining decision-making in the NHS at a local level. The authors propose a structured model that places a new technology’s cost per weighted QALY gained within a table of all existing services. To test the feasibility of this model, they interviewed senior NHS staff and examined public health information to construct a list of health-care services introduced or discontinued in 2006/7. The authors found that it was feasible to identify decisions and to take the important step of estimating their cost-effectiveness; however, they noted that any attempt to evaluate enough decisions fully to estimate a threshold would require a detailed understanding of the decision structure at a local level as well as a large number of observations.
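One way to see how local decisions of this kind could inform a threshold: services introduced should have ICERs at or below the threshold and services discontinued should have ICERs at or above it, so a consistent set of decisions brackets the threshold. The sketch below illustrates that reasoning with hypothetical decisions; it is not the structured model Appleby et al. proposed. It also exposes inconsistent decisions, one reason why many observations would be needed.

```python
def threshold_bounds(decisions):
    """Bracket an implied threshold from observed local decisions.

    decisions: iterable of (icer, introduced) pairs, where introduced is True
    for a service introduced and False for one discontinued (hypothetical data).
    Returns (lower, upper); lower > upper signals inconsistent decisions.
    """
    introduced = [icer for icer, added in decisions if added]
    discontinued = [icer for icer, added in decisions if not added]
    lower = max(introduced, default=0.0)
    upper = min(discontinued, default=float("inf"))
    return lower, upper


# Hypothetical decisions: (cost per QALY, introduced?)
consistent = [(8_000, True), (15_000, True), (40_000, False), (22_000, False)]
inconsistent = [(30_000, True), (20_000, False)]

bounds = threshold_bounds(consistent)         # threshold between £15,000 and £22,000
conflict = threshold_bounds(inconsistent)     # lower bound exceeds upper bound
```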

The other key papers seeking to develop and implement methods for estimating the NHS threshold were those of Martin et al.57–59 They aimed to establish a link between health-care spending and health outcomes in the NHS after adjusting for the needs of the patient population. They used data on observed mortality at PCT level in the NHS alongside expenditure data on health care across 23 programmes of care based on ICD-10 disease categories. As mentioned earlier in this appendix, these papers present the cost per life-year for a range of PBCs; however, their key result is that existing data can be used to determine such valuations for current NHS interventions. The authors concluded that, although their results are highly limited and do not present a single cost per QALY estimate for the optimal threshold, they can ‘inform the decisions of NICE on whether their current threshold for accepting new technologies is set at an appropriate level’ (p. 37). These studies are the precursor of the analyses presented in this report, and further details can be found in Appendix 2 and in Chapter 3 of the main report.

In the area of the efficient allocation of health care, it is also important to note the contribution of earlier mathematical papers, such as that of Stinnett and Paltiel16 who outlined how the problem can be approached using mixed integer programming. Although their approach differs from the interpretation of the threshold used in this study, it represented an important step in developing methods to solve the optimisation problem inherent in health-care resource allocation.
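The role of integer constraints can be seen in a tiny pure 0–1 version of the allocation problem, solved here by exhaustive enumeration. This is only feasible for a handful of programmes and is not Stinnett and Paltiel’s formulation; realistic problems need an integer programming solver. The programme costs and QALYs are hypothetical.

```python
from itertools import combinations


def best_portfolio(programmes, budget):
    """Choose the subset of indivisible programmes that maximises total QALYs.

    programmes: dict name -> (cost, qalys), hypothetical values.
    Enumerates every subset, so it is only practical for small examples.
    """
    names = list(programmes)
    best_names, best_qalys = (), 0.0
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            cost = sum(programmes[n][0] for n in subset)
            qalys = sum(programmes[n][1] for n in subset)
            if cost <= budget and qalys > best_qalys:
                best_names, best_qalys = subset, qalys
    return set(best_names), best_qalys


# Hypothetical programmes: X has the lowest ICER (2), Y and Z have 2.5
programmes = {"X": (60, 30), "Y": (50, 20), "Z": (50, 20)}
chosen, total_qalys = best_portfolio(programmes, budget=100)
```

Funding strictly in ICER order would select X first and leave 40 unspent, yielding 30 QALYs; because programmes are indivisible, the optimum is Y and Z, yielding 40 QALYs.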


This systematic review of the literature on the cost-effectiveness threshold has highlighted its considerable range and diversity. Despite the international and mature nature of the literature, there are significant differences in the methods suggested for representing a cost-effectiveness threshold. The main areas of debate relevant to this report have revolved around the role of NICE as a ‘searcher’ or ‘setter’ of the threshold.1,2 Although some authors have implicitly argued for NICE to fulfil the role of a threshold ‘setter’ by suggesting methods of eliciting social WTP valuations of a QALY, death or life-year,18–32,34–37 the literature of most relevance to this research has sought to develop estimation methods consistent with its role as a ‘searcher’.56–59

Papers discovered by the literature review

Note: not all of these papers are referenced in this appendix and some references used were not discovered through the systematic review.

Initial pearls

Appleby J, Devlin N, Parkin D. NICE’s cost effectiveness threshold – how high should it be? BMJ 2007;335:358–9.

Appleby J, Devlin N, Parkin D, Buxton M, Chalkidou K. Searching for cost effectiveness thresholds in the NHS. Health Policy 2009;91:239–45.

Bridges JFP, Onukwugha E, Mullins CD. Healthcare rationing by proxy cost-effectiveness analysis and the misuse of the $50 000 threshold in the US. Pharmacoeconomics 2010;28:175–84.

Culyer A, McCabe C, Briggs A, Claxton K, Buxton M, Akehurst R, et al. Searching for a threshold, not setting one: the role of the National Institute for Health and Care Excellence. J Health Serv Res Policy 2007;12:56–8.

Devlin N, Parkin D. Does NICE have a cost-effectiveness threshold and what other factors influence its decisions? A binary choice analysis. Health Econ 2004;13:437–52.

Gafni A, Birch S. Incremental cost-effectiveness ratios (ICERs): the silence of the lambda. Soc Sci Med 2006;62:2091–100.

McCabe C, Claxton K, Culyer AJ. The NICE cost-effectiveness threshold – what it is and what that means. Pharmacoeconomics 2008;26:733–44.

Raftery J. Should NICE’s threshold range for cost per QALY be raised? No. BMJ 2009;338:268–9.

Towse A. Should NICE’s threshold range for cost per QALY be raised? Yes. BMJ 2009;338:268–9.

Braithwaite RS, Meltzer DO, King JT Jr, Leslie D, Roberts MS. What does the value of modern medicine say about the $50,000 per quality-adjusted life-year decision rule? Med Care 2008;46:349–56.

Grosse S. Assessing cost-effectiveness in healthcare: history of the $50,000 per QALY threshold. Exp Rev Pharmacoecon Outcomes Res 2008;8:165–78.

Rawlins MD, Culyer AJ. National Institute for Clinical Excellence and its value judgments. BMJ 2004;329:224–7.

Chambers JD, Neumann PJ, Buxton MJ. Does Medicare have an implicit cost-effectiveness threshold? Med Decis Making 2010;30:E14–27.

Step 1 results

Brouwer W, van Exel J, Baker R, Donaldson C. The new myth – the social value of the QALY. Pharmacoeconomics 2008;26:1–4.

Claxton K, Lindsay AB, Buxton MJ, Culyer AJ, McCabe C, Walker S, et al. Value based pricing for NHS drugs: an opportunity not to be missed? BMJ 2008;336:251–4.

Cohen J, Looney W. Re: How much is life worth: cetuximab, non-small cell lung cancer, and the $440 billion question. J Natl Cancer Inst 2010;102:1044–8.

Eichler HG, Kong SX, Gerth WC, Mavros P, Jonsson B. Use of cost-effectiveness analysis in health-care resource allocation decision-making: how are cost-effectiveness thresholds expected to emerge? Value Health 2004;7:518–28.

Green C, Gerard K. Exploring the social value of health-care interventions: a stated preference discrete choice experiment. Health Econ 2009;18:951–76.

Groot W, van den Brink HM. The value of health. BMC Health Serv Res 2008;8:136.

Hughes DA, Ferner RE. New drugs for old: disinvestment and NICE. BMJ 2010;340:690–2.

Lieu TA, Ray G, Ortega-Sanchez I, Kleinman K, Rusinak D, Prosser L. Willingness to pay for a QALY based on community member and patient preferences for temporary health states associated with herpes zoster. Pharmacoeconomics 2009;27:1005–16.

Mason AR, Drummond MF. Public funding of new cancer drugs: is NICE getting nastier? Eur J Cancer 2009;45:1188–92.

Mason H, Jones-Lee M, Donaldson C. Modelling the monetary value of a QALY: a new approach based on UK data. Health Econ 2009;18:933–50.

Maynard A, Bloor K. The future role of NICE. BMJ 2010;341:c6286.

Rascati KL. The $64,000 question – what is a quality-adjusted life-year worth? Clin Ther 2006;28:1042–3.

Rawlins M, Barnett D, Stevens A. Pharmacoeconomics: NICE’s approach to decision-making. Br J Clin Pharmacol 2010;70:346–9.

Shiroiwa T, Sung YK, Fukuda T, Lang HC, Bae SC, Tsutani K. International survey on willingness-to-pay (WTP) for one additional QALY gained: what is the threshold of cost effectiveness? Health Econ 2010;19:422–37.

Speight J, Reaney M. Wouldn’t it be NICE to consider patients’ views when rationing health care? BMJ 2009;338:297.

Tappenden P, Brazier J, Ratcliffe J, Chilcott J. A stated preference binary choice experiment to explore NICE decision making. Pharmacoeconomics 2007;25:685–93.

Weinstein MC. How much are Americans willing to pay for a quality-adjusted life-year? Med Care 2008;46:343–5.

Yaesoubi R, Roberts SD. A game-theoretic framework for estimating a health purchaser’s willingness-to-pay for health and for expansion. Health Care Manag Sci 2010;13:358–77.

Step 2 results

Appleby J, Devlin N, Parkin D, Buxton M, Chalkidou K. Searching for Local NHS Cost Effectiveness Thresholds: A Feasibility Study. NICE Conference Manchester. 5–6 December 2007. URL: www.nice2007.co.uk/ApplebyDevlin.pdf (accessed 12 January 2012).

Birch S, Gafni A. The biggest bang for the buck or bigger bucks for the bang: the fallacy of the cost-effectiveness threshold. J Health Serv Res Policy 2006;11:46–51.

Braithwaite RS, Roberts MS. $50,000 per QALY: Inertia, Indifference, or Irrationality? Presented at the Annual Meeting of the Society for Medical Decision Making. Atlanta, GA, USA, 17–20 October, 2004.

Drummond M, Torrance G, Mason J. Cost-effectiveness league tables: more harm than good? Soc Sci Med 1993;37:33–40.

Gerard K, Mooney G. QALY league tables: handle with care. Health Econ 1993;2:59–64.

Gyrd-Hansen D. Willingness to pay for a QALY: theoretical and methodological issues. Pharmacoeconomics 2005;23:423–32.

Hammitt JK. The $64,000 question: what are we willing to pay for a QALY. ISPOR Connect 2005;11:7–9.

Hirth RA, Chernew ME, Miller E, Fendrick AM, Weissert WG. Willingness to pay for a quality-adjusted life-year: in search of a standard. Med Decis Making 2000;20:332–42.

King JT Jr, Tsevat J, Lave JR, Roberts MS. Willingness to pay for a quality-adjusted life-year: implications for societal health care resource allocation. Med Decis Making 2005;25:667–77.

Lee C, Chertow G, Zenios S. An empiric estimate of the value of life: updating the renal dialysis cost-effectiveness standard. Value Health 2009;12:80–7.

Martin S, Rice N, Smith P. Further Evidence on the Link Between Health Care Spending and Health Outcomes in England [CHE 28. National Institute for Health and Care Excellence. NICE discussion paper 32]. York: University of York; 2007.

Martin S, Rice N, Smith PC. The Link Between Health Care Spending and Health Outcomes for the New English Primary Care Trusts. CHE Research Paper 42. York: University of York; 2008.

Mauskopf J, Rutten F, Schonfeld W. Cost-effectiveness league tables: valuable guidance for decision makers? Pharmacoeconomics 2003;21:991–1000.

Smith RD, Richardson J. Can we estimate the ‘social’ value of a QALY? Four core issues to resolve. Health Policy 2005;74:77–84.

Stinnett AA, Paltiel AD. Mathematical programming for the efficient allocation of health care resources. J Health Econ 1996;15:641–53.

Towse A, Pritchard C, Devlin N, eds. Cost Effectiveness Thresholds: Economic and Ethical issues. London: Office of Health Economics, The King’s Fund; 2002.

Ubel PA, Hirth RA, Chernew ME, Fendrick AM. What is the price of life and why doesn’t it increase at the rate of inflation? Arch Int Med 2002;163:1637–41.

Williams A. What Could Be Nicer Than NICE? London: Office for Health Economics; 2004.

Winkelmayer WC, Weinstein MC, Mittelman MA, Glynn RJ, Pliskin JS. Health economic evaluations: the special case of end-stage renal disease treatment. Med Decis Making 2002;22:417–30.

Step 3 results

Baker R, Bateman I, Donaldson C, Jones-Lee M, Lancsar E, Loomes G, et al. Weighting and valuing quality-adjusted life-years using stated preference methods: preliminary results from the Social Value of a QALY Project. Health Technol Assess 2010;14(27).

Bobinac A, van Exel N, Rutten FFN, Brouwer W. Willingness to pay for a quality-adjusted life-year: the individual perspective. Value Health 2010;13:1046–55.

Brock DW. How much is more life worth? Hastings Center Rep 2006;36:17–19.

Byrne MM, O’Malley K, Suarez-Almazor ME. Willingness to pay per quality-adjusted life-year in a study of knee osteoarthritis. Med Decis Making 2005;25:655–66.

Griffin S, Claxton K, Sculpher M. Decision analysis for resource allocation in health care. J Health Serv Res Policy 2008;13:23–30.

Gyrd-Hansen D. Willingness to pay for a QALY. Health Econ 2003;12:1049–60.

Harrison S. A policy agenda for health-care rationing. Br Med Bull 1995;51:885–99.

Laufer F. Thresholds in cost-effectiveness analysis – more of the story. Value Health 2005;8:86–7.

Pinto-Prades JL, Loomes G, Brey R. Trying to estimate a monetary value for the QALY. J Health Econ 2009;28:553–62.

Vernon JA, Goldberg R, Golec J. Economic evaluation and cost-effectiveness thresholds: signals to firms and implications for R&D investment and innovation. Pharmacoeconomics 2009;27:797–806.

Step 4 results

Abelson P. The value of life and health for public policy. Econ Record 2003;79:S2–13.

Fryback DG, Lawrence WF. Dollars may not buy as many QALYs as we think: a problem with defining quality-of-life adjustments. Med Decis Making 1997;17:276–84.

Gafni A, Birch S. Guidelines for the adoption of new technologies – a prescription for uncontrolled growth in expenditures and how to avoid the problem. CMAJ 1993;148:913–17.

Johnson FR. Einstein on willingness to pay per QALY: is there a better way? Med Decis Making 2005;25:607–8.

Laupacis A, Feeny D, Detsky A, Tugwell P. How attractive does a new technology have to be to warrant adoption and utilization – tentative guidelines for using clinical and economic evaluations. CMAJ 1992;146:473–81.

Polsky D. Does willingness to pay per quality-adjusted life-year bring us closer to a useful decision rule for cost-effectiveness analysis? Med Decis Making 2005;25:605–6.

Martin S, Rice N, Smith P. The Link Between Health Care Spending and Health Outcomes: Evidence from English Programme Budgeting Data. CHE Research Paper 24. York: Centre for Health Economics; 2007.

Chambers JD, Neumann PJ, Buxton MJ. Does Medicare have an implicit cost-effectiveness threshold? Med Decis Making 2010;30:E14–27.

Step 5 results

Birch S, Gafni A. Cost-effectiveness ratios – in a league of their own. Health Policy 1994;28:133–41.

Johnson FR, Backhouse M. Eliciting stated preferences for health-technology adoption criteria using paired comparisons and recommendation judgments. Value Health 2006;9:303–11.

Step 6 results

Dolan P, Shaw R, Tsuchiya A, Williams A. QALY maximisation and people’s preferences: a methodological review of the literature. Health Econ 2004;14:197–208.

Baker R, Bateman I, Donaldson C, Jones-Lee M, Lancsar E, Loomes G, et al. Weighting and valuing quality-adjusted life-years using stated preference methods: preliminary results from the Social Value of a QALY Project. Health Technol Assess 2010;14(27).

O’Brien BJ, Gertsen K, Willan A, Faulkner L. Is there a kink in consumers’ threshold value for cost-effectiveness in health care? Health Econ 2002;11:175–80.

Buxton M. How much are health-care systems prepared to pay to produce a QALY? Eur J Health Econ 2005;6:285–7.

Mason AR, Drummond MF. Public funding of new cancer drugs: is NICE getting nastier? Eur J Cancer 2009;45:1188–92.

Copyright © Queen’s Printer and Controller of HMSO 2015. This work was produced by Claxton et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK274312

