Littlewood E, Ali S, Dyson L, et al. Identifying perinatal depression with case-finding instruments: a mixed-methods study (BaBY PaNDA – Born and Bred in Yorkshire PeriNatal Depression Diagnostic Accuracy). Southampton (UK): NIHR Journals Library; 2018 Feb. (Health Services and Delivery Research, No. 6.6.)

Chapter 7 Cost-effectiveness of screening/case-finding strategies for perinatal depression

The overall aim of the economic evaluation was to compare a range of screening/case-finding strategies for prenatal and postnatal depression, in terms of costs and health benefits, to establish their relative cost-effectiveness in the context of the UK NHS. The analysis was conducted from the NHS/Personal Social Services perspective, with costs expressed in 2015/16 prices and health outcomes expressed in terms of QALYs. The time horizon for the postnatal model was 1 year, whereas the time horizon for the prenatal model was up to the point of birth; as such, no discounting of costs and health outcomes was required.

Methods

Overall approach of cost-effectiveness analysis

A decision model was developed to evaluate the costs and outcomes of screening/case-finding strategies for perinatal depression. Decision modelling is a systematic approach to decision-making under conditions of uncertainty. It is based on a logical and temporal sequence of events (such as a positive diagnosis followed by treatment) that would flow from the set of alternative options being evaluated. The likelihood of each pathway is expressed in terms of probabilities, and consequences are expressed in terms of costs and outcomes. The expected costs and outcomes of each option are then calculated as the sum of the costs and outcomes of its pathways, weighted by the probability of each pathway.
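The probability-weighted sum described above can be illustrated with a minimal Python sketch. This is not the study's Excel model; the function name `expected_value` and the three pathway probabilities and costs are hypothetical values chosen purely for illustration.

```python
def expected_value(pathways):
    """Return the probability-weighted sum of a consequence over pathways.

    `pathways` is a list of (probability, value) pairs for one strategy;
    the probabilities must sum to 1.
    """
    total_p = sum(p for p, _ in pathways)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("pathway probabilities must sum to 1")
    return sum(p * v for p, v in pathways)


# Hypothetical strategy with three pathways: (probability, cost in GBP).
example_cost = expected_value([(0.10, 500.0), (0.20, 100.0), (0.70, 0.0)])
```

The same function can be applied to QALYs by passing the QALY value of each pathway instead of its cost.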

Decision models also take into account uncertainty in the evidence base, including uncertainty in the diagnostic accuracy of screening/case-finding strategies, costs and outcomes. Hence, these input data (i.e. parameters) enter the model as probability distributions to reflect uncertainty around the mean estimates. To assess the joint impact of uncertainty in model inputs, we conducted PSA using Monte Carlo simulations. This is the recommended approach for propagating uncertainty in model parameters.171 Based on this probabilistic analysis, the results provide the probability of each strategy being the most cost-effective, conditional on decision-makers’ WTP for a gain in health benefit. Finally, we also conducted a series of sensitivity analyses to evaluate the impact of model assumptions on the overall decision.

The following sections provide a detailed description of the screening/case-finding strategies evaluated in the structure of the decision model and the sources and assumptions about parameter inputs in the model. The cost-effectiveness analysis is reported according to the Consolidated Health Economic Evaluation Reporting Standards statement.172

Screening/case-finding strategies evaluated in the cost-effectiveness analysis

A number of strategies were considered for the cost-effectiveness analysis, based on the most recent NICE guidance (2014) on clinical management of antenatal and postnatal depression,8 and reflecting clinical practice in the NHS sites involved in the BaBY PaNDA study. Both one- and two-stage screening/case-finding strategies were compared; these included the following:

  • Whooley questions only
  • EPDS only
  • Whooley questions followed by EPDS
  • Whooley questions followed by PHQ-9
  • standard care case identification.

For the EPDS instrument, cut-off points of ≥ 10 and ≥ 13 were used in the cost-effectiveness analysis, which is in line with the findings of the statistical analysis reported earlier (see Chapter 5). For two-stage screening/case-finding strategies, Whooley questions were used as the first instrument because they have higher sensitivity than EPDS and PHQ-9, and are also less time-intensive than the other instruments. In line with NICE guidance (Clinical Guideline 192),8 the standard care case identification refers to routine clinical assessment that HPs would undertake to arrive at a diagnosis of depression in the perinatal period without the formal use of a diagnostic instrument. This was assumed to be provided by a GP and/or HV.

Structure of the decision model

We developed a decision tree model to evaluate the relative cost-effectiveness of screening/case-finding strategies. The model was based on the most recent NICE guidance on clinical management of antenatal and postnatal mental health.8 The model had two linked components: (1) a case-finding model, which uses diagnostic performance data for each screening/case-finding strategy to determine rates of TP, FP, TN and FN outcomes, and applies the associated administrative cost; and (2) a treatment model, which evaluates the costs and health outcomes (measured in terms of QALYs) of each of the four diagnostic outcomes of part (1), TP, FP, TN and FN.

The decision model then attaches costs and health consequences to each pathway, and estimates the expected costs and outcomes of each screening/case-finding strategy as the sum, across pathways, of the probability of each pathway multiplied by its cost or outcome. The model was developed using Microsoft Excel 2013 (Microsoft Corporation, Redmond, WA, USA).

Screening/case-finding model

The structure of the screening/case-finding model follows the model developed as part of NICE guidance8 and is presented in Figures 9 and 10. Both prenatal and postnatal models follow the same structure. The model pathways are described in detail below.

FIGURE 9. Diagnostic pathways: single screen. SCI, standard care case identification.

FIGURE 10. Diagnostic pathways: sequential screening/case-finding.

At the start of the model, all women in the prenatal or postnatal period (whether depressed or non-depressed) undergo screening/case-finding for depression. The prenatal model starts at 20 weeks into pregnancy whereas the postnatal model starts at 12 weeks after childbirth. The model assumes that screening/case-finding strategies are implemented by MWs during pregnancy and by HVs postnatally. This is in line with the time at which screening/case-finding strategies were implemented in this study. Women undergo either one- or two-stage screening/case-finding, as described above. If undergoing a one-stage strategy (including ‘Whooley questions only’ or ‘EPDS only’), the model assumes that no further screening/case-finding will be offered. Two-stage screening/case-finding involves answering the Whooley questions first, followed by either EPDS or PHQ-9 if a woman is found to be depressed (positive) based on the Whooley questions.

The sensitivity and specificity of the Whooley questions, EPDS and PHQ-9 were established using a diagnostic gold standard clinical assessment of depression (i.e. the CIS-R) (see Chapter 5 for further details). Based on the outcome of the screening/case-finding strategy, women are classified in one of the four categories: (1) TP, (2) FP, (3) TN or (4) FN. Based on this classification, they are either treated or not treated. To simplify the model description, each of the pathways is described in detail below.

True positive

Women who were depressed and correctly identified as depressed were classed as TP. The probability of TP is based on the sensitivity of a screening/case-finding strategy. For a single-stage strategy it is the sensitivity of the screening/case-finding instrument (such as Whooley questions or EPDS). For two-stage screening/case-finding, the overall sensitivity is equal to the product of the sensitivity of the first screening/case-finding strategy and the second strategy (as defined below).

$$\text{Net sensitivity of two-stage screening} = \text{sensitivity of screen 1} \times \text{sensitivity of screen 2} \tag{1}$$

Alternatively, this can be represented as:

$$\text{Net sensitivity or net TP rate} = \left(\frac{TP_1}{TP_1 + FN_1}\right) \times \left(\frac{TP_2}{TP_2 + FN_2}\right) \tag{2}$$

Here TP and FN represent true positives and false negatives, respectively, 1 represents the first screening instrument (i.e. Whooley questions) and 2 represents the second instrument (i.e. EPDS or PHQ-9). The sensitivity of the second stage is estimated using the sample of positive cases (both true and false) from the first stage. Hence, the combined sensitivity is always less than the sensitivity of one-stage screening/case-finding (unless the sensitivity of the second stage is 1, in which case the combined sensitivity will be the same as the first-stage sensitivity).

Based on expert opinion in the NICE guidance,8 all positive cases (including both TPs and FPs) were assumed to undergo further assessment by the HV or MW lasting approximately 1 hour; this was considered only in terms of cost of the HV’s/MW’s time and not in terms of the impact on treatment pathway, as no studies were available that reported if further assessment had an impact on subsequent treatment pathway.

False positive

Women who were not depressed but incorrectly identified as depressed were classed as FP. The probability of FP depends on the specificity of a screening/case-finding strategy. For single-stage screening/case-finding, the FPR is simply equal to 1 minus the specificity of the screening/case-finding strategy. For two-stage screening/case-finding, the overall specificity is equal to the sum of the two specificities minus the product of the two specificities, and the FPR is 1 minus the combined specificity (see below). Hence, the combined specificity of two-stage testing is higher than one-stage specificity. Therefore, FPR of a combined test is lower than the FPR of the first stage only.

True negative

Women who were not depressed and correctly identified as non-depressed were classed as TN. The probability of TN depends on the specificity of a screening/case-finding strategy (i.e. specificity is the TN rate).

$$\text{Net specificity of two-stage screening or net TN rate} = \left(\frac{TN_1}{TN_1 + FP_1} + \frac{TN_2}{TN_2 + FP_2}\right) - \left(\frac{TN_1}{TN_1 + FP_1} \times \frac{TN_2}{TN_2 + FP_2}\right) \tag{3}$$

Moreover, we note that:

$$\text{Net FPR of two-stage screening} = 1 - (\text{net specificity of two-stage screening}) \tag{4}$$

False negative

Women who were depressed but falsely identified as non-depressed were classed as FNs. The probability of being FN depends on the sensitivity of a test (i.e. the FN rate is 1 minus sensitivity of a test). Hence, two-stage strategies are likely to have higher FN rates (unless sensitivity of the second stage is 1).
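The combination rules for two-stage screening/case-finding (net sensitivity as a product, net specificity as the sum minus the product) can be sketched in a few lines of Python. The function name `two_stage_accuracy` is an assumption of this sketch. The first-stage values are the postnatal Whooley estimates quoted later in this chapter (sensitivity 85.7%, specificity 80.6%); the second-stage values (0.80 and 0.86) are hypothetical and are not the study estimates.

```python
def two_stage_accuracy(sens1, spec1, sens2, spec2):
    """Combine two sequential screens where only stage-1 positives are re-screened.

    sens2 and spec2 are conditional on a positive stage-1 result.
    Returns (net sensitivity, net specificity, net FPR, net FN rate).
    """
    net_sens = sens1 * sens2                       # product of sensitivities
    net_spec = spec1 + spec2 - spec1 * spec2       # sum minus product
    return net_sens, net_spec, 1.0 - net_spec, 1.0 - net_sens


# Stage 1: Whooley questions (postnatal values from this chapter).
# Stage 2: hypothetical values, for illustration only.
net_sens, net_spec, net_fpr, net_fnr = two_stage_accuracy(0.857, 0.806, 0.80, 0.86)
```

With these inputs the net sensitivity falls below the first-stage sensitivity while the net specificity rises above the first-stage specificity, which is the trade-off described in the text.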

Treatment model

Based on the case-finding outcome, women were assigned to a care pathway and followed this pathway until the end of the follow-up period (i.e. at 1 year after screening/case-finding in the postnatal model or until birth in the prenatal model). The treatment pathway was also based on interventions recommended in the NICE guidance (2014) and used in the NICE model.8 Below we describe the treatment pathways in detail and in Figures 11–14 we present the schematics.

FIGURE 11. Treatment pathways: TP.

FIGURE 12. Treatment pathways: FP.

FIGURE 13. Treatment pathways: FN.

FIGURE 14. Treatment pathways: TN.

True positives and false positives

Women who were TP or FP were assumed to receive one of the following treatment options, in proportions reflecting severity of depression as reported in NICE guidance: 72% of women were assumed to have subthreshold or mild to moderate depression and received FSH; the remaining women (28%) were assumed to receive intensive psychological therapy in the prenatal model, and either intensive psychological therapy (20%) or pharmacological treatment (8%) in the postnatal model. Intensive psychological therapy may be offered for up to 16 sessions; however, most studies used to derive the treatment effect reported in NICE guidance used between 8 and 12 sessions.8 Therefore, we assumed 10 sessions of intensive psychological therapy with a CI of between 8 and 12 sessions. The pharmacological therapy in the postnatal model consisted of sertraline for 8 weeks plus 6 months of maintenance treatment.

In line with the NICE model, we assumed that FP women would receive the same treatments in the same proportions as TP women8 but that they would stop the treatment earlier and would consume only 20% of the treatment-related health-care resources (based on consensus of the NICE Guidance Development Group).8 However, we conducted sensitivity analysis of this assumption by varying the resource use incurred by FPs to be equal to 10% or 30% of the resource use by TPs.

True negatives and false negatives

Women who were TN were assumed to incur no further costs related to depression. It was assumed that a proportion of FN women will recover on their own. In line with NICE guidance, we assumed that women who do recover without treatment will incur additional health and social care costs while they are depressed – recovery was assumed to happen 6–7 weeks after implementing a screening/case-finding strategy (as in the NICE model).8 However, if women did not recover spontaneously, they were assumed to have one GP visit half-way through the follow-up period during which their depression could be detected and in which case treatment would be offered in the same proportion as to TP women (described above). Finally, if women were not detected by their GP during the follow-up period, then they were assumed to stay depressed and also incur health and social care costs until the end point of the model. Owing to lack of relevant data and short time horizon, relapse was not modelled.

Parameters and assumptions of the decision model

Parameters and assumptions in the model are summarised in Table 21 and discussed in detail below.

TABLE 21. Model parameters, assumptions and sources

Clinical input parameters

The prevalence of depression was based on our primary study (as reported in Chapter 5). The prevalence of prenatal depression was 10.3% (95% CI 7.2% to 13.3%), whereas the prevalence of postnatal depression was 10.5% (95% CI 7.2% to 13.8%). Data on the diagnostic accuracy of screening/case-finding strategies were also based on our primary study (as reported in Chapter 5). Sensitivity and specificity data were specific to prenatal (20 weeks) and postnatal (3–4 months) periods. Diagnostic performance for second-stage screening was estimated conditional on the outcome of the first stage. We note that one of the limitations of the NICE model (due to lack of available evidence) was that the sensitivity and specificity of the second-stage screen was assumed to be independent of the first stage.

For standard care case identification, we used the sensitivity and specificity parameters reported in NICE guidance (2014), which were based on a meta-analysis of 118 studies that assessed accuracy of identifying depression by GPs.8,178 The authors reported weighted sensitivity and specificity of 50.1% and 81.3%, respectively, which were used in our model to approximate sensitivity and specificity of routine care case identification (as in the NICE model). These parameters are based on cases identified without the use of formal screening/case-finding instruments but as part of routine assessment. It should be noted that standard care case identification does not include conducting a gold standard interview to confirm initial diagnosis. Hence, HPs may incorrectly identify positive and negative cases.

Treatment effects of FSH and high-intensity psychological therapy were based on the meta-analysis reported in NICE guidance (2014).8 The rate of response to pharmacological therapy was assumed to be the same as the rate of response to high-intensity psychological therapy. See NICE guidance for further details about the meta-analysis,8 and see Table 21 for estimates of treatment effects. Note that treatment effects are reported as relative risk of no improvement (compared with usual care); hence, a relative risk of < 1 is desirable. More specifically, the absolute risk of no improvement in the usual care group is 0.667 [i.e. 33.3% (1 – 0.667) of women with depression will recover under usual care]. However, women receiving FSH have a relative risk of no improvement of 0.73, which is multiplied by the absolute risk of no improvement in the usual care group to arrive at an absolute risk of no improvement of 0.487 (= 0.73 × 0.667) in the FSH group. In other words, women receiving FSH have a 51.3% (1 – 0.487) chance of recovering under FSH. Owing to lack of data on treatment effect in the prenatal period, our prenatal model assumed the same treatment effect as the postnatal model.
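The arithmetic linking the relative risk of no improvement to the probability of recovery can be reproduced with a short Python sketch. The two constants are taken from the paragraph above; the function name `recovery_probability` is an assumption of this sketch.

```python
BASELINE_RISK_NO_IMPROVEMENT = 0.667  # usual care, from the text
RR_NO_IMPROVEMENT_FSH = 0.73          # FSH versus usual care, from the text


def recovery_probability(baseline_risk, relative_risk):
    """Convert a relative risk of no improvement into a recovery probability."""
    absolute_risk = baseline_risk * relative_risk  # absolute risk of no improvement
    return 1.0 - absolute_risk


usual_care_recovery = recovery_probability(BASELINE_RISK_NO_IMPROVEMENT, 1.0)
fsh_recovery = recovery_probability(BASELINE_RISK_NO_IMPROVEMENT, RR_NO_IMPROVEMENT_FSH)
```

Here `usual_care_recovery` is about 0.333 and `fsh_recovery` is about 0.513, matching the 33.3% and 51.3% quoted above.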

As mentioned earlier, a proportion of women will spontaneously recover from perinatal depression. A review conducted by Dennis et al.177 reported that the rate of spontaneous recovery from postnatal depression was between 25% and 40% in the control arms of randomised trials. Consistent with the NICE model, we used the mid-point of this range to represent the proportion of women who would enter remission. This is also consistent with the meta-analysis of standard care arms reported in NICE guidance.8 We assumed the same recovery rate for prenatal depression.

Women who do not spontaneously enter remission may be detected by their GP through routine care. The model assumed that these women will have one GP consultation halfway through the follow-up period, during which their depression could be detected and treatment offered. Previous models8,81 have used an estimate from a study in general practice.176 Based on this study, 8% of women who did not enter spontaneous remission would be detected halfway through the follow-up period.
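As an illustration of how these two parameters split FN women across the three pathways (spontaneous recovery, detection by the GP, and remaining undetected), a minimal Python sketch follows. The function name `fn_pathway_shares` is hypothetical; the inputs are the mid-point of the 25–40% spontaneous recovery range and the 8% GP detection rate given above.

```python
def fn_pathway_shares(spontaneous_recovery, gp_detection):
    """Split false-negative women into the three modelled pathways.

    gp_detection applies only to women who did not recover spontaneously.
    Returns (recovered, detected by GP, undetected) as shares summing to 1.
    """
    not_recovered = 1.0 - spontaneous_recovery
    detected = not_recovered * gp_detection
    undetected = not_recovered * (1.0 - gp_detection)
    return spontaneous_recovery, detected, undetected


# Mid-point of the 25-40% range and the 8% GP detection rate from the text.
recovered, detected, undetected = fn_pathway_shares((0.25 + 0.40) / 2, 0.08)
```

Under these inputs roughly 32.5% of FN women recover spontaneously, about 5.4% are later detected by their GP and about 62.1% remain undetected until the end of the model.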

Resource use and cost data

Resource use data were based on a number of sources, including our primary study, NICE guidance (2014)8 and national cost databases. The times required to administer the Whooley questions (under 1 minute) and EPDS (under 2 minutes) were based on our primary study. We added 1 and 2 minutes, respectively, for scoring of Whooley questions and EPDS. PHQ-9 was assumed to require the same amount of time as EPDS. The cost of a HV’s time was based on NHS reference cost database (2014–15).179 Routine care case identification was assumed to require one GP consultation lasting 11.7 minutes.173

The cost of treatments was based on NICE guidance (2014).8 Of the positive cases, 72% received a mean of seven sessions of FSH (each lasting 25 minutes) with a psychological well-being practitioner whose costs were assumed to be equal to those of a mental health nurse (£75 per hour of face-to-face contact173). In addition, a self-help manual was costed at £9.09 (www.amazon.co.uk).174

For intensive psychological therapy, the NICE guidance (2014) included six studies in the meta-analysis to derive the treatment effect.8 The number of sessions in these studies varied between studies from three to more than 15; however, most studies used 8–12 sessions. Hence, for the economic analysis, we assumed 10 sessions for the base case and conducted sensitivity analyses using 8 and 12 sessions. Each session lasted 55 minutes and was costed at £94 [NHS reference cost database (2014–15)179]. Based on NICE expert group opinion (2014),8 women receiving intensive psychological therapy or self-help would receive additional care that would comprise three GP consultations.

Women receiving pharmacological therapy in the postnatal model would receive 8 weeks of initial therapy with sertraline followed by 6 months of maintenance therapy (i.e. 50 mg per day). The cost of sertraline was based on the most recent version of the British National Formulary.175 In addition, these women were actively monitored in primary or secondary care. Based on NICE guidance (2014),8 it was assumed that 15% would receive two consultant psychiatrist visits (one lasting 30 minutes and one lasting 15 minutes) and two consultations with the GP; the remaining 85% of women receiving pharmacological therapy were assumed to be managed in primary care and to have four GP consultations. The unit cost of a GP consultation was £44.173
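A rough Python sketch of how the unit costs above might combine into a cost per treated woman is given below for FSH and intensive psychological therapy. How the components were actually combined in the study's Excel model is not stated in this chapter, so the pro rata costing of the 25-minute sessions and the function names are assumptions of this sketch; all unit costs are those quoted in the text. Pharmacological therapy is omitted because the sertraline and psychiatrist unit costs are not given here.

```python
GP_CONSULTATION = 44.0     # GBP per consultation, from the text
ADDITIONAL_GP_VISITS = 3   # additional care for FSH and intensive therapy


def fsh_cost(sessions=7, minutes=25, hourly_rate=75.0, manual=9.09):
    """Cost of FSH: sessions costed pro rata (assumption), manual, GP visits."""
    contact = sessions * minutes / 60.0 * hourly_rate
    return contact + manual + ADDITIONAL_GP_VISITS * GP_CONSULTATION


def intensive_therapy_cost(sessions=10, cost_per_session=94.0):
    """Cost of intensive psychological therapy plus additional GP visits."""
    return sessions * cost_per_session + ADDITIONAL_GP_VISITS * GP_CONSULTATION


base_case = intensive_therapy_cost()                                   # 10 sessions
sensitivity = [intensive_therapy_cost(sessions=n) for n in (8, 12)]    # 8 and 12 sessions
```

Varying `sessions` between 8 and 12 mirrors the sensitivity analysis on the number of intensive therapy sessions.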

False-negative women (i.e. those not identified as depressed by screening/case-finding strategies) were assumed to incur additional health and social care costs while they are depressed. Following the NICE model, the cost of health and social care was based on Petrou et al.,113 who estimated the cost of health and social care in a cohort of women with postnatal depression (including cost of primary and secondary care). These costs were uprated to 2016 prices and amounted to £9.00 per week.

All costs were expressed in 2015–16 prices. Discounting of costs and outcomes was not necessary owing to the short time horizon of the model.

Utility data and estimation of quality-adjusted life-years

Utility data were based on primary data collected in our study using the EQ-5D,141 which is the standard generic instrument used to estimate HRQoL. Responses to the EQ-5D were converted into QALYs using the UK tariff, and QALYs were estimated using an AUC approach.

Women were assigned utility values based on the diagnostic and treatment pathways. TP cases were assigned the utility of depression at the time of screening/case-finding. Women who recovered after treatment were assumed to have a linear improvement in quality of life until the end of therapy (estimated using linear interpolation). Women who did not respond to treatment were assumed to continue with the utility value of depression. FP cases were assumed to have no utility decrement due to false diagnosis in the base-case model; however, in a sensitivity analysis we assumed a 2% reduction in utility due to a FP diagnosis (in line with the NICE model). TN cases had the utility level of non-depressed women. FN women who entered spontaneous remission were assumed to recover linearly over 7 weeks after initial screening/case-finding (as per NICE model), whereas those who did not recover continued with the utility of depression unless identified and treated in routine care (as per the model pathway).
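The AUC approach with linear interpolation can be sketched as a trapezoidal sum over a utility profile. The utility values used here (0.70 when depressed, 0.90 when recovered) are hypothetical placeholders, not the study estimates, which are reported in Table 21; the function name `qalys_auc` is also an assumption of this sketch.

```python
def qalys_auc(points):
    """Area under a piecewise-linear utility curve, in QALYs.

    `points` is a list of (time in weeks, utility) pairs sorted by time;
    the area is converted from utility-weeks to years using 52 weeks per year.
    """
    area = 0.0
    for (t0, u0), (t1, u1) in zip(points, points[1:]):
        area += (t1 - t0) * (u0 + u1) / 2.0  # trapezoid between two time points
    return area / 52.0


# Hypothetical FN woman who recovers linearly over 7 weeks, then stays well.
recovering = qalys_auc([(0, 0.70), (7, 0.90), (52, 0.90)])
# Hypothetical woman who remains depressed for the full year.
depressed = qalys_auc([(0, 0.70), (52, 0.70)])
```

The difference between `recovering` and `depressed` is the QALY gain attributable to recovery under these illustrative utilities.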

Model parameters and assumptions are summarised in Table 21.

Analysis

Pathways in the decision tree were evaluated using probabilities based on the sensitivity and specificity of screening/case-finding strategies and subsequent probabilities of treatment and response. More specifically, the model started with a cohort of depressed and non-depressed women, with the proportion of depressed women determined by the prevalence of depression in prenatal and postnatal periods (see Table 21 for parameters). For instance, based on a prevalence estimate of 10.5% in the postnatal period, 105 women in a cohort of 1000 women would be depressed. Both depressed and non-depressed groups were screened using either a one- or two-stage screening/case-finding strategy. Based on the sensitivity and specificity of the screening/case-finding strategy, the outcomes were TP, FP, TN or FN. For instance, using the Whooley questions as the first screen, 85.7% of depressed women (= the sensitivity of Whooley questions) in the postnatal period were identified as positive [i.e. TP = 90 women in a cohort of 1000 women (this is calculated as 105 depressed women times 85.7% sensitivity)]. As a result, 14.3% of depressed women were missed by this strategy (i.e. FN). Also, based on the specificity of Whooley questions, 80.6% of non-depressed women were correctly identified (i.e. TN), whereas 19.4% were incorrectly identified as depressed (i.e. FP). In the one-stage strategy based on Whooley questions alone, both women identified as TP and those identified as FP were offered treatment at this stage, as described earlier; however, those identified as TN or FN were not offered treatment. In the two-stage strategy, women identified as TP or FP were screened again, which reduced the number of FPs, but some more TPs were incorrectly identified as FNs. Subsequently, costs and outcomes were applied to each pathway, as mentioned earlier. Expected costs and QALYs were calculated for each pathway. 

The ICER was calculated as the ratio of the difference in costs to the difference in QALYs compared with the next most effective screening/case-finding strategy. A screening/case-finding strategy is considered ‘dominated’ (and therefore ruled out) if it has higher costs and lower QALYs than another strategy. Moreover, ‘extended dominance’ rules out any strategy for which the ICER is higher than that of the next most effective strategy. Net monetary benefit was calculated as the total QALYs of each strategy times the WTP threshold minus the total cost of each strategy.
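The ICER, dominance and net monetary benefit definitions above can be written as three small Python functions. This is an illustrative sketch only: the function names are assumptions, and the costs and QALYs for strategies A and B are hypothetical, not results from Table 22.

```python
def icer(cost_a, qalys_a, cost_b, qalys_b):
    """Incremental cost per QALY gained for strategy A versus comparator B."""
    delta_qalys = qalys_a - qalys_b
    if delta_qalys == 0:
        raise ZeroDivisionError("strategies have identical QALYs")
    return (cost_a - cost_b) / delta_qalys


def net_monetary_benefit(cost, qalys, wtp):
    """Total QALYs valued at the WTP threshold, minus total cost."""
    return qalys * wtp - cost


def is_dominated(cost, qalys, other_cost, other_qalys):
    """True if a strategy costs more and yields fewer QALYs than another."""
    return cost > other_cost and qalys < other_qalys


# Hypothetical per-woman results: A is more costly and slightly more effective.
icer_a_vs_b = icer(100.0, 0.801, 60.0, 0.800)  # about GBP 40,000 per QALY
nmb_a_20k = net_monetary_benefit(100.0, 0.801, 20_000)
nmb_b_20k = net_monetary_benefit(60.0, 0.800, 20_000)
```

In this example B has the higher net monetary benefit at a threshold of £20,000 per QALY because the ICER of A exceeds the threshold; the ranking reverses once the threshold is above the ICER.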

Probabilistic sensitivity analysis was conducted using 10,000 Monte Carlo simulations. For PSA, a normal distribution was assumed for the following parameters: prevalence of depression; sensitivity and specificity of screening/case-finding tests; and absolute and relative risk of no improvement. A beta distribution was assumed for utility weights and for the probability that depression is identified by the GP during the follow-up period. Unit costs were assumed to be fixed. Using simulation results, cost-effectiveness acceptability curves and frontiers were plotted to present the probability of each screening/case-finding strategy being most cost-effective for a WTP threshold between £0 and £100,000. Further sensitivity analyses were conducted by (1) varying the prevalence of perinatal depression, (2) assuming 2% reduction in utility in FP cases (as per the NICE model) and (3) varying the level of resource use by FPs (for further details of the sensitivity analyses see Results).
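The two ingredients of the PSA, sampling parameters from their distributions and counting how often each strategy has the highest net monetary benefit, can be sketched in Python using only the standard library. This is not the study's Excel implementation. Deriving the standard error from the reported 95% CI for postnatal prevalence, truncating draws to the interval 0 to 1, and both function names are assumptions of this sketch.

```python
import random


def sample_prevalence(rng, mean=0.105, lower=0.072, upper=0.138):
    """Draw a prevalence from a normal distribution, truncated to [0, 1].

    The standard error is backed out of the 95% CI (an assumption).
    """
    se = (upper - lower) / (2 * 1.96)
    return min(max(rng.gauss(mean, se), 0.0), 1.0)


def probability_most_cost_effective(simulations, wtp):
    """Share of simulations in which each strategy has the highest NMB.

    `simulations` is a list with one dict per Monte Carlo draw, mapping a
    strategy name to its simulated (cost, QALYs) pair.
    """
    wins = {}
    for draw in simulations:
        best = max(draw, key=lambda s: draw[s][1] * wtp - draw[s][0])
        wins[best] = wins.get(best, 0) + 1
    return {s: wins.get(s, 0) / len(simulations) for s in simulations[0]}


rng = random.Random(0)  # fixed seed so the draws are reproducible
prevalence_draws = [sample_prevalence(rng) for _ in range(10_000)]
```

Repeating `probability_most_cost_effective` over a grid of WTP values gives the points of a cost-effectiveness acceptability curve for each strategy.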

Another health economist who was independent of the research group and not part of the model development process checked the decision model. The model was also checked for logical consistency by setting input parameters to null and extreme values and examining the direction of change of results.

Results

The results are presented separately for the prenatal and postnatal models. For each model, the base-case results are presented using the data sources and assumptions discussed in the Methods section; this is followed by sensitivity analyses to assess the impact of using alternative parameters and assumptions. We present the postnatal results first, followed by the prenatal model results.

Base-case results: postnatal depression

Table 22 presents a summary of the results of base-case cost-effectiveness analysis of screening/case-finding strategies for postnatal depression. Strategies were ranked based on estimates of average cost per woman screened (from the least expensive to the most expensive). Deterministic analysis shows that the average cost per woman varies between £66.30 for ‘Whooley questions followed by PHQ-9’ and £103.50 for ‘Whooley questions only’. The difference between the most and least expensive strategy in terms of QALYs was only 0.00078 per woman screened. The ICERs were compared with the next cheapest (non-dominated) strategy. Table 22 also shows that routine care case identification strategy was dominated by other strategies because it was more expensive and less effective than the next low-cost strategy in the ordered list. One-stage EPDS (≥ 13-point cut-off point) was extendedly dominated, whereas one-stage EPDS (≥ 10-point cut-off point) had an ICER of £105,352 compared with the next cheapest strategy. One-stage Whooley questions had an ICER of £137,883 compared with EPDS (≥ 10-point cut-off point) only. Hence, one-stage identification strategies were either dominated (or extendedly dominated) by two-stage screening or had an ICER that was significantly above the conventional WTP threshold range of £20,000–30,000 per QALY.

TABLE 22. Total costs, QALYs and probability of cost-effectiveness of screening/case-finding strategies for postnatal depression

Costs and QALYs were jointly evaluated using net monetary benefit at conventionally used cost-effectiveness thresholds of £20,000 and £30,000 per QALY (see Appendix 14). The results show a very small difference between strategies in terms of net monetary benefit. The most cost-effective strategy in the range between £20,000 and £30,000 per QALY was ‘Whooley questions followed by PHQ-9’. However, the difference between this and the next most likely cost-effective strategy, in terms of net monetary benefit, was < £15 per woman for the conventionally used WTP thresholds.

Given uncertainties in costs and QALYs, ICERs and net benefits should be interpreted in the light of the probabilistic cost-effectiveness analysis presented in Table 22, and also graphically presented as cost-effectiveness acceptability frontier in Figure 15; the figure shows that ‘Whooley questions followed by PHQ-9’ is most cost-effective in the range between £20,000 and £30,000 per QALY (see Appendix 14 for cost-effectiveness acceptability curves for all strategies). It was also the most cost-effective strategy at a threshold of £13,000 per QALY (note: this is the lower threshold proposed in a recent report).180 Figure 15 shows that ‘Whooley questions followed by PHQ-9’ is cost-effective up to a WTP threshold of £51,500, beyond which EPDS (≥ 10-point cut-off point) becomes the most cost-effective strategy. However, these results should be interpreted in the light of net monetary benefits, which show a very small monetary difference between strategies.

FIGURE 15. Cost-effectiveness acceptability frontier of diagnostic strategies for postnatal depression.

Our analysis also found that Whooley questions alone was never a cost-effective strategy, even at a threshold of £100,000 per QALY. Moreover, when comparing EPDS alone with Whooley questions alone in terms of probability of cost-effectiveness, EPDS alone always had a higher probability of being cost-effective.

To better understand these results, Table 23 compares screening/case-finding strategies in terms of sensitivity, specificity and health gains for a hypothetical cohort of 1000 postnatal women who undergo screening/case-finding. Table 23 shows that ‘Whooley questions only’ has the lowest specificity, of 80.6%, which implies a FPR of 19.4% (i.e. 100% – 80.6%). Given the prevalence of postnatal depression in the model, this FPR implies that 174 per 1000 women will be incorrectly identified as depressed. If these FP women receive 20% of the full treatment (as assumed in the base-case model), this implies that £30,043 (= 174 FP women × £173 cost of FP treatment) will be unnecessarily spent as a result of incorrect identification of non-depressed women. In comparison, the ‘Whooley questions followed by PHQ-9’ strategy has a FPR of 2.7% (combined specificity = 97.3%; see Methods) and would therefore incorrectly identify only 24 women per 1000 as depressed, resulting in relatively low unnecessary treatment cost of £4144 (i.e. £25,899 less than Whooley questions alone). This shows why specificity is a key driver of the cost-effectiveness results. It should also be noted that the base-case model makes the conservative assumption that, while a FP identification is associated with additional costs, it does not cause loss of quality of life as a result of false diagnosis. Relaxing this assumption (see Sensitivity analysis) makes ‘Whooley questions followed by PHQ-9’ even more likely to be cost-effective.
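The cohort arithmetic behind these figures can be checked with a short Python sketch. The prevalence, sensitivity and specificity values are those quoted in this chapter; the function names are assumptions of this sketch, and because the inputs are rounded the counts may differ slightly from those in Table 23.

```python
def false_positives(n, prevalence, specificity):
    """Expected number of non-depressed women incorrectly screened positive."""
    non_depressed = n * (1.0 - prevalence)
    return non_depressed * (1.0 - specificity)


def cohort_outcomes(n, prevalence, sensitivity, specificity):
    """Expected TP, FN, TN and FP counts for a screened cohort of size n."""
    depressed = n * prevalence
    tp = depressed * sensitivity
    fp = false_positives(n, prevalence, specificity)
    return {"TP": tp, "FN": depressed - tp, "TN": n - depressed - fp, "FP": fp}


# Postnatal values from the text: prevalence 10.5%, Whooley 85.7% / 80.6%.
whooley = cohort_outcomes(1000, 0.105, 0.857, 0.806)
# 'Whooley questions followed by PHQ-9': combined specificity 97.3%.
fp_two_stage = false_positives(1000, 0.105, 0.973)
```

Rounded to whole women, this gives about 90 TP and 174 FP for ‘Whooley questions only’ and about 24 FP for the two-stage strategy, in line with the figures quoted above.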

TABLE 23

Sensitivity, specificity and diagnostic outcomes of a hypothetical cohort of 1000 women screened for postnatal depression

Table 23 shows that the ‘Whooley questions followed by PHQ-9’ strategy has the second lowest sensitivity (after routine care case identification) whereas ‘Whooley questions only’ has the highest sensitivity, of 85.7%. High sensitivity of ‘Whooley questions only’ implies that more cases of depression will be identified and given treatment – this is reflected in the highest number of cases identified and total QALYs in the ‘Whooley questions only’ strategy. However, the incremental difference in QALYs between ‘Whooley questions only’ and ‘Whooley questions followed by PHQ-9’ is relatively low. This is because of a number of factors, including (1) the prevalence of perinatal depression in the model – this implies that a gain in sensitivity by 10% would result in only 10.5 additional TP diagnoses per 1000 women; (2) the magnitude of the relative treatment effect for therapies offered to women identified as TP – the most common treatment in the model (i.e. FSH) reduces the relative risk of no improvement by 27%; (3) the spontaneous recovery rate of FN cases – based on the NICE guidance document, the model assumes that 33% of FN women will enter spontaneous remission within 6–7 weeks of screening/case-finding; and (4) the routine care identification of missed cases of depression (i.e. FNs) during the follow-up period. Taking account of these factors (see Methods for the parameter values), the difference in QALYs between ‘Whooley questions only’ and ‘Whooley questions followed by PHQ-9’ strategies is 0.78 per 1000 women screened. This shows why a difference in sensitivity has a relatively smaller impact on cost-effectiveness compared with a similar improvement in specificity.
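The asymmetry between sensitivity and specificity described above follows directly from the prevalence. As a minimal sketch, assuming a prevalence of 10.5% (inferred from the "10.5 additional TP diagnoses" figure, not a quoted model parameter) and using hypothetical function names, the same 10-percentage-point change moves far more women on the specificity side than on the sensitivity side:

```python
# Illustrative sketch: effect of an equal change in sensitivity and specificity
# on the number of women affected per 1000 screened.
COHORT = 1000
PREVALENCE = 0.105  # ASSUMPTION: implied by the "10.5 additional TP" statement


def extra_true_positives(cohort, prevalence, sensitivity_gain):
    """Additional depressed women identified for a given gain in sensitivity."""
    return cohort * prevalence * sensitivity_gain


def extra_false_positives(cohort, prevalence, specificity_loss):
    """Additional non-depressed women flagged for a given loss in specificity."""
    return cohort * (1.0 - prevalence) * specificity_loss


extra_tp = extra_true_positives(COHORT, PREVALENCE, 0.10)
extra_fp = extra_false_positives(COHORT, PREVALENCE, 0.10)

print(f"{extra_tp:.1f} extra TPs vs {extra_fp:.1f} extra FPs per {COHORT} women")
```

The sketch covers only the counting step; the QALY difference of 0.78 per 1000 women also depends on the treatment effect, spontaneous recovery and routine care identification described in the text.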

Base-case results: prenatal depression

Table 24 presents a summary of the results of the base-case cost-effectiveness analysis of screening/case-finding strategies for prenatal depression. Strategies were ranked based on estimates of average cost per woman screened (from the least expensive to the most expensive). Deterministic analysis shows that the average cost per woman varies between £49.20 for ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’ and £91.60 for ‘Whooley questions only’. The difference in QALYs between the most and least expensive strategies was only 0.00049 per woman screened. One-stage EPDS at ≥ 10 and ≥ 13 cut-off points was extendedly dominated by less expensive strategies, whereas one-stage ‘Whooley questions only’ had an ICER of £171,990 compared with a two-stage strategy of ‘Whooley questions followed by EPDS’ (≥ 10-point cut-off point). Hence, one-stage identification strategies were either extendedly dominated by two-stage screening or had an ICER that was well above the conventional WTP threshold range of £20,000–30,000 per QALY.

TABLE 24

Total costs, QALYs and probability of cost-effectiveness of screening/case-finding strategies for prenatal depression

Costs and QALYs were jointly evaluated using net monetary benefit at conventionally used cost-effectiveness thresholds of £20,000 and £30,000 per QALY (see Appendix 14). The results show a very small difference between strategies in terms of net monetary benefit. Two strategies are cost-effective in the range between £20,000 and £30,000 per QALY [i.e. ‘Whooley questions followed by PHQ-9’ and ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’]. The difference in net monetary benefit between these strategies at a WTP threshold of between £20,000 and £30,000 per QALY was < £4 per woman.
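For readers unfamiliar with net monetary benefit, the following sketch shows the calculation using the deterministic prenatal figures quoted earlier in this section (£49.20 and £91.60 per woman, and a QALY difference of 0.00049). It assumes that the more expensive strategy is the one that yields the additional QALYs, which the text implies but does not state outright. The function name is hypothetical, and the comparison is between the cheapest and dearest strategies only, so the break-even threshold it prints is not the reported ICER of £171,990, which is calculated against a different comparator.

```python
# Illustrative sketch: incremental net monetary benefit (NMB) of the most
# expensive prenatal strategy relative to the least expensive one.
COST_LOW = 49.20           # GBP per woman, Whooley followed by EPDS (>= 13)
COST_HIGH = 91.60          # GBP per woman, Whooley questions only
QALY_DIFFERENCE = 0.00049  # ASSUMPTION: gained by the more expensive strategy


def incremental_nmb(threshold, delta_qalys, delta_cost):
    """Incremental NMB = threshold x incremental QALYs - incremental cost."""
    return threshold * delta_qalys - delta_cost


delta_cost = COST_HIGH - COST_LOW
inmb_20k = incremental_nmb(20_000, QALY_DIFFERENCE, delta_cost)
inmb_30k = incremental_nmb(30_000, QALY_DIFFERENCE, delta_cost)

# Threshold at which the two strategies would have equal NMB.
break_even = delta_cost / QALY_DIFFERENCE

print(f"Incremental NMB at GBP 20,000/QALY: {inmb_20k:.2f}")
print(f"Incremental NMB at GBP 30,000/QALY: {inmb_30k:.2f}")
print(f"Break-even threshold: GBP {break_even:,.0f} per QALY")
```

A negative incremental NMB at both thresholds indicates that, under these inputs, the extra QALYs do not justify the extra cost at conventional WTP values.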

As in the case of postnatal depression, our analysis found that ‘Whooley questions alone’ was never a cost-effective strategy, even at a threshold of £100,000 per QALY. Moreover, when the two were compared in terms of probability of cost-effectiveness, the EPDS alone always had a higher probability of being cost-effective than the Whooley questions alone.

The cost-effectiveness results were evaluated probabilistically and are presented in Table 24 and also graphically as a cost-effectiveness acceptability frontier in Figure 16. Table 24 and Figure 16 show that ‘Whooley questions followed by PHQ-9’ is most cost-effective in the range between £20,000 and £30,000 per QALY, although at £20,000 per QALY ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’ is almost equally cost-effective. Using a lower threshold of £13,000 per QALY, ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’ would be the most cost-effective strategy. However, these results should be interpreted in the light of net monetary benefits, which show a very small monetary difference between strategies.

FIGURE 16. Cost-effectiveness acceptability frontier of diagnostic strategies for prenatal depression.

Table 25 compares screening/case-finding strategies in terms of sensitivity, specificity and health gains for a hypothetical cohort of 1000 prenatal women who undergo screening/case-finding. The table shows that routine care case identification and ‘Whooley questions only’ strategies have the lowest specificities, of 81.3% and 83.7%, respectively. For ‘Whooley questions only’ this implies a FPR of 16.3%. For the prevalence of prenatal depression in the model, this FPR implies that 146 per 1000 women will be incorrectly identified as depressed using ‘Whooley questions only’. If these FP women receive 20% of the full treatment (as assumed in the base-case model), this implies that £27,309 will be unnecessarily spent as a result of incorrect identification of non-depressed women. In comparison, the ‘Whooley questions followed by PHQ-9’ strategy has a FPR of 1.7% and would therefore incorrectly identify only 15 women per 1000 as depressed, resulting in a relatively low unnecessary treatment cost of £2875 (i.e. £24,434 less than ‘Whooley questions alone’). A similar analysis can be done for ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’. These results show why specificity is a key driver of the cost-effectiveness results. Also, as mentioned earlier, the base-case model makes the conservative assumption that although a FP identification is associated with additional costs, it does not cause loss of quality of life as a result of false diagnosis. Relaxing this assumption (see Sensitivity analysis) makes ‘Whooley questions followed by PHQ-9’ and ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’ even more likely to be cost-effective.

TABLE 25

Sensitivity, specificity and diagnostic outcomes of a hypothetical cohort of 1000 women screened for prenatal depression

In the prenatal model, sensitivity is found to be even less influential than in the postnatal model. This is partly because of the short follow-up period (20 weeks) during which the QALY benefits of recovery from depression in the TP women are realised. Moreover, FN women (those not identified as depressed) who do not recover from depression spend only a short period with depression before birth. Therefore, in the prenatal model, specificity is even more influential in the cost-effectiveness analysis.

Sensitivity analyses: prenatal and postnatal depression

The base-case analysis made a number of assumptions about model parameters. We tested the robustness of our results by changing assumptions about model parameters. Three sets of sensitivity analyses were conducted, which are described below, along with the results.

Sensitivity analysis 1: prevalence of depression

The base-case model used prevalence estimates of prenatal and postnatal depression from the BaBY PaNDA study. However, both lower and higher estimates have been reported in the literature. For instance, the NICE postnatal model used a lower prevalence estimate of 8.7% based on a pragmatic randomised controlled trial of postnatal depression conducted in the UK.8,143 Hence, we conducted sensitivity analyses by varying the prevalence parameter between 8.7% and 12% for both prenatal and postnatal depression.

Tables 26 and 27 show the results of sensitivity analyses of prenatal and postnatal models, respectively. When a lower prevalence of 8.7% is assumed in the model, the probability of ‘Whooley questions followed by PHQ-9’ being the most cost-effective strategy increases in both prenatal and postnatal models, whereas the opposite is observed when prevalence increases to 12%. This is not surprising: at lower prevalence there are more women without depression who could be misclassified as FP by a strategy with lower specificity, and who would therefore incur additional costs, so specificity becomes even more important. The results of prenatal and postnatal models are generally similar to the base-case analysis, except that at the higher prevalence of 12% ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’ is cost-effective at £20,000 per QALY in the prenatal model; in all other comparisons, ‘Whooley questions followed by PHQ-9’ remains the most cost-effective strategy.
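The direction of this prevalence effect can be checked with a small calculation. The sketch below is illustrative only. It uses the postnatal specificities quoted earlier for ‘Whooley questions only’ (80.6%) and ‘Whooley questions followed by PHQ-9’ (97.3%), and a hypothetical function name, to count how many FPs the more specific strategy avoids per 1000 women at the two prevalence values tested.

```python
# Illustrative sketch: FPs avoided per 1000 women by the more specific
# strategy, at the lower and upper prevalence values of the sensitivity analysis.
COHORT = 1000
SPEC_WHOOLEY_ONLY = 0.806   # postnatal specificity quoted in the text
SPEC_WHOOLEY_PHQ9 = 0.973   # postnatal combined specificity quoted in the text


def false_positives_avoided(prevalence):
    """FPs avoided by Whooley + PHQ-9 relative to Whooley questions only."""
    non_depressed = COHORT * (1.0 - prevalence)
    return non_depressed * (SPEC_WHOOLEY_PHQ9 - SPEC_WHOOLEY_ONLY)


avoided = {p: false_positives_avoided(p) for p in (0.087, 0.12)}

for prevalence, n_avoided in avoided.items():
    print(f"Prevalence {prevalence:.1%}: {n_avoided:.1f} FPs avoided per {COHORT}")
```

The number of FPs avoided is larger at 8.7% than at 12%, consistent with specificity mattering more when prevalence is lower.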

TABLE 26

Sensitivity analysis of varying the prevalence parameter in the prenatal model

TABLE 27

Sensitivity analysis of varying the prevalence parameter in the postnatal model

Sensitivity analysis 2: reduction in quality of life in false positives

Our base-case model assumed that FP diagnosis is associated only with excess health-care cost and does not result in a reduction in quality of life. This assumption is common in most diagnostic studies. However, we checked the robustness of our assumption by assuming a 2% reduction in quality of life due to FP diagnosis of perinatal depression (as assumed in the NICE model).8

The results for prenatal and postnatal models are presented in Table 28. Both prenatal and postnatal results show that ‘Whooley questions followed by PHQ-9’ is the most cost-effective strategy in the range between £20,000 and £30,000 per QALY. Moreover, the probability of this strategy being the most cost-effective increased when assuming a QALY reduction. This is not surprising as the ‘Whooley questions followed by PHQ-9’ strategy has high specificity (and therefore a low FPR), which would result in limited loss of QALYs due to FP diagnosis. Moreover, the benefit of a low FPR partly offsets the lost opportunity of QALY gain from test sensitivity.

TABLE 28

Sensitivity analysis of assuming a 2% reduction in quality of life in FP cases

Sensitivity analysis 3: resource use by false-positive cases

Based on the NICE guidance, the base-case cost-effectiveness analysis assumed that FP cases incur 20% of the treatment cost incurred by TP cases.8 However, the impact of this assumption needs to be evaluated. Hence, we conducted a sensitivity analysis assuming 10% and 30% resource use in the FP cases.

Tables 29 and 30 present the results of changing this assumption about resource use. When resource use by FP cases is assumed to be 10% of resource use by TP cases, the cost of FP cases reduces, which in turn reduces the influence of specificity of the test on the overall cost-effectiveness result. Hence, the probability of ‘Whooley questions followed by PHQ-9’ being the most cost-effective strategy decreases with this assumption. However, except at a threshold of £20,000 per QALY in the prenatal model when ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’ is the most cost-effective strategy, for all other analyses, ‘Whooley questions followed by PHQ-9’ remains the most cost-effective strategy.

TABLE 29

Sensitivity analysis of varying the amount of resource use among the FP cases in the prenatal model

TABLE 30

Sensitivity analysis of varying the amount of resource use among the FP cases in the postnatal model

In contrast, when assuming that a FP case incurs 30% of the cost of a TP case, the FPR becomes even more important than in the base-case analysis because the unnecessary use of resources by FP cases is substantial. As a result, the probability of ‘Whooley questions followed by PHQ-9’ being the most cost-effective strategy increases and it becomes the most cost-effective strategy in both prenatal and postnatal periods.
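A rough calculation shows how the FP resource-use assumption scales the cost penalty of a less specific strategy. The sketch below is illustrative only. It uses the postnatal FP counts quoted earlier (174 and 24 per 1000 women) and back-calculates a full treatment cost of about £865 from the rounded £173 FP cost at 20%; this back-calculated cost is an assumption, not a parameter reported in this section. The function name is hypothetical.

```python
# Illustrative sketch: excess FP treatment cost of 'Whooley questions only'
# over 'Whooley followed by PHQ-9' per 1000 postnatal women, by FP resource share.
FP_WHOOLEY_ONLY = 174   # FPs per 1000, as quoted in the text
FP_WHOOLEY_PHQ9 = 24    # FPs per 1000, as quoted in the text
COST_PER_FP_AT_20_PERCENT = 173.0  # GBP, rounded

# ASSUMPTION: full TP treatment cost back-calculated from the 20% figure.
FULL_TREATMENT_COST = COST_PER_FP_AT_20_PERCENT / 0.20


def excess_fp_cost(share):
    """Extra FP cost of the less specific strategy for a given resource share."""
    cost_per_fp = FULL_TREATMENT_COST * share
    return (FP_WHOOLEY_ONLY - FP_WHOOLEY_PHQ9) * cost_per_fp


excess = {share: excess_fp_cost(share) for share in (0.10, 0.20, 0.30)}

for share, cost in excess.items():
    print(f"FP resource use {share:.0%}: excess cost GBP {cost:,.0f} per 1000 women")
```

The excess cost grows in proportion to the assumed share, which is why a higher share favours the strategy with the higher specificity.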

It should also be noted that two-stage screening/case-finding was found to be more cost-effective than one-stage screening/case-finding in all sensitivity analyses. Moreover, when comparing one-stage screening/case-finding strategies, ‘Whooley questions alone’ was always one of the least cost-effective strategies and therefore should not be preferred over a two-stage screening/case-finding or ‘EPDS alone’ strategy.

Discussion

In the postnatal period, ‘Whooley questions followed by PHQ-9’ had the highest probability of being cost-effective at conventionally used WTP thresholds in the UK. At values of £20,000 and £30,000 per QALY, ‘Whooley questions followed by PHQ-9’ had a probability of 0.43 and 0.35, respectively, of being the most cost-effective screening/case-finding strategy. The next most cost-effective strategy at these thresholds was ‘EPDS only (≥ 10-point cut-off point)’. ‘Whooley questions alone’ was never a cost-effective strategy, even at a threshold of £100,000 per QALY. Moreover, when ‘EPDS alone’ was compared with ‘Whooley questions alone’ in terms of probability of cost-effectiveness, the ‘EPDS alone’ strategy always had a higher probability of being cost-effective than the ‘Whooley questions alone’ strategy.

In the prenatal model, ‘Whooley questions followed by PHQ-9’ and ‘Whooley questions followed by EPDS (≥ 13-point cut-off point)’ are the most cost-effective strategies, with only a small difference between them in terms of net monetary benefit at thresholds between £20,000 and £30,000 per QALY.

Our decision model found that specificity of case-finding instruments is a key driver of the cost-effectiveness analysis of screening/case-finding strategies. This is because, given the prevalence of perinatal depression, small changes in specificity result in a substantial increase in the number of FP cases. For instance, for a hypothetical cohort of 1000 women, a reduction in specificity of a postnatal diagnostic strategy by 5% would result in unnecessary treatment of 45 women (given the prevalence of postnatal depression). This would result in unnecessary resource use of £7785 in terms of treatment cost (assuming that FP cases receive 20% of the total treatment). Both the NICE guidance8 and the model of Paulden et al.81 have highlighted the significant role of the specificity parameter in the cost-effectiveness results for this particular condition.

With regard to the impact of FP diagnosis on quality of life, our base-case analysis took a conservative approach and assumed that FP diagnosis is not associated with loss of quality of life; however, our sensitivity analysis found that assuming a 2% reduction in quality of life for FPs (as assumed in the NICE guidance model8) further improves the value for money of the ‘Whooley questions followed by PHQ-9’ strategy. Hence, specificity is a key driver of the cost-effectiveness results.

Another important finding of the cost-effectiveness analysis is the comparison of two-stage sequential testing with one-stage testing. The specificity of a two-stage screening/case-finding strategy is higher than that of a one-stage strategy (and it therefore has a lower net FPR). This is important given the prevalence rate of perinatal depression and the known specificity of the Whooley questions alone and EPDS alone. Table 25 shows that use of a single instrument would result in 146 and 121 FP cases per 1000 women screened for Whooley questions alone and EPDS alone, respectively, in the prenatal period. Given that there were 871,038 conceptions in England and Wales in 2014,181 one-stage screening/case-finding would result in 127,172 and 105,396 FP cases with Whooley questions alone and EPDS alone, respectively, in the prenatal period. This improved specificity comes at the cost of reduced sensitivity of sequential tests, but the opportunity cost of FN diagnoses in terms of QALYs lost was found to be relatively small. This is because of spontaneous recovery of perinatal depression in the weeks following the initial screen, relatively modest treatment effect of therapies offered for perinatal depression, subsequent identification of cases missed by initial screen, and the known prevalence of postnatal depression. Hence, taking into account both sensitivity and specificity, the most cost-effective strategy in the perinatal period is ‘Whooley questions followed by PHQ-9’.
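The national figures quoted above follow from scaling the per-1000 FP counts in Table 25 to the number of conceptions. The sketch below reproduces that step; the function name is hypothetical, and, like the calculation in the text, it assumes that every conception corresponds to one woman screened in the prenatal period.

```python
# Illustrative sketch: scaling prenatal FP counts per 1000 women screened
# to the 871,038 conceptions in England and Wales in 2014.
CONCEPTIONS_2014 = 871_038
FP_PER_1000 = {
    "Whooley questions alone": 146,
    "EPDS alone": 121,
}


def national_false_positives(fp_per_1000, population):
    """Expected FP cases if the whole population were screened once."""
    return round(population * fp_per_1000 / 1000)


national = {
    name: national_false_positives(rate, CONCEPTIONS_2014)
    for name, rate in FP_PER_1000.items()
}

for name, cases in national.items():
    print(f"{name}: {cases:,} FP cases")
```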

Our findings are in line with the model reported in NICE guidance, which also found ‘Whooley questions followed by PHQ-9’ to be the most cost-effective strategy.8 Our model used the same model structure as in the NICE guidance; however, we used primary data on diagnostic performance of case-finding instruments and prevalence of depression in the perinatal period. Moreover, by virtue of having primary data on diagnostic performance, we were able to overcome some of the assumptions made in the NICE model due to data limitations, such as second-stage estimates of sensitivity and specificity conditional on the outcome of the first stage. Finally, it should be noted that the cost-effectiveness results should be interpreted in the light of net monetary benefit. In the case of both prenatal and postnatal depression, our model found that the difference in net monetary benefits between screening/case-finding strategies is small. Again, this is in line with the NICE model, which found the difference between the most cost-effective strategy (i.e. Whooley questions followed by PHQ-9) and the next most cost-effective strategy (i.e. Whooley questions followed by EPDS) to be < £3 (this is not explicitly reported in the report, but can be derived from the results). Given the small difference in expected net benefit in our study (and the previously published NICE model8), decision-makers should also consider other aspects of screening/case-finding, such as acceptability of screening/case-finding strategies.
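To illustrate why conditional second-stage estimates matter, the sketch below shows how the FPR of a two-stage strategy combines the first-stage FPR with the second-stage FPR among first-stage positives. It assumes that the second instrument is given only to women who screen positive at the first stage. The conditional second-stage FPR is back-calculated here from the postnatal specificities quoted in this chapter (80.6% for the Whooley questions and 97.3% combined); it is not a value reported in the text, and the function name is hypothetical.

```python
# Illustrative sketch: combined false-positive rate (FPR) of a two-stage strategy.
# ASSUMPTION: stage 2 is administered only to stage-1 positives.


def combined_fpr(fpr_stage1, fpr_stage2_given_positive):
    """A non-depressed woman is a FP only if she screens positive at both stages."""
    return fpr_stage1 * fpr_stage2_given_positive


FPR_STAGE1 = 1.0 - 0.806     # Whooley questions, postnatal (quoted specificity)
FPR_COMBINED = 1.0 - 0.973   # Whooley followed by PHQ-9 (quoted combined specificity)

# Back-calculated conditional FPR of the second stage (not a reported value).
implied_stage2_fpr = FPR_COMBINED / FPR_STAGE1

print(f"Implied second-stage FPR among first-stage positives: {implied_stage2_fpr:.1%}")
print(f"Recombined FPR: {combined_fpr(FPR_STAGE1, implied_stage2_fpr):.1%}")
```

Because the second-stage FPR is conditional on a positive first stage, it cannot in general be replaced by the FPR of the second instrument used on its own, which is the limitation that primary data help to overcome.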

Copyright © Queen’s Printer and Controller of HMSO 2018. This work was produced by Littlewood et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK481917
