3. METHODS USED TO DEVELOP THIS GUIDELINE

3.1. OVERVIEW

The development of this guideline drew upon methods outlined by NICE (The Guidelines Manual [NICE, 2007b]). A team of healthcare professionals, lay representatives and technical experts known as the Guideline Development Group (GDG), with support from NCCMH staff, undertook the development of a patient-centred, evidence-based guideline. There are six basic steps in the process of developing a guideline:

  • define the scope, which sets the parameters of the guideline and provides a focus and steer for the development work
  • define clinical questions considered important for practitioners and service users
  • develop criteria for evidence searching and search for evidence
  • design validated protocols for systematic review and apply them to the evidence recovered by the search
  • synthesise and (meta-) analyse data retrieved, guided by the clinical questions, and produce evidence profiles and summaries
  • answer clinical questions with evidence-based recommendations for clinical practice.

The clinical practice recommendations made by the GDG are therefore derived from the most up-to-date and robust evidence base for the clinical and cost effectiveness of the treatments and services used in the treatment and management of depression in people with a chronic physical health problem. In addition, to ensure a service user and carer focus, the concerns of people with depression and a chronic physical health problem and their carers regarding health and social care have been highlighted and addressed by recommendations agreed by the whole GDG.

3.2. THE SCOPE

Guideline topics are selected by the Department of Health and the Welsh Assembly Government, which identify the main areas to be covered by the guideline in a specific remit (see NICE, 2007b). The NCCMH developed a scope for the guideline based on the remit.

The purpose of the scope is to:

  • provide an overview of what the guideline will include and exclude
  • identify the key aspects of care that must be included
  • set the boundaries of the development work and provide a clear framework to enable work to stay within the priorities agreed by NICE and the NCC and the remit from the Department of Health/Welsh Assembly Government
  • inform the development of the clinical questions and search strategy
  • inform professionals and the public about expected content of the guideline
  • keep the guideline to a reasonable size to ensure that its development can be carried out within the allocated period.

The draft scope was subject to consultation with registered stakeholders over a 4-week period. During the consultation period, the scope was posted on the NICE website (www.nice.org.uk). Comments were invited from stakeholder organisations and the Guideline Review Panel (GRP). Further information about the GRP can also be found on the NICE website. The NCCMH and NICE reviewed the scope in light of comments received, and the revised scope was signed off by the GRP.

3.3. THE GUIDELINE DEVELOPMENT GROUP

The GDG consisted of: professionals in psychiatry, clinical psychology, health psychology, nursing, general practice, occupational therapy, pharmacy, gerontology, cardiology, rheumatology; academic experts in psychiatry and psychology; and a person with depression and a chronic physical health problem. The GDG was recruited according to the specifications set out in the scope and in line with the process set out in the NICE guideline manual (NICE, 2007b). The guideline development process was supported by staff from the NCCMH, who undertook the clinical and health economics literature searches, reviewed and presented the evidence to the GDG, managed the process, and contributed to drafting the guideline.

3.3.1. Guideline Development Group meetings

GDG meetings were held between 22 January 2008 and 20 January 2009. During each day-long GDG meeting, in a plenary session, clinical questions and clinical and economic evidence were reviewed and assessed, and recommendations were formulated. At each meeting, all GDG members declared any potential conflicts of interest, and the concerns of the person with depression and a chronic physical health problem were routinely discussed as part of a standing agenda.

3.3.2. Topic groups

The GDG divided its workload along clinically relevant lines to simplify the guideline development process, and GDG members formed smaller topic groups to undertake guideline work in that area of clinical practice. Three topic groups were formed to cover: (1) case identification and service configuration, (2) pharmacological interventions and (3) psychological and psychosocial interventions. These groups were designed to efficiently manage the large volume of evidence needing to be appraised prior to presenting it to the GDG as a whole. Each topic group was chaired by a GDG member with expert knowledge of the topic area (one of the healthcare professionals). Topic groups refined the clinical questions and the clinical definitions of treatment interventions, reviewed and prepared the evidence with the systematic reviewer before presenting it to the GDG as a whole and helped the GDG to identify further expertise in the topic. Topic group leaders reported the status of the group's work as part of the standing agenda. They also introduced and led the GDG discussion of the evidence review for that topic and assisted the GDG Chair in drafting the section of the guideline relevant to the work of each topic group. A group was also convened comprising the service user representative and members of the NCCMH review team to develop the chapter on experience of care (Chapter 4). The service user and NCCMH review team jointly ran the group and presented their findings at GDG meetings.

3.3.3. People with depression and a chronic physical health problem

A person with direct experience of services gave an integral service-user focus to the GDG and the guideline. They contributed as a full GDG member in writing the clinical questions, helping to ensure that the evidence addressed their views and preferences, highlighting sensitive issues and terminology relevant to the guideline, and bringing service user research to the attention of the GDG. In drafting the guideline they contributed to writing the guideline's introduction and Chapter 4, and identified recommendations from the service user perspective.

3.3.4. Special advisers

Special advisers, who had specific expertise in one or more aspects of treatment and management relevant to the guideline, assisted the GDG, commenting on specific aspects of the developing guideline and, where necessary, making presentations to the GDG. Appendix 3 lists those who agreed to act as special advisers.

3.3.5. National and international experts

National and international experts in the area under review were identified through the literature search and through the experience of the GDG members. These experts were contacted to recommend unpublished or soon-to-be-published studies to ensure that up-to-date evidence was included in the development of the guideline. They informed the group about completed trials at the pre-publication stage, systematic reviews in the process of being published, studies relating to the cost effectiveness of treatment, and trial data if the GDG could be provided with full access to the complete trial report. Appendix 6 lists researchers who were contacted.

3.4. CLINICAL QUESTIONS

Clinical questions were used to guide the identification and interrogation of the evidence base relevant to the topic of the guideline. Before the first GDG meeting, clinical questions were prepared by NCCMH staff based on the scope and an overview of existing guidelines, and discussed with the guideline Chair. The draft clinical questions were then discussed by the GDG at the first few meetings and amended as necessary. Where appropriate, the questions were refined once the evidence had been searched and, where necessary, subquestions were generated. Questions submitted by stakeholders were also discussed by the GDG and the rationale for not including questions was recorded in the minutes. The final list of clinical questions can be found in Appendix 7.

For questions about interventions, the PICO (patient, intervention, comparison and outcome) framework was used. This structured approach divides each question into four components: the patients (the population under study), the interventions (what is being done), the comparisons (other main treatment options) and the outcomes (the measures of how effective the interventions have been) (see Table 3).

Table 3. Features of a well-formulated question on effectiveness intervention – the PICO guide.
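
To make the framework concrete, the short sketch below (written in Python purely for illustration; the wording of the question is hypothetical) shows how an intervention question might be broken down into the four PICO components before the search and review work begins.

  # Illustrative only: a hypothetical question broken into PICO components.
  pico_question = {
      "patient":      "adults with depression and a chronic physical health problem",
      "intervention": "collaborative care",
      "comparison":   "usual care",
      "outcome":      "remission of depressive symptoms",
  }

  # The components can then be reassembled into a reviewable question.
  print("In {patient}, is {intervention} more effective than {comparison} "
        "in achieving {outcome}?".format(**pico_question))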

Questions relating to diagnosis do not involve an intervention designed to treat a particular condition; therefore, the PICO framework was not used. Rather, the questions were designed to pick up key issues specifically relevant to diagnostic tests – for example, their accuracy, reliability, safety and acceptability to the patient.

To help facilitate the literature review, a note was made of the best study design type to answer each question. There are four main types of clinical question of relevance to NICE guidelines. These are listed in Table 4. For each type of question, the best primary study design varies, where ‘best’ is interpreted as ‘least likely to give misleading answers to the question’.

Table 4. Best study design to answer each type of question.

However, in all cases, a well-conducted systematic review of the appropriate type of study is likely to yield a better answer than a single study.

Deciding on the best design type to answer a specific clinical or public health question does not mean that studies of different design types addressing the same question were discarded.

3.5. SYSTEMATIC CLINICAL LITERATURE REVIEW

The aim of the clinical literature review was to systematically identify and synthesise relevant evidence from the literature to answer the specific clinical questions developed by the GDG. Thus, clinical practice recommendations are evidence-based where possible and, if evidence is not available, informal consensus methods are used (see Section 3.5.12) and the need for future research is specified.

3.5.1. Methodology

A step-wise, hierarchical approach was taken to locating and presenting evidence to the GDG. The NCCMH developed this process based on methods set out in The Guidelines Manual (NICE, 2007b) and after considering recommendations from a range of other sources. These included:

  • Clinical Policy and Practice Program of the New South Wales Department of Health (Australia)
  • Clinical Evidence online
  • The Cochrane Collaboration
  • Grading of Recommendations: Assessment, Development and Evaluation (GRADE) Working Group.
  • New Zealand Guidelines Group
  • NHS Centre for Reviews and Dissemination
  • Oxford Centre for Evidence-Based Medicine
  • Oxford Systematic Review Development Programme
  • Scottish Intercollegiate Guidelines Network (SIGN)
  • United States Agency for Healthcare Research and Quality.

3.5.2. The review process

During the development of the scope, a more extensive search was undertaken for systematic reviews and guidelines published since the first NICE depression guideline. These were used to inform the development of review protocols for each topic group. Review protocols included the relevant clinical question(s), the search strategy, the criteria for assessing the eligibility of studies, and any additional assessments.

The initial approach taken to locating primary-level studies depended on the type of clinical question and potential availability of evidence. Based on the first NICE depression guideline and GDG knowledge of the literature, a decision was made about which questions were best addressed by good practice based on expert opinion, which questions were likely to have a good evidence base and which questions were likely to have little or no directly relevant evidence. Recommendations based on good practice were developed by informal consensus of the GDG. For questions with a good evidence base, the review process depended on the type of key question (see below). For questions that were unlikely to have a good evidence base, a brief descriptive review was initially undertaken by a member of the GDG.

Searches for evidence were updated between 6 and 8 weeks before the guideline consultation. After this point, studies were included only if they were judged by the GDG to be exceptional (for example, the evidence was likely to change a recommendation).

3.5.3. The search process for questions concerning interventions

For questions related to interventions, the initial evidence base was formed from well-conducted RCTs that addressed at least one of the clinical questions. Although there are a number of difficulties with the use of RCTs in the evaluation of interventions in mental health, the RCT remains the most important method for establishing treatment efficacy. For other clinical questions, searches were for the appropriate study design (see above).

The search was exhaustive, using several databases and other sources. For RCTs, the search consisted of terms relating to the clinical condition (that is, depression and a chronic physical health problem) and study design only. This broad approach yielded the largest number of relevant papers, including those that might be missed by more specific searches built around additional elements of the question, such as the interventions and outcomes of interest. The GDG did not limit the search to any particular therapeutic modality. Standard mental-health-related bibliographic databases (that is, CINAHL, Cochrane Library, EMBASE, MEDLINE, PsycINFO) were used for the initial search for all studies potentially relevant to the guideline. Where the evidence base was large, recent high-quality English-language systematic reviews were used primarily as a source of RCTs (see Appendix 11 for the quality criteria used to assess systematic reviews). However, in some circumstances existing datasets were utilised. Where this was the case, data were cross-checked for accuracy before use. New RCTs meeting inclusion criteria set by the GDG were incorporated into the existing reviews and fresh analyses performed.

After the initial search, results were scanned liberally to exclude irrelevant papers. The review team used a purpose-built ‘study information’ database to manage both the included and the excluded studies (eligibility criteria were developed after consultation with the GDG). Double checking of all excluded studies was not done routinely, but a selection of abstracts was checked to ensure reliability of the sifting. For questions without good-quality evidence (after the initial search), a decision was made by the GDG about whether to (a) repeat the search using subject-specific databases (for example, AMED, ERIC, OpenSIGLE or Sociological Abstracts), (b) conduct a new search for lower levels of evidence or (c) adopt a consensus process (see Section 3.5.12). Future guidelines will be able to update and extend the usable evidence base starting from the evidence collected, synthesised and analysed for this guideline.

In addition, searches were made of the reference lists of all eligible systematic reviews and included studies, as well as the list of evidence submitted by stakeholders. Known experts in the field, based both on the references identified in early steps and on advice from GDG members, were sent letters requesting relevant studies that were in the process of being published (see Appendix 6)2. In addition, the tables of contents of appropriate journals were periodically checked for relevant studies.

3.5.4. The search process for questions of diagnosis and prognosis

For questions related to diagnosis, case identification and prognosis, the search process was the same as described above, except that the initial evidence base was formed from studies with the most appropriate and reliable design to answer the particular question. That is, for questions about diagnosis, the initial search was for cross-sectional studies; for questions about prognosis, it was for cohort studies of representative patients. In situations where it was not possible to identify a substantial body of appropriately designed studies that directly addressed each clinical question, a consensus process was adopted (see Section 3.5.12).

3.5.5. Search filters

Search filters developed by the review team consisted of a combination of subject heading and free-text phrases. Specific filters were developed for the guideline topic and, where necessary, for each clinical question. In addition, the review team used filters developed for systematic reviews, RCTs and other appropriate research designs (Appendix 9).

3.5.6. Study selection

All primary-level studies included after the first scan of citations were acquired in full and re-evaluated for eligibility (based on the relevant review protocol) at the time they were being entered into the study database. Appendix 8 lists the standard inclusion and exclusion criteria. More specific eligibility criteria were developed for each clinical question and are described in the relevant clinical evidence chapters. Eligible systematic reviews and primary-level studies were critically appraised for methodological quality (see Appendix 11 for the quality checklists and Appendix 18 for characteristics of each study including quality assessment). The eligibility of each study was confirmed by at least one member of the appropriate topic group.

For some clinical questions, it was necessary to prioritise the evidence with respect to the UK context (that is, external validity). To make this process explicit, the topic groups took into account the following factors when assessing the evidence:

  • participant factors (for example, gender, age and ethnicity)
  • provider factors (for example, model fidelity, the conditions under which the intervention was performed and the availability of experienced staff to undertake the procedure)
  • cultural factors (for example, differences in standard care and differences in the welfare system).

It was the responsibility of each topic group to decide which prioritisation factors were relevant to each clinical question in light of the UK context and then decide how they should modify their recommendations.

3.5.7. Unpublished evidence

The GDG used a number of criteria when deciding whether or not to accept unpublished data. First, the evidence must have been accompanied by a trial report containing sufficient detail to properly assess the quality of the research; second, where evidence was submitted directly to the GDG, it had to be submitted with the understanding that details would be published in the full guideline. However, the GDG recognised that unpublished evidence submitted by investigators might later be retracted by those investigators if the inclusion of such data would jeopardise publication of their research.

3.5.8. Data extraction

Study characteristics and outcome data were extracted from all eligible studies that met the minimum quality criteria, using the study database and Review Manager 4.2.7 (Cochrane Collaboration, 2004) for most outcomes. Study characteristics and outcome data on diagnostic accuracy (see Appendix 20) were extracted using Word-based forms and Stata 10 (StataCorp, 2007).

In most circumstances, for a given outcome (continuous or dichotomous), where more than 50% of the number randomised to any group were lost to follow-up, the data were excluded from the analysis (except for the outcome ‘leaving the study early’, in which case the denominator was the number randomised). Where possible, dichotomous efficacy outcomes were calculated on an intention-to-treat basis (that is, a ‘once-randomised-always-analyse’ basis). Where there was good evidence that participants who ceased to engage in the study were likely to have an unfavourable outcome, early withdrawals were included in both the numerator and denominator. Adverse effects were entered into Review Manager as reported by the study authors because it was usually not possible to determine whether early withdrawals had an unfavourable outcome. Where the data for a particular review were limited, the 50% rule was not applied; in these circumstances, the evidence was downgraded because of the risk of bias.
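
These rules can be expressed as a minimal sketch (in Python, for illustration only; the function names are invented, while the 50% threshold and the intention-to-treat convention are those described in the text above).

  def outcome_usable(n_randomised, n_lost, outcome="efficacy"):
      """Exclude an outcome if more than 50% of those randomised to any group
      were lost to follow-up; 'leaving the study early' is always kept."""
      if outcome == "leaving the study early":
          return True
      return n_lost / n_randomised <= 0.5

  def itt_event_rate(events_observed, early_withdrawals, n_randomised):
      """Intention-to-treat ('once-randomised-always-analyse') event rate,
      counting early withdrawals as events where, as described above, they
      are assumed to have had an unfavourable outcome."""
      return (events_observed + early_withdrawals) / n_randomised

  print(outcome_usable(n_randomised=50, n_lost=30))   # False (60% lost to follow-up)
  print(itt_event_rate(12, 10, 50))                   # 0.44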

Where necessary, standard deviations were calculated from standard errors (SEs), confidence intervals (CIs) or p-values according to standard formulae (see the Cochrane Reviewers' Handbook 4.2.7 [Cochrane Collaboration, 2008]). Data were summarised using the generic inverse variance method in Review Manager 4.2.7 (Cochrane Collaboration, 2004).
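
For illustration, two of these standard conversions can be written out as follows (a minimal Python sketch; the formulae are the usual ones for the standard error and the 95% confidence interval of a mean, and the numbers are invented).

  from math import sqrt

  def sd_from_se(se, n):
      """Standard deviation of a sample from the standard error of its mean."""
      return se * sqrt(n)

  def sd_from_ci(lower, upper, n, z=1.96):
      """Standard deviation from a 95% confidence interval around a mean."""
      return sqrt(n) * (upper - lower) / (2 * z)

  print(sd_from_se(0.5, 100))          # 5.0
  print(sd_from_ci(8.0, 11.92, 100))   # 10.0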

Consultation with another reviewer or members of the GDG was used to overcome difficulties with coding. Data from studies included in existing systematic reviews were extracted independently by one reviewer and cross-checked with the existing dataset. Where possible, two independent reviewers extracted data from new studies. Where double data extraction was not possible, data extracted by one reviewer were checked by the second reviewer. Disagreements were resolved with discussion. Where consensus could not be reached, a third reviewer or GDG members resolved the disagreement. Masked assessment (that is, blind to the journal from which the article comes, the authors, the institution and the magnitude of the effect) was not used since it is unclear that doing so reduces bias (Berlin, 2001; Jadad et al., 1996).

3.5.9. Synthesising the evidence

Analysis of efficacy studies

Where possible, meta-analysis was used to synthesise the evidence using Review Manager 4.2.7 (Cochrane Collaboration, 2004) for effectiveness data and Stata 10 for diagnostic accuracy. If necessary, reanalyses of the data or sub-analyses were used to answer clinical questions not addressed in the original studies or reviews. Studies have been given a ‘study ID’ to make them easier to identify in the text, tables and appendices of this guideline. Study IDs are composed of the first author's surname followed by the year of publication. References to included and excluded studies can be found in Appendix 18.

Dichotomous outcomes were analysed as relative risks (RR) with the associated 95% CI (for an example, see Figure 1). A relative risk (also called a ‘risk ratio’) is the ratio of the treatment event rate to the control event rate. An RR of 1 indicates no difference between treatment and control. In Figure 1, the overall RR of 0.73 indicates that the event rate (that is, non-remission rate) associated with intervention A is about three quarters of that with the control intervention or, in other words, the RR reduction is 27%.

Figure 1. Example of a forest plot displaying dichotomous data.

The CI shows that 95% of the time the true treatment effect will lie within this range and can be used to determine statistical significance. If the CI does not cross the ‘line of no effect’, the effect is statistically significant.
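
As an illustration of how such a result is computed, the sketch below derives a relative risk and its 95% CI from raw event counts using the standard log risk ratio method (Python, for illustration only; the counts are invented so that the RR roughly matches the 0.73 quoted above and are not taken from Figure 1).

  from math import exp, log, sqrt

  def relative_risk(events_tx, n_tx, events_ctl, n_ctl):
      rr = (events_tx / n_tx) / (events_ctl / n_ctl)
      se_log_rr = sqrt(1/events_tx - 1/n_tx + 1/events_ctl - 1/n_ctl)
      ci = (exp(log(rr) - 1.96 * se_log_rr), exp(log(rr) + 1.96 * se_log_rr))
      return rr, ci

  rr, (lower, upper) = relative_risk(events_tx=44, n_tx=100, events_ctl=60, n_ctl=100)
  print(round(rr, 2), round(lower, 2), round(upper, 2))   # 0.73 0.56 0.96
  # The CI does not cross 1 (the 'line of no effect'), so this result would be
  # read as statistically significant.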

Continuous outcomes were analysed as weighted mean differences (WMD), or as a standardised mean difference (SMD) when different measures were used in different studies to estimate the same underlying effect (for an example, see Figure 2). If provided, intention-to-treat data, using a method such as ‘last observation carried forward’, were preferred over data from completers.

Figure 2. Example of a forest plot displaying continuous data.
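
Where a standardised mean difference is needed, the calculation divides the difference in means by the pooled standard deviation, as in the short sketch below (Python, for illustration only; a Cohen's d form with invented values).

  from math import sqrt

  def smd(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl):
      pooled_sd = sqrt(((n_tx - 1) * sd_tx**2 + (n_ctl - 1) * sd_ctl**2)
                       / (n_tx + n_ctl - 2))
      return (mean_tx - mean_ctl) / pooled_sd

  # Trials using different depression scales can be combined on the SMD scale
  # even though their raw scores are not directly comparable.
  print(round(smd(12.0, 6.0, 50, 15.0, 6.5, 50), 2))   # -0.48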

To check for consistency between studies, both the I² statistic and a visual inspection of the forest plots were used. The I² statistic describes the proportion of total variation in study estimates that is due to heterogeneity (Higgins & Thompson, 2002). It was interpreted in the following way (a short worked calculation follows the list):

  • >50%: notable heterogeneity (an attempt was made to explain the variation by conducting sub-analyses to examine potential moderators. In addition, studies with effect sizes greater than two standard deviations from the mean of the remaining studies were excluded using sensitivity analyses. If studies with heterogeneous results were found to be comparable with regard to study and participant characteristics, a random-effects model was used to summarise the results [DerSimonian & Laird, 1986]. In the random-effects analysis, heterogeneity is accounted for both in the width of CIs and in the estimate of the treatment effect. With decreasing heterogeneity the random-effects approach moves asymptotically towards a fixed-effects model).
  • 30 to 50%: moderate heterogeneity (both the chi-squared test of heterogeneity and a visual inspection of the forest plot were used to decide between a fixed and random-effects model).
  • <30%: mild heterogeneity (a fixed-effects model was used to synthesise the results).
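
The I² calculation itself is straightforward, as the sketch below shows (Python, for illustration; Q is Cochran's heterogeneity statistic and the example values are invented).

  def i_squared(q, n_studies):
      """Percentage of total variation across studies attributable to
      heterogeneity rather than chance."""
      df = n_studies - 1
      return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

  # Example: Q = 30 across 11 studies (df = 10) gives an I-squared of about 67%,
  # which under the thresholds above would count as notable heterogeneity.
  print(round(i_squared(q=30.0, n_studies=11), 1))   # 66.7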

To explore the possibility that the results entered into each meta-analysis suffered from publication bias, data from included studies were entered, where there was sufficient data, into a funnel plot. Asymmetry of the plot was taken to indicate possible publication bias and investigated further.

An estimate of the proportion of eligible data that were missing (because some studies did not include all relevant outcomes) was calculated for each analysis.

Included/excluded studies tables, generated automatically from the study database, were used to summarise general information about each study (see Appendix 18). Where meta-analysis was not appropriate and/or possible, the reported results from each primary-level study were also presented in the included studies table (and included, where appropriate, in a narrative review).

Analysis of diagnostic accuracy studies

The main outcomes extracted for diagnostic accuracy studies were sensitivity, specificity, positive predictive value and negative predictive value. These are discussed in detail below. In addition, negative likelihood ratios, positive likelihood ratios and the area under the curve (AUC) are briefly described.

The sensitivity of an instrument refers to the proportion of those with the condition who test positive. An instrument that detects a low percentage of cases will not be very helpful in determining the numbers of patients who should receive a known effective treatment, as many individuals who should receive the treatment will not do so. This would lead to poor planning, with underestimation of the prevalence of the disorder and of the costs of treatments to the community. As the sensitivity of an instrument increases, the number of false negatives will decrease.

The specificity of an instrument refers to the proportion of those without the condition who test negative. This is important so that healthy individuals are not given treatments they do not need. As the specificity of an instrument increases, the number of false positives will decrease.

To illustrate this: from a population in which the point prevalence rate of depression is 10% (that is, 10% of the population has depression at any one time), 1000 people are given a test which has 90% sensitivity and 85% specificity. It is known that 100 people in this population have depression, but the test detects only 90 (true positives), leaving 10 undetected (false negatives). It is also known that 900 people do not have depression, and the test correctly identifies 765 of these (true negatives), but classifies 135 incorrectly as having depression (false positives). The positive predictive value of the test (the number correctly identified as having depression as a proportion of positive tests) is 40% (90/(90 + 135)), and the negative predictive value (the number correctly identified as not having depression as a proportion of negative tests) is 98% (765/(765 + 10)). Therefore, in this example a positive test result is correct in only 40% of cases, whilst a negative result can be relied upon in 98% of cases.
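
The same worked example can be reproduced step by step, as in the short sketch below (Python, purely as an illustrative check of the figures quoted above).

  n, prevalence, sensitivity, specificity = 1000, 0.10, 0.90, 0.85

  with_condition = n * prevalence                         # 100
  without_condition = n - with_condition                  # 900

  true_positives = sensitivity * with_condition           # 90
  false_negatives = with_condition - true_positives       # 10
  true_negatives = specificity * without_condition        # 765
  false_positives = without_condition - true_negatives    # 135

  ppv = true_positives / (true_positives + false_positives)   # 90/225 = 0.40
  npv = true_negatives / (true_negatives + false_negatives)   # 765/775 = 0.987
  print(round(ppv, 2), round(npv, 3))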

The example above illustrates some of the main differences between positive and negative predictive values on the one hand and sensitivity and specificity on the other. Prevalence explicitly forms part of the calculation of both positive and negative predictive values (see Altman & Bland, 1994a). When the prevalence of a disorder is low in a population, this is generally associated with a higher negative predictive value and a lower positive predictive value. Therefore, although these statistics are concerned with issues probably more directly applicable to clinical practice (for example, the probability that a person with a positive test result actually has depression), they are largely dependent on the characteristics of the population sampled and cannot be universally applied (Altman & Bland, 1994a).

In contrast, sensitivity and specificity do not theoretically depend on prevalence (Altman & Bland, 1994b). For example, sensitivity is concerned with the performance of an identification test conditional on a person having depression. Therefore the higher false positives often associated with samples of low prevalence will not affect such estimates. The advantage of this approach is that sensitivity and specificity can be applied across populations (Altman & Bland, 1994b). However, the main disadvantage is that clinicians tend to find such estimates more difficult to interpret.

When describing the sensitivity and specificity of the different instruments, the GDG defined ‘excellent’ as values above 0.9, ‘good’ as 0.8 to 0.9, ‘moderate’ as 0.5 to 0.7, ‘low’ as 0.3 to 0.5, and ‘poor’ as less than 0.3.

Receiver operating characteristic curves

The qualities of a particular tool are summarised in a receiver operating characteristic (ROC) curve, which plots sensitivity (expressed as a percentage) against 100 − specificity (see Figure 3).

Figure 3. ROC curve.

A test with perfect discrimination would have an ROC curve passing through the top left-hand corner; that is, it would have 100% sensitivity and 100% specificity, picking up all true positives with no false positives. Whilst this is never achieved in practice, the AUC measures how close the tool gets to this theoretical ideal. A perfect test would have an AUC of 1, and a test with an AUC above 0.5 performs better than chance. As discussed above, because these measures are based on sensitivity and 100 − specificity, the estimates are theoretically unaffected by prevalence.
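
As an illustration of how an AUC can be obtained from a handful of operating points, the sketch below applies simple trapezoidal integration (Python; the cut-off points are hypothetical and not taken from any instrument reviewed in this guideline).

  def auc(operating_points):
      """operating_points: (1 - specificity, sensitivity) pairs; the curve is
      anchored at (0, 0) and (1, 1)."""
      pts = sorted(operating_points + [(0.0, 0.0), (1.0, 1.0)])
      return sum((x2 - x1) * (y1 + y2) / 2
                 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

  # Hypothetical operating points at increasingly lenient cut-offs.
  print(round(auc([(0.05, 0.60), (0.15, 0.90), (0.30, 0.95)]), 2))   # 0.91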

Negative and positive likelihood ratios

Negative (LR−) and positive (LR+) likelihood ratios address similar questions to negative and positive predictive values, for example, whether a person with a positive test actually has the disorder. The main difference is that likelihood ratios are thought not to be dependent on prevalence. LR+ is calculated as sensitivity/(1 − specificity), and LR− as (1 − sensitivity)/specificity. A value of LR+ greater than 5 and LR− less than 0.3 suggests the test is relatively accurate (Fischer et al., 2003).
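
Applied to the screening example given earlier (90% sensitivity, 85% specificity), these formulae give the following values (a quick illustrative check in Python).

  sensitivity, specificity = 0.90, 0.85

  lr_plus = sensitivity / (1 - specificity)     # 6.0, above the suggested threshold of 5
  lr_minus = (1 - sensitivity) / specificity    # about 0.12, below the threshold of 0.3
  print(round(lr_plus, 1), round(lr_minus, 2))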

Diagnostic odds ratios

The diagnostic odds ratio is calculated as (sensitivity × specificity)/[(1-sensitivity) × (1-specificity)] and is relatively independent of changes in prevalence. Tools with diagnostic odds ratios greater than 20 are likely to be useful for clinical practice.
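
For the same example (90% sensitivity, 85% specificity), the diagnostic odds ratio works out at about 51, comfortably above the threshold of 20 (a brief illustrative check in Python).

  sensitivity, specificity = 0.90, 0.85

  dor = (sensitivity * specificity) / ((1 - sensitivity) * (1 - specificity))
  print(round(dor, 1))   # 51.0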

3.5.10. Presenting the data to the Guideline Development Group

Study characteristics tables and, where appropriate, forest plots generated with Review Manager 4.2.7 (Cochrane Collaboration, 2004) were presented to the GDG to prepare a GRADE evidence profile table for each review and to develop recommendations.

Evidence profile tables

With the exception of diagnostic studies (for which the methods are at present not sufficiently developed), a GRADE evidence profile was used to summarise both the quality of the evidence and the results of the evidence synthesis (see Table 5 for an example of an evidence profile). For each outcome, quality may be reduced depending on the following factors:

Table 5. Example of GRADE evidence profile.

  • study design (randomised trial, observational study, or any other evidence)
  • limitations (based on the quality of individual studies; see Appendix 11 for the quality checklists)
  • inconsistency (see Section 3.5.9 for how consistency was measured)
  • indirectness (that is, how closely the outcome measures, interventions and participants match those of interest)
  • imprecision (based on the CI around the effect size).

For observational studies, the quality may be increased if there is a large effect, if plausible confounding would have changed the effect, or if there is evidence of a dose–response gradient (details would be provided under the ‘other considerations’ column). Each evidence profile also included a summary of the findings: the number of patients included in each group, an estimate of the magnitude of the effect, and the overall quality of the evidence for each outcome.

The quality of the evidence was based on the quality assessment components (study design, limitations to study quality, consistency, directness and any other considerations) and graded using the following definitions:

  • High = further research is very unlikely to change our confidence in the estimate of the effect
  • Moderate = further research is likely to have an important impact on our confidence in the estimate of the effect and may change the estimate
  • Low = further research is very likely to have an important impact on our confidence in the estimate of the effect and is likely to change the estimate
  • Very low = any estimate of effect is very uncertain.

For further information about the process and the rationale of producing an evidence profile table, see GRADE (2004).

3.5.11. Forming the clinical summaries and recommendations

Once the GRADE profile tables relating to a particular clinical question were completed, summary tables incorporating important information from the GRADE profiles were developed (these tables are presented in the evidence chapters; the full profiles are in Appendix 21).

The evidence base for depression in adults with a chronic physical health problem was much more limited than the literature for depression in the general population. In the judgement of the GDG, the nature of depression in the physically ill is not fundamentally different from that in the broader population without an additional physical illness. Therefore, the GDG decided to draw upon the evidence for depression more generally when forming recommendations. In doing so, the GDG for this guideline worked closely with the GDG updating the depression guideline (NICE, 2009; NCCMH, 2010) and discussed the clinical questions and the outcomes of the reviews with them.

Extrapolating evidence from other populations is a complex process, therefore it is important to have transparent and clear principles guiding these judgements. Table 6 summarises the main principles used by the GDG and examples of these in the guideline. Where there was evidence regarding people with a chronic physical health problem that contradicted that found in the general population, then extrapolation did not take place. When there were congruent findings (positive or negative evidence) in both the general population and the physically ill population, then evidence from both populations was considered. When there was positive evidence in the general population but no clear or robust evidence in the physically ill, then decisions on extrapolation were determined by the GDG.

Table 6. Principles for extrapolating from general depression population.

Finally, the systematic reviewer, in conjunction with the topic group lead, produced a clinical evidence summary. Once the GRADE profiles and clinical summaries were finalised and agreed by the GDG, and the evidence from depression in the general population had been taken into account, the associated recommendations were drafted, taking into account the trade-off between the benefits and downsides of treatment as well as other important factors. These included economic considerations, the values of the GDG and society, and the GDG's awareness of practical issues (Eccles et al., 1998). The confidence surrounding the evidence in the depression update guideline (NCCMH, 2010) also influenced the GDG's decision to extrapolate.

3.5.12. Method used to answer a clinical question in the absence of appropriately designed, high-quality research

In the absence of appropriately designed, high-quality research, or where the GDG was of the opinion (on the basis of previous searches or its knowledge of the literature) that such evidence was unlikely to exist, either an informal or formal consensus process was adopted. This process focused on those questions that the GDG considered a priority.

Informal consensus

The starting point for the process of informal consensus was that a member of the topic group identified, with help from the systematic reviewer, a narrative review that most directly addressed the clinical question. Where this was not possible, a brief review of the recent literature was initiated.

This existing narrative review or new review was used as a basis for beginning an iterative process to identify lower levels of evidence relevant to the clinical question and to lead to written statements for the guideline. The process involved a number of steps:

  • A description of what is known about the issues concerning the clinical question was written by one of the topic group members.
  • Evidence from the existing review or new review was then presented in narrative form to the GDG and further comments were sought about the evidence and its perceived relevance to the clinical question.
  • Based on the feedback from the GDG, additional information was sought and added to the information collected. This could include studies that did not directly address the clinical question but were thought to contain relevant data.
  • If, during the course of preparing the report, a significant body of primary-level studies (of appropriate design to answer the question) was identified, a full systematic review was done.
  • At this time, subject possibly to further reviews of the evidence, a series of statements that directly addressed the clinical question were developed.
  • Following this, on occasions and as deemed appropriate by the GDG, the report was then sent to appointed experts outside the GDG for peer review and comment. The information from this process was then fed back to the GDG for further discussion of the statements.
  • Recommendations were then developed and could also be sent for further external peer review.
  • After this final stage of comment, the statements and recommendations were again reviewed and agreed upon by the GDG.

3.6. HEALTH ECONOMICS METHODS

The aim of the health economics work was to contribute to the guideline's development by providing evidence on the cost effectiveness of interventions for people with depression and a chronic physical health problem covered in the guideline. This was achieved by:

  • a systematic literature review of existing economic evidence
  • economic modelling, where economic evidence was lacking or was considered inadequate to inform decisions; areas for further economic analysis were prioritised based on anticipated resource implications of the respective recommendations as well as on the quality and availability of respective clinical data.

A systematic search of the economic literature was undertaken for all areas covered in this guideline. Moreover, the literature on health-related quality of life (HRQoL) of people with depression was systematically searched to identify studies reporting utility weights appropriate for people with a comorbid chronic physical health problem that could be utilised in a cost–utility analysis.

In addition to the systematic review of economic literature, the following economic issues were identified by the GDG in collaboration with the health economist as key priorities for economic modelling in this guideline:

  • cost effectiveness of collaborative care versus usual care in the care of those with moderate and severe depression and a chronic physical health problem
  • cost analysis of low-intensity psychological interventions.

These topics were selected after considering potential resource implications of the respective recommendations.

The rest of this section describes the methods adopted in the systematic literature review of economic studies undertaken for this guideline. Methods employed in de novo economic modelling carried out for this guideline are described in the respective sections of the guideline.

3.6.1. Search strategy

For the systematic review of economic evidence, the standard mental-health-related bibliographic databases (EMBASE, MEDLINE, CINAHL and PsycINFO) were searched. For these databases, a health economics search filter adapted from the Centre for Reviews and Dissemination at the University of York was used in combination with a general search strategy for depression. Additional searches were performed in specific health economics databases (the NHS Economic Evaluation Database [NHS EED] and the Office of Health Economics Health Economic Evaluation Database [OHE HEED]), as well as in the Health Technology Assessment (HTA) database. For the HTA and NHS EED databases, the general strategy for depression was used; OHE HEED was searched using a shorter, database-specific strategy. Initial searches were performed in early 2008. The searches were updated regularly, with the final search performed in January 2009. Details of the search strategy for economic studies on interventions for people with depression and a chronic physical health problem are provided in Appendix 13.

In parallel to searches of electronic databases, reference lists of eligible studies and relevant reviews were searched by hand. Studies included in the clinical evidence review were also screened for economic evidence.

The systematic search of the literature identified approximately 35,000 references (stage 1). Publications that were clearly not relevant were excluded (stage 2). The abstracts of all potentially relevant publications were then assessed against a set of selection criteria by the health economist (stage 3). Full texts of the studies potentially meeting the selection criteria (including those for which eligibility was not clear from the abstract) were obtained (stage 4). Studies that did not meet the inclusion criteria, were duplicates, were secondary publications to a previous study, or had been updated in more recent publications were subsequently excluded (stage 5). Finally, all papers eligible for inclusion were assessed for internal validity and critically appraised (stage 6). The quality assessment was based on the checklists used by the British Medical Journal to assist referees in appraising full and partial economic analyses (Drummond & Jefferson, 1996) (see Appendix 14).

3.6.2. Selection criteria

The following inclusion criteria were applied to select studies identified by the economic searches for further analysis:

  • Only papers published in the English language were considered.
  • Studies published from 1998 onwards were included. This date restriction was imposed in order to obtain data relevant to current healthcare settings and costs.
  • Only studies from Organisation for Economic Co-operation and Development countries were included, as the aim of the review was to identify economic information transferable to the UK context.
  • Selection criteria based on types of clinical conditions and patients were identical to the clinical literature review.
  • Studies were included provided that sufficient details regarding methods and results were available to enable the methodological quality of the study to be assessed, and provided that the study's data and results were extractable. Poster presentations and abstracts were excluded from the review.
  • Full economic evaluations that compared two or more relevant options and considered both costs and consequences (that is, cost–consequence analysis, cost-effectiveness analysis, cost–utility analysis or cost–benefit analysis) were included in the review.
  • Studies were included if they used clinical effectiveness data from an RCT, a prospective cohort study, or a systematic review and meta-analysis of clinical studies. Studies were excluded if they had a mirror-image or other retrospective design, or if they utilised efficacy data that were based mainly on assumptions.

3.6.3. Data extraction

Data were extracted by the health economist using a standard economic data extraction form (Appendix 15).

3.6.4. Presentation of economic evidence

The economic evidence identified by the health economics systematic review is summarised in the respective chapters of the guideline, following presentation of the clinical evidence. The references to included studies, as well as the evidence tables with the characteristics and results of economic studies included in the review, are provided in Appendix 17. Methods and results of economic modelling are reported in the economic sections of the respective evidence chapters.

3.7. METHODS FOR REVIEWING EXPERIENCE OF CARE

3.7.1. Introduction

The chapter on experience of care (Chapter 4) presents two different types of evidence: interviews from the Healthtalkonline website (www.healthtalkonline.org) and a review of the qualitative literature.

3.7.2. Interviews from Healthtalkonline

Using the interviews of people with depression and a chronic physical health problem available from healthtalkonline.org, the review team analysed the data and identified emergent themes. Each transcript was read and re-read, and sections of the text were collected under different headings using a qualitative software program (NVivo). Two reviewers independently coded the data and all themes were discussed to generate a list of the main themes. The evidence is presented in the form of these themes, with selected quotations from the interviews. The methods used to synthesise the qualitative data are in line with good practice (Braun & Clarke, 2006).

3.7.3. Review of the qualitative literature

A systematic search for published reviews of relevant qualitative studies of people with depression and a chronic physical health problem was undertaken using standard NCCMH procedures as described in the other evidence chapters. Reviews were sought of qualitative studies that used relevant first-hand experiences of people with depression and a chronic physical health problem and their families or carers. The GDG did not specify a particular outcome. Instead, the review was concerned with any narrative data that highlighted the experience of care. The evidence is presented in the form of themes, which were again developed and reviewed by the topic group.

3.7.4. From evidence to recommendations

The themes emerging from the qualitative analysis of the Healthtalkonline transcripts and the literature review were reviewed by the topic group. They are summarised in Chapter 4 and this summary provides the evidence for the recommendations that appear in that chapter.

3.8. STAKEHOLDER CONTRIBUTIONS

Professionals, people with depression and a chronic physical health problem and companies have contributed to and commented on the guideline at key stages in its development. Stakeholders for this guideline include:

  • people with depression and a chronic physical health problem/carer stakeholders: the national organisations for people with depression and a chronic physical health problem and carers that represent people whose care is described in this guideline
  • professional stakeholders: the national organisations that represent healthcare professionals who are providing services to people with depression and a chronic physical health problem
  • commercial stakeholders: the companies that manufacture medicines used in the treatment of depression in people with a chronic physical health problem
  • Primary Care Trusts
  • Department of Health and Welsh Assembly Government.

Stakeholders have been involved in the guideline's development at the following points:

  • commenting on the initial scope of the guideline and attending a briefing meeting held by NICE
  • contributing possible clinical questions and lists of evidence to the GDG
  • commenting on the draft of the guideline (see Appendices 4 and 5)
  • highlighting factual errors in the pre-publication check.

3.9. VALIDATION OF THE GUIDELINE

Registered stakeholders had an opportunity to comment on the draft guideline, which was posted on the NICE website during the consultation period. Following the consultation, all comments from stakeholders and others were responded to, and the guideline updated as appropriate. The GRP also reviewed the guideline and checked that stakeholders' comments had been addressed.

Following the consultation period, the GDG finalised the recommendations and the NCCMH produced the final documents. These were then submitted to NICE for the pre-publication check where stakeholders were given the opportunity to highlight factual errors. Any errors were corrected by the NCCMH, then the guideline was formally approved by NICE and issued as guidance to the NHS in England and Wales.

Footnotes

2. Unpublished full trial reports were also accepted where sufficient information was available to judge eligibility and quality (see Section 3.5.7).