METHODS USED TO DEVELOP THIS GUIDELINE

National Collaborating Centre for Mental Health (UK)

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Collaborating Centre for Mental Health (UK). Alcohol-Use Disorders: Diagnosis, Assessment and Management of Harmful Drinking and Alcohol Dependence. Leicester (UK): British Psychological Society (UK); 2011. (NICE Clinical Guidelines, No. 115.)

3. METHODS USED TO DEVELOP THIS GUIDELINE

3.1. OVERVIEW

The development of this guideline drew upon methods outlined by NICE (further information is available in The Guidelines Manual; NICE, 2009a). A team of health professionals, lay representatives and technical experts known as the Guideline Development Group (GDG), with support from the NCCMH staff, undertook the development of a patient-centred, evidence-based guideline. There are six basic steps in the process of developing a guideline:

Define the scope, which sets the parameters of the guideline and provides a focus and steer for the development work.
Define review questions considered important for practitioners and service users.
Develop criteria for evidence searching and search for evidence.
Design validated protocols for systematic review and apply to evidence recovered by search.
Synthesise and (meta-) analyse data retrieved, guided by the review questions, and produce Grading of Recommendations: Assessment, Development and Evaluation (GRADE) evidence profiles and summaries.
Answer review questions with evidence-based recommendations for clinical practice.

The clinical practice recommendations made by the GDG are therefore derived from the most up-to-date and robust evidence base for the clinical and cost effectiveness of the treatments and services used in the treatment and management of alcohol dependence and harmful alcohol use. In addition, to ensure a service user and carer focus, the concerns of service users and carers regarding health and social care have been highlighted and addressed by recommendations agreed by the whole GDG.

3.2. THE SCOPE

Guideline topics are selected by the Department of Health and the Welsh Assembly Government, which identify the main areas to be covered by the guideline in a specific remit (see The Guidelines Manual [NICE, 2009a] for further information). The NCCMH developed a scope for the guideline based on the remit. The purpose of the scope is to:

provide an overview of what the guideline will include and exclude
identify the key aspects of care that must be included
set the boundaries of the development work and provide a clear framework to enable work to stay within the priorities agreed by NICE and the National Collaborating Centre, and the remit from the Department of Health/Welsh Assembly Government
inform the development of the review questions and search strategy
inform professionals and the public about expected content of the guideline
keep the guideline to a reasonable size to ensure that its development can be carried out within the allocated period.

An initial draft of the scope was sent to registered stakeholders who had agreed to attend a scoping workshop. The workshop was used to:

obtain feedback on the selected key clinical issues
identify which patient or population subgroups should be specified (if any)
seek views on the composition of the GDG
encourage applications for GDG membership.

The draft scope was subject to consultation with registered stakeholders over a 4-week period. During the consultation period, the scope was posted on the NICE website (www.nice.org.uk). Comments were invited from stakeholder organisations and the Guideline Review Panel (GRP). Further information about the GRP can also be found on the NICE website. The NCCMH and NICE reviewed the scope in light of comments received, and the revised scope was signed off by the GRP.

3.3. THE GUIDELINE DEVELOPMENT GROUP

The GDG consisted of: professionals in psychiatry, clinical psychology, nursing, social work, and general practice; academic experts in psychiatry and psychology; and service user, lay member and carer representatives. The guideline development process was supported by staff from the NCCMH, who undertook the clinical and health economic literature searches, reviewed and presented the evidence to the GDG, managed the process and contributed to drafting the guideline.

3.3.1. Guideline Development Group meetings

Twelve GDG meetings were held between March 2009 and September 2010. During each day-long GDG meeting, in a plenary session, review questions and clinical and economic evidence were reviewed and assessed, and recommendations formulated. At each meeting, all GDG members declared any potential conflicts of interest, and service user and carer concerns were routinely discussed as part of a standing agenda.

3.3.2. Topic groups

The GDG divided its workload along clinically relevant lines to simplify the guideline development process, and GDG members formed smaller topic groups to undertake guideline work in each area of clinical practice. Topic group membership was decided after a discussion between all GDG members, and each topic group was chaired by a GDG member with expert knowledge of the topic area (one of the healthcare professionals). Topic Group 1 covered questions relating to pharmacological intervention; Topic Group 2 covered psychological and psychosocial interventions; Topic Group 3 covered assessment of alcohol misuse; Topic Group 4 covered service user and carer experiences of care; and Topic Group 5 covered delivery settings for treatment. These groups were designed to manage the large volume of evidence appraisal efficiently before it was presented to the GDG as a whole. Topic groups refined the review questions and the clinical definitions of treatment interventions, reviewed and prepared the evidence with the systematic reviewer before presenting it to the GDG as a whole, and helped the GDG to identify further expertise in the topic. Topic group leaders reported the status of the group's work as part of the standing agenda. They also introduced and led the GDG discussion of the evidence review for that topic and assisted the GDG Chair in drafting the section of the guideline relevant to the work of each topic group. All statements and recommendations in this guideline have been agreed by the whole GDG.

3.3.3. Service users and carers

Individuals with direct experience of services gave an integral service-user focus to the GDG and the guideline. The GDG included service user, carer and lay member representatives who contributed as full GDG members to writing the review questions, helping to ensure that the evidence addressed their views and preferences, highlighting sensitive issues and terminology relevant to the guideline, and bringing service-user research to the attention of the GDG. In drafting the guideline, they contributed to writing Chapter 4 and identified recommendations from the service user and carer perspective.

3.3.4. Special advisors

Special advisors, who had specific expertise in one or more aspects of treatment and management relevant to the guideline, assisted the GDG, commenting on specific aspects of the developing guideline and making presentations to the GDG. Appendix 3 lists those who agreed to act as special advisors.

3.3.5. National and international experts

National and international experts in the area under review were identified through the literature search and through the experience of the GDG members. These experts were contacted to recommend unpublished or soon-to-be published studies to ensure that up-to-date evidence was included in the development of the guideline. They informed the group about completed trials at the pre-publication stage, systematic reviews in the process of being published, studies relating to the cost effectiveness of treatment, and trial data if the GDG could be provided with full access to the complete trial report. Appendix 6 lists researchers who were contacted.

3.3.6. Integration of other guidelines on alcohol-use disorders

In addition to this guideline, there are two other pieces of NICE guidance addressing alcohol-use disorders outlined in Chapter 1. During development, steering group meetings were held in which representatives from the three development groups met to discuss any issues, such as overlapping areas of review work and integration of the guidelines.

3.4. REVIEW QUESTIONS

Review (clinical) questions were used to guide the identification and interrogation of the evidence base relevant to the topic of the guideline. The draft review questions were discussed by the GDG at the first few meetings and amended as necessary. Where appropriate, the questions were refined once the evidence had been searched and, where necessary, subquestions were generated. Questions submitted by stakeholders were also discussed by the GDG and the rationale for not including any questions was recorded in the minutes. The final list of review questions can be found in Appendix 7.

For questions about interventions, the Patient, Intervention, Comparison and Outcome (PICO) framework was used (see Table 2).

Table 2: Features of a well-formulated question on the effectiveness of an intervention (the PICO guide).

Questions relating to assessment and diagnosis do not involve an intervention designed to treat a particular condition; therefore, the PICO framework was not used. Rather, the questions were designed to identify key issues specifically relevant to diagnostic tests, for example their accuracy, reliability and safety.

In some situations, the prognosis of a particular condition is of fundamental importance, over and above its general significance in relation to specific interventions. Areas where this is particularly likely to occur relate to assessment of risk, for example in terms of behaviour modification or screening and early intervention. In addition, review questions related to issues of service delivery are occasionally specified in the remit from the Department of Health/Welsh Assembly Government. In these cases, appropriate review questions were developed to be clear and concise.

To help facilitate the literature review, a note was made of the best study design type to answer each question. There are four main types of review question of relevance to NICE guidelines. These are listed in Table 3. For each type of question the best primary study design varies, where ‘best’ is interpreted as ‘least likely to give misleading answers to the question’.

Table 3: Best study design to answer each type of question.

However, in all cases a well-conducted systematic review (of the appropriate type of study) is likely to yield a better answer than a single study.

Deciding on the best design type to answer a specific review question does not mean that studies of different design types addressing the same question were discarded.

The GDG classified each review question into one of three groups: (1) questions concerning good practice; (2) questions likely to have little or no directly relevant evidence; and (3) questions likely to have a good evidence base. Questions concerning good practice were answered by the GDG using informal consensus. For questions that were unlikely to have a good evidence base, a brief descriptive review was initially undertaken and then the GDG used informal consensus to reach a decision (see Section 3.5.7). For questions with a good evidence base, the review process followed the methods outlined in Section 3.5.1.

3.5. CLINICAL EVIDENCE METHODS

The aim of the clinical evidence review was to systematically identify and synthesise relevant evidence from the literature to answer the specific review questions developed by the GDG. Thus, clinical practice recommendations are evidence-based where possible and, if evidence is not available, informal consensus methods are used (see Section 3.5.7) and the need for future research is specified.

3.5.1. The search process

Scoping searches

A broad preliminary search of the literature was undertaken in September 2008 to obtain an overview of the issues likely to be covered by the scope and to help define key areas. Searches were restricted to clinical guidelines, health technology assessment (HTA) reports, key systematic reviews and randomised controlled trials (RCTs), and were conducted in the following databases and websites:

British Medical Journal Clinical Evidence
Canadian Medical Association (CMA) Infobase (Canadian guidelines)
Clinical Policy and Practice Program of the New South Wales Department of Health (Australia)
Clinical Practice Guidelines (Australian Guidelines)
Cochrane Central Register of Controlled Trials (CENTRAL)
Cochrane Database of Abstracts of Reviews of Effects (DARE)
Cochrane Database of Systematic Reviews (CDSR)
Excerpta Medica Database (EMBASE)
Guidelines International Network (G-I-N)
Health Evidence Bulletin Wales
Health Management Information Consortium (HMIC)
HTA database (technology assessments)
Medical Literature Analysis and Retrieval System Online (MEDLINE)/MEDLINE in Process
National Health and Medical Research Council (NHMRC)
National Library for Health (NLH) Guidelines Finder
New Zealand Guidelines Group
NHS Centre for Reviews and Dissemination (CRD)
OmniMedicalSearch
Scottish Intercollegiate Guidelines Network (SIGN)
Turning Research Into Practice (TRIP)
US Agency for Healthcare Research and Quality (AHRQ)
Websites of NICE and the National Institute for Health Research (NIHR) HTA Programme for guidelines and HTAs in development.

Existing NICE guidelines were updated where necessary. Other relevant guidelines were assessed for quality using the AGREE instrument (AGREE Collaboration, 2003). The evidence base underlying high-quality existing guidelines was utilised and updated as appropriate. Further information about this process can be found in The Guidelines Manual (NICE, 2009a).

Systematic literature searches

After the scope was finalised, a systematic search strategy was developed to locate all the relevant evidence. The balance between sensitivity (the power to identify all studies on a particular topic) and specificity (the ability to exclude irrelevant studies from the results) was carefully considered, and a decision was made to use a broad approach to searching, to maximise the retrieval of evidence relevant to all parts of the guideline. Searches were restricted to: systematic reviews, meta-analyses, RCTs, observational studies, quasi-experimental studies and qualitative research. Searches were conducted in the following databases:

Allied and Complementary Medicine Database (AMED)
Cumulative Index to Nursing and Allied Health Literature (CINAHL)
EMBASE
MEDLINE/MEDLINE In-Process
Psychological Information Database (PsycINFO)
DARE
CDSR
CENTRAL
HTA database.

For standard mainstream bibliographic databases (AMED, CINAHL, EMBASE, MEDLINE and PsycINFO), search terms for alcohol dependence and harmful alcohol use were combined with study design filters for systematic reviews, RCTs and qualitative research. For searches generated in databases with collections of study designs at their focus (DARE, CDSR, CENTRAL and HTA), search terms for alcohol dependence and harmful alcohol use were used without a filter. The sensitivity of this approach was aimed at minimising the risk of overlooking relevant publications, due to inaccurate or incomplete indexing of records, as well as potential weaknesses resulting from more focused search strategies (for example, for interventions).

For focused searches, terms for case management and assertive community treatment (ACT) were combined with terms for alcohol dependence and harmful alcohol use, and filters for observational and quasi-experimental studies.

Reference manager

Citations from each search were downloaded into Reference Manager (a software product for managing references and formatting bibliographies) and duplicates removed. Records were then screened against the inclusion criteria of the reviews before being quality appraised (see Section 3.5.2). To keep the process both replicable and transparent, the unfiltered search results were saved and retained for future potential re-analysis.

Search filters

The search filters for systematic reviews and RCTs are adaptations of filters designed by the CRD and the Health Information Research Unit of McMaster University, Ontario. The qualitative, observational and quasi-experimental filters were developed in-house. Each filter comprised index terms relating to the study type(s) and associated text words for the methodological description of the design(s).

Date and language restrictions

Date restrictions were not applied, except for searches of systematic reviews, which were limited to research published from 1993 onwards. Systematic database searches were initially conducted in June 2008 up to the most recent searchable date. Search updates were generated on a 6-monthly basis, with the final re-runs carried out in March 2010 ahead of the guideline consultation. After this point, studies were only included if they were judged by the GDG to be exceptional (for example, if the evidence was likely to change a recommendation).

Post-guideline searching: following the draft guideline consultation, searches for observational and quasi-experimental studies were conducted for case management and ACT.

Although no language restrictions were applied at the searching stage, foreign language papers were not requested or reviewed unless they were of particular importance to a review question.

Other search methods

Other search methods involved: (1) scanning the reference lists of all eligible publications (systematic reviews, stakeholder evidence and included studies) for more published reports and citations of unpublished research; (2) sending lists of studies meeting the inclusion criteria to subject experts (identified through searches and the GDG) and asking them to check the lists for completeness, and to provide information about any published or unpublished research for consideration (see Appendix 3); (3) checking the tables of contents of key journals for studies that might have been missed by the database and reference list searches; (4) tracking key papers in the Science Citation Index (prospectively) over time for further useful references.

Full details of the search strategies and filters used for the systematic review of clinical evidence are provided in Appendix 9.

Study selection and quality assessment

All primary-level studies included after the first scan of citations were acquired in full and re-evaluated for eligibility at the time when they were being entered into the study information database. More specific eligibility criteria were developed for each review question and are described in the relevant clinical evidence chapters. Eligible systematic reviews and primary-level studies were critically appraised for methodological quality (see Appendix 11 for methodology checklists). The eligibility of each study was confirmed by at least one member of the appropriate topic group.

For some review questions, it was necessary to prioritise the evidence with respect to the UK context (that is, external validity). To make this process explicit, the topic groups took into account the following factors when assessing the evidence:

participant factors (for example, gender, age and ethnicity)
provider factors (for example, model fidelity, the conditions under which the intervention was performed and the availability of experienced staff to undertake the procedure)
cultural factors (for example, differences in standard care and the welfare system).

It was the responsibility of each topic group to decide which prioritisation factors were relevant to each review question in light of the UK context. Any issues and discussions within topic groups were brought back to the wider GDG for further consideration.

Unpublished evidence

The GDG used a number of criteria when deciding whether or not to accept unpublished data. First, the evidence must have been accompanied by a trial report containing sufficient detail to properly assess the quality of the data. Second, the evidence must have been submitted with the understanding that data from the study and a summary of the study's characteristics would be published in the full guideline. Therefore, the GDG did not accept evidence submitted as commercial in confidence. However, the GDG recognised that unpublished evidence submitted by investigators might later be retracted by those investigators if the inclusion of such data would jeopardise publication of their research.

3.5.2. Data extraction

Study characteristics and outcome data were extracted from all eligible studies that met the minimum quality criteria using a Microsoft Word-based form (see Appendix 11).

In most circumstances, for a given outcome (continuous and dichotomous), where more than 50% of the number randomised to any group were lost to follow-up, the data were excluded from the analysis (except for the outcome ‘leaving the study early’, in which case the denominator was the number randomised). Where possible, dichotomous efficacy outcomes were calculated on an intention-to-treat basis (that is, a ‘once-randomised-always-analyse’ basis). Where there was good evidence that those participants who ceased to engage in the study were likely to have an unfavourable outcome, early withdrawals were included in both the numerator and denominator. Adverse effects were entered into Review Manager, as reported by the study authors, because it is usually not possible to determine whether early withdrawals have had an unfavourable outcome. Where there were limited data for a particular review, the 50% rule was not applied. In these circumstances the evidence was downgraded due to the risk of bias.
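The attrition rule and the intention-to-treat calculation described above can be sketched as follows. This is an illustrative sketch only, not the procedure as implemented by the reviewers: the function names, the tuple layout and the returned flags are hypothetical, and the treatment of early withdrawals as events applies only where an unfavourable outcome can reasonably be assumed.

```python
def apply_attrition_rule(arms, outcome, limited_data=False):
    """Decide whether outcome data enter the analysis.

    arms: list of (n_randomised, n_lost_to_follow_up), one tuple per group.
    Returns (include, downgrade_for_risk_of_bias).
    """
    # The rule is triggered when MORE than 50% of any group is lost.
    high_attrition = any(lost / randomised > 0.5 for randomised, lost in arms)
    if outcome == "leaving the study early" or not high_attrition:
        return True, False
    if limited_data:
        # Rule waived for sparse reviews, but the evidence is downgraded.
        return True, True
    return False, False


def itt_event_rate(observed_events, early_withdrawals, n_randomised):
    """Event rate with early withdrawals counted as unfavourable outcomes.

    Withdrawals are added to the numerator; the denominator is everyone
    randomised, so withdrawals are included there as well.
    """
    return (observed_events + early_withdrawals) / n_randomised


# Hypothetical two-arm trial in which one arm lost 60 of 100 participants.
print(apply_attrition_rule([(100, 40), (100, 60)], "abstinence"))
print(itt_event_rate(observed_events=20, early_withdrawals=10, n_randomised=100))
```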

Where some of the studies failed to report standard deviations (for a continuous outcome) and where an estimate of the variance could not be computed from other reported data or obtained from the study author, the following approach was taken.⁶

When the number of studies with missing standard deviations was less than one third and when the total number of studies was at least ten, the pooled standard deviation was imputed (calculated from all the other studies in the same meta-analysis that used the same version of the outcome measure). In this case, the appropriateness of the imputation was assessed by comparing the standardised mean differences (SMDs) of those trials that had reported standard deviations against the hypothetical SMDs of the same trials based on the imputed standard deviations. If they converged, the meta-analytical results were considered to be reliable.

When the conditions above could not be met, standard deviations were taken from another related systematic review (if available). In this case, the results were considered to be less reliable.
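The decision rule for imputing missing standard deviations can be expressed in a few lines. The text does not state which pooling formula was used, so the degrees-of-freedom weighted pooled standard deviation below is an assumption, and the function names are hypothetical.

```python
import math


def can_impute_pooled_sd(n_missing, n_total):
    """True when fewer than one third of studies lack an SD and the
    meta-analysis contains at least ten studies."""
    return n_total >= 10 and n_missing < n_total / 3


def pooled_sd(sds, ns):
    """Pooled SD across the studies that did report one.

    Assumed formula: sqrt(sum((n_i - 1) * sd_i**2) / sum(n_i - 1)).
    """
    numerator = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)


# Hypothetical meta-analysis: 12 studies, 3 of them without an SD.
if can_impute_pooled_sd(n_missing=3, n_total=12):
    reported_sds = [4.1, 3.8, 5.0, 4.4, 4.9, 3.5, 4.2, 4.6, 4.0]
    sample_sizes = [40, 55, 32, 60, 48, 51, 38, 44, 70]
    print(f"imputed SD = {pooled_sd(reported_sds, sample_sizes):.2f}")
```

When the check fails, the fallback described above (taking standard deviations from a related systematic review) applies instead.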

The meta-analysis of survival data, such as time to any drinking episode, was based on log hazard ratios and standard errors. Since individual patient data were not available in included studies, hazard ratios and standard errors calculated from a Cox proportional hazard model were extracted. Where necessary, standard errors were calculated from confidence intervals (CIs) or p-values according to standard formulae (see Cochrane Handbook for Systematic Reviews of Interventions, 5.0.2, Higgins et al., 2009). Data were summarised using the generic inverse variance method, using Review Manager.
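As an illustration of the standard formulae referred to above, the sketch below recovers the standard error of a log hazard ratio from a 95% CI or from a two-sided p-value, then pools two trials by inverse variance weighting. The trial values are invented, the function names are hypothetical, and a fixed-effect model is assumed because the text does not say which model was applied.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # normal quantile for a 95% CI, about 1.96


def se_from_ci(lower, upper, z=Z_95):
    """SE of a log hazard ratio from the limits of its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def se_from_p_value(hazard_ratio, p_value):
    """SE of a log hazard ratio from a two-sided p-value."""
    z = abs(NormalDist().inv_cdf(p_value / 2))
    return abs(math.log(hazard_ratio)) / z


def pool_inverse_variance(log_hrs, ses):
    """Fixed-effect generic inverse variance pooling (assumed model)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical trials: (hazard ratio, lower 95% limit, upper 95% limit).
trials = [(0.80, 0.60, 1.07), (0.70, 0.50, 0.98)]
log_hrs = [math.log(hr) for hr, _, _ in trials]
ses = [se_from_ci(lower, upper) for _, lower, upper in trials]
pooled_log_hr, pooled_se = pool_inverse_variance(log_hrs, ses)
print(f"pooled HR = {math.exp(pooled_log_hr):.2f}, SE(log HR) = {pooled_se:.3f}")
```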

Consultation with another reviewer or members of the GDG was used to overcome difficulties with coding. Data from studies included in existing systematic reviews were extracted independently by one reviewer and cross-checked with the existing data set. Where possible, two independent reviewers extracted data from new studies. Where double data extraction was not possible, data extracted by one reviewer were checked by the second reviewer. Disagreements were resolved through discussion. Where consensus could not be reached, a third reviewer or GDG members resolved the disagreement. Masked assessment (that is, blind to the journal from which the article comes, the authors, the institution and the magnitude of the effect) was not used since it is unclear that doing so reduces bias (Berlin, 2001; Jadad et al., 1996).

3.5.3. Synthesising the evidence

Meta-analysis

Where possible, meta-analysis was used to synthesise the evidence using Review Manager. If necessary, reanalyses of the data or sub-analyses were used to answer review questions not addressed in the original studies or reviews.

Dichotomous outcomes were analysed as relative risks (RR) with the associated 95% CI (for an example, see Figure 1). A relative risk (also called a risk ratio) is the ratio of the treatment event rate to the control event rate. An RR of 1 indicates no difference between treatment and control. In Figure 1, the overall RR of 0.73 indicates that the event rate (that is, non-remission rate) associated with intervention A is about three quarters of that with the control intervention or, in other words, the RR reduction is 27%.

Figure 1: Example of a forest plot displaying dichotomous data.

The CI shows a range of values within which we are 95% confident that the true effect will lie. If the effect size has a CI that does not cross the ‘line of no effect’, then the effect is commonly interpreted as being statistically significant.
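The calculation of a relative risk and its 95% CI for a single study can be sketched as below. The counts are hypothetical and were chosen only so that the RR equals 0.73; the CI uses the usual large-sample standard error of the log relative risk, which is an assumption about the formula rather than a statement of what Review Manager does internally.

```python
import math


def relative_risk(events_t, n_t, events_c, n_c, z=1.959964):
    """Relative risk (treatment vs control) with a 95% CI on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical trial: non-remission in 73 of 200 treated and 100 of 200 controls.
rr, lower, upper = relative_risk(73, 200, 100, 200)
crosses_no_effect = lower <= 1 <= upper
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}); "
      f"RR reduction = {(1 - rr):.0%}; crosses line of no effect: {crosses_no_effect}")
```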

Continuous outcomes were analysed using the SMD because different measures were used in different studies to estimate the same underlying effect (for an example see Figure 2). If reported by study authors, intention-to-treat data using a valid method for imputation of missing data were preferred over data only from people who completed the study.

Figure 2: Example of a forest plot displaying continuous data.
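A minimal sketch of an SMD calculation follows. It assumes the Hedges' adjusted g form (Cohen's d multiplied by a small-sample correction); the text does not name the estimator, and the input values and function name are hypothetical.

```python
import math


def standardised_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """SMD as Hedges' adjusted g (assumed estimator)."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample adjustment
    return d * correction


# Two hypothetical studies reporting the same construct on different scales:
# dividing by the pooled SD puts both on a common, unitless scale.
study_a = standardised_mean_difference(10.0, 4.0, 50, 12.0, 4.0, 50)
study_b = standardised_mean_difference(100.0, 40.0, 50, 120.0, 40.0, 50)
print(f"study A SMD = {study_a:.3f}, study B SMD = {study_b:.3f}")
```

Because the difference in means is divided by the pooled standard deviation, the two studies yield the same SMD even though study B is measured on a scale ten times larger.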

The number needed to treat for benefit (NNTB) or the number needed to treat for harm (NNTH) was reported for each outcome where the baseline risk (that is, the control group event rate) was similar across studies. In addition, numbers needed to treat (NNTs) calculated at follow-up were only reported where the length of follow-up was similar across studies. When the length of follow-up or baseline risk varies (especially with low risk), the NNT is a poor summary of the treatment effect (Deeks, 2002).
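The sensitivity of the NNT to baseline risk can be shown with a short sketch. It assumes that the event is unfavourable (for example, non-remission), that a constant RR applies at every baseline risk, and that the NNT is rounded up to the next whole number; the function name is hypothetical.

```python
import math


def nnt_from_rr(control_event_rate, relative_risk):
    """Return ('NNTB' or 'NNTH', n) for an unfavourable event, or None."""
    treatment_event_rate = control_event_rate * relative_risk
    absolute_risk_reduction = control_event_rate - treatment_event_rate
    if absolute_risk_reduction == 0:
        return None  # no difference in risk, so the NNT is undefined
    label = "NNTB" if absolute_risk_reduction > 0 else "NNTH"
    return label, math.ceil(1 / abs(absolute_risk_reduction))


# Same RR of 0.73 applied to two different baseline risks.
for baseline_risk in (0.5, 0.1):
    print(baseline_risk, nnt_from_rr(baseline_risk, 0.73))
```

With a baseline risk of 0.5 the sketch gives an NNTB of 8, and with a baseline risk of 0.1 it gives 38, although the relative effect is identical in both cases.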

Heterogeneity

To check for consistency of effects among studies, both the I² statistic and the chi-squared test of heterogeneity as well as a visual inspection of the forest plots were used. The I² statistic describes the proportion of total variation in study estimates that is due to heterogeneity (Higgins & Thompson, 2002). The I² statistic was interpreted in the following way based on Higgins and Green (2009):

0 to 40%: might not be important
30 to 60%: may represent moderate heterogeneity
50 to 90%: may represent substantial heterogeneity
75 to 100%: considerable heterogeneity.

Two factors were used to make a judgement about importance of the observed value of I²: first, the magnitude and direction of effects, and second, the strength of evidence for heterogeneity (for example, p-value from the chi-squared test, or a CI for I²).

Publication bias

Where there were sufficient data, we intended to use funnel plots to explore the possibility of publication bias. Asymmetry of the plot would be taken to indicate possible publication bias and investigated further. However, due to a paucity of data, funnel plots could not be used.

Where necessary, an estimate of the proportion of eligible data that were missing (because some studies did not include all relevant outcomes) was calculated for each analysis.

3.5.4. Summary statistics used to evaluate assessment instruments

The main outcomes that need to be extracted from diagnostic accuracy studies are sensitivity, specificity, positive predictive value and negative predictive value. These are discussed in detail below. Negative likelihood ratios, positive likelihood ratios and area under the curve will also be briefly described. In addition, definitions of relevant validation and reliability assessment strategies will be provided below.

The sensitivity of an instrument refers to the proportion of those with the condition who test positive. An instrument that detects a low percentage of cases will not be very helpful in determining the numbers of patients who should receive a known effective treatment because many individuals who should receive the treatment will not do so. This would make for poor planning, and underestimate the prevalence of the disorder and the costs of treatments to the community. As the sensitivity of an instrument increases, the number of false negatives it produces will decrease.

The specificity of an instrument refers to the proportion of those without the condition who test negative. This is important so that well individuals are not given treatments they do not need. As the specificity of an instrument increases, the number of false positives will decrease.

To illustrate this: from a population in which the point prevalence rate of alcohol dependence is 10% (that is, 10% of the population has alcohol dependence at any one time), 1000 people are given a test that has 90% sensitivity and 85% specificity. It is known that 100 people in this population have alcohol dependence, but the test detects only 90 (true positives), leaving ten undetected (false negatives). It is also known that 900 people do not have alcohol dependence and the test correctly identifies 765 of these (true negatives), but classifies 135 incorrectly as having alcohol dependence (false positives). The positive predictive value of the test (the number correctly identified as having alcohol dependence as a proportion of positive tests) is 40% (90/(90 + 135)) and the negative predictive value (the number correctly identified as not having alcohol dependence as a proportion of negative tests) is 98.7% (765/(765 + 10)). Therefore, in this example a positive test result is correct in only 40% of cases whilst a negative result can be relied upon in almost 99% of cases.
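The arithmetic of this example can be reproduced with a short sketch. The function and key names are hypothetical; the inputs are the figures given in the example (1000 people, 10% prevalence, 90% sensitivity, 85% specificity).

```python
def diagnostic_summary(population, prevalence, sensitivity, specificity):
    """Build the 2x2 counts and the predictive values for a test."""
    with_condition = population * prevalence
    without_condition = population - with_condition
    true_pos = sensitivity * with_condition
    false_neg = with_condition - true_pos
    true_neg = specificity * without_condition
    false_pos = without_condition - true_neg
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        # Proportion of positive tests that are correct.
        "ppv": true_pos / (true_pos + false_pos),
        # Proportion of negative tests that are correct.
        "npv": true_neg / (true_neg + false_neg),
    }


summary = diagnostic_summary(1000, prevalence=0.10, sensitivity=0.90, specificity=0.85)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The sketch returns a positive predictive value of 90/225 and a negative predictive value of 765/775, matching the counts in the example.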

The example above illustrates some of the main differences between positive predictive values and negative predictive values in comparison with sensitivity and specificity. For both positive predictive values and negative predictive values, prevalence explicitly forms part of their calculation (see Altman & Bland, 1994a). When the prevalence of a disorder is low in a population this is generally associated with a higher negative predictive value and a lower positive predictive value. Therefore, although these statistics are concerned with issues probably more directly applicable to clinical practice (for example, the probability that a person with a positive test result actually has alcohol dependence), they are largely dependent on the characteristics of the populations sampled and cannot be universally applied (Altman & Bland, 1994a).

In contrast, sensitivity and specificity do not theoretically depend on prevalence (Altman & Bland, 1994b). For example, sensitivity is concerned with the performance of an identification test conditional on a person having alcohol dependence. Therefore the higher false positives often associated with samples of low prevalence will not affect such estimates. The advantage of this approach is that sensitivity and specificity can be applied across populations (Altman & Bland, 1994b). However, the main disadvantage is that clinicians tend to find such estimates more difficult to interpret.
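The dependence of predictive values on prevalence, with sensitivity and specificity held fixed, can be illustrated as follows. The sketch applies Bayes' theorem to the 90% sensitivity and 85% specificity used in the earlier example; the prevalence values and function names are chosen purely for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(prevalence, sensitivity, specificity):
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# The test's sensitivity and specificity stay fixed; only prevalence changes.
for prevalence in (0.01, 0.10, 0.30):
    ppv = positive_predictive_value(prevalence, 0.90, 0.85)
    npv = negative_predictive_value(prevalence, 0.90, 0.85)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

As prevalence falls, the positive predictive value falls and the negative predictive value rises, which is why predictive values estimated in one population cannot be carried over to another.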

Criterion validity (or predictive validity) is evaluated when the purpose is to use an instrument to estimate some important form of behaviour that is external to the measuring instrument itself, the latter being referred to as the criterion (Nunnally, 1978). Criterion validity evaluates how well scores on a measure relate to real-world behaviours such as motivation for treatment and long-term treatment outcomes. The degree of correspondence between the test and the criterion is estimated by the size of their correlation.

Construct validity refers to the experimental demonstration that a test is measuring the construct it was intended to measure. Relationships among items, domains and concepts conform to a priori hypotheses concerning logical relationships that should exist with other measures or characteristics of patients and patient groups (Brown, 1996).

Content validity is derived from the degree to which a test is a representative sample of the content of whatever objectives or specifications the test was originally designed to measure (Brown, 1996).

Inter-rater reliability refers to the degree to which observers, or raters, are consistent in their scoring on a measurement scale. Inter-rater reliability gives an indication of how much homogeneity or consensus there is amongst the raters (Allen, 2003).
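
The guideline does not name a statistic for agreement between raters. One commonly used choice for two raters and categorical ratings is Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below assumes that choice; the function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set")
    n = len(ratings_a)

    # Proportion of subjects on which the raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's category frequencies
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: 1 = dependent, 0 = not dependent
rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```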

Test–retest reliability is determined by administering the measurement instrument two or more times to each subject. If the correlation between scores is high, the measurement instrument can be said to have good test–retest reliability. This is desirable when measuring constructs that are not expected to change over time, for example family history of alcoholism, age of onset of problem drinking and general expectancies of alcohol effects. In contrast, when measuring more transient constructs such as cravings and treatment motivation, the test–retest reliability would be expected to be lower (Allen, 2003).
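
As a minimal sketch of the correlation described above, the Pearson correlation coefficient can be computed between scores from two administrations of the same instrument. The use of Pearson's coefficient, the function name and the scores are assumptions made for illustration; the guideline does not specify which correlation was used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical scores for five people tested on two occasions
first_administration = [12, 25, 7, 30, 18]
second_administration = [14, 24, 9, 28, 17]
r = pearson_r(first_administration, second_administration)
print(f"test-retest r = {r:.3f}")
```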

Internal consistency is a measure based on the correlation between different items within the scale itself. For instruments designed to measure a single phenomenon, these correlation coefficients should be high (Allen, 2003).
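
The guideline does not name a specific statistic for internal consistency. A widely used summary is Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. The sketch below assumes that choice; the function name and the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` holds one list per respondent, each with k item scores.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")

    items = list(zip(*responses))                 # one tuple per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical three-item scale completed by five respondents
scores = [
    [3, 4, 3],
    [1, 2, 1],
    [4, 4, 5],
    [2, 1, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is what would be expected of an instrument designed to measure a single phenomenon.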

3.5.5. Presenting the data to the Guideline Development Group

Study characteristics tables and, where appropriate, forest plots generated with Review Manager were presented to the GDG.

Where meta-analysis was not appropriate and/or possible, the reported results from each primary-level study were included in the study characteristics table (and, where appropriate, in a narrative review).

Evidence profile tables

A GRADE⁷ evidence profile was used to summarise both the quality of the evidence and the results of the evidence synthesis (see Table 4 for an example of an evidence profile). The GRADE approach is based on a sequential assessment of the quality of evidence, followed by a judgement about the balance between desirable and undesirable effects and a subsequent decision about the strength of a recommendation.

Table 4. Example of GRADE evidence profile.

For each outcome, quality may be reduced depending on the following factors:

study design (randomised trial, observational study, or any other evidence)
limitations (based on the quality of individual studies)
inconsistency (see Section 3.5.3 for how consistency was assessed)
indirectness (that is, how closely the outcome measures, interventions and participants match those of interest)
imprecision (based on the CI around the effect size).

For observational studies the quality may be increased if there is a large effect, plausible confounding would have changed the effect, or there is evidence of a dose–response gradient (details would be provided under the other considerations column). Each evidence profile also included a summary of the findings: number of patients included in each group, an estimate of the magnitude of the effect and the overall quality of the evidence for each outcome.
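
The rating logic described above can be sketched in simplified form. The starting levels (high for randomised trials, low for observational studies, very low for other evidence) and the four-level scale follow the general GRADE approach rather than anything stated in this section, and the one-step-per-factor scoring is an assumption made for illustration. In practice these ratings are judgements and are not computed mechanically.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# Assumed starting level (index into LEVELS) for each study design
START_LEVEL = {
    "randomised trial": 3,
    "observational study": 1,
    "other evidence": 0,
}


def grade_quality(study_design, downgrades=0, upgrades=0):
    """Return a quality label for one outcome.

    downgrades: steps lost for limitations, inconsistency,
                indirectness or imprecision.
    upgrades:   steps gained for large effect, plausible confounding or
                dose-response gradient (observational studies only).
    """
    level = START_LEVEL[study_design] - downgrades
    if study_design == "observational study":
        level += upgrades
    level = max(0, min(level, len(LEVELS) - 1))   # keep within the scale
    return LEVELS[level]


print(grade_quality("randomised trial", downgrades=1))
print(grade_quality("observational study", upgrades=1))
```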

3.5.6. Forming the clinical summaries and recommendations

Once the GRADE evidence profiles relating to a particular review question were completed, summary evidence tables were developed (these tables are presented in the evidence chapters). Finally, the systematic reviewer in conjunction with the topic group lead produced a clinical evidence summary.

After the GRADE profiles and clinical summaries were presented to the GDG, the associated recommendations were drafted. In making recommendations, the GDG took into account the trade-off between the benefits and downsides of treatment as well as other important factors, such as economic considerations, social value judgements⁸, the requirements to prevent discrimination and to promote equality⁹, and the group's awareness of practical issues (Eccles et al., 1998; NICE, 2009a).

3.5.7. Method used to answer a review question in the absence of appropriately designed, high-quality research

In the absence of appropriately designed, high-quality research, or where the GDG was of the opinion (on the basis of previous searches or their knowledge of the literature) that there was unlikely to be such evidence, an informal consensus process was adopted. This process focused on those questions that the GDG considered a priority.

Informal consensus

The starting point for the process of informal consensus was that a member of the topic group identified, with help from the systematic reviewer, a narrative review that most directly addressed the review question. Where this was not possible, a brief review of the recent literature was initiated.

This existing narrative review or new review was used as a basis for beginning an iterative process to identify lower levels of evidence relevant to the review question and to lead to written statements for the guideline. The process involved a number of steps:

A description of what was known about the issues concerning the review question was written by one of the topic group members.
Evidence from the existing review or new review was then presented in narrative form to the GDG and further comments were sought about the evidence and its perceived relevance to the review question.
Based on the feedback from the GDG, additional information was sought and added to the information collected. This could include studies that did not directly address the review question but were thought to contain relevant data.
If, during the course of preparing the report, a significant body of primary-level studies (of appropriate design to answer the question) was identified, a full systematic review was done.
At this time, subject possibly to further reviews of the evidence, a series of statements that directly addressed the review question were developed.
Following this, on occasions and as deemed appropriate by the development group, the report was then sent to appointed experts outside of the GDG for peer review and comment. The information from this process was then fed back to the GDG for further discussion of the statements.
Recommendations were then developed and could also be sent for further external peer review.
After this final stage of comment, the statements and recommendations were again reviewed and agreed upon by the GDG.

3.6. HEALTH ECONOMICS METHODS

The aim of health economics was to contribute to the guideline's development by providing evidence on the cost effectiveness of interventions for alcohol misuse covered in the guideline. This was achieved by:

a systematic literature review of existing economic evidence
decision-analytic economic modelling.

Systematic reviews of economic literature were conducted in all areas covered in the guideline. Economic modelling was undertaken in areas with likely major resource implications, where the current extent of uncertainty over cost effectiveness was significant and economic analysis was expected to reduce this uncertainty, in accordance with The Guidelines Manual (NICE, 2009a). Prioritisation of areas for economic modelling was a joint decision between the health economist and the GDG. The rationale for prioritising review questions for economic modelling was set out in an economic plan agreed between NICE, the GDG, the health economist and the other members of the technical team. The following economic questions were selected as key issues that were addressed by economic modelling:

What is the preferred method of medically-assisted withdrawal, in terms of clinical and cost effectiveness (taking into consideration the benefits/adverse effects) and for which people and in which setting (taking into account the nature of intervention in each setting)?
– Community (taking into account levels of supervision: structured versus unstructured day programme)
– Residential
– Inpatient: mental health or acute hospital
– Prisons.
For people who are alcohol dependent or harmful drinkers, which pharmacological interventions aimed at attenuation of drinking/maintenance of abstinence are clinically and cost-effective?
For people who are alcohol dependent or harmful drinkers, which psychological and psychosocial interventions aimed at attenuation of drinking/maintenance of abstinence are clinically and cost-effective?
For people who are alcohol dependent or harmful drinkers, which combination of psychological/psychosocial and pharmacological interventions aimed at attenuation of drinking/maintenance of abstinence is clinically and cost-effective?

In addition, literature on the health-related quality of life of people with alcohol-use disorders was systematically searched to identify studies reporting appropriate utility scores that could be utilised in a cost-utility analysis.

The rest of this section describes the methods adopted in the systematic literature review of economic studies. Methods employed in economic modelling are described in the respective sections of the guideline.