NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Brasure M, MacDonald R, Fuchs E, et al. Management of Insomnia Disorder [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2015 Dec. (Comparative Effectiveness Reviews, No. 159.)

Management of Insomnia Disorder [Internet].


Analytic Framework

Figure 1 provides an analytic framework to illustrate the population, interventions, outcomes, and adverse effects that will guide the literature search and synthesis.

Figure 1 is the analytical framework describing the flow of individuals with insomnia disorder through a treatment to outcomes. Treatments may be behavioral or psychological, pharmaceutical, complementary or alternative medicine, or a combination of types. Any of these treatments may have adverse effects. Treatments lead to sleep outcomes, global outcomes, and functioning, mood, and quality of life outcomes.

Figure 1

Analytic framework. CAM = complementary and alternative interventions; KQ = Key Question

Criteria for Inclusion/Exclusion of Studies in the Review

We included or excluded studies based on the PICOTS framework outlined above and the study-specific inclusion criteria described in Table 3. Treatments for insomnia disorder in primary care settings need to address certain subpopulations, such as the elderly. Coexisting diseases are common among patients with sleep problems, so we included studies that enrolled participants with comorbidities (sometimes called 'secondary insomnia') and trials enrolling pure subgroups of patients with certain conditions (i.e., anxiety, mild depression, noncancer pain). Other medical or mental health conditions (e.g., pregnancy, menopause, major depressive disorder, bipolar disorder, post-traumatic stress disorder, fibromyalgia, rheumatoid arthritis, Parkinson's disease) may explain insomnia symptoms, and it is not clear that these patients meet diagnostic criteria for insomnia disorder; trials enrolling these subgroups were therefore excluded. These conditions deserve a separate review and are outside the scope of this one. Insomnia disorder is a chronic condition, so a study duration of at least 4 weeks was required for eligibility. We included studies that reported subjective outcomes. Polysomnography outcomes are not patient-centered, and trials reporting only these outcomes were excluded. Providers use history and patient report to diagnose insomnia disorder and to assess patient opinion regarding treatment, and they are more likely to value a patient's perspective of improvement based on his or her typical sleep routine. Sleep parameters obtained in a laboratory environment are not necessary or relevant to insomnia treatment.
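Three of the screening rules above (minimum duration, a required subjective outcome, and exclusion of subgroups whose condition may itself explain insomnia symptoms) are mechanical enough to express as a filter. The sketch below is a simplified illustration only: the `TrialRecord` type, the `is_eligible` function, and the string labels for conditions are hypothetical, and the full criteria in Table 3 involve reviewer judgment that this code does not capture.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for subgroups excluded because the condition may itself
# explain insomnia symptoms (taken from the examples listed in the text).
EXCLUDED_SUBGROUPS = {
    "pregnancy", "menopause", "major depressive disorder", "bipolar disorder",
    "post-traumatic stress disorder", "fibromyalgia", "rheumatoid arthritis",
    "parkinson's disease",
}

MIN_DURATION_WEEKS = 4  # insomnia disorder is chronic


@dataclass
class TrialRecord:  # hypothetical record type for one screened trial
    duration_weeks: float
    reports_subjective_outcomes: bool
    pure_subgroup: Optional[str] = None  # condition defining the enrolled subgroup, if any


def is_eligible(trial: TrialRecord) -> bool:
    """Apply three simplified inclusion rules to a single trial."""
    if trial.duration_weeks < MIN_DURATION_WEEKS:
        return False
    # Trials reporting only polysomnography outcomes are excluded.
    if not trial.reports_subjective_outcomes:
        return False
    # Comorbid populations and pure subgroups such as anxiety or noncancer pain
    # are kept; only the listed conditions lead to exclusion.
    if trial.pure_subgroup is not None and trial.pure_subgroup.lower() in EXCLUDED_SUBGROUPS:
        return False
    return True
```

For example, an 8-week trial enrolling a pure fibromyalgia subgroup is rejected, while an 8-week trial enrolling patients with anxiety passes.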

Table 3. Study inclusion criteria.

Searching for the Evidence: Literature Search Strategies for Identification of Relevant Studies To Answer the Key Questions

We searched Ovid Medline®, Ovid PsycInfo®, Ovid Embase®, and the Cochrane Library to identify previous systematic reviews and randomized controlled trials (RCTs) published and indexed in bibliographic databases from 2004 through January 2015 (Appendix A). We chose 2004 as the starting date because previous systematic reviews with search end dates from 2003 to 2005 were available, and we relied on these reviews to identify eligible studies published before 2004. Our search strategy combined relevant medical subject headings and natural language terms for the concept of insomnia with filters to select RCTs and systematic reviews. Bibliographic database searches were supplemented with backward citation searches of highly relevant systematic reviews. The previous Agency for Healthcare Research and Quality (AHRQ) review identified studies that were rated low or moderate risk of bias and had durations of 4 weeks or more.33 This review is not an update of that review, but our Key Questions overlap with those of the previous AHRQ review. Other reviews were important for identifying studies not included in the AHRQ review.34-37

Two independent investigators reviewed the titles and abstracts of search results to identify systematic reviews and trials evaluating interventions for insomnia. Citations deemed eligible by either investigator underwent full-text screening, in which two investigators independently determined whether inclusion criteria were met. Discrepancies in screening decisions were resolved by consultation between the investigators and, if necessary, with a third investigator. We documented the inclusion and exclusion status of citations undergoing full-text screening.
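The two screening stages use different decision rules: title/abstract screening is inclusive (either reviewer can advance a citation), whereas full-text screening requires agreement or a resolved discrepancy. A minimal sketch of that logic follows; the function names and the boolean encoding of reviewer decisions are hypothetical, and real consultation is a discussion rather than a single recorded value.

```python
from typing import Optional


def advance_to_full_text(reviewer_a: bool, reviewer_b: bool) -> bool:
    """Title/abstract stage: a citation advances if EITHER reviewer deems it eligible."""
    return reviewer_a or reviewer_b


def full_text_decision(reviewer_a: bool, reviewer_b: bool,
                       consensus: Optional[bool] = None,
                       third_reviewer: Optional[bool] = None) -> bool:
    """Full-text stage: agreement stands; a disagreement is resolved by
    consultation between the two reviewers or, failing that, a third reviewer."""
    if reviewer_a == reviewer_b:
        return reviewer_a
    if consensus is not None:
        return consensus
    if third_reviewer is not None:
        return third_reviewer
    raise ValueError("unresolved discrepancy: consultation or a third investigator is needed")
```

The asymmetry is deliberate in designs like this: an inclusive first pass limits the chance of discarding a relevant trial on the basis of an abstract alone, and the stricter second pass protects the final set of included studies.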

We conducted grey literature searching to identify relevant completed and ongoing studies. Relevant grey literature resources include and the FDA drug database. We also reviewed Scientific Information Packets (SIPs) sent by manufacturers of relevant drugs. Grey literature search results were used to identify studies, outcomes, and analyses not reported in the published literature to assess publication and reporting bias and inform future research needs.

Because our eligibility criteria included only RCTs, we identified few studies reporting rare or long-term harms associated with prolonged use of sleep medications. We therefore supplemented our search with bibliographic database searches for large observational studies of individuals taking medications for insomnia. These studies had to have a sample size of at least 100 and a duration of at least 6 months.

Data Abstraction and Data Management

We used data from relevant comparisons in previous systematic reviews to replace the de novo extraction process when the comparison was sufficiently relevant and the systematic review quality was assessed as fair or high (according to methods described below).

Remaining RCTs meeting inclusion criteria were distributed among investigators for risk of bias assessment and data extraction. For studies assessed as having low to moderate risk of bias (according to methods described below), one investigator extracted relevant study, population demographic, and outcomes data. Data fields extracted included author and year of publication; setting; subject inclusion and exclusion criteria; intervention and control characteristics (intervention components, timing, frequency, duration); followup duration; participant baseline demographics; comorbidities; insomnia definition, method of diagnosis, and severity; descriptions and results of primary outcomes and adverse effects; and study funding source. Relevant data were extracted into Excel spreadsheets for descriptive analysis. Data were analyzed in RevMan 5.2 software.38 Data used in quantitative synthesis were checked for accuracy by a second investigator. Data appearing in final evidence tables are uploaded to the Systematic Review Data Repository.

Assessment of Methodological Risk of Bias of Individual Studies

We assessed the quality of systematic reviews meeting eligibility criteria using AMSTAR criteria.39 Two investigators independently assessed risk of bias for eligible RCTs using an assessment tool developed for this project (Appendix B).40 Investigators assessed several types of bias, including selection bias (method of randomization, group similarity at baseline, allocation concealment), performance bias (blinding of provider and recipient; intervention definition, i.e., theory based, manualized, and fidelity to treatment), detection bias (outcome assessors blinded, instruments validated and reliable, clinical significance of outcomes, co-interventions avoided or similar, correction for multiple comparisons, power if pooling was not possible), attrition bias (extent of attrition, reasons for incomplete data provided, incomplete data handled appropriately), reporting bias (selective outcome reporting, selective analysis), and other sources of bias. Certain items (such as adequacy of intervention definition and implementation) were especially necessary to capture all potential risk of bias associated with psychological interventions. Each investigator rated the overall risk of bias for each study as low, moderate, or high based on the collective risk of bias across domains and on their confidence that the results were believable given the study's limitations. The two investigators' summary risk of bias assessments were then aggregated, and studies that both investigators rated as high risk of bias were excluded from analysis.
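The exclusion rule at the end of the paragraph can be read as a simple conjunction over the two independent summary ratings. The sketch below assumes that reading: a trial is dropped only when both ratings are high, so a split rating (for example, high and moderate) keeps the trial in the analysis. The text does not spell out how split ratings were handled, so that behavior is an assumption, and the trial names are hypothetical.

```python
from enum import Enum


class RoB(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


def excluded_for_bias(rating_a: RoB, rating_b: RoB) -> bool:
    """Exclude a trial only when BOTH independent investigators rated it high risk of bias.

    Assumption: split ratings (e.g., HIGH vs MODERATE) do not trigger exclusion.
    """
    return rating_a is RoB.HIGH and rating_b is RoB.HIGH


# Hypothetical ratings from two investigators for three trials.
ratings = {
    "Trial A": (RoB.LOW, RoB.MODERATE),
    "Trial B": (RoB.HIGH, RoB.HIGH),
    "Trial C": (RoB.HIGH, RoB.MODERATE),
}
retained = [name for name, (a, b) in ratings.items() if not excluded_for_bias(a, b)]
```

Under this reading, `retained` holds Trial A and Trial C, and only Trial B is excluded.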

Data Synthesis

When a comparison was adequately addressed by a previous systematic review of acceptable quality and no new studies were available, we reiterated the conclusions drawn from that review. When new trials were available, previous systematic review data were synthesized with data from additional trials by rerunning pooled analysis.

We summarized study characteristics and outcomes of RCTs not included in previous eligible systematic reviews in evidence tables. We grouped studies by population, intervention, and comparison. Studies that included adults of any age were classified as general adult population; studies that included only older adults (age cutoffs varied among studies) were classified as older adults. We assessed clinical and methodological heterogeneity and variation in effect size to determine the appropriateness of pooling data.41 Pooling was conducted when populations, interventions, and outcomes were sufficiently similar. Meta-analysis was performed using random-effects models (the DerSimonian and Laird method in RevMan 5.2 software38). We calculated risk ratios (RR) and absolute risk differences (RD) with the corresponding 95% confidence intervals (CI) for binary primary outcomes. Weighted mean differences (WMD) and/or standardized mean differences (SMD) with the corresponding 95% CIs were calculated for continuous outcomes. We assessed statistical heterogeneity with Cochran's Q test and measured its magnitude with the I² statistic.41 An I² of 50 percent suggests moderate heterogeneity, and an I² of 75 percent or greater indicates substantial heterogeneity among studies.
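To make the pooling steps above concrete, the sketch below computes per-trial log risk ratios and risk differences from 2×2 counts and then combines trial estimates with the textbook inverse-variance DerSimonian-Laird random-effects method, reporting Cochran's Q and I². It is an illustration, not a reproduction of RevMan: RevMan's handling of dichotomous data may differ in details (for example, zero-cell continuity corrections or heterogeneity statistics based on Mantel-Haenszel weights), and all function names here are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def log_risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int) -> Tuple[float, float]:
    """Log risk ratio and its standard error for one two-arm trial.

    Assumes no zero cells; meta-analysis software typically adds a continuity correction otherwise.
    """
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se


def risk_difference(events_t: int, n_t: int, events_c: int, n_c: int) -> Tuple[float, float]:
    """Absolute risk difference (treatment minus control) and its standard error."""
    p_t, p_c = events_t / n_t, events_c / n_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return p_t - p_c, se


def dersimonian_laird(estimates: List[float], ses: List[float]) -> Dict[str, object]:
    """Random-effects pooling with the DerSimonian-Laird estimate of between-study variance."""
    k = len(estimates)
    w = [1 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # percent of variation due to heterogeneity
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
    }
```

For a binary outcome, one would pool the per-trial log risk ratios and exponentiate `pooled` and the CI limits to obtain the RR. For a continuous outcome, the same function pools mean differences when each trial's standard error is taken as sqrt(sd_t²/n_t + sd_c²/n_c); this yields a WMD.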

Global outcomes were most often measured using the ISI and the PSQI (Table 4). We searched the literature for minimum important differences (MIDs) to facilitate interpretation of results for these outcomes and identified one study estimating the MID for the ISI.42 That study used both distribution- and anchor-based approaches. The anchor-based approach used 14 variables from three different instruments (the SF-36 Health Survey, the Work Limitations Questionnaire, and the Fatigue Severity Scale) and the SF-36 Vitality scale as the anchors in estimating the MID for the ISI. Anchor-based MIDs are considered superior to distribution-based MIDs, but distribution-based MIDs can serve as a supplement or be used when anchor-based methods are not available.43 MIDs can vary depending on the estimation method and the population studied44 and are often closely related to baseline values.45 Despite these complications, trials that conduct responder analyses based on an established MID offer straightforward interpretation. Unfortunately, many trials did not conduct responder analyses and reported only mean scale scores or mean changes in scale scores. It is not appropriate to apply an MID established from individual changes from baseline to WMDs between groups.44,46 We did not identify MIDs relevant to interpreting differences between groups, so we interpret the WMDs between groups in relation to the individual-level MID. A WMD equal to or greater than the MID suggests that many patients may gain important benefits from treatment; a WMD between 0.5 × MID and the MID suggests that the treatment may benefit an appreciable number of individuals; and a WMD below 0.5 × MID suggests that it is less likely that an appreciable number of patients will achieve important benefits from treatment.47
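The three-tier interpretation rule in the last sentences can be written as a small classifier. This is a sketch under stated assumptions: the WMD is passed in oriented so that positive values favor treatment (for the ISI, where lower scores are better, that would be control minus treatment), the 0.5 × MID boundary is treated as inclusive, and the MID value in the usage note is a hypothetical placeholder rather than the estimate from the cited study.

```python
def interpret_wmd(benefit: float, mid: float) -> str:
    """Classify a between-group WMD against an individual-level MID.

    `benefit` is the WMD oriented so that positive values favor treatment
    (assumption; for the ISI, use control mean minus treatment mean).
    """
    if mid <= 0:
        raise ValueError("MID must be positive")
    if benefit >= mid:
        return "many patients may gain important benefits"
    if benefit >= 0.5 * mid:  # assumption: lower boundary is inclusive
        return "may benefit an appreciable number of individuals"
    return "less likely that an appreciable number will benefit"
```

With a hypothetical MID of 6 points, a WMD favoring treatment by 6 points falls in the first tier, 3 points in the second, and 2.9 points in the third.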

Table 4. Characteristics of instruments measuring global outcomes.

Grading the Strength of Evidence for Individual Comparisons and Outcomes

The overall strength of evidence for primary outcomes within each comparison was evaluated based on five required domains: (1) study limitations (risk of bias); (2) directness (single, direct link between intervention and outcome); (3) consistency (similarity of effect direction and size); (4) precision (degree of certainty around an estimate); and (5) reporting bias.49 Evidence from previous systematic reviews was reassessed based on the information provided by the systematic review (evidence quality or attributes of the data and included studies). Based on the study design and conduct of the individual studies making up the body of evidence for a particular comparison, study limitations were rated low, medium, or high according to the number and magnitude of limitations detected during risk of bias assessments. Consistency was rated as consistent, inconsistent, or unknown (e.g., single study) after comparing the direction and size of the effect across studies. Directness was rated direct or indirect depending on whether the outcome measured had a direct link to patient wellbeing and whether comparisons were direct. Precision was rated precise or imprecise based on whether confidence intervals contained or exceeded clinical differences. Reporting bias was rated as undetected or suspected after assessing the presence of publication bias, selective outcome reporting bias, and selective analysis bias. Reporting bias was assessed by comparing the methods section with the results to identify outcomes or analyses that were not planned or not reported. Other factors considered in assessing strength of evidence included dose-response relationship, the presence of confounders, and strength of association. These factors were used to upgrade or downgrade strength of evidence assessments arising from the five required domains. A strong association was suggested when the total number of trials, total number of participants, and effect size demonstrated a robust outcome.
Based on these factors, the overall strength of evidence for each outcome was rated as:49

  • High: Very confident that estimate of effect lies close to true effect. Few or no deficiencies in body of evidence, findings believed to be stable.
  • Moderate: Moderately confident that estimate of effect lies close to true effect. Some deficiencies in body of evidence; findings likely to be stable, but some doubt.
  • Low: Limited confidence that estimate of effect lies close to true effect; major or numerous deficiencies in body of evidence. Additional evidence necessary before concluding that findings are stable or that estimate of effect is close to true effect.
  • Insufficient: No evidence, unable to estimate an effect, or no confidence in estimate of effect. No evidence is available or the body of evidence precludes judgment.

Assessing Applicability

Applicability of studies was determined according to the PICOTS framework. Study characteristics affecting applicability include, but are not limited to, the population from which the study participants were enrolled (e.g., studies enrolling participants from sleep medicine clinics may not produce results applicable to the general population of patients being treated for insomnia in primary care clinics), narrow eligibility criteria, and patient and intervention characteristics different from those described by population studies of insomnia.50 Specific factors that could modify the effect of treatment and affect the applicability of findings include diagnostic accuracy, insomnia severity, and specific patient characteristics such as age.