NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Qayyum R, Wilson LM, Bolen S, et al. Comparative Effectiveness, Safety, and Indications of Insulin Analogues in Premixed Formulations for Adults With Type 2 Diabetes [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2008 Sep. (Comparative Effectiveness Reviews, No. 14.)

  • This publication is provided for historical reference only and the information may be out of date.



In response to Section 1013 of the Medicare Modernization Act, the AHRQ requested an evidence report to synthesize the evidence on the comparative effectiveness and safety of premixed insulin analogues and other antidiabetic agents. Our Evidence-based Practice Center (EPC) established a team and a work plan to develop the evidence report. The project consisted of formulating and refining the specific questions, performing a comprehensive literature search, summarizing the state of the literature, constructing evidence tables, synthesizing the evidence, and submitting the report for peer review.

Topic Development

The topic for this report was nominated in a public process. With input from technical experts, the Scientific Resource Center for the AHRQ Effective Health Care Program drafted the initial Key Questions and, after approval from the AHRQ, posted them to a public Web site. The public was invited to comment on these questions. After reviewing the public commentary, the Scientific Resource Center drafted final Key Questions and submitted them to the AHRQ for approval.

Search Strategy

We searched the following databases for primary studies during the stipulated periods of time: MEDLINE® (1966 to February 2008), EMBASE® (1974 to February 2008), the Cochrane Central Register of Controlled Trials (CENTRAL; 1966 to February 2008), and the Cumulative Index to Nursing & Allied Health Literature (CINAHL®; 1982 through February 2008). The electronic search was first conducted in August 2007 and then repeated in February 2008. We developed a search strategy for MEDLINE, accessed via PubMed, based on an analysis of the medical subject headings (MeSH) terms and text words of key articles identified a priori. Our search strategy combined terms for type 2 diabetes and premixed insulin analogues. The PubMed strategy formed the basis for the strategies developed for the other electronic databases (see Appendix A).

We hand-searched 13 journals that were most likely to publish articles on this topic (see Appendix B), scanning the table of contents of each issue for relevant citations from June through September 2007. We also reviewed the reference lists of included articles.

In addition, we received the following material from the Scientific Resource Center:

  • Medical reviews and labels of insulin aspart 70/30, insulin lispro 75/25, and insulin lispro 50/50 obtained from the Web site of the United States FDA.
  • Scientific Discussion sections of the European Public Assessment Report obtained from the Web site of the European Medicines Agency (EMEA).
  • Public registries of clinical trials, such as the Clinical Study Results Web site.
  • Scientific information packets submitted by Eli Lilly and Company (Indianapolis, IN) and Sanofi-Aventis (Bridgewater, NJ). We requested, but did not receive, a scientific information packet from Novo Nordisk (Bagsværd, Denmark).

The search results were downloaded and imported into ProCite® version 5 (ISI ResearchSoft, Carlsbad, CA). We scanned for exact article duplicates, author/title duplicates, and title duplicates using the duplication check feature. From ProCite®, the articles were uploaded to SRS© 4.0 (TrialStat! Corporation, Ottawa, Ontario, Canada), a Web-based software package developed for systematic review data management. This database was used to track the search results at the title review, abstract review, article inclusion/exclusion, and data abstraction levels. A list of excluded articles is presented in Appendix C.

Study Selection

Study selection proceeded in two phases: title review and abstract review. Two independent reviewers conducted title scans in a parallel fashion. For a title to be eliminated at this level, both reviewers had to indicate that it was ineligible. If the two reviewers did not agree on the eligibility of an article, it was promoted to the next level (see Appendix D). The title review phase was designed to capture as many studies reporting on the efficacy or safety of premixed insulin analogues as possible. All titles that were thought to address efficacy, effectiveness, safety, or quality of life were promoted to the abstract review phase. Titles generally included terms related to type 2 diabetes or insulin treatment.

The abstract review phase was designed to identify studies comparing the effects of premixed insulin analogues and other antidiabetic agents on clinical outcomes, intermediate outcomes, safety and adverse events, quality of life, or adherence. While quality of life was not a specifically stated outcome in the Key Questions, we included these outcomes because they may indirectly affect patient adherence. Abstracts were reviewed independently by two investigators and were excluded if both investigators agreed that the article met one or more of the exclusion criteria (see inclusion and exclusion criteria listed in Table 2 and Appendix D). Differences in opinion regarding abstract inclusion or exclusion were resolved through consensus adjudication.

Table 2. Criteria for inclusion in the reviews.


Full-text articles initially selected on the basis of abstract review underwent another independent parallel review by the investigators to determine whether they should be included in the full data abstraction (see Appendix D). Differences of opinion regarding article inclusion were resolved through consensus adjudication.

Data Abstraction

We used a systematic approach for extracting data to minimize the risk of bias in this process. By creating standardized forms for data extraction, we sought to maximize consistency in identifying all the pertinent data available for synthesis.

Each article underwent double review by the study investigators, at the levels of data abstraction and assessment of study quality. The second reviewer confirmed the first reviewer's data abstraction forms for completeness and accuracy. Reviewer pairs were formed to include personnel with both clinical and methodological expertise. Reviewers were not masked to the articles' authors, institution, or journal.19 In most instances, data were directly abstracted from the article. If possible, relevant data were also abstracted from figures. Differences in opinion were resolved through consensus adjudication. For assessments of study quality, each reviewer independently judged study quality and rated items on quality assessment forms (see Appendix D).

Reviewers extracted information on general study characteristics (e.g., study design, study period and followup, country, exclusion criteria), study participants (e.g., age, gender, race, weight/body mass index (BMI), A1c levels, duration of diabetes, and previous treatments), interventions (e.g., starting, mean, and range of doses, timing, and duration of use), outcome measures, and the results of each outcome, including measures of variability (see Appendix D).

All information from the article review process was entered into the SRS© 4.0 database by the individual completing the review. Reviewers entered comments into the system whenever applicable.

Quality Assessment

We developed a quality assessment tool for randomized controlled trials (RCTs) and nonrandomized studies based on the Jadad criteria20 and the Newcastle-Ottawa Scale;21 this tool was supplemented with additional questions as suggested by the Guide for Conducting Comparative Effectiveness Reviews.22 The quality of each study was assessed using the following criteria: (1) whether the study question was clearly stated; (2) whether the patients, providers, or outcome assessors were blinded; (3) what method was used to assess the primary outcome; (4) whether the followup was long enough for outcomes to occur; (5) the adequacy of the followup; (6) whether there was a description of those lost to followup; (7) whether the main conclusions were reflective of the results; (8) what funding source was identified; and (9) whether there was a statement of conflict of interest. In addition, RCTs were evaluated with regard to the appropriateness of their randomization scheme. Nonrandomized studies were also evaluated with regard to the selection of the comparison group, the ascertainment of exposure, the demonstration that the outcome of interest was not present at the start of the study, and the adjustment for key confounders. Reviewers rated the overall quality of each study as:

  • Good (low risk of bias). These studies had the least bias, and the results were considered valid. These studies adhered to the commonly held concepts of high quality, including the following: a formal randomized controlled design; a clear description of the population, setting, interventions, and comparison groups; appropriate measurement of outcomes; appropriate statistical and analytic methods and reporting; no reporting errors; a low dropout rate; and clear reporting of dropouts.
  • Fair. These studies were susceptible to some bias, but not enough to invalidate the results. They did not meet all the criteria required for a rating of good quality because they had some deficiencies, but no flaw was likely to cause major bias. The study may have been missing information, making it difficult to assess limitations and potential problems.
  • Poor (high risk of bias). These studies had significant flaws that imply biases of various types that might have invalidated the results. They had serious errors in design, analysis, or reporting; large amounts of missing information; or discrepancies in reporting.22


Throughout the report, we discuss the applicability of studies in terms of how well the study population was consistent with the general population of individuals with type 2 diabetes. We evaluated the applicability in terms of (1) the source population from which the subjects were enrolled; (2) the percentage of patients enrolled, as compared to those screened for the trial; (3) the percentage of patients excluded during a run-in period because of poor compliance, poor treatment response, or side-effects; (4) the similarity of the demographic characteristics of the study population to the general U.S. diabetic population;23 (5) the representativeness of the spectrum of illness severity to all stages of illness; (6) the degree to which the study reflected current clinical practice with regard to the intervention and monitoring; (7) the appropriateness of the comparator; (8) the extent and quality of the reporting on important clinical outcomes and adverse events; and (9) the similarity of the standards of care to that of the U.S.

Data Analysis and Synthesis

For each Key Question, we created a set of detailed evidence tables containing all the information extracted from eligible studies. We conducted meta-analyses for outcomes when there were sufficient data (two or more trials) and studies were homogenous with respect to key variables (population characteristics, study duration, and drug dose).

Data Synthesis for Intermediate Outcomes and Adverse Events

For intermediate outcomes and the adverse outcome of weight change, we recorded the mean difference between groups, along with its measure of dispersion. If this information was not reported, we calculated the point estimate using the mean difference from baseline for each group. If the mean difference from baseline was not reported, we calculated this value from the baseline and final values for each group. If no measure of dispersion was reported for the between-group difference, we calculated this value using the sum of the variances for the mean difference from baseline for each group. If there were no measures of dispersion for the mean difference from baseline for each group, we then calculated the variance using the standard deviation of the baseline and final values, assuming a correlation between baseline and final values of 0.5.24, 25 If data were only presented in graphical form, we abstracted data from the graphs. We pooled the results of the plasma and blood glucose levels from different studies, since blood glucose measurements accurately reflect plasma glucose levels.26
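The imputation chain described above can be sketched in a few lines. This is an illustrative calculation only (the function names and numbers are hypothetical, not the review's actual code): the change-from-baseline standard deviation is imputed from the baseline and final SDs under an assumed baseline-final correlation of 0.5, and the between-group variance is the sum of each arm's variance of the mean change.

```python
import math

def sd_change(sd_baseline, sd_final, corr=0.5):
    """Impute the SD of change from baseline when only baseline and
    final SDs are reported, assuming a baseline-final correlation."""
    return math.sqrt(sd_baseline**2 + sd_final**2
                     - 2 * corr * sd_baseline * sd_final)

def between_group_difference(mean_change_a, sd_change_a, n_a,
                             mean_change_b, sd_change_b, n_b):
    """Point estimate and variance of the between-group mean difference,
    summing the variances of each arm's mean change from baseline."""
    diff = mean_change_a - mean_change_b
    var = sd_change_a**2 / n_a + sd_change_b**2 / n_b
    return diff, var

# Hypothetical example: A1c change in two 100-patient arms.
sd_a = sd_change(1.2, 1.0)   # imputed SD of change, arm A
sd_b = sd_change(1.1, 1.0)   # imputed SD of change, arm B
diff, var = between_group_difference(-1.4, sd_a, 100, -1.1, sd_b, 100)
```

The assumed correlation of 0.5 is the conventional middle-ground value; a higher true correlation would make this imputation conservative (wider CIs).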

For the adverse outcome of hypoglycemia, we used two strategies to synthesize data: If a trial reported the incidence of hypoglycemia (number of patients who developed hypoglycemia), we calculated an odds ratio using the incidence of hypoglycemia in each study group. If a trial reported event rates in episodes per patient per 30 days, we calculated the rate ratio by dividing the event rate in the premixed insulin analogue arm by the event rate in the comparator arm. If a trial reported the number of episodes in each arm or study period or reported an event rate in a form other than episodes per patient per 30 days, we converted this information into episodes per patient per 30 days and used this event rate to calculate the rate ratio in the two arms or study periods of the trial.
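The rate-standardization step above amounts to simple arithmetic. As a minimal sketch with hypothetical trial numbers (not data from any included study):

```python
def episodes_per_patient_30d(total_episodes, n_patients, followup_days):
    """Convert a raw episode count into episodes per patient per 30 days."""
    return total_episodes / n_patients / followup_days * 30

def rate_ratio(rate_premix, rate_comparator):
    """Rate ratio: premixed insulin analogue arm over the comparator arm."""
    return rate_premix / rate_comparator

# Hypothetical trial: 90 episodes among 120 patients over 180 days (premix)
# versus 60 episodes among 115 patients over 180 days (comparator).
r_premix = episodes_per_patient_30d(90, 120, 180)    # 0.125 episodes/patient/30 d
r_comp = episodes_per_patient_30d(60, 115, 180)
rr = rate_ratio(r_premix, r_comp)                    # > 1 favors the comparator
```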

Following a qualitative synthesis of the literature, we pooled the results of individual studies within each set of comparisons using Comprehensive Meta-Analysis (version 2.2.046) software. Because we found some clinical heterogeneity within the same-group comparisons, we decided to use a random-effects model. We chose a random-effects model because it assumes that the included studies differ from each other more than would be expected as a result of random error, and it incorporates between-study heterogeneity in pooling study results, thus giving a more conservative estimate of the confidence interval (CI) around the point estimate of the effect size. In contrast, a fixed-effect model assumes that studies differ from each other as a result of random error alone, and it gives narrower CIs around the point estimates of the effect size. Another advantage of using a random-effects model is that if there is no between-study heterogeneity and the studies differ from each other only as a result of random error, the CIs from random-effects and fixed-effects models are similar. Thus, the use of a random-effects model is not over-conservative in the absence of between-study heterogeneity. Although we measured the Q-statistic and I-square index, we did not use these statistics to choose the model for pooling data, since tests for homogeneity are known to have low power for detecting between-study heterogeneity.27, 28 Given the limited number of studies in each comparison group, we did not perform meta-regression to evaluate the effect of study variables on the outcomes. To evaluate excessive influence of a study on the results of meta-analysis, we conducted sensitivity analyses by excluding one study from the meta-analysis at a time and examining the change in the meta-analysis results. We assessed publication bias by visual inspection of the funnel plot and by statistical means using Begg's29, Egger's30 and trim-and-fill31 tests.
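The random-effects logic described above (the report used Comprehensive Meta-Analysis software) can be illustrated with the standard DerSimonian-Laird estimator. This sketch is for exposition only, not a reproduction of the review's computations; it also shows the Q-statistic and I-square index mentioned in the text:

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling (DerSimonian-Laird): estimate the
    between-study variance tau^2 from Cochran's Q, then reweight
    each study by 1 / (within-study variance + tau^2)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I-square (%)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, ci, tau2, i2
```

When the studies are homogeneous, tau^2 is zero and the pooled estimate and CI coincide with the fixed-effect result, which is the behavior the paragraph above relies on when calling the random-effects choice "not over-conservative."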

Data Synthesis for Clinical Outcomes

We included all studies that reported any information about the clinical outcomes identified in our key questions. If a study reported no cases of specific types of events, we still included the study in its respective section. We abstracted data on events for each arm, and all analyses followed the principle of intention-to-treat. First, we synthesized the data qualitatively. We then conducted meta-analyses when there were sufficient data (two or more studies) and the studies were homogenous with respect to key variables (patient populations, drug comparators such as using one of several accepted comparator drugs, outcome definitions, and study duration). For trials with more than one arm involving premixed insulin analogues, we combined these arms into a premixed insulin analogue group when appropriate. We believed that the results were similar enough for premixed insulin analogues and for “any other” active comparator to allow us to combine the data into these two groups, although we do discuss the studies qualitatively both in combination and separately. In the one study in which there were three arms (premixed insulin analogue versus rapid-acting insulin analogue versus long-acting insulin analogue),32 we chose what we thought was the most relevant comparison to include in the meta-analyses (premixed insulin analogue versus long-acting insulin analogue). Choosing the other comparator would not have markedly changed the results. We excluded crossover studies from the main meta-analyses, since these studies did not report whether events occurred prior to the first crossover, making it difficult to determine whether the event occurred as a result of the first or second drug given.

Pooled odds ratios and 95 percent CIs were calculated using a Mantel-Haenszel fixed-effects model (with a 0.1 continuity correction) for the main analysis.33, 34 We used a fixed-effects model because there is evidence to suggest that these methods are less biased with rare event data.35 We also calculated pooled odds ratios and 95 percent CIs using several other well-established methods as a sensitivity analysis, since experts disagree about the best meta-analytic technique for rare event data. These methods included Peto's method, the Mantel-Haenszel fixed-effects model (with a 0.5 and 0.01 continuity correction), and a Bayesian analysis.36 Heterogeneity among the trials in all the meta-analyses was tested with a standard chi-squared test, using a significance level of alpha less than or equal to 0.10. We also examined inconsistency among studies with an I2 statistic, which describes the variability in effect estimates that is due to heterogeneity rather than random chance.37 A value greater than 50 percent may be considered to represent substantial variability. We conducted sensitivity analyses by omitting one study at a time to assess the influence of any one study on the pooled estimate. A sensitivity analysis was conducted in which we included the crossover studies and the unpublished crossover data.
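The Mantel-Haenszel pooling with a continuity correction can be sketched as follows. This is an illustrative implementation under one common convention (the correction is added to every cell of any table containing a zero); the review's software may have applied the correction differently:

```python
def mh_pooled_or(tables, cc=0.1):
    """Mantel-Haenszel fixed-effects pooled odds ratio for 2x2 tables
    (a, b, c, d) = (events_tx, nonevents_tx, events_ctrl, nonevents_ctrl).
    The continuity correction cc is added to every cell of a table
    that contains a zero, so rare-event tables remain estimable."""
    num = den = 0.0
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            a, b, c, d = a + cc, b + cc, c + cc, d + cc
        n = a + b + c + d
        num += a * d / n   # Mantel-Haenszel numerator term
        den += b * c / n   # Mantel-Haenszel denominator term
    return num / den

# Hypothetical two-trial example, one with a zero-event arm.
pooled_or = mh_pooled_or([(10, 90, 5, 95), (0, 100, 3, 97)])
```

Rerunning with cc=0.5 or cc=0.01, as in the sensitivity analyses described above, shows how much the pooled estimate depends on the correction when events are rare.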

Because studies with statistically significant findings are more likely to be published than studies without them (publication bias), we examined whether smaller, negative studies appeared to be missing from the literature. We therefore conducted formal tests for publication bias using Begg's29 and Egger's30 tests, including evaluation of the asymmetry of funnel plots.
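Egger's test, mentioned above, is essentially a weighted regression diagnostic of funnel-plot asymmetry. As a simplified illustration (ordinary least squares, hypothetical inputs, not the review's actual analysis):

```python
def eggers_test(effects, ses):
    """Egger's regression asymmetry test: regress the standardized
    effect (effect / SE) on precision (1 / SE). An intercept far
    from zero suggests small-study asymmetry consistent with
    publication bias."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical symmetric data: identical effects, varying precision,
# so the intercept should be (near) zero.
intercept, slope = eggers_test([0.5, 0.5, 0.5], [0.1, 0.2, 0.5])
```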

Rationale for the Inclusion of Crossover Designs

We decided to include crossover trials because this study design allows comparison of interventions at the individual rather than the group level. In addition, the use of premixed insulin analogues over a short period of time was unlikely to produce long-lasting effects in study participants, and participants were unlikely to differ systematically between the initial phase of the study and subsequent phases.

However, we used data from crossover studies only for intermediate outcomes, namely A1c, fasting glucose, and postprandial glucose. We excluded crossover trials from the evaluation of outcomes that were either progressive, such as retinopathy, or irreversible, such as mortality. For the evaluation of A1c, we included only those crossover trials that had at least 12 weeks of followup. Crossover clinical trials with a shorter duration of followup were excluded from the analysis of A1c because treatment during an earlier phase, lead-in phase, or pretrial phase could affect the A1c level.26

We aimed to use within-individual comparisons data from crossover trials if the trials reported the data in such detail. If results were reported only for each intervention, we ignored the crossover design and used the reported estimates as if they came from a parallel trial. We understand that this is a conservative approach that ignores within-patient correlation and produces wider CIs. If a trial reported a carryover effect, we included only the data from the first period of the crossover trial on the grounds that this period is, in effect, a parallel group trial. Further sensitivity analyses were performed by pooling data without crossover studies and comparing the pooled results from parallel studies alone to the pooled results from combining both study designs.

Data Entry and Quality Control

After a second reviewer reviewed the data that had been entered into SRS© 4.0, adjudicated data were re-entered into Web-based data collection forms by the second reviewer. Second reviewers were generally more experienced members of the research team. If problems were recognized in a reviewer's data abstraction, the problems were discussed at a meeting with the reviewers. In addition, research assistants used a system of random data checks to assure data abstraction accuracy.

Rating the Body of Evidence

At the completion of our review, we graded the quantity, quality and consistency of the best available evidence addressing the Key Questions by adapting an evidence grading scheme recommended by the GRADE Working Group.38 We applied evidence grades to bodies of evidence on each type of intervention comparison for each major type of outcome. We assessed the strength of the study designs, with RCTs considered best, followed by non-RCTs and observational studies. To assess the quantity of evidence, we focused on the number of studies with the strongest design. We also assessed the quality and consistency of the best available evidence, including assessment of the limitations to individual study quality (using individual quality scores), certainty regarding the directness of the observed effects in the studies, the precision and strength of the findings, and the availability (or lack thereof) of data to answer the Key Question.

We classified evidence bodies pertaining to the Key Questions into three basic categories: (1) “high” grade (indicating confidence that further research is very unlikely to change our confidence in the estimated effect in the abstracted literature), (2) “moderate” grade (indicating that further research is likely to have an important impact on our confidence in the estimates of effects and may change the estimates in the abstracted literature), and (3) “low” grade (indicating further research is very likely to have an important impact on confidence in the estimates of effects and is likely to change the estimates in the abstracted literature). We graded the body of evidence as “no evidence” if there were no studies evaluating a drug comparison.

Peer Review and Public Commentary

A draft of the completed report was sent to the peer reviewers, the representatives of the AHRQ and the Scientific Resource Center. The draft report was posted to a Web site for public comment. In response to the comments of the peer reviewers and the public, revisions were made to the evidence report, and a summary of the comments and their disposition was submitted to the AHRQ.

