Internal Validity
In the included studies, randomization was conducted using a validated system. In some trials, between-group differences in baseline characteristics were noted, which may have occurred by chance or may reflect issues with the implementation of randomization. For example, in SUSTAIN-1, patients treated with the higher dose of SEM had a shorter duration of disease (mean of 3.65 years in the SEM 1 mg group compared with 4.85 years in the SEM 0.5 mg group). Patients in the higher-dose group may therefore have been more likely to achieve treatment goals because their diabetes was less long-standing than that of patients in the lower-dose group, and the treatment effect of SEM 1 mg could be overestimated.
For trials using a double-blind design, appropriate methods were used to ensure allocation concealment. However, patients and investigators could review and discuss changes in A1C levels, body weight, and AEs, particularly drug effects known to be associated with GLP-1 receptor agonists, such as gastrointestinal AEs. This may have allowed some patients and/or investigators to surmise the assigned treatment, which may in turn have affected patient-reported outcomes or the reporting of AEs. For trials with an open-label design, patients were aware of their treatment allocation; therefore, the evaluation of patient-reported outcomes or AEs may also have been affected by knowledge of the treatment regimen. The primary outcome variable in the majority of the included trials was change in A1C, which is an objective outcome measure, so it is unlikely that unblinding had an important impact on the results of the primary analysis. However, clinicians could readily determine A1C levels and may have adjusted medications or initiated rescue medications accordingly; thus, the secondary end points and sensitivity analyses could have been affected.
In terms of the methods of statistical analysis, efficacy analyses were performed in the FAS. A true ITT population was not used in any trial except SUSTAIN-6 (patients were required to receive at least one dose of the study drug); however, this is unlikely to have affected the study results because few patients were excluded from the FAS. For the NI trials, acceptable margins (0.5% change in A1C in placebo-controlled trials, or 0.3% change in A1C in active-controlled trials) were used.
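As an illustration of how such margins are applied, the following minimal Python sketch assumes the conventional decision rule, which is not spelled out in this section: noninferiority is concluded when the upper limit of the 95% CI for the between-group difference in A1C change (SEM minus comparator, in percentage points, where lower is better) lies below the margin. The function name and the example CI limits are hypothetical and are not trial results.

```python
def is_noninferior(ci_upper, margin):
    """Return True if the upper 95% CI limit of the treatment difference
    (SEM minus comparator, change in A1C, percentage points) is below
    the noninferiority margin. Assumed rule, not taken from the protocols."""
    return ci_upper < margin


# Margins quoted in the text above.
ACTIVE_CONTROL_MARGIN = 0.3
PLACEBO_CONTROL_MARGIN = 0.5

# Hypothetical upper CI limits, for illustration only.
print(is_noninferior(0.10, ACTIVE_CONTROL_MARGIN))   # upper limit below 0.3
print(is_noninferior(0.40, ACTIVE_CONTROL_MARGIN))   # upper limit above 0.3
print(is_noninferior(0.40, PLACEBO_CONTROL_MARGIN))  # same limit, wider margin
```

The same hypothetical upper limit of 0.40 fails the 0.3% margin but passes the 0.5% margin, which is why the choice of margin matters for the conclusion.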
A hierarchical testing procedure was used to account for multiple comparisons among the primary end point and the key secondary end points. Various approaches (e.g., model-based imputation and LOCF) were used to handle missing data in the included studies. The hierarchical sequence in the included studies was pre-specified and included clinical outcomes that are commonly accepted in diabetes trials. Outcomes outside of the testing hierarchy, such as occurrence of diabetes-related morbidity (e.g., individual components of MACE or the expanded composite CV outcome), change in blood pressure, and HRQoL, need to be interpreted with caution due to the possibility of inflated type I error. Moreover, it is unclear how to interpret the diabetes-related morbidity end points, as the NI margins for these individual component analyses were not specified in the study protocol. Some important outcomes in diabetes trials, such as hypoglycemia, were not included in the testing hierarchy.
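The logic of a hierarchical procedure can be sketched as follows. This is a minimal Python illustration that assumes a simple fixed-sequence design, one common form of hierarchical testing; the exact procedure in each trial may have differed. The end point names, their order, and the P values are hypothetical. Each end point is tested at the full alpha level in the pre-specified order, and formal testing stops at the first non-significant result.

```python
def fixed_sequence_test(ordered_p_values, alpha=0.05):
    """Apply a fixed-sequence (hierarchical) test.

    ordered_p_values: list of (end_point_name, p_value) in pre-specified order.
    Returns a list of (end_point_name, significant) pairs. Once one end point
    fails, all later end points are declared not significant, whatever
    their P values.
    """
    results = []
    still_testing = True
    for name, p_value in ordered_p_values:
        if still_testing and p_value < alpha:
            results.append((name, True))
        else:
            still_testing = False
            results.append((name, False))
    return results


# Hypothetical hierarchy and P values, for illustration only.
hierarchy = [
    ("change in A1C", 0.001),
    ("change in body weight", 0.20),
    ("change in systolic blood pressure", 0.01),
]
for name, significant in fixed_sequence_test(hierarchy):
    print(name, significant)
```

In this hypothetical case the third end point has a nominal P value below 0.05 but cannot be declared significant, because the sequence was broken at the second end point. End points that were never part of the hierarchy have no such error control at all.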
The proportion of missing data was substantial (greater than 20%) and differed between SEM and the controls in most of the trials. MMRM was used in the primary efficacy analysis, and it assumes that data are MAR. However, when missing data are associated with treatment discontinuation due to lack of efficacy (e.g., poor glycemic control) or intolerable adverse effects of treatment, the data cannot be considered MAR, and this may bias the results of the trials in favour of SEM. In addition, given the repeated-measures design of MMRM, the impact of data missing at scheduled time points on the overall study results, which were measured at the end of follow-up, is uncertain. In the SUSTAIN trials, sensitivity analyses were conducted to assess the validity of the MAR assumption and to evaluate the robustness of the conclusions of the primary analysis. Although these analyses showed results similar to those of the primary analysis, they cannot fully account for the impact of missing data. In a chronic progressive disease like diabetes, where patients continue to lose glycemic control over time, LOCF can introduce bias because it assumes that patients remain stable at all subsequent time points, which is rarely the case in the real world. Thus, although the trials appropriately outlined the various approaches to handling missing data, none is sufficient to overcome the extent of missing data, and bias may have been introduced into the results.
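The concern about LOCF can be made concrete with a small Python sketch. All values below are invented for illustration and do not come from any trial. A hypothetical patient responds early, drops out after the second visit, and would have lost glycemic control afterwards. LOCF carries the last favourable value forward to the end of follow-up.

```python
def locf(values):
    """Fill each missing value (None) with the most recent observed value."""
    filled = []
    last_observed = None
    for value in values:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled


# Hypothetical A1C (%) at five scheduled visits, for illustration only.
true_a1c = [8.0, 7.4, 7.6, 7.9, 8.2]            # what would have happened
observed_a1c = [8.0, 7.4, None, None, None]     # patient drops out after visit 2

imputed_a1c = locf(observed_a1c)
locf_change = round(imputed_a1c[-1] - imputed_a1c[0], 1)
true_change = round(true_a1c[-1] - true_a1c[0], 1)

print("Imputed series:", imputed_a1c)
print("Change from baseline under LOCF:", locf_change)
print("Change from baseline without dropout:", true_change)
```

Under these assumed values, LOCF records an improvement from baseline, whereas the unobserved trajectory ends above baseline. The direction and size of such bias in the actual trials depend on who dropped out and why, which cannot be determined from the imputed data.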
The majority of the included trials evaluated the change from baseline in A1C as the primary outcome and were not designed to test for longer-term diabetes-related morbidity or mortality. Only one cardiovascular safety trial (SUSTAIN-6) was included in this review, and this two-year trial may address some of the questions of interest in clinical practice.
None of the included trials were powered to assess efficacy outcomes such as change in blood pressure or blood lipids, or harm outcomes such as hypoglycemia.
In the SUSTAIN trials, a number of pre-defined subgroup analyses based on patients' baseline characteristics were conducted to examine the consistency of the primary analysis results across subgroup levels; however, most of these subgroups, such as those based on renal function in SUSTAIN-5 or on baseline A1C level in SUSTAIN-6, were not defined by variables used for stratification at randomization. Thus, balance in patients' baseline characteristics between treatment groups was unlikely to be maintained within such subgroups, and this could bias the results. In SUSTAIN-2, 3, and 4, subgroup analyses based on prior antidiabetic therapy were performed to explore the treatment effect of the study drug in special subgroups. However, these were post hoc, exploratory analyses, and their interpretation is challenging because of insufficient power to detect a true difference between treatment groups, imbalance in patients' baseline characteristics across the subgroups, the complexity of testing interaction effects (P values were less than 0.05 for some of the interaction tests), and inconsistency between statistical significance and clinical importance. Wider 95% CIs for the point estimates of between-group differences in efficacy outcomes were observed in several of these subgroups, as may be expected given the lack of power within the subgroup analyses. In addition, multiplicity and potentially inflated type I error are concerns within the subgroups. As a result, the interpretation of subgroup analyses with respect to various outcomes is difficult. Moreover, in these NI trials, it is unclear how the subgroup-specific effects should be interpreted with respect to the NI margins, which were defined for the overall population, or whether subgroup-specific margins should have been employed.
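To indicate the scale of the multiplicity concern, the following Python sketch computes the probability of at least one false-positive finding across several unadjusted tests. It assumes that the tests are independent and that each is run at a nominal alpha of 0.05; both are simplifications, since subgroup analyses within a trial are usually correlated. The numbers of tests shown are arbitrary examples and not counts from the SUSTAIN trials.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one type I error across n_tests independent
    tests, each performed at level alpha with no multiplicity adjustment."""
    return 1 - (1 - alpha) ** n_tests


# Arbitrary example numbers of unadjusted tests.
for n_tests in (1, 5, 10, 20):
    rate = familywise_error_rate(n_tests)
    print(f"{n_tests:>2} tests: {rate:.3f}")
```

Under these assumptions the chance of at least one spurious significant result rises quickly with the number of unadjusted comparisons, which is why nominally significant subgroup or interaction findings warrant cautious interpretation.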
Similarly, in SUSTAIN-6, it is questionable to interpret the results for the individual components using the NI margin defined for the composite outcome, as the appropriate margin for an individual component may not be the same as that for the composite outcome. In this trial, the observed CV benefits appeared to be driven mainly by non-fatal MI and non-fatal stroke, while the rate of CV death was similar between SEM and placebo. There is also a concern with multiplicity of testing in SUSTAIN-6.
Patients in SUSTAIN-2, 3, and 4 had received various background antidiabetic therapies before entering the trials; SEM was therefore used as second-line or third-line therapy. Given that patients who are controlled on two prior OADs represent a more advanced diabetes population (longer history of disease, more comorbidities or diabetes-related complications, inadequate response to previous treatment, etc.) than those controlled on one OAD, pooling these patients in a single analysis is questionable. This also affects the generalizability of the study findings.
For most of the trials (e.g., SUSTAIN-2, 3, 4, and 6), safety outcomes were reported for the full population, in which semaglutide could be used as first-line, second-line, or third-line therapy, and subgroup data were not available. It is therefore difficult to examine safety outcomes according to patients' prior background antidiabetic therapy.
In general, diet and exercise are part of the standard care of patients with T2DM. In SUSTAIN-2 to 7, it was unknown whether “diet and exercise” was part of background therapy. This uncertainty could raise validity issues and likely decreases the generalizability of the study results.
External Validity
SEM was not used as a first-line therapy in the vast majority of the included trials. SUSTAIN-1 recruited patients who were drug naive and therefore did not provide evidence to support the manufacturer’s requested reimbursement criteria of “for the treatment of adult patients with T2DM to improve glycemic control, in combination with MET (second-line treatment) and in combination with MET + SU (third-line treatment)”. All included trials except for the Japanese study were multinational. Although few Canadian patients were enrolled, the consulted clinical expert indicated that, based on the selection criteria and the patients’ baseline characteristics, including diabetes-related characteristics, the study results are generalizable to the Canadian population.
The treatment duration of the included trials (ranging from 30 weeks to 56 weeks) was considered adequate for assessing the efficacy of semaglutide versus comparators on change in A1C, as well as safety, tolerability, and patient satisfaction, but may not be sufficient for assessing the sustainability of A1C change or CV outcomes. The CVOT (SUSTAIN-6) had a treatment duration of 104 weeks; therefore, it can provide some evidence on the longer-term effects of the study drug.
The included trials provide direct evidence for the comparisons between semaglutide and other GLP-1 receptor agonists, DPP-4 inhibitors, or insulin glargine. There is a lack of direct evidence on comparisons between semaglutide and other currently available active treatments, such as SGLT2 inhibitors.