Table 24. Summary of Evidence for all Key Questions

Key Question | Number and Quality of Studies | Limitations/Consistency | Applicability | Summary of Findings
KQ1: What are the sensitivities, specificities, reliabilities, and yields of current screening tests for GDM? (A) After 24 weeks' gestation? (B) During the first trimester and up to 24 weeks' gestation?
(A) After 24 wk gestation
51 prospective studies
Fair to good quality
Limitations: Lack of an agreed upon gold standard for diagnosis of GDM creates challenges for assessing the accuracy of tests and comparing across studies. GDM was confirmed using criteria developed by CC, ADA, NDDG, and WHO. There were sparse data comparing overall approaches for diagnosis and screening, e.g., one-step vs. two-step, selective vs. universal.

Consistency: Across studies, numerous tests and thresholds were examined. Screening tests included the 50 g OGCT, FPG, risk factor-based screening, and other less common tests such as HbA1c and serum fructosamine.
Prevalence of GDM varied across studies and diagnostic criteria used. Results need to be interpreted in the context of prevalence.
Comparisons involving WHO criteria are less applicable to the North American setting because these criteria are not used in North America.
  • Prevalence varied across studies and diagnostic criteria: ADA 2000-2010 (75 g) 2.0 to 19% (range), CC 3.6 to 38%, NDDG 1.4 to 50%, WHO 2 to 24.5%.
  • 9 studies examined a 50 g OGCT with a cutoff value of ≥140 mg/dL; GDM was confirmed using CC criteria. Results: sensitivity 85%, specificity 86%, prevalence 3.8 to 31.9%, PPV 18 to 27% (prevalence <10), PPV 32 to 83% (prevalence ≥10), NPV median 98%.
  • 6 studies examined a 50 g OGCT (≥130 mg/dL); GDM was confirmed using CC criteria. Results: sensitivity 99%, specificity 77%, prevalence 4.3 to 29.5%, PPV 11 to 31% (prevalence <10), PPV 31 to 62% (prevalence ≥10), NPV median 100%.
  • 1 study examined a 50 g OGCT (≥200 mg/dL); GDM was confirmed using CC criteria. Sensitivity, specificity, PPV, and NPV were all 100%. Prevalence was 6.4%.
  • 7 studies examined a 50 g OGCT (≥140 mg/dL); GDM was confirmed using NDDG criteria. Results: sensitivity 85%, specificity 83%, prevalence 1.4 to 45.8%, PPV 12 to 39% (prevalence <10), PPV 57% (prevalence ≥10), NPV median 99%.
  • 3 studies examined a 50 g OGCT (≥130 mg/dL); GDM was confirmed using NDDG criteria. Results: sensitivity 67 to 90% (range), specificity 47 to 84%, prevalence 16.7 to 35.3%, PPV 20 to 75%, NPV 86 to 95%.
  • 3 studies examined a 50 g OGCT (different thresholds); GDM was confirmed using ADA 2000-2010 (75 g) criteria. Prevalence was 1.6 to 4.1% (range). Results: sensitivity 86 to 97% (range), specificity 79 to 87%, PPV 7 to 20%, NPV 99 to 100%.
  • 3 studies examined a 50 g OGCT (≥140 mg/dL); GDM was confirmed using WHO criteria. Results: sensitivity 43 to 85%, specificity 73 to 94%, prevalence 3.7 to 15.7%, PPV 18 to 20% (prevalence <10), PPV 58% (prevalence ≥10), NPV median 99%.
  • 7 studies examined FPG at different thresholds; GDM was confirmed using CC criteria. Results: at ≥85 mg/dL sensitivity 87%, specificity 52%; at ≥90 mg/dL sensitivity 77%, specificity 76%; at ≥92 mg/dL sensitivity 76%, specificity 92%; at ≥95 mg/dL sensitivity 54%, specificity 93%. At ≥85 mg/dL prevalence 1.4 to 34.53% (range). PPV 10% (prevalence <10) and 23 to 59% (prevalence ≥10). Median NPV 93%.
  • 8 studies examined risk factor-based screening but were not pooled. Studies used different criteria to confirm GDM. Results: sensitivity 48 to 95% (range), specificity 22 to 94%, prevalence 1.7 to 16.9%, PPV 5 to 19% (prevalence <10), PPV 20% (prevalence ≥10), NPV median 99%.
  • 1 study compared IADPSG vs. ADIPS 2 step (reference) to diagnose GDM. Results: sensitivity 82%, specificity 94%, prevalence 13.0%, PPV 61%, NPV 98%.
  • 4 studies compared 75 g and 100 g load tests to diagnose GDM. Prevalence ranged from 1.4 to 50%. Results were not pooled: sensitivity 18 to 100%, specificity 86 to 100%, PPV 12 to 100%, NPV 62 to 100%.
(B) During the first trimester and up to 24 wk gestation
3 prospective cohort studies
Limitations: Only 3 studies of women before 24 wks gestation; therefore, no conclusions can be made for test characteristics in early pregnancy.

Consistency: Not applicable (not enough studies addressing the same question to judge consistency).
Evidence too limited to judge applicability.
  • 1 study examined the 50 g OGCT at 10 wks and confirmed GDM using JSOG criteria (75 g). Results: sensitivity 88%, specificity 79%, prevalence 1.6%, PPV 7%, NPV 100%.
  • 1 study examined 50 g OGCT at 20 wks and confirmed GDM using ADA (2000-2010) 100 g criteria. Results: sensitivity 56%, specificity 94%, prevalence 3.6%, PPV 24%, NPV 98%.
  • 1 study compared 1st and 2nd trimester results using 3 screening tests (OGCT at ≥130 mg/dL, FPG, HbA1c); GDM confirmed using JSOG criteria. Results (OGCT) 1st trimester: prevalence 1.9%, sensitivity 93%, specificity 77%, PPV 7.1%, NPV 99%; 2nd trimester: prevalence 2.9%, sensitivity 100%, specificity 85%, PPV 17%, NPV 100%.
KQ2: What is the direct evidence on the benefits and harms of screening women (before and after 24 weeks' gestation) for GDM to reduce maternal, fetal, and infant morbidity and mortality?
2 retrospective cohort studies
Fair and good quality
Limitations: No RCTs available to answer this question.

Consistency: Not applicable (not enough studies addressing the same question to judge consistency).
The comparison for this question was women who had and had not undergone screening. Since screening is now commonplace it may be unlikely to identify studies or cohorts where this comparison is feasible.
  • 1 study (n=1,000) showed more cesarean deliveries in the screened group. A second study (n=93) found the incidence of macrosomia (≥4.3 kg) was the same in screened and unscreened groups (7% each group).
  • Based on the small number of studies and sample sizes, the effect of screening women for GDM on health outcomes is inconclusive.
KQ3: In the absence of treatment, how do health outcomes of mothers who meet various criteria for GDM and their offspring compare to those who do not meet the various criteria?
38 prospective or retrospective cohort studies; 2 studies were long-term followup from RCTs; however, only data from the untreated patients were included.
Fair to good quality
Limitations: Strength of evidence was low to insufficient for all graded outcomes due to risk of bias (all observational studies), inconsistency, and/or imprecision. For many comparisons, the numbers of studies, participants, and/or events were low; therefore, findings of no statistically significant differences between groups do not imply equivalence or rule out potential differences.

Consistency: A wide variety of diagnostic criteria and thresholds were compared across studies. There were often few studies with similar comparison groups. Differences in defining and assessing outcomes may have contributed to heterogeneity in results across studies (e.g., biochemical vs. clinical assessment of neonatal hypoglycemia).
All studies or groups included for analysis involved women who had not received treatment for GDM. These women may differ from the general population in other ways that are related to the reasons that they did not seek or receive early prenatal care (e.g., socioeconomic status).
Maternal outcomes:
  • A methodologically strong study showed a continuous positive relationship between increasing glucose levels and the incidence of primary cesarean section. This study also found significantly fewer cases of preeclampsia and cesarean section for women with no GDM vs. IADPSG.
  • For preeclampsia, significant differences were found for CC vs. patients with no GDM (3 studies), with fewer cases among the patients with no GDM, and for CC vs. false-positive groups (2 studies), with fewer cases among the false positives. The strength of evidence was low. No differences were found for NDDG false positive (2 studies), NDDG 1 abnormal OGTT vs. no GDM (1 study), or IGT WHO vs. no GDM (3 studies); the strength of evidence was insufficient.
  • For maternal weight gain, significant differences were found for 3 of 12 comparisons: IADPSG IGT vs. no GDM (favored IGT), IADPSG IFG vs. no GDM (favored IFG), IADPSG IGT-2 vs. no GDM (favored IGT-2). All comparisons were based on single studies (strength of evidence insufficient).
Fetal/neonatal/child outcomes:
  • 2 methodologically strong studies showed a continuous positive relationship between increasing glucose levels and the incidence of macrosomia. 1 of these studies also showed significantly fewer cases of shoulder dystocia and/or birth injury, clinical neonatal hypoglycemia, and hyperbilirubinemia for women with no GDM vs. IADPSG.
  • For macrosomia >4,000 g, 6 of 11 comparisons showed a significant difference: patient groups with no GDM had fewer cases compared with CC GDM (10 studies), CC 1 abnormal OGTT (7 studies), NDDG GDM (unrecognized) (1 study), NDDG false positives (4 studies), and WHO IGT (1 study). Fewer cases were found for women with false-positive results compared with CC GDM (5 studies). Data for macrosomia >4,500 g were available for 4 comparisons and showed significant differences in 2 cases: patient groups with no GDM had fewer cases compared with CC GDM (3 studies) and unrecognized NDDG GDM (1 study). The strength of evidence for macrosomia was low to insufficient.
  • For shoulder dystocia, significant differences were found for 7 of 17 comparisons; all comparisons but 1 were based on single studies (insufficient strength of evidence). Patient groups with no GDM showed lower incidence of shoulder dystocia when compared with CC GDM (5 studies, low strength of evidence), NDDG GDM (unrecognized), NDDG false positive, WHO IGT, IADPSG IFG, and IADPSG IGT IFG. The other significant difference showed lower incidence among the false-positive group compared with CC 1 abnormal OGTT.
  • For fetal birth trauma/injury, single studies compared CC GDM and WHO IGT with no GDM and showed no differences. Two studies showed fewer cases for no GDM compared with NDDG GDM. Strength of evidence was insufficient for all comparisons.
  • No differences were found for neonatal hypoglycemia for any comparison, including CC GDM vs. no GDM (3 studies), CC GDM vs. 1 abnormal OGTT (1 study), CC 1 abnormal OGTT vs. no GDM (4 studies), NDDG GDM vs. no GDM (1 study), NDDG false positive vs. no GDM (1 study), and WHO IGT vs. no GDM (3 studies). Strength of evidence was insufficient for all comparisons.
KQ4: Does treatment modify the health outcomes of mothers who meet various criteria for GDM and offspring?
5 RCTs and 6 retrospective cohort studies.
Poor to good quality
Limitations: For some outcomes, particularly the long-term outcomes, the strength of evidence was insufficient or low. Moreover, for some outcomes events were rare and the studies may not have had the power to detect clinically important differences between groups; therefore, findings of no significant difference should not be interpreted as equivalence between groups.

Consistency: Some inconsistency occurred at 2 levels. First, there were inconsistencies for some outcomes between RCTs and observational studies, which may be attributable to confounding and methods of selecting study groups (e.g., historical control groups). Second, in some instances there were inconsistencies across studies within designs that were often attributable to the manner in which outcomes were defined or assessed (e.g., clinical vs. biochemical assessment of neonatal hypoglycemia).
For the most part, study populations included women whose glucose intolerance was less marked, as those whose glucose intolerance was more pronounced would not be entered into a trial in which they may be assigned to a group receiving no treatment. The majority of studies were conducted in North America or Australia, with 2 from Italy. Most of the North American studies were inclusive of mixed racial populations and are likely applicable to the general U.S. population. Although the Australian RCT population had more white women with a lower BMI than the U.S. RCT population, this should not affect the applicability of most of its findings to U.S. women, because these subject characteristics would be factors associated with lower risk of poor outcomes.
Maternal outcomes:
  • Moderate evidence from 3 RCTs showed a significant difference for preeclampsia, with fewer cases in the treated group.
  • There was inconsistency across studies in terms of maternal weight gain (4 RCTs and 2 cohort studies); the strength of evidence was insufficient due to inconsistency and imprecision in effect estimates.
Offspring outcomes:
  • There was insufficient evidence to make a conclusion for birth injury. There was inconsistency across studies with the 2 RCTs showing no difference and the 1 cohort study showing a difference in favor of the treated group. The low number of events and participants across all studies resulted in imprecise estimates.
  • Moderate evidence showed significantly lower incidence of shoulder dystocia in the treated groups, and this finding was consistent for the 3 RCTs and 4 cohort studies.
  • There was low evidence of no difference between groups for neonatal hypoglycemia based on 4 RCTs and 2 cohort studies.
  • For outcomes related to birthweight (including macrosomia >4,000 g, macrosomia >4,500 g, actual birthweight, and large for gestational age), differences were often observed favoring the treated groups. Strength of evidence was moderate for macrosomia >4,000 g.
  • 1 RCT followed patients for 7 to 11 years and found no differences for impaired glucose tolerance or type 2 DM, although the strength of evidence was considered insufficient.
  • No differences were observed in single studies that assessed BMI >95th percentile (7-11 year followup) and BMI >85th percentile (5-7 year followup). Overall, pooled results showed no difference in BMI, and the strength of evidence was considered low.
KQ5: What are the harms of treating GDM and do they vary by diagnostic approach?
4 RCTs and 1 retrospective cohort study.
Fair to good quality
Limitations: No study evaluated costs and resource allocation. Limited evidence on harms. Limited evidence for number of prenatal visits and NICU admissions. Findings of no significant differences may be attributable to low power and should not be interpreted as equivalence.

Consistency: Not applicable (not enough studies addressing the same question to judge).
As above for KQ4. In addition, differences in billing structures between the United States and Australia may have accounted for the discrepant findings with respect to NICU admissions between these studies and as a result limit the applicability of this finding in the United States.
  • 1 RCT assessed depression and anxiety at 6 weeks after study entry and 3 months postpartum.
  • There was no significant difference between groups in anxiety at either time point, although there were significantly lower rates of depression in the treatment group at 3 months postpartum.
  • 4 RCTs reported small for gestational age and found no significant difference.
  • 3 RCTs and 1 cohort study provided data on admission to NICU and showed no significant differences overall. One trial was an outlier because it showed a significant difference favoring the no treatment group. This difference may be attributable to site-specific policies and procedures.
  • 2 RCTs reported on the number of prenatal visits and generally found more visits among the treatment groups.
  • 2 RCTs reporting on induction of labor showed different results, with 1 showing a significant difference with more cases in the treatment group and the other showing no difference.
  • Based on studies included in KQ4, no differences between groups were found for cesarean section (5 RCTs, 6 cohorts) or unplanned cesarean section (1 RCT, 1 cohort).

ADA = American Diabetes Association; ADIPS = Australasian Diabetes in Pregnancy Society; BMI = body mass index; CC = Carpenter-Coustan; DM = diabetes mellitus; FPG = fasting plasma glucose; GDM = gestational diabetes mellitus; HbA1c = glycated hemoglobin; IADPSG = International Association of Diabetes in Pregnancy Study Groups; IFG = impaired fasting glucose; IGT = impaired glucose tolerance; IGT-2 = double impaired glucose intolerance; JSOG = Japan Society of Obstetrics and Gynecology; KQ = Key Question; NDDG = National Diabetes Data Group; NPV = negative predictive value; NICU = neonatal intensive care unit; OGCT = oral glucose challenge test; OGTT = oral glucose tolerance test; PPV = positive predictive value; RCT = randomized controlled trial; wk(s) = week(s); WHO = World Health Organization
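
The KQ1 findings note that results need to be interpreted in the context of prevalence. A minimal sketch of why, assuming PPV and NPV are derived from sensitivity, specificity, and prevalence via Bayes' rule: the function names `ppv` and `npv` are illustrative, and applying the pooled 50 g OGCT estimates (≥140 mg/dL, CC criteria) to the two ends of the reported prevalence range is an assumption made for illustration, not the report's own calculation.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value: P(no disease | negative test)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Pooled estimates reported for the 50 g OGCT at >=140 mg/dL (CC criteria).
    sens, spec = 0.85, 0.86
    # Illustrative assumption: the lowest and highest prevalence reported
    # across those studies (3.8% and 31.9%).
    for prev in (0.038, 0.319):
        print(f"prevalence {prev:.1%}: "
              f"PPV {ppv(sens, spec, prev):.1%}, "
              f"NPV {npv(sens, spec, prev):.1%}")
```

With the same sensitivity and specificity, the computed PPV rises from roughly 19% at 3.8% prevalence to roughly 74% at 31.9% prevalence, while the NPV falls from about 99% to about 92%. Both PPV values fall inside the ranges the table reports for the low- and high-prevalence strata, which is consistent with prevalence driving much of the spread in predictive values.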

From: Discussion

Screening and Diagnosing Gestational Diabetes Mellitus.
Evidence Reports/Technology Assessments, No. 210.
Hartling L, Dryden DM, Guthrie A, et al.

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.