NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Hartling L, Dryden DM, Guthrie A, et al. Screening and Diagnosing Gestational Diabetes Mellitus. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Oct. (Evidence Reports/Technology Assessments, No. 210.)

  • This publication is provided for historical reference only and the information may be out of date.



Screening and Diagnosing Gestational Diabetes Mellitus.


Results

This chapter reports the results of our literature review and synthesis. First, we describe the results of our literature search and selection process. A description of the characteristics and methodological quality of the studies follows. We then present our analysis of the study results by Key Question. Metagraphs and tables reporting the strength of evidence for key outcomes are provided within each applicable section. Within each metagraph, the studies that provided data are indexed by the name of the first author. A list of abbreviations is provided at the end of the report.

Several appendixes provide information supporting the findings presented in this section. Appendix C provides the quality assessment ratings by domain for each study. Appendix D contains detailed evidence tables describing the study, characteristics of the population, screening criteria or diagnostic tests used, details of treatment (where relevant), and outcomes. A list of citations for the excluded and unobtained studies is available in Appendix E. Appendixes are available at the Agency for Healthcare Research and Quality (AHRQ) Web site www.effectivehealthcare.ahrq.gov/reports/final.cfm.

Results of Literature Searches

The search strategy identified 14,398 citations from electronic databases. Screening based on titles and abstracts identified 598 potentially relevant studies. We identified 30 additional studies by hand searching the reference lists from included studies. Using the detailed selection criteria, 151 studies met the inclusion criteria and 469 were excluded. Of the 151 studies, 26 were identified as companion publications and 125 were unique studies (Figure 3). Of the 125 unique studies, 28 were further excluded during data extraction due to a lack of comparison or outcome of interest, leaving the total number of included studies at 97.

Figure 3 is a flow diagram that outlines the study retrieval and selection process, from the total number of citations retrieved by the literature searches to the number of studies that satisfied the inclusion criteria of the report.

Figure 3

Flow diagram of study retrieval and selection. *Five studies addressed more than one Key Question; therefore, the sum of studies addressing the Key Questions exceeds the total number of studies.

The most frequent reasons for exclusion were: (1) ineligible comparator (studies comparing two or more treatments but lacking a control group; n = 227); (2) ineligible publication type (abstracts, conference proceedings, studies published prior to 1995; n = 106); (3) ineligible study design (studies other than randomized controlled trials [RCTs], nonrandomized controlled trials [NRCTs], prospective cohort studies, and retrospective cohort studies; n = 11); (4) prespecified outcomes of interest not reported (lacking test properties for Key Question 1, or the specified outcomes for Key Questions 3, 4, and 5, including harms of screening or treatment; n = 34); (5) duplicate publication (n = 10); (6) intervention not of interest (studies without evaluation of screening tests or criteria, or of treatments for gestational diabetes mellitus [GDM]; n = 12); and (7) population not of interest (>20 percent of enrolled pregnant women had known pre-existing diabetes and no subgroup analysis was reported; n = 15). In addition, only prospective studies were eligible for Key Question 1, so 54 retrospective cohort studies were excluded. A complete list of excluded studies and reasons for exclusion is provided in Appendix E.
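The counts reported for the search and selection process can be cross-checked with simple arithmetic. The sketch below tallies the exclusion reasons listed above (including the 54 retrospective cohort studies excluded for Key Question 1) and traces the path from 151 eligible records to the 97 included studies. The dictionary keys are shorthand labels chosen for illustration, not terms used by the report.

```python
# Exclusion counts as reported in the text (shorthand labels are illustrative).
exclusion_counts = {
    "ineligible comparator": 227,
    "ineligible publication type": 106,
    "ineligible study design": 11,
    "prespecified outcomes not reported": 34,
    "duplicate publication": 10,
    "intervention not of interest": 12,
    "population not of interest": 15,
    "retrospective cohort (Key Question 1 only)": 54,
}
total_excluded = sum(exclusion_counts.values())

# Path from studies meeting the inclusion criteria to the final included set.
met_inclusion_criteria = 151
companion_publications = 26
excluded_at_extraction = 28
unique_studies = met_inclusion_criteria - companion_publications
included_studies = unique_studies - excluded_at_extraction

# Study designs reported under "Description of Included Studies".
design_counts = {"RCT": 6, "prospective cohort": 63, "retrospective cohort": 28}

print(f"Excluded at full-text review: {total_excluded}")
print(f"Unique studies: {unique_studies}; included: {included_studies}")
print(f"Designs sum to included total: {sum(design_counts.values()) == included_studies}")
```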

Description of Included Studies

A total of 97 studies met the eligibility criteria for this review: 6 RCTs, 63 prospective cohort studies, and 28 retrospective cohort studies. The studies were published between 1995 and 2012 (median 2004). Studies were conducted in the United States (24 percent), Europe (23 percent), Asia (22 percent), the Middle East (20 percent), Australia (4 percent), Central and South America (3 percent), and Canada (4 percent). Sources of funding for the included studies were academic (23 studies, 24 percent), foundation or organization (17 studies, 18 percent), government (14 studies, 14 percent), “other” (such as the World Health Organization [WHO] or nongovernmental organizations; 8 studies, 10 percent), and industry (10 studies, 10 percent). Twenty-two studies reported more than one source of funding. Two studies reported no external source of funding (2 percent), and 46 studies (47 percent) did not describe a source of funding.

Forty-eight studies (50 percent) analyzed women tested for GDM between 24 and 28 weeks, with an oral glucose challenge test (OGCT) performed first and the oral glucose tolerance test (OGTT) following within 7 days.50,62-108 Thirty-one studies (32 percent) did not specify when screening or diagnostic procedures took place.54,109-137 Of these 31 studies, one scheduled testing between 24 and 28 weeks, with testing at other, undefined times if clinically warranted.138 Eighteen studies (18 percent) screened or tested within unique time ranges.133,139-155 Of these, one study screened participants with an OGCT at 21 to 23 weeks followed by a diagnostic OGTT at 24 to 28 weeks;140 another screened a group of participants after 37 weeks;146 one study screened before 24 weeks;143 and one study screened women at risk at 14 to 16 weeks and other women at the usual 24 to 28 weeks.148 The remaining studies generally reported broader screening windows ranging from 21 to 32 weeks' gestation.139,142,144,145,150-152 Studies employing WHO criteria generally screened later in gestation because only an OGTT was performed: one study screened at 28 to 32 weeks,149 one between 26 and 30 weeks,155 another between 25 and 30 weeks,154 and another screened women at high risk at 18 to 20 weeks and others at 28 to 30 weeks.147 One study using WHO criteria did not specify the time of testing.133

The number of women enrolled in each study ranged from 32143 to 23,3163 (median 750). The mean age of study participants was 30 years. Mean age was consistent across most studies, although studies from countries outside North America (India, Turkey, Hong Kong, United Arab Emirates) enrolled women of slightly younger mean age (23 to 28 years).113,114,144,156

When duration of followup was reported, it was often described as “until birth” or “to delivery.”62,73,84,95,114,120,146,152 One study reported followup from the first prenatal visit (<13 weeks) until an OGCT (26 to 29 weeks),139 one from the first trimester until 24 to 28 weeks' gestation,101 and another from the first antenatal booking visit, which ranged from the first to the third trimester (the latter in women who first presented for antenatal care late in gestation).157 One study followed women for 3 months postpartum,83 and two studies provided longer-term followup extending to 5 to 7 years132 and 7 to 11 years.96 The remaining studies did not provide specific details on duration of followup.

Methodological Quality of Included Studies

The methodological quality of each study was assessed by two independent reviewers. Our approach to assessing study quality is described in the methods section. The consensus ratings for each study and domain are presented in Appendix C, Tables C1, C2, and C3. Studies were assessed using different tools depending on the Key Question and study design: QUADAS-2 for Key Question 1; for Key Questions 2 to 5, the Cochrane Risk of Bias tool for RCTs and the Newcastle-Ottawa Scale for cohort studies. The methodological quality of studies is described in detail within the results section for each Key Question.

Key Question 1. What are the sensitivities, specificities, reliabilities, and yields of current screening tests for GDM?

GDM is diagnosed when one or more glucose values meet or exceed set thresholds on an OGTT administered in the fasting state during pregnancy. Glucose dose, the time intervals of glucose measurements, and diagnostic glucose thresholds vary across criteria (Table 1). The most commonly used screening practice is a 50 g OGCT given without regard to the timing of the last meal; plasma glucose is measured 1 hour after the glucose challenge. This test was first proposed by O'Sullivan and Mahan158 and has been modified over the years. Two glucose thresholds are commonly used for this screen in North America: ≥140 mg/dL (≥7.8 mmol/L) and ≥130 mg/dL (≥7.2 mmol/L). Clinical and historical risk factors and fasting plasma glucose (FPG) are two other screening practices included in this review.
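The paired thresholds above follow from the molar mass of glucose (about 180.16 g/mol), so 1 mmol/L corresponds to about 18.016 mg/dL. The minimal sketch below converts between the two units and makes the positive/negative call on the 1-hour OGCT value, assuming the screen is positive when glucose is at or above the chosen cutoff. The function names are illustrative, not from the report.

```python
# Glucose molar mass ~180.16 g/mol: 1 mmol/L = 180.16 mg/L = 18.016 mg/dL.
MG_PER_DL_PER_MMOL_PER_L = 18.016


def mgdl_to_mmoll(glucose_mgdl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return glucose_mgdl / MG_PER_DL_PER_MMOL_PER_L


def mmoll_to_mgdl(glucose_mmoll: float) -> float:
    """Convert a glucose concentration from mmol/L to mg/dL."""
    return glucose_mmoll * MG_PER_DL_PER_MMOL_PER_L


def ogct_positive(glucose_1h_mgdl: float, cutoff_mgdl: float = 140.0) -> bool:
    """Screen-positive if the 1-hour value after a 50 g load is at or above the cutoff."""
    return glucose_1h_mgdl >= cutoff_mgdl


for cutoff in (130, 140):
    print(f">= {cutoff} mg/dL is >= {mgdl_to_mmoll(cutoff):.1f} mmol/L")

# A 1-hour value of 135 mg/dL is positive at the 130 cutoff but negative at 140.
print(ogct_positive(135, cutoff_mgdl=130), ogct_positive(135))
```

Many clinicians round the factor to 18; the difference does not change the one-decimal values quoted in the text.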

Two related issues make it difficult to organize and analyze the studies that address Key Question 1. First, there are several screening options (e.g., risk factor-based, universal) and several techniques (e.g., glucola-based, fasting, postprandial). Second, there is no ‘gold standard’ for diagnosing GDM; instead, there are five different, but commonly used, glucose-based diagnostic measures that overlap in the criteria they use.

We grouped studies according to the comparator OGTT diagnostic practice used, specifically glucose load, time intervals, and threshold values. These groupings are: 3-hour, 100 g OGTT using Carpenter and Coustan (CC) criteria; 3-hour, 100 g OGTT using National Diabetes Data Group (NDDG) criteria; 2-hour, 75 g OGTT using American Diabetes Association (ADA) (2000-2010) criteria; and 2-hour, 75 g OGTT using WHO criteria (Table 1). Within these groupings, we present results for the screening tests evaluated: the 50 g OGCT (further subdivided by screening threshold, ≥140 mg/dL and ≥130 mg/dL), fasting plasma glucose (FPG), clinical and historical risk factors, and other screening criteria. This is followed by a section on studies that compared early and late screening practices. The final section summarizes the evidence comparing different glucose loads for the OGTT diagnostic tests. Forest plots present the 2×2 data with sensitivity and specificity; summary tables present prevalence, positive and negative predictive values (PPV, NPV), and accuracy for individual studies.
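All of the test characteristics reported in this chapter derive from a 2×2 table that cross-classifies the screening result against the OGTT diagnosis. A minimal sketch of those definitions follows; the class name and the counts are hypothetical and not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Screening result (index test) cross-classified by OGTT diagnosis (reference standard)."""
    tp: int  # screen-positive, GDM confirmed
    fp: int  # screen-positive, GDM not confirmed
    fn: int  # screen-negative, GDM confirmed
    tn: int  # screen-negative, GDM not confirmed

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        # Proportion of all results that are true positives or true negatives.
        return (self.tp + self.tn) / self.n

    def prevalence(self) -> float:
        return (self.tp + self.fn) / self.n


# Hypothetical counts for 1,000 women (not data from this report).
table = TwoByTwo(tp=85, fp=133, fn=15, tn=767)
for name in ("prevalence", "sensitivity", "specificity", "ppv", "npv", "accuracy"):
    print(f"{name}: {getattr(table, name)():.3f}")
```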

Description of Included Studies

There were 51 studies (reported in 52 papers) that met the inclusion criteria for Key Question 1.62-77,91,99-101,104,105,107-115,117-121,123-127,129,138-140,142-144,151,153,157 Two papers from the Tri-Hospital group142 are included as they report on results for different screening practices.159,160 Studies were conducted in a wide range of regions: 11 in North America,64,69,72,104,105,121,123,126,127,142,143 10 in Europe,62,65,66,68,108,115,119,125,151,153 12 in Asia,70,73,101,107,111,114,118,128,129,139,140,157 15 in the Middle East,67,71,74-77,99,100,109,110,112,113,117,138,144 2 in South America,63,120 and 1 in Australia.124 All studies were prospective cohort studies. A summary table of the study and patient characteristics of the individual studies can be found in Appendix D.

The prevalence of GDM varied across studies. The variability reflects differences in study setting (i.e., country), screening practices (e.g., universal vs. selective), and/or population characteristics (e.g., race/ethnicity, age, body mass index [BMI], parity). The range of GDM prevalence for each set of diagnostic criteria was: CC/ADA (2000-2010) (100 g), 3.6 to 38.0 percent; NDDG, 1.4 to 50.0 percent; ADA (2000-2010) (75 g), 2.0 to 19.0 percent; and WHO, 1.7 to 24.5 percent. Prevalence results for individual studies are presented in the following sections.

Methodological Quality of Included Studies

We used the QUADAS-2 tool to assess the quality of the studies included for Key Question 1. The tool comprises four key domains: patient selection, index test, reference standard, and flow of patients through the study and timing of the index test and reference standard (flow and timing). The first part of QUADAS-2 concerns risk of bias; the second part considers applicability, that is, concerns that the study does not match the review question. Figure 4 summarizes the assessments of risk of bias and Figure 5 summarizes assessments of applicability. Detailed assessments for each study are presented in Appendix C1.

Figure 4 is a bar graph that reports the assessment of risk of bias using the QUADAS-2 tool to evaluate the methodological quality of studies of screening tests for gestational diabetes. Forty-four prospective cohort studies provided data for Key Question 1. The risk of bias dimension comprises four domains: flow and timing, reference standard, index test, and patient selection. These are listed along the Y-axis. The percentage of studies rated as ‘low,’ ‘high,’ or ‘unclear’ is shown along the X-axis at intervals of 20 percentage points. This figure is further described under “Methodological Quality of Included Studies”; a summary description follows: “The domain of flow and timing was assessed as low risk of bias for 57 percent of the studies. The domain of the index test was generally rated as low risk that the conduct or interpretation of the index test introduced bias (55 percent). The domain of the reference standard was generally rated as unclear risk that the conduct or interpretation of the reference standard introduced bias (52 percent). The domain of patient selection was rated as low risk that the selection of patients introduced bias for 30 percent of the studies, and 55 percent of studies were rated as unclear due to inadequate description.”

Figure 4

QUADAS-2 assessment of risk of bias by domain.

Figure 5 is a bar graph that reports the assessment of applicability using the QUADAS-2 tool to evaluate the methodological quality of studies of screening tests for gestational diabetes. The applicability dimension comprises three domains: reference standard, index test, and patient selection. These are listed along the Y-axis. The percentage of studies rated as ‘low,’ ‘high,’ or ‘unclear’ is shown along the X-axis at intervals of 20 percentage points. This figure is further described under “Methodological Quality of Included Studies”; a summary description follows: “Concern about applicability of the reference standard was assessed as low (84 percent), and concern about applicability of the index test was also assessed as low (77 percent). Overall, 32 percent of studies were assessed as having high concerns about applicability for the patient selection domain.”

Figure 5

QUADAS-2 assessment of applicability by domain.

The domain of patient selection was rated as low risk that the selection of patients introduced bias for 53 percent of the studies. These studies were prospective cohort studies, most enrolled a consecutive sample of patients, and most avoided inappropriate exclusions. However, 25 percent of studies were rated as unclear due to inadequate description. Overall, 55 percent of studies were assessed as having high concerns about applicability for this domain. This was primarily because these studies were conducted in developing countries and used the WHO criteria to diagnose GDM. The results of these studies may not be directly relevant to the population in the United States.

The domain of the index test was generally rated as low risk that the conduct or interpretation of the index test introduced bias (53 percent). For most studies, the screening test (i.e., the index test) was conducted before the reference standard, and the threshold for the screening test was pre-specified. Concern about applicability was assessed as low (82 percent).

The domain of the reference standard (i.e., the criteria used to confirm a diagnosis of GDM) was generally rated as unclear risk that the conduct or interpretation of the reference standard introduced bias (63 percent). For most studies the result of the screening test was used to determine whether patients underwent further testing for GDM. Concern about applicability was assessed as low (86 percent).

The domain of flow and timing was assessed as low risk of bias for 39 percent of the studies. For most studies, the interval between the index test and reference standard was appropriate according to the criteria used in the study. Most patients received the reference standard, and all patients within a study received the same reference standard. However, in 35 percent of studies not all patients received a confirmatory reference standard: women whose screening test result fell below a given threshold were not tested further. These studies were assessed as unclear risk of bias.
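A small worked example shows why incomplete verification matters. Assuming, hypothetically, that screen-negative women who never receive the OGTT are counted as not having GDM, every missed case moves from the false-negative cell to the true-negative cell, so apparent sensitivity is inflated. The function name and the counts are invented for illustration.

```python
def partially_verified_estimates(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Apparent (sensitivity, specificity) when only screen-positives receive the OGTT.

    Assumes unverified screen-negatives are counted as disease-free, so every
    missed case (fn) is silently reclassified as a true negative.
    """
    observed_fn = 0
    observed_tn = tn + fn
    sensitivity = tp / (tp + observed_fn)
    specificity = observed_tn / (observed_tn + fp)
    return sensitivity, specificity


# Hypothetical "true" counts: sensitivity 0.85, specificity about 0.852.
true_tp, true_fp, true_fn, true_tn = 85, 133, 15, 767
sens, spec = partially_verified_estimates(true_tp, true_fp, true_fn, true_tn)
print(f"true sensitivity {true_tp / (true_tp + true_fn):.3f}, apparent {sens:.3f}")
print(f"true specificity {true_tn / (true_tn + true_fp):.3f}, apparent {spec:.3f}")
```

Here sensitivity appears to be 100 percent even though 15 of 100 cases were missed; studies that instead drop unverified women from the analysis are also biased, in a different way.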

Key Points

  • Comparisons between screening tests and diagnostic thresholds were difficult because of the variety of different populations and different tests that were studied.
  • Prevalence of GDM varied across studies and the diagnostic criteria used. The range of prevalence was: CC 3.6 to 38.0 percent; NDDG 1.4 to 50.0 percent; ADA (75 g) 2.0 to 19.0 percent; and WHO 1.7 to 24.5 percent.
  • The 50 g OGCT with the 130 mg/dL cutpoint has higher sensitivity than the 140 mg/dL cutpoint; however, its specificity is lower (6 studies). Both thresholds have high NPV but variable PPV across a range of GDM prevalence.
  • The use of a high cutoff for a diagnosis of GDM on an OGCT is supported by one study that assessed a 50 g OGCT (≥200 mg/dL) with GDM confirmed using the CC criteria. Sensitivity, specificity, PPV and NPV were all 100 percent.
  • Fasting plasma glucose at a threshold of ≥85 mg/dL has sensitivity similar to that of the 50 g OGCT, but lower specificity (4 studies).
  • There were sparse data to assess screening and diagnostic tests for GDM before 24 weeks' gestation.
  • Four studies compared a 75 g load with a 100 g load (reference standard) to diagnose GDM. The prevalence of GDM ranged from 1.4 to 50 percent. Median sensitivity and PPV were low; median specificity and NPV were high.
  • One study compared the International Association of Diabetes and Pregnancy Study Groups (IADPSG) criteria with a two-step strategy. Sensitivity was 82 percent and specificity was 94 percent. Prevalence of GDM was 13.0 percent with IADPSG criteria compared with 9.6 percent with the two-step strategy. PPV and NPV were 61 percent and 98 percent, respectively.
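The point that NPV stays high while PPV varies with prevalence follows from Bayes' theorem. The sketch below plugs in the joint sensitivity and specificity estimates reported later in this section for the 50 g OGCT against CC/ADA (2000-2010) criteria (85 and 86 percent at ≥140 mg/dL; 99 and 77 percent at ≥130 mg/dL) and evaluates them at a few illustrative prevalences chosen for this example.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV from Bayes' theorem for a given pre-test prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Joint estimates for the 50 g OGCT against CC/ADA (2000-2010) criteria.
cutoffs = {">=140 mg/dL": (0.85, 0.86), ">=130 mg/dL": (0.99, 0.77)}
for label, (sens, spec) in cutoffs.items():
    for prev in (0.05, 0.10, 0.20):
        ppv, npv = predictive_values(sens, spec, prev)
        print(f"{label}, prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.1%}")
```

At the ≥140 mg/dL cutoff, PPV rises from roughly 24 percent at 5 percent prevalence to roughly 60 percent at 20 percent prevalence, while NPV stays above 95 percent in every case shown.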

Detailed Synthesis

50 g OGCT Screening and GDM Diagnosis with 100 g OGTT

This section includes studies in which women underwent a 2-step practice: screening with a 50 g OGCT at 24 to 28 weeks, followed by a 100 g OGTT to confirm a diagnosis of GDM. Results for the 50 g OGCT are grouped by two sets of diagnostic criteria: CC and ADA (2000-2010) criteria, and NDDG criteria.
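The 2-step practice can be sketched as a small decision procedure. The thresholds below are the widely published CC and NDDG plasma glucose limits for the 3-hour, 100 g OGTT, under which GDM requires two or more values at or above the limit; they are included for illustration and should be checked against Table 1. The function names and glucose values are hypothetical. The example shows how the same OGTT result can meet CC criteria but not NDDG criteria, one reason prevalence differs by diagnostic criteria.

```python
from typing import Optional

# Widely published 100 g, 3-hour OGTT plasma glucose thresholds (mg/dL);
# verify against Table 1 before reuse.
CARPENTER_COUSTAN = {"fasting": 95, "1h": 180, "2h": 155, "3h": 140}
NDDG = {"fasting": 105, "1h": 190, "2h": 165, "3h": 145}


def gdm_on_100g_ogtt(values: dict, thresholds: dict, required: int = 2) -> bool:
    """GDM if at least `required` OGTT values meet or exceed their thresholds."""
    abnormal = sum(values[point] >= limit for point, limit in thresholds.items())
    return abnormal >= required


def two_step(ogct_1h: float, ogtt: Optional[dict], thresholds: dict,
             ogct_cutoff: float = 140) -> bool:
    """Screen with the 50 g OGCT; only screen-positive women go on to the 100 g OGTT."""
    if ogct_1h < ogct_cutoff:
        return False  # screen-negative: no diagnostic test, classified as not GDM
    if ogtt is None:
        raise ValueError("a screen-positive woman needs an OGTT result")
    return gdm_on_100g_ogtt(ogtt, thresholds)


# Hypothetical OGTT result (mg/dL) after a 1-hour OGCT value of 150 mg/dL.
ogtt = {"fasting": 100, "1h": 185, "2h": 160, "3h": 120}
print("CC:", two_step(150, ogtt, CARPENTER_COUSTAN))  # 3 values at or above CC limits
print("NDDG:", two_step(150, ogtt, NDDG))             # no values at or above NDDG limits
```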

Carpenter and Coustan and ADA (2000-2010) Criteria

Description of Included Studies

Fourteen studies confirmed a diagnosis of GDM with a 100 g, 3-hour OGTT using CC/ADA 2000-2010 criteria (Appendix D).63,64,68,72,75-77,99,104,108,121,140,159,161 Ten studies used a universal screening practice,63,64,68,72,76,77,108,121,159,161 three studies used a selective, risk-based screening practice for an OGCT,75,99,140 and one study only included women with an abnormal OGCT.104 Six studies performed the OGTT on all women regardless of OGCT value,63,68,72,108,140,159 while eight performed an OGTT only in patients with a positive OGCT.64,75-77,99,104,121,161

Studies were conducted in the United States,64,104,121 Canada,15 Iran,71,75-77 Brazil,63 France,108 Mexico,72 Switzerland,68 Thailand,140 and United Arab Emirates.99 The number of patients analyzed ranged from 138 to 11,545. Maternal age was reported in 12 studies and the mean ranged from 23.7 to 32.5 years. Mean BMI was reported in 10 studies and ranged from 23.3 to 29.6 kg/m². One study included women tested at ≥20 weeks' gestation.121

Results

Nine studies provided data to estimate the test characteristics of a 50 g OGCT with glucose measured at 1 hour and a cutoff value of ≥140 mg/dL.63,64,68,72,76,99,108,140,159 The accuracy of the OGCT (i.e., the proportion of true positive and true negative results) was generally high (median = 86.5 percent) and ranged from 66 to 94 percent (Table 3). Figure 6 presents the sensitivities and specificities for the individual studies. The joint estimates of sensitivity and specificity were 85 percent (95% CI, 76 to 90) and 86 percent (95% CI, 80 to 90), respectively. Hierarchical summary receiver operating characteristic (HSROC) curves comparing the sensitivity and specificity for all studies are presented in Appendix F. The prevalence of GDM ranged from 3.8 to 31.9 percent (Table 3). The PPV ranged from 18.5 to 83.1 percent; the NPV ranged from 95.1 to 99.0 percent (Table 3). The study by Rust et al.121 included women at ≥20 weeks' gestation and reported a sensitivity of 56 percent (95% CI, 30 to 80) and specificity of 94 percent (95% CI, 91 to 96). The prevalence of GDM in that study was 3.6 percent.
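The joint estimates above come from a model that treats sensitivity and specificity together (the report presents HSROC curves in Appendix F). As a much simpler illustration of random-effects pooling, the sketch below pools a single proportion, such as sensitivity, across studies on the logit scale with the DerSimonian-Laird estimator. This is a swapped-in, univariate technique, not the method behind the estimates in this report, and it ignores the correlation between sensitivity and specificity. The counts are hypothetical.

```python
import math


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_dl(counts: list[tuple[int, int]], z: float = 1.96) -> tuple[float, float, float, float]:
    """Random-effects (DerSimonian-Laird) pooled proportion on the logit scale.

    counts: (events, total) per study, e.g. (TP, TP + FN) for sensitivity.
    Returns (pooled, lower, upper, tau2), with the interval on the proportion scale.
    """
    effects, variances = [], []
    for events, total in counts:
        if events == 0 or events == total:
            # 0.5 continuity correction so the logit and its variance are finite.
            events, total = events + 0.5, total + 1.0
        effects.append(_logit(events / total))
        variances.append(1.0 / events + 1.0 / (total - events))

    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and the DerSimonian-Laird moment estimate of between-study variance.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0

    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return _inv_logit(pooled), _inv_logit(pooled - z * se), _inv_logit(pooled + z * se), tau2


# Hypothetical (TP, TP + FN) counts for four studies; not data from this report.
sens, lo, hi, tau2 = pool_logit_dl([(45, 50), (80, 100), (27, 30), (150, 200)])
print(f"pooled sensitivity {sens:.2f} (95% CI {lo:.2f} to {hi:.2f}), tau^2 = {tau2:.3f}")
```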

Table 3. Prevalence and diagnostic test characteristics for 50 g OGCT by CC or ADA (2000–2010) diagnostic criteria.

Table 3

Prevalence and diagnostic test characteristics for 50 g OGCT by CC or ADA (2000–2010) diagnostic criteria.

Figure 6 is a forest plot depicting the sensitivity and specificity of the 50 g OGCT screening test by ADA (2000-2010) or Carpenter-Coustan diagnostic criteria. Results are reported for two different OGCT thresholds of ≥130 mg/dL and ≥140 mg/dL. Six studies provided data on the sensitivity and specificity of a 50 g OGCT tested at the 1-hour interval and cutoff value of ≥140 mg/dL. For each study, sensitivity and specificity are as follows: Ayach 77 (95% CI 46 to 95) and 87 (95% CI 82 to 90) percent; De Los Monteros 88 (95% CI 77 to 96) and 87 (95% CI 83 to 90) percent; Gandevani 86 (95% CI 79 to 92) and 85 (95% CI 81 to 88) percent; Perucchini 58 (95% CI 44 to 72) and 91 (95% CI 88 to 93) percent; Rust 56 (95% CI 30 to 80) and 94 (95% CI 91 to 96) percent; and Trihospital 68 (95% CI 62 to 73) and 83 (95% CI 82 to 85) percent. Four studies provided data on the sensitivity and specificity of a 50 g OGCT with a cutoff value of ≥130 mg/dL. For each study, sensitivity and specificity are as follows: Eslamian 92 (95% CI 62 to 100) and 77 (95% CI 68 to 84) percent; Gandevani 99 (95% CI 96 to 100) and 73 (95% CI 69 to 77) percent; Kashi 100 (95% CI 83 to 100) and 75 (95% CI 68 to 81) percent; and Soheilykhah 95 (95% CI 91 to 97) and 76 (95% CI 72 to 80) percent.

Figure 6

Forest plot of sensitivity and specificity: 50 g OGCT by CC or ADA (2000–2010) criteria. ADA = American Diabetes Association; CC = Carpenter-Coustan; FN = false negative; FP = false positive; OGCT = oral glucose challenge test; TN = true negative

Six studies used an OGCT cutoff value of ≥130 mg/dL.64,71,75-77,108 The accuracy of the OGCT ranged from 64.5 to 90.4 percent (median = 78.5 percent) (Table 3). Figure 6 presents the sensitivities and specificities for the individual studies. The joint estimates of sensitivity and specificity were 99 percent (95% CI, 95 to 100) and 77 percent (95% CI, 68 to 83), respectively. The prevalence of GDM ranged from 4.3 to 29.5 percent (Table 3). The PPV ranged from 10.7 to 62.3 percent; the NPV ranged from 97.3 to 100 percent (Table 3).
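Accuracy is not a fixed property of a test: it is a prevalence-weighted average of sensitivity and specificity, which is one reason it varies across studies. The sketch below applies the joint estimates for the ≥130 mg/dL cutoff at the two ends of the reported prevalence range. It is an illustration only; the observed study-level accuracies also differ because each study's own sensitivity and specificity differ from the pooled values.

```python
def expected_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy = P(correct result) = sens * prev + spec * (1 - prev)."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Joint estimates for the >=130 mg/dL cutoff, at the ends of the reported prevalence range.
for prev in (0.043, 0.295):
    print(f"prevalence {prev:.1%}: expected accuracy {expected_accuracy(0.99, 0.77, prev):.1%}")
```

Because specificity is the lower of the two values here, accuracy rises as prevalence rises (from about 78 to about 83 percent).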

One study used an OGCT cutoff value of >200 mg/dL.104 The prevalence was 29.4 percent. Sensitivity was 100 percent (95% CI, 87 to 100) and specificity was 100 percent (95% CI, 99 to 100).
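The report does not state how per-study confidence intervals were calculated. One common choice for a single proportion is the exact Clopper-Pearson interval, sketched below as an assumption rather than as the method used here. It shows why a sensitivity of 100 percent can still carry a lower bound well below 100: with 20 of 20 cases detected, the interval runs from roughly 83 to 100 percent. The helper names are illustrative.

```python
import math


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect(f, target: float, increasing: bool, tol: float = 1e-12) -> float:
    """Solve f(p) = target for p in (0, 1), assuming f is monotone in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x | p) = alpha/2; this upper-tail probability increases with p.
        lower = _bisect(lambda p: 1.0 - _binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x | p) = alpha/2; this lower-tail probability decreases with p.
        upper = _bisect(lambda p: _binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


# Hypothetical 20/20 sensitivity: point estimate 100%, exact lower bound about 83%.
print(clopper_pearson(20, 20))
```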

The studies by Agarwal,99 Weerakiet,140 Bobrowski,104 and Kashi75 are at high risk for selection bias because they used a selective screening practice. The studies by Eslamian,71 Gandevani,76 Soheilykhah,77 and Yogev64 are at high risk for partial verification bias because not all women received a confirmatory OGTT.

NDDG Criteria

Description of Included Studies

Ten studies used the NDDG criteria to confirm a diagnosis of GDM (Appendix D).66,67,69,72-74,104,123,144,159 Eight studies used a universal screening practice;66,67,69,72-74,144,159 two included only women with an abnormal OGCT.104,123 Six studies performed the OGTT on all women regardless of OGCT value,63,68,72,108,140,159 while the remaining studies performed an OGTT only in patients with a positive OGCT.

Four studies were conducted in North America,69,104,123,159 two in Europe,74,144 and one each in Mexico,72 Saudi Arabia,67 Thailand,73 and Turkey.66 The number of patients enrolled ranged from 80 to 4,274. Mean maternal age, reported in seven studies, ranged from 25.7 to 32.1 years. Only two studies reported BMI. All studies screened women after 24 weeks' gestation.

Results

Seven studies provided data to estimate the test characteristics of a 50 g OGCT with glucose measured at 1 hour and a cutoff value of ≥140 mg/dL.66,69,72-74,144,159 The accuracy of the OGCT was generally high (median = 82 percent) (Table 4). Figure 7 presents the sensitivities and specificities for the individual studies. HSROC curves comparing the sensitivity and specificity for all studies are presented in Appendix F. The joint estimates of sensitivity and specificity were 85 percent (95% CI, 73 to 92) and 83 percent (95% CI, 78 to 87), respectively. The prevalence of GDM ranged from 1.4 to 45.8 percent (median = 6.2 percent) (Table 4). The PPV ranged from 12.0 to 57.1 percent; the NPV ranged from 70 to 100 percent (Table 4).

Table 4. Prevalence and diagnostic test characteristics for 50 g OGCT by NDDG diagnostic criteria.

Table 4

Prevalence and diagnostic test characteristics for 50 g OGCT by NDDG diagnostic criteria.

Figure 7 is a forest plot depicting the sensitivity and specificity of the 50 g OGCT screening test by NDDG diagnostic criteria. Results are reported for two different OGCT thresholds of ≥130 mg/dL and ≥140 mg/dL. Eight studies provided data on sensitivity and specificity of a 50 g OGCT and cutoff value of ≥140 mg/dL. For each study, sensitivity and specificity are as follows: Cetin 65 (95% CI 38 to 86) and 88 (95% CI 83 to 91) percent; De Los Monteros 88 (95% CI 75 to 96) and 85 (95% CI 81 to 89) percent; Deerochanawong 100 (95% CI 69 to 100) and 90 (95% CI 87 to 92) percent; Lamar 80 (95% CI 28 to 99) and 82 (95% CI 74 to 88) percent; Perea-Carrasco 97 (95% CI 86 to 100) and 75 (95% CI 71 to 78) percent; Trihospital 77 (95% CI 69 to 83) and 82 (95% CI 81 to 83) percent; Uncu 79 (95% CI 49 to 95) and 54 (95% CI 34 to 72) percent; and Yogev 97 (95% CI 94 to 99) and 63 (95% CI 61 to 65) percent. Three studies reported sensitivity and specificity of a 50 g OGCT and cutoff value of ≥130 mg/dL. For each study, sensitivity and specificity are as follows: Ardawi 88 (95% CI 80 to 94) and 84 (95% CI 78 to 89) percent; Berkus 90 (95% CI 70 to 99) and 66 (95% CI 53 to 78) percent; and Uncu 79 (95% CI 49 to 95) and 54 (95% CI 34 to 72) percent.

Figure 7

Forest plot of sensitivity and specificity: 50 g OGCT by NDDG criteria. NDDG = National Diabetes Data Group; OGCT = oral glucose challenge test

Three studies67,74,113 used a cutoff of ≥130 mg/dL. The accuracy of the test ranged from 50.0 to 85.5 percent (Table 4). Figure 7 presents the sensitivities and specificities for the individual studies. As there were only three studies, we did not pool the results. The prevalence of GDM ranged from 16.7 to 35.3 percent (Table 4). The PPV ranged from 20.0 to 75.0 percent; the NPV ranged from 87.5 to 92.9 percent (Table 4). One study used an OGCT cutoff value of >200 mg/dL; its sensitivity, specificity, PPV, and NPV were all 100 percent.

The studies by Ardawi,67 Bobrowski,104 Berkus,123 Cetin,144 Deerochanawong,73 Lamar,69 and Uncu74 are at high or unclear risk for selection bias due to selective or unclear screening practices. The studies by Ardawi,67 De los Monteros,72 and Lamar69 are at high or unclear risk for partial verification bias because not all women received a confirmatory OGTT.

50 g OGCT Screening and GDM Diagnosis with 75 g OGTT

This section includes studies in which women underwent a 2-step screening and diagnostic practice that included a 50 g OGCT followed by a 75 g OGTT to confirm a diagnosis of GDM.

ADA (2000-2010) Criteria

Description of Included Studies

Three studies101,125,139 used the ADA 75 g, 2-hour criteria after a 50 g, 1-hour OGCT (Appendix D). All but the study by Maegawa et al.101 used a threshold of ≥140 mg/dL for the OGCT. The studies were conducted in Japan101,139 and Germany.125 One Canadian study105 confirmed the diagnosis using the Canadian Diabetes Association (CDA) 75 g, 2-hour criteria.

The number of patients analyzed ranged from 509 to 912. All studies reported maternal age, which ranged from 28.5 to 33.4 years. BMI ranged from 20.0 to 24.8 kg/m². All studies performed the OGCT screening at 24 to 28 weeks; two studies also screened women in early pregnancy.101,139

Results

Against the ADA (2000-2010) 75 g criteria, the accuracy of the OGCT ranged from 84 to 87 percent (Table 5). Figure 8 presents the sensitivities and specificities for the individual studies. The results were not pooled. The prevalence of GDM ranged from 1.6 to 18.1 percent (Table 5). The PPV ranged from 7 to 20 percent; the NPV ranged from 99 to 100 percent (Table 5). Against the CDA 75 g criteria, accuracy was 72 percent, PPV was 37 percent, and NPV was 94 percent.

Table 5. Prevalence and diagnostic test characteristics for 50 g OGCT (different thresholds) by ADA (2000–2010) 75 g criteria.

Table 5

Prevalence and diagnostic test characteristics for 50 g OGCT (different thresholds) by ADA (2000–2010) 75 g criteria.

Figure 8 is a forest plot depicting the sensitivity and specificity of the 50 g OGCT screening test at differing thresholds by ADA (2000-2010) criteria on a 75 g test. Three studies provided data on sensitivity and specificity for ADA criteria on a 2-hour, 75 g test. Two of these studies, by Buhling and Yachi, used an OGCT threshold of ≥140 mg/dL, while Maegawa used a threshold of ≥180 mg/dL. One study by Rey confirmed the diagnosis of GDM using CDA criteria on a 2-hour, 75 g test with a threshold of 160 mg/dL. For each study, sensitivity and specificity are as follows: Buhling 97 (95% CI 86 to 100) and 84 (95% CI 81 to 86) percent; Yachi 88 (95% CI 88 to 79) and 79 (95% CI 75 to 82) percent; Maegawa 86 (95% CI 65 to 97) and 87 (95% CI 84 to 89) percent; and Rey 81 (95% CI 58 to 95) and 69 (95% CI 59 to 79) percent.

Figure 8. Forest plot of sensitivity and specificity: 50 g OGCT (different thresholds) by ADA (2000–2010) 75 g criteria. ADA = American Diabetes Association; CDA = Canadian Diabetes Association; OGCT = oral glucose challenge test; OGTT = oral glucose tolerance test
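The confidence intervals in Figure 8 are wide for estimates based on few women with GDM. The interval method is not stated in this section; the sketch below assumes the exact (Clopper-Pearson) binomial interval, a common choice in diagnostic accuracy studies, and computes it with the Python standard library by bisection on the binomial distribution function. The function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 100) -> float:
    """Bisection for p in [0, 1] with g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""
    # Lower bound solves P(X >= x) = alpha/2, i.e. 1 - cdf(x - 1) = alpha/2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= x) = alpha/2, i.e. 1 - cdf(x) = 1 - alpha/2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    for x, n in [(10, 10), (16, 18), (90, 100)]:
        lo, hi = clopper_pearson(x, n)
        print(f"{x}/{n}: {100 * x / n:.0f} percent "
              f"(95% CI {100 * lo:.0f} to {100 * hi:.0f})")
```

For example, 10 positives out of 10 cases gives a lower bound of about 69 percent (0.025 raised to the power 1/10), so a sensitivity of 100 percent from a small study remains compatible with considerably lower true values.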

The studies by Rey105 and Yachi139 are at high or unclear risk of selection bias due to their screening practices. The study by Buhling125 is at high risk of partial verification bias because not all women received a confirmatory OGTT.

World Health Organization Criteria

Description of Included Studies

Four studies used the WHO criteria to confirm a diagnosis of GDM (Appendix D).62,70,73,157 The studies were conducted in the Netherlands,62 Sri Lanka,70 Malaysia,157 and Thailand.73 The number of patients enrolled ranged from 188 to 1,301. Mean maternal age ranged from 25.7 to 30.8 years. Mean BMI, reported in two studies, was 22.4 and 24.2 kg/m². All studies performed the OGCT screening at 24-28 weeks, with the OGTT performed 1 to 2 weeks later.

Results

The accuracy of the test ranged from 73 to 88 percent (Table 6). Figure 9 presents the sensitivities and specificities for the individual studies; the results were not pooled. The prevalence of GDM ranged from 3.7 to 50.0 percent, the PPV from 17.8 to 76.2 percent, and the NPV from 78.9 to 98.7 percent (Table 6).
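Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of GDM in the population screened, which is one reason these ranges are wide. The sketch below applies Bayes' theorem to a hypothetical test with sensitivity 0.70 and specificity 0.89, evaluated at the extremes of the prevalence range reported above; the inputs are illustrative and are not meant to reproduce any entry in Table 6.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV (proportions) implied by sensitivity, specificity and prevalence."""
    tp = sens * prevalence              # true positives per person screened
    fp = (1 - spec) * (1 - prevalence)  # false positives
    fn = (1 - sens) * prevalence        # false negatives
    tn = spec * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Illustrative inputs only: a fixed test with sensitivity 0.70 and
    # specificity 0.89, evaluated at the lowest and highest reported prevalence.
    for prevalence in (0.037, 0.50):
        ppv, npv = predictive_values(0.70, 0.89, prevalence)
        print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Holding the test fixed, PPV rises from about 20 percent to about 86 percent as prevalence increases from 3.7 to 50 percent, while NPV falls from about 99 to about 75 percent.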

Table 6. Prevalence and diagnostic test characteristics for 50 g OGCT by WHO diagnostic criteria.

Figure 9 is a forest plot depicting the sensitivity and specificity of the 50 g OGCT screening test by WHO criteria. GDM was confirmed with one or more elevated values of ≥7.8 mmol/L (140 mg/dL) at 2 hours or ≥7.0 mmol/L (126 mg/dL) fasting. For each study, sensitivity and specificity are as follows: Deerochanawong 43 (95% CI 34 to 53) and 94 (95% CI 92 to 96) percent; Siribaddana 85 (95% CI 72 to 94) and 73 (95% CI 69 to 76) percent; and van Leeuwen 70 (95% CI 55 to 83) and 89 (95% CI 87 to 91) percent. One study, by Tan, used a threshold of ≥137 mg/dL; sensitivity was 92 percent (95% CI, 87 to 96) and specificity was 25 percent (95% CI, 20 to 30).

Figure 9. Forest plot of sensitivity and specificity: 50 g OGCT by WHO criteria. OGCT = oral glucose challenge test; WHO = World Health Organization
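Glucose thresholds in this chapter are quoted in both mg/dL and mmol/L. The conversion follows from the molar mass of glucose (about 180.16 g/mol), giving a factor of about 18.016 mg/dL per mmol/L; some sources round the factor to 18, which can change the last decimal in borderline cases. A minimal sketch, with hypothetical function names:

```python
# Glucose has a molar mass of about 180.16 g/mol, so 1 mmol/L is about
# 180.16 mg/L, i.e. about 18.016 mg/dL. Some sources round the factor to 18.
MG_DL_PER_MMOL_L = 18.016


def mgdl_to_mmol(mg_dl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return mg_dl / MG_DL_PER_MMOL_L


def mmol_to_mgdl(mmol_l: float) -> float:
    """Convert a glucose concentration from mmol/L to mg/dL."""
    return mmol_l * MG_DL_PER_MMOL_L


if __name__ == "__main__":
    # Thresholds quoted in this chapter, in mg/dL.
    for threshold in (85, 90, 92, 95, 126, 137, 140):
        print(f"{threshold} mg/dL = {mgdl_to_mmol(threshold):.1f} mmol/L")
```

Rounded to one decimal place, this reproduces the paired values used in the chapter, for example 140 mg/dL = 7.8 mmol/L, 126 mg/dL = 7.0 mmol/L, and 92 mg/dL = 5.1 mmol/L.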

This section includes studies that examined FPG as a screening test. A diagnosis of GDM was confirmed using CC or ADA (2000-2010), WHO, NDDG, and CDA 75 g OGTT criteria.

Fasting Plasma Glucose and CC/ADA (2000-2010) Criteria

Description of Included Studies

Seven studies provided data on FPG at various thresholds as an alternative screening test to glucola-based screening with a diagnosis of GDM using CC and ADA (2000-2010) criteria (Appendix D).65,75,99,108,112,126,127 Three studies used a universal screening practice108,112,127 and the remaining studies used a selective, risk-based screening practice.65,75,99,126 All but one study75 performed the OGTT on all women regardless of OGCT value.

Studies took place in the United States,126,127 France,65,108 Iran,75 and the United Arab Emirates.99,112 The number of patients enrolled ranged from 123 to 11,545. Mean maternal age was reported in four studies and ranged from 27.8 to 32.8 years. Mean BMI was reported in three studies and ranged from 22.5 to 29.6. Most studies tested women after 24 weeks' gestation; one study tested women at 23 weeks.126

Results

The studies provided data to estimate the test characteristics of FPG at four common thresholds: ≥85 mg/dL (4.7 mmol/L), ≥90 mg/dL (5.0 mmol/L), ≥92 mg/dL (5.1 mmol/L), and ≥95 mg/dL (5.3 mmol/L). Figure 10 presents the sensitivities and specificities for the individual studies. The joint estimates of sensitivity and specificity, respectively, for each FPG threshold were:

  • ≥85 mg/dL: 87 percent (95% CI, 81 to 91) and 52 percent (95% CI, 50 to 55)
  • ≥90 mg/dL: 77 percent (95% CI, 66 to 85) and 76 percent (95% CI, 75 to 77)
  • ≥92 mg/dL: 76 percent (95% CI, 55 to 91) and 92 percent (95% CI, 86 to 96) (median of the three study estimates)
  • ≥95 mg/dL: 54 percent (95% CI, 32 to 74) and 93 percent (95% CI, 90 to 96)

Figure 10 is a forest plot depicting the sensitivity and specificity of the fasting plasma glucose (FPG) screening test by CC/ADA (2000-2010) criteria. Results are presented for different FPG thresholds. For the FPG threshold of ≥85 mg/dL, sensitivity and specificity are as follows: Agarwal 2000: 91 (95% CI, 85 to 96) and 51 (95% CI, 45 to 58) percent; Agarwal 2006: 90 (95% CI, 87 to 92) and 53 (95% CI, 51 to 55) percent; Agarwal 2000: 88 (95% CI, 85 to 91) and 53 (95% CI, 49 to 56) percent; Kashi 2007: 86 (95% CI, 75 to 93) and 69 (95% CI, 60 to 77) percent; Sacks 2003: 74 (95% CI, 69 to 79) and 48 (95% CI, 46 to 50) percent. For the FPG threshold of ≥90 mg/dL, sensitivity and specificity are as follows: Agarwal 2000: 85 (95% CI, 77 to 91) and 72 (95% CI, 66 to 78) percent; Agarwal 2006: 83 (95% CI, 79 to 86) and 76 (95% CI, 75 to 77) percent; Agarwal 2000: 82 (95% CI, 78 to 86) and 75 (95% CI, 72 to 77) percent; Chastang 2003: 75 (95% CI, 64 to 85) and 76 (95% CI, 71 to 81) percent; Sacks 2003: 52 (95% CI, 46 to 58) and 77 (95% CI, 76 to 78) percent. For the FPG threshold of ≥92 mg/dL, sensitivity and specificity are as follows: Kashi 2007: 80 (95% CI, 66 to 88) and 92 (95% CI, 86 to 96) percent; Kauffman 2006: 76 (95% CI, 55 to 91) and 90 (95% CI, 82 to 95) percent; Chevalier 2011: 26 (95% CI, 22 to 31) and 95 (95% CI, 94 to 96) percent. For the FPG threshold of ≥95 mg/dL, sensitivity and specificity are as follows: Agarwal 2000: 79 (95% CI, 71 to 86) and 91 (95% CI, 87 to 94) percent; Agarwal 2006: 69 (95% CI, 65 to 73) and 90 (95% CI, 89 to 91) percent; Agarwal 2000: 68 (95% CI, 62 to 73) and 94 (95% CI, 92 to 95) percent; Sacks 2003: 34 (95% CI, 28 to 39) and 92 (95% CI, 91 to 93) percent; Chevalier 2011: 20 (95% CI, 16 to 24) and 98 (95% CI, 97 to 99) percent.

Figure 10. Forest plot of sensitivity and specificity: fasting plasma glucose by CC/ADA (2000–2010) criteria. ADA = American Diabetes Association; CC = Carpenter-Coustan; OGCT = oral glucose challenge test
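The joint estimates for the FPG thresholds come from the pooling model described in the Methods chapter, which is not reproduced in this section and may model sensitivity and specificity together. As a simpler, swapped-in illustration of random-effects pooling, the sketch below pools a single proportion (for example, sensitivity) across studies on the logit scale using the DerSimonian-Laird estimator. The function name, the 0.5 continuity correction applied to every study, and the counts are all assumptions for illustration.

```python
import math


def expit(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def pool_proportions(events: list[int], totals: list[int]) -> tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooled proportion with a 95% CI.

    Each study's proportion is logit-transformed after adding 0.5 to both
    cells (a continuity correction applied to every study), then weighted
    by 1 / (within-study variance + tau^2).
    """
    ys, vs = [], []
    for x, n in zip(events, totals):
        a, b = x + 0.5, n - x + 0.5
        ys.append(math.log(a / b))   # logit of the corrected proportion
        vs.append(1 / a + 1 / b)     # approximate variance of the logit

    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if len(ys) > 1 else 0.0

    w_re = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(mu), expit(mu - 1.96 * se), expit(mu + 1.96 * se)


if __name__ == "__main__":
    # Hypothetical sensitivity data (true positives / women with GDM) for four studies.
    tp = [90, 45, 300, 60]
    gdm = [100, 50, 340, 70]
    est, lo, hi = pool_proportions(tp, gdm)
    print(f"pooled sensitivity {est:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Because the weights include the between-study variance tau², large studies dominate less than under a fixed-effect model, and the interval widens when study results disagree, as they do at the ≥95 mg/dL threshold.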

The prevalence of GDM ranged from 1.4 to 33.3 percent (median 6.2 percent) (Table 7). The PPV ranged from 12.0 to 45.8 percent and the NPV from 83.3 to 100 percent (Table 7).

Table 7. Prevalence and diagnostic test characteristics for fasting plasma glucose by CC/ADA (2000–2010) diagnostic criteria.

Fasting Plasma Glucose and Other Diagnostic Criteria

Description of Included Studies

Two studies used the WHO criteria to confirm a diagnosis of GDM,111,120 one used the NDDG criteria,127 and one each used the national criteria of Canada105 and Japan.101 Different FPG thresholds were used: Maegawa et al.101 and Wijeyaratne et al.111 used ≥85 mg/dL, Kauffman et al.127 used ≥92 mg/dL, and Reichelt et al.120 used ≥89 mg/dL.

Results

Table 8 summarizes the prevalence and test characteristics of the studies.

Table 8. Prevalence and diagnostic test characteristics for fasting plasma glucose by NDDG-WHO and other diagnostic criteria.

Risk Factor-Based Screening and GDM Diagnosis

Description of Included Studies

Eight studies presented data on risk factor-based screening (Appendix D).63,99,111,114,115,119,151,160 One study was conducted in North America,160 four in Europe,115,119,151,162 two in the Middle East,111,114 and one in South America.63 The number of patients enrolled ranged from 532 to 4,918.

Results

Figure 11 presents the sensitivities and specificities for the individual studies. The results were not pooled because different diagnostic criteria were used across the studies (Table 9). The prevalence of GDM ranged from 1.7 to 16.9 percent, the PPV from 5 to 20 percent, and the NPV from 94 to 99 percent (Table 9).

Figure 11 is a forest plot depicting the sensitivity and specificity of risk factor screening by several diagnostic criteria. For each study, sensitivity and specificity are as follows: Agarwal 2000 95 (95% CI 92 to 97) and 94 (95% CI 92 to 95) percent using CC criteria; Wijeyaratne 93 (95% CI 88 to 97) and 22 (95% CI 19 to 25) percent using WHO criteria; Hill 86 (95% CI 73 to 94) and 50 (95% CI 46 to 54) percent using CC criteria; Ayach 85 (95% CI 55 to 98) and 47 (95% CI 42 to 53) percent using ADA criteria; Trihospital 83 (95% CI 72 to 91) and 83 (95% CI 81 to 85) percent using NDDG criteria; Jensen 81 (95% CI 73 to 87) and 65 (95% CI 63 to 66) percent using WHO criteria; Poyhonen-Alho 79 (95% CI 54 to 94) and 79 (95% CI 75 to 82) percent using author-defined criteria; and Ostlund 48 (95% CI 35 to 61) and 85 (95% CI 83 to 86) percent using WHO criteria.

Figure 11. Forest plot of sensitivity and specificity: risk factor screening by different diagnostic criteria (CC/ADA, NDDG, WHO). ADA = American Diabetes Association; CC = Carpenter-Coustan; NDDG = National Diabetes Data Group; WHO = World Health Organization

Table 9. Prevalence and diagnostic test characteristics for risk factor screening by different diagnostic criteria.

Other Screening Tests

Other studies examined point-of-care testing with a glucometer to measure capillary blood glucose,110,111,116,117,128 or other markers such as fasting plasma insulin,127,139 serum fructosamine,74,109 glycated hemoglobin (HbA1c),74,113 adiponectin levels,140 and glycosuria.125 The results are summarized in Table 10.

Table 10. Prevalence and characteristics of other screening tests by GDM diagnostic criteria.

Comparison of Early and Late Screening Tests

One study (n = 749) conducted in Japan provided data on screening for GDM in the first and second trimesters.101 The authors used three different screening tests: FPG, HbA1c, and a casual 50 g, 1-hour OGCT. GDM was confirmed with Japan Society of Obstetrics and Gynecology criteria (75 g, 2-hour) 2 to 4 weeks after screening. Prevalence of GDM using a universal screening practice was 1.9 percent in the first trimester and 2.9 percent in the second trimester. Table 11 presents a summary of the test characteristics by screening test and time point. These results should be interpreted cautiously as the women diagnosed with GDM in the first trimester had pre-pregnancy body weight and BMI that were significantly higher than for women who did not have GDM.

Table 11. Prevalence and characteristics of various screening tests for screening in the first and second trimesters (Maegawa study).

Comparison of Different Diagnostic Criteria

Seven studies provided data on the comparability of two diagnostic tests in the same group of women. The diagnostic criteria were: 75 g, 2-hour versus 100 g, 3-hour criteria; IADPSG versus the two-step Australasian Diabetes in Pregnancy Society (ADIPS) criteria; FPG versus ADA 100 g, 3-hour criteria; and IADPSG FPG ≥92 mg/dL versus WHO 75 g criteria.

Four studies compared 75 g, 2-hour criteria with 100 g, 3-hour criteria as the reference standard; however, different populations were assessed (Figure 12). The study by Brustman (n = 32) was conducted in the United States and compared the results of a 75 g, 3-hour OGTT with a 100 g, 3-hour OGTT.143 Prevalence of GDM was 50 percent with NDDG criteria. The sensitivity was 29 percent (95% CI, 8 to 58) and the specificity was 89 percent (95% CI, 65 to 99); PPV and NPV were 100 (95% CI, 69 to 100) and 62 (95% CI, 43 to 72), respectively.

Figure 12 is a forest plot depicting the sensitivity and specificity of the 2-hour, 75 g OGTT compared with the 3-hour, 100 g OGTT. For each study, sensitivity and specificity are as follows: Deerochanawong 100 (95% CI 69 to 100) and 86 (95% CI 83 to 88) percent; Soonthornpun 33 (95% CI 7 to 70) and 100 (95% CI 89 to 100) percent; Brustman 29 (95% CI 8 to 58) and 89 (95% CI 65 to 99) percent; Mello early pregnancy (16 to 21 weeks) 27 (95% CI 14 to 43) and 98 (95% CI 95 to 99) percent; Mello late pregnancy (26 to 31 weeks) 18 (95% CI 10 to 30) and 96 (95% CI 94 to 98) percent.

Figure 12. Forest plot of sensitivity and specificity: 75 g OGTT by 100 g OGTT. OGTT = oral glucose tolerance test

The study by Deerochanawong was conducted in Thailand (n = 709).73 The prevalence of GDM was 1.4 percent with NDDG criteria and 15.7 percent with WHO criteria. Sensitivity was 100 percent (95% CI, 69 to 100) and specificity was 90 percent (95% CI, 92 to 96). PPV and NPV were 12 (95% CI, 7 to 21) and 100 (95% CI, 99 to 100), respectively.

The study by Soonthornpun was also conducted in Thailand (n = 42).118 The prevalence of GDM using the CC criteria was 21 percent. Sensitivity was 33 percent (95% CI, 7 to 70) and specificity was 100 percent (95% CI, 89 to 100). PPV and NPV were 100 (95% CI, 53 to 100) and 85 (95% CI, 71 to 92), respectively.

The fourth study by Mello was conducted in Italy and assessed diagnosis of GDM in women during early pregnancy (16 to 21 weeks) (n = 227) and late pregnancy (26 to 31 weeks) (n = 484).153 For the early pregnancy group, the prevalence using CC criteria was 18 percent. Sensitivity was 27 percent (95% CI, 14 to 43) and specificity was 98 percent (95% CI, 95 to 99). PPV and NPV were 73 (95% CI, 48 to 89) and 86 (95% CI, 81 to 90), respectively. For the late pregnancy group the prevalence of GDM was 12 percent. Sensitivity was 18 percent (95% CI, 10 to 30) and specificity was 96 percent (95% CI, 94 to 98). PPV and NPV were 42 (95% CI, 25 to 61) and 89 (95% CI, 86 to 92), respectively.

An Australian study (n = 1,275) compared the diagnosis of GDM using IADPSG criteria with the ADIPS criteria as the reference standard.124 GDM prevalence was 13.0 percent with IADPSG criteria compared with 9.6 percent with ADIPS. The sensitivity of IADPSG was 82 percent (95% CI, 74 to 88) and specificity was 94 percent (95% CI, 93 to 96); the PPV and NPV were 61 percent (95% CI, 53 to 68) and 98 (95% CI, 97 to 99), respectively.

Two studies assessed FPG as a diagnostic test but used different reference standards. A Brazilian study (n = 341) compared FPG with the ADA 100 g, 3-hour criteria.63 The prevalence of GDM was 3.8 percent using ADA (2000-2010) 100 g criteria. The sensitivity was 84 percent (95% CI, 55 to 98) and specificity was 47 percent (95% CI, 42 to 53); PPV and NPV were 6 (95% CI, 3 to 10) and 99 (95% CI, 56 to 100), respectively.

The second study, conducted in India (n = 1,463), compared IADPSG FPG criteria with the WHO 75 g criteria.107 The prevalence of GDM was 13.4 percent with WHO criteria and 3.2 percent with FPG (≥95 mg/dL). The sensitivity of FPG as a diagnostic test was 29 percent (95% CI, 29 to 36) and specificity was 89 percent (95% CI, 88 to 91); PPV and NPV were 76 (95% CI, 55 to 89) and 79 (95% CI, 58 to 87), respectively.

Key Question 2. What is the direct evidence on the benefits and harms of screening women for GDM to reduce maternal, fetal, and infant morbidity and mortality?

Description of Included Studies

Two studies met the inclusion criteria for Key Question 2.130,131 Both studies compared outcomes for women who underwent screening or diagnostic testing for GDM with women who were not screened or tested. The studies are described in Appendix D. The studies were published in 2004130 and 1996.131 The methods and outcomes differed between the studies; therefore, no results were pooled.

Methodological Quality of Included Studies

The studies were of high and moderate methodological quality, scoring 7 and 6 of a maximum of 9 points, respectively.130,131 The studies scored well for selection of the non-exposed cohort (same as exposed cohort), ascertainment of exposure and outcome, and adequacy of followup in terms of duration and attrition. Neither study controlled for potential confounding variables. Solomon et al. included a select population (i.e., nurses participating in a longitudinal study) that may not be representative of the general target population of this review.

Key Points

Only two retrospective cohort studies were relevant to Key Question 2. There were no RCTs available to answer questions about screening. Based on the small number of studies and sample sizes, the impact of screening women for GDM on health outcomes is inconclusive.

Detailed Synthesis

One retrospective cohort study examined 1,000 women receiving antenatal care and delivering at a single center in Thailand between October 2001 and December 2002.130 Women who presented with specific risk factors underwent screening with the OGCT (n = 411) and a subsequent OGTT if the OGCT was positive (n = 164). Among those screened, 29 cases of GDM were identified (7 percent of the screened group; 3 percent of the total population). Among those who did not undergo screening, 40 women at high risk for GDM were missed (4 percent), and there were two cases of pregestational DM (0.2 percent). High risk was determined from a list of risk factors; the most commonly observed were age ≥30 years (53 percent of the 40 patients) and family history of type 2 diabetes mellitus (43 percent of the 40 patients). Appendix D lists the obstetric complications that were reported, in decreasing frequency. Overall, there were significantly more complications in the screened group (64/411 [15.6 percent] versus 63/589 [10.7 percent]). The only individual obstetric complication that differed between groups was pregnancy-induced hypertension, with significantly more cases in the screened group. The screened group was significantly older and had a higher average BMI than the group not screened. The pregnancy outcomes are listed in Appendix D. The only significant difference was in the incidence of cesarean delivery, which was greater in the screened group. The authors concluded that selective OGCT screening was highly effective in detecting GDM; however, the impact on outcomes was inconclusive due to small numbers. No information was provided on how women who screened positive were treated.
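The comparison of overall complications (64/411 versus 63/589) can be checked with a standard two-proportion test. The statistical test used by the study authors is not stated here; the sketch below assumes a pooled two-proportion z-test, which is equivalent to a Pearson chi-square test without continuity correction on the 2 × 2 table. The function name is illustrative.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Obstetric complications: screened 64/411 versus not screened 63/589.
    z, p = two_proportion_z_test(64, 411, 63, 589)
    print(f"{64 / 411:.1%} vs {63 / 589:.1%}: z = {z:.2f}, p = {p:.3f}")
```

This gives 15.6 versus 10.7 percent, a z statistic of about 2.3, and a two-sided p-value of about 0.02, consistent with the reported significant difference.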

The second study involved a survey of a subset of participants in a large prospective cohort study of 116,678 nurses aged 25 to 42 years (the Nurses' Health Study II).131 Surveys were sent to 422 women who reported a first diagnosis of GDM between 1989 and 1991, as well as to a sample of 100 women who reported a pregnancy but no diagnosis of GDM. The intent of the study was to determine the frequency of screening for GDM and the extent to which diagnosis is based on NDDG criteria. Only one outcome relevant to this Key Question was reported: the incidence of macrosomia (infant weight ≥4.3 kg) was the same in the screened and unscreened groups (7 percent in each group). These results pertained to 93 eligible women who reported a pregnancy and no diagnosis of GDM, 77 of whom reported having a 1-hour, 50 g OGCT. No information was provided on how women who screened positive were treated. No relevant outcomes were reported for the group of women who reported a pregnancy and a first diagnosis of GDM.

Key Question 3. In the absence of treatment, how do health outcomes of mothers who meet various criteria for GDM and their offspring compare to those who do not?

Description of Included Studies

Thirty-eight studies met the inclusion criteria for Key Question 3.3,54,67,78-94,102,103,106,132-137,142,145-147,149,150,152,154,155 The studies are described in Appendix D. Studies provided data for untreated women who met criteria for GDM, showed differing levels of glucose tolerance, or had no GDM. Most included studies were prospective or retrospective cohort studies published between 1995 and 2011 (median year 2004). Two studies were long-term followup studies of RCTs; however, only data from the untreated patients were included in the results for this Key Question.54,142 These studies had associated publications providing more detailed breakdowns of groups and outcomes.160,163 Fourteen studies were conducted in the U.S.,54,78,81,88-91,132,135,136,146,150,152 10 in Europe,80,86,87,93,102,106,133,145,149,154 2 in Canada,83,142 2 in Australia,3,85 and 11 in other countries67,79,82,84,92,94,103,134,137,147,155 (including Japan, Saudi Arabia, Turkey, Iran, China, and Taiwan). Populations analyzed in North American studies involved diverse ethnicities representative of the respective populations; studies from Europe or elsewhere most often included women whose ethnicity reflected the country in which the study was conducted. In one case, the women analyzed were at risk for GDM;149 this study has been noted as potentially unrepresentative of all women eligible for screening.

We grouped studies according to the diagnostic criteria used; these included CC, NDDG, WHO, and IADPSG. CC values were endorsed by the ADA (2000-2010) as well as the 4th and 5th IWC on Gestational Diabetes. Most studies employing NDDG criteria provided comparison groups of women diagnosed with CC criteria. In most cases, the NDDG GDM group received treatment for GDM, because withholding treatment from these women is commonly considered unethical in North America; therefore, these groups were not included in the results for this Key Question. One study compared unrecognized cases of NDDG GDM with a patient group with no GDM; the unrecognized cases were 16 women diagnosed postpartum, who therefore did not receive any treatment.152 CC groups were included; therefore, data from studies employing NDDG criteria with CC comparison groups, CC criteria, ADA criteria, or 4th–5th IWC criteria were included in the results. Table 1 provides an overview of these criteria.

Seventeen studies employed NDDG criteria (with treated groups excluded from this analysis), CC criteria, ADA, or 4th-5th IWC criteria with comparable groups. Groups included GDM diagnosed by CC criteria, no GDM by any criteria (normal), impaired glucose tolerance (IGT) defined as one abnormal glucose value (OAV), and false positive (positive OGCT, negative OGTT). Two studies had unique group selections and are described in the text below.

Six studies utilized NDDG criteria exclusively. Four of these presented consistent groups for analysis: normal (no GDM by any criteria) and false positive. One study retrospectively identified women with unrecognized GDM by NDDG criteria and compared this group with women with normal glucose tolerance.

Eight studies presented data according to WHO criteria, four of which provided comparable groups. WHO criteria proved a significant challenge due to variability by year, studies providing insufficient groupings for comparison, and treatment of most IGT or OAV groups. One of the two included studies provided data for women diagnosed with IGT at 8.0-8.9 mmol/L (untreated) and the other provided a similar IGT diagnosis at 7.8-8.9 mmol/L, both at two hours post 75 g load. Studies were pooled for analysis as they were deemed to be sufficiently similar. One study compared WHO GDM (untreated) with no GDM, and was included in the analysis for macrosomia.84 Three studies comparing differing levels of WHO criteria were excluded from pooled analysis because they did not have comparable groups with other included studies.134,137,147

Three studies utilized IADPSG criteria for diagnosis and provided comparable groups for pooled analysis.78,79,93

Methodological Quality of Included Studies

The methodological quality of the included studies is described in Appendix C3. Quality was analyzed using the Newcastle-Ottawa Scale (NOS) with a possible total of 9 stars. The median quality score was 9 stars, with two studies receiving a score of 6/9, nine studies a score of 7/9, seven studies a score of 8/9, and twenty a score of 9/9. Studies receiving lower scores on the NOS most often did not control for potential confounding (e.g., due to BMI, age, race), and/or had an important proportion of patients lost to followup. Overall, the majority of studies were considered good quality (36 of 38, 95 percent).
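The summary of quality scores can be verified directly from the counts given above. The short check below assumes that "good quality" corresponds to a score of 7 or more stars, which matches the reported 36 of 38 studies; that cut-off is an inference, not a definition stated in this section.

```python
from statistics import median

# Number of studies at each Newcastle-Ottawa Scale score (maximum 9 stars), as reported above.
score_counts = {6: 2, 7: 9, 8: 7, 9: 20}

scores = [score for score, count in score_counts.items() for _ in range(count)]
total = len(scores)
median_score = median(scores)
# Assumption: "good quality" taken here as a score of 7 or more.
good = sum(1 for s in scores if s >= 7)

print(f"{total} studies, median {median_score:g} stars, "
      f"{good} of {total} ({100 * good / total:.0f} percent) scoring 7 or more")
```

This prints "38 studies, median 9 stars, 36 of 38 (95 percent) scoring 7 or more", matching the figures in the text.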

Key Points

  • Thirty-eight studies provided data for this question that sought to examine health outcomes for women who meet various criteria for GDM and do not receive treatment. The majority of data came from cohort studies or the untreated groups from randomized trials.
  • A wide variety of diagnostic criteria and thresholds were compared across the studies. The most common groups reported and compared were GDM diagnosed by CC criteria, no GDM by any criteria (normal), impaired glucose tolerance defined as OAV, and false positive (positive OGCT, negative OGTT). The following criteria were used: CC (19 studies), NDDG (6 studies), WHO (8 studies), and IADPSG (3 studies).

Maternal Outcomes

  • A methodologically strong study showed a continuous positive relationship between increasing glucose levels and the incidence of primary cesarean section. This study also found significantly fewer cases of preeclampsia and cesarean section among women without GDM compared with those meeting IADPSG criteria.
  • For preeclampsia, significant differences were found for CC versus patients with no GDM (3 studies) with fewer cases among the patients with no GDM, and for CC GDM versus false-positive groups (2 studies) with fewer cases among the false positives. The strength of evidence for these comparisons was low. No differences were found for NDDG false positive versus no GDM (2 studies), NDDG 1 abnormal OGTT versus no GDM (1 study), and IGT WHO versus no GDM (3 studies); the strength of evidence for these findings was insufficient.
  • For maternal hypertension, significant differences were found for eight of 16 comparisons; five of these comparisons were based on single studies. Patient groups with no GDM showed lower incidence of maternal hypertension when compared with CC GDM, CC false positives, CC 1 abnormal OGTT, IADPSG impaired fasting glucose (IFG), IADPSG double impaired glucose tolerance (IGT-2), and IADPSG IGT IFG. Other comparisons showing significant differences were CC GDM versus false positives (lower incidence for false positives), IADPSG IGT versus IGT IFG (lower incidence for IGT), and IADPSG IFG versus IGT IFG (lower incidence for IFG).
  • There were 21 comparisons for cesarean section with nine significant differences. Patient groups with no GDM showed fewer cesarean sections when compared with CC GDM (9 studies), CC 1 abnormal OGTT (4 studies), CC false positives (5 studies), NDDG false positives (4 studies), NDDG 1 abnormal OGTT (1 study), and WHO IGT (4 studies). Four studies compared CC GDM versus false positives and showed lower incidence for the false positives. Single studies compared IADPSG IFG and IADPSG IGT IFG versus no GDM, respectively, and both showed fewer cases for the patient groups with no GDM.
  • Based on single studies, no differences were observed for maternal birth trauma for CC GDM versus no GDM, CC GDM versus false positives, or NDDG GDM (unrecognized) versus no GDM.
  • For maternal weight gain, significant differences were found for three of 12 comparisons: IADPSG IGT versus no GDM (favored IGT), IADPSG IFG versus no GDM (favored IFG), IADPSG IGT-2 versus no GDM (favored IGT-2). All comparisons were based on single studies and strength of evidence was considered insufficient.
  • For maternal mortality/morbidity, single studies compared CC GDM versus no GDM, CC 1 abnormal OGTT versus no GDM, and IADPSG GDM versus no GDM. No differences were found except for the last comparison, which showed lower mortality/morbidity for the patient groups with no GDM.
  • No studies provided data on long-term maternal outcomes, such as type 2 diabetes mellitus, obesity and hypertension.

Fetal/Neonatal/Child Outcomes

  • Two methodologically strong studies showed a continuous positive relationship between increasing glucose levels and the incidence of macrosomia. One of these studies also showed significantly fewer cases of shoulder dystocia and/or birth injury, clinical neonatal hypoglycemia, and hyperbilirubinemia among women without GDM compared with women meeting IADPSG criteria.
  • The most commonly reported outcome was macrosomia >4,000 g. Eleven comparisons were made of which six showed a significant difference. Fewer cases were observed among patient groups with no GDM compared with CC GDM (10 studies), CC 1 abnormal OGTT (7 studies), NDDG GDM (unrecognized) (1 study), NDDG false-positives (4 studies), and WHO IGT (1 study). Fewer cases were found for women with false-positive results compared with CC GDM (5 studies). The strength of evidence for these findings was low to insufficient.
  • Data for macrosomia >4,500 g were available for four comparisons and showed significant differences in two cases: patient groups with no GDM had fewer cases compared with women with CC GDM and with unrecognized NDDG GDM. The strength of evidence for these findings was low and insufficient, respectively.
  • For shoulder dystocia, significant differences were found for 7 of 17 comparisons; all but one comparison were based on single studies (insufficient strength of evidence). Patient groups with no GDM showed lower incidence of shoulder dystocia when compared with CC GDM (5 studies; low strength of evidence), NDDG GDM (unrecognized), NDDG false positive, WHO IGT, IADPSG IFG, and IADPSG IGT IFG. The other significant difference showed lower incidence among the false-positive group compared with CC 1 abnormal OGTT.
  • For fetal birth trauma/injury, four studies compared CC GDM, NDDG GDM, and WHO IGT with no GDM. No differences were observed except for NDDG GDM, which favored the patient group with no GDM. Strength of evidence was insufficient for all comparisons.
  • Only one difference was found for neonatal hypoglycemia with fewer cases among patient groups with no GDM compared with those meeting CC criteria. No differences were found for other comparisons, including CC GDM versus 1 abnormal OGTT (1 study), CC 1 abnormal OGTT versus no GDM (4 studies), NDDG GDM versus no GDM (1 study), NDDG false positive versus no GDM (1 study), NDDG 1 abnormal OGTT versus no GDM (1 study), and WHO IGT versus no GDM (3 studies). Strength of evidence was insufficient for all comparisons.
  • There were 16 comparisons for hyperbilirubinemia; the majority were based on single studies. Three comparisons showed significant differences between groups: patient groups with no GDM had fewer cases compared with CC false positive, IADPSG IGT, and IADPSG IGT-2, respectively.
  • No differences were found for fetal morbidity/mortality in any of the 8 comparisons, which may be attributable to small numbers of events within some comparisons. Most comparisons were based on few studies, except for CC GDM versus no GDM, which showed no difference based on 6 studies.
  • Based on single studies, significant differences were found in prevalence of childhood obesity for CC GDM versus groups with no GDM (lower prevalence for no GDM) and CC GDM versus false positives (lower prevalence for false positives). No differences, based on single studies, were found for CC GDM versus 1 abnormal OGTT, CC false positive versus no GDM, CC false positive versus 1 abnormal OGTT, or CC 1 abnormal OGTT versus no GDM. No other studies provided data on long-term outcomes, including type 2 diabetes mellitus and transgenerational GDM.

Detailed Synthesis

Overview

Detailed results are described by outcome in the sections that follow. We first describe the maternal outcomes, followed by fetal/neonatal/child outcomes. We present meta-graphs when two or more studies were pooled. These are displayed after the description of results for each outcome. A detailed table of results and a table summarizing the strength of evidence are presented at the end of each of the maternal and fetal/neonatal/child sections (Table 12 and Table 13; Table 14 and Table 15, respectively). The results reported below are based on unadjusted data from the relevant studies. We have reported adjusted results, where available from relevant studies, in Appendix G. In the majority of cases, the adjusted results would not have changed the pooled estimates or overall conclusions. Six studies met inclusion criteria and provided relevant outcomes but were not comparable with other studies and are described here.3,91,134,137,147
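The pooling model behind the meta-graphs for this Key Question is described in the Methods chapter rather than here. As one standard way of combining 2 × 2 counts across studies, the sketch below computes a Mantel-Haenszel fixed-effect risk ratio; it is a swapped-in illustration, not necessarily the model used in this report, and the class name, function name, and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudyCounts:
    """One study's counts for a single outcome in two comparison groups."""
    events_a: int  # events in group A (for example, CC GDM)
    total_a: int
    events_b: int  # events in group B (for example, no GDM)
    total_b: int


def mantel_haenszel_rr(studies: list[StudyCounts]) -> float:
    """Mantel-Haenszel fixed-effect pooled risk ratio, group A versus group B."""
    num = sum(s.events_a * s.total_b / (s.total_a + s.total_b) for s in studies)
    den = sum(s.events_b * s.total_a / (s.total_a + s.total_b) for s in studies)
    return num / den


if __name__ == "__main__":
    # Hypothetical cesarean counts from three studies, for illustration only.
    data = [
        StudyCounts(events_a=30, total_a=100, events_b=150, total_b=1000),
        StudyCounts(events_a=12, total_a=40, events_b=60, total_b=500),
        StudyCounts(events_a=50, total_a=200, events_b=300, total_b=2500),
    ]
    print(f"pooled RR = {mantel_haenszel_rr(data):.2f}")
```

The Mantel-Haenszel estimate is a weighted average of the study-level risk ratios, so with these invented counts (study ratios of 2.0, 2.5, and about 2.1) the pooled value lies between 2.0 and 2.5. A confidence interval would additionally need a variance estimator such as that of Greenland and Robins, which is omitted here.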

Table 12. Evidence summary table: maternal outcomes.

Table 13. Strength of evidence summary table: maternal outcomes.

Table 14. Evidence summary table: fetal/neonatal outcomes.

Table 15. Strength of evidence summary table: fetal/neonatal outcomes.

In 1995, Sacks et al. published a prospective cohort study of 3,505 unselected pregnant women; the authors sought to determine glucose threshold distributions for the 2 hr, 75 g OGTT and to define the relationship between glucose intolerance values and neonatal macrosomia. The methodological quality of the study was good, with a score of 8/9 points. Study participants were not analyzed by groups; rather, regression analyses were conducted to identify a threshold level that predicted greater risk of macrosomia. The study did not identify a specific fasting, 1 hr, or 2 hr threshold that could discriminate women who were more likely to have infants with macrosomia. Moreover, the ability to predict macrosomia was relatively consistent across all thresholds.

The HAPO (Hyperglycemia and Adverse Pregnancy Outcomes) study, published in 2008, examined the effect of less severe hyperglycemia on pregnancy outcomes; therefore, all groups fell below the common diagnostic thresholds for GDM. The study involved 23,316 pregnant women from 15 centers in nine countries. The methodological quality was good, with a score of 9/9 points. Women underwent a 75 g OGTT at 24-32 weeks. Fasting plasma glucose values were divided into seven categories: ≥100 mg/dL (5.6 mmol/L), 95-99 (5.3-5.5), 90-94 (5.0-5.2), and 85-89 (4.8-4.9), with values <85 mg/dL further subdivided into three levels: <75 mg/dL (<4.2 mmol/L), 75-79 (4.2-4.4), and 80-84 (4.5-4.7). The study found a continuous positive association between increasing glucose levels and macrosomia (or birthweight >90th percentile), primary cesarean section, neonatal hypoglycemia, and cord-blood serum C-peptide >90th percentile. The associations were strongest for macrosomia and cord-blood serum C-peptide, whereas associations for neonatal hypoglycemia were not consistently significant. In unadjusted analyses, preeclampsia, cesarean delivery, shoulder dystocia and/or birth injury, clinical neonatal hypoglycemia, and hyperbilirubinemia were statistically significantly less frequent among women without GDM than among those with GDM based on the IADPSG criteria (data from Appendix, Table B available at care.diabetesjournals.org/cgi/content/full/dc09-1848/DC1). The study did not identify a clear glucose threshold for increased risk in clinically important outcomes.24
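The seven HAPO fasting plasma glucose categories amount to a simple binning rule. The sketch below is illustrative only: the function name `hapo_fpg_category` and the table `HAPO_FPG_CATEGORIES` are hypothetical, and the code assumes fasting values recorded in whole mg/dL, matching the category boundaries listed above.

```python
# Hypothetical helper: assigns a fasting plasma glucose value (mg/dL) to one of
# the seven HAPO categories described in the text. Assumes whole-number mg/dL.

# (inclusive lower bound in mg/dL, category label), checked from highest to lowest
HAPO_FPG_CATEGORIES = [
    (100, ">=100"),
    (95, "95-99"),
    (90, "90-94"),
    (85, "85-89"),
    (80, "80-84"),
    (75, "75-79"),
]


def hapo_fpg_category(fpg_mgdl: float) -> str:
    """Return the HAPO fasting glucose category label for a value in mg/dL."""
    for lower_bound, label in HAPO_FPG_CATEGORIES:
        if fpg_mgdl >= lower_bound:
            return label
    # Lowest of the three sub-levels of the original <85 mg/dL category
    return "<75"


if __name__ == "__main__":
    for value in (72, 78, 84, 92, 101):
        print(value, "mg/dL ->", hapo_fpg_category(value))
```

Checking the bounds from highest to lowest means each value lands in the first category whose lower bound it meets, so no upper bounds need to be stored.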

Two studies conducted in China134,147 used 1980 WHO criteria on a 2 hr OGTT but did not provide similar groups for comparison. One, a retrospective cohort study published in 2003 involving 2,149 women, compared six glucose categories: <6.0 mmol/L, 6.0-6.9, 7.0-7.9, 8.0-8.9, 9.0-10.9, and ≥11.0.147 The three highest categories (≥8.0 mmol/L) were treated for GDM; the three lowest were untreated. There was no significant difference between groups in the incidence of macrosomia (≥4,000 g) or cesarean delivery. The methodological quality of the study was good, with a score of 8/9 points. The second study, published in 2001, was prospective and involved 487 women. It compared a control group, an “at risk” group with a normal OGTT, and a treated GDM group.134 There were no significant differences between groups in preeclampsia or birthweight. Cesarean deliveries were significantly more frequent in the normal OGTT group than in the control group, although the comparison did not control for age or BMI (women in the normal OGTT group were older and more obese). The methodological quality was fair, with a score of 6/9 points.

One study conducted in Malaysia137 used 1999 WHO criteria on a 2 hr OGTT in conjunction with a 50 g OGCT. Because WHO criteria rarely use an OGCT and this study's groups were based on OGCT results, its groups were not comparable for pooled analysis. The study found significantly more cases of cesarean delivery, postpartum hemorrhage, and macrosomia (>4,000 g) among OGCT-positive than among OGCT-negative women.

A study conducted in Turkey between 2003 and 2009 used CC criteria on a 50 g OGCT as well as a 3 hr, 100 g OGTT.94 Groups were defined according to abnormal fasting, 1 hr, 2 hr, and 3 hr glucose values, so they were not comparable with those of the other included studies. The study did not find a significant difference between groups in mean neonatal birthweight. There were significantly more cases of macrosomia (>4,000 g) among women with an elevated 2 hr glucose value.
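The comparison groups used throughout this chapter (CC GDM, 1 abnormal OGTT, false positive, no GDM) can be sketched as a classification rule. This is a minimal sketch under stated assumptions: it uses the widely cited Carpenter-Coustan thresholds for the 100 g, 3 hr OGTT (95, 180, 155, and 140 mg/dL for fasting, 1, 2, and 3 hr), assumes a 50 g OGCT cut-off of 140 mg/dL (individual studies used different cut-offs), and assumes "no GDM" means a negative OGCT. The function name `classify_cc` is hypothetical, and individual studies may have defined their groups differently.

```python
from typing import Optional, Sequence

# Carpenter-Coustan thresholds (mg/dL) for the 100 g, 3 hr OGTT:
# fasting, 1 hr, 2 hr, 3 hr. Two or more values at or above threshold = GDM.
CC_THRESHOLDS = (95, 180, 155, 140)

# Assumed 50 g OGCT cut-off (mg/dL); studies used different values.
OGCT_CUTOFF = 140


def classify_cc(ogct_mgdl: float, ogtt_mgdl: Optional[Sequence[float]] = None) -> str:
    """Assign a woman to a comparison group (hypothetical rule, see assumptions)."""
    if ogct_mgdl < OGCT_CUTOFF:
        # Screen-negative: assumed not to proceed to the diagnostic OGTT.
        return "no GDM"
    if ogtt_mgdl is None or len(ogtt_mgdl) != len(CC_THRESHOLDS):
        raise ValueError("a screen-positive woman needs fasting, 1, 2 and 3 hr OGTT values")
    # Count how many OGTT values meet or exceed their threshold.
    n_abnormal = sum(value >= limit for value, limit in zip(ogtt_mgdl, CC_THRESHOLDS))
    if n_abnormal >= 2:
        return "CC GDM"
    if n_abnormal == 1:
        return "1 abnormal OGTT"
    # Positive screen but normal diagnostic test
    return "false positive"


if __name__ == "__main__":
    print(classify_cc(120))
    print(classify_cc(150, [90, 170, 150, 130]))
    print(classify_cc(150, [96, 170, 150, 130]))
    print(classify_cc(150, [96, 185, 150, 130]))
```

Replacing `CC_THRESHOLDS` with the NDDG thresholds (105, 190, 165, and 145 mg/dL) would give the corresponding NDDG groups under the same assumptions.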

Maternal Outcomes

Short Term

A summary of the evidence for short-term maternal outcomes is provided in Table 12. A summary of the strength of evidence is in Table 13. The sections that follow describe the results by outcome.

Preeclampsia

Ten studies presented data on preeclampsia (Table 12).81,82,88-90,103,133,149,155,160 Only two of the ten studies reported a definition of preeclampsia, and the definitions differed. Three studies compared women who met CC criteria for GDM with women who had no GDM and found significantly fewer cases in the no GDM group (Figure 13).81,89,160 Two studies compared women who met CC criteria for GDM with women who had false-positive screens and found significantly fewer cases in the false-positive group (Figure 14).90,160 The strength of evidence for these two comparisons was low. The following three comparisons showed no differences between groups: 1 abnormal OGTT by NDDG criteria versus no GDM (1 study),103 NDDG false positive versus no GDM (2 studies, Figure 15),82,88 and IGT by WHO criteria versus no GDM (3 studies, Figure 16).133,149,155 The strength of evidence for these three comparisons was insufficient.

Figure 13 is a metagraph comparing the rates of preeclampsia in women with a CC GDM diagnosis with those who do not have GDM. Three studies contributed data from 17,380 patients. The pooled result showed a statistically significant difference with fewer cases of preeclampsia in the no GDM group (RR=1.50, 95% CI 1.07, 2.11; I2=0%).

Figure 13

CC GDM versus no GDM: preeclampsia. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel
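Each metagraph pools study-level 2×2 counts with the Mantel-Haenszel (M-H) method. A minimal fixed-effect sketch of the M-H risk ratio with the Greenland-Robins confidence interval is shown below. Whether the report used a fixed- or random-effects M-H model is not stated in this section, the function name `mantel_haenszel_rr` is hypothetical, and the counts in the example are invented for illustration, not taken from the included studies. The first group named in each comparison (e.g., CC GDM) is treated as group 1, so an RR above 1 means more events in that group and fewer in the comparison group (e.g., no GDM).

```python
import math
from typing import List, Tuple

# One study as (events_group1, total_group1, events_group2, total_group2),
# where group 1 is, e.g., CC GDM and group 2 is, e.g., no GDM.
Study = Tuple[int, int, int, int]


def mantel_haenszel_rr(studies: List[Study], z: float = 1.96) -> Tuple[float, float, float]:
    """Fixed-effect M-H pooled risk ratio with a Greenland-Robins 95% CI."""
    r_sum = s_sum = p_sum = 0.0
    for a, n1, c, n2 in studies:
        n = n1 + n2
        r_sum += a * n2 / n
        s_sum += c * n1 / n
        # Greenland-Robins variance contribution of this study
        p_sum += (n1 * n2 * (a + c) - a * c * n) / n**2
    if r_sum == 0 or s_sum == 0:
        raise ValueError("no events in one group across all studies; RR undefined")
    rr = r_sum / s_sum
    se_log_rr = math.sqrt(p_sum / (r_sum * s_sum))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Invented counts for two hypothetical studies
    example = [(20, 100, 10, 100), (30, 300, 45, 900)]
    rr, lo, hi = mantel_haenszel_rr(example)
    print(f"RR {rr:.2f}, 95% CI {lo:.2f}, {hi:.2f}")
```

For a single study the Greenland-Robins variance reduces to the familiar log risk ratio variance, so the sketch also gives per-study estimates; zero-event cells would need a continuity correction, which is not handled here.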

Figure 14 is a metagraph comparing the rates of preeclampsia in women with a CC GDM diagnosis to those who screened false-positive for GDM. Two studies contributed data from 4,272 patients. The pooled result showed a significant difference in favor of the false-positive group (RR=1.51, 95% CI 1.17, 1.93; I2=0%).

Figure 14

CC GDM versus false positive: preeclampsia. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 15 is a metagraph depicting rates of preeclampsia in women with a false-positive NDDG diagnosis in comparison to those with no GDM. Two studies contributed data from 3,583 patients. The pooled result showed no significant difference between groups (RR=1.10, 95% CI 0.67, 1.83; I²=0%).

Figure 15

NDDG false positive versus no GDM: preeclampsia. CI = confidence interval; GDM = gestational diabetes mellitus; NDDG = National Diabetes Data Group; M-H = Mantel-Haenszel

Figure 16 is a metagraph comparing rates of preeclampsia in women with impaired glucose tolerance by WHO criteria with rates in women with no GDM. Three studies contributed data from 3,903 patients. The pooled result showed no significant difference between groups (RR 1.47, 95% CI 0.62, 3.52; I²=63%).

Figure 16

WHO impaired glucose tolerance versus no GDM: preeclampsia. CI = confidence interval; GDM = gestational diabetes mellitus; IGT = impaired glucose tolerance; M-H = Mantel-Haenszel; WHO = World Health Organization

Maternal Hypertension

Nine studies presented data on maternal hypertension (Table 12).78,80,90,92,93,102,106,133,163 Four studies compared women who met CC criteria for GDM with women without GDM and showed significantly fewer cases in the no GDM group (Figure 17).92,93,102,163 Two studies comparing women who met CC criteria for GDM with women who were false positive showed a significant difference with fewer cases in the false-positive group (Figure 18).90,102 Two studies compared one abnormal OGTT by CC criteria with no GDM and showed a significant difference with fewer cases in the group with no GDM (Figure 19).80,106 No differences were found for the following comparisons: CC false positive versus no GDM (1 study),102 WHO IGT versus no GDM (1 study),133 and IADPSG GDM versus no GDM (1 study).93 A single study of IADPSG criteria78 made comparisons across six different groups and found significant differences for: IADPSG IFG versus no GDM, IADPSG double impaired glucose tolerance (IGT-2) versus no GDM, IADPSG IGT IFG versus no GDM (all favoring no GDM); IADPSG IGT versus IGT IFG (favoring IGT); and IADPSG IFG versus IGT IFG (favoring IFG).

Figure 17 is a metagraph comparing rates of maternal hypertension between women with CC GDM and those without a diagnosis of GDM. Four studies contributed data from 20,023 patients. The pooled result showed a significant difference in favor of the no GDM group (RR 1.64, 95% CI 1.11, 2.42; I²=45%).

Figure 17

CC GDM versus no GDM: maternal hypertension. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 18 is a metagraph comparing maternal hypertension rates between women with CC GDM and women with a false-positive diagnosis of GDM. Two studies contributed data from 5,678 patients. The pooled result showed a significant difference in favor of the false positive group (RR 1.53, 95% CI 1.11, 2.11; I²=0%).

Figure 18

CC GDM versus false positive: maternal hypertension. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 19 is a metagraph depicting rates of maternal hypertension between women with 1 abnormal value on an OGTT and those with no GDM. Two studies contributed data from 1,015 patients. The pooled result showed a significant difference in favor of the no GDM group (RR 2.96, 95% CI 1.84, 4.77; I²=0%).

Figure 19

CC 1 Abnormal OGTT versus no GDM: maternal hypertension. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; OGTT = oral glucose tolerance test

Cesarean Delivery

Twenty-six studies presented data for cesarean delivery (Table 12).67,78,80,81,83,85-90,92,93,102,103,132,133,135,145,146,149,150,152,154,155,160 Nine studies compared CC GDM with no GDM and found significantly fewer cases in the no GDM group (Figure 20).81,86,89,92,93,102,146,150,160 Four studies compared CC GDM with false-positive results and found significantly fewer cases in the false-positive group (Figure 21).90,102,150,160 Four studies compared CC 1 abnormal OGTT with no GDM and found fewer cases in the group with no GDM (Figure 22).80,86,106,135 Five studies compared CC false positives with no GDM and found fewer events among patient groups with no GDM (Figure 23).87,102,145,150,160 One study compared women with 1 abnormal OGTT value by NDDG criteria with women without GDM and found fewer events in the no GDM group.103 Four studies comparing NDDG false positives with no GDM showed significantly fewer events in the no GDM group (Figure 24).67,88,132,152 Four studies compared WHO impaired glucose tolerance with no GDM; a significant difference was found in favor of the no GDM group (Figure 25).133,149,154,155 One study compared both IADPSG IFG and IADPSG IGT IFG with no GDM; both comparisons showed significantly fewer cases in the no GDM group.78 There were no differences between groups for the remaining comparisons (Table 12; Figure 26).

Figure 20 is a metagraph depicting rates of cesarean delivery between women with a CC GDM diagnosis and no GDM. Nine studies contributed data from 51,740 patients. The pooled result showed a significant difference in favor of the no GDM group (RR 1.32, 95% CI 1.17, 1.48; I²=63%).

Figure 20

CC GDM versus no GDM: cesarean delivery. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 21 is a metagraph comparing rates of cesarean delivery between groups of women with a CC GDM diagnosis and women with a false-positive screen for GDM. A total of four studies contributed data from 7,593 patients. Pooled results showed a significant difference in favor of the false-positive group (RR 1.16, 95% CI 1.05, 1.29; I²=0%).

Figure 21

CC GDM versus false positive: cesarean delivery. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 22 is a metagraph comparing rates of cesarean delivery between groups of women with 1 abnormal OGTT value by CC criteria and women with no GDM. Four studies contributed data from 7,124 patients. Pooled results showed a significant difference in favor of the no GDM group (RR 1.40, 95% CI 1.21, 1.63; I²=0%).

Figure 22

CC, 1 abnormal OGTT versus no GDM: cesarean delivery. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; OGTT = oral glucose tolerance test

Figure 23 is a metagraph comparing rates of cesarean delivery between false-positive women and women with no GDM. Five studies contributed data from 20,849 patients. Pooled results showed a significant difference between groups with fewer events among the group with no GDM (RR 1.15, 95% CI 1.07, 1.23; I²=0%).

Figure 23

CC false positive versus no GDM: cesarean delivery. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 24 is a metagraph depicting cesarean delivery rates between women with a false-positive NDDG screen and women with no GDM. Four studies contributed data from 4,501 patients. Pooled results showed a significant difference favoring the no GDM group (RR 1.17, 95% CI, 1.08, 1.28; I²=0%).

Figure 24

NDDG false-positive versus no GDM: cesarean delivery. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; NDDG = National Diabetes Data Group

Figure 25 is a metagraph depicting cesarean delivery rates in women with impaired glucose tolerance by WHO criteria and women with no GDM. Four studies contributed data from 8,560 patients. Pooled results demonstrated a significant difference in favor of the no GDM group (RR 1.18, 95% CI, 1.01, 1.37; I²=22%).

Figure 25

WHO impaired glucose tolerance versus no GDM: cesarean delivery. CI = confidence interval; GDM = gestational diabetes mellitus; IGT = impaired glucose tolerance; M-H = Mantel-Haenszel; WHO = World Health Organization

Figure 26 is a metagraph comparing cesarean delivery rates in women with 1 abnormal OGTT value by CC criteria with women obtaining a false-positive screening result. Two studies contributed data from 529 patients, but results were not pooled due to substantial heterogeneity (I²=79%). The results for the individual studies were: Kwik 2007 RR 0.95 (95% CI, 0.69, 1.31) and Lapolla 2007 RR 1.60 (95% CI, 1.14, 2.25).

Figure 26

CC, 1 abnormal OGTT versus false positive: cesarean delivery. CC = Carpenter-Coustan; CI = confidence interval; M-H = Mantel-Haenszel; OGTT = oral glucose tolerance test
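The decision not to pool can be checked approximately from the reported study results alone: back-calculate each study's standard error from its 95% CI, compute Cochran's Q with inverse-variance weights on the log RR scale, and convert Q to I² = (Q − df)/Q. This is a sketch under assumptions: it uses inverse-variance weighting rather than the M-H estimate and works from rounded published CIs, so it only approximates the reported I². The 75% pooling cut-off is a hypothetical value chosen only because it separates the pooled (I²=63%) and unpooled (79% and 94%) comparisons in this section; the report's actual rule is not stated here. The names `i_squared`, `should_pool`, and `POOLING_CUTOFF` are hypothetical.

```python
import math
from typing import List, Tuple

# (RR, lower 95% limit, upper 95% limit) for one study, as reported
Result = Tuple[float, float, float]

POOLING_CUTOFF = 75.0  # hypothetical I² cut-off (%), an assumption


def i_squared(results: List[Result], z: float = 1.96) -> float:
    """Approximate I² (%) from reported RRs and 95% CIs."""
    log_rrs, weights = [], []
    for rr, lower, upper in results:
        se = (math.log(upper) - math.log(lower)) / (2 * z)  # SE of log RR
        log_rrs.append(math.log(rr))
        weights.append(1.0 / se**2)  # inverse-variance weight
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))  # Cochran's Q
    df = len(results) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def should_pool(results: List[Result]) -> bool:
    """Hypothetical rule: pool only when I² is below the assumed cut-off."""
    return i_squared(results) < POOLING_CUTOFF


if __name__ == "__main__":
    figure_26 = [(0.95, 0.69, 1.31), (1.60, 1.14, 2.25)]  # Kwik 2007, Lapolla 2007
    figure_38 = [(1.56, 1.02, 2.37), (9.52, 6.02, 15.08), (3.21, 1.18, 8.76)]
    for name, data in (("Figure 26", figure_26), ("Figure 38", figure_38)):
        verdict = "pool" if should_pool(data) else "do not pool"
        print(name, f"I2 = {i_squared(data):.0f}%", verdict)
```

With the results reported in the text, this gives approximately 79% for Figure 26 and 94% for Figure 38, close to the reported values; the same calculation applies to the two hyperbilirubinemia studies in Figure 41.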

Birth Trauma

Three studies presented data for maternal birth trauma (Table 12).81,90,152 Two studies compared CC GDM with no GDM and with a false-positive group, respectively.81,90 In both studies, birth trauma was defined as third- or fourth-degree perineal laceration. Neither study found a significant difference between groups. One study compared unrecognized NDDG GDM with no GDM and found no difference in rectal injury between groups.152

Weight Gain

Three studies presented data for maternal weight gain (Table 12).78,135,155 One study compared 1 abnormal glucose tolerance value by CC criteria with no GDM and found no difference between groups.135 One study compared impaired glucose tolerance by WHO criteria with no GDM; no significant difference was found between groups.155 One study compared varying degrees of glucose intolerance by IADPSG criteria.78 Significantly less weight gain was found in the IGT, IFG, and IGT-2 groups in comparison with no GDM. No significant differences were noted between any other IADPSG glucose tolerance groups.

Maternal Morbidity/Mortality

Two studies presented data for maternal mortality or morbidity (Table 12).93,135 One study compared CC GDM as well as IADPSG GDM with no GDM.93 No significant difference was found between the CC and no GDM groups, while a significant difference favoring no GDM was found in comparison with the IADPSG group. One study compared one abnormal glucose value by CC criteria with no GDM, with no significant difference noted between groups.135

Long Term

No studies provided data on long-term maternal outcomes, such as type 2 diabetes mellitus, obesity and hypertension.

Fetal/Neonatal/Child Outcomes

Short Term

A summary of the evidence for short- and long-term fetal, neonatal, and child outcomes is found in Table 14. The strength of evidence is presented in Table 15. The sections that follow describe the results by outcome.

Macrosomia (>4,000 g)

Twenty-one studies presented data for macrosomia (over 4,000 g) (Table 14).79,80,84-86,88-90,92,93,102,106,132,133,135,136,145,146,150,152,160 There were significantly fewer cases of macrosomia in the patient groups with no GDM than in groups with CC GDM (10 studies, Figure 27),86,89,92,93,102,132,136,146,150,160 CC 1 abnormal OGTT (7 studies, Figure 28),80,86,106,132,135,136,145 NDDG GDM (1 study),152 NDDG false positives (4 studies, Figure 29),83,86,88,132 and WHO IGT (1 study).133 Significantly fewer cases of macrosomia were observed among women with false-positive results than among women with CC GDM (5 studies, Figure 30).90,102,132,150,160 There were no significant differences in the other comparisons involving CC groups (Figure 31, Figure 32, Figure 33). One study compared WHO GDM with no GDM and observed no significant difference between groups.84 Two studies compared women who met IADPSG criteria for GDM with a no GDM group and observed no difference between groups (Figure 34).79,93 The strength of evidence for this outcome was low to insufficient because of risk of bias (all observational studies), inconsistency across studies, and/or imprecision in effect estimates (Table 15).

Figure 27 is a metagraph comparing rates of macrosomia in women with a CC GDM diagnosis and women with no GDM. Ten studies contributed data from 42,874 patients. Pooled results demonstrated a significant difference in favor of women with no GDM (RR 1.61, 95% CI, 1.35, 1.92; I²=42%).

Figure 27

CC GDM versus no GDM: macrosomia (>4,000 g). CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 28 is a metagraph comparing rates of macrosomia in women with 1 abnormal OGTT value with women without GDM. Seven studies contributed data from 16,063 patients. Pooled results demonstrated a significant difference in favor of the no GDM group (RR 1.44, 95% CI, 1.13, 1.82; I²=14%).

Figure 28

CC, 1 abnormal OGTT versus no GDM: macrosomia (>4,000 g). CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; OGTT = oral glucose tolerance test

Figure 29 is a metagraph comparing rates of macrosomia between women with NDDG false-positive results and women with no GDM. Four studies contributed data from 4,501 women. Pooled results showed a significant difference in favor of no GDM (RR 1.44, 95% CI, 1.10, 1.89; I²=0%, chi-square p=0.95).

Figure 29

NDDG false positive versus no GDM: macrosomia (>4,000 g). CI = confidence interval; GDM = gestational diabetes mellitus; NDDG = National Diabetes Data Group; M-H = Mantel-Haenszel

Figure 30 is a metagraph depicting rates of macrosomia in groups of women with CC GDM and false-positive diagnoses. Five studies contributed data from 8,241 patients. Pooled results demonstrated a significant difference in favor of the false-positive group (RR 1.36, 95% CI, 1.10, 1.68; I²=45%).

Figure 30

CC GDM versus false positive: macrosomia (>4,000 g). CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 31 is a metagraph comparing rates of macrosomia between groups of women with CC GDM and women with 1 abnormal OGTT value. Three studies contributed data from 1,101 women. Pooled results showed no significant difference between groups (RR 0.98, 95% CI, 0.69, 1.41; I²=0%).

Figure 31

CC GDM versus 1 Abnormal OGTT: macrosomia (>4,000 g). CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; OGTT = oral glucose tolerance test

Figure 32 is a metagraph depicting rates of macrosomia between women with a false-positive test result and women with no GDM. Five studies contributed data from 14,852 women. Pooled results demonstrated no significant difference between groups (RR 1.02, 95% CI, 0.85, 1.24; I²=31%).

Figure 32

CC false positives versus no GDM: macrosomia (>4,000 g). CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 33 is a metagraph depicting rates of macrosomia between groups of women with 1 abnormal value on the OGTT and women with a false-positive screening test. Three studies contributed data from 1,873 patients. Pooled results showed no significant difference between groups (RR 1.40, 95% CI, 0.89, 2.20; I²=48%).

Figure 33

CC, 1 Abnormal OGTT versus false positives: macrosomia (>4,000 g). CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; OGTT = oral glucose tolerance test

Figure 34 is a metagraph depicting macrosomia rates between women meeting criteria for IADPSG GDM and those without GDM. Two studies contributed data from 2,130 women. Pooled results showed no significant difference between groups (RR 2.09, 95% CI, 0.39, 11.33; I²=39%).

Figure 34

IADPSG GDM versus No GDM: macrosomia (>4,000 g). CI = confidence interval; GDM = gestational diabetes mellitus; IADPSG = International Association of the Diabetes in Pregnancy Study Groups; M-H = Mantel-Haenszel

Macrosomia (>4,500 g)

Four studies presented data on macrosomia (over 4,500 g) (Table 14).81,150,152,160 Three studies showed a significant difference favoring the group with no GDM compared with CC GDM (Figure 35); the strength of evidence for this finding was low. No significant difference was found for CC GDM compared with false positives (2 studies; Figure 36) or for CC false positives compared with groups with no GDM (2 studies). One study compared NDDG GDM with a no GDM group and found a significant difference in favor of the no GDM group.152 The strength of evidence for these three findings was insufficient (Table 15).

Figure 35 is a metagraph comparing macrosomia rates (>4,500 g) between women meeting CC GDM criteria and women with no GDM. Three studies contributed data from 21,549 patients. Pooled results demonstrated a significant difference in favor of the no GDM group (RR 2.52, 95% CI, 1.65, 3.84; I²=0%).

Figure 35

CC GDM versus no GDM: macrosomia (>4,500 g). CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 36 is a metagraph comparing macrosomia rates (>4,500 g) between women meeting CC GDM criteria and women with a false-positive screening test. Two studies contributed data from 1,391 patients. Pooled results showed no significant difference between groups (RR 1.71, 95% CI, 0.56, 5.24; I²=63%).

Figure 36

CC GDM versus false positive: macrosomia (>4,500 g). CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Shoulder Dystocia

Twelve studies presented data on shoulder dystocia (Table 14).54,78,81,85,88-90,92,106,133,146,152 Five studies compared women who met CC criteria for GDM with no GDM and found a significant difference in favor of the no GDM group (Figure 37); the strength of evidence was rated low (Table 15).81,89,92,146,163 One study compared CC GDM with a false-positive group; no significant difference was noted.90 One study compared one abnormal OGTT by CC criteria with no GDM and found no significant difference between groups.106 One study compared women with 1 abnormal OGTT value by CC criteria with a false-positive group and noted a significant difference in favor of the false-positive group.85 One study compared unrecognized GDM by NDDG criteria with a no GDM group;152 another compared a false-positive group with no GDM.88 Both studies noted a significant difference in favor of the groups with no GDM. A single study compared IGT by WHO criteria with no GDM and found a significant difference in favor of the group with no GDM.133 One study compared varying degrees of glucose intolerance by IADPSG criteria with no GDM;78 significant differences were observed when no GDM was compared with IFG and with combined IGT and IFG, both favoring no GDM. The remaining groups demonstrated no significant differences (Table 14). The strength of evidence for all comparisons based on single studies was rated insufficient (Table 15).

Figure 37 is a metagraph comparing rates of shoulder dystocia between women meeting CC GDM criteria and those without GDM. Five studies contributed data from 27,473 patients. Pooled results showed a significant difference in favor of the no GDM group (RR 2.86, 95% CI, 1.81, 4.51; I²=0%).

Figure 37

CC GDM versus no GDM: shoulder dystocia. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Clavicular Fracture

No studies provided comparable data on clavicular fracture; this outcome was often included as a component of composite outcomes such as birth injury or fetal birth trauma.

Brachial Plexus Injury

No studies provided comparable data on brachial plexus injury, which was also often included as a component of birth injury or fetal birth trauma.

Fetal Birth Trauma or Birth Injury

Four studies presented data for fetal birth trauma or traumatic delivery (Table 14).81,149,152,155 Birth trauma was not defined in two studies,149,155 one of which compared WHO IGT with no GDM. Another study defined birth trauma as a composite of brachial plexus injury, facial nerve palsy, clavicular fracture, skull fracture, and head laceration; this study compared CC GDM and no GDM.81 No significant difference was observed in these comparisons. Brachial plexus injury, cranial nerve palsy, and clavicular fracture were also components of birth trauma in one study.152 This study compared women with unrecognized NDDG GDM and no GDM and showed a significant difference in favor of the no GDM group. Strength of evidence for all comparisons was insufficient.

Hypoglycemia

Twelve studies presented data on neonatal hypoglycemia (Table 14).67,80,86,89,103,106,133,135,146,149,152,155 Two studies did not define hypoglycemia,67,125 while all other studies defined it using varying glucose thresholds or the need for intravenous glucose. Three studies compared women meeting CC criteria for GDM with groups without GDM. Results were not pooled because of substantial heterogeneity across studies (I²=94%) (Figure 38); however, all three studies individually showed fewer cases of hypoglycemia among the patient groups with no GDM.86,89,146 The difference in results may be explained in part by the method of assessing neonatal hypoglycemia (e.g., biochemical vs. clinical). Post hoc analysis showed that the magnitude of association between glucose intolerance and neonatal hypoglycemia was affected by the definition used (i.e., clinical or biochemical). Many of the included observational studies did not routinely apply the same biochemical screening procedure to glucose-intolerant women and to the non-GDM groups. No significant difference was found for the remaining comparisons. One study compared women meeting CC criteria for GDM with women with one abnormal OGTT value,86 and four studies compared women with one abnormal OGTT value by CC criteria with no GDM (Figure 39).80,86,106,135 One study compared women who met NDDG criteria for GDM with no GDM,152 one study compared NDDG false positives with no GDM,67 and another study compared NDDG 1 abnormal OGTT with no GDM.103 Three studies compared women meeting WHO criteria for IGT with no GDM (Figure 40).133,149 Strength of evidence for all comparisons was insufficient.

Figure 38 is a metagraph comparing rates of hypoglycemia among women with a GDM diagnosis by CC criteria and those without GDM. Three studies contributed data from 7,966 patients; results were not pooled because of substantial heterogeneity (I²=94%). Results for the individual studies were: Chico 2005 RR 1.56 (95% CI, 1.02, 2.37), Langer 2005 RR 9.52 (95% CI, 6.02, 15.08), and Pennison 2001 RR 3.21 (95% CI, 1.18, 8.76).

Figure 38

CC GDM versus no GDM: hypoglycemia. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 39 is a metagraph comparing the rates of hypoglycemia among women with 1 abnormal value on an OGTT compared to those with no GDM. Four studies contributed data from 7,124 women. Pooled results showed no significant difference between groups (RR 1.29, 95% CI, 0.88, 1.91; I²=0%).

Figure 39

CC, 1 abnormal OGTT versus no GDM: hypoglycemia. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; OGTT = oral glucose tolerance test

Figure 40 is a metagraph comparing the rates of hypoglycemia between women with impaired glucose tolerance by WHO criteria and those with no GDM. Three studies contributed data from 3,895 patients. Pooled results showed no significant difference between groups (RR 1.00, 95% CI, 0.49, 2.07; I²=0%).

Figure 40

WHO impaired glucose tolerance versus no GDM: hypoglycemia. CI = confidence interval; GDM = gestational diabetes mellitus; IGT = impaired glucose tolerance; M-H = Mantel-Haenszel; WHO = World Health Organization

Hyperbilirubinemia

Eight studies presented data for hyperbilirubinemia or neonatal jaundice (Table 14).67,78,86,87,106,133,146,149 Plasma bilirubin values for the diagnosis of hyperbilirubinemia varied among studies. Of the eight studies, four compared differing CC criteria, including CC GDM with no GDM (Figure 41),86,146 CC GDM with one abnormal OGTT,86 CC 1 abnormal OGTT with no GDM,106 and CC false positive with no GDM.87 Results for CC GDM versus no GDM were not pooled because of substantial statistical heterogeneity (I²=94%). Possible sources of heterogeneity include differences in outcome assessment across studies and uncontrolled differences between comparison groups. CC false positive versus no GDM showed significantly fewer cases in the group with no GDM. The other comparison involving CC criteria (CC GDM vs. 1 abnormal OGTT) showed no significant difference between groups. One study compared women with a false-positive result by NDDG criteria with no GDM; no significant difference was found.67 Two studies compared women meeting WHO criteria for IGT with no GDM; no significant difference was found (Figure 42).133,149 One study compared various IADPSG thresholds for glucose intolerance.78 Significant differences were present in comparisons of IADPSG isolated IGT (1 value above threshold) and double IGT (IGT-2; two values above threshold) with no GDM, both favoring the no GDM group. No further differences were observed for any other IADPSG comparisons.

Figure 41 is a metagraph comparing rates of hyperbilirubinemia in offspring of women with a CC GDM diagnosis with those of women with no GDM. Two studies contributed data from 7,854 patients; results were not pooled because of substantial heterogeneity (I²=94%). Results of the individual studies were: Chico 2005 RR 1.61 (95% CI, 0.99, 2.64) and Langer 2005 RR 6.78 (95% CI, 4.31, 10.68).

Figure 41

CC GDM versus no GDM: hyperbilirubinemia. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel
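
The decision not to pool Figure 41 rests on I². The sketch below recomputes Cochran's Q and I² from the two published risk ratios and their 95% CIs. It assumes that each CI is symmetric on the log scale, so SE = (ln upper − ln lower) / 3.92, and it uses inverse-variance weights. The report's figures use M-H weights, so the value can differ slightly. With these inputs, the sketch returns an I² of about 94%, in line with the reported value. The function name is hypothetical.

```python
import math

def i_squared_from_ratios(estimates, z=1.96):
    """Cochran's Q and I^2 from ratio estimates given as (RR, lower, upper).

    Standard errors are recovered from the CI width on the log scale, and
    studies are weighted by inverse variance.
    """
    logs, weights = [], []
    for rr, lo, hi in estimates:
        se = (math.log(hi) - math.log(lo)) / (2 * z)
        logs.append(math.log(rr))
        weights.append(1 / se**2)
    pooled = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logs))
    df = len(estimates) - 1
    # I^2 = (Q - df) / Q, truncated at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

# Chico 2005 and Langer 2005: CC GDM vs. no GDM, hyperbilirubinemia (Figure 41).
q, i2 = i_squared_from_ratios([(1.61, 0.99, 2.64), (6.78, 4.31, 10.68)])
print(f"Q = {q:.1f}, I^2 = {i2:.0%}")
```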

Figure 42 is a metagraph comparing rates of hyperbilirubinemia in offspring of women with impaired glucose tolerance by WHO criteria with those of women with no GDM. Two studies contributed data from 3,491 patients. Pooled results showed no significant difference between groups (RR 0.64, 95% CI, 0.38, 1.10; I²=0%).

Figure 42

WHO impaired glucose tolerance versus no GDM: hyperbilirubinemia. CI = confidence interval; GDM = gestational diabetes mellitus; IGT = impaired glucose tolerance; M-H = Mantel-Haenszel; WHO = World Health Organization

Morbidity/Mortality

Sixteen studies presented data for neonatal mortality or morbidity (Table 14).67,85-88,92,93,102,103,106,135,146,149,150,154,155 No study demonstrated a significant difference between groups, which may reflect the small numbers of events within some comparisons. Six studies compared women meeting CC criteria for GDM with no GDM (Figure 43),86,92,93,102,146,150 two studies compared CC GDM with false positives (Figure 44),102,150 and one study compared women with CC GDM and those with one abnormal OGTT.86 Three studies compared one abnormal OGTT to no GDM (Figure 45),86,106,135 three studies compared women with false-positive results by CC criteria with no GDM (Figure 46),87,102,150 and one study compared CC false positive with one abnormal OGTT value.85 Two studies compared women with false-positive results by NDDG criteria with no GDM (Figure 47),67,88 one study compared NDDG 1 abnormal OGTT versus no GDM,103 three studies employed WHO criteria for IGT compared with no GDM (Figure 48),149,154,155 and another study followed IADPSG criteria for GDM diagnosis compared with no GDM.93

Figure 43 is a metagraph depicting rates of morbidity or mortality in offspring born to women with a GDM diagnosis by CC criteria compared to those without GDM. Six studies contributed data from 34,360 patients. Pooled results showed no significant difference between groups (RR 1.23, 95% CI, 0.46, 3.30; I²=40%).

Figure 43

CC GDM versus no GDM: morbidity/mortality. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 44 is a metagraph depicting rates of morbidity or mortality in offspring of women with a CC GDM diagnosis compared to women with a false-positive result by CC criteria. Two studies contributed data from 3,321 participants. Pooled results showed no significant difference between groups (RR 1.83, 95% CI, 0.11, 29.41; I²=49%).

Figure 44

CC GDM versus false positive: morbidity/mortality. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 45 is a metagraph comparing rates of morbidity or mortality in offspring born to women with 1 abnormal OGTT to those without GDM. Three studies contributed data from 6,348 patients. Pooled results demonstrated no significant difference between groups (RR 1.03, 95% CI, 0.61, 1.72; I²=0%).

Figure 45

CC, 1 abnormal OGTT versus no GDM: morbidity/mortality. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; OGTT = oral glucose tolerance test

Figure 46 is a metagraph comparing rates of morbidity or mortality in offspring born to women with a CC false-positive result on a screening test to those without GDM. Three studies contributed data from 16,867 patients. Pooled results showed no significant difference between groups (RR 0.80, 95% CI, 0.40, 1.61; I²=0%).

Figure 46

CC false positive versus no GDM: morbidity/mortality. CC = Carpenter-Coustan; CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel

Figure 47 is a metagraph comparing morbidity or mortality rates in offspring born to women with an NDDG false-positive result vs. those without GDM. Two studies contributed data from 2,541 patients. Pooled results showed no significant difference between groups (RR 2.24, 95% CI, 0.70, 7.14; I²=0%).

Figure 47

NDDG false positive versus no GDM: morbidity/mortality. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; NDDG = National Diabetes Data Group

Figure 48 is a metagraph depicting the morbidity or mortality rates in offspring born to women diagnosed with impaired glucose tolerance according to WHO criteria compared to those without GDM. Three studies contributed data from 5,659 patients. Pooled results showed no significant difference between groups (RR 1.42, 95% CI, 0.54, 3.75; I²=0%).

Figure 48

WHO IGT versus no GDM: morbidity/mortality. CI = confidence interval; GDM = gestational diabetes mellitus; IGT = impaired glucose tolerance; M-H = Mantel-Haenszel; WHO = World Health Organization

Long Term

One study presented data on long-term health outcomes for infants and children (i.e., prevalence of childhood obesity).132

Prevalence of Childhood Obesity

Significant differences favoring the no GDM group were found when women meeting thresholds for CC GDM were compared with those without GDM.132 The comparison of the CC false-positive group with women meeting CC GDM criteria favored the false-positive group (Table 14). These findings should be interpreted cautiously because this study did not adjust for maternal BMI, one of the most important predictors of childhood obesity. No significant differences were found for the remaining comparisons (Table 14).

Key Question 4. Does treatment modify the health outcomes of mothers who meet various criteria for GDM and offspring?

Description of Included Studies

Eleven studies met the inclusion criteria for Key Question 4.50,54,92,95-98,146,148,152,160 The studies are described in Appendix D. All studies compared diet modification, glucose monitoring, and insulin as needed with standard care. Five of the studies were RCTs,50,54,96-98 and six were retrospective cohort studies.92,95,146,148,152,160 The studies were published between 1996 and 2010 (median year 2005). Two studies each had two associated publications reporting initial and longer term outcomes.50,54 Five studies were from the United States,54,95,98,146,152 two from Italy,97,148 two from Canada,96,160 and one each from Taiwan92 and Australia.50 Most studies used an OGCT for screening followed by a 100 g OGTT assessed using CC criteria; the studies from Canada and Australia used an OGCT followed by a 75 g OGTT. Diagnostic testing in all studies occurred at or after 24 weeks' gestation. A variety of glucose inclusion criteria were used, ranging from a positive 50 g screen with a nondiagnostic OGTT to a diagnosis of GDM by National Diabetes Data Group criteria. The two largest RCTs,50,163 by Crowther et al. and Landon et al., used different glucose thresholds for entry: WHO criteria and CC criteria with a fasting glucose <95 mg/dL (5.3 mmol/L), respectively. However, the mean glucose levels of women at study entry were remarkably similar between these two studies.

Methodological Quality of Included Studies

The methodological quality of the included studies is described in Appendix C3. The risk of bias for the RCTs was low for one trial,50 unclear for three trials,54,97,98 and high for one trial.96 The trials rated unclear most commonly did not report detailed methods for sequence generation and allocation concealment. The trial rated at high risk of bias received this rating because of a lack of blinding for outcome assessment and incomplete outcome data.

The six cohort studies were all considered high quality, with overall quality scores of 7 to 9 on a 9-point scale. Three studies received the full score of 9.146,152,160 One study received a score of 8/9 because the study population was a selected (nonrepresentative) group (i.e., participants at a diabetic center).148 Two studies received a score of 7/9. One of these was downgraded for two reasons. First, its study population was considered only "somewhat" representative, because all women were cared for under a single health plan. Second, it did not control for potential confounders, including age, race, BMI, previous GDM, or family history of DM.95 The absence of control for any potential confounders was also the reason for the lower score in the second study.92

Key Points

  • A variety of glucose threshold criteria were used for inclusion across studies. For outcomes where results were inconsistent between studies, different study glucose threshold entry criteria did not explain the variation.
  • Results for some outcomes were driven by the two largest RCTs, the Maternal Fetal Medicine Unit (MFMU)54 and the Australian Carbohydrate Intolerance in Pregnancy Study (ACHOIS),50 which had unclear and low risk of bias, respectively.

Maternal Outcomes

  • There was moderate evidence from 3 RCTs showing a significant difference for preeclampsia with fewer cases in the treated group.
  • There was inconsistency across studies in terms of differences in maternal weight gain (4 RCTs and 2 cohort studies). The strength of evidence was considered insufficient due to inconsistency across studies and imprecision in effect estimates.
  • No differences between groups were found for cesarean section (5 RCTs, 6 cohorts) or unplanned cesarean section (1 RCT, 1 cohort).
  • Results for induction of labor were inconsistent across studies: the 2 RCTs showed no difference overall, while the single included cohort study showed a significant difference favoring the treatment group.
  • Only one RCT reported on BMI at delivery and showed a significant difference with lower BMI in the treated group.
  • Only one cohort study reported maternal birth trauma (i.e., postpartum hemorrhage) and showed no difference between groups.
  • There was no evidence for long-term maternal outcomes such as type 2 diabetes mellitus, obesity, and hypertension.

Short-Term Outcomes in the Offspring

  • There was insufficient evidence for birth injury. There was inconsistency across studies with the 2 RCTs showing no difference and the one cohort study showing a difference in favor of the treated group. The low number of events and participants across all studies resulted in imprecise estimates.
  • The incidence of shoulder dystocia was significantly lower in the treated groups, and this finding was consistent for the 3 RCTs and 4 cohort studies. Overall, the evidence for shoulder dystocia was considered moderate showing a difference in favor of the treated group.
  • For other injury outcomes, including brachial plexus injury (1 RCT, 1 cohort), and clavicular fractures (1 RCT, 1 cohort), the results were inconsistent across designs with the RCTs showing no differences between groups and the cohort study showing a significant difference in favor of the treated group.
  • Low-strength evidence from 4 RCTs and 2 cohort studies suggested no difference between groups for neonatal hypoglycemia.
  • For outcomes related to birthweight (including macrosomia >4,000 g, macrosomia >4,500 g, actual birthweight, and large for gestational age), differences were often observed favoring the treated groups. The strength of evidence was moderate for macrosomia >4,000 g suggesting a benefit of treatment.
  • There was no difference in hyperbilirubinemia for the 3 RCTs, while the one cohort study showed a significant difference in favor of the treated group.
  • There were no differences observed across studies for perinatal death (3 RCTs, 3 cohorts). Two RCTs showed no difference between groups for respiratory distress syndrome, while one cohort study found a significant difference favoring the treated group for "respiratory complications." Several studies assessed APGAR scores. Differences were found in both the RCT and the cohort study for APGAR at 1 minute, but no differences were found among the 2 RCTs and 1 cohort study at 5 minutes.

Long-Term Outcomes in the Offspring

  • One RCT followed patients for 7 to 11 years and found no differences for impaired glucose tolerance or type 2 diabetes mellitus, although the strength of evidence was considered insufficient.
  • No differences were observed in single studies that assessed BMI >95th percentile (7-11 year followup) and BMI >85th percentile (5-7 year followup). Overall, pooled results showed no difference in BMI and the strength of evidence was considered low.

Detailed Synthesis

Detailed results are described by outcome in the sections that follow. We first describe the maternal outcomes, followed by fetal/neonatal/child outcomes. We present metagraphs when two or more studies were pooled; these are displayed after the description of results for each outcome. A detailed table of results is presented at the end of each of the maternal and fetal/neonatal/child sections (Table 16 and Table 17, respectively). The strength of evidence for key outcomes is presented in Table 18.

Table 16. Evidence summary for Key Question 4: maternal outcomes.

Table 17. Evidence summary for Key Question 4: infant outcomes.

Table 18. Strength of evidence for Key Question 4: maternal and infant outcomes.

Maternal Outcomes

Short Term
Cesarean Delivery

All studies provided data on cesarean delivery (Table 16).50,54,92,95-98,146,148,152,160 There was no significant difference in the pooled estimates for the RCTs (risk ratio [RR] 0.90, 95% CI 0.79 to 1.01, n = 2,613) or for the cohort studies (RR 1.09, 95% CI 0.90 to 1.31, n = 3,110; Figure 49). Statistical heterogeneity was low (I² = 0% for the RCTs and 23% for the cohort studies). One RCT50 and one cohort study95 reported emergency cesarean deliveries and found no difference (RCT, RR 0.81, 95% CI 0.62 to 1.05, n = 1,000; cohort, RR 0.83, 95% CI 0.33 to 2.06, n = 126).

Figure 49 is a metagraph comparing cesarean delivery rates between women treated for GDM and those left untreated. Five RCTs contributed data from 2,613 patients; pooled results showed no significant difference between groups (RR 0.90, 95% CI, 0.79, 1.01; I²=0%). Additionally, six cohort studies contributed data from 3,110 patients. Pooled results demonstrated no significant difference between groups (RR 1.09, 95% CI, 0.90, 1.31; I²=23%).

Figure 49

Effect of treatment on outcomes of women with GDM: cesarean delivery. CI = confidence interval; GDM = gestational diabetes mellitus; RCT = randomized controlled trial; M-H = Mantel-Haenszel
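
Both pooled cesarean estimates are judged nonsignificant because each 95% CI includes 1. A CI can also be converted back into an approximate two-sided p-value. The sketch below does this under the assumption that the interval was built as exp(ln RR ± 1.96 × SE), which holds for M-H and inverse-variance ratio estimates. It shows how close the RCT estimate, with an upper limit of 1.01, sits to the conventional 0.05 threshold: roughly p ≈ 0.09. The cohort estimate gives p ≈ 0.37. The function name is hypothetical.

```python
import math

def p_from_ratio_ci(rr, lo, hi, z_crit=1.96):
    """Approximate two-sided p-value for a ratio estimate from its 95% CI.

    Assumes the CI is symmetric on the log scale:
    exp(log(rr) +/- z_crit * SE).
    """
    se = (math.log(hi) - math.log(lo)) / (2 * z_crit)
    z = math.log(rr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Pooled cesarean delivery estimates from Figure 49.
for label, est in {"RCTs": (0.90, 0.79, 1.01), "Cohorts": (1.09, 0.90, 1.31)}.items():
    print(f"{label}: p ~ {p_from_ratio_ci(*est):.2f}")
```

Because the published limits are rounded to two decimals, the back-calculated p-value is approximate.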

Induction of Labor

Three studies provided data on induction of labor,50,54,146 but results differed significantly across the studies (Table 16). The two RCTs showed no significant difference overall (RR 1.16, 95% CI 0.91 to 1.49, n = 1,931), although there was important statistical heterogeneity between them (I² = 69%). One RCT showed a significant difference favoring no treatment,50 while the other showed no difference (Figure 50).54 Different study protocols may account for the heterogeneity of results between the trials. In the trial that showed more inductions of labor in the treatment group, no recommendations were provided regarding obstetrical care, thus replicating usual clinical care of women with GDM. In the other trial, antenatal surveillance was reserved for standard obstetrical indications. In contrast, the single cohort study showed a significant difference, with fewer inductions in the treatment group (RR 0.63, 95% CI 0.55 to 0.72, n = 1,665).146 Baseline differences in the study populations and regional practices may have accounted for the different results between studies. In addition, the comparison group in the cohort study consisted of women who presented late for obstetrical care, which confounds the relationship between induction and GDM treatment. The cohort study protocol was also to deliver these women within one week of GDM diagnosis. The outcome of induction was therefore substantially confounded by the different delivery protocols in the treatment and nontreatment groups.

Figure 50 is a metagraph comparing rates of induction of labor between women treated for GDM vs. those left untreated. Two RCTs contributed data from 1,931 participants. Pooled results showed no significant difference between groups (RR 1.16, 95% CI, 0.91, 1.49; I²=69%). One cohort study contributed data from 1,665 patients and showed a significant difference favoring the treated group (RR 0.63, 95% CI, 0.55, 0.72).

Figure 50

Effect of treatment on outcomes of women with GDM: induction of labor. CI = confidence interval; GDM = gestational diabetes mellitus; RCT = randomized controlled trial; M-H = Mantel-Haenszel

Preeclampsia

Three RCTs and one cohort study provided data on preeclampsia (Table 16).50,54,98,160 The pooled estimate for the RCTs showed a significant difference favoring the treated group (RR 0.62; 95% CI, 0.43 to 0.89, n = 2,014), with minimal statistical heterogeneity across studies (I² = 16%; Figure 51). The strength of evidence was considered moderate (Table 18). One of the studies also reported preeclampsia or gestational hypertension as a combined outcome.54 This outcome also showed a significant difference favoring the treatment group (RR 0.63; 95% CI, 0.44 to 0.92, n = 931). In all three trials, there was no significant difference between groups in gestational age at delivery.

Figure 51 is a metagraph comparing rates of preeclampsia between women treated for GDM and those who were not. Three RCTs contributed data from 2,014 participants. Pooled results favored treatment (RR 0.62, 95% CI, 0.43, 0.89; I²=16%). In addition, one cohort study contributed data from 258 patients, demonstrating no significant difference between groups (RR 0.97, 95% CI, 0.43, 2.15).

Figure 51

Effect of treatment on outcomes of women with GDM: preeclampsia. CI = confidence interval; GDM = gestational diabetes mellitus; RCT = randomized controlled trial; M-H = Mantel-Haenszel

Birth Trauma

One study provided data on maternal birth trauma (postpartum hemorrhage).92 No significant difference was observed between groups (Table 16).

Weight Gain

Six studies provided data on weight gain (Table 16).50,54,95-97,152 Pooled results for the RCTs are not presented because of substantial heterogeneity (I²=88%). Two RCTs showed no significant difference,96,97 while two large RCTs showed a significant difference, with less weight gain in the treatment group (Figure 52).50,54 Given the high BMIs of the women studied in these large RCTs, less gestational weight gain in the treatment group could be interpreted as a beneficial finding. Pooled results for the cohort studies showed no significant difference between groups (mean difference [MD] -1.04; 95% CI, -2.89 to 0.81, n = 515). The strength of evidence was considered insufficient for this outcome (Table 18).

Figure 52 is a metagraph comparing weight gain (continuous) in women treated for GDM vs. those who were not. Four RCTs contributed data from 2,530 participants; results were not pooled because of substantial heterogeneity (I²=88%). Results for the individual studies were: Bonomo 2005 MD 0.50 (95% CI, -0.43, 1.43), Crowther 2005 MD -1.70 (95% CI, -1.74, -1.66), Garner 1997 MD -0.83 (95% CI, -4.97, 3.31), and Landon 2009 MD -2.20 (95% CI, -2.71, -1.69). Two cohort studies contributed data from 515 patients. Results for the individual studies were: Adams 1998 MD -1.98 (95% CI, -4.49, 0.53) and Fassett 2007 MD -0.09 (95% CI, -2.61, 2.43). The metagraph does not show the pooled results; these are reported in the text and showed no significant difference between groups (MD -1.04, 95% CI, -2.89, 0.81; I²=8%).

Figure 52

Effect of treatment on outcomes of women with GDM: weight gain. CI = confidence interval; GDM = gestational diabetes mellitus; IV = inverse variance; RCT = randomized controlled trial; SD = standard deviation
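
The pooled cohort estimate for weight gain is reported only in the text. The sketch below shows how such a value can be recomputed from the two study-level MDs and their CIs. It makes two assumptions. First, each CI is symmetric, so SE = (upper − lower) / 3.92. Second, a DerSimonian-Laird random-effects model is used; this section does not state which model produced the estimate. With these inputs, the sketch reproduces MD -1.04 (95% CI -2.89 to 0.81) and I² ≈ 8% to two decimals. The function name is hypothetical.

```python
import math

def dersimonian_laird(estimates, z=1.96):
    """Random-effects inverse-variance pooling of mean differences.

    estimates: list of (MD, lower, upper) with symmetric 95% CIs.
    Returns (pooled MD, lower, upper, I^2).
    """
    ys = [md for md, _, _ in estimates]
    variances = [((hi - lo) / (2 * z)) ** 2 for _, lo, hi in estimates]
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Between-study variance tau^2 (method of moments), truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, pooled - z * se, pooled + z * se, i2

# Adams 1998 and Fassett 2007, gestational weight gain (Figure 52).
md, lo, hi, i2 = dersimonian_laird([(-1.98, -4.49, 0.53), (-0.09, -2.61, 2.43)])
print(f"MD {md:.2f} (95% CI {lo:.2f} to {hi:.2f}); I^2 = {i2:.0%}")
```

When tau² is zero, the random-effects weights equal the fixed-effect weights and the two models give the same answer. Here the small tau² widens the CI slightly compared with a fixed-effect pooling.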

BMI at Delivery

Only one RCT reported BMI at delivery and showed a significantly lower BMI in the treated group compared with the untreated group (mean BMI 31.3 vs. 32.3; mean difference -1.00; 95% CI, -1.67 to -0.33, n = 931) (Table 16).54

Fetal/Neonatal/Child Outcomes

Short Term
Birthweight

All studies reported birthweights for the infants (Table 17).50,54,92,95-98,146,148,152,160 The pooled estimate for the RCTs showed a significantly lower incidence of birthweights >4,000 g among infants in the treated groups (RR 0.50, 95% CI, 0.35 to 0.71; Figure 53); however, there was moderate heterogeneity across studies. Pooled estimates were not reported for the cohort studies because of substantial heterogeneity (I²=86%). Three of the studies96,152,160 also reported the incidence of birthweights >4,500 g and showed no significant differences between groups. For actual birthweight (Figure 54), the five RCTs showed significantly lower mean birthweights in the treated group (MD -120.8; 95% CI, -163.4 to -78.2, n = 2,670). The two cohort studies showed substantial heterogeneity. One showed a significantly lower mean birthweight in the treated group, and the other showed no difference between groups.

Figure 53 is a metagraph comparing the number of macrosomic (>4,000 g) infants born to women treated for GDM with those born to women who were not. Five RCTs contributed data from 2,643 participants; pooled results significantly favored the treatment group (RR 0.50, 95% CI, 0.35, 0.71; I²=50%). Six cohort studies contributed data from 3,426 participants. Results were not pooled because of substantial heterogeneity (I²=86%). Results for the individual studies were: Adams 1998 RR 0.40 (95% CI, 0.22, 0.73), Bonomo 1997 RR 0.14 (95% CI, 0.01, 2.35), Chou 2010 RR 2.08 (95% CI, 1.24, 3.47), Fassett 2007 RR 1.03 (95% CI, 0.44, 2.44), Langer 2005 RR 0.42 (95% CI, 0.32, 0.56), and Naylor 1996 RR 0.37 (95% CI, 0.21, 0.64).

Figure 53

Effect of treatment on outcomes for offspring of women with GDM: birthweight >4,000 g. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial
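
Study-level ratios and CIs like those listed for Figure 53 can be screened for transcription errors. A ratio estimated on the log scale should sit at the geometric midpoint of its CI, so ln RR ≈ (ln lower + ln upper) / 2. The sketch below is a hypothetical helper that performs this check. It allows each printed value to be off by half a unit in its last decimal place, which matters for very small limits such as 0.01. It checks internal arithmetic only and cannot confirm values against the source publications. When we trace it by hand, all six cohort entries above pass.

```python
import math

def ratio_ci_consistent(rr, lo, hi, decimals=2):
    """Check that a reported ratio lies at the log-scale midpoint of its CI.

    Each printed value is allowed to be off by half a unit in its last
    decimal place (rounding), so the check compares intervals, not points.
    """
    half = 0.5 * 10 ** (-decimals)
    tiny = 1e-12  # keeps log() defined when a limit is printed as 0
    # Smallest and largest log-midpoints compatible with the rounded limits.
    mid_min = (math.log(max(lo - half, tiny)) + math.log(max(hi - half, tiny))) / 2
    mid_max = (math.log(lo + half) + math.log(hi + half)) / 2
    # Range of log(rr) compatible with the rounded point estimate.
    rr_min, rr_max = math.log(max(rr - half, tiny)), math.log(rr + half)
    # Consistent if the two ranges overlap.
    return rr_min <= mid_max and rr_max >= mid_min

# Cohort-study estimates for macrosomia >4,000 g, as listed for Figure 53.
cohorts = {
    "Adams 1998": (0.40, 0.22, 0.73),
    "Bonomo 1997": (0.14, 0.01, 2.35),
    "Chou 2010": (2.08, 1.24, 3.47),
    "Fassett 2007": (1.03, 0.44, 2.44),
    "Langer 2005": (0.42, 0.32, 0.56),
    "Naylor 1996": (0.37, 0.21, 0.64),
}
for name, (rr, lo, hi) in cohorts.items():
    status = "consistent" if ratio_ci_consistent(rr, lo, hi) else "check"
    print(f"{name}: RR {rr} ({lo}, {hi}) -> {status}")
```

For mean differences, the same idea applies on the original scale: the MD should equal the arithmetic midpoint of its CI.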

Figure 54 is a metagraph comparing the effect on birthweight as a continuous outcome between women treated for GDM and those who were untreated. Five RCTs contributed data from 2,670 patients. Pooled results significantly favored the treatment group (MD -120.81, 95% CI, -163.40, -78.23; I²=2%). Two cohort studies contributed data from 515 participants; however, results were not pooled because of substantial heterogeneity (I²=77%). Results for the individual studies were: Adams 1998 MD -354.8 (95% CI, -708.25, -1.35) and Fassett 2007 MD 87.30 (95% CI, -126.21, 300.81).

Figure 54

Effect of treatment on outcomes for offspring of women with GDM: birthweight (continuous). CI = confidence interval; GDM = gestational diabetes mellitus; IV = inverse variance; RCT = randomized controlled trial; SD = standard deviation

Large for Gestational Age (LGA)

There was a significant difference in LGA, with fewer cases in the treatment group, among both the three RCTs50,54,97 (RR 0.56; 95% CI, 0.45 to 0.69, n = 2,261) and the four cohort studies (RR 0.43; 95% CI, 0.27 to 0.70, n = 2,294) (Table 17).95,148,152,152 The results for the cohort studies showed moderate statistical heterogeneity (I² = 58%) (Figure 55). One study appeared to be an outlier;95 when it was removed from the analysis, there was no heterogeneity.

Figure 55 is a metagraph comparing large for gestational age births between treated and untreated women. Three RCTs contributed data from 2,261 patients. Pooled results significantly favored the treatment group (RR 0.56, 95% CI, 0.45, 0.69; I²=0%). Four cohort studies contributed data from 2,294 patients; pooled results significantly favored the treatment group (RR 0.43, 95% CI, 0.27, 0.70; I²=58%).

Figure 55

Effect of treatment on outcomes for offspring of women with GDM: large for gestational age (LGA). CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial
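
The text attributes the moderate heterogeneity among the LGA cohort studies (I² = 58%) to a single outlying study. The sketch below illustrates a leave-one-out sensitivity analysis of this kind. I² is recomputed with each study omitted in turn, using inverse-variance weights on the log scale. The helper names and the four (RR, lower, upper) inputs are hypothetical and are not the LGA data behind Figure 55. They were chosen so that one study is clearly discordant.

```python
import math

def i_squared(estimates, z=1.96):
    """I^2 for ratio estimates (RR, lower, upper), inverse-variance on the log scale."""
    ys = [math.log(rr) for rr, _, _ in estimates]
    # Weight = 1 / SE^2, with SE recovered from the log-scale CI width.
    w = [((2 * z) / (math.log(hi) - math.log(lo))) ** 2 for _, lo, hi in estimates]
    pooled = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0

def leave_one_out(estimates):
    """I^2 recomputed with each study omitted in turn."""
    return [i_squared(estimates[:k] + estimates[k + 1:]) for k in range(len(estimates))]

# Hypothetical (RR, lower, upper) values for illustration, not data from Figure 55.
studies = [(0.45, 0.30, 0.68), (0.40, 0.25, 0.64), (0.42, 0.28, 0.63), (1.10, 0.60, 2.02)]
print(f"All studies: I^2 = {i_squared(studies):.0%}")
for k, value in enumerate(leave_one_out(studies)):
    print(f"Without study {k + 1}: I^2 = {value:.0%}")
```

A drop to I² = 0% when one study is removed is consistent with that study driving the heterogeneity. It does not by itself explain why that study differs.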

Shoulder Dystocia

Seven studies provided data on shoulder dystocia (Table 17).50,54,92,95,98,146,152 Pooled estimates from three RCTs50,54,98 showed a significant difference favoring the treated group (RR 0.42; 95% CI, 0.23 to 0.77, n = 2,044; Figure 56). The four cohort studies92,95,146,152 also showed a significant difference favoring the treated group (RR 0.38; 95% CI, 0.19 to 0.78, n = 3,054). There was little statistical heterogeneity across studies (I² = 0% for the RCTs and 20% for the cohort studies). Overall, the strength of evidence was considered moderate, showing a difference in favor of the treated group. Shoulder dystocia was reduced for all studies combined. However, no differences were found in individual studies that included women with milder forms of glucose intolerance (i.e., OGCT screen positive, OGTT negative; one RCT98 and one cohort study95).

Figure 56 is a metagraph depicting the effect of treatment compared to no treatment on the outcome of shoulder dystocia in infants. Three RCTs contributed data from 2,044 patients. Pooled results significantly favored the treatment group (RR 0.42, 95% CI, 0.23, 0.77; I²=0%). Four cohort studies contributed data from 3,054 patients; pooled results significantly favored treatment groups (RR 0.38, 95% CI, 0.19, 0.78; I²=20%).

Figure 56

Effect of treatment on outcomes for offspring of women with GDM: shoulder dystocia. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial

Brachial Plexus Injury

One RCT50 and one cohort study152 provided data for brachial plexus injury (Table 17). The RCT found no significant difference between groups (RR 0.15; 95% CI, 0.01 to 2.87, n = 1,000), while the cohort study showed a significant difference favoring the treated group (RR 0.04; 95% CI, 0 to 0.66, n = 389).

Clavicular Fracture

The same two studies50,152 reported clavicular fractures, with no difference for the RCT50 (RR 0.35; 95% CI, 0.01 to 8.45, n = 1,030) and a significant difference favoring the treated group in the cohort study152 (RR 0.02; 95% CI, 0 to 0.22, n = 389; Table 17).

Birth Trauma

Three studies reported birth trauma.54,96,152 Birth trauma was defined as brachial plexus palsy or clavicular, humeral, or skull fracture in one study.54 Brachial plexus injury, cranial nerve palsy, and clavicular fracture were components of birth trauma in another study.152 In the third study, birth trauma or injury included fractures and neurologic sequelae. One of the RCTs found no events in either group;96 the second RCT54 showed no significant difference between groups (RR 0.48; 95% CI, 0.12 to 1.90, n = 1,230; Figure 57). One cohort study showed a significant difference favoring the treated group (RR 0.02; 95% CI, 0.00 to 0.11, n = 389) (Table 17).152 Overall, the strength of evidence was insufficient for this outcome (Table 18).

Figure 57 is a metagraph comparing fetal birth trauma outcomes between women treated for GDM and women not treated. Two RCTs contributed data from 1,230 patients. Pooled results showed no significant difference between groups (RR 0.48, 95% CI, 0.12, 1.90). One cohort study contributed data from 389 patients; results significantly favored the treatment group (RR 0.02, 95% CI, 0.00, 0.11).

Figure 57

Effect of treatment on outcomes for offspring of women with GDM: birth trauma. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial

Hypoglycemia

Six studies provided data on hypoglycemia (Table 17).50,54,96,97,146,152 The pooled results from four RCTs showed no significant difference between groups (RR 1.18; 95% CI, 0.92 to 1.52, n = 2,367) and no statistical heterogeneity (Figure 58). The two cohort studies showed different results. One showed no significant difference, while the other showed a significant difference favoring the treated group (overall RR 0.55; 95% CI, 0.10 to 2.97, n = 2,054). The different results may be due in part to the different definitions of hypoglycemia used across the studies. Overall, the strength of evidence was low, suggesting no difference between groups in the incidence of hypoglycemia (Table 18).

Figure 58 is a metagraph comparing infant hypoglycemia between women treated for GDM with those who were not. Four RCTs contributed data from 2,367 patients. Pooled results showed no significant difference between groups (RR 1.18, 95% CI, 0.92, 1.52; I²=0%). Two cohort studies contributed data from 2,054 patients. Pooled results showed no significant difference between groups (RR 0.55, 95% CI, 0.10, 2.97; I²=49%).

Figure 58

Effect of treatment on outcomes for offspring of women with GDM: hypoglycemia. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial

Hyperbilirubinemia

Four studies provided data on hyperbilirubinemia (Table 17).54,96,97,146 Three RCTs showed no significant difference between groups54,96,97 (RR 0.79; 95% CI, 0.56 to 1.10, n = 1,467), while one cohort study showed a significant difference favoring the treated group146 (RR 0.26; 95% CI, 0.18 to 0.37, n = 1,665; Figure 59).

Figure 59 compares rates of hyperbilirubinemia between women treated for GDM and women not treated. Three RCTs contributed data from 1,467 women. Pooled results showed no significant difference between groups (RR 0.79, 95% CI, 0.56, 1.10; I²=0%). One cohort study contributed data from 1,665 women; results significantly favored the treatment group (RR 0.26, 95% CI, 0.18, 0.37).

Figure 59

Effect of treatment on outcomes for offspring of women with GDM: hyperbilirubinemia. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial

Mortality

Six studies provided data on perinatal deaths (Table 17).50,54,92,96,146,152 No significant differences were found between groups for the three RCTs50,54,96 (RD 0; 95% CI, -0.01 to 0.01, n = 2,287) or for the three cohort studies92,146,152 (RD 0; 95% CI, -0.01 to 0.01, n = 2,928; Figure 60). There was heterogeneity among the three RCTs (I² = 66%), with one study showing a significant difference in favor of the treatment group.

Figure 60 compares perinatal deaths between groups of untreated women and women treated for GDM. Three RCTs contributed data from 2,287 women. Pooled results showed no significant difference between groups (RD -0.00, 95% CI, -0.01, 0.01; I²=66%). Three cohort studies contributed data from 2,928 women. No significant difference was found between groups (RD -0.00, 95% CI, -0.01, 0.01; I²=0%).

Figure 60

Effect of treatment on outcomes for offspring of women with GDM: perinatal deaths. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial
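
Perinatal deaths were pooled as risk differences (RD) rather than risk ratios. This section does not state why. One property of the RD that matters for rare outcomes is that a study with no deaths in either arm contributes nothing to an M-H risk ratio. The same study still carries weight, with a difference of zero, in an M-H risk difference. The sketch below pools RDs with the Mantel-Haenszel method and Greenland-Robins variance. The counts are hypothetical and include one study with zero events in both arms. The RD is treated minus control, so a negative value favors treatment.

```python
import math

def mantel_haenszel_rd(studies, z=1.96):
    """Mantel-Haenszel pooled risk difference with Greenland-Robins variance.

    Each study is (events_treated, n_treated, events_control, n_control).
    Studies with zero events in both arms still carry weight n1*n2/N.
    """
    num = k = j = 0.0
    for a, n1, c, n2 in studies:
        b, d, n = n1 - a, n2 - c, n1 + n2
        num += (a * n2 - c * n1) / n  # weight * (a/n1 - c/n2)
        k += n1 * n2 / n  # M-H weight
        j += (a * b * n2**3 + c * d * n1**3) / (n1 * n2 * n**2)
    rd = num / k
    # Note: if every study has zero events in both arms, j == 0 and the SE collapses to 0.
    se = math.sqrt(j) / k
    return rd, rd - z * se, rd + z * se

# Hypothetical counts for illustration only (not data from Figure 60).
studies = [(0, 500, 0, 510), (1, 500, 0, 505), (2, 490, 3, 480)]
rd, lo, hi = mantel_haenszel_rd(studies)
print(f"RD {rd:.4f} (95% CI {lo:.4f} to {hi:.4f})")
```

With event rates this low, the pooled RD and its limits round to 0.00 at two decimals. This is why RD-based results for rare outcomes are often reported as "RD 0".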

Respiratory Complications

Two RCTs50,54 reported on respiratory distress syndrome and showed no significant difference between groups (RR 1.05; 95% CI, 0.48 to 2.28, n = 1,962; Table 17, Figure 61). One cohort study146 reported respiratory complications and showed a significant difference favoring the treated group (RR 0.16; 95% CI, 0.10 to 0.26, n = 1,665).

Figure 61 compares the number of respiratory complications between groups of women treated and untreated for GDM. Two RCTs presented data from 1,962 women. Pooled results showed no significant difference between groups (RR 1.05, 95% CI, 0.48, 2.28; I²=58%). One cohort study contributed data from 1,665 women; results showed a significant difference in favor of the treatment group (RR 0.16, 95% CI, 0.10, 0.26).

Figure 61

Effect of treatment on outcomes for offspring of women with GDM: respiratory complications. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial

APGAR

One RCT50 and one cohort study95 compared APGAR scores at 1 minute (Table 17). Both showed a significant difference favoring the treatment group, although the effect was larger in the cohort study (RCT MD -0.30; 95% CI, -0.56 to -0.04, n = 83; cohort MD -1.00; 95% CI, -1.54 to -0.46, n = 126; Figure 56). Another cohort study92 reported the number of infants with APGAR scores <7 at 1 minute and found no difference between groups (RR 0.76; 95% CI, 0.47 to 1.25). Two RCTs97,98 and one cohort study95 compared APGAR scores at 5 minutes (Figure 62). The two RCTs were not pooled because of substantial statistical heterogeneity (I² = 77%): one RCT showed no difference, and the other showed a significant difference favoring the untreated group. The cohort study showed no difference (n = 126). One study50 reported APGAR scores <7 at 5 minutes and found no difference between groups (n = 1,030).

Figure 62 compares infant APGAR scores at 5 minutes between groups of women treated for GDM and those who were not. Two RCTs contributed data from 383 women; however, results were not pooled because of substantial heterogeneity (I² = 77%). Results for the individual studies were: Bevier 1999, MD 0.00 (95% CI, -0.15 to 0.15), and Bonomo 2005, MD 0.20 (95% CI, 0.09 to 0.31). One cohort study contributed data from 126 women (MD 0.00; 95% CI, -0.27 to 0.27).
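The decision not to pool the two 5-minute APGAR trials rests on the I² statistic. As a rough check, I² can be recomputed from the rounded mean differences and 95% CIs reported above. This assumes a fixed-effect inverse-variance model, consistent with "IV" in the Figure 62 legend, and standard errors back-calculated from symmetric normal-theory intervals. It is an approximation from published summary values, not the report's own computation. The function names are illustrative.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * Z95)


def cochran_q_i2(estimates, ses):
    """Inverse-variance fixed-effect pooling, Cochran's Q, and I^2."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


# 5-minute APGAR mean differences as reported for Figure 62
mds = [0.00, 0.20]                                       # Bevier 1999, Bonomo 2005
ses = [se_from_ci(-0.15, 0.15), se_from_ci(0.09, 0.31)]
pooled, q, i2 = cochran_q_i2(mds, ses)
print(f"Q = {q:.2f} on 1 df, I^2 = {i2:.0%}")
```

With these inputs, Q is about 4.4 on 1 degree of freedom, and I² comes out at roughly 77%, matching the value reported for Figure 62.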

Figure 62

Effect of treatment on outcomes for offspring of women with GDM: APGAR scores, 5 minutes. CI = confidence interval; GDM = gestational diabetes mellitus; IV = inverse variance; RCT = randomized controlled trial; SD = standard deviation

Other Infant Outcomes

Single studies reported on elevated cord blood C-peptide level,54 preterm delivery,54 length,97 ponderal index,97 any serious perinatal complication,50 and abnormal fetal heart rate.98 Significant differences were found for ponderal index (MD -0.09; 95% CI, -0.16 to -0.02, n = 300) and any serious perinatal complication (RR 0.32; 95% CI, 0.14 to 0.73, n = 1,030); both results favored the treated group (Table 17).

Long Term
Type 2 Diabetes Mellitus

One small study reported 7- to 11-year followup and showed no significant difference in the incidence of type 2 diabetes mellitus among the offspring (RR 1.88; 95% CI, 0.08 to 44.76, n = 89).96 The same study also reported impaired glucose tolerance at 7- to 11-year followup and found no difference between groups (RR 5.63; 95% CI, 0.31 to 101.32, n = 89; Table 17).96 The strength of evidence for both type 2 diabetes mellitus and impaired glucose tolerance was considered insufficient (Table 18).
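The very wide intervals for type 2 diabetes and impaired glucose tolerance in the offspring, with upper bounds of 44.76 and 101.32, are what one expects when a small study observes very few events. The sketch below illustrates this under the common convention of adding 0.5 to every cell of a 2x2 table that contains a zero. The function name and the counts are hypothetical and are not intended to reproduce the study's data.

```python
import math


def risk_ratio_cc(a, n1, c, n2, cc=0.5, z=1.959964):
    """Risk ratio (arm 1 vs arm 2) with a Wald 95% CI on the log scale.

    If any cell of the 2x2 table is zero, `cc` is added to all four cells,
    which also increases each arm total by 2 * cc.
    """
    b, d = n1 - a, n2 - c
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical: 1 case among 40 treated offspring, 0 among 40 untreated.
rr, lo, hi = risk_ratio_cc(1, 40, 0, 40)
print(f"RR {rr:.2f}; 95% CI, {lo:.2f} to {hi:.2f}")
```

A single event separates the arms, yet the interval runs from about 0.13 to about 71.5. Almost any effect, beneficial or harmful, is compatible with such data, which is why the evidence is rated insufficient.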

BMI

One small study reported the incidence of BMI >95th percentile at 7- to 11-year followup and showed no significant difference between groups (RR 1.58; 95% CI, 0.66 to 3.79, n = 85; Table 17).96 The original RCT96 showed no differences in mean birth weight or macrosomia (birth weight >4,000 g and >4,500 g). A followup study9 reporting outcomes at 4 to 5 years after the initial RCT reported BMI >85th percentile and also found no difference between groups (RR 1.19; 95% CI, 0.78 to 1.82, n = 199), despite a clear difference in macrosomia rates between the treatment and control groups (5% vs. 22%, respectively). When the two studies were pooled, the results showed no difference (RR 1.26; 95% CI, 0.86 to 1.84, n = 284; Table 17), and the strength of evidence was considered low (Table 18).
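The pooled BMI estimate can be approximated from the two study-level results alone. The sketch below converts each reported RR and 95% CI to the log scale, back-calculates standard errors, and combines them with fixed-effect inverse-variance weights. This method is an assumption. The report presumably pooled the underlying counts (its figures use Mantel-Haenszel weights), so agreement between the two approaches is not guaranteed in general. The function names are illustrative.

```python
import math

Z95 = 1.959964


def log_rr_and_se(rr, lower, upper):
    """Convert a reported RR and 95% CI to log scale and back-calculate its SE."""
    return math.log(rr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def iv_pool_ratio(reported):
    """Fixed-effect inverse-variance pooling of ratio measures."""
    logs, ses = zip(*(log_rr_and_se(*r) for r in reported))
    weights = [1 / se**2 for se in ses]
    pooled_log = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    se_pooled = 1 / math.sqrt(sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z95 * se_pooled),
            math.exp(pooled_log + Z95 * se_pooled))


# Offspring BMI outcomes as reported in the text: (RR, lower, upper)
reported = [(1.58, 0.66, 3.79),   # BMI >95th percentile, 7-11 years (n = 85)
            (1.19, 0.78, 1.82)]   # BMI >85th percentile, 4-5 years (n = 199)
rr, lo, hi = iv_pool_ratio(reported)
print(f"RR {rr:.2f}; 95% CI, {lo:.2f} to {hi:.2f}")
```

Here the approximation gives RR 1.26; 95% CI, 0.86 to 1.84. These match the pooled result reported above to two decimals.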

Key Question 5. What are the harms of treating GDM and do they vary by diagnostic approach?

Description of Included Studies

Five of the studies included in Key Question 4 also provided data for Key Question 5.50,54,95,97,98 The studies are described in Appendix D. All studies compared diet modification, glucose monitoring, and insulin as needed with standard care. Four of the studies were randomized controlled trials,50,54,97,98 and one was a retrospective cohort study.95 The studies were published between 1999 and 2009 (median year 2005). Two studies had two associated publications reporting initial and longer-term outcomes.163,164 Three studies were from the United States,54,95,98 and one each was from Italy97 and Australia.50 Most studies screened with an OGCT followed by a 100 g OGTT assessed using CC criteria; the exception was the study from Australia, which used an OGCT followed by a 75 g OGTT. GDM was diagnosed at or after 24 weeks' gestation. The glucose threshold criteria used for inclusion varied across studies, ranging from a positive 50 g screen with a nondiagnostic OGTT to WHO criteria for a diagnosis of GDM. The two largest RCTs, by Crowther et al. and Landon et al.,50,54 used different glucose thresholds for trial entry: WHO criteria and CC criteria with a fasting glucose <95 mg/dL (5.3 mmol/L), respectively. The mean fasting glucose levels at study entry were similar between these two trials.

Methodological Quality of Included Studies

Among the four RCTs, one had low50 and three54,97,98 had unclear risk of bias. The trials that were unclear most commonly did not report detailed methods for sequence generation and allocation concealment. Two trials54,97 were unclear with respect to blinding of participants. One trial had incomplete reporting of outcome data.98 The cohort study was high quality (7/9 points);95 the primary limitation was not controlling for potential confounders.

Key Points

  • There was no evidence for some of the outcomes stipulated in the protocol, including costs and resource allocation. Evidence was limited for harms, for anxiety and depression, and for the number of prenatal visits and admissions to the NICU. Results are detailed below.

Maternal Outcomes

  • A single RCT assessed depression and anxiety at 6 weeks after study entry and 3 months postpartum. There was no significant difference between groups in anxiety at either time point, although there were significantly lower rates of depression in the treatment group at 3 months postpartum.

Outcomes in the Offspring

  • Four RCTs reported on small for gestational age (SGA) infants and found no significant difference between groups.

Health System Outcomes

  • Three RCTs and one cohort study provided data on admission to the NICU and showed no significant differences overall. One trial was an outlier because it showed a significant difference favoring the no-treatment group. This difference may be attributable to site-specific policies and procedures.
  • Two RCTs reported on the number of prenatal visits and generally found significantly more visits in the treatment groups. The same two studies showed that fewer patients in the untreated groups required insulin therapy.
  • Findings on induction of labor were inconsistent across studies: there was no difference for the two RCTs overall, but the one included cohort study showed a significant difference favoring the treatment group. Among the RCTs, one showed a significant difference, with fewer inductions in the group receiving no treatment,50 while the other showed no difference.54 In the RCT that showed more inductions of labor in the treatment group, no recommendations were provided regarding obstetrical care, thus replicating usual clinical care of women with GDM. In the other RCT, antenatal surveillance was reserved for standard obstetrical indications.
  • No differences between groups were found for cesarean section (5 RCTs, 6 cohorts) or unplanned cesarean section (1 RCT, 1 cohort).

Detailed Synthesis

Maternal Outcomes

Depression and Anxiety

One RCT assessed depression and anxiety at 6 weeks after study entry and 3 months postpartum.50 Depression was assessed using the Edinburgh Postnatal Depression Scale, and anxiety was assessed using the Spielberger State-Trait Anxiety Inventory. There was no significant difference between groups in anxiety at either time point, although there were significantly lower rates of depression in the treatment group at 3 months postpartum (Table 19). The authors of the primary study note that the findings regarding anxiety and depression should be interpreted cautiously because they were based on a subgroup of the women included in the trial.

Table 19. Evidence summary for Key Question 5.

Fetal/Neonatal/Child Outcomes

Small for Gestational Age (SGA)

SGA was reported in four RCTs,50,54,97,98 and overall no significant difference was found between groups (RR 1.10; 95% CI, 0.81 to 1.48, n = 2,345; Table 19, Figure 63).

Figure 63 compares the number of small for gestational age infants, as an adverse effect of GDM treatment, between groups of treated and untreated women. Four RCTs contributed data from 2,345 women. Pooled results showed no significant difference between groups (RR 1.10; 95% CI, 0.81 to 1.48; I² = 0%).

Figure 63

Effect of treatment on adverse effects for infants of mothers with GDM: SGA. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; SGA = small for gestational age

Society/Health Care System Outcomes

Admission to NICU

Three RCTs50,54,97 and one cohort study95 provided data on admission to the NICU. Among the three RCTs there was no significant difference overall (RR 0.96; 95% CI, 0.67 to 1.37, n = 2,262; Table 19, Figure 64), although there was substantial statistical heterogeneity (I² = 61%). One study was an outlier because it showed a significant effect favoring the untreated group (RR 1.15; 95% CI, 1.05 to 1.26, n = 1,030). Removing this study from the analysis reduced the heterogeneity to 0%, and the result remained nonsignificant. One cohort study also showed no significant difference in NICU admissions (RR 0.66; 95% CI, 0.19 to 2.35, n = 126).95
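Ratio measures are analyzed on the log scale, so a reported 95% CI for an RR is, up to rounding, symmetric around the log of the point estimate. The RR should therefore lie close to the geometric mean of its bounds. This gives a quick check for transcription errors. For the outlier NICU trial, an upper bound of 1.26 (not 126) is the value consistent with RR 1.15 and a lower bound of 1.05. The helper names and the 2% tolerance below are hypothetical choices.

```python
import math


def implied_ratio(lower, upper):
    """Point estimate implied by a log-symmetric 95% CI: the geometric mean."""
    return math.sqrt(lower * upper)


def ci_is_consistent(ratio, lower, upper, tol=0.02):
    """True if the reported ratio matches its CI to within `tol` (relative).
    Rounding to two decimals can exceed a tight tolerance for very small ratios."""
    return abs(implied_ratio(lower, upper) / ratio - 1) <= tol


# Outlier NICU trial: RR 1.15, lower bound 1.05; compare two candidate upper bounds.
for upper in (1.26, 126):
    print(upper, round(implied_ratio(1.05, upper), 2), ci_is_consistent(1.15, 1.05, upper))
```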

Figure 64 compares the number of NICU admissions among infants of mothers treated for GDM and infants of untreated mothers. Three RCTs contributed data from 2,262 patients. Pooled results showed no significant difference between groups (RR 0.96; 95% CI, 0.67 to 1.37; I² = 61%). One cohort study presented data on 126 patients and showed no significant difference between groups (RR 0.66; 95% CI, 0.19 to 2.35).

Figure 64

Effect of treatment on adverse effects for infants of mothers with GDM: NICU admissions. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; NICU = neonatal intensive care unit; RCT = randomized controlled trial

Number of Prenatal Visits

Two RCTs reported on the number of prenatal visits.50,54 Landon et al.54 reported an average of seven prenatal visits in the treatment group versus five in the control group (p<0.001). Crowther et al.50 reported the median numbers of antenatal clinic visits and physician clinic visits after enrollment. The intervention group had fewer antenatal clinic visits (median 5.0 [interquartile range (IQR) 1-7] vs. 5.2 [IQR 3-7], p<0.001) but more physician clinic visits (median 3 [IQR 1-7] vs. 0 [IQR 0-2]). Women in the intervention group were also significantly more likely to have visits with a dietician (92 percent vs. 10 percent, p<0.001) and with a diabetes educator (94 percent vs. 11 percent, p<0.001).

Induction of Labor

[Note: This outcome was presented under Key Question 4. It is also presented here because it may be considered a harm in terms of greater resource use and more invasive management.] Three studies provided data on induction of labor,50,54,146 but results differed significantly across the studies (Table 19). The two RCTs showed no significant difference overall (RR 1.16; 95% CI, 0.91 to 1.49, n = 1,931), although there was important statistical heterogeneity between them (I² = 69%). One RCT showed a significant difference favoring no treatment,50 while the other showed no difference (Figure 65).54 Different study protocols may account for this heterogeneity. In the study that showed more inductions of labor in the treatment group, no recommendations were provided regarding obstetrical care, thus replicating usual clinical care of women with GDM. In the other study, antenatal surveillance was reserved for standard obstetrical indications. In contrast, the cohort study showed a significant difference, with fewer inductions in the treatment group (RR 0.63; 95% CI, 0.55 to 0.72, n = 1,665).146 Baseline differences in the study populations and regional practices may have accounted for the different results between studies. In addition, the comparison group in the cohort study consisted of women who presented late for obstetrical care, which confounds the relationship between induction and GDM treatment. Finally, the cohort study protocol was to deliver these women within one week of GDM diagnosis, so the induction outcome was substantially confounded by different delivery protocols in the treatment and nontreatment groups.
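The text states that the induction results differed significantly across studies but does not name the test used. One common approach, assumed here, is a z-test on the difference between two independent log risk ratios, with standard errors back-calculated from the reported 95% CIs. The function names are illustrative, and the inputs are the rounded values given above.

```python
import math

Z95 = 1.959964


def log_ratio_and_se(ratio, lower, upper):
    """Log ratio and its SE back-calculated from a reported 95% CI."""
    return math.log(ratio), (math.log(upper) - math.log(lower)) / (2 * Z95)


def compare_subgroups(est_a, est_b):
    """z-test for the difference between two independent ratio estimates.

    Returns the ratio of ratios, the z statistic, and a two-sided p-value.
    """
    ya, sea = log_ratio_and_se(*est_a)
    yb, seb = log_ratio_and_se(*est_b)
    z = (ya - yb) / math.sqrt(sea**2 + seb**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(ya - yb), z, p


rct_pooled = (1.16, 0.91, 1.49)  # two RCTs, induction of labor
cohort = (0.63, 0.55, 0.72)      # one cohort study
ratio, z, p = compare_subgroups(rct_pooled, cohort)
print(f"ratio of RRs {ratio:.2f}, z = {z:.2f}, p = {p:.1e}")
```

With these inputs, z is a little over 4 (p well below 0.001), consistent with the statement that the results differed significantly. A statistical difference does not explain why the results differ. As noted above, the cohort comparison is confounded by late presentation and by its delivery protocol.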

Figure 65 is a metagraph comparing rates of induction of labor between women treated for GDM and those left untreated. Two RCTs contributed data from 1,931 participants. Pooled results showed no significant difference between groups (RR 1.16; 95% CI, 0.91 to 1.49; I² = 69%). One cohort study contributed data from 1,665 patients and showed a significant difference favoring the treated group (RR 0.63; 95% CI, 0.55 to 0.72).

Figure 65

Effect of treatment on outcomes of women with GDM: induction of labor. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial

Cesarean Delivery

[Note: This outcome was presented under Key Question 4. It is also presented here because it may be considered a harm in terms of greater resource use and more invasive management.] Eleven studies (five RCTs and six cohort studies) provided data on cesarean delivery (Table 19).50,54,92,95-98,146,148,152,160 There was no significant difference in the pooled estimates for the RCTs (RR 0.90; 95% CI, 0.79 to 1.01, n = 2,613) or for the cohort studies (RR 1.09; 95% CI, 0.90 to 1.31, n = 3,110; Figure 66). Statistical heterogeneity was low (I² = 0% for the RCTs and 23% for the cohort studies). One RCT50 and one cohort study95 reported emergency cesarean deliveries and found no difference (RCT: RR 0.81; 95% CI, 0.62 to 1.05, n = 1,000; cohort: RR 0.83; 95% CI, 0.33 to 2.06, n = 126).
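The RCT pooled estimate for cesarean delivery has an upper bound only just above 1 (1.01). An approximate two-sided p-value can be recovered from a reported ratio and its 95% CI by back-calculating the standard error on the log scale. This is a widely used approximation and is assumed here, since the report does not give p-values for these pooled estimates. The function name is illustrative.

```python
import math


def p_from_ratio_ci(ratio, lower, upper, z95=1.959964):
    """Approximate two-sided p-value for H0: ratio = 1, recovered from a
    reported ratio and its 95% CI (assumes a normal log-scale estimate)."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * z95)
    z = math.log(ratio) / se_log
    return math.erfc(abs(z) / math.sqrt(2))


p_rct = p_from_ratio_ci(0.90, 0.79, 1.01)     # five RCTs, cesarean delivery
p_cohort = p_from_ratio_ci(1.09, 0.90, 1.31)  # six cohort studies
print(f"RCTs p = {p_rct:.2f}; cohorts p = {p_cohort:.2f}")
```

This gives p of roughly 0.09 for the RCTs and roughly 0.37 for the cohort studies. Both are consistent with the statement that neither pooled estimate was statistically significant at the 5% level.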

Figure 66 is a metagraph comparing cesarean delivery rates between women treated for GDM and those left untreated. Five RCTs contributed data from 2,613 patients; pooled results showed no significant difference between groups (RR 0.90; 95% CI, 0.79 to 1.01; I² = 0%). Six cohort studies contributed data from 3,110 patients; pooled results likewise showed no significant difference between groups (RR 1.09; 95% CI, 0.90 to 1.31; I² = 23%).

Figure 66

Effect of treatment on outcomes of women with GDM: cesarean delivery. CI = confidence interval; GDM = gestational diabetes mellitus; M-H = Mantel-Haenszel; RCT = randomized controlled trial