NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Guise JM, Eden K, Emeis C, et al. Vaginal Birth After Cesarean: New Insights. Rockville (MD): Agency for Healthcare Research and Quality (US); 2010 Mar. (Evidence Reports/Technology Assessments, No. 191.)

Vaginal Birth After Cesarean: New Insights.


Topic Development

Analytic Framework and Key Questions

The Planning Committee for the National Institutes of Health (NIH) Consensus Development Conference on Vaginal Birth After Cesarean (VBAC): New Insights determined the key questions for this evidence report. Key questions examine 1) a chain of evidence about factors that may influence VBAC, 2) maternal and infant benefits and harms of attempting a VBAC versus an elective repeat cesarean delivery (ERCD), and 3) factors that may influence maternal and infant outcomes. Figure 4 presents an analytic framework that illustrates the clinical logic and contextual factors that underlie the key questions of this report. An analytic framework is intended to illustrate relevant clinical logic and other influencing factors, in this case relating to VBAC. It is meant to clarify the context in which decisions about route of delivery are made, clarify direct and indirect associations, and clarify assumptions and disagreements that underlie clinical controversies. Thus, the analytic framework serves as a central conceptual model for what information is being sought (key questions), what the literature tells us, and the information gaps between the two.

Figure 4. Analytic framework.

The framework starts with the population of interest, in this case women with a prior cesarean delivery. It explicitly aims to understand a woman’s initial intended route of delivery and the factors that influence that initial intention. During this evolving decisionmaking process, there may be adverse outcomes that arise from discordance between an initial preference and the actual choices available, or there may be unforeseen benefits. The routes of delivery are listed in some detail with respect to features that may contribute uniquely to risks and benefits. The framework then clarifies the relationship among the route of actual delivery, intermediate outcome measures, and maternal and infant health outcomes. This framework represents both what might be found in the literature and also important considerations for consumers, clinicians, payers, policymakers, and future research. Studies that measure health outcomes, such as maternal and infant mortality, are emphasized over studies of intermediate outcomes (e.g., nonreassuring fetal tracing). Studies providing evidence of a direct association between an intervention (e.g., ERCD) and health outcomes are said to provide direct evidence and are given greater weight than are studies that provide indirect evidence.

Technical Expert Panel and Expert Reviewers

A technical expert panel (TEP) (Appendix A) was assembled at the start of the evidence report process to provide input from experts and clinicians in the field to ensure that the scope of the project addressed important clinical questions and issues. The panel included obstetrician/gynecologists, internists, pediatricians, family physicians, and researchers. The panel convened for periodic conference calls during the course of the project. Expert reviewers (Appendix B), including several panel members, provided comments on the draft evidence report.

Literature Search and Strategy

Relevant studies were identified from searching MEDLINE, Database of Abstracts of Reviews of Effectiveness (DARE), and the Cochrane Database of Systematic Reviews and Controlled Trials (1966 to September 2009) multiple times over the course of the project (Appendix C). Additional articles were obtained from recent systematic reviews, reference lists, reviews, editorials, hand searching, Web sites, and by consulting experts. Retrieved abstracts were entered into an electronic database (EndNote®).

A total of 3,134 unique citations were reviewed from the searches. Two investigators reviewed a random set of titles and abstracts to select articles for full text review. When an appropriate level of reliability was reached for inclusion/exclusion of studies, the remaining titles and abstracts were divided up, and each was reviewed by one investigator. A research assistant tracked the inclusion status and names of reviewers for each abstract reviewed. Full text articles were retrieved for citations that had original data about maternal and infant outcomes relevant to a key question in one or more topic areas.

Inclusion and Exclusion Criteria

The target population includes women of reproductive age in the United States (U.S.) with a prior cesarean delivery who are eligible for a trial of labor (TOL) or ERCD. Settings that were applicable to a U.S. population were included. Therefore, studies were not limited to the U.S.; foreign studies were included if originating from a developed country (Appendix D). This was believed to offer the broadest range of information on maternal and infant outcomes applicable to the U.S. population. The evidence report emphasizes the patient’s perspective in choice of mode of delivery, interventions needed for induction and/or augmentation of labor, and potential adverse effects of a specific mode of delivery on maternal and infant outcomes. It also considers the generalizability of efficacy studies performed in controlled settings.

The Planning Committee for the NIH Consensus Development Conference identified their ideal population of interest as term infants (greater than or equal to 37 weeks gestational age [GA]). However, after initial review of the searches, there was concern about the lack of data on term-only infants. Two cohort studies that compared outcomes of term infants with preterm infants were reviewed and showed no difference.60, 61 For both of these reasons, general population studies of women with a prior cesarean delivery who delivered at any GA (preterm and term) were included as were studies that focused exclusively on women delivering at term. While this approach was thought to be reasonable for maternal outcomes, infant outcomes are affected by prematurity, and the scope of studies for this topic remained limited to term studies, except where noted.


For all key questions, full text studies with data on women with a prior cesarean delivery eligible for a TOL or ERCD and maternal and/or infant outcomes were initially reviewed. They were subsequently included if they met the eligibility criteria: 10 or more subjects, participants who represented the target population, and data on benefits and harms to the mother or infant for either mode of delivery. Exclusions included studies of women without a prior cesarean delivery, nulliparous patients, breech delivery, exclusive focus on preterm delivery, low birth weight, studies of pregnancies including twins or abortions, studies begun or published before the 1980 NIH Consensus Conference on VBAC, and studies focusing on patients with particular conditions such as gestational diabetes, human immunodeficiency virus, or preeclampsia (Appendix E). Non-English language papers, editorials, letters, studies available exclusively in abstract form, and studies of animals or cadavers were also excluded. Case-control and case series studies meeting similar inclusion/exclusion criteria were examined and included if they reported relevant data. Given that this report is intended for a U.S. obstetric population, studies conducted in undeveloped or developing countries were excluded (Appendix D). If the authors described their country as “developing” in either the abstract or the article, the study was excluded. For a full listing of excluded studies, see Appendix F.

Investigators read the full text version of the retrieved papers and re-applied the initial eligibility criteria. For all topics, articles were excluded if they did not provide sufficient information to determine the methods for selecting subjects and for analyzing data. For some topics, additional criteria were applied to select studies that were systematically reviewed.

For the purposes of this report, the term “rate” is used to describe the proportion of women who experienced a given event (e.g., VBAC or ERCD). Though this is not always technically correct when expressing summary statistics, it is a term that is used throughout the literature and easily understood by patients and clinicians.

Among women who attempt a trial of labor after prior cesarean, what is the vaginal delivery rate and the factors that influence it?

For this question, full text randomized controlled trials (RCTs) were reviewed in addition to observational studies. This key question was limited to studies with data on women with a prior cesarean delivery having a TOL and vaginal delivery rates and/or factors influencing the delivery rate. For evaluating the rates of TOL and VBAC, studies were included if they explicitly stated the inclusion and exclusion criteria and provided data for computing TOL or VBAC rates.

To evaluate the effect of induction of labor (IOL) on women with a prior cesarean delivery, only studies that reported the number of women who were induced and the corresponding number with VBAC, uterine rupture, or other relevant outcome were included. Of particular interest were those studies that stratified data by the method of IOL, or whether oxytocin was used for IOL or augmentation or both. We included RCTs and observational studies, preferring cohort designs with an inception cohort but also including less rigorous designs (case series) because data from RCTs and cohort studies were sparse.

What are the short- and long-term benefits and harms to the mother of attempting trial of labor after prior cesarean versus elective repeat cesarean delivery, and what factors influence benefits and harms?

For this question, the ideal comparison group was intended VBAC versus intended ERCD. However, as no studies of health outcomes measured intent, the primary comparison groups were TOL and ERCD, unless otherwise noted.

An important concern for patients, providers, hospitals, and policymakers regarding VBAC is the potential for uterine rupture, which can cause severe maternal and infant morbidity and mortality. To determine how frequently uterine rupture occurs, it is important to have a clear definition of uterine rupture. A prior systematic review of VBAC and uterine rupture found that studies varied widely in their definition and use of terminology surrounding uterine rupture.62 This report uses the anatomic definition of uterine rupture that was proposed by the prior evidence report and restated by the TEP for this project. For the purposes of this report, uterine rupture is defined as:

  • Complete Uterine Rupture – separation through the entire thickness of the wall including visceral serosa (with or without extrusion of part or all of the fetal-placental unit)
  • Incomplete Uterine Rupture – separation that was not completely through all layers of the uterine wall (e.g., serosa intact)62

To evaluate the effect of IOL on the outcome of uterine rupture, the definitions given above were used in the primary analysis. Because the number of studies using these definitions was small, the scope of literature was expanded to enable examination of individual factors predicting uterine rupture, predictive tools, and imaging to predict uterine rupture.

What are the short- and long-term benefits and harms to the baby of maternal attempt at trial of labor after prior cesarean versus elective repeat cesarean delivery, and what factors influence benefits and harms?

For this question, all studies were limited to term infants (greater than or equal to 37 weeks GA), with the exception of fetal macrosomia and perinatal mortality. Studies addressing the influence of fetal macrosomia used fetal weight (greater than or equal to 4,000 grams) for the inclusion criteria instead of GA.

To measure the frequency of perinatal mortality and the corresponding subsets of perinatal death, we used the definitions accepted by the National Center for Vital Statistics.63 The definition of perinatal death (perinatal II) included infants less than 28 days of age and fetal deaths of 20 weeks or more GA. To study the frequency of stillbirth (antepartum and intrapartum) we used both the intermediate and late fetal definitions of fetal death. Intermediate (20–27 weeks GA) and late fetal death (greater than or equal to 28 weeks GA) referred to the intrauterine death of a fetus before delivery. Although most fetal deaths occur early in pregnancy (less than 20 weeks GA), most countries, and in particular the U.S., only report intermediate and late fetal death. Neonatal (infant) mortality was defined as death in the first 28 days of life.63 Rates for perinatal, fetal, and neonatal mortality were reported per 1,000 live births.

To better understand perinatal morbidity and mortality among fetuses or infants born to women with a prior cesarean delivery, we excluded studies that did not exclude cases with congenital or lethal anomalies (before or after analyses). If the analysis or discussion section of a study made it possible to determine which perinatal deaths were due to congenital anomalies, we retained the study and excluded the deaths attributed to anomalies. To reduce the effects of prematurity on the neonatal mortality rate, we limited our analyses of neonatal mortality to term infants.

What are the critical gaps in the evidence for decisionmaking, and what are the priority investigations needed to address these gaps?

For this question, investigators reviewed the synthesis of their results and compiled a list of relevant areas that were lacking in evidence. During the expert review process, reviewers were asked to prioritize the gaps identified by the investigators. Specifically, experts were asked if stated topic areas were of low, medium, or high priority for future research and to provide additional clarification on their positions. Areas rated as highest priority, meaning 50 percent of the experts rated the domain as high, are discussed in this section.

Special Considerations

Some topics of interest do not fall easily under the key questions: the effects of maternal obesity, multiple cesarean deliveries, and direction of cesarean scar on outcomes. For the effect of maternal obesity on outcomes, full text papers were excluded if the prior cesarean delivery group was not broken out in TOL analysis, VBAC rates were not provided by body mass index (BMI) or weight categories, or if BMI or weight were used as one of many predictors (e.g., regression model, modeling study) without other usable analysis. For multiple cesarean deliveries, studies were included if they specified maternal outcomes by number of cesareans. Outcomes by exact number of prior cesareans were identified when possible. For the direction of cesarean scar, studies were limited to those that identified direction of scar and specified outcomes by scar direction.

Data Extraction and Synthesis

All eligible studies were reviewed and a “best evidence” approach was applied, in which studies with the highest quality and most rigorous design are emphasized.64 Data were extracted from each study, entered directly into evidence tables, and summarized descriptively. Benefits and adverse effects of mode of delivery were considered equally important and both types of outcomes were abstracted.

Studies were included in the synthesis of the evidence report if they achieved a good or fair quality rating as determined by study design, methods, and analysis. When possible, original data were used as presented in the article. When necessary, raw numbers were calculated from given percentages. Data were pooled from studies evaluating the same outcomes of interest. All results are reported as percentages to allow the reader to make direct comparisons of frequency. Because many of the adverse outcomes are rare, percentages were also translated into rates consistent with those reported in vital statistics. For example, maternal death is reported per 100,000, while hysterectomy, infection, fever, transfusion, incidence of placenta previa by number of prior cesareans, neonatal mortality, and perinatal mortality are reported per 1,000. Several included studies came from the National Institute of Child Health and Human Development Maternal-Fetal Medicine Units Network (MFMU) cohort (Appendix G). For synthesis, the most appropriate study is included for the outcome being discussed.
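The conversions described above (raw counts recovered from percentages, and percentages re-expressed per 1,000 or per 100,000) are simple arithmetic. The following is a minimal sketch in Python; the function names and the numbers are hypothetical illustrations and are not taken from the report.

```python
def events_from_percentage(percent: float, n: int) -> int:
    """Recover an approximate raw event count from a reported percentage."""
    return round(percent / 100 * n)


def rate_per(events: int, n: int, base: int = 1000) -> float:
    """Express events out of n as a rate per `base` (e.g., 1,000 or 100,000)."""
    return events / n * base


# Hypothetical study: an outcome reported as 0.5 percent of 4,000 women.
events = events_from_percentage(0.5, 4000)
print("raw events:", events)
print("per 1,000:", rate_per(events, 4000, base=1000))
print("per 100,000:", rate_per(events, 4000, base=100_000))
```

Because a reported percentage is usually rounded, a count recovered this way is only approximate.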

Studies that did not use an anatomic definition for uterine rupture were excluded from analysis of rates of uterine rupture. For other topic areas where studies that used an anatomic definition were not available (predictors, imaging to predict uterine rupture, and timing from symptom to delivery as a predictor of infant outcome), information was provided from existing studies.

Quality Rating of Individual Studies

Reviewers rated the quality of RCTs, cohort, case-control, and case series studies using criteria specific to different study designs developed by the U.S. Preventive Services Task Force (USPSTF) and additional criteria developed by the National Health Service Centre for Reviews and Dissemination, based at the University of York in England (Appendix H).65, 66 Two reviewers independently reviewed a small portion of the studies. When reviewers disagreed, a final rating was reached through consensus. When a Kappa of at least 0.60 was met between reviewers, a single reviewer rated the remaining studies.67 The Kappa between reviewers for this evidence report was 0.743 (95 percent CI: 0.537 to 0.949). Studies reporting several different outcomes may have different quality ratings for each outcome depending on how accurate the measure used was and how completely it controlled for potential confounders in multi-variable models (Appendix I). Studies determined to be poor quality were not included in the analyses, unless no studies of better quality were available for a given topic.
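The agreement threshold above can be illustrated with a short calculation. This sketch assumes an unweighted Cohen's kappa for two raters, which the text does not specify; the function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same studies."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of studies")
    n = len(ratings_a)
    # Observed agreement: share of studies given the same rating by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical quality ratings from two reviewers for six studies.
reviewer_1 = ["good", "good", "fair", "poor", "fair", "good"]
reviewer_2 = ["good", "fair", "fair", "poor", "fair", "good"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}; threshold of 0.60 met: {kappa >= 0.60}")
```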

In addition to the quality criteria for each study design, the evaluation of prediction modeling studies required that they provided a clear definition of prognostic factors. The most important criteria for these studies were comparable groups that included clear inclusion and exclusion criteria; clear definitions of the prognostic factors; and adjustment (as needed, for studies without comparable groups) for confounders. To achieve a rating of good, a study had to meet, at a minimum, all three of these criteria. A study with comparable groups and no need for adjustment could still meet this standard. For studies that met only two of these three criteria, the highest quality rating they could achieve was fair.

Development of Evidence Tables and Data Abstraction Process

The following information about the patient population, study design, study outcomes, and study quality was extracted from full text, published studies of VBAC and TOL, IOL, ERCD, or uterine rupture and was used to construct evidence tables showing: identifying information (study name, years of observation); setting (population-based, referral clinic-based, other); study design (randomized trial, prospective, etc.); interventions (induction, augmentation medications); outcomes studied (infant, maternal, cost, etc.); length of followup; statistical methods for handling confounders (statistical adjustment, stratification, none) and attrition; numbers of subjects recruited, included, and completing study; and characteristics of the sample (demographic variables, number of previous births, other risk factors). Data were abstracted by one reviewer and verified by a second.

For prediction studies, odds ratios are presented unless another measure of risk (e.g., relative risk [RR]) is noted. When possible, all results are presented using the same reference group and outcome. In the prediction sections, the results are presented for predicting TOL and for predicting VBAC. If a study, for example, predicted repeat cesarean delivery after a TOL, the odds ratios were inverted so that all studies predicted VBAC.
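Inverting an odds ratio so that it refers to the complementary outcome means taking its reciprocal, and the confidence limits are inverted and swapped. A minimal sketch, with a hypothetical function name and illustrative numbers:

```python
def invert_odds_ratio(odds_ratio: float, ci_low: float, ci_high: float):
    """Re-express an odds ratio for the complementary outcome.

    An OR for repeat cesarean after a TOL becomes an OR for VBAC.
    The reciprocal of the upper limit becomes the new lower limit.
    """
    if min(odds_ratio, ci_low, ci_high) <= 0:
        raise ValueError("odds ratios and confidence limits must be positive")
    return 1 / odds_ratio, 1 / ci_high, 1 / ci_low


# Hypothetical predictor: OR 2.0 (95 percent CI 1.25 to 4.0) for repeat cesarean.
or_vbac, low, high = invert_odds_ratio(2.0, 1.25, 4.0)
print(f"OR for VBAC: {or_vbac:.2f} ({low:.2f} to {high:.2f})")
```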

Strength of Available Evidence

We assessed the overall strength of the body of evidence for each key question using the methods described in the Methods Reference Guide for Effectiveness and Comparative Effectiveness Reviews used by the Evidence-based Practice Centers.65 The purpose of grading the strength of the whole body of evidence is to provide information beyond study design hierarchy and internal validity assessment, extending to other factors that are important to the application of information for clinical practice and policy. Parameters thought important to policymakers include quantity of evidence, assessment of risk of bias, precision, directness, and consistency.

Risk of bias was assessed in two ways: first, by describing the study designs used and assuming a hierarchy of designs in terms of risk of bias (RCTs inherently having lower risk of bias than do observational study designs), and second, by reporting the cumulative internal quality rating of the included studies. To have low risk of bias, the studies would have to be RCTs and also have good internal quality ratings overall. Bodies of evidence based on observational studies with poor internal quality ratings would be determined to have a high risk of bias. Moderate risk of bias covers the many combinations of factors that fall between these two extremes.

Consistency was evaluated by determining if the majority of study results were trending in a similar direction, such that their point estimates and confidence intervals (CI) may vary, but the overall conclusions are similar. The directness of the evidence was assessed by whether there was a direct link between the interventions studied and the outcomes of interest; for example, if mortality due to uterine rupture is the outcome of interest, did the studies evaluate these in the same study, or were separate bodies of evidence required to answer the question? Precision refers to how sure one can be of the point estimate of effect and was assessed by examining the narrowness of CIs of studies or the CI of the point estimate resulting from pooled analysis.

The body of evidence was graded for the evidence surrounding the most important outcomes in the report. A table was created presenting the ratings for each of these domains for the following maternal outcomes: VBAC rate, IOL rate (with subcategories of VBAC rate, uterine rupture rate, and other harms, stratified by intervention), maternal mortality, rate of uterine rupture, hysterectomy, transfusion, hemorrhage and blood loss, effect of IOL on hemorrhage, infection, and long-term sequelae (adhesions, pelvic pain, and reproductive health). The following infant outcomes are also captured in the table: perinatal, fetal, and neonatal mortality; transient tachypnea of the newborn (TTN); respiratory morbidity with bag-and-mask ventilation; respiratory morbidity with intubation for meconium; hypoxic-ischemic encephalopathy/asphyxia; neonatal intensive care unit (NICU) admissions; neurologic sequelae (short- and long-term); sepsis; trauma; and breastfeeding (Appendix J).

From the assessments of the domains described above, an overall grade of the strength of the body of evidence was determined (high, moderate, low, or insufficient). A high strength of evidence reflects a high degree of confidence that the body of evidence presents the true effect and suggests that additional studies and future research would have a low likelihood of changing the estimate. A moderate strength of evidence suggests that the confidence in the body of evidence is moderate and that additional studies may change the estimate. A low strength of evidence suggests that the confidence that the body of evidence reflects the true estimate is low and that new studies are likely to change the estimate. An insufficient strength of evidence suggests that there is either no evidence or that the body of evidence does not permit estimating the true effect.

Data Synthesis

In addition to discussion of the findings of the studies overall, meta-analyses were conducted to summarize data and obtain more precise estimates on main outcomes for which studies were homogeneous enough to provide a meaningful combined estimate. Otherwise, the data are summarized qualitatively.

For common events, e.g., TOL and VBAC, where normal approximation applies, estimates of rates and their standard errors were calculated from each study and directly combined. A random effects model68 was used to combine the studies while incorporating variations among studies. Statistical heterogeneity was assessed by using the standard Q-test and the I² statistic (the proportion of variation in study estimates due to heterogeneity rather than sampling error).69 Based on the Cochrane handbook, a rough guide to interpret I² is as follows:

  • 0 to 40 percent: might not be important;
  • 30 to 60 percent: may represent moderate heterogeneity;
  • 50 to 90 percent: may represent substantial heterogeneity;
  • 75 to 100 percent: considerable heterogeneity.

Furthermore, the importance of the observed value of I² depends on (i) magnitude and direction of effects and (ii) strength of evidence for heterogeneity (e.g., P value from the chi-squared test).69 The proportion of women in the induced groups who achieved VBAC or who had a uterine rupture was combined using MetaAnalyst (Beta 3.13; Tufts Medical Center).70 For the other outcomes, the rates were combined using STATA 10.1® (StataCorp, College Station, Texas, 2009).
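The pooling and heterogeneity statistics described above can be sketched in a few lines. This example assumes the DerSimonian-Laird moment estimator for the between-study variance, which this section does not name explicitly, and it is not a reproduction of the MetaAnalyst or STATA analyses. The function names and study counts are hypothetical, and the P value for the Q-test (a chi-squared test on k − 1 degrees of freedom) is omitted.

```python
import math


def proportion_with_se(events: int, n: int):
    """Rate and its standard error under the normal approximation."""
    p = events / n
    return p, math.sqrt(p * (1 - p) / n)


def random_effects_pool(estimates, std_errors):
    """Random effects pooling (DerSimonian-Laird, assumed) with Q and I2."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching inputs")
    w = [1 / se ** 2 for se in std_errors]  # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # percent
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return {"pooled": pooled, "se": math.sqrt(1 / sum(w_re)),
            "Q": q, "I2": i2, "tau2": tau2}


# Hypothetical VBAC counts (events, women attempting TOL) from three studies.
studies = [(700, 1000), (800, 1000), (1500, 2000)]
rates, ses = zip(*(proportion_with_se(e, n) for e, n in studies))
result = random_effects_pool(rates, ses)
print({name: round(value, 4) for name, value in result.items()})
```

When Q falls below its degrees of freedom, both the between-study variance and I² are truncated at zero, and the result coincides with a fixed effect analysis.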

For rare or relatively rare events (e.g., uterine ruptures, maternal deaths, and infant deaths), normal approximation does not apply well to estimates of rates directly, and we used two slightly different methods to combine them. When no study reported zero events in a group, we first logit-transformed the rates before combining the studies, as the distribution of the logits of rates was usually approximately normal. The studies were then combined using a random effects model,68 and the combined rates were obtained by transforming the combined logit-rates back to the original scale. Statistical heterogeneity (Q-test and I² statistic) was assessed based on the logits of rates for these outcomes. These analyses were performed by using STATA 10.1® (StataCorp, College Station, Texas, 2009). When some studies reported zero events, a logistic random effects model71, 72 was used so that studies without events could be included. This model also applies the logit transformation to the rates to achieve better statistical properties. In this case, statistical heterogeneity was assessed using Fisher’s exact test, and analyses were performed using the NLMIXED procedure in SAS v9.2 (SAS Institute Inc., Cary, NC, USA).
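The logit transformation and back-transformation for rare events can be illustrated as follows. To keep the sketch short, a simple inverse-variance weighted average stands in for the random effects step, so this is a simplification rather than the model used in the report. The logistic random effects model for zero-event studies is not reproduced. Function names and counts are hypothetical.

```python
import math


def logit_rate(events: int, n: int):
    """Logit of a rate and its approximate standard error."""
    if events <= 0 or events >= n:
        # The logit is undefined at 0 or 1; such studies need another model.
        raise ValueError("logit transform requires 0 < events < n")
    p = events / n
    return math.log(p / (1 - p)), math.sqrt(1 / events + 1 / (n - events))


def inverse_logit(x: float) -> float:
    """Transform a logit back to the original rate scale."""
    return 1 / (1 + math.exp(-x))


def pooled_rate(studies):
    """Pool rates on the logit scale, then back-transform.

    Uses inverse-variance (fixed effect) weights as a stand-in for
    the random effects weights described in the text.
    """
    transformed = [logit_rate(events, n) for events, n in studies]
    weights = [1 / se ** 2 for _, se in transformed]
    pooled_logit = sum(w * y for w, (y, _) in zip(weights, transformed)) / sum(weights)
    return inverse_logit(pooled_logit)


# Hypothetical rare-event counts (events, women) from three studies.
studies = [(5, 1000), (12, 3000), (20, 2500)]
print(f"pooled rate per 1,000: {pooled_rate(studies) * 1000:.2f}")
```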

Risk ratio and/or risk difference were used to compare various rates between TOL and ERCD groups. Again the studies were combined by using a random effects model68 and statistical heterogeneity was assessed using Q-test and I² statistic.
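For a single study, the two comparison measures can be computed as in the sketch below. It uses the usual large-sample standard errors and a 95 percent interval based on 1.96; these choices, the function names, and the counts are illustrative assumptions, since the text does not give the formulas used.

```python
import math


def risk_ratio(events_1: int, n_1: int, events_2: int, n_2: int):
    """Risk ratio of group 1 versus group 2 with a 95 percent CI."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    # Standard error of log(RR) from the large-sample approximation.
    se_log = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    low = math.exp(math.log(rr) - 1.96 * se_log)
    high = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, low, high


def risk_difference(events_1: int, n_1: int, events_2: int, n_2: int):
    """Risk difference (group 1 minus group 2) with a 95 percent CI."""
    p_1, p_2 = events_1 / n_1, events_2 / n_2
    rd = p_1 - p_2
    se = math.sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    return rd, rd - 1.96 * se, rd + 1.96 * se


# Hypothetical counts: 10 events in 1,000 TOL versus 5 in 1,000 ERCD.
print("risk ratio:", risk_ratio(10, 1000, 5, 1000))
print("risk difference:", risk_difference(10, 1000, 5, 1000))
```

The log risk ratio and the risk difference, with their standard errors, are the per-study quantities that a random effects model would then combine.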

Forest plots were presented to graphically summarize the study results and the pooled results.73 To explore heterogeneity, we performed subgroup analyses and meta-regression74, 75 to evaluate whether the summary estimates differ by study level characteristics.

Size of Literature

Of the 3,134 citations reviewed from the searches, 2,171 met exclusion criteria at the abstract level and were not reviewed further. After the abstract review process, 963 full text papers were retrieved and reviewed for inclusion. An additional 37 full text papers were retrieved during the peer review process. After applying inclusion/exclusion criteria, a total of 203 full text papers met inclusion. Investigators quality rated included studies, and those rated good or fair quality are discussed in this report. As mentioned previously, poor quality studies are not discussed unless no studies of better quality were available for a given topic. For the topics presented, 71 studies provided data on TOL and VBAC rate, 27 on IOL or augmentation, 28 on predictors of TOL and VBAC, 14 on scored models for predicting VBAC, 41 on maternal outcomes, 28 on uterine rupture, 11 on infant outcomes, 19 on abnormal placentation, seven on obesity, 12 on multiple cesarean deliveries, and seven on direction of cesarean delivery scar (Figure 5).

Figure 5. Search and selection of literature.

* Databases searched include MEDLINE, Cochrane, and DARE (see Appendix C for search strategies). See Appendix E for details on inclusion and exclusion criteria.


Appendixes and evidence tables cited in this report are available at http://www​​/pub/evidence​/pdf/vbacup/vbacup.pdf.
