Figure 1: Diagrammatic representation of causal pathway
Myers ER, Blumrick R, Christian AL, et al. Management of prolonged pregnancy. Evidence Report/Technology Assessment No. 53 (Prepared by Duke Evidence-based Practice Center, Durham, NC, under Contract No. 290-97-0014). AHRQ Publication No. 02-E018. Rockville, MD: Agency for Healthcare Research and Quality. May 2002.
The Agency for Healthcare Research and Quality (AHRQ, formerly the Agency for Health Care Policy and Research, AHCPR), through its Evidence-based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.
To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.
AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.
We welcome written comments on this evidence report. They may be sent to: Director, Center for Practice and Technology Assessment, Agency for Healthcare Research and Quality, 6010 Executive Blvd., Suite 300, Rockville, MD 20852.
| Carolyn M. Clancy, M.D. | Robert Graham, M.D. |
| Acting Director, | Director, |
| Agency for Healthcare Research | Center for Practice |
| and Quality | and Technology Assessment |
The authors of this report are responsible for its content. Statements in the report should not be construed as endorsement by the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services of a particular drug, device, test, treatment, or other clinical service.
Approximately 18 percent of pregnancies in the United States extend beyond 41 weeks gestation, 7 percent beyond 42 weeks. Risks of adverse perinatal and maternal outcomes increase with increasing gestational age beyond term. This report assesses the literature on the benefits, risks, and costs of different strategies for managing prolonged pregnancy in order to avoid adverse perinatal and maternal outcomes.
Published literature on the management of prolonged pregnancy was identified in MEDLINE, CINAHL, EMBASE, HealthSTAR, the Cochrane Database of Systematic Reviews, and the Database of Abstracts of Reviews of Effectiveness for the years 1980 through 2001. Search terms included the MeSH heading "pregnancy, prolonged" and the text-word query "post$ pregnan$.tw".
Study designs considered included randomized controlled trials, cohort studies, and large (n > 20) case series with or without controls. Studies were included if the study population included women with prolonged pregnancy and data were provided that were relevant to one or more of the key research questions. Studies were excluded from formal abstraction if they did not report on original research, the patient population did not include women with prolonged pregnancy, the study design was a single case report or small case series, or a 2-by-2 table could not be constructed (for studies of test characteristics).
Paired reviewers independently screened each abstract and article and performed the data abstraction. Included studies were graded for internal and external validity. Supplemental data were collected from the Nationwide Inpatient Sample.
Although there is no direct evidence that antepartum testing reduces perinatal mortality in prolonged gestation, retrospective data suggest that morbidity may be reduced. Selection of appropriate outcomes for evaluating antepartum testing is difficult since mortality and morbidity are rare, and commonly used surrogate markers have substantial weaknesses. All currently used tests and combinations of tests have better specificity than sensitivity but good negative predictive values. There are no definitive data supporting the superiority of any particular testing method.
Most studies of interventions for the induction of labor do not report results specifically for women induced because of prolonged pregnancy or its complications. In general, agents that result in more efficient induction of labor also have higher rates of fetal heart rate pattern changes associated with frequent uterine contractions.
Pooled analysis of randomized trials of planned induction versus expectant management with antepartum testing suggests that planned induction reduces the risk of perinatal death with no increase in other perinatal or maternal morbidity, including cesarean section. At least 500 inductions are needed to prevent one perinatal death.
There are virtually no data on patient values and preferences for management options. There also are no published data on potential differences in epidemiology or outcomes of prolonged pregnancy in racial, ethnic, or socioeconomic subgroups and no data allowing comparison of the cost-effectiveness of different strategies for managing prolonged pregnancy.
Induction of labor at 41 weeks or beyond results in fewer perinatal deaths compared with antepartum testing, but at least 500 inductions are necessary to prevent one death. There is insufficient evidence to recommend any specific induction agent in this setting. Additional high-quality research is needed.
The estimated date of confinement, or due date, for normal pregnancies is calculated as 38 weeks after conception, or 40 weeks after the first day of the last normal menstrual period (assuming a "normal" 28-day menstrual cycle). Prolonged pregnancy has traditionally been defined as a pregnancy that extends 2 weeks or more beyond the estimated date of confinement, that is, to 42 weeks or beyond. Approximately 18 percent of pregnancies in the United States extend beyond 41 weeks, and 7 percent extend beyond 42 weeks.
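The date arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the 40-week and 42-week conventions stated in the text; the function names and the example date are hypothetical and do not come from the report, and clinical dating in practice also draws on ultrasound and other information.

```python
from datetime import date, timedelta


def estimated_date_of_confinement(lmp: date) -> date:
    """Due date: 40 weeks (280 days) after the first day of the last menstrual period."""
    return lmp + timedelta(weeks=40)


def prolonged_pregnancy_threshold(lmp: date) -> date:
    """Traditional threshold for prolonged pregnancy: 42 weeks (294 days) after the LMP."""
    return lmp + timedelta(weeks=42)


def gestational_age(lmp: date, on: date) -> tuple[int, int]:
    """Gestational age on a given date as (completed weeks, additional days)."""
    return divmod((on - lmp).days, 7)


# Hypothetical example date, chosen only to exercise the functions.
example_lmp = date(2001, 1, 1)
example_edc = estimated_date_of_confinement(example_lmp)
example_threshold = prolonged_pregnancy_threshold(example_lmp)
```

The two thresholds differ by exactly 14 days, which matches the "2 weeks or more beyond" definition given in the text.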
It has long been known that pregnancies extending many weeks beyond the average length are at increased risk for adverse outcomes, both because certain fetal anomalies, such as anencephaly, are associated with prolonged pregnancy, and also because of an increased incidence of stillbirth among otherwise normal infants. The increasing availability of ultrasound has significantly improved the accuracy of pregnancy dating and detection of fetal anomalies, so that extremely long gestations are rare. However, adverse outcomes continue to be associated with prolonged gestation.
In some cases, these risks appear to be due to uteroplacental insufficiency, resulting in eventual fetal hypoxia. Data from large registries show that the risk of perinatal death, especially of antepartum stillbirth, increases with advancing gestational age. If risk is calculated based on the number of ongoing pregnancies, gestational-age-specific stillbirth risk reaches a nadir at 37-38 weeks and then begins to increase slowly. Risks increase substantially after 41 weeks; however, the absolute risk is still low (between 1 and 2 per 1,000 ongoing pregnancies between 41 and 43 weeks).
Other adverse outcomes associated with uteroplacental insufficiency include meconium aspiration, growth restriction, and intrapartum asphyxia. In other cases, continued growth of the fetus leads to macrosomia, increasing the risk of labor abnormalities, shoulder dystocia, and brachial plexus injuries. Potential maternal risks associated with prolonged gestation, besides the obvious emotional trauma accompanying an unexpected fetal death or serious complication, include potential increased risk of injury to the pelvic floor associated with difficult deliveries of macrosomic infants. Interventions intended to prevent adverse perinatal outcomes, such as induction of labor and cesarean section, may themselves carry iatrogenic risks, such as increased rates of infection, hemorrhage, or other complications.
Several strategies currently are used in practice to prevent adverse outcomes associated with advancing gestation. Testing methods developed for reducing perinatal morbidity and mortality in women with high-risk pregnancies because of diabetes, hypertension, or other complications of pregnancy have been applied to women with pregnancies extending beyond 40 weeks. Another strategy, induction of labor at a predefined gestational age, has been proposed and evaluated as a method of reducing perinatal mortality and other adverse outcomes associated with prolonged gestation. However, because the point at which the risk of adverse outcomes outweighs the risks and costs of active interventions is uncertain, controversy remains about the optimal timing and methods for managing increased risks to both fetus and mother associated with prolonged gestation.
Investigators at the Duke University Evidence-based Practice Center reviewed the evidence concerning the benefits, risks, and costs of commonly used tests, induction agents, and strategies for reducing the risks associated with prolonged gestation. Because of the inherent uncertainty in estimates of gestational age, variability in the length of otherwise uncomplicated pregnancies, and the lack of clear consensus on when risks of adverse outcomes outweigh risks of intervention, the researchers did not restrict the review to interventions performed only after a specified gestational age.
This summary and evidence report were prepared based on the Duke EPC review. The primary target audiences for the summary and evidence report are groups involved in writing guidelines or educational documents on management of prolonged pregnancy for health care professionals. Secondary audiences include health care professionals providing care for pregnant women (obstetricians, family physicians, nurse-midwives, nurses, childbirth educators, etc.); policymakers involved in payment decisions; agencies involved in funding basic, clinical, and health services research; media involved in dissemination and education about health issues; and patients with an interest in reviewing the medical literature concerning management of prolonged pregnancy.
Four key research questions were addressed:
What are the test characteristics (reliability, sensitivity, specificity, predictive values) and costs of measures used in the management of prolonged pregnancy (a) to assess risks to the fetus and mother of prolonged pregnancy and (b) to assess the likelihood of a successful induction of labor?
What is the direct evidence comparing the benefits, risks, and costs of planned induction versus expectant management at various gestational ages?
What are the benefits, risks, and costs of currently available interventions for the induction of labor?
Are the epidemiology and outcomes of prolonged pregnancy different for women in different ethnic groups, socioeconomic groups, or age groups (i.e., adolescents)?
The following interventions were considered:
Tests to determine risk of stillbirth or compromise related to prolonged gestation, including:
Maternal measurement of fetal movement.
Nonstress test (NST).
Contraction stress test (CST), using either nipple stimulation or oxytocin.
Amniotic fluid measurements: biophysical profile, using either five measures (reactive NST, breathing, tone, movement, amniotic fluid), or two measures (NST, amniotic fluid).
Doppler measurements of umbilical or fetal cerebral blood flow.
Tests to determine the risk of macrosomia, including estimation of fetal weight (maternal judgment, clinical examination, ultrasound).
Tests to estimate likely success of induction of labor, including:
Clinical estimation of cervical ripeness (Bishop score).
Fibronectin.
No intervention (either induction or testing).
Interventions to prevent prolonged pregnancy (scheduled sweeping of membranes).
Planned induction (either 41 weeks, 42 weeks, or later).
Testing for fetal well-being (using tests described above):
Varied time of initiation (40, 41, 42 weeks).
Varied frequency.
Interventions for the induction of labor, including:
Amniotomy
Castor oil
Extra-amniotic saline instillation
Relaxin
Sweeping of the membranes
Foley catheter
Nipple stimulation
Oxytocin
Prostaglandins (prostaglandin E2 gel, tablets, and inserts; misoprostol)
Mifepristone
The researchers did not attempt to systematically review the basic and clinical research on the physiology of normal parturition, the role of routine ultrasound in early pregnancy, or interventions performed during labor and delivery to reduce the risks of adverse outcomes of conditions associated with, but not unique to, prolonged pregnancy (such as oligohydramnios or meconium-stained amniotic fluid).
The primary patient population considered in the review was pregnant women with a single fetus in the vertex position, approaching or past the estimated date of confinement, without any other medical or obstetrical complications (including prior cesarean section), where the only potential factor increasing the risk of an adverse perinatal or maternal outcome was advancing gestational age. The researchers also examined the potential interaction of this risk with age and race/ethnicity. The principal practice settings considered were hospitals, freestanding birthing centers, patients' homes, and prenatal clinics or other facilities where ambulatory prenatal care is delivered.
Outcomes considered varied depending on the study and the question being addressed, but the researchers focused primarily on clinically relevant outcomes. Data recorded included anatomic outcomes (changes in cervical dilation or Bishop score); perinatal and maternal mortality; surrogate markers of fetal compromise (nonreassuring changes in fetal heart rate patterns, meconium); mode of delivery (cesarean, vaginal, operative vaginal); other interventions (need for labor augmentation, need for labor induction); adverse outcomes (complications of vaginal and cesarean delivery, complications of interventions); and use of resources (time to delivery, length of stay, medication, and labor costs).
The primary sources of literature were the following databases (with search years shown in parentheses): MEDLINE (1980-December 2000), HealthSTAR (1980-December 2000), CINAHL (1983-December 2000), Cochrane Database of Systematic Reviews (CDSR) (Issue 4, 2000; Issue 1, 2001; and Issue 2, 2001), Database of Abstracts of Reviews of Effectiveness (DARE), and EMBASE (1980-January 2000). Searches of these databases were supplemented by secondary searches of reference lists in all included articles, especially Cochrane review articles, scanning of current issues of journals not yet indexed in the computerized bibliographic databases, and suggestions from an advisory panel.
The initial searches were performed in MEDLINE and then duplicated in other databases. All searches were limited to English-language articles published since 1980 involving human subjects. The cut-off threshold of 1980 was based on the lack of general availability of ultrasound prior to that date. It was judged that trials conducted and published prior to 1980 would be problematic both in terms of the accuracy of diagnosis and comparability with current testing and management strategies. Primary search terms used in all searches included the MeSH heading "pregnancy, prolonged/" and the text-word query "post$ pregnan$.tw."
The searches yielded 701 English-language articles. Abstracts from these articles were reviewed against the inclusion/exclusion criteria by six physician investigators, with assistance from one senior medical student. A team of two investigators reviewed each abstract; when no abstract was available, the title, source, and MeSH words were reviewed. At this stage, articles were included if requested by one member of the team. At the full-text screening stage, two investigators independently reviewed each article, and disagreements were resolved through discussion.
Each screened article was coded according to three topic areas: (a) testing: two or more tests were compared in terms of accuracy or agreement of test results, or the test result was correlated with some health outcome; (b) management: the article addressed the relative effectiveness of planned induction versus expectant management or the relative effectiveness of an induction agent; and (c) testing and management: some combination of the above.
Included study designs were determined by the article's topic area. Study designs for articles on testing or testing and management included randomized controlled trials, cohort studies, and large case series (at least 20 subjects). The only study design included for management articles was the randomized controlled trial.
Studies of these types were included if they met the following criteria:
Study population included women with prolonged pregnancy.
Study provided data relevant to at least one of the four key questions described above.
Study reported health outcomes, use of health services, or economic outcomes related to the management of prolonged pregnancy.
Exclusion criteria included:
Article was not original research.
Article did not address prolonged pregnancy.
Study design was a single case report.
Study design was a small case series with fewer than 20 subjects.
Article evaluated testing, but data provided were insufficient to construct 2-by-2 tables of test sensitivity and specificity.
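The 2-by-2 table requirement in the last exclusion criterion can be made concrete with a small sketch. Assuming a study reports how many test-positive and test-negative pregnancies did and did not have the adverse outcome, the four cell counts determine sensitivity, specificity, and both predictive values. The class name and the counts below are hypothetical illustrations, not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2-by-2 table of test result versus outcome."""
    tp: int  # test positive, adverse outcome present
    fp: int  # test positive, adverse outcome absent
    fn: int  # test negative, adverse outcome present
    tn: int  # test negative, adverse outcome absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def positive_predictive_value(self) -> float:
        return self.tp / (self.tp + self.fp)

    def negative_predictive_value(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, invented for illustration only.
table = TwoByTwo(tp=8, fp=30, fn=12, tn=950)
```

If any one of the four cells cannot be recovered from a published article, none of these measures can be checked, which is why such articles were excluded from abstraction.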
Teams of two investigators performed the data abstraction for eligible articles identified at the full-text screening stage. For each included article, one physician completed the data abstraction form, and the other served as an "over-reader." The information from the data abstraction form -- including details on study characteristics, patient population, outcomes, and quality measures -- was then summarized into evidence tables. Data abstraction assignments were made based on clinical and research interests and expertise.
Using criteria developed for prior evidence reports, the researchers evaluated each article for the presence or absence of factors influencing internal and external validity. These criteria were:
For management articles: Randomized allocation to treatment and appropriate methods of randomization; adequate description of the patient population to allow comparison with the intended patient population, including descriptions in terms of gestational age, criteria used to assign gestational age, and measurement of baseline cervical ripeness; description of criteria used to make management decisions associated with primary outcomes such as cesarean delivery; and recognition and discussion of important statistical issues such as sample size and use of appropriate tests.
For testing articles: The above criteria, plus description of an implicit or explicit reference standard, discussion of issues of verification bias, measurement of test reliability, and adequate description of the testing protocol.
The researchers also examined discharge data from the Healthcare Cost and Utilization Project (HCUP) Nationwide Inpatient Sample maintained by the Agency for Healthcare Research and Quality. This database contains administrative discharge data from over 1,000 hospitals in 22 States (at the time of the review), representing a stratified sample of 20 percent of U.S. hospitals. The researchers used these data to provide supplemental information on differences in the epidemiology and outcomes of prolonged pregnancy between ethnic and socioeconomic groups. Using ICD-9 codes, they divided all deliveries into "preterm" (644.2x), "prolonged" (645.x), and "term" (all other delivery codes). Within these categories, the researchers examined differences in outcomes between coded ethnic groups (white, black, Hispanic, Asian/Pacific Islander, American Indian, and other) and by insurance status (Medicare, Medicaid, private/health maintenance organization, self-pay/no insurance, "no charge," and "other").
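The ICD-9 grouping described above can be sketched as a simple classifier. This is a minimal illustration, not the researchers' actual procedure: the function name is hypothetical, the input is assumed to be the list of diagnosis codes on a discharge record already identified as a delivery, accepting codes with or without the decimal point is an assumption, and giving "preterm" precedence when both code families appear is also an assumption that the report does not address.

```python
def classify_delivery(diagnosis_codes: list[str]) -> str:
    """Assign a delivery record to 'preterm', 'prolonged', or 'term'.

    Assumes the record is already known to be a delivery. Codes may be
    written with or without the decimal point (e.g. '644.21' or '64421').
    """
    normalized = [code.replace(".", "").strip().upper() for code in diagnosis_codes]
    # Assumed precedence: a 644.2x code takes priority over a 645.x code.
    if any(code.startswith("6442") for code in normalized):
        return "preterm"
    if any(code.startswith("645") for code in normalized):
        return "prolonged"
    return "term"


# Hypothetical records for illustration.
example_groups = [
    classify_delivery(["644.21", "V27.0"]),
    classify_delivery(["64511", "V270"]),
    classify_delivery(["650", "V27.0"]),
]
```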
The principal findings of the report are summarized here.
The risk of antepartum stillbirth increases with increasing gestational age. Data from several large studies in the United Kingdom show that, when calculated as deaths per 1,000 ongoing pregnancies, antepartum stillbirth rates begin increasing after 40 weeks, with estimates of 0.86-1.08/1,000 between 40 and 41 weeks, 1.2-1.27/1,000 between 41 and 42 weeks, 1.3-1.9/1,000 between 42 and 43 weeks, and 1.58-6.3/1,000 after 43 weeks. Gestational-age-specific morbidity risks using the same methodology were not available.
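The choice of denominator matters for these estimates. The sketch below contrasts the "per 1,000 ongoing pregnancies" calculation with the more familiar "per 1,000 deliveries in that week" calculation. All counts are hypothetical and were invented for illustration; they are not taken from the United Kingdom studies, and the function names are likewise assumptions.

```python
def stillbirths_per_1000_ongoing(births_by_week: dict[int, int],
                                 stillbirths_by_week: dict[int, int]) -> dict[int, float]:
    """Stillbirth rate per 1,000 pregnancies still undelivered at the start of each week."""
    ongoing = sum(births_by_week.values())  # all pregnancies undelivered at the first week
    rates = {}
    for week in sorted(births_by_week):
        rates[week] = 1000 * stillbirths_by_week.get(week, 0) / ongoing
        ongoing -= births_by_week[week]  # pregnancies delivered this week leave the denominator
    return rates


def stillbirths_per_1000_deliveries(births_by_week: dict[int, int],
                                    stillbirths_by_week: dict[int, int]) -> dict[int, float]:
    """Stillbirth rate per 1,000 deliveries occurring in each week."""
    return {week: 1000 * stillbirths_by_week.get(week, 0) / births
            for week, births in births_by_week.items()}


# Hypothetical counts of all births (live and still) and of stillbirths by week.
births = {40: 50_000, 41: 30_000, 42: 15_000, 43: 5_000}
stillbirths = {40: 90, 41: 60, 42: 30, 43: 15}

per_ongoing = stillbirths_per_1000_ongoing(births, stillbirths)
per_delivery = stillbirths_per_1000_deliveries(births, stillbirths)
```

With the ongoing-pregnancy denominator, every fetus still in utero is counted as being at risk in a given week, which is the population a decision about continuing the pregnancy actually concerns.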
There is no direct, unbiased evidence that antepartum testing reduces perinatal morbidity and mortality in prolonged gestation. Retrospective data suggest higher risks of morbidity in women who did not receive testing, but it is unclear whether other factors contributed to these excess risks.
As the sensitivity of antepartum testing for predicting surrogate markers of fetal compromise increases, specificity decreases. Testing strategies involving a combination of fetal heart rate monitoring and ultrasonographic measurement of amniotic fluid volume appear to have the highest levels of sensitivity. However, methodological issues and variability in specific tests and testing strategies prohibit definitive conclusions about which test or combination of tests has the best performance.
Qualitatively, studies of antepartum testing show a consistent trend: test sensitivity is worse than test specificity, yet negative predictive values are greater than positive predictive values. This suggests that the high negative predictive values observed reflect the overall low risk of adverse outcomes. Unless test sensitivity increases with increasing gestational age (for which the researchers found no evidence), the negative predictive value will decline as gestational age advances, since the risk of adverse outcomes increases with advancing gestational age. Declining negative predictive values mean higher rates of false-negative antepartum tests and potentially higher rates of perinatal complications.
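The dependence of predictive values on underlying risk follows directly from Bayes' rule and can be checked numerically. In the sketch below, the sensitivity, specificity, and risk levels are hypothetical values chosen only to show the direction of the effect; they are not estimates from the reviewed studies.

```python
def negative_predictive_value(sensitivity: float, specificity: float, risk: float) -> float:
    """Probability that the outcome is absent given a negative test."""
    true_negatives = specificity * (1 - risk)
    false_negatives = (1 - sensitivity) * risk
    return true_negatives / (true_negatives + false_negatives)


def positive_predictive_value(sensitivity: float, specificity: float, risk: float) -> float:
    """Probability that the outcome is present given a positive test."""
    true_positives = sensitivity * risk
    false_positives = (1 - specificity) * (1 - risk)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, held fixed while the underlying risk rises.
SENSITIVITY = 0.40
SPECIFICITY = 0.95
risks = [0.001, 0.002, 0.006]  # hypothetical risks of the adverse outcome

npv_by_risk = [negative_predictive_value(SENSITIVITY, SPECIFICITY, r) for r in risks]
ppv_by_risk = [positive_predictive_value(SENSITIVITY, SPECIFICITY, r) for r in risks]
```

With sensitivity and specificity held constant, the negative predictive value falls and the positive predictive value rises as the underlying risk increases, which is the pattern the paragraph above describes.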
Although the risk of antepartum stillbirth increases with increasing gestational age, there is no evidence that allows determination of the optimal time to initiate antepartum testing. Specifically, there is no evidence that testing prior to 41 weeks in otherwise uncomplicated pregnancies improves outcomes for either mother or infant.
Both ultrasound and clinical assessment are reasonably sensitive in predicting birthweights greater than 4,000 grams in prolonged pregnancy, but they perform less well at predicting the more clinically relevant weight of greater than 4,500 grams. Evidence from one randomized trial shows that induction of labor based on estimated fetal weight does not improve outcomes for either infant or mother. There also is no evidence that an antepartum diagnosis of birthweight greater than 4,000 grams improves outcomes.
Clinical examination of the cervix may help predict successful induction. However, individual components of the examination exhibit substantial inter- and intraobserver variability.
Published data do not allow estimation of the cost-effectiveness of tests of fetal well-being.
Although not statistically significant in most individual trials, there is a consistent finding that perinatal mortality rates are lower with planned induction at 41 weeks or later compared with expectant management, a finding confirmed by formal meta-analysis. Based on the observed absolute risk difference in the meta-analysis, at least 500 inductions are necessary to prevent one perinatal death. Whether this is an acceptable trade-off at either the policy or individual level is unclear.
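The "inductions needed to prevent one death" figure is the reciprocal of the absolute risk difference. The report states only the resulting figure of at least 500; the two risks used below are hypothetical values chosen so that the arithmetic reproduces that figure, and the function name is an assumption.

```python
import math
from fractions import Fraction


def number_needed_to_treat(risk_control: Fraction, risk_treated: Fraction) -> int:
    """Number of interventions needed to prevent one event (1 / absolute risk difference)."""
    absolute_risk_difference = risk_control - risk_treated
    if absolute_risk_difference <= 0:
        raise ValueError("the intervention does not reduce risk; NNT is undefined")
    return math.ceil(1 / absolute_risk_difference)


# Hypothetical perinatal death risks, NOT taken from the meta-analysis.
risk_expectant_management = Fraction(3, 1000)
risk_planned_induction = Fraction(1, 1000)

inductions_per_death_prevented = number_needed_to_treat(
    risk_expectant_management, risk_planned_induction
)
```

Because the absolute risk difference is so small, modest changes in either risk estimate move this number by hundreds, which is one reason the acceptability of the trade-off remains uncertain.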
Other perinatal outcomes did not appear to differ significantly between induction and expectant management groups.
Maternal outcomes did not differ between women managed with antepartum monitoring or with planned induction in the included studies. Specifically, overall rates of cesarean section did not differ, either globally or in subgroup analysis. Subgroup analysis of one large trial suggested this was due to very high rates of cesarean section in women managed with antepartum testing who were induced because of abnormal antepartum testing, reaching a predefined induction date, or other indications.
Only one large trial reported costs. Based on 1992 costs and care provided, the study found that planned induction at 41 weeks was less expensive than expectant management with antepartum testing. However, because of significant changes in the technologies used and the economics of medicine in the interim, additional research is needed to better understand the cost implications of these two strategies.
There is a remarkable lack of data on patient-oriented outcomes, such as quality of life or measures of patient preferences for different outcomes or for different processes to achieve those outcomes.
Castor oil given at term appears to be effective in promoting labor, with a consistent side effect of maternal nausea; whether other outcomes of interest are affected is unclear. Conclusions about safety cannot be drawn.
Manual nipple stimulation at term may promote labor, but effectiveness may depend on the protocol used and patient adherence to the protocol. Currently available data are insufficient to draw conclusions about either effectiveness or safety.
Data on the safety and effectiveness of electrical breast stimulation as a method for inducing labor in prolonged gestation are inconclusive because of small sample size and a low proportion of subjects induced for an indication of prolonged pregnancy.
Data on the safety and effectiveness of relaxin are limited, and no conclusions can be drawn.
Sweeping of the membranes at or near term is effective in promoting labor and reducing the incidence of induction for prolonged gestation. There is no increase in adverse maternal outcomes.
In general, there is a tradeoff between the effectiveness of induction agents in terms of achieving delivery and shortening the time to delivery, on the one hand, and risks of uterine tachysystole, hyperstimulation, and potential fetal compromise on the other. In increasing order of effectiveness, slow-dose oxytocin is followed by fast-dose oxytocin; PGE2 appears more effective than oxytocin; and misoprostol is more effective than PGE2. The heterogeneity of the patient populations in the published literature prohibits conclusions about the benefits and risks of these agents when used in the induction of labor in prolonged pregnancy, either for women induced electively or for women with abnormal fetal surveillance. All studies were underpowered to detect differences in many important outcomes related to safety of induction agents.
Mifepristone (RU-486) is consistently effective in reducing the time to labor and the time to delivery in women after 41 weeks. However, all three published trials reported nonsignificant trends toward higher rates of intermediate markers of fetal compromise, including abnormal fetal heart rate tracings and low Apgar scores.
Data on costs associated with the use of different methods for induction are insufficient to allow conclusions about cost-effectiveness.
The current published literature on the epidemiology and management of prolonged pregnancy does not provide information on the potential effects of race and ethnicity, socioeconomic status, or age on the incidence and outcomes of prolonged pregnancy.
Based on administrative data, the proportion of deliveries occurring after 42 weeks does not appear to differ between ethnic groups, despite clear differences in the proportions delivering at earlier gestations.
Based on administrative data, black women with prolonged pregnancy are more likely to have low birthweight infants than white or Hispanic women. Black women also are more likely to have diagnoses of intrauterine growth restriction and oligohydramnios during prolonged pregnancies.
Based on administrative data, women with prolonged pregnancies who are on Medicaid or have no insurance are more likely to have growth restriction and oligohydramnios compared with women who have private insurance.
Future research on the management of prolonged pregnancy should include the following:
Biomedical research into the mechanisms controlling the initiation of normal labor, the interaction of uterine contractile forces and the pelvic floor, and other factors involved in the process of labor and vaginal delivery is needed.
Estimates of the risk of perinatal morbidity and mortality in the United States need to be generated from a variety of complementary data sources. Ideally, an estimate of these risks by gestational age and in women without intervention can be generated and will inform future individual and policy decisionmaking.
Research is needed into the most effective and efficient ways of determining gestational age during prenatal care.
Surrogate markers for fetal compromise need to be identified that are less susceptible to bias and observer variability and more clinically relevant than current markers.
Study designs for evaluating fetal testing need to minimize the effects of verification bias and avoid outcomes that may be influenced by the test results.
Sample size estimates for studies of interventions to induce labor should be based on the power to detect clinically relevant outcomes. In particular, adequate power to determine safety is needed.
Studies of interventions designed to induce labor should provide data on the benefits and risks of these interventions in women induced solely because of advancing gestational age and in women followed with antepartum testing because of prolonged gestation who are induced because of abnormal test results.
Research is needed to identify markers that reliably and reproducibly predict the probability of successful induction.
Appropriate statistical measures of central tendency and of significance testing should be used in studies of both testing strategies and induction interventions.
Data on the medical and nonmedical costs associated with prolonged gestation and its management are needed. Research into economic outcomes should consider the effects of policy changes on issues such as staffing.
Data on patient preferences for management strategies and outcomes are needed.
This report presents the results of a systematic review of the available evidence on the benefits, risks, and costs of different strategies for managing prolonged pregnancy to avoid adverse perinatal and maternal outcomes. It was prepared for the Agency for Healthcare Research and Quality by investigators at the Duke Evidence-based Practice Center, Durham, NC.
The "normal" length of gestation has traditionally been defined as 40 weeks, or 280 days, after the first day of the last menstrual period. This figure is used to calculate the "estimated date of confinement" or "due date." Postterm pregnancy is defined by the American College of Obstetricians and Gynecologists (ACOG) as a gestation longer than 42 weeks, or 294 days, from the onset of the last menstrual period (Anonymous, 1997). It has long been recognized that the risk of adverse fetal outcomes, such as stillbirth, meconium aspiration, asphyxia, and the dysmaturity syndrome, is increased as gestational age progresses beyond 42 to 43 weeks (Mannino, 1988). However, the appropriate gestational age at which a pregnancy should be considered "high risk" for reasons of advancing gestation alone is unclear for several reasons. We discuss issues surrounding the concept of "normal" gestational age in this section, then review the data on risks associated with advancing gestational age.
The mechanisms involved in the onset of normal labor in humans are a complex interaction between the fetus, placenta, uterus, and cervix. The fetal central nervous system may play a key role. Changes in circulating hormones produced by the placenta, such as progesterone, and in local production of prostaglandin and other cytokines, intercellular communication between uterine smooth muscle cells, and changes in extracellular matrix in both the uterus and the cervix are all important, but the exact cascade of events involved remains to be elucidated. Given this complexity, normal variability in the length of otherwise uncomplicated pregnancies should be expected. Most women who have prolonged gestation likely represent one extreme of normal variability in gestational age; in other women, or in specific pregnancies in an individual woman, the mechanisms involved in preparing for labor or signaling the onset of labor may differ.
The most recent ACOG review of "postterm" pregnancy cites incidence estimates of 3-14 percent of all pregnancies (Anonymous, 1997). Estimates of the proportion of pregnancies delivering after 41 or 42 weeks are subject to variability because of variable accuracy in dating. Randomized trials of routine screening with ultrasound in the second trimester have consistently shown that routine screening reduces the proportion of women induced for prolonged pregnancy when compared with selective screening (Crowley, 2000). Since routine ultrasound screening is not the standard of care in the United States, population-based estimates will necessarily be subject to error. The most recent available data from birth certificates (1999) suggest that 39.6 percent of all deliveries in the United States occur at 40 weeks or beyond, 18.7 percent at 41 weeks or beyond, and 7.4 percent at 42 weeks and beyond (Ventura, Martin, Curtin, et al., 2000). Because these data include women who delivered prematurely, either through spontaneous preterm labor or because of other pregnancy complications, and women who were induced for other reasons, the data cannot be used to estimate mean or median gestational age. Interestingly, the proportion of all births between 40 and 42 weeks is somewhat lower for black women than for white or Hispanic women, reflecting the higher risk of preterm delivery in black women. However, the proportion of women delivering after 42 weeks is similar among all three ethnic groups. If errors in gestational dating are randomly distributed among the three groups, then this suggests that true "postterm" pregnancies may be due to true differences in the biological process initiating labor in these pregnancies, rather than representing the extremes of the distribution of normal gestational length.
Even the concept of "normal" pregnancy length is more complex than it first appears. One possibility is to define it as the mean, median, or mode for all pregnancies, perhaps stratified by parity and race, with some predefined range that captures the majority of the population. This value would inevitably be skewed by preterm deliveries, both spontaneous and induced for other complications; however, this length would still be "normal" in the sense that it conveys the expected length of the gestation for any woman at the beginning of the pregnancy. Since every woman has some nonzero risk of preterm delivery at the start of the pregnancy, "normal" length defined in this manner has some meaning.
Alternatively, "normal" length can be defined as the length of gestation in women who have uncomplicated pregnancies, labors, deliveries, and perinatal outcomes in the absence of any obstetric intervention. One could then divide pregnant women into three separate populations: (1) those with normal outcomes in the absence of intervention; (2) those requiring intervention and/or experiencing adverse outcomes associated with preterm delivery; and (3) those requiring intervention and/or experiencing adverse outcomes associated with late delivery. We did not identify any reports that characterized gestational length in this manner. Such an exercise might prove useful as an alternative method for discussing risks associated with prolonged gestation. In other words, most of the literature addresses the question: "Given gestational age, what is the likelihood of adverse outcomes?" Clinically, this is very reasonable. An alternative way to think about the problem when defining "normal" length of gestation is to ask the following two questions: "Given a good outcome without any intervention, what is the average gestational age?" And (for the two populations of preterm and term or later pregnancies): "Given an adverse outcome, what is the average gestational age?"
Prior to the ready availability of ultrasound in the 1980s, estimation of gestational age based on menstrual dates alone was often inaccurate. For example, women who conceived soon after stopping oral contraceptives were more likely to have prolonged gestations in one series (Keng and Eng, 1982). Even with accurate recall of dates, there will be some variability in gestational age estimation because the 40-week estimate is based on an assumption of an "ideal" 28-day menstrual cycle, with ovulation on day 14. Because the follicular phase is often quite variable (ranging from 7 to 21 days), this assumption (upon which most gestational age calculators are based) will inevitably lead to some over- or underestimation of gestational age and can lead to errors in understanding the relationship between gestational age, birthweight, and pregnancy outcome (Gjessing, Skjaerven, and Wilcox, 1999).
The availability of ultrasound in most sites in the United States has substantially improved the ability to estimate gestational age more precisely. Randomized trials of routine versus selective screening with ultrasound in the second trimester have consistently found a reduced incidence of induction of labor for prolonged pregnancy in the routine screening groups, presumably because of more accurate dating (Crowley, 2000). However, ultrasound itself has a nonnegligible degree of error. The error is approximately ± 1 week for scans done in the first trimester, ± 2 weeks for scans done in the second trimester, and ± 3 weeks for scans done in the third trimester (ACOG, 1997). Thus, even for women with early ultrasound dating, the "true" gestational age falls within a 14-day window of time; that is, some women with a recorded gestational age of 41 weeks will actually be 42 weeks, and some will actually be 40 weeks. In addition, because ultrasound dating is based on embryonic or fetal size, an association between size at the time of the ultrasound and later outcomes can create systematic bias in assessing gestational age-associated risk (Henriksen, Wilcox, Hedegaard, et al., 1995). For example, ultrasound dating will consistently overestimate the gestational age of larger than average fetuses. This early overestimation of gestational age could create a bias that would lead to an overestimation of the association of advanced gestational age and macrosomia. On the other hand, gestational age will be consistently underestimated for smaller than average fetuses. If some conditions that lead to low birthweight manifest themselves very early in pregnancy, then this will lead to an underestimation of the association of conditions associated with low birthweight and advancing gestational age.
The effects of uncertainty in dating pregnancy are not insignificant. Population-based estimates of the outcomes of pregnancy by gestational age, clinical trial data, and policy and clinical decisions based on these data are all dependent on the accuracy of the determination of gestational age.
The population of pregnant women with "prolonged" pregnancy thus likely represents at least two distinct groups:
Women in whom gestational age is overestimated because of the inherent error of all methods of dating.
Women whose pregnancies are correctly dated. Some of these women may represent the outer limits of normal variability. Others may have underlying defects in the mechanisms signaling the onset of labor.
It is likely that the risk of adverse outcomes varies among these groups. Many of the monitoring strategies discussed throughout this report are designed to identify fetuses at higher risk of adverse outcomes. The following section discusses the adverse outcomes associated with prolonged gestation, as well as the degree to which the risk of these outcomes is related to gestational age.
Adverse fetal outcomes associated with advancing gestation can be divided into two categories:
Those associated with decreased uteroplacental function, resulting in oligohydramnios, reduced fetal growth, passage of meconium, asphyxia, and, potentially, stillbirth.
Those associated with continued normal placental function, resulting in continued fetal growth, with a subsequent increased risk of trauma during birth, including shoulder dystocia with possible permanent neurologic injury.
Adverse physical consequences to the mother resulting from prolonged gestation include those associated with increased fetal size, including an increased risk of short-term trauma to the pelvic floor, vagina, and perineum (as well as a possible longer-term risk of pelvic floor dysfunction), and postpartum hemorrhage. Interventions performed to reduce the risk of perinatal morbidity and mortality, such as induction of labor or cesarean section, have iatrogenic risks, such as infection, hemorrhage, and surgical injury. In addition, any adverse outcome for an infant will obviously have significant emotional impact on the mother.
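Stillbirth risk per 1,000, by gestational week. "Deliveries" columns use 1,000 deliveries in the specified gestational week as the denominator; "Continuing" columns use 1,000 continuing pregnancies.

| Study | Location | Dates | N | Deliveries, 37 | Deliveries, 38 | Deliveries, 39 | Deliveries, 40 | Deliveries, 41 | Deliveries, 42 | Deliveries, 43 | Continuing, 37 | Continuing, 38 | Continuing, 39 | Continuing, 40 | Continuing, 41 | Continuing, 42 | Continuing, 43 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|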
| Yudkin, Wood, and Redman, 1987 | Oxford | 1978-1985 | 40,888 | 2.14 | 0.43 | 1.24 | 0.42 | 0.29 | 1.24 | ||||||||
| Hilder, Costeloe, and Thilaganathan, 1998 | London | 1989-1991 | 171,527 | 6.2 | 3.8 | 2.2 | 1.5 | 1.7 | 1.9 | 2.1 | 0.35 | 0.56 | 0.57 | 0.86 | 1.27 | 1.55 | 2.12 |
| Cotzias, Paterson-Brown, and Fisk, 1999 | London | 1989-1991 | 171,527 | - | - | - | - | - | - | - | 1.55 | 1.37 | 1.19 | 1.08 | 1.21 | 1.30 | 1.58 |
| Smith, 2001 | Scotland | 1985-1996 | 700,878 | - | - | - | - | - | - | - | 0.4 | 0.4 | 0.5 | 0.9 | 1.2 | 1.9 | 6.3 |
Hilder et al. examined data from 171,527 births from the North East Thames Region in London (Hilder, Costeloe, and Thilaganathan, 1998). Stillbirth rates calculated per 1,000 deliveries in each gestational week declined from 6.2/1,000 at 37 weeks to 1.5/1,000 at 40 weeks, then began to increase again with advancing gestational age (1.7 at 41 weeks, 1.9 at 42 weeks, and 2.1 at 43 weeks or more). The pattern was slightly different when risk was estimated as stillbirths per 1,000 ongoing pregnancies: 0.34 at 37 weeks, 0.70 at 38 weeks, 0.83 at 39 weeks, 1.57 at 40 weeks, 1.48 at 41 weeks, 3.29 at 42 weeks, and 3.71 at 43 weeks and beyond.
Cotzias, Paterson-Brown, and Fisk (1999) performed a reanalysis of the data set used by Hilder's group. In addition to estimating the number of stillbirths in a given gestational age divided by the number of ongoing pregnancies, the authors also estimated the "prospective stillbirth risk," the total number of stillbirths at or beyond a given gestational age divided by the total number of pregnancies at or beyond that age, multiplied by 1,000. Other data sets were used to estimate the proportion of singleton births and the proportion of stillbirths occurring in singleton pregnancies, as well as the proportion of stillbirths that were unexplained by anomalies or other recognized fetal and maternal complications. Using this methodology, the risk for unexplained stillbirth in singleton pregnancies was highest at 37 weeks (1.55/1,000), declined to a low of 1.08/1,000 at 40 weeks, then increased again to 1.58/1,000 at 43 weeks. The high rates at lower gestational ages may reflect this methodology.
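The three ways of expressing stillbirth risk discussed above differ only in their numerators and denominators, which a short Python sketch can make concrete. The function name and all counts below are hypothetical and chosen only for illustration. The sketch also omits the further adjustments Cotzias and colleagues made for singleton and unexplained stillbirths.

```python
def stillbirth_risks(births, stillbirths):
    """Compute three stillbirth risk measures per 1,000 for each gestational week.

    births[i] and stillbirths[i] are counts of all deliveries and of stillbirths
    in week i; the last entry covers that week and beyond.
    """
    total_births = sum(births)
    total_stillbirths = sum(stillbirths)
    results = []
    delivered_before = 0   # deliveries in earlier weeks
    stillborn_before = 0   # stillbirths in earlier weeks
    for n_births, n_still in zip(births, stillbirths):
        # Pregnancies still undelivered at the start of this week
        ongoing = total_births - delivered_before
        # Stillbirths occurring at or beyond this week
        remaining_still = total_stillbirths - stillborn_before
        results.append({
            "per_1000_deliveries": 1000 * n_still / n_births,
            "per_1000_ongoing": 1000 * n_still / ongoing,
            "prospective": 1000 * remaining_still / ongoing,
        })
        delivered_before += n_births
        stillborn_before += n_still
    return results


# Hypothetical counts for weeks 37 through 42 and beyond (not from any study)
births = [5000, 12000, 20000, 30000, 20000, 13000]
stillbirths = [20, 24, 30, 30, 30, 26]
risks = stillbirth_risks(births, stillbirths)
```

With these invented counts, the rate per 1,000 deliveries is highest at 37 weeks because few deliveries occur then, while the rate per 1,000 ongoing pregnancies is lowest at 37 weeks because nearly all pregnancies are still ongoing.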
Most recently, Smith (2001) analyzed data from Scotland for the period 1985 through 1996. This analysis has several advantages over the previous ones. First, the number of deliveries is considerably larger, resulting in greater precision of risk estimates. Second, stillbirths are divided into antepartum and intrapartum stillbirths, a distinction that has clinical relevance, since clinical strategies for preventing each of these might be quite different. Third, congenital anomalies were explicitly excluded. Fourth, life table methods were used to account for censoring resulting from deliveries within a given observation period. Fifth, the time period is considerably later, making the results more likely to reflect current clinical management, at least in the United Kingdom. Finally, cumulative probabilities for stillbirth at each gestational age were estimated.
The advantage of cumulative probability is that it also captures the risk of death at preceding gestational ages. Smith (2001) uses the metaphor of Russian roulette to explain the difference between conditional probability and cumulative probability: the risk with each pull of the trigger is 1 in 6, but the overall risk of having died is greater for someone taking a fifth shot than for someone taking a first shot. For example, Smith estimated the conditional probability of stillbirth at 43 weeks as 6.3/1,000 ongoing pregnancies, while the cumulative probability was 11.5/1,000 ongoing pregnancies. This difference represents the effects of stillbirths occurring before 43 weeks. The potential clinical significance of this is that achieving the absolute minimum cumulative stillbirth probability may require interventions at earlier gestational ages.
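The relationship between conditional and cumulative probability can be sketched in a few lines of Python. The function name is hypothetical, and the calculation simply multiplies the per-week probabilities of no event. Smith used life table methods, so this is an approximation of the idea rather than a reproduction of that analysis. Applied to the conditional probabilities in the Smith (2001) row of the table above, it gives approximately 11.6 per 1,000, close to the reported 11.5 per 1,000.

```python
def cumulative_probability(conditional):
    """Combine per-interval conditional probabilities into a cumulative probability.

    Each entry is the probability of the event in one interval, given that it
    has not occurred in any earlier interval.
    """
    survival = 1.0
    for p in conditional:
        survival *= 1.0 - p
    return 1.0 - survival


# Russian roulette as described in the text: 1 in 6 on every pull
one_pull = cumulative_probability([1 / 6])
five_pulls = cumulative_probability([1 / 6] * 5)  # equals 1 - (5/6)**5

# Conditional stillbirth probabilities per 1,000 ongoing pregnancies, weeks 37 to 43,
# taken from the Smith (2001) row of the table above
smith_per_1000 = [0.4, 0.4, 0.5, 0.9, 1.2, 1.9, 6.3]
cumulative_per_1000 = 1000 * cumulative_probability([p / 1000 for p in smith_per_1000])
```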
Consistently, the risk of stillbirth in the above-described studies rises with advancing gestational age, and this increase appears to begin at 39-40 weeks when estimated using the number of ongoing pregnancies as the denominator. One limitation of these studies is that they were all performed in the United Kingdom, and the degree to which the risks would differ in a different population with different clinical management is unclear. Another limitation is that other potential causes of perinatal mortality, such as maternal diabetes or hypertension, are not explicitly accounted for in these data sets. Also, autopsy verification that fetal anomalies or other anatomic causes of death did not occur was not performed. However, a recent Norwegian case-control study of unexplained stillbirth, in which autopsy verification was performed and logistic regression was used to control for documented maternal disease, found that increasing gestational age remained a significant risk factor for unexplained stillbirth, along with maternal age, smoking, obesity, and low educational level. Interestingly, parity was not a risk factor in the multivariate analysis (Froen, Arnestad, Frey, et al., 2001).
It should be pointed out that the risk of stillbirth in these studies remains quite low at an absolute level. The point at which the risk becomes unacceptable and justifies intervention is unclear and is likely to be influenced by each couple's feelings about the tradeoffs between intervention and no intervention.
Two other studies provide additional indirect evidence of increased risk of death with prolonged gestation. Bastian, Keirse, and Lancaster (1998) compared outcomes of all planned home births in Australia from 1985 through 1990 with all Australian births in the same time period and home births in other countries. The planned home birth perinatal death rate was 6.4/1,000 (46/7,002 total home births). Of the 44 deaths with known gestational age, seven (15.9 percent) were greater than 42 weeks. On chart review, six of these deaths, or 28.6 percent of the total, were classified as due to intrapartum asphyxia; prolonged pregnancies represented 10.7 percent of all home births. Overall, the mortality rate for home births in infants over 42 weeks was twice that for other home births. The authors point out that other conditions associated with perinatal mortality are much less common in the home-birth population, so that the excess mortality observed is unlikely to be solely due to the confounding effects of other complications, such as preeclampsia or diabetes.
Mehl-Madrona and Madrona (1997) reviewed self-reported data from midwives in the western United States between 1970 and 1985. A total of 4,361 midwife-attended home births were compared with 4,107 family-practitioner-attended home births performed in California and Wisconsin during the same time period. Sampling frames and response rates were variable, as were the data collection instruments. Deliveries were matched by maternal age, insurance status, parity, and presence of risk factors. Midwives were significantly more likely than family practitioners to deliver postdate pregnancies, defined as gestational age greater than 42 weeks (midwives also were more likely to deliver breech and twin pregnancies). Mortality rates were significantly higher for midwife-attended births than for family-practitioner-attended births, a difference that was attributable entirely to more postdate, twin, and breech deliveries in the midwife group.
Both of these studies are limited by issues concerning accuracy of dating, completeness of reporting, confirmation of causes of death, and in the case of the Mehl-Madrona paper, a rather complicated sampling scheme and questions about the true comparability of groups. There also are concerns about generalizability in terms of current midwifery practice in the United States. However, patients who select home birth are, by definition, low-risk patients. They also are unlikely to have undergone antepartum testing. The excess mortality seen in women with prolonged pregnancy delivering at home in these two studies is consistent with an independent effect of increasing gestational age on perinatal mortality.
Analysis of data from the Medical Birth Registry of Norway from 1978 to 1987 found that the risk of perinatal death was over five times higher in infants below the 10th percentile of birthweight for their gestational age (odds ratio [OR], 5.68; 95 percent confidence interval [CI], 4.37 to 7.38) than in infants from the 10th to 90th percentile (Campbell, Ostbye, and Irgens, 1997), after adjustment for a variety of potential confounding variables, such as maternal complications like diabetes. Maternal age > 35 years was also a risk factor in multivariate analysis (OR, 1.88; 95 percent CI, 1.22 to 2.89). Infants above the 90th percentile in weight had a decreased mortality risk (OR, 0.51; 95 percent CI, 0.26 to 1.00). A similar relationship between perinatal mortality in prolonged pregnancy and low birthweight was found in a review of Swedish registry data from 1987 through 1992 (Divon, Haglund, Nisell, et al., 1998). These observations are consistent with a hypothesis that decreased uteroplacental function, leading to growth restriction, oligohydramnios, and eventually asphyxia, is one of the major risks of advancing gestational age, although changes in weight occurring after death and prior to delivery may explain some of this phenomenon. What is not clear is whether the decreasing uteroplacental function is an inevitable result of advancing gestational age, or whether failure to go into labor is somehow a marker for some forms of uteroplacental insufficiency.
The Norwegian data are limited by the population (results may not be generalizable to a more diverse U.S. population), accuracy of dating (gestational age in the registry is based on last menstrual period), and time (obstetric management has changed somewhat since 1987). However, the observed association between low birthweight and perinatal mortality in a genetically homogeneous population with a relatively high standard of living and level of access to prenatal care suggests that this is at least partly a reflection of changes in the biology of the uterus, placenta, and/or fetus associated with prolonged pregnancy.
Another issue that should be considered in reviewing recent population-based data on perinatal mortality is the degree to which observed perinatal deaths are preventable. It is unclear from population-based administrative data what proportion of unexplained stillbirths after 40 weeks gestation occurred in women undergoing some form of antenatal surveillance. This information is important for two reasons. First, in order to estimate the benefits of antenatal surveillance at different gestational ages quantitatively, the baseline gestational-age-specific risk, in the absence of surveillance, is needed. Second, if current mortality data reflect mostly women who are undergoing surveillance, then the limits of currently available technology may have been reached; in this case, the only strategy available for further reducing perinatal mortality would be elective induction of labor at a predefined gestational age. This is supported by the findings of a Cochrane meta-analysis (Crowley, 2000), which showed an excess of perinatal mortality in the testing arms. Conversely, if current mortality data reflect women who are not undergoing surveillance, then greater efforts are needed to ensure access to currently available technologies.
In the Norwegian database, risks for fetal distress in labor (relative risk [RR], 1.68; 95 percent CI, 1.62 to 1.72) and shoulder dystocia (RR, 1.31; 95 percent CI, 1.21 to 1.42) were significantly increased in infants born after 42 weeks compared with infants born between 39 and 42 weeks (Campbell, Ostbye, and Irgens, 1997). Others also have noted an association between prolonged pregnancy and increased fetal weight and/or shoulder dystocia (Acker, Sachs, and Friedman, 1985; Eden, Seifert, Winegar, et al., 1987; Nocon, McKenzie, Thomas, et al., 1993; Sarno, Hinderstein, and Staiano, 1991).
Data on longer term outcomes of infants born after prolonged gestations are relatively sparse. One Irish case-control study reported an association between prolonged pregnancy and neonatal seizures (Curtis, Matthews, Clarke, et al., 1988). In a study of British children with cerebral palsy, neonatal encephalopathy was defined as having both signs of neonatal neurological abnormalities and depression at birth (a 1-minute Apgar score less than 6). There was a strong association between gestational age greater than 41 weeks and the presence of neonatal encephalopathy (OR, 3.5; 95 percent CI, 1.0 to 12.1). This risk was particularly marked in primigravid women (OR, 11.0; 95 percent CI, 1.5 to 102.5). The infants studied also were more likely to have had induction of labor (indications not specified), long second stage of labor, meconium-stained amniotic fluid, and emergent cesarean section or operative vaginal delivery.
On the other hand, prospective studies have not shown an association between prolonged pregnancy and adverse physical or mental development at 1 or 2 years, even when stratified by presence or absence of the dysmaturity syndrome (Shime, Librach, Gare, et al., 1986).
In summary, available data are insufficient to quantify the degree of excess risk, if any, of perinatal morbidity (including neurological morbidity) associated with prolonged pregnancy.
Maternal risks of obstetric trauma and hemorrhage are increased in prolonged pregnancy compared with term pregnancy (Campbell, Ostbye, and Irgens, 1997). Labor abnormalities also are increased. All three of these may be related to an increased risk of macrosomia. Another potential reason, as stated above, is that some women who do not go into labor within the "normal" length of gestation have differences in the physiology of labor and delivery compared with women who begin labor earlier in gestation.
Interventions performed to prevent adverse outcomes associated with prolonged gestation have the potential for complications, most notably uterine hyperstimulation (too-frequent uterine contractions) and, with cesarean section, infection, bleeding, or organ injury.
Prolonged gestation is associated with an increased risk of perinatal death, as well as perinatal morbidities related to either uteroplacental insufficiency or fetal macrosomia. Direct maternal risks are potentially related to fetal macrosomia or to interventions used in the management of prolonged pregnancy. The gestational age at which the risk of adverse direct perinatal or maternal outcomes justifies the costs and potential complications of active intervention is unclear.
The purpose of this evidence report is to review the evidence regarding strategies to reduce the risks of adverse maternal and fetal outcomes associated with advancing gestational age. Because of the issues discussed above, we did not limit our review to interventions performed after a predefined gestational age cut-point. Although "postterm" pregnancy technically refers to gestations beyond 42 weeks, and "postdate" to pregnancies beyond 40 weeks, others have used the phrase "prolonged pregnancy." The appropriate gestational age range upon which this report should focus proved a lively topic for debate among the members of the project's advisory panel of technical experts. However, consensus was reached that the primary focus should be on managing those risks associated with advancing gestational age, with an attempt at quantifying the gestational-age-specific risk. Because of this scope, we use the term "prolonged pregnancy" throughout this report, to avoid confusion with terminology associated with specific gestational age definitions. We use "postterm" and "postdate" only when specifically referred to in articles under discussion.
There is an inherent uncertainty associated with any estimate of gestational age. However, risks of certain adverse outcomes for both mother and infant clearly increase as gestational age increases after 37-38 weeks. Strategies to minimize these risks may themselves carry certain risks. The ultimate goal of this report is to provide a framework for rationally comparing these competing risks, and to help patients, clinicians, and policymakers decide for themselves the best options for managing prolonged gestation in their particular situation.
The key research questions addressed in the report were developed by the Agency for Healthcare Research and Quality (AHRQ) and our report partner, ACOG, and refined in consultation with AHRQ, ACOG, and the project's advisory panel of technical experts. The questions were as follows:
What are the test characteristics (reliability, sensitivity, specificity, predictive values) and costs of measures used in the management of prolonged pregnancy to (a) assess risks to the fetus and mother of prolonged pregnancy, and (b) assess the likelihood of a successful induction of labor?
What is the direct evidence comparing the benefits, risks, and costs of planned induction versus expectant management at various gestational ages?
What are the benefits, risks, and costs of currently available interventions for the induction of labor?
Are the epidemiology and outcomes of prolonged pregnancy different for women in different ethnic groups, different socioeconomic groups, or in adolescent women? This question reflects AHRQ's programmatic interest in identifying health disparities attributable to age, race/ethnicity, and socioeconomic status.
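The test characteristics named in the first question (sensitivity, specificity, and predictive values) all derive from a 2x2 table of test result against outcome. The following is a minimal Python sketch with a hypothetical function name and invented counts; reliability, which requires repeated measurements, is not covered.

```python
def diagnostic_characteristics(tp, fp, fn, tn):
    """Return sensitivity, specificity, and predictive values from a 2x2 table.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives against the reference outcome.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical counts: 1,000 pregnancies, 20 with the adverse outcome
result = diagnostic_characteristics(tp=10, fp=98, fn=10, tn=882)
```

With these invented counts, a test with 90 percent specificity still has a positive predictive value below 10 percent, because the outcome is rare.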
Our approach to addressing each of these questions was to identify and evaluate the relevant literature and supplemental data (if any); report the results; and where evidence was lacking or methodological limitations in the available sources precluded drawing firm conclusions, identify the issues needing resolution in order to answer the question.
Because the primary focus of the report is on clinical issues surrounding advancing gestational age, we did not systematically review the basic science literature on the initiation of labor, the physiology of the gravid uterus and cervix, placental function, or any of the other topics critical to a comprehensive understanding of these issues. The Duke team, AHRQ, ACOG, and the advisory panel all agreed that the time, effort, and additional expertise required to systematically review this literature precluded its inclusion in this evidence report.
Based on the key research questions, our preliminary review of the literature, and discussions with the advisory panel, we considered the following interventions to reduce risks to the fetus or mother associated with advancing gestational age.
Testing:
- Tests to determine risk of stillbirth or compromise related to prolonged gestation:
  - Maternal measurement of fetal movement.
  - Nonstress test (NST).
  - Contraction stress test (CST), using either nipple stimulation or oxytocin.
  - Amniotic fluid measurements.
  - Biophysical profile, using either five measures (reactive NST, breathing, tone, movement, amniotic fluid) or two measures (NST, amniotic fluid).
  - Doppler measurements of umbilical or fetal cerebral blood flow.
- Tests to determine the risk of macrosomia:
  - Estimation of fetal weight:
    - Maternal judgment.
    - Clinical examination.
    - Ultrasound.
- Tests to estimate likely success of induction of labor:
  - Clinical estimation of cervical ripeness (Bishop score).
  - Fibronectin.
After discussion with the advisory panel, we did not include tests of fetal well-being that are no longer in widespread clinical use, such as estriol.
Management options other than testing:
- No intervention (neither induction nor testing).
- Interventions to prevent prolonged pregnancy:
  - Scheduled sweeping of membranes.
- Planned induction:
  - 41 weeks.
  - 42 weeks.
  - Later timing.
- Testing for fetal well-being (using tests described above):
  - Varied time of initiation (40, 41, 42 weeks).
  - Varied frequency.
Specific agents/interventions used for the induction of labor:
- Amniotomy.
- Castor oil.
- Extra-amniotic saline instillation.
- Relaxin.
- Sweeping of the membranes.
- Foley catheter.
- Nipple stimulation.
- Oxytocin.
- Prostaglandins:
  - Prostaglandin E2 (gel, tablets, and inserts).
  - Misoprostol.
- Mifepristone.
We did not systematically review certain other interventions that may play a role in managing prolonged pregnancy. Although we discuss the effect of ultrasound estimation of gestational age on the diagnosis of prolonged pregnancy above, we did not attempt to systematically review the literature on the other potential benefits, risks, and costs of routine ultrasonography in early pregnancy. Attempting to place the potential benefits of accurate gestational dating for managing advancing gestational age in the context of the other possible outcomes associated with routine ultrasound screening was well beyond the scope of the report and beyond the resources available. Similarly, we did not systematically review the literature on intrapartum interventions used in the management of common complications of prolonged pregnancy (such as oligohydramnios or meconium-stained amniotic fluid) unless identified articles clearly included data on prolonged pregnancy.
The primary patient population considered in this report was pregnant women with a single fetus in the vertex position, approaching or past the estimated date of confinement, without any other medical or obstetrical complications, where the only potential factor increasing the risk of an adverse perinatal or maternal outcome was advancing gestational age. We also examined the potential interaction of this risk with age and race/ethnicity. Our findings are specifically not applicable to women with prior cesarean section, for several reasons:
Prior cesarean section was an exclusion criterion in the vast majority of the randomized trials of management strategies and induction agents; thus, we are unable to generalize these results.
Recent observational data (Blanchette, Nayak, and Erasmus, 1999; Lydon-Rochelle, Holt, Easterling, et al., 2001; Plaut, Schwartz, and Lubarsky, 1999) suggest that risk of uterine rupture is increased in women with prior cesarean section undergoing induction of labor, especially with prostaglandins. Incorporating an evaluation of this evidence into the report would have required an additional consideration of the general risks and benefits of vaginal birth after cesarean section, which is well beyond the scope of this report.
Practice settings where the interventions discussed in this report may potentially be considered for use include:
Hospitals.
Free-standing birthing centers.
Patients' homes.
Prenatal clinics or other facilities where ambulatory prenatal care is delivered.
The primary target audiences for the evidence report are groups involved in writing guidelines or educational documents on management of prolonged pregnancy for health care professionals. Secondary audiences include:
Health care professionals providing care for pregnant women (obstetricians, family physicians, nurse-midwives, nurses, childbirth educators, etc.).
Policymakers involved in coverage/payment decisions.
Agencies, foundations, and other groups involved in funding research.
Media involved in dissemination and education about health issues.
Patients with an interest in reviewing the state of the art of the medical literature concerning management of prolonged pregnancy.
In this chapter, we describe the basic methodology used to develop the evidence report, from topic assessment and refinement through the literature search, screening, and data abstraction process. Included are descriptions of the literature search strategies and results, literature sources, screening and grading criteria, quality control procedures, and supplemental data sources.
A national advisory panel of technical experts was convened to work with the Duke research team. The 11-member panel included representatives from obstetrics-gynecology, including maternal-fetal medicine; pediatrics; childbirth education; and midwifery. In addition to the American College of Obstetricians and Gynecologists (ACOG), other major interest organizations represented on the panel included the American College of Nurse Midwives and the Adolescent Pregnancy Prevention Coalition of North Carolina.
Prior to our first conference call, the advisory panel and the Task Order Officer at the Agency for Healthcare Research and Quality (AHRQ) received a document that summarized the incidence and prevalence of prolonged pregnancy, described the characteristics and size of the affected population, identified the most affected practice settings and providers, specified the interventions to be considered, and presented a diagram of the conceptual model/causal pathway. The panel also received the four key questions specified in the task order. Based on Duke's preliminary assessment of the literature and discussion with the advisory panel and AHRQ Task Order Officer, all parties agreed to refine the key questions as follows:
What are the test characteristics (reliability, sensitivity, specificity, predictive values) and costs of measures used in the management of prolonged pregnancy to assess: (a) risks to the mother and fetus of prolonged pregnancy and (b) the likelihood of a successful induction?
What is the direct evidence comparing the benefits, risks, and costs of planned induction versus expectant management at various gestational ages?
What are the benefits, risks, and costs of currently available interventions for induction of labor?
Are the epidemiology and outcomes of prolonged pregnancy different for women in different ethnic groups, different socioeconomic groups, or in adolescent women?
In addition to reaching consensus on the key questions, the advisory panel agreed on the patient population, practice settings, and target audiences of the report, as described in Chapter 1 of this report. The causal pathway is represented in Figure 1.
The comprehensive review of the literature, from identification of databases through abstraction of individual articles into evidence tables, was a multi-step, sequential process.
The primary sources of literature were six of the most widely used computerized bibliographic databases: MEDLINE (1980-December 2000), HealthSTAR (1980-December 2000), CINAHL (1983-December 2000), the Cochrane Database of Systematic Reviews (CDSR) (Issue 4, 2000; Issue 1, 2001; and Issue 2, 2001), the Database of Abstracts of Reviews of Effectiveness (DARE), and EMBASE (1980-Jan 2000). Searches of these databases were supplemented by secondary searches of reference lists in all included articles, especially Cochrane review articles, and scanning of current issues of journals not yet indexed in the computerized bibliographic databases. Titles regularly scanned included the American Journal of Obstetrics and Gynecology, the British Medical Journal, the British Journal of Obstetrics and Gynaecology, the European Journal of Obstetrics and Gynecology and Reproductive Medicine, the International Journal of Gynecology and Obstetrics, the Journal of the American Medical Association, the Journal of Maternal-Fetal Medicine, the Journal of Obstetrics and Gynaecology, Obstetrics and Gynecology, the Lancet, and the New England Journal of Medicine. Suggestions regarding search terms and specific articles were solicited from the advisory panel during two conference calls in December 2000 and March 2001 and resulted in additions to the literature database.
We developed the basic search strategies using the National Library of Medicine's MeSH key word nomenclature developed for MEDLINE. The same strategies were used to search HealthSTAR and CINAHL. A Duke University Medical Center librarian checked the strategies and assisted with their translation to the key word structure used by EMBASE. Dr. Evan Myers searched the CDSR and DARE using "postterm pregnancy," "prolonged pregnancy," and similar terms.
The initial searches were performed in MEDLINE and then duplicated in other databases. All searches were limited to articles published since 1980, in the English language, and with human subjects. The cut-off threshold of 1980 was based on the general unavailability of ultrasound prior to that date. It was judged that trials conducted and published prior to 1980 would be problematic both in terms of the accuracy of diagnosis and comparability with current testing and management strategies. The decision to restrict the literature search to articles published since 1980 was agreed to by the members of the advisory panel.
Inclusion and exclusion criteria were developed for the literature searches so that the yield of articles would be appropriately focused. Empirical studies or review articles were excluded after screening based on the following criteria:
Article was not original research.
Article did not address prolonged pregnancy.
The study design was a single case report.
The study design was a small case series with fewer than 20 subjects.
Each screened article was coded as addressing one of three topic areas:
Testing: Two or more tests were compared in terms of the accuracy or agreement of test results or the test result was correlated with some health outcome.
Management: The article addressed the relative effectiveness of planned induction versus expectant management or the relative effectiveness of an induction agent.
Testing and management: Some combination of the above.
The criteria used to include articles were:
The study population must address prolonged pregnancy; ideally, results should be reported separately for patients with prolonged pregnancy. Because it is possible that the response of the cervix and uterus to induction agents would be quite different in different clinical scenarios (both in terms of labor patterns and potential maternal and fetal side effects), studies of induction agents that did not include any otherwise healthy women with prolonged pregnancy were excluded.
All original research or relevant reviews must relate to at least one of the four key questions described above.
Outcomes were included if they were health outcomes or health services use or economic outcomes related to the management of prolonged pregnancy.
We included only randomized controlled trials (RCTs) which used active or nonactive (i.e., placebo) controls for studies involving management topics. For testing articles, we included RCTs and those cohort and large case series that allowed construction of 2-by-2 tables for estimation of sensitivity and specificity. Articles that did not meet these criteria were not necessarily excluded from the review and often provided valuable background material. However, only articles meeting the inclusion criteria were formally abstracted into evidence tables.
Included study designs were determined by the article's topic area. Study designs initially included for testing articles and testing and management articles were case reports; small case series (< 20 subjects); medium to large case series (> 20 subjects); nonrandomized comparison studies (cohort or case series that used historical or concomitant nonrandomized controls); and RCTs. The study design of each screened article was coded in our literature database.
| Database | Number of unduplicated articles | Percent of total | Number of articles included | Percent of database's articles included |
|---|---|---|---|---|
| MEDLINE | 458 | 65.3 | 257 | 56.1 |
| EMBASE | 162 | 23.1 | 60 | 37.0 |
| CINAHL, HealthSTAR | 22 | 3.1 | 9 | 40.9 |
| Cochrane Database of Systematic Reviews, DARE | 25 | 3.6 | 15 | 60.0 |
| Other (e.g., manual screening of references lists) | 34 | 4.9 | 23 | 67.6 |
| Total | 701 | 100.0 | 364 | -- |
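The percentage columns in this table can be reproduced from the raw counts. The following is a minimal sketch, assuming that the last column is the share of each database's unduplicated articles that were ultimately included (for example, 257 of 458 for MEDLINE); the variable and function names are illustrative only.

```python
# Counts copied from the table above: (unduplicated articles, articles included).
counts = {
    "MEDLINE": (458, 257),
    "EMBASE": (162, 60),
    "CINAHL, HealthSTAR": (22, 9),
    "CDSR, DARE": (25, 15),
    "Other": (34, 23),
}

total_found = sum(found for found, _ in counts.values())
total_included = sum(included for _, included in counts.values())


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# For each database: its share of all unduplicated articles, and the share
# of its own articles that survived screening (assumed meaning of the last column).
rows = {
    name: {
        "percent_of_total": pct(found, total_found),
        "percent_included": pct(included, found),
    }
    for name, (found, included) in counts.items()
}

for name, row in rows.items():
    print(f"{name}: {row['percent_of_total']}% of total, "
          f"{row['percent_included']}% of its articles included")
```

Under this reading, the overall inclusion rate would be 364 of 701 articles, which the table leaves blank in the Total row.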
At the full-text screening stage, each article was independently reviewed by two investigators, who forwarded their decisions to Ms. Jane Kolimaga, the task order manager, for recording and comparison. If indicated, reviewers were asked to reconcile differences of opinion. Overall, the teams initially disagreed on about 25-35 percent of their decisions, and all disagreements were resolved by consensus. In the event that two investigators could not agree, Dr. Evan Myers, the principal investigator, was to be the arbiter, but this situation never arose.
| Description | No. of Unduplicated Articles Identified | Percent |
|---|---|---|
| Abstracts | | |
| Number of titles and abstracts reviewed | 701 | |
| Number of abstracts included | 511 | 72.9 |
| - Management | 317 | |
| - Testing | 173 | |
| - Testing & Management | 21 | |
| Number of abstracts excluded | 190 | 27.1 |
| - Management | 169 | |
| - Testing | 11 | |
| - Testing & Management | 10 | |
| Full-Text Articles | | |
| Number of full-text articles reviewed | 511 | |
| Number of full-text articles included | 364 | 71.2 |
| - Management | 231 | |
| - Testing | 107 | |
| - Testing & Management | 26 | |
| Classification of included articles | | |
| - Management-RCT | 85 | |
| - Management-Review | 41 | |
| - Management-Other (e.g., epidemiology) | 41 | |
| - Management-Basic Science | 10 | |
| - Management-Background | 54 | |
| - Testing-Case Report | 0 | |
| - Testing-Small Case Series | 5 | |
| - Testing-Large Case Series | 36 | |
| - Testing-Non-Randomized Comparison | 0 | |
| - Testing-RCT | 0 | |
| - Testing-Review | 8 | |
| - Testing-Other | 3 | |
| - Testing-Basic Science | 5 | |
| - Testing-Background | 50 | |
| - Testing & Management-Large Case Series | 3 | |
| - Testing & Management-Non-Randomized Comparison | 0 | |
| - Testing & Management-RCT | 4 | |
| - Testing & Management-Review | 16 | |
| - Testing & Management-Background | 3 | |
| Number of full-text articles excluded | 147 | 28.8 |
| - Management | 80 | |
| - Testing | 60 | |
| - Testing & Management | 4 | |
| - Other (e.g., not English, inaccurate reference) | 3 | |
Teams of two investigators performed the data abstraction for eligible articles identified at the full-text screening stage: one performed the primary data abstraction, and the second "over-read" the abstracted information. A data abstraction form was developed prior to initiation of the formal abstraction process. During the development of the form, draft forms were reviewed by the investigators and Dr. Rebecca Gray, a nonclinician abstractor/editor, for clarity and completeness; as the person who converted the abstraction forms into evidence tables, Dr. Gray helped to ensure that all relevant information was captured. The two final iterations of the form were pretested by the investigators, who used them to abstract relevant data from a sample article. The information from the data abstraction form was then summarized in evidence table format by Dr. Gray. The data abstraction assignments were made by Dr. Myers based on the investigators' clinical interests (e.g., management vs. testing). Copies of the data abstraction form and the evidence table template are provided in Appendixes 1 and 2, respectively.
Outcomes recorded included:
Direct health outcomes:
- Maternal mortality.
- Perinatal mortality.
- Maternal morbidity (specific measures varied between studies; included infection, hemorrhage, perineal trauma, etc.).
- Perinatal morbidity (meconium aspiration, postmaturity syndrome, shoulder dystocia, brachial plexus injury, admission to neonatal intensive care unit).
Surrogate measures:
- Neonatal umbilical artery pH, Apgar scores, meconium-stained amniotic fluid, nonreassuring fetal heart rate tracing.
- Cesarean section rates, overall and by specific indication.
Resource use:
- Costs.
- Time to delivery, proportion of vaginal deliveries within a prespecified time.
Test operating characteristics:
- Sensitivity, specificity, positive and negative predictive values for outcomes listed above.
We evaluated each study included in the evidence tables for factors affecting internal and external validity. For management articles, the elements of the quality scale were as follows:
Were patients randomly assigned to the intervention?
Was the method for randomization described, and if so, was it one shown to be associated with less bias (sealed envelopes) than others (alternating date or medical record number)?
Was the patient population similar to the likely patient population?
Were the intervention protocols clearly described or referenced?
Were the criteria used to make management decisions associated with primary outcomes (such as cesarean section) described?
Statistical issues: Were sample size and power issues discussed? Were the statistical tests used appropriate for the types of data analyzed?
Was the study population described in terms of:
- Gestational age?
- Criteria used to assign gestational age?
- Bishop score or other measure of cervical ripeness?
For testing articles, we used the above criteria plus:
Was an implicit or explicit reference standard defined?
Was the issue of possible verification bias (patients with positive test results more likely to receive the reference standard test or treatment) addressed?
Test reliability/variability: Was inter- or intrarater reliability of the test addressed?
Was the study population well characterized in terms of the absence of risk factors such as diabetes, hypertension, etc.?
Was the testing protocol described in sufficient detail to allow others to replicate it?
Scores on individual quality criteria were not aggregated into an overall score but were considered and reported individually. We preferred this approach for several reasons:
Previous work has shown that aggregated numeric scoring systems may not discriminate well between "high" and "low" quality studies, even for randomized trials (Jüni, Witschi, Bloch, et al., 1999; Moher, Jadad, and Tugwell, 1996).
Development and use of a new quality score would have required additional work for validation.
Identification of specific weaknesses in each study will be helpful in identifying trends, which in turn will assist with our recommendations for future research.
Our approach of describing key design components, rather than assigning a single aggregate score, is also consistent with recent recommendations from an expert panel on meta-analysis of observational studies (Stroup, Berlin, Morton, et al., 2000) and a recent review of the methodology of systematic reviews (Jüni, Altman, and Egger, 2001).
Summaries of the quality evaluation are provided in the evidence table entry for each abstracted article. A "+" indicates that a given criterion was met; a "−" signifies that the criterion was not met. The "+" and "−" notations were assigned by the primary abstractor and confirmed by the over-reader.
We employed quality-monitoring checks at every phase of the literature search, review, and data abstraction process to reduce bias, enhance consistency, and check the accuracy of screening:
Medical librarian review of the literature search strategy.
Review of literature search strategies by the advisory panel of technical experts.
Check on completeness of the literature search results through reference list checks by the screener of each article.
Reconciliation of all differences of opinion by reviewers on all full-text articles.
Agreement of two reviewers for all eligible studies.
Data abstractions completed by one investigator and reviewed (over-read) by another.
Additional checks of evidence table entries for completeness and accuracy by a nonphysician abstractor.
Solicitation of advice at key decision points from the advisory panel of technical experts.
To obtain additional information about possible racial and socioeconomic differences in the incidence and outcomes of prolonged pregnancy, we analyzed data from the 1997 Nationwide Inpatient Sample (NIS) (Nationwide Inpatient Sample [NIS], 1997). The NIS is part of AHRQ's Healthcare Cost and Utilization Project (HCUP) and collects discharge data from a stratified sample of approximately 20 percent of U.S. hospitals. Using ICD-9 codes, we divided all deliveries into preterm (644.2x), prolonged (645.x), and term (all other delivery codes). We examined differences in outcomes between coded ethnic groups (white, black, Hispanic, Asian/Pacific Islander, Native American, and "other") and by insurance status (Medicare, Medicaid, private/health maintenance organization, self-pay/no insurance, "no charge," and "other") within these categories.
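The ICD-9 grouping described above can be expressed as a simple prefix rule. The sketch below is illustrative only and is not the code used for the analysis: the function name, the assumption that each record has already been identified as a delivery, and the choice to let a preterm code take precedence when both code groups appear are all assumptions made for the example.

```python
def classify_delivery(icd9_codes):
    """Assign a delivery record to 'preterm', 'prolonged', or 'term'.

    Grouping follows the text: 644.2x -> preterm, 645.x -> prolonged,
    any other delivery -> term. The record is assumed to be a delivery.
    Precedence of preterm over prolonged is an assumption of this sketch.
    """
    # Accept codes written with or without the decimal point.
    normalized = [code.strip().replace(".", "") for code in icd9_codes]
    if any(code.startswith("6442") for code in normalized):
        return "preterm"
    if any(code.startswith("645") for code in normalized):
        return "prolonged"
    return "term"


# Hypothetical records, each a list of diagnosis codes for one delivery.
examples = [["644.21"], ["645.11", "656.81"], ["650"]]
groups = [classify_delivery(codes) for codes in examples]
print(groups)
```

Outcomes could then be tabulated within each of the three groups by coded ethnicity or insurance status, as described in the text.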
At the start of every evidence report project, we evaluate the feasibility of and need for meta-analyses, decision analyses, cost-effectiveness analyses, or a combination of all three. A decision about whether to proceed with such analyses is made based on the key questions and the state of the literature, after discussion with AHRQ and the advisory panel. We decided not to perform any supplemental analyses for this report for the following reasons:
Studies of diagnostic and screening tests were too heterogeneous in terms of outcomes assessed to allow meaningful combination.
Studies of individual induction agents did not provide sufficient specific information on women in the population of interest. As with diagnostic test studies, there was considerable heterogeneity in terms of outcomes reported.
We did not identify any significant trials comparing induction to expectant management published subsequent to the most recent Cochrane review (Crowley, 2000). We also did not identify any disagreements with the methods or conclusions of that meta-analysis that were significant enough to justify repeating the analysis.
Lack of adequate cost data precluded cost-effectiveness analysis.
Although a decision-analytic model would be an excellent method for exploring the tradeoffs involved in decisionmaking for management of prolonged pregnancy, the considerations discussed above meant that there would be considerable uncertainty surrounding key parameter estimates. While development of such a model even in the setting of widespread uncertainty has considerable value, our past experience with exploratory models in situations where the literature had similar limitations has been that they are of somewhat limited value in further explaining the specific findings of the report.
The approach used by the Cochrane Collaboration differs from ours primarily in the consistent use of meta-analytic techniques to provide summary estimates of the effectiveness and risks of interventions considered. As stated above, we concluded that the state of the literature either could not support meaningful quantitative synthesis relevant to the specific patient population being considered, or that repeating an already well-done meta-analysis (Crowley, 2000) would not be worthwhile. Where relevant Cochrane reviews exist, we have compared their findings and conclusions with our own. Any differences between our findings and Cochrane analyses may represent different inclusion/exclusion criteria, different patient populations considered, or differences in outcomes considered. We have attempted to identify these potential sources of disagreement wherever possible.
This chapter presents the results of our review, organized around the key questions.
Question 1: What are the test characteristics (reliability, sensitivity, specificity, predictive values) and costs of measures used in the management of prolonged pregnancy to (a) assess risks to the fetus and mother of prolonged pregnancy, and (b) assess the likelihood of a successful induction of labor?
In Chapter 1, we discussed the evidence for increasing risk of adverse outcomes, especially perinatal death, as gestational age advances beyond 40 weeks. Although this risk is small in absolute terms, the trend towards increasing risk with increasing gestational age is consistent across studies. One approach to preventing these adverse outcomes would be to use testing to identify patients most likely to experience them.
However, most of the published literature consists of case series or cohort studies in which there is little or no variation in testing strategies (or variation is not reported). Such studies are less useful but still may contain valuable information concerning the association of test results with fetal and maternal outcomes.
| Test | Number of articles |
|---|---|
| Amniotic fluid volume | 11 |
| Nonstress test | 11 |
| Fetal weight estimation by ultrasound | 5 |
| Biophysical profile - simple | 4 |
| Biophysical profile - complex | 2 |
| Doppler umbilical artery flow | 2 |
| Fetal breathing movements | 2 |
| Fetal heart rate (bradycardia, tachycardia, variation) | 2 |
| Fibronectin | 2 |
| Amniocentesis (meconium staining) | 1 |
| Cervical exam | 1 |
| Combination | 1 |
| Contraction stress test | 1 |
| Fetal motion on ultrasound | 1 |
| Fetal weight estimation - clinical exam | 1 |
| Kick counts | 1 |
We additionally sought data on the reliability of tests, including interobserver variation, when these were available. If a test result is not reproducible when the test is performed by different examiners, or by the same examiner on different occasions, then the utility of the test is reduced, even if the "average" test characteristics (sensitivity, specificity) imply useful discrimination or prediction.
In certain cases, the association of one test result with another was reported without reference to outcomes.
We did not identify any randomized trials in which women with prolonged gestation were randomly assigned to antepartum surveillance or no testing. Of four randomized trials of antepartum cardiotocography versus no surveillance in "high-risk" pregnancies (Brown, Sawers, Parsons, et al., 1982; Flynn, Kelly, Mansfield, et al., 1982; Kidd, Patel, and Smith, 1985; Lumley, Lester, Anderson, et al., 1983) -- also the subject of a systematic review by Pattison and McCowan (2001) -- only one (Flynn, Kelly, Mansfield, et al., 1982) included patients who were being followed explicitly for prolonged gestation (classified as "suspect postmaturity syndrome" in the paper). In this trial, 100 of 300 subjects were being followed for this indication. All patients received either outpatient ("at intervals of not more than 1 week") or inpatient ("at least twice per week") NSTs. Patients were randomized to two groups: in one, clinicians taking care of the patients knew the results of the NST, while in the other group, NST results were not revealed. Although quantitative data were not reported on this, it appears that the majority of the patients with prolonged gestation received outpatient testing between 41 and 42 weeks, when induction was scheduled.
Although results were not reported separately for women with prolonged gestation, there were no statistically significant differences in stillbirths, neonatal deaths, or other adverse neonatal outcomes between the two groups. However, patients in the group in which caregivers knew the results were significantly more likely to be discharged from the hospital before delivery and significantly more likely to receive outpatient care. There also were nonsignificant trends towards fewer antenatal inpatient days and fewer elective cesarean sections in the group whose caregivers were aware of their results.
In this study (Flynn, Kelly, Mansfield, et al., 1982), a nonreactive NST had 100 percent sensitivity for stillbirths with nonlethal congenital abnormalities and a specificity of 88 percent; positive predictive value was nine percent, and negative predictive value 100 percent. None of the deaths were in the prolonged pregnancy group. Test characteristics for surrogates of fetal compromise were less favorable. For fetal distress in labor, sensitivity was 37 percent, specificity 88 percent, positive predictive value 18 percent, and negative predictive value 93 percent. Similar trends were seen for meconium and admission to the neonatal intensive care unit: considerably lower sensitivity than specificity, poor positive predictive value, and good negative predictive value. These findings suggest that the effects on management observed in this trial (a consistent trend towards less aggressive observational strategies in the group where the results were revealed to clinicians) reflect clinically appropriate interpretation of the test results. The high negative predictive values are evidence that a normal test does provide reassurance. Unfortunately, the paper does not allow estimation of test characteristics in the specific population of interest for this report, patients with prolonged pregnancy and no other risk factors.
We did identify two retrospective concurrent cohort studies comparing testing and no testing in women with prolonged pregnancy (Bochner, Williams, Castro, et al., 1988; Fleischer, Schulman, Farmakides, et al., 1985). Fleischer, et al., reported a retrospective cohort study comparing 228 women who had weekly NST monitoring beginning at 41 weeks with 30 women who had no antenatal monitoring (Fleischer, Schulman, Farmakides, et al., 1985). Reasons for women not receiving testing were not specified. Despite the small sample size of the no-testing group, the investigators observed significant differences in most of the outcome variables they reported, including low Apgar score (< 7) at 1 and 5 minutes, neonatal intensive care unit (NICU) admission rates, stillbirth rates, and cesarean section for fetal distress. The small sample of women with no monitoring, the retrospective nature of the study design, and the unusually high rates of adverse fetal and maternal outcomes all suggest that the no-testing group in this study may be dissimilar to the NST monitoring group in other ways besides whether an antenatal NST was conducted. This potential confounding probably exaggerates the effectiveness of NST monitoring.
Bochner, et al., described a comparison of large concurrent cohorts of women who underwent antenatal testing with amniotic fluid volume (AFV) and nonstress testing beginning at week 41 or 42 and those with no antenatal testing (Bochner, Williams, Castro, et al., 1988). They found an association between testing and the total number of adverse outcomes (testing, 0/512; no testing, 13/1807 [0.7 percent]; p < 0.05) and a trend toward a higher rate of cesarean section for fetal distress in the no-testing cohort (testing, 14/512 [2.7 percent]; no testing, 60/1807 [3.3 percent]; p = 0.07). When the results of testing were compared in the groups beginning testing at 41 weeks (n = 908) and those at 42 weeks (n = 352), the positive predictive value for a diagnosis of intrapartum fetal distress was significantly higher at 42 weeks (21.1 percent at 42 weeks vs. 11.9 percent at 41 weeks), with a concomitantly lower negative predictive value (98.5 percent at 42 weeks vs. 99.1 percent at 41 weeks). This is consistent with an overall increased risk of adverse outcomes with increasing gestational age, assuming that the sensitivity and specificity of the test are independent of gestational age (more on this below). It is unclear why the no-testing group did not receive testing, since women with "high risk factors" were excluded, and inclusion criteria required that women be seen prior to 20 weeks. Again, the possibility of confounding cannot be ruled out.
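The dependence of predictive values on the underlying risk can be made explicit. As a sketch, let p denote the proportion of tested pregnancies that go on to have the outcome (here, intrapartum fetal distress), and let Se and Sp denote the sensitivity and specificity of the test; these symbols are introduced for illustration and do not appear in the cited study.

```latex
\[
\mathrm{PPV}(p) = \frac{\mathrm{Se}\,p}{\mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)},
\qquad
\mathrm{NPV}(p) = \frac{\mathrm{Sp}\,(1-p)}{\mathrm{Sp}\,(1-p) + (1-\mathrm{Se})\,p}
\]
\[
\frac{d\,\mathrm{PPV}}{dp}
  = \frac{\mathrm{Se}\,(1-\mathrm{Sp})}{\bigl[\mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)\bigr]^{2}} \ge 0,
\qquad
\frac{d\,\mathrm{NPV}}{dp}
  = -\,\frac{\mathrm{Sp}\,(1-\mathrm{Se})}{\bigl[\mathrm{Sp}\,(1-p) + (1-\mathrm{Se})\,p\bigr]^{2}} \le 0
\]
```

With Se and Sp held fixed, a higher outcome rate at 42 weeks than at 41 weeks therefore raises the positive predictive value and lowers the negative predictive value, which is the pattern reported.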
In summary, it is difficult to draw conclusions about the effectiveness of antepartum testing compared with no testing in prolonged pregnancy. The only randomized trial comparing testing with no testing is limited by a heterogeneous population (in terms of other risk factors), relatively small numbers of patients with prolonged pregnancy alone, failure to report results separately by indication for testing, and questions about the applicability of the results to current practice (Pattison and McCowan, 2001). The two nonrandomized studies identified suggest an excess risk of adverse outcomes in unmonitored pregnancies, but the failure to characterize the groups studied makes it impossible to rule out other factors as the cause of this excess risk.
We identified only one study that assessed the association of maternal sensation of fetal movement with postmaturity syndrome, defined as characteristic skin changes (desquamation, leather-like consistency, little subcutaneous fat) and a "long, lean body," with a ponderal index (weight in grams x 100 / length in centimeters cubed) of 2.27 or less (10th percentile or less). Rayburn, et al., tested a group of 147 women at 42 weeks or more gestational age using the NST plus fetal movement charting plus urine estrogen-to-creatinine ratio (Rayburn, Motley, Stempel, et al., 1982). These tests were performed semi-weekly or weekly. If the NST was reactive (two adequate accelerations of baseline fetal heart rate [FHR] during a 20- to 40-minute period), then it was repeated on the next visit. If the NST was nonreactive, then the test was either repeated or a contraction stress test (CST) was given on the same day. Of the 147 cases studied, 32, or 22 percent, had postmaturity syndrome. However, none of the mothers recording kick counts noted reduced fetal movement (sensitivity, 0/32 [0 percent]; specificity, 115/115 [100 percent]). The kick count measure was not useful for predicting postmaturity syndrome, with an undefined positive predictive value (no test was positive) and a negative predictive value of 78 percent. No studies documenting the reliability of this method (such as correlation between maternal sensation of movement and observed movements on ultrasound) were identified.
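The test characteristics quoted for kick counts follow directly from the 2-by-2 table implied by the reported counts. The sketch below is illustrative: the function names are hypothetical, the cell counts are taken from the text (32 affected infants, none flagged; 115 unaffected infants, none flagged), and an undefined ratio is represented as `None`.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2-by-2 cell counts.

    A ratio whose denominator is zero is undefined and returned as None.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def ponderal_index(weight_g, length_cm):
    """Ponderal index as defined in the text: weight (g) x 100 / length (cm)^3."""
    return weight_g * 100 / length_cm ** 3


# Kick counts versus postmaturity syndrome, counts as reported in the text.
kick_counts = two_by_two_metrics(tp=0, fp=0, fn=32, tn=115)
print(kick_counts)
```

Because no mother reported reduced movement, there are no positive tests, so the positive predictive value has a zero denominator; the negative predictive value is simply the proportion of unaffected infants among all 147 cases.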
In summary, there are no data to suggest that maternal sensation of fetal movement is useful in predicting which infants are affected by postmaturity syndrome. There are no data at all to allow evaluation of maternal sensation of fetal movement as a predictor of other adverse outcomes associated with prolonged gestation.
We identified one randomized trial enrolling 287 patients comparing the NST alone with a simple biophysical profile (NST plus AFV, supplemented by estimates of fetal weight and placental function) (Arias, 1987). In this trial, 44 of 217 patients had abnormal results on antenatal testing, 14/112 in the NST alone group and 30/105 in the NST + AFV group. There were no significant differences in any outcome, including fetal distress or cesarean section for fetal distress, though slightly more inductions and cesarean sections for fetal distress occurred in the biophysical profile arm. Test characteristics of other components of this combination of tests (ultrasound for fetal weight alone, ultrasound for placental function alone, or ultrasound for AFV alone) were not reported. Sensitivity was similar for NST alone and NST + AFV; however, specificity was higher for NST alone than for NST + AFV. This study was rated positively for 9 of 12 quality assessment items, failing items for sample size and statistical analysis.
| Outcome | Number of 2-by-2 Tables |
|---|---|
| 1-minute Apgar score | 6 |
| 5-minute Apgar score | 7 |
| Any | 2 |
| Cesarean section for fetal distress | 5 |
| Fetal distress | 3 |
| Fetal growth restriction | 1 |
| Macrosomia | 2 |
| Meconium aspiration | 2 |
| Meconium staining | 1 |
| Neonatal mortality | 3 |
| NICU Admission | 3 |
| pH | 1 |
| Postmaturity syndrome | 2 |
| Stillbirth | 2 |
NICU = Neonatal intensive care unit
| Study | Result | Rate of Abnormal Tests | Outcome | Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|---|
| Fleischer, Schulman, Farmakides, et al., 1985 | Inconclusive | 0.17 | 1-min Apgar | < 7 | 0.07 | 0.40 | 0.85 | 0.16 | 0.95 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Abnormal | 0.04 | 1-min Apgar | < 7 | 0.07 | 0.27 | 0.97 | 0.40 | 0.95 |
| Tongsong and Srisomboon, 1993 | Abnormal | 0.20 | 1-min Apgar | < 7 | 0.07 | 0.41 | 0.81 | 0.14 | 0.95 |
| Eden, Gergely, Schifrin, et al., 1982 | Abnormal | 0.13 | 1-min Apgar | < 7 | 0.15 | 0.42 | 0.92 | 0.50 | 0.90 |
| Small, Phelan, Smith, et al., 1987 | Abnormal | 0.11 | 1-min Apgar | < 7 | 0.18 | 0.13 | 0.90 | 0.22 | 0.82 |
| Phelan, Platt, Yeh, et al., 1984 | Abnormal | 0.13 | 1-min Apgar | < 7 | 0.20 | 0.23 | 0.89 | 0.34 | 0.83 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Inconclusive | 0.17 | 5-min Apgar | < 7 | 0.03 | 0.29 | 0.84 | 0.05 | 0.97 |
| Tongsong and Srisomboon, 1993 | Abnormal | 0.20 | 5-min Apgar | < 7 | 0.02 | 0.33 | 0.80 | 0.04 | 0.98 |
| Small, Phelan, Smith, et al., 1987 | Abnormal | 0.11 | 5-min Apgar | < 7 | 0.02 | 0.11 | 0.89 | 0.02 | 0.98 |
| Devoe and Sholl, 1983 | Abnormal | 0.16 | 5-min Apgar | < 7 | 0.03 | 0.00 | 0.84 | 0.00 | 0.97 |
| Eden, Gergely, Schifrin, et al., 1982 | Abnormal | 0.13 | 5-min Apgar | < 7 | 0.10 | 0.50 | 0.91 | 0.40 | 0.94 |
| Phelan, Platt, Yeh, et al., 1984 | Abnormal | 0.13 | 5-min Apgar | < 7 | 0.03 | 0.33 | 0.87 | 0.06 | 0.98 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Abnormal | 0.04 | 5-min Apgar | < 7 | 0.03 | 0.14 | 0.96 | 0.10 | 0.97 |
| Weiner, Reichler, Zlozover, et al., 1993 | Abnormal | 0.00 | Any | Yes | 0.08 | 0.08 | 0.95 | 0.14 | 0.92 |
| Arias, 1987 | Abnormal | 0.13 | Any | Yes | 0.16 | 0.33 | 0.91 | 0.43 | 0.88 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Inconclusive | 0.17 | C-section for FD | Yes | 0.11 | 0.42 | 0.87 | 0.29 | 0.92 |
| Small, Phelan, Smith, et al., 1987 | Abnormal | 0.11 | C-section for FD | Yes | 0.04 | 0.21 | 0.90 | 0.08 | 0.96 |
| Phelan, Platt, Yeh, et al., 1984 | Abnormal | 0.13 | C-section for FD | Yes | 0.05 | 0.31 | 0.88 | 0.13 | 0.96 |
| Farmakides, Schulman, Winter, et al., 1988 | Abnormal | 0.43 | C-section for FD | Yes | 0.28 | 0.67 | 0.66 | 0.43 | 0.84 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Abnormal | 0.04 | C-section for FD | Yes | 0.11 | 0.23 | 0.98 | 0.60 | 0.91 |
| Tongsong and Srisomboon, 1993 | Abnormal | 0.20 | Fetal distress | Yes | 0.04 | 0.64 | 0.82 | 0.14 | 0.98 |
| Weiner, Farmakides, Schulman, et al., 1994 | Abnormal | 0.00 | Fetal distress | Yes | 0.12 | 0.28 | 1.00 | 0.92 | 0.91 |
| Devoe and Sholl, 1983 | Abnormal | 0.16 | Fetal distress | Yes | 0.17 | 0.28 | 0.87 | 0.31 | 0.85 |
| Farmakides, Schulman, Winter, et al., 1988 | Abnormal | 0.43 | FGR | Not defined | 0.11 | 0.60 | 0.59 | 0.15 | 0.93 |
| Small, Phelan, Smith, et al., 1987 | Abnormal | 0.11 | Macrosomia | > 4000 g | 0.21 | 0.05 | 0.88 | 0.10 | 0.78 |
| Phelan, Platt, Yeh, et al., 1984 | Abnormal | 0.13 | Macrosomia | > 4000 g | 0.22 | 0.08 | 0.85 | 0.13 | 0.77 |
| Eden, Gergely, Schifrin, et al., 1982 | Abnormal | 0.13 | Meconium aspiration | Yes | 0.08 | 0.17 | 0.88 | 0.10 | 0.93 |
| Phelan, Platt, Yeh, et al., 1984 | Abnormal | 0.13 | Meconium aspiration | Yes | 0.08 | 0.26 | 0.88 | 0.16 | 0.93 |
| Ramrekersingh-White, Farkas, Chard, et al., 1993 | Abnormal | 0.10 | Meconium staining | Yes | 0.09 | 0.33 | 0.93 | 0.31 | 0.93 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Abnormal | 0.04 | Neonatal mortality | Yes | 0.01 | 1.00 | 0.97 | 0.30 | 1.00 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Inconclusive | 0.17 | Neonatal mortality | Yes | 0.01 | 1.00 | 0.84 | 0.08 | 1.00 |
| Weiner, Farmakides, Schulman, et al., 1994 | Abnormal | 0.04 | Neonatal mortality | Yes | 0.06 | 0.50 | 0.97 | 0.08 | 1.00 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Abnormal | 0.04 | NICU Admission | Yes | 0.03 | 0.57 | 0.97 | 0.40 | 0.99 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Inconclusive | 0.17 | NICU Admission | Yes | 0.03 | 0.57 | 0.85 | 0.11 | 0.98 |
| Farmakides, Schulman, Winter, et al., 1988 | Abnormal | 0.43 | NICU Admission | Yes | 0.17 | 0.54 | 0.59 | 0.22 | 0.86 |
| Weiner, Farmakides, Schulman, et al., 1994 | Abnormal | 0.04 | pH | < 7.2 | 0.03 | 0.70 | 0.98 | 0.58 | 0.99 |
| Small, Phelan, Smith, et al., 1987 | Abnormal | 0.11 | Postmaturity syndrome | Yes | 0.07 | 0.13 | 0.89 | 0.08 | 0.93 |
| Phelan, Platt, Yeh, et al., 1984 | Abnormal | 0.13 | Postmaturity syndrome | Yes | 0.17 | 0.10 | 0.86 | 0.13 | 0.83 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Inconclusive | 0.17 | Stillbirth | Yes | 0.09 | 0.00 | 0.83 | 0.00 | 0.99 |
| Fleischer, Schulman, Farmakides, et al., 1985 | Abnormal | 0.04 | Stillbirth | Yes | 0.09 | 0.00 | 0.96 | 0.00 | 0.99 |
C-section = Cesarean section
FD = Fetal distress
FGR = Fetal growth restriction
NICU = Neonatal intensive care unit
| Study | Screening Test Threshold | Rate of Abnormal Tests | Outcome | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|---|
| Rayburn, Motley, Stempel, et al., 1982 | Abnormal | 0.02 | Postmaturity syndrome | Yes | 0.22 | 0.09 | 1.00 | 1.00 | 0.80 |
| Sherer, Onyeije, Binder, et al., 1998 | Bradycardia | 0.26 | 5-min Apgar | < 7 | 0.23 | 0.21 | 0.72 | 0.18 | 0.75 |
| Sherer, Onyeije, Binder, et al., 1998 | Bradycardia | 0.26 | Meconium aspiration | Yes | 0.06 | 0.25 | 0.74 | 0.05 | 0.94 |
| Sherer, Onyeije, Binder, et al., 1998 | Bradycardia | 0.26 | NICU Admission | Yes | 0.11 | 0.35 | 0.75 | 0.14 | 0.91 |
| Sherer, Onyeije, Binder, et al., 1998 | Tachycardia | 0.32 | 5-min Apgar | < 7 | 0.18 | 0.41 | 0.70 | 0.23 | 0.85 |
| Sherer, Onyeije, Binder, et al., 1998 | Tachycardia | 0.32 | Meconium aspiration | Yes | 0.03 | 0.67 | 0.69 | 0.06 | 0.98 |
| Sherer, Onyeije, Binder, et al., 1998 | Tachycardia | 0.32 | NICU Admission | Yes | 0.05 | 0.40 | 0.68 | 0.06 | 0.95 |
NICU = Neonatal intensive care unit
In summary, results of these studies suggest that a reactive nonstress test in prolonged pregnancy has good negative predictive value -- i.e., adverse outcomes are unlikely to occur in the setting of a reactive nonstress test -- but that the positive predictive values are low. Data from the one randomized trial comparing weekly NST beginning beyond 40 weeks to NST and amniotic fluid assessment suggest equivalent outcomes.
Knox, et al., compared the CST using oxytocin with amniocentesis for meconium staining in 187 women at 42 weeks gestation (Knox, Huddleston, and Flowers, 1979). The study was prospective, with women assigned to groups according to the last digit of hospital number. Amniocentesis was obtained on all women at entry into the study, and labor was induced immediately if meconium staining was observed. If no meconium staining was present on initial amniocentesis, then subsequent monitoring was as follows: women in the amniocentesis group received weekly amniocentesis and were induced if meconium staining was present; and women in the CST group received an immediate CST, repeated weekly if normal. Labor was induced in significantly more women in the amniocentesis group than the CST group (11/90 [12 percent] vs. 29/90 [2 percent], respectively; p < 0.005). There were no statistically significant differences between testing groups for any outcome, including Apgar score < 7 at 1 minute, Apgar score < 7 at 5 minutes, low birthweight (< 10th percentile), neonatal morbidity, perinatal death, cesarean sections, or abnormal labor (prolonged latent phase, primary dysfunctional labor, secondary arrest of dilatation, or arrest). However, the proportion of babies with Apgar scores less than 7 at 1 and 5 minutes was two-fold higher in the amniocentesis group; the study may have been underpowered to detect this difference.
| Study | Screening Test Threshold | Rate of Abnormal Tests | Outcome | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|---|
| Devoe and Sholl, 1983 | Abnormal | 0.09 | Fetal distress | Yes | 0.17 | 0.14 | 0.92 | 0.27 | 0.84 |
| Devoe and Sholl, 1983 | Abnormal | 0.09 | 5-min Apgar | <7 | 0.03 | 0.00 | 0.91 | 0.00 | 0.97 |
In summary, CST is at least equivalent to amniocentesis for meconium staining in terms of outcomes, with significantly fewer inductions; perhaps on the basis of this trial, amniocentesis is no longer used for this indication. In the setting of prolonged pregnancy, CST, when used sequentially for followup of abnormal NST, has good negative predictive value but poor positive predictive value, based on one observational study.
We did not identify any studies where nipple stimulation was the sole method for performing contraction stress tests in the management of prolonged pregnancy.
We identified one relevant randomized trial. Alfirevic, et al., compared two ultrasonographic measurements of oligohydramnios, namely amniotic fluid index (AFI) < 7.3 and maximum pool depth (MPD) < 2.1 cm, among 500 women at greater than 40 weeks gestation (Alfirevic, Luckas, Walkinshaw, et al., 1997). Both groups also had NST every 3 days. There were no differences in fetal outcomes between the two strategies; however, abnormal NST was more often an indication for induction in the AFI group than in the MPD group (15 percent vs. 8 percent; p = 0.04). The overall rates of induction of labor were not statistically different between groups (87/250 vs. 77/250; p = 0.39). There was a trend toward cesarean section for fetal distress being more common in the AFI group than in the MPD group (8 percent vs. 4 percent; p = 0.09). One possible explanation for this is a lower threshold for a diagnosis of fetal distress or for performing cesarean section in the presence of nonreassuring fetal heart rate tracings or abnormal antepartum NST results. Since such results were more common in the AFI group, it is not surprising that cesareans for fetal distress also were more common.
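The comparison of overall induction rates above can be checked with a simple two-proportion test. The sketch below uses a pooled normal approximation with an optional continuity correction; the report does not state which test the trial authors used, so the function `two_proportion_p_value` and its defaults are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int,
                           continuity: bool = True) -> float:
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    if continuity:
        # Yates-style correction; never let it push the difference below zero.
        diff = max(0.0, diff - 0.5 * (1 / n1 + 1 / n2))
    z = diff / se
    return 2 * (1 - NormalDist().cdf(z))


# Induction counts quoted in the text for the two measurement groups.
p_corrected = two_proportion_p_value(87, 250, 77, 250)
p_uncorrected = two_proportion_p_value(87, 250, 77, 250, continuity=False)
print(f"with continuity correction: p = {p_corrected:.2f}")
print(f"without correction:         p = {p_uncorrected:.2f}")
```

With the continuity correction, the result is close to the reported p = 0.39. Either way, the difference in induction rates is far from conventional significance.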
In a comparative cohort study, Eden, et al., reported a series of 585 patients managed in one of three ways (based on temporal changes in the protocol used): (1) weekly NST with CST for nonreactive NST (from November 1, 1978 through August 31, 1979); (2) semi-weekly NST with biophysical profile for nonreactive NST (from September 1, 1979 through December 31, 1980); or (3) semi-weekly NST with biophysical profile for nonreactive NST, plus weekly AFV measurement (from January 1, 1981 through August 31, 1981) (Eden, Gergely, Schifrin, et al., 1982). The groups employing the biophysical profile had lower incidences of low Apgar score at 5 minutes, meconium aspiration, stillbirth, fetal distress requiring intervention (persistent abnormal FHR patterns), and morbidity (defined as presence of any of following: fetal distress requiring intervention, 5-minute Apgar score < 7, neonatal resuscitation, postmaturity syndrome, or meconium aspiration). However, the rate of cesarean sections was significantly higher in the groups using the biophysical profile than in the group using NST + CST alone (NST + CST, 11.5 percent; NST + biophysical profile, 29.9 percent; NST + AFV + biophysical profile, 29.4 percent; 1 vs. 2, p < 0.05; 1 vs. 3, p < 0.05). This suggests that tests using the biophysical profile may be more sensitive at identifying fetuses at risk, but that subsequent induction resulted in higher cesarean section rates. Alternatively, as discussed above, physician thresholds for performing cesarean section may be quite different based on knowledge of antepartum test results. Despite the higher rates of cesarean section, the incidence of fetal distress requiring intervention was substantially lower in the groups using biophysical profile testing in addition to NST (NST + CST, 21.8 percent; NST + biophysical profile, 4.5 percent; NST + AFV + biophysical profile, 5.5 percent; 1 vs. 2, p < 0.05; 1 vs. 3, p < 0.05).
Tongsong and Srisomboon (1993) performed NST and AFV in 242 women at 42 weeks or more in gestational age. AFV was more accurate than NST in predicting intrapartum fetal distress (p < 0.05) (AFV: sensitivity, 73 percent; specificity, 91 percent; positive predictive value, 27 percent; negative predictive value, 99 percent; NST: sensitivity, 64 percent; specificity, 82 percent; positive predictive value, 14 percent; negative predictive value, 98 percent). Given that the definition of intrapartum fetal distress included moderate to severe variable decelerations, which would be more likely in a setting of oligohydramnios, which in turn would be more likely to be detected with ultrasound, these results are not surprising.
| Study | Screening Test Threshold | Rate of Abnormal Tests | Outcome | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|---|
| Montan and Malcus, 1995 | < 5 cm | 0.13 | 1-min Apgar | < 7 | 0.03 | 0.00 | 0.87 | 0.00 | 0.96 |
| Tongsong and Srisomboon, 1993 | Abnormal | 0.12 | 1-min Apgar | < 7 | 0.07 | 0.47 | 0.91 | 0.27 | 0.96 |
| O'Reilly-Green and Divon, 1996 | < 5 cm | 0.11 | 1-min Apgar | < 8 | 0.15 | 0.08 | 0.88 | 0.10 | 0.85 |
| Phelan, Platt, Yeh, et al., 1985 | < 1 cm | 0.03 | 1-min Apgar | < 7 | 0.21 | 0.12 | 0.99 | 0.86 | 0.81 |
| Phelan, Platt, Yeh, et al., 1985 | Decreased (subjective) | 0.19 | 1-min Apgar | < 7 | 0.21 | 0.37 | 0.86 | 0.40 | 0.84 |
| Eden, Gergely, Schifrin, et al., 1982 | Abnormal | 0.28 | 1-min Apgar | < 7 | 0.11 | 0.52 | 0.75 | 0.21 | 0.93 |
| O'Reilly-Green and Divon, 1996 | < 5 cm | 0.11 | 5-min Apgar | < 9 | 0.05 | 0.08 | 0.89 | 0.04 | 0.94 |
| Montan and Malcus, 1995 | < 5 cm | 0.13 | 5-min Apgar | < 7 | 0.02 | 0.00 | 0.87 | 0.00 | 0.97 |
| Tongsong and Srisomboon, 1993 | Abnormal | 0.12 | 5-min Apgar | < 7 | 0.02 | 0.33 | 0.89 | 0.07 | 0.98 |
| Sarkar and Duthie, 1997 | Decreased | 0.10 | 5-min Apgar | < 7 | 0.05 | 0.00 | 0.90 | 0.00 | 0.95 |
| Phelan, Platt, Yeh, et al., 1985 | < 1 cm | 0.03 | 5-min Apgar | < 7 | 0.03 | 0.25 | 0.98 | 0.29 | 0.97 |
| Phelan, Platt, Yeh, et al., 1985 | Decreased (subjective) | 0.19 | 5-min Apgar | < 7 | 0.03 | 0.38 | 0.82 | 0.07 | 0.97 |
| Eden, Gergely, Schifrin, et al., 1982 | Abnormal | 0.28 | 5-min Apgar | < 7 | 0.04 | 0.88 | 0.74 | 0.13 | 0.99 |
| O'Reilly-Green and Divon, 1996 | < 5 cm | 0.11 | Any | Yes | 0.06 | 0.14 | 0.89 | 0.08 | 0.94 |
| Leveno, Quirk, Cunningham, et al., 1984 | Abnormal | 0.39 | Any | Yes | 0.01 | 1.00 | 0.61 | 0.02 | 1.00 |
| Weiner, Reichler, Zlozover, et al., 1993 | Decreased | 0.08 | Any | Yes | 0.08 | 0.25 | 0.94 | 0.27 | 0.93 |
| Montan and Malcus, 1995 | < 5 cm | 0.13 | C-section | Yes | 0.14 | 0.08 | 0.87 | 0.09 | 0.86 |
| Tongsong and Srisomboon, 1993 | < 3 cm | 0.12 | C-section for FD | Yes | 0.04 | 0.73 | 0.91 | 0.27 | 0.99 |
| Leveno, Quirk, Cunningham, et al., 1984 | Abnormal | 0.39 | C-section for FD | Yes | 0.08 | 0.61 | 0.63 | 0.13 | 0.95 |
| Phelan, Platt, Yeh, et al., 1985 | Oligohydramnios | 0.03 | C-section for FD | Yes | 0.06 | 0.23 | 0.98 | 0.43 | 0.96 |
| Phelan, Platt, Yeh, et al., 1985 | Decreased (subjective) | 0.19 | C-section for FD | Yes | 0.06 | 0.69 | 0.84 | 0.20 | 0.98 |
| Sarkar and Duthie, 1997 | Decreased | 0.10 | C-section for FD | Yes | 0.20 | 0.17 | 0.92 | 0.33 | 0.82 |
| Monaghan, O'Herlihy, and Boylan, 1987 | Abnormal | 0.16 | C-section for FD | Yes | 0.02 | 0.33 | 0.84 | 0.03 | 0.99 |
| Crowley, O'Herlihy, and Boylan, 1984 | < 3 cm | 0.19 | C-section for FD | Yes | 0.03 | 0.78 | 0.82 | 0.11 | 0.99 |
| Leveno, Quirk, Cunningham, et al., 1984 | Abnormal | 0.39 | FGR | Yes | 0.08 | 0.50 | 0.61 | 0.10 | 0.94 |
| Monaghan, O'Herlihy, and Boylan, 1987 | Abnormal | 0.16 | FGR | < 10th centile | 0.12 | 0.48 | 0.88 | 0.34 | 0.93 |
| Sarkar and Duthie, 1997 | Decreased | 0.10 | FGR | Yes | 0.01 | 1.00 | 0.91 | 0.11 | 1.00 |
| Crowley, O'Herlihy, and Boylan, 1984 | < 3 cm | 0.19 | FGR | < 10th centile | 0.11 | 0.46 | 0.84 | 0.26 | 0.93 |
| Sarkar and Duthie, 1997 | Decreased | 0.10 | Intubation | Yes | 0.03 | 0.00 | 0.90 | 0.00 | 0.97 |
| Phelan, Platt, Yeh, et al., 1985 | Decreased (subjective) | 0.19 | Macrosomia | > 4000 g | 0.22 | 0.12 | 0.79 | 0.13 | 0.76 |
| Phelan, Platt, Yeh, et al., 1985 | < 1 cm | 0.03 | Macrosomia | > 4000 g | 0.22 | 0.00 | 0.96 | 0.00 | 0.77 |
| Eden, Gergely, Schifrin, et al., 1982 | Abnormal | 0.28 | Meconium aspiration | Yes | 0.03 | 0.80 | 0.73 | 0.08 | 0.99 |
| Monaghan, O'Herlihy, and Boylan, 1987 | Abnormal | 0.16 | Neonatal mortality | Yes | 0.01 | 0.00 | 0.84 | 0.00 | 0.99 |
| Sarkar and Duthie, 1997 | Decreased | 0.10 | NICU admission | Yes | 0.01 | 1.00 | 0.91 | 0.06 | 1.00 |
| Monaghan, O'Herlihy, and Boylan, 1987 | Abnormal | 0.16 | NICU admission | Yes | 0.09 | 0.17 | 0.84 | 0.09 | 0.91 |
| Crowley, O'Herlihy, and Boylan, 1984 | < 3 cm | 0.19 | NICU admission | Yes | 0.07 | 0.38 | 0.82 | 0.14 | 0.94 |
| Monaghan, O'Herlihy, and Boylan, 1987 | Abnormal | 0.16 | pH | < 7.25 | 0.07 | 0.23 | 0.84 | 0.09 | 0.94 |
| Rayburn, Motley, Stempel, et al., 1982 | Oligohydramnios | 0.56 | Postmaturity syndrome | Yes | 0.22 | 0.91 | 0.54 | 0.35 | 0.95 |
| Rayburn, Motley, Stempel, et al., 1982 | Pockets | 0.20 | Postmaturity syndrome | Yes | 0.22 | 0.75 | 0.96 | 0.83 | 0.93 |
C-section = Cesarean section; FD = Fetal distress; FGR = Fetal growth restriction; NICU = Neonatal intensive care unit
As part of an investigation of the value of ultrasound evaluation of amniotic fluid volume in predicting adverse outcomes, Crowley, et al., also evaluated the performance of clinical assessment of AFV by abdominal palpation. This technique had a false positive rate of 25 percent and a false negative rate of 43 percent for predicting "significant meconium staining or absent amniotic fluid" at the time of amniotomy (Crowley, O'Herlihy, and Boylan, 1984).
| Study | NST | Amniotic Fluid Measurement: Maximum Pool Depth | Amniotic Fluid Measurement: Amniotic Fluid Index | Fetal Breathing Movements | Fetal Tone/Movements | Fetal Body Measurements | Uterine Artery Resistance | Fetal Dopplers | Fetal Reflexes (Magnitude and Speed) | Placental Grading |
|---|---|---|---|---|---|---|---|---|---|---|
| Alfirevic and Walkinshaw, 1995: "Simple" | X | X | ||||||||
| Alfirevic and Walkinshaw, 1995: "Complex" | X | X | X | X | X | |||||
| Arias, 1987 | X | X | X | X | ||||||
| Bochner, Medearis, Ross, et al., 1987 | X | X | X | |||||||
| Bochner, Williams, Castro, et al., 1988 | X | X | X | |||||||
| Brar, Horenstein, Medearis, et al., 1989 | X | X | X | Umbilical | ||||||
| Eden, Gergely, Schifrin, et al., 1982: "modified biophysical profile" | X | X | X | X | ||||||
| Arabin, Snyjders, Mohnhaupt, et al., 1993: "traditional biophysical profile" | X | X | X | X | ||||||
| Arabin, Snyjders, Mohnhaupt, et al., 1993: "fetal assessment score" | X | X | X | Carotid | X | |||||
| Hann, McArdle, and Sachs, 1987 | X | X | X | X | X | |||||
| Gilson, O'Brien, Vera, et al., 1988 | X | X | X | X | ||||||
NST = Nonstress test
| Study | Rate of Abnormal Tests | Outcome | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|
| Bochner, Medearis, Ross, et al., 1987 | 0.23 | 1-min Apgar | < 7 | 0.39 | 0.25 | 0.79 | 0.43 | 0.63 |
| Bochner, Medearis, Ross, et al., 1987 | 0.23 | 5-min Apgar | < 7 | 0.02 | 1.00 | 0.79 | 0.07 | 1.00 |
| Brar, Horenstein, Medearis, et al., 1989 | 0.42 | 5-min Apgar | < 7 | 0.18 | 0.88 | 0.68 | 0.37 | 0.96 |
| Arias, 1987 | 0.29 | Any | Yes | 0.39 | 0.37 | 0.77 | 0.50 | 0.65 |
| Bochner, Medearis, Ross, et al., 1987 | 0.23 | C-section for FD | Yes | 0.21 | 0.85 | 0.94 | 0.79 | 0.96 |
| Brar, Horenstein, Medearis, et al., 1989 | 0.42 | C-section for FD | Yes | 0.29 | 0.69 | 0.69 | 0.47 | 0.85 |
| Bochner, Williams, Castro, et al., 1988 | 0.22 | Fetal distress | Yes | 0.06 | 0.81 | 0.82 | 0.22 | 0.99 |
| Bochner, Williams, Castro, et al., 1988 | 0.15 | Fetal distress | Yes | 0.03 | 0.70 | 0.87 | 0.12 | 0.99 |
| Bochner, Medearis, Ross, et al., 1987 | 0.23 | FGR | < 10th percentile | 0.11 | 0.29 | 0.78 | 0.14 | 0.90 |
| Bochner, Medearis, Ross, et al., 1987 | 0.23 | Meconium aspiration | Yes | 0.05 | 0.33 | 0.78 | 0.07 | 0.96 |
| Bochner, Medearis, Ross, et al., 1987 | 0.23 | Neonatal mortality | Yes | 0.00 | Undefined | 0.77 | 0.00 | 1.00 |
| Brar, Horenstein, Medearis, et al., 1989 | 0.42 | NICU Admission | Yes | 0.13 | 0.83 | 0.64 | 0.26 | 0.96 |
C-section = Cesarean section
FD = Fetal distress
FGR = Fetal growth restriction
NICU = Neonatal intensive care unit
| Study | Test Threshold | Rate of Abnormal Tests | Outcome | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|---|
| Arabin, Snyjders, Mohnhaupt, et al., 1993 | Multiple | Varies | C-section for fetal distress | Yes | 0.35 | NR | NR | NR | NR |
| Arabin, Snyjders, Mohnhaupt, et al., 1993 | Multiple | Varies | 1-min Apgar | < 7 | 0.09 | NR | NR | NR | NR |
| Arabin, Snyjders, Mohnhaupt, et al., 1993 | Multiple | Varies | pH | Yes | 0.08 | NR | NR | NR | NR |
| Gilson, O'Brien, Vera, et al., 1988 | < 8 | 0.11 | 1-min Apgar | < 7 | 0.20 | 0.08 | 0.88 | 0.14 | 0.79 |
| Gilson, O'Brien, Vera, et al., 1988 | < 8 | 0.20 | 5-min Apgar | < 7 | 0 | Undefined | 0.80 | 0 | 1 |
| Hann, McArdle, and Sachs, 1987 | < 6 | 0.06 | Any | Yes | 0.05 | 0.14 | 0.94 | 0.13 | 0.95 |
| Gilson, O'Brien, Vera, et al., 1988 | < 8 | 0.02 | C-section for fetal distress | Yes | 0.20 | 0.08 | 0.99 | 0.67 | 0.81 |
| Gilson, O'Brien, Vera, et al., 1988 | < 8 | 0.12 | Fetal distress | Yes | 0.20 | 0.15 | 0.89 | 0.27 | 0.81 |
| Gilson, O'Brien, Vera, et al., 1988 | < 8 | 0.10 | Postmaturity syndrome | Yes | 0.20 | 0.27 | 0.94 | 0.54 | 0.83 |
C-section = Cesarean section
NR = Not reported
Arabin, Snyjders, Mohnhaupt, et al. (1993) compared the predictive ability of a biophysical profile consisting of NST, amniotic fluid assessment, fetal tone, fetal movements, and fetal breathing to a novel fetal assessment score consisting of five components: FHR pattern, uterine artery resistance by Doppler ultrasound, carotid artery resistance index by Doppler ultrasound, fetal tone (movements) by ultrasound, and fetal reflexes (magnitude and speed of movements) by ultrasound. In receiver operating characteristic (ROC) analysis, the fetal assessment score provided better prediction of fetal distress and low Apgar score at 1 minute than did the biophysical profile (p < 0.001) but not better prediction of low umbilical artery pH. Qualitatively, the difference was greatest for prediction of fetal distress, with less difference noted for prediction of low Apgar scores and none for prediction of low pH. This suggests that the fetal assessment score is better at discriminating results that correlate directly with its component tests (such as fetal distress defined by abnormal fetal heart rate patterns) than at discriminating true physiological measures of fetal compromise. One possible explanation for this could be interpretation of intrapartum fetal monitoring based on prior knowledge of antepartum test results.
Hann, et al., reported the results of biophysical profile monitoring in 131 women at 41 completed weeks gestation (Hann, McArdle, and Sachs, 1987). The positive predictive value for "poor neonatal outcome" (neonatal distress requiring admission to the neonatal intensive care unit, endotracheal intubation, use of positive pressure ventilation for more than 6 hours, and/or persistent fetal circulation) for the composite biophysical profile at a threshold of < 6 was 14 percent; for individual components, positive predictive values were as follows: AFV, 17 percent; placental grading, 4 percent; fetal breathing movements, 5 percent; fetal tone/movements, 40 percent; and nonreactive NST, 14 percent. The negative predictive value for the composite biophysical profile was 94 percent; for individual components, negative predictive values were as follows: AFV, 95 percent; placental grading, 91 percent; fetal breathing movements, 94 percent; fetal tone/movements, 95 percent; and reactive NST, 94 percent.
Gilson, O'Brien, Vera, et al. (1988) describe the association between twice weekly biophysical profile monitoring and low Apgar scores, fetal distress, and cesarean section for fetal distress among 178 women at greater than 42 weeks gestation. At the cut-point used (a score of 8), the test showed poor sensitivity across all outcomes, ranging from 0.08 to 0.27.
Two studies reported data on the predictive value of Doppler measurements of umbilical artery blood flow. In the first (Battaglia, Larocca, Lanzani, et al., 1991), Doppler measurement was one component of a battery of tests that included NST; amnioscopy; AFV; Doppler velocimetry of the uterine, umbilical, descending thoracic aorta, renal, and middle cerebral arteries; and a series of maternal blood measurements, including hPL, estriol, hematocrit, platelets, mean platelet volume, and uric acid. The criteria for decisionmaking about induction and delivery were not described. Doppler velocimetry was strongly associated with adverse outcomes, including "poor condition" (both 1- and 5-minute Apgar scores < 7 or infant admitted to NICU for asphyxia and/or meconium aspiration syndrome), oligohydramnios (largest pocket < 2 cm), meconium staining, and cesarean sections for fetal distress. Of note, 4 of 16 of these infants had birthweights greater than 4,000 grams; it is unclear to what extent these infants, who presumably had normal uteroplacental function, affected the results.
Farmakides, et al., reported on 140 high-risk pregnancies (33 percent were postdate) that were followed with NST and Doppler velocimetry (Farmakides, Schulman, Winter, et al., 1988). "Most" of the cases of fetal distress and cesareans for fetal distress came from the postdate subgroup. Nonreactive NST was significantly more sensitive at predicting cesarean section for fetal distress than Doppler. Since management decisions were based on NST results, this again raises the possibility of biased decisionmaking based on prior knowledge of antepartum test results.
| Study | Rate of Abnormal Tests | Outcome | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|
| Battaglia, Larocca, Lanzani, et al., 1991 | 0.29 | Any | Yes | 0.01 | 1.00 | 0.72 | 0.04 | 1.00 |
| Battaglia, Larocca, Lanzani, et al., 1991 | 0.29 | Cesarean section | Yes | 0.29 | 0.58 | 0.83 | 0.58 | 0.83 |
| Battaglia, Larocca, Lanzani, et al., 1991 | 0.29 | Cesarean section for fetal distress | Yes | 0.12 | 0.80 | 0.78 | 0.33 | 0.97 |
| Farmakides, Schulman, Winter, et al., 1988 | 0.31 | Fetal distress | Abnormal | 0.43 | 0.27 | 0.65 | 0.36 | 0.54 |
| Battaglia, Larocca, Lanzani, et al., 1991 | 0.29 | Meconium staining | Yes | 0.29 | 0.92 | 0.97 | 0.92 | 0.97 |
| Battaglia, Larocca, Lanzani, et al., 1991 | 0.29 | Nonstress test | Abnormal | 0.16 | 1.00 | 0.84 | 0.54 | 1.00 |
| Battaglia, Larocca, Lanzani, et al., 1991 | 0.29 | Oligohydramnios | Yes | 0.30 | 0.64 | 0.86 | 0.67 | 0.84 |
There are no randomized trials comparing antepartum testing by any method to no testing in women with prolonged pregnancy only. Data from one relatively large retrospective cohort (Bochner, Williams, Castro, et al., 1988) suggest an increased risk of adverse outcomes to the fetus, although confounding cannot be eliminated as a possibility for this observed association. Evidence from large registries shows consistently elevated risks of antepartum stillbirth with increasing gestational age, even in health systems where testing is available (see the section on "Risk of Perinatal Mortality" in chapter 1). Given this elevated risk, it is highly unlikely that a randomized trial of testing versus no testing could be performed in the United States without, at the least, extreme difficulty with recruitment. The low absolute risk of stillbirth makes sample size requirements prohibitive as well. For example, the estimated perinatal mortality at 41 weeks in terms of deaths per 1,000 ongoing pregnancies is approximately 1.2. A randomized trial would need over 40,000 women in each arm to detect a two-fold difference in risk of stillbirth between two competing methods of antepartum surveillance.
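The scale of this sample size requirement can be illustrated with a standard normal-approximation formula for comparing two proportions. The report does not state the assumptions behind its estimate, so everything below is an assumption chosen for illustration: a two-sided alpha of 0.05, 80 percent power, and a two-fold difference interpreted as a halving of risk from 1.2 to 0.6 per 1,000. The function name `n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-arm sample size for comparing two proportions
    (two-sided test, normal approximation, unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


baseline = 1.2 / 1000   # perinatal mortality figure quoted in the text
halved = baseline / 2   # assumed: the better strategy halves the risk
n_required = n_per_arm(baseline, halved)
print(f"approximately {n_required:,} women per arm")
```

Under these assumptions the result is on the order of 40,000 women per arm. Higher power or a continuity correction would push the figure higher still.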
Because of the numerous methodological issues involved in evaluating specific antepartum tests (see discussion below), we are unable to conclude that any test or combination of tests is clearly superior to another. Only one randomized trial directly compared a more complex test with a simpler test (Alfirevic and Walkinshaw, 1995); this trial showed that the more complex test resulted in more interventions with no difference in outcomes. As with most tests, there appear to be consistent tradeoffs between sensitivity and specificity: tests that are more sensitive are likely to be less specific. We did not identify published data on inter- or intraobserver variability of these tests in the specific context of monitoring prolonged pregnancy or on the medical and nonmedical costs associated with specific tests and testing regimens.
We did find that, qualitatively, specificity for most tests was considerably better than sensitivity, while negative predictive value also was considerably better than positive predictive value. This means that women with "normal" test results are highly unlikely to experience the adverse outcomes used to determine a true "positive" test result. The high specificities reported may reflect biases in study design: when outcomes are either directly related to test results (such as nonreassuring fetal heart rate tracings after abnormal antepartum NST) or likely to be influenced by knowledge about the test results (such as cesarean section for fetal distress), specificity is likely to be relatively high.
This pattern of high negative predictive value in the setting of relatively low sensitivities has interesting implications for future management strategies. By Bayes' Theorem, positive predictive value can be expressed as:
True Positives/(True Positives + False Positives), or [(Prevalence)*(Sensitivity)]/{[(Prevalence)*(Sensitivity)] + [(1-Prevalence)*(1-Specificity)]},
while negative predictive value is expressed as:
True Negatives/(True Negatives + False Negatives), or [(1-Prevalence)*(Specificity)]/{[(1-Prevalence)*(Specificity)] + [(Prevalence)*(1-Sensitivity)]}.
In practice, this means that increasing test sensitivity results in a higher negative predictive value, since the false negative rate decreases. Increasing test specificity results in a higher positive predictive value, since false positives decrease. Given the consistent pattern observed for all of the reviewed antepartum tests that specificity is higher than sensitivity, one would expect that positive predictive value would be higher than negative predictive value. The fact that the pattern is consistently the opposite suggests that it is the relatively low prior probability of adverse outcomes, the "prevalence" in the equations above, that drives the predictive values.
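A short sketch of the two expressions above shows how a low prior probability can reverse the expected ordering of the predictive values. The inputs are the rounded values from the NST table row for Tongsong and Srisomboon (1993) with fetal distress as the outcome; the 50 percent prevalence in the second call is a hypothetical value chosen only for contrast, and the function names `ppv` and `npv` are illustrative.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value from Bayes' Theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def npv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Negative predictive value from Bayes' Theorem."""
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_neg / (true_neg + false_neg)


# Rounded table values: prevalence 0.04, sensitivity 0.64, specificity 0.82.
prev, sens, spec = 0.04, 0.64, 0.82
print(f"observed prevalence: PPV = {ppv(prev, sens, spec):.2f}, "
      f"NPV = {npv(prev, sens, spec):.2f}")

# Same test at a hypothetical 50 percent prevalence (illustration only).
print(f"50% prevalence:      PPV = {ppv(0.5, sens, spec):.2f}, "
      f"NPV = {npv(0.5, sens, spec):.2f}")
```

At the observed prevalence the computed values are close to the tabulated 0.14 and 0.98, with small differences attributable to rounding of the inputs. At the hypothetical 50 percent prevalence the same sensitivity and specificity give a positive predictive value that exceeds the negative predictive value.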
If this is the case, then the following points need to be considered:
- The purpose of antepartum testing is primarily to avoid unexplained stillbirths and secondarily to avoid perinatal morbidity. In order to accomplish these things, tests with high negative predictive values are needed. One way to achieve this would be to improve the sensitivity of currently used antepartum testing technologies. Since it is unlikely that sensitivity can be increased without an accompanying decrease in specificity, this means that the positive predictive value of these tests will decrease further.
- If, as the reviewed studies suggest, the probability of adverse outcomes is currently what determines predictive values, then the positive predictive value of antepartum testing will improve and the negative predictive value decline as gestational age increases, since the risk of stillbirth and other adverse events increases with gestational age. This proposition is dependent on the assumptions that (1) sensitivity and specificity are independent of gestational age, and (2) the outcomes reported in these studies are reasonable surrogates for stillbirth risk. This proposition is consistent with the data reported by Bochner, Williams, Castro, et al. (1988), according to which the positive predictive value for all adverse outcomes was better when testing began at 42 weeks (21.1 percent vs. 11.9 percent when testing began at 41 weeks), but the negative predictive value was worse (98.5 percent at 42 weeks vs. 99.1 percent at 41 weeks).
- Assuming that induction of labor does not carry increased perinatal risks compared with spontaneous labor, planned induction of labor at a given gestational age will always result in fewer expected adverse perinatal outcomes compared with testing strategies, since the negative predictive value of the tests will continue to decline as gestational age advances. At earlier gestational ages, where the risk is very low, the number of patients required to demonstrate this would be quite large.
These implications will be discussed further in the context of the trials of induction versus testing (Question 2).
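The dependence of predictive values on outcome probability described in the points above can be sketched numerically. The following Python example is illustrative only: the sensitivity, specificity, and prevalence figures are hypothetical values chosen for the sketch, not estimates from any reviewed study, and the function name `predictive_values` is an assumption of this example.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) computed with Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical test characteristics, assumed constant across gestational ages.
SENSITIVITY = 0.50
SPECIFICITY = 0.90

# Hypothetical outcome probabilities, assumed to rise with gestational age.
prevalences = [0.01, 0.02, 0.05, 0.10]

results = [predictive_values(p, SENSITIVITY, SPECIFICITY) for p in prevalences]
for p, (ppv, npv) in zip(prevalences, results):
    print(f"prevalence={p:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions, specificity exceeds sensitivity yet the negative predictive value stays above the positive predictive value at every prevalence shown, and the positive predictive value rises while the negative predictive value falls as prevalence increases.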
Because both mother and infant are at risk of injury secondary to macrosomia, various methods for estimating fetal weight have been evaluated. Macrosomia is usually defined as a newborn weight of greater than 4,000 grams or 4,500 grams; the clinical significance of birthweights between 4,000 and 4,500 grams is unclear, since risk of shoulder dystocia is greatest for infants over 4,500 grams (ACOG, 2000).
Chauhan, et al., compared estimates of fetal weight by clinicians using Leopold maneuvers in early labor, sonographic measurements obtained by the same clinicians, and actual birthweight (Chauhan, Sullivan, Magann, et al., 1994). Clinical estimation was significantly more accurate than ultrasound estimation as measured by mean absolute error compared with actual weight (clinical, 322 ± 253 g; sonographic, 547 ± 425 g; p < 0.001), mean percentage absolute error (clinical, 8.9 ± 7.1 g/kg; sonographic, 14.8 ± 11.0 g/kg; p < 0.001), and percentage of estimates within 10 percent of actual birthweight (clinical, 65.4 percent; sonographic, 42.8 percent; p < 0.005).
The same group also compared maternal estimations by women with prior childbearing experience with clinical estimation (Chauhan, Sullivan, Lutton, et al., 1995). There were no significant differences in the accuracy of maternal estimates compared with clinical estimates.
Chauhan, et al. (Chauhan, Sullivan, Magann, et al., 1994) found that clinical estimation was more accurate than ultrasonographic estimation by the same clinician (see above). Ultrasound was slightly more sensitive at predicting birthweight greater than 4,000 grams (55 percent vs. 50 percent, based on 20 cases).
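With only 20 macrosomic infants, the precision of these sensitivity estimates is limited. The sketch below computes approximate 95 percent Wilson score intervals, assuming that 55 percent and 50 percent correspond to 11 of 20 and 10 of 20 cases detected; those counts are inferred from the percentages and are not stated in the source, and the helper `wilson_interval` is a name introduced for this example.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumed counts: 11/20 (ultrasound) and 10/20 (clinical estimation).
ultrasound = wilson_interval(11, 20)
clinical = wilson_interval(10, 20)

print(f"ultrasound sensitivity 0.55, interval {ultrasound[0]:.2f} to {ultrasound[1]:.2f}")
print(f"clinical sensitivity   0.50, interval {clinical[0]:.2f} to {clinical[1]:.2f}")
```

Under these assumed counts the two intervals overlap almost entirely, so the 5-percentage-point difference cannot be distinguished from chance.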
Chervenak, et al., compared 317 women followed for prolonged pregnancy with twice weekly NST and AFT with 100 control patients delivered between 38 and 40 weeks (Chervenak, Divon, Hirsch, et al., 1989). Fetal weights were also obtained, although it is unclear how often these measurements were performed. Overall incidence of birthweight greater than 4,000 grams was significantly higher in postdate patients (24 percent vs. 4 percent; p < 0.05), and cesarean section rates for arrest or protraction disorders were significantly higher when infants weighed more than 4,000 grams (22 percent vs. 10 percent; p < 0.01). Sensitivity of ultrasound for predicting birthweight greater than 4,000 grams was 61 percent, specificity 91 percent, positive predictive value 70 percent, and negative predictive value 87 percent. Morbidity associated with macrosomia was not reported. It is unclear to what extent clinicians managing the patients had access to the ultrasound reports. Since clinicians might have a lower threshold for diagnosing an arrest or protraction disorder in the setting of suspected macrosomia, this would result in a bias in favor of improved positive predictive value for ultrasound.
Gilby, et al., constructed ROC curves for the performance of two abdominal circumference cut-points (35 cm and 38 cm) for predicting macrosomia at two thresholds, 4,000 grams and 4,500 grams, from a series of 1,996 subjects who had ultrasounds within 7 days of delivery (Gilby, Williams, and Spellacy, 2000). At a cut-point of 35 cm, sensitivity for prediction of birthweight greater than 4,500 grams was 98.5 percent, specificity 64.6 percent, positive predictive value 9.1 percent, and negative predictive value 99.9 percent. At a cut-point of 38 cm, sensitivity was 53.6 percent, specificity 96.8 percent, positive predictive value 37.3 percent, and negative predictive value 98.3 percent. Morbidity associated with macrosomia was not reported. Whether these predictive values would be applicable in a different population is unclear.
O'Reilly-Green and Divon (1997) constructed ROC curves for ultrasonographic estimates of fetal weight, with an adjustment of 12.7 grams added to the estimated fetal weight (EFW) for each day elapsed between sonographic measurements and delivery. Areas under the ROC curve were 0.85 for prediction of birthweight greater than 4,000 grams and 0.93 to 0.95 for prediction of birthweight greater than 4,500 grams, indicating good discriminative ability. Relatively small increments in EFW had large impacts on sensitivity and specificity: for prediction of actual birthweight of greater than 4,000 grams, an EFW of 3,711 grams had a sensitivity of 85 percent and specificity of 72 percent, while an EFW of 4,000 grams had a sensitivity of 56 percent and a specificity of 91 percent. For prediction of birthweight greater than 4,500 grams, an EFW of 4,192 grams had sensitivity of 83 percent and specificity of 92 percent, while an EFW of 4,500 grams had a sensitivity of 22 percent and a specificity of 99 percent. Again, no correlation with outcomes associated with fetal macrosomia was reported.
| Study | Screening Test Threshold | Rate of Abnormal Tests | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|
| Chauhan, Sullivan, Lutton, et al., 1995 | > 4,000 gm | 0.21 | > 4,000 gm | 0.26 | 0.61 | 0.92 | 0.73 | 0.87 |
| O'Reilly-Green and Divon, 1997 | > 3,710 gm | 0.42 | > 4,000 gm | 0.24 | 0.85 | 0.72 | 0.49 | 0.94 |
| Chervenak, Divon, Hirsch, et al., 1989 | > 4,000 gm | 0.22 | > 4,000 gm | 0.26 | 0.60 | 0.91 | 0.69 | 0.87 |
| Pollack, Hauer-Pollack, and Divon, 1992 | > 4,000 gm | 0.20 | > 4,000 gm | 0.23 | 0.56 | 0.91 | 0.65 | 0.88 |
| Jazayeri, Heffron, Phillips, et al., 1999 | Abdominal circumference > 34 cm | 0.48 | > 4,000 gm | 0.50 | 0.89 | 0.93 | 0.93 | 0.90 |
| Gilby, Williams, and Spellacy, 2000 | Abdominal circumference > 38 cm | 0.05 | > 4,500 gm | 0.03 | 0.54 | 0.97 | 0.37 | 0.98 |
| O'Reilly-Green and Divon, 1997 | > 4,500 gm | 0.02 | > 4,500 gm | 0.04 | 0.22 | 0.99 | 0.44 | 0.97 |
| O'Reilly-Green and Divon, 1997 | > 4,191 gm | 0.11 | > 4,500 gm | 0.04 | 0.83 | 0.92 | 0.30 | 0.99 |
| Pollack, Hauer-Pollack, and Divon, 1992 | > 4,500 gm | 0.02 | > 4,500 gm | 0.04 | 0.14 | 0.99 | 0.33 | 0.96 |
| Gilby, Williams, and Spellacy, 2000 | Abdominal circumference > 34 cm | 0.38 | > 4,500 gm | 0.03 | 0.99 | 0.65 | 0.09 | 1.00 |
There is a clear tradeoff between sensitivity and specificity of markers for estimating fetal weight. The definition of macrosomia also plays a role. In studies in women with prolonged pregnancy, sensitivities for detection of birthweight greater than 4,000 grams range from 56-89 percent, with specificities of 72-93 percent; positive predictive values at this threshold range from 49-93 percent, with negative predictive values of 87-94 percent. At a threshold of 4,500 grams, sensitivity ranges from 14-99 percent and specificity from 65-99 percent, with positive predictive values of 9-44 percent and negative predictive values of 96-100 percent. Positive predictive value at the more clinically significant 4,500 gram threshold is worse than at 4,000 grams (not surprisingly, since the probability of a weight greater than 4,500 grams is much lower than for 4,000 grams). However, translation of even this diagnostic test accuracy into clinical strategies that significantly reduce injury risk to either mother or infant at an acceptable cost in terms of iatrogenic complications or resource use is difficult.
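The effect of outcome probability on positive predictive value can be checked against two rows of the table above. The sketch below recomputes predictive values for the two O'Reilly-Green and Divon (1997) thresholds with similar sensitivity, using the rounded rate of outcome event, sensitivity, and specificity as tabulated; the function name and the row labels are assumptions of this example, and small discrepancies from published values would be expected because the inputs are rounded.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) computed with Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# (label, rate of outcome event, sensitivity, specificity) from the table rows.
rows = [
    ("EFW > 3,710 gm, outcome > 4,000 gm", 0.24, 0.85, 0.72),
    ("EFW > 4,191 gm, outcome > 4,500 gm", 0.04, 0.83, 0.92),
]

computed = {label: predictive_values(prev, sens, spec)
            for label, prev, sens, spec in rows}

for label, (ppv, npv) in computed.items():
    print(f"{label}: PPV={ppv:.2f}  NPV={npv:.2f}")
```

Rounded to two decimals, the results agree with the tabulated predictive values (0.49 and 0.94; 0.30 and 0.99). Although specificity is higher at the 4,500 gram threshold and sensitivity is nearly the same, the positive predictive value is lower because the outcome is much less common.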
Prior suspicion of fetal macrosomia does not appear to result in improved outcomes for either mother or infant. Weeks, et al., reported a retrospective series of 504 infants with birthweight greater than or equal to 4,200 grams (Weeks, Pitman, and Spinnato, 1995). Macrosomia was suspected before delivery in 102 patients and unsuspected in the remaining 402. Cesarean delivery rates were significantly higher in the suspected group (52 percent) compared with the unsuspected group (30 percent), a difference attributable to a higher rate of labor induction and failed induction. Among patients undergoing vaginal delivery, shoulder dystocia occurred in 24.5 percent of the suspected group and 16.7 percent of the unsuspected group, a difference that was not statistically significant (possibly because of a lack of power).
Even better evidence of a lack of benefit comes from a trial in which women at 38 weeks or more with estimated birthweights between 4,000 and 4,500 grams based on ultrasound were randomized to either immediate induction or expectant management. There were no statistically significant differences in cesarean delivery rate, instrumental delivery rate, or incidence of shoulder dystocia between the two groups (Gonen, Rosen, Dolfin, et al., 1997). There were trends toward higher instrumental delivery rates in induced nulliparous women (26.2 percent vs. 15 percent in expectantly managed nulliparous women) and higher cesarean section rates in expectantly managed multiparous women (16.2 percent vs. 10.9 percent in induced multiparous women). Other maternal outcomes, such as perineal or vaginal trauma, were not reported. The study was underpowered to detect differences in neonatal morbidity; overall rates were low (9/134 in the induction group and 11/139 in the expectant group), with six or fewer cases of any single type of morbidity (cephalohematoma, with nine cases, was most common).
Rouse, Owen, Goldenberg, et al. (1996) estimated, based on available data, that a policy of elective cesarean section for an estimated fetal weight of 4,500 grams or more would result in 3,695 cesarean deliveries at a cost of over $8 million to prevent one permanent brachial plexus injury.
In summary, methods for detection of macrosomia defined as birthweight greater than 4,500 grams are imprecise. There is evidence that clinical measurements, including multiparous patients' own estimates, are as accurate as ultrasound. Available data suggest that there is no benefit to mother or infant from induction of labor for suspected macrosomia (when defined as estimated weights between 4,000 and 4,500 grams). While an estimate of fetal weight in theory may have some benefit in management of labor (such as avoidance of operative vaginal deliveries in settings where shoulder dystocia risk is higher), available observational data suggest that suspicion of macrosomia prior to labor does not improve outcomes. There is no evidence that ultrasonographic measurement of fetal weight to detect macrosomia in the setting of prolonged pregnancy improves maternal or neonatal outcomes.
| Exam Findings | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Dilatation (cm) | Closed | 1-2 | 3-4 | > 5 |
| Effacement (%) | 0-30 | 40-50 | 60-70 | > 80 |
| Station | −3 | −2 | −1 or 0 | +1, +2 |
| Consistency | Firm | Medium | Soft | - |
| Position of cervix | Posterior | Midposition | Anterior | - |
In Bishop's original report (Bishop, 1964), induction was successful in 100 percent of cases (no denominator given) when the Bishop score was greater than 9. Data for lower scores were not given, and notably, all inductions were apparently in multiparous patients, since "[o]wing to the unpredictability of the duration of labor in the nullipara, even in the presence of apparently favorable circumstances, induction of labor brings little advantage for either obstetrician or patient." There was a statistically significant negative correlation between score and interval from examination to spontaneous delivery, but confidence intervals were quite wide (quantitative data were not provided, only a graphic representation).
| Study | Rate of Abnormal Tests | Outcome | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|
| Mouw, Egberts, Kragt, et al., 1998 | NR | Birth date | < 4 days | 0.5 | 0.67 | 0.77 | NR | NR |
| Witter and Weitz, 1989 | 0.41 | Cesarean section | Yes | 0.34 | 0.77 | 0.78 | 0.65 | 0.87 |
| Witter and Weitz, 1989 | 0.24 | Cesarean section | Yes | 0.34 | 0.46 | 0.88 | 0.67 | 0.76 |
| Witter and Weitz, 1989 | 0.54 | Cesarean section | Yes | 0.34 | 0.73 | 0.56 | 0.46 | 0.80 |
NR = Not reported
The relatively poor discrimination of the Bishop score in predicting either labor or subsequent successful induction in prolonged pregnancy is magnified by the inherent unreliability of many of its component measures. Significant interobserver variability has been reported in measurement of cervical effacement (Goldberg, Newman, and Rust, 1997; Holcomb and Smeltzer, 1991). Furthermore, significant intra- and interobserver variability has been described for assessment of cervical dilatation (Phelps, Higby, Smyth, et al., 1995; Tuffnell, Bryce, Johnson, et al., 1989).
| Study | Screening Test Threshold | Rate of Abnormal Tests | Outcome | Outcome Threshold | Rate of Outcome Event | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|---|---|---|---|
| Tam, Tai, and Rogers, 1999 | Positive | 0.33 | Vaginal Delivery | Yes | 0.84 | 0.36 | 0.84 | 0.86 | 0.32 |
| Mouw, Egberts, Kragt, et al., 1998 | > 50 ng/ml | 0.54 | Birth date | < 4 days | 0.50 | 0.71 | 0.64 | 0.67 | 0.69 |
Mouw, et al., measured fetal fibronectin at 41 weeks (Mouw, Egberts, Kragt, et al., 1998). A positive fFN test (> 50 ng/ml) had sensitivity of 0.71 (95 percent CI, 0.58 to 0.86) and specificity of 0.64 (95 percent CI, 0.48 to 0.78) for predicting birth within 3 days. The change from negative to positive fFN values often occurred between 1 and 4 days before birth in women with a spontaneous onset of labor. The mean interval between positive test and birth was 2.5 ± 2.5 days (range, 0-11).
Imai and colleagues measured vaginal fFN and a panel of cytokines (interleukin 1-beta, interleukin-6, interleukin-8, and tumor necrosis factor alpha) weekly in 122 women from 36 through 42 weeks (Imai, Tani, Saito, et al., 2001). Vaginal fFN was inversely correlated with sampling to delivery interval (r = −0.40). At a threshold of > 50 ng/ml, fFN had a sensitivity of 90 percent, a specificity of 50 percent, a positive predictive value of 75 percent, and a negative predictive value of 75 percent for predicting delivery within 7 days. Interleukin 1-beta was the only cytokine with reasonable performance, but it was less able to discriminate than fFN (sensitivity 55 percent, specificity 76 percent). Results were not stratified by parity or gestational age.
The Bishop score has a long history in obstetric decisionmaking. Clearly, clinically detectable changes in the cervix take place prior to the onset of labor, and the likelihood of a successful induction should be greater the closer a given patient is to spontaneous labor. However, the documented substantial inter- and intraobserver variability in the components of the Bishop score suggests that its ability to discriminate between women likely to have a successful induction of labor and those unlikely to have a successful induction may be relatively poor. Certainly, given this inherent variability and the discrete nature of its components, changes in the global Bishop score are less than satisfactory primary outcomes for studies of induction or cervical ripening agents. Data on the clinical utility of fetal fibronectin as a decisionmaking tool in managing prolonged pregnancy are insufficient to draw conclusions. Fetal fibronectin may have potential as a tool for helping to identify women likely to deliver spontaneously within the next 7 days, which in turn may help guide decisionmaking about antepartum testing versus induction.
Choice of appropriate outcome measures: Many of the most important outcome measures, especially stillbirth, are so rare that studies using these outcomes are almost impossible to perform. Surrogate markers therefore are not inappropriate, but their clinical relevance is not always clear. For example, although meconium aspiration is a significant adverse outcome with potential for long-term negative sequelae, the presence of meconium-stained amniotic fluid alone is not. Intrapartum abnormal fetal heart rate tracings themselves are subject to significant observer variability (Ayres-de-Campos, Bernardes, Costa-Pereira, et al., 1999; Bernardes, Costa-Pereira, Ayres-de-Campos, et al., 1997; Donker, van Geijn, and Hasman, 1993; Lidegaard, Bottcher, and Weber, 1992), and interpretation may be influenced by prior knowledge of antepartum test results, making fetal heart rate patterns, or cesarean section decisions based on these patterns, less than ideal as surrogate markers of fetal compromise.
Bias: Many of the studies reviewed either did not state whether clinicians managing patients were aware of test results or definitely stated that these results were available. Since knowledge of these results could affect both interpretation of outcomes (as discussed above) and thresholds for decisionmaking (e.g., greater reluctance to use oxytocin to augment labor if prior antepartum testing was abnormal, or a lower cesarean section threshold for arrest of dilatation or descent if macrosomia were suspected), the ability of tests to predict these outcomes could be falsely elevated.
Resource use: Data on the medical and nonmedical costs of any of the tests reviewed are lacking.
Inappropriate summary measures and tests: Many studies used means or t-tests for variables such as Bishop scores, Apgar scores, or parity, where values other than integers are meaningless.
Sample size: Few studies discussed sample size issues.
Failure to account for variability: No study attempted to account for the effects of observer variation on the precision of estimates. For tests where quantitative values are used to establish a threshold for normal and abnormal, this variability will have implications for the precision of sensitivity and specificity.
The risk of antepartum stillbirth clearly increases with increasing gestational age. Although definitive evidence that antepartum testing at some point after 40 weeks reduces perinatal mortality is not available, there are some data consistent with an increased risk of adverse outcomes in women who do not get tested (Bochner, Williams, Castro, et al., 1988; Fleischer, Schulman, Farmakides, et al., 1985). The most appropriate time to begin antepartum testing in otherwise low-risk women is unclear. An excellent decision analysis of antepartum testing in high-risk women prior to 40 weeks illustrated that the trade-offs are between the risk of stillbirth, the risk of neonatal death, and the sensitivity and specificity of the test (Rouse, Owen, Goldenberg, et al., 1996). Since the risk of neonatal death in an otherwise uncomplicated pregnancy at term is quite low, the main issues are the stillbirth risk and test characteristics. Unfortunately, our review does not allow precise estimation of the test characteristics of any of these tests in detecting infants at greatest risk for stillbirth in otherwise uncomplicated pregnancies after term.
As the sensitivity of antepartum testing for predicting surrogate markers of fetal compromise increases, specificity decreases. Testing strategies involving a combination of fetal heart rate monitoring and ultrasonographic measurement of amniotic fluid volume appear to have the highest levels of sensitivity; however, methodological issues and variability in specific tests and testing strategies prohibit definitive conclusions about which test or combination of tests has the best performance.
Qualitatively, we found that specificity was much higher than sensitivity for most of the outcomes measured, but negative predictive values were much higher than positive predictive values, suggesting that outcome probability is currently the most important determinant of test performance. This in turn implies that the negative predictive value will decrease as gestational age advances, and rates of adverse outcomes due to false negative test results will increase, if sensitivity and specificity of antepartum tests are independent of gestational age. Identifying the most appropriate time to begin testing (or to consider induction) is ultimately dependent on identifying threshold risks of adverse outcomes when weighed against the risks and costs of intervention. We did not identify any data that would allow estimation of that threshold risk.
Low positive predictive values mean that intervention rates will be relatively high. The degree to which individual women, or society, are willing to trade off risk of adverse fetal outcomes due to prolonged pregnancy, versus the potential for iatrogenic adverse outcomes associated with interventions, is unclear. How variability in the value women place on the nature of the process of labor and delivery (minimal intervention vs. use of the full range of available obstetric, anesthetic, and pediatric technologies) factors into decisionmaking is also unclear.
Clinical assessment is equivalent to ultrasound in predicting macrosomia. However, there is no evidence that prior knowledge of estimated fetal weight improves outcomes for either infant or mother.
Clinical examination of the cervix may help predict successful induction. However, individual components of the examination exhibit substantial inter- and intraobserver variability.
Published data do not allow estimation of the cost-effectiveness of tests of fetal wellbeing.
Question 2: What is the direct evidence comparing the benefits, risks, and costs of planned induction versus expectant management at various gestational ages?
As with all of the questions addressed in this report, the issue of the appropriate gestational age to consider "postdate" or "postterm" was difficult to resolve. After extensive discussion with the project's advisory panel, a consensus was reached that we would include any articles where the proposed benefit of the planned induction was reduction in maternal or fetal risk associated with prolonged pregnancy, even at 40 weeks gestation. Active interventions performed prior to or shortly after term (such as nipple stimulation or membrane sweeping) that are designed to decrease the proportion of women who go beyond 41 or 42 weeks are discussed under Question 3, below.
Up to this point in the report, we have:
Found evidence from observational studies of an increasing risk of adverse perinatal events as gestational age advances beyond term. Although the precise degree of this risk is unclear and may be affected by confounding, the pattern is quite consistent.
Found in our review of antepartum tests of fetal well being in prolonged pregnancy that the sensitivity of such tests was much lower than the specificity, while the negative predictive value was much higher than the positive predictive value.
Discussed the fact that these two findings, when taken together, suggest that the negative predictive value of antepartum testing will decrease as gestational age advances.
If negative predictive value does decrease with advancing gestational age, then elective induction has the potential to improve outcomes by preventing adverse perinatal outcomes due to false negative test results. Whether this is the case, and whether elective induction is associated with an excess of other adverse maternal outcomes compared with expectant management and testing, is the focus of this section of the report.
Throughout this section, we use the term "expectant management," as defined by the authors of the studies reviewed, to refer to some form of ongoing assessment of fetal well being, with induction of labor based on the results of testing or upon reaching a specified gestational age in accordance with a predefined set of guidelines. As stated above, we did not identify any randomized trials that provided data on the specific population of interest where no intervention (induction or testing) was performed.
As with studies of testing, the outcomes assessed in these trials were quite variable. All studies reported on perinatal mortality and cesarean section rates, in some cases stratified by indication for induction (elective or based on abnormal test results). Additional markers of perinatal or maternal morbidity -- including Apgar scores at 1 and 5 minutes, umbilical arterial pH, the presence of meconium-stained amniotic fluid, abnormal fetal heart rate tracings during labor, instrumental deliveries, diagnosis of meconium aspiration, and admissions to neonatal intensive care units -- were inconsistently reported.
None of the included trials was able to blind physicians, midwives, and nurses to the allocated intervention or to the results of antepartum testing. Because of this, outcomes that are dependent on interpretation of fetal monitoring (such as the proportion of cesarean sections performed for fetal distress, or the overall incidence of abnormal fetal heart rate tracings) are unreliable. A diagnosis of fetal distress may be more likely in the setting of an induction performed in the expectant management arm after abnormal antepartum monitoring. Even with a normal intrapartum tracing, thresholds for performing cesarean section or operative vaginal delivery in the setting of prolonged second or third stages of labor might be different if the provider is aware of previous abnormal antepartum tests. Because of these difficulties, we focus on the overall cesarean section rate and neonatal outcomes less susceptible to bias, such as the Apgar score, pH, and admissions to the neonatal intensive care unit. Even these immediate outcomes do not provide information on the impact of maternal interventions on longer-term health outcomes of these children.
The included trials were published between 1983 and 1997. The number of subjects in each trial was fairly small, except for the Canadian trial (Hannah, Hannah, Hellmann, et al., 1992). The overall median number of subjects was 200, ranging from 22 (Martin, Sessums, Howard, et al., 1989) to 3,418 (Hannah, Hannah, Hellmann, et al., 1992).
A meta-analysis performed as part of a recent Cochrane review (Crowley, 2000) showed that this reduction in perinatal mortality with induction is significant only at 41 weeks or later (summary odds ratio [OR], 0.13; 95 percent confidence interval [CI], 0.01 to 2.07 before 41 weeks vs. summary OR, 0.23; 95 percent CI, 0.06 to 0.90 at 41 weeks or later).
Other perinatal outcomes examined included Apgar scores. Of the 15 included trials, 14 evaluated Apgar scores, and all but one of these found substantially equal scores in the induction and monitoring groups. Dyson, Miller, and Armstrong (1987) reported that a higher proportion of babies in the monitoring group had Apgar scores < 7 at 1 minute (21 percent vs. 11 percent in the induction group); however, similar proportions of infants in the two groups had scores < 7 at 5 minutes. Taken together, these trials indicate that Apgar scores do not differ significantly between induction and monitoring.
Only one trial (Cardozo, Fysh, and Pearce, 1986) measured patient satisfaction, patient preferences, or quality of life. There were no significant differences in the proportion of patients "pleased" with (49 percent, planned induction; 53 percent, expectant management) or "disappointed" by (15 percent, planned induction; 11 percent, expectant management) their management.
Hyperstimulation of the uterus from induction agents can result in fetal compromise, leading to the need for cesarean section or even fetal death. Because fetal compromise in labor with subsequent need for cesarean section is also associated with prolonged gestation, differences in "risks" for fetal compromise between planned induction and expectant management are the inverse of differences in "benefits" and are discussed above.
Continued fetal growth during expectant management could conceivably lead to an increased risk of macrosomia and shoulder dystocia. In the study by Dyson, Miller, and Armstrong (1987), the proportion of infants with a birthweight greater than 4,000 grams was higher in the expectant management group (28.2 percent) than in the induction group (19.1 percent), though the difference did not reach statistical significance, and no correlation with shoulder dystocia or birth injury was reported. Katz, Yemini, Lancet, et al. (1983) also reported that the incidence of birthweight greater than 4,000 grams was higher in the expectant management group (29.5 percent vs. 7.9 percent; p < 0.05), but again no correlation with birth injury was reported. Ohel, Rahav, Rothbart, et al. (1996) found no difference in the proportion of infants with a birthweight greater than 4,000 grams (8.6 percent vs. 8.7 percent). Augensen, Bergsjø, Eikeland, et al. (1987) reported only one case of "difficult shoulder delivery" in the entire study.
In the two large multicenter trials comparing planned induction and expectant management, there were no significant differences in reported rates of macrosomia, shoulder dystocia, or birth injury to the fetus. In the National Institute of Child Health and Human Development (NICHD) Maternal-Fetal Network Trial (National Institute of Child Health and Human Development Network of Maternal-Fetal Medicine Units, 1994), the incidence of birthweight greater than 4,500 grams was similar in the two induction arms and the expectant management arm, and there was only one case of nerve injury (in one of the induction arms). In the even larger Canadian Multicenter Post-term Pregnancy Trial (Hannah, Hannah, Hellmann, et al., 1992), neither the proportion of infants with a birthweight greater than 4,500 grams (4.6 percent in the induction group vs. 5.5 percent in the expectant management group), nor the incidence of shoulder dystocia (1.4 percent in the induction group vs. 1.6 percent in the expectant group) was significantly different in the two groups.
These results suggest, as would be expected, that continued growth occurs in most infants managed expectantly, resulting in higher proportions of infants over 4,000 grams. Since there is debate as to whether weights between 4,000 and 4,500 grams have any clinical relevance (ACOG, 2000), it is not surprising that there are no reported differences in birth injury. The fact that trials that defined macrosomia as greater than 4,500 grams found no difference in either the proportion of babies weighing more than 4,500 grams or incidence of shoulder dystocia suggests that elective induction at a predefined gestational age does not have prophylactic benefit -- i.e., induction at a given gestational age prior to the development of "macrosomia" does not have an impact on shoulder dystocia.
| Study | Induction Group | Monitoring Group |
|---|---|---|
| Augensen, Bergsjø, Eikeland, et al., 1987 | 14/214 (6.5%) | 15/195 (7.7%) |
| Bergsjø, Huang, Yu, et al., 1989 | 27/94 (28.7%)* | 39/94 (41.5%)* |
| Cardozo, Fysh, and Pearce, 1986 | 25/195 (13%) | 18/207 (9%) |
| Dyson, Miller, and Armstrong, 1987 | 22/152 (14.5%)* | 41/150 (27.3%)* |
| Egarter, Kofler, Fitz, et al., 1989 | 2/180 (1.1%) | 3/165 (1.8%) |
| El-Torkey and Grant, 1992 | 5/33 (15%) | 4/32 (12.5%) |
| Hannah, Hannah, Hellmann, et al., 1992 | 360/1701 (21.2%)* | 418/1706 (24.5%)* |
| Hedén, Ingemarsson, Ahlström, et al., 1991 | 10/109 (9.2%) | 9/127 (7.0%) |
| Herabutya, Prasertsawat, Tongyai, et al., 1992 | 27/57 (47.4%) | 24/51 (47.1%) |
| Katz, Yemini, Lancet, et al., 1983 | 16/78 (20.5%)* | 7/78 (8.8%)* |
| Martin, Sessums, Howard, et al., 1989 | 2/12 (17%)* | 1/10 (10%)* |
| National Institute of Child Health and Human Development Network of Maternal-Fetal Medicine Units, 1994 | 55/265 (20.8%) | 32/175 (18.3%) |
| Ohel, Rahav, Rothbart, et al., 1996 | 4/70 (5.7%) | 6/104 (5.8%) |
| Witter and Weitz, 1987 | 30/103 (29.1%) | 27/97 (27.8%) |
Rates given represent overall cesarean section rates except in the case of Cardozo, Fysh, and Pearce (1986), which reported only emergency cesarean section rates.
* Statistically significant difference
| Group | Odds Ratio | 95% Confidence Interval |
|---|---|---|
| By gestational age: | ||
| < 41 weeks | 0.6 | 0.35 to 1.03 |
| > 41 weeks | 0.87 | 0.77 to 0.99 |
| By parity: | ||
| Primigravid | 0.75 | 0.64 to 0.88 |
| Multigravid | ||
| Bishop Score < 6 | 1.02 | 0.75 to 1.38 |
| By overall cesarean section rate: | ||
| < 10% | 0.89 | 0.58 to 1.39 |
| > 10% | 0.87 | 0.77 to 1.00 |
| By class of induction agent: | ||
| Oxytocin | 0.85 | 0.74 to 0.98 |
| Prostaglandins | 0.84 | 0.65 to 1.08 |
Data from Crowley (2000); odds ratios < 1 indicate that the cesarean section rate was lower in the elective induction group.
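As a rough illustration of how an odds ratio and its confidence interval are obtained from trial counts, the sketch below applies the standard log-scale (Woolf) approximation to the cesarean counts listed for Hannah, Hannah, Hellmann, et al. (1992) in the table of cesarean section rates above. The function name `odds_ratio_ci` and the choice of the Woolf interval are assumptions made for this example; the pooled estimates in Crowley (2000) were derived with meta-analytic methods that are not reproduced here.

```python
import math


def odds_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio (group A vs. group B) with a Woolf log-scale confidence interval."""
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Cesarean sections: 360/1701 (induction) vs. 418/1706 (monitoring)
or_hannah, lo_hannah, hi_hannah = odds_ratio_ci(360, 1701, 418, 1706)
print(f"OR = {or_hannah:.2f} (95% CI {lo_hannah:.2f} to {hi_hannah:.2f})")
```

For this single trial the estimate falls below 1 with an interval that excludes 1, which matches the direction of the pooled results; it is a per-trial calculation only and should not be read as a substitute for the subgroup figures in the table.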
Hannah and colleagues published a reanalysis of the Canadian study in 1996 (Hannah, Huh, Hewson, et al., 1996). In this analysis, women who were randomized to induction or expectant management were stratified based on whether labor was ultimately induced or spontaneous. In the induction arm, 772/1,149 women (67.2 percent) were induced, while 377/1,149 (32.8 percent) went into spontaneous labor prior to scheduled induction. In the expectant management group, 405/1,128 (35.9 percent) were induced for various indications, while 723/1,128 (64.1 percent) went into spontaneous labor. There were no significant differences in cesarean section rates between women randomized to induction who were induced (29.5 percent), women randomized to induction who went into spontaneous labor (25.7 percent), and women who were managed expectantly who went into spontaneous labor (25.7 percent). However, the cesarean section rate was significantly increased in women randomized to expectant management who were induced (42.0 percent). These women were significantly more likely to be nulliparous, to have a closed cervix at the onset of labor, and to have a longer interval from induction to delivery. When compared with the expectantly managed women in spontaneous labor, they had significantly higher cesarean section rates for fetal distress or dystocia; such differences were not seen when the two subgroups in the induction arm were compared.
These differences are consistent with several findings discussed earlier in this report:
Women whose onset of labor is considerably later than average may represent a distinct subgroup with different physiological characteristics of the uterus and cervix. This is consistent with the higher proportion of women with closed cervices and may also explain the higher rates of cesarean section for dystocia. This also may be related to parity. Presumably, women are included in this group who reach a predefined date for induction without going into spontaneous labor and with normal antepartum testing.
Provider knowledge of antepartum testing results may affect thresholds for cesarean delivery. It seems likely that providers caring for women whose inductions were indicated because of abnormal antepartum tests would be less tolerant of intrapartum fetal heart rate abnormalities or less likely to tolerate labor progress that was slower than average. This would explain some of the differential rates by indication.
As Crowley (2000) points out, women induced in the expectant management arm were less likely to receive prostaglandins. This would be a bias in favor of induction. The reanalysis by Hannah and colleagues (Hannah, Huh, Hewson, et al., 1996) models this based on assumptions about prostaglandin efficacy, and finds that, at worst, there would be no difference in cesarean section rates between groups. In addition, our review of the literature on induction agents (discussed under Question 3) suggests that the effectiveness of prostaglandins in terms of expediting delivery may be proportional to risk of fetal heart rate abnormalities in labor. If this is the case, then any decrease in cesarean section rates for failed induction or dystocia might well be accompanied by an increase in cesarean sections for fetal distress.
In summary, the randomized trial literature consistently shows that elective induction does not result in increased cesarean section rates compared with management strategies based on antepartum testing. If anything, cesarean section rates are slightly lower in women who are electively induced.
No studies reported specifically on maternal trauma related to vaginal delivery. Because operative vaginal delivery is clearly associated with an increased risk of maternal injury (Johanson and Menon, 2001), evidence of a difference in the rates of operative vaginal delivery in one group or the other would be suggestive of an increased risk of trauma to the pelvic floor, vagina, or perineum. In seven of the eight studies where this outcome was reported (Bergsjø, Huang, Yu, et al., 1989; Cardozo, Fysh, and Pearce, 1986; Egarter, Kofler, Fitz, et al., 1989; El-Torkey and Grant, 1992; Hannah, Hannah, Hellmann, et al., 1992; Herabutya, Prasertsawat, Tongyai, et al., 1992; Martin, Sessums, Howard, et al., 1989), there were no significant differences between the induction and expectant management groups. In the remaining trial (Hedén, Ingemarsson, Ahlström, et al., 1991), there was a significant difference, with 2.8 percent of the induction group and 15.5 percent of the expectant management group undergoing operative vaginal delivery (p < 0.01); the majority of these deliveries in both groups were for "secondary arrest." There are no obvious reasons why the results of this study varied so dramatically from the others. Mean birthweight in the two groups was similar. The standard deviation of the preintervention Bishop score was slightly wider in the expectant management group, and the method of randomization was based on a registration number rather than on randomly generated numbers. One possible explanation for the study's finding on operative vaginal delivery is that the pseudorandomization scheme resulted in some systematic differences in the groups. Another possibility is that use of oxytocin for labor augmentation may have been less aggressive in the expectant management group for some reason.
Overall, the studies reviewed suggest that there is no difference in operative vaginal delivery rates between expectant management and planned induction protocols.
There were no differences in the risk of maternal infection or other morbidity in three of the four trials that reported these outcomes (El-Torkey and Grant, 1992; National Institute of Child Health and Human Development Network of Maternal-Fetal Medicine Units, 1994; Witter and Weitz, 1987). In the remaining, very small trial (Martin, Sessums, Howard, et al., 1989), the proportion of women with "maternal morbidity" was higher in the induction arm (4/12, or 33 percent) than in the expectant management arm (2/10, or 20 percent). No significance testing was reported.
Only two studies reported direct measures of cost, the Canadian Multicenter Post-term Pregnancy Trial (Hannah, Hannah, Hellmann, et al., 1992) and a smaller study by Witter and Weitz (1987). The Canadian study found that induction of labor was associated with a lower cost compared with monitoring. The mean cost per patient (in 1991 Canadian dollars) of a prolonged pregnancy managed through monitoring was $3,132 (95 percent CI, $3,090 to $3,174), compared with induction, which cost $2,939 (95 percent CI, $2,898 to $2,981) per patient. The difference between the two groups ($193 per patient) was statistically significant. The authors of the study estimated that switching to planned induction could save up to $8 million per year in Canada.
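The arithmetic behind the reported cost difference can be checked with a short sketch. It assumes that the published 95 percent confidence intervals are symmetric normal-approximation intervals and that the two arms are independent; both are assumptions of this example rather than statements from the trial report, and the variable names are illustrative.

```python
import math

Z_95 = 1.96  # normal quantile assumed for a two-sided 95 percent interval


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate a standard error from a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


# Mean cost per patient in 1991 Canadian dollars, as reported in the text
monitoring_mean, monitoring_ci = 3132, (3090, 3174)
induction_mean, induction_ci = 2939, (2898, 2981)

difference = monitoring_mean - induction_mean
# Standard error of the difference between two independent means
se_difference = math.sqrt(
    se_from_ci(*monitoring_ci) ** 2 + se_from_ci(*induction_ci) ** 2
)
z_statistic = difference / se_difference
difference_ci = (
    difference - Z_95 * se_difference,
    difference + Z_95 * se_difference,
)

print(f"Difference: ${difference} per patient")
print(f"Approximate 95% CI: ${difference_ci[0]:.0f} to ${difference_ci[1]:.0f}")
print(f"z = {z_statistic:.1f}")
```

Under these assumptions the interval for the difference stays well above zero, which is in line with the statistically significant difference described by the authors.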
Witter and Weitz (1987) found, on the contrary, that mean costs were higher for planned induction than for monitoring by approximately $250 per patient. This study had a much smaller patient population (n = 200). Because costs frequently are not normally distributed, the effects of a few patients with complications or very long stays may be magnified compared with a larger study.
Several studies that did not report direct costs did report outcomes that are indirect measures of resource use, such as overall length of maternal or infant stay in the hospital. The extent to which these results are generalizable is limited, since length of stay varies internationally and has changed dramatically in the United States over recent years. Moreover, overall length of stay may not be entirely related to overall resource use (Tai-Seale, Rodwin, and Wedig, 1999). For women delivering in a hospital, the majority of resource use occurs during the time from admission to delivery, with a sharp decrease after delivery and even further decreases after the first 24 hours. Thus, even if the mean length of stay is equivalent between two groups, the resource use may vary widely depending on what proportion of the time was spent in the delivery suite. In addition, studies that report only hospital use and not outpatient use of resources (for antepartum testing, other office visits, etc.) will not reflect the overall medical costs of a particular strategy. Finally, none of the included studies addressed the nonmedical costs -- such as transportation, time lost from work, child care for women with other children, and so on -- associated with various strategies for managing prolonged pregnancy.
| Study | Mean Maternal Length of Stay, Induction (Days) | Mean Maternal Length of Stay, Expectant Management (Days) | P-value |
|---|---|---|---|
| Augensen, Bergsjø, Eikeland, et al., 1987 | 7.05 | 6.69 | < 0.02 |
| Martin, Sessums, Howard, et al., 1989 | 3.4 | 2.6 | NS |
| Witter and Weitz, 1987 | 4.74 | 4.06 | < 0.02 |
| Bergsjø, Huang, Yu, et al., 1989 | 7.9 | 8.1 | Not reported |
| Dyson, Miller, and Armstrong, 1987 | 3.2 | 3.5 | < 0.04 |
| Hannah, Hannah, Hellmann, et al., 1992 | 3.9 | 4.0 | Not reported |
NS = Not significant
Only one study (Dyson, Miller, and Armstrong, 1987) reported data on mean neonatal length of stay, with no significant differences between the induction and expectant management groups (3.0 days vs. 3.3 days, respectively).
| Study | Gestational age* | N | Induction Method | Monitoring Method | Perinatal Outcomes: Induction | Perinatal Outcomes: Monitoring | P-value |
|---|---|---|---|---|---|---|---|
| Ohel, Rahav, Rothbart, et al., 1996 | 40 weeks | 200 | PGE2 | Fetal monitoring: 2x/week | Perinatal death: NR Apgar (mean @ 5 min) 9.5 NICU admission: NR | Perinatal death: NR Apgar (mean @ 5 min) 9.4 NICU admission: NR | NS |
| Egarter, Kofler, Fitz, et al., 1989 | 40 weeks | 356 | PGE2, repeated if necessary | Nonstress test: q 2-3 days | Perinatal death: 0 Apgar - low scores were not different between gps NICU admission: NR | Perinatal death: 1 Apgar (see induction) NICU admission: NR | NR |
| El-Torkey and Grant, 1992 | 41 weeks | 65 | Membrane sweeping | No vaginal examination | Perinatal death: 0 Apgar < 6 @ 5 min: 3% NICU admission: NR | Perinatal death: 0 Apgar < 6 @ 5 min: 3% NICU admission: NR | NR 0.98 |
| Dyson, Miller, and Armstrong, 1987 | 41 weeks | 302 | PGE2 as outpatient, repeated if necessary, then oxytocin if needed | Nonstress test: 2x/week, Amniotic fluid index: once between 41 and 42 weeks, twice weekly after 42 weeks | Perinatal death: 0 Apgar < 7 @ 5 min: 11.2% NICU admission: NR | Perinatal death: 1 Apgar < 7 @ 5 min: 21.3% NICU admission: NR | NS <0.02 |
| National Institute of Child Health and Human Development Network of Maternal-Fetal Medicine Units, 1994 | 41 weeks | 440 | 1) PGE2 only 2) PGE2 plus oxytocin | Nonstress test: 2x/week AFI: 2x/week | Perinatal death: 0 Apgar < 4 @ 5 min: 0 NICU admission: NR | Perinatal death: 0 Apgar<4@5 min: <1% NICU admission: NR | NR NR |
| Hannah, Hannah, Hellmann, et al., 1992 | 41 weeks | 3418 | PGE2, oxytocin | Kick counts once a day Nonstress test: 3x/week Amniotic fluid index: 2-3x/week | Perinatal death: 0 Apgar < 7 @ 5 min: 1.1% NICU admission:4% | Perinatal death: 2 Apgar<7@5min: 1.2% NICU admission: 1% | NR NS NS |
| Cardozo, Fysh, and Pearce, 1986 | 41-5/7 weeks | 402 | PGE2 and oxytocin | Kick counts daily Ultrasound: AFI once, US also done to determine ratio of head circumference to abdominal circumference Fetal monitoring: NST every other day | Perinatal death: 0 Apgar < 5@ 5 min: 1% NICU admission: 3% | Perinatal death: 1 Apgar < 7@5 min: 2% NICU admission:1.5% | NR NS NS |
| Pearce and Cardozo, 1988 | 41-5/7 weeks | 281 | PGE2 and oxytocin | Kick counts daily Ultrasound: AFI once, US also done to determine ratio of head circumference to abdominal circumference Fetal monitoring: NST every other day | Perinatal death: 0 Apgar < 5 @ 5 min: 1% NICU admission: 4% | Perinatal death: 1 Apgar < 5@ 5 min:1% NICU admission: 1% | NR NS NS |
| Augensen, Bergsjø, Eikeland, et al., 1987 | 41-5/7 weeks | 409 | Oxytocin, amniotomy | Nonstress test: immediate NST, repeated 3-4 days if undelivered | Perinatal death: 0 Apgar scores were "evenly distributed" NICU admission: 5.6% | Perinatal death: 0 Apgar (see induction) NICU admission: 7.7% | NR NR NR |
| Martin, Sessums, Howard, et al., 1989 | 42 weeks | 22 | Laminaria, oxytocin | NST/CST weekly Amniotic fluid index: weekly | Perinatal death: 0 Apgar (mean) 9.75 NICU admission: NR | Perinatal death: 0 Apgar (mean): 9.7 NICU admission: NR | NR NS |
| Herabutya, Prasertsawat, Tongyai, et al., 1992 | 42 weeks | 108 | PGE2, amniotomy, oxytocin | NST: weekly until 43 weeks, then 2x/week | Perinatal death: 0 Apgar < 7 @ 5 min: 1.8% NICU admission:1.8% | Perinatal death: 0 Apgar < 7 @ 5 min: 7.8% NICU admission: 7.8% | NR 0.19 0.19 |
| Katz, Yemini, Lancet, et al., 1983 | 42 weeks | 156 | Amniotomy, oxytocin | Fetal monitoring: twice daily fetal movement counts Amnioscopy and OCT every 3 days | Perinatal death: 1.3% Apgar < 7 @ 5 min: 3.8% NICU admission: NR | Perinatal death: 1.3% Apgar< 7@ 5min: 1.3% NICU admission: NR | NS NR |
| Bergsjø, Huang, Yu, et al., 1989 | 42 weeks | 188 | Membrane stripping and oxytocin | Monitored group was admitted to hospital for one week w/close daily clinical surveillance "fetal movement test, atropine test, ultrasound and urinary estriol excretion tests were also employed" | Perinatal death: 0 Apgar no quantitative data "distributions were almost equal between the groups" NICU admission: NR | Perinatal death: 1 Apgar (see induction) NICU admission: NR | |
| Witter and Weitz, 1987 | 42 weeks | 200 | Oxytocin, amniotomy | Fetal movement counts 3x/day OCT if decreased FM Urinary estriol once (41-42 wks), twice during 42-43 wks, and 3x if >43 wks | Perinatal death: NR Apgar < 7 @ 5 min: 0 NICU admission: NR | Perinatal death: NR Apgar < 7 @ 5 min: 2.08% NICU admission: NR | NS |
| Hedén, Ingemarsson, Ahlström, et al., 1991 | 42 weeks | 238 | Amniotomy, oxytocin | Nonstress test every other day Amniotic fluid index: once weekly Oxytocin challenge test: if nonreactive nonstress test | Perinatal death: NR Apgar < 7 @ 5 min: 2.8% NICU admission: 9.2% | Perinatal death: NR Apgar <7@5 min: 0.8% NICU admission: 6.2% | NS NS |
*At randomization
NICU = Neonatal intensive care unit; NR = Not reported; NS = Not significant
OCT = Oxytocin challenge test; FM = Fetal movement
| Study | Gestational age* | N | Induction Method | Monitoring Method | Maternal Outcomes: Induction | Maternal Outcomes: Monitoring | P-value |
|---|---|---|---|---|---|---|---|
| Ohel, Rahav, Rothbart, et al., 1996 | 40 weeks | 200 | PGE2 | Fetal monitoring: 2x/week | Cesarean section: 5.7% Operative vaginal delivery: NR | Cesarean section: 5.8% Operative vaginal delivery: NR | NS |
| Egarter, Kofler, Fitz, et al., 1989 | 40 weeks | 356 | PGE2, repeated if necessary | Nonstress test: q 2-3 days | Cesarean section: 1.1% Operative vaginal delivery: 2% | Cesarean section: 1.8% Operative vaginal delivery: 1.8% | NR NR |
| El-Torkey and Grant, 1992 | 41 weeks | 65 | Membrane sweeping | No vaginal examination | Cesarean section: 15% Operative vaginal delivery: 6% | Cesarean section: 12.5% Operative vaginal delivery: 9% | 0.76 0.62 |
| Dyson, Miller, and Armstrong, 1987 | 41 weeks | 302 | PGE2 as outpatient, repeated if necessary, then oxytocin if needed | Nonstress test: 2x/week, Amniotic fluid index: once between 41 and 42 weeks, twice weekly after 42 weeks | Cesarean section: 14.5% Operative vaginal delivery: NR | Cesarean section: 27.3% Operative vaginal delivery: NR | < 0.01 |
| National Institute of Child Health and Human Development Network of Maternal-Fetal Medicine Units, 1994 | 41 weeks | 440 | 1) PGE2 only 2) PGE2 plus oxytocin | Nonstress test: 2x/week AFI: 2x/week | Cesarean section: 20.8% Operative vaginal delivery: NR | Cesarean section: 18.3% Operative vaginal delivery: NR | NS |
| Hannah, Hannah, Hellmann, et al., 1992 | 41 weeks | 3418 | PGE2, oxytocin | Kick counts: once a day Nonstress test: 3x/week Amniotic fluid index: 2-3x/week | Cesarean section: 21.2% Operative vaginal delivery: 35.3% | Cesarean section: 24.5% Operative vaginal delivery: 34.9% | = 0.03 NR |
| Pearce and Cardozo, 1988 | 41-5/7 weeks | 281 | PGE2 and oxytocin | Kick counts daily Ultrasound: AFI once, US also done to determine ratio of head circumference to abdominal circumference Fetal monitoring: NST every other day | Cesarean section: 3% Operative vaginal delivery: 26% | Cesarean section: 1% Operative vaginal delivery: 28% | NS NS |
| Cardozo, Fysh, and Pearce, 1986 | 41-5/7 weeks | 402 | PGE2 and oxytocin | Kick counts daily Ultrasound: AFI once, US also done to determine ratio of head circumference to abdominal circumference Fetal monitoring: NST every other day | Cesarean section: 13% Operative vaginal delivery: 20% Only reported Emergency C/S | Cesarean section: 9% Operative vaginal delivery: 26% | NS NS |
| Augensen, Bergsjø, Eikeland, et al., 1987 | 41-5/7 weeks | 409 | Oxytocin, amniotomy | Nonstress test: immediate NST, repeated 3-4 days if undelivered | Cesarean section: 6.5% Operative vaginal delivery: 17.2% | Cesarean section: 7.7% Operative vaginal delivery: 20.5% | NR NR |
| Martin, Sessums, Howard, et al., 1989 | 42 weeks | 22 | Laminaria, oxytocin | NST/CST weekly Amniotic fluid index: weekly | Cesarean section: 17% Operative vaginal delivery: 25% | Cesarean section: 10% Operative vaginal delivery: 25% | NS NS |
| Herabutya, Prasertsawat, Tongyai, et al., 1992 | 42 weeks | 108 | PGE2, amniotomy, oxytocin | NST: weekly until 43 weeks, then 2x/week | Cesarean section: 47.4% Operative vaginal delivery: 19.3% | Cesarean section: 47.1% Operative vaginal delivery: 17.6% | 0.87 0.98 |
| Katz, Yemini, Lancet, et al., 1983 | 42 weeks | 156 | Amniotomy, oxytocin | Fetal monitoring: twice daily fetal mvmt counts Amnioscopy and OCT every 3 days | Cesarean section: 20.5% Operative vaginal delivery: NR | Cesarean section: 8.8% Operative vaginal delivery: NR | < 0.05 |
| Bergsjø, Huang, Yu, et al., 1989 | 42 weeks | 188 | Membrane stripping and oxytocin | Monitored group was admitted to hospital for one week with close daily clinical surveillance; "fetal movement test, atropine test, ultrasound and urinary estriol excretion tests were also employed" | Cesarean section: 28.7% Operative vaginal delivery: 22.4% | Cesarean section: 41.5% Operative vaginal delivery: 26.6% *The only p-value reported compared operative deliveries (cesarean section, forceps, and vacuum) | < 0.05 |
| Witter and Weitz, 1987 | 42 weeks | 200 | Oxytocin, amniotomy | Fetal movement counts 3x/day OCT if decreased FM Urinary estriol once (41-42 wks), twice during 42-43 wks, and 3x if >43 wks | Cesarean section: 29.13% Operative vaginal delivery: NR | Cesarean section: 27.83% Operative vaginal delivery: NR | NS |
| Hedén, Ingemarsson, Ahlström, et al., 1991 | 42 weeks | 238 | Amniotomy, oxytocin | Nonstress test every other day Amniotic fluid index: once weekly Oxytocin challenge test: if nonreactive nonstress test | Cesarean section: 9.2% Operative vaginal delivery: 2.8% | Cesarean section: 7.0% Operative vaginal delivery: 15.5% | NS < 0.01 |
*At randomization
NR = Not reported; NS = Not significant
OCT = Oxytocin challenge test; FM = Fetal movement
| Study | Gestational age* | N | Induction Method | Monitoring Method | Resource Utilization: Induction | Resource Utilization: Monitoring | P-value |
|---|---|---|---|---|---|---|---|
| Ohel, Rahav, Rothbart, et al., 1996 | 40 weeks | 200 | PGE2 | Fetal monitoring: 2x/week | Total costs: NR Length of stay: NR Time to delivery (days to delivery from entrance into trial) = 1.6 ± 1.9 | Total costs: NR Length of stay: NR Time to delivery (see induction) = 5.2 ± 3.7 | NR < 0.001 |
| Egarter, Kofler, Fitz, et al., 1989 | 40 weeks | 356 | PGE2, repeated if necessary | Nonstress test: every 2-3 days | Total costs: NR Length of stay: NR Time to delivery (onset of contractions to delivery) = 7.1 hours | Total costs: NR Length of stay: NR Time to delivery = 10.6 hours; p-value not reported, but authors report "statistically significant" | See text |
| El-Torkey and Grant, 1992 | 41 weeks | 65 | Membrane sweeping | No vaginal examination | Total costs: NR Length of stay: NR Time to delivery: NR | Total costs: NR Length of stay: NR Time to delivery: NR | |
| Dyson, Miller, and Armstrong, 1987 | 41 weeks | 302 | PGE2 as outpatient, repeated if necessary, then oxytocin if needed | Nonstress test: 2x/week, Amniotic fluid index: once between 41 and 42 weeks, twice weekly after 42 weeks | Total costs: NR Length of stay (mean maternal hospital stay) = 3.2 days Time to delivery: NR | Total costs: NR Length of stay: 3.5 days Time to delivery: NR | < 0.04 < 0.001 |
| National Institute of Child Health and Human Development Network of Maternal-Fetal Medicine Units, 1994 | 41 weeks | 440 | 1) PGE2 only 2) PGE2 plus oxytocin | Nonstress test: 2x/week AFI: 2x/week | Total costs: NR Length of stay: NR Time to delivery: 36 hours and 35 hours (2 induction groups) | Total costs: NR Length of stay: NR Time to delivery: 85 hours | < 0.001 |
| Hannah, Hannah, Hellmann, et al., 1992 | 41 weeks | 3418 | PGE2, oxytocin | Kick counts once a day Nonstress test: 3x/week Amniotic fluid index: 2-3x/week | Total costs: $ 2502 Length of stay: 3.9 days Time to delivery: "women in the induction group were less likely not to have delivered their babies seven or more days after randomization" 5.1% vs 26% p<0.001 | Total costs: $2684 Length of stay: 4.0 days Time to delivery: see induction | |
| Pearce and Cardozo, 1988 | 41-5/7 weeks | 281 | PGE2 and oxytocin | Kick counts daily Ultrasound: AFI once, US also done to determine ratio of head circumference to abdominal circumference Fetal monitoring: NST every other day | Total costs: NR Length of stay: NR Time to delivery: NR | Total costs: NR Length of stay: NR Time to delivery: NR | |
| Cardozo, Fysh, and Pearce, 1986 | 41-5/7 weeks | 402 | PGE2 and oxytocin | Kick counts daily Ultrasound: AFI once, US also done to determine ratio of head circumference to abdominal circumference Fetal monitoring: NST every other day | Total costs: NR Length of stay: NR Time to delivery: NR | Total costs: NR Length of stay: NR Time to delivery: NR | |
| Augensen, Bergsjø, Eikeland, et al., 1987 | 41-5/7 weeks | 409 | Oxytocin, amniotomy | Nonstress test: immediate NST, repeated 3-4 days if undelivered | Total costs: NR Length of stay: 7.05 days Time to delivery: 1.4 days | Total costs: NR Length of stay: 6.69 days Time to delivery: 4.0 days | 0.02 NR |
| Martin, Sessums, Howard, et al., 1989 | 42 weeks | 22 | Laminaria, oxytocin | NST/CST weekly Amniotic fluid index: weekly | Total costs: NR Length of stay: 3.41 days Time to delivery (length of labor): 6.3 hours | Total costs: NR Length of stay: 2.6 days Time to delivery: 8.3 hours | NS NS |
| Herabutya, Prasertsawat, Tongyai, et al., 1992 | 42 weeks | 108 | PGE2, amniotomy, oxytocin | NST: weekly until 43 weeks, then 2x/week | Total costs: NR Length of stay: NR Time to delivery: NR | Total costs: NR Length of stay: NR Time to delivery: NR | |
| Katz, Yemini, Lancet, et al., 1983 | 42 weeks | 156 | Amniotomy, oxytocin | Fetal monitoring: twice daily fetal mvmt counts Amnioscopy and OCT every 3 days | Total costs: NR Length of stay: NR Time to delivery (duration of labor): 9.4 hrs | Total costs: NR Length of stay: NR Time to delivery: 6.7 hrs | < 0.01 |
| Bergsjø, Huang, Yu, et al., 1989 | 42 weeks | 188 | Membrane stripping and oxytocin | Monitored gp was admitted to hospital for one week w/close daily clinical surveillance "fetal movement test, atropine test, ultrasound and urinary estriol excretion tests were also employed" | Total costs: NR Length of stay: 7.9 days Time to delivery: -reported only in vaginal deliveries (no c/s), duration of labor from start of painful contractions of 10 min interval to birth was = 12 hrs 10 mins | Total costs: NR Length of stay: 8.1 days Time to delivery (see induction): 10 hrs 38 min | NR |
| Witter and Weitz, 1987 | 42 weeks | 200 | Oxytocin, amniotomy | Fetal movement counts 3x/day OCT if decreased FM Urinary estriol once (41-42 wks), twice during 42-43 wks, and 3x if >43 wks | Total costs: NR Length of stay: 4.06 days Time to delivery: NR | Total costs: NR Length of stay: 4.74 days Time to delivery: NR | < 0.05 |
| Hedén, Ingemarsson, Ahlström, et al., 1991 | 42 weeks | 238 | Amniotomy, oxytocin | Nonstress test every other day Amniotic fluid index: once weekly Oxytocin challenge test: if nonreactive nonstress test | Total costs: NR Length of stay: NR Time to delivery: NR | Total costs: NR Length of stay: NR Time to delivery: NR | |
* At randomization
NR = Not reported
OCT = Oxytocin challenge test; FM = Fetal movement
All of the included trials were described as "randomized." Four were in fact only pseudorandomized (i.e., treatment was allocated based on alternate medical record numbers or birth dates, rather than by randomly generated numbers), which introduces the possibility of bias (Cardozo, Fysh, and Pearce, 1986; Hedén, Ingemarsson, Ahlström, et al., 1991; Katz, Yemini, Lancet, et al., 1983; Ohel, Rahav, Rothbart, et al., 1996). Two studies did not describe the method of randomization used (Egarter, Kofler, Fitz, et al., 1989; Herabutya, Prasertsawat, Tongyai, et al., 1992).
As discussed above and pointed out by Crowley (2000), the practical and ethical difficulties of blinding clinicians to either the target intervention or the results of antepartum testing result in an inherent bias against expectant management. Abnormal antenatal monitoring could influence a clinician's thresholds for performing a cesarean section, either by making the diagnosis of "fetal distress" more likely or by decreasing the clinician's willingness to augment labor aggressively.
In any trial of planned induction versus expectant management with antepartum testing, a certain proportion of women randomized to planned induction will go into spontaneous labor, while a proportion of women randomized to expectant management will have abnormal antepartum testing results; or, as observed in the Canadian Multicenter Post-term Pregnancy Trial (Hannah, Hannah, Hellmann, et al., 1992), patients or providers may request induction. These subjects are quite correctly analyzed in the groups to which they are randomized, rather than in accordance with the "treatment" received, since the trial is not comparing spontaneous delivery to induction, but instead, management strategies undertaken with the knowledge that some women will deliver spontaneously prior to scheduled induction, and some women will require (or request) induction during expectant management.
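The distinction between analysis by assigned group and analysis by treatment received can be illustrated with the subgroup figures from the 1996 reanalysis of the Canadian trial quoted earlier in this report. The sketch below recombines the four subgroup cesarean rates in two ways. Event counts are reconstructed from rounded percentages, so the results are approximate, and the variable names are illustrative only.

```python
# (number of women, cesarean section rate) for each subgroup, from the
# reanalysis by Hannah, Huh, Hewson, et al. (1996) as quoted in this report
subgroups = {
    ("induction arm", "induced"): (772, 0.295),
    ("induction arm", "spontaneous"): (377, 0.257),
    ("expectant arm", "induced"): (405, 0.420),
    ("expectant arm", "spontaneous"): (723, 0.257),
}


def pooled_rate(keys):
    """Weighted cesarean rate across the chosen subgroups (approximate events)."""
    women = sum(subgroups[k][0] for k in keys)
    events = sum(subgroups[k][0] * subgroups[k][1] for k in keys)
    return events / women


# Analysis by randomized group (the comparison the trials are designed to make)
itt_induction = pooled_rate([k for k in subgroups if k[0] == "induction arm"])
itt_expectant = pooled_rate([k for k in subgroups if k[0] == "expectant arm"])

# Analysis by labor onset actually experienced, ignoring randomization
treated_induced = pooled_rate([k for k in subgroups if k[1] == "induced"])
treated_spontaneous = pooled_rate([k for k in subgroups if k[1] == "spontaneous"])

print(f"By assigned arm: induction {itt_induction:.1%}, expectant {itt_expectant:.1%}")
print(f"By labor onset:  induced {treated_induced:.1%}, spontaneous {treated_spontaneous:.1%}")
```

Grouping by labor onset makes induction look harmful, because it mixes women induced electively with women induced for abnormal testing or other indications. Grouping by assigned arm compares the two management strategies as they would actually be applied.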
All studies reported results for "hard" outcomes such as perinatal mortality and cesarean section rates. Reporting of other outcomes of interest was more variable. Many outcomes are subject to inherent difficulties with reproducibility and bias (e.g., the diagnosis of "fetal distress"), variability in operator preferences and skills (e.g., operative vaginal delivery rates), or are of uncertain long-term clinical significance (e.g., meconium-stained amniotic fluid in the absence of meconium aspiration, or Apgar scores). Other measures, such as patient preferences for different management strategies, longer-term neonatal outcomes, and vaginal and perineal trauma, would be of significant interest to patients, clinicians, and policymakers. We identified one cohort study published in 1991 which showed that patients' preferences for induction versus expectant management changed with advancing gestation: 45 percent of women preferred conservative management at 37 weeks, compared with 31 percent at 41 weeks (Roberts and Young, 1991). Measurement of these preferences in light of data published subsequent to this study, and using methods developed and refined in the past decade, is needed. Detailed measurement of both medical and nonmedical costs is also lacking in the studies reviewed.
The gestational age at which interventions were begun, as well as the methods used for induction and monitoring, varied between studies. Because variability in these methods may result in quite different outcomes, caution should be used when comparing outcomes that could possibly be affected by different methods of labor induction (such as cesarean section rates or time spent in labor) or different protocols for fetal monitoring (such as perinatal mortality) between studies. In addition, clinical management decisions may vary between practitioners. Especially in smaller trials, unequal distribution of different practitioners with different preferences and thresholds for management of labor may have resulted in some differences in outcomes.
Readers also must consider the degree to which these studies are generalizable to particular settings. If these methods or protocols are substantially different from those used in a particular setting, then the results may not be applicable. For example, the Canadian Multicenter Post-term Pregnancy Trial did not use prostaglandins for induction of women with abnormal antepartum testing (Crowley, 2000; Hannah, Hannah, Hellmann, et al., 1992). Use of prostaglandins could have changed the results by yielding lower cesarean rates in the induction arm through more successful inductions, as pointed out by Crowley (2000). On the other hand, the use of these agents in women with potentially compromised fetuses could have resulted in even higher cesarean section rates because of fetal compromise. A reanalysis of the Canadian trial using published success rates for prostaglandins found that more liberal use of these agents would still lead to a significantly higher cesarean section rate in the expectant management group because the cesarean section rate in the group induced because of abnormal testing would be substantially higher (Hannah, Huh, Hewson, et al., 1996).
Only the Canadian trial (Hannah, Hannah, Hellmann, et al., 1992) was sufficiently powered to detect differences in rare perinatal outcomes. Many of the remaining studies were also under-powered to detect differences in dichotomous outcomes.
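To give a sense of scale for the power problem, the following sketch uses the usual normal-approximation formula for comparing two proportions. The event rates (3 per 1,000 versus 1 per 1,000), the two-sided alpha of 0.05, and the 80 percent power are hypothetical values chosen for illustration; they are not taken from any of the reviewed trials.

```python
import math
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical rare-outcome rates: 3 per 1,000 vs. 1 per 1,000
required = n_per_arm(0.003, 0.001)
print(f"Approximately {required} women per arm")
```

With these assumed rates the required enrollment runs to several thousand women per arm, which is larger than the total enrollment of most trials in the evidence tables.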
Inappropriate summary measures and statistical tests were frequently used (e.g., mean parity or Bishop score, with comparison by t-test, when nonparametric statistics would be more appropriate). Variables that are frequently not normally distributed, such as length of stay and costs, also were not uniformly reported using medians, and the effect of a few outliers on comparisons was not evaluated.
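The concern about skewed variables can be illustrated with a minimal sketch. The lengths of stay below are hypothetical values chosen for illustration and are not taken from any trial in this report; the point is only that a single outlier shifts the mean while leaving the median unchanged.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days (illustrative only, not trial data).
stays_a = [2, 2, 3, 3, 3, 4, 4]
stays_b = [2, 2, 3, 3, 3, 4, 21]  # same as stays_a except one prolonged stay


def summarize(days):
    """Return (mean, median) for a list of lengths of stay."""
    return mean(days), median(days)


mean_a, median_a = summarize(stays_a)
mean_b, median_b = summarize(stays_b)

print(f"group A: mean={mean_a:.2f}, median={median_a}")
print(f"group B: mean={mean_b:.2f}, median={median_b}")
```

A comparison of means by t-test would treat these two groups as different, whereas their medians are identical.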
Despite the methodological issues raised above, there is a consistent finding that perinatal mortality rates are lower with planned induction at 41 weeks or later compared with expectant management, a finding confirmed by a formal Cochrane meta-analysis (Crowley, 2000). Based on the observed absolute risk difference, the Cochrane meta-analysis estimated that 500 inductions were necessary to prevent one perinatal death.
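The relationship between the absolute risk difference and the number of inductions needed can be sketched as follows. The risk difference of 0.002 (2 per 1,000) used here is back-calculated from the figure of 500 and is an assumption for illustration; it is not quoted from the meta-analysis.

```python
def number_needed_to_treat(absolute_risk_difference):
    """Number needed to treat is the reciprocal of the absolute risk difference."""
    if absolute_risk_difference <= 0:
        raise ValueError("risk difference must be positive")
    return 1.0 / absolute_risk_difference


def implied_risk_difference(nnt):
    """Absolute risk difference implied by a given number needed to treat."""
    return 1.0 / nnt


# Assumption: back-calculated from the reported 500 inductions per death prevented.
ard = implied_risk_difference(500)
print(f"implied absolute risk difference: {ard:.4f}")
print(f"number needed to treat: {number_needed_to_treat(ard):.0f}")
```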
It is interesting to consider these findings in light of our review of antepartum tests under Question 1. We found that there was a consistent qualitative pattern for the majority of tests studied, no matter what surrogate outcome for fetal compromise was used: sensitivity was lower than specificity, while negative predictive value was higher than positive predictive value. This implies that predictive values are driven by the relatively low rates of adverse outcomes associated with fetal compromise in prolonged pregnancy. If the measures used are valid surrogates for fetal compromise leading to stillbirth, then this should hold true for stillbirth as well: the negative predictive value of antepartum tests for stillbirth should be much greater than the positive predictive value. However, as the risk of stillbirth increases with increasing gestational age after 37 weeks, the negative predictive value should decrease, and the number of stillbirths in the setting of normal test results should increase.
Elective induction of labor results in a lower risk of stillbirth only after 41 weeks. One explanation for this, consistent with the findings on antepartum tests, is that the baseline risk of stillbirth is low enough prior to 41 weeks that the negative predictive value of antepartum tests is quite good. After 41 weeks, the increasing stillbirth risk results in poorer negative predictive value, so that one would expect excess stillbirths compared with elective induction.
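The dependence of predictive values on baseline risk can be sketched numerically. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only to show the direction of the effect; they are not estimates from the studies reviewed.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics (sensitivity lower than specificity).
SENS, SPEC = 0.40, 0.90

results = {}
for prevalence in (0.001, 0.002, 0.005):  # hypothetical baseline risks
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    # Adverse outcomes occurring despite a normal test, per 100,000 women tested.
    missed = 100_000 * prevalence * (1 - SENS)
    results[prevalence] = (ppv, npv, missed)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.4f}  NPV={npv:.5f}  "
          f"missed per 100,000={missed:.0f}")
```

With test characteristics held fixed, rising prevalence lowers the negative predictive value and increases the number of adverse outcomes that follow a normal test result.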
Other perinatal outcomes did not appear to differ significantly between induction and expectant management groups.
Maternal outcomes did not differ between women managed with antepartum monitoring or with planned induction with the agents used in these studies. Specifically, overall cesarean section rates did not differ, either globally or in the subgroups analyzed by the Cochrane group (Crowley, 2000). If anything, cesarean section rates were lower in the induced groups.
Only one large trial reported costs, and based on 1992 costs and care provided, planned induction at 41 weeks was less expensive than expectant management with antepartum testing. However, because of significant changes in the technologies used and the economics of medicine in the interim, additional research is needed to better understand the cost implications of these two strategies. For example, if elective induction at 41 weeks is deemed to be preferable from a clinical standpoint for most patients, then a thorough analysis of the resources needed to institute such a policy would have to incorporate factors such as staffing on labor and delivery suites and postpartum units, since temporal patterns of patient flow may change.
Elective induction of labor at 41 weeks consistently appears to reduce the risk of stillbirth compared with management with antepartum testing, with no increase in maternal or neonatal risks, including no increase in cesarean section rates. At least 500 inductions would be needed to prevent one stillbirth. The societal tradeoffs in terms of economic resources used are unclear because of a lack of strong data applicable to current practice. Individual patients may have different values for these outcomes or perhaps for the "process" of childbirth -- some women may place a very high value on avoiding any medical intervention.
Question 3: What are the benefits, risks, and costs of currently available interventions for induction of labor?
The evidence reviewed so far in this report suggests:
The risk of perinatal death increases with advancing gestational age.
There is no direct evidence that antepartum surveillance in prolonged gestation reduces perinatal morbidity or mortality. When surrogate measures are used as outcomes, the consistent pattern of test characteristics for tests used in antepartum surveillance is for poor sensitivity but high negative predictive value, suggesting that false negative test results will become more likely as the underlying risk of adverse outcomes increases with advancing gestational age.
Randomized trials show a reduction in perinatal mortality in women induced at 41 weeks gestation compared with women followed with antepartum testing, a finding consistent with increasing risk with advancing gestational age and with the observed patterns of test characteristics. Cesarean section rates are not increased in the elective induction arms of these studies.
Given that induction at 41 weeks appears to be effective in reducing mortality, data about the safest and most effective method of induction are needed in order to determine the optimal management strategy.
This section considers interventions designed to induce labor, including prostaglandin E2 (PGE2, or dinoprostone) gel (Prepidil®), PGE2 tablets, PGE2 insert (Cervidil®), misoprostol tablets, misoprostol gel, oxytocin, mifepristone, membrane sweeping, nipple stimulation, and other treatments. These methods are used either as primary methods of induction or as adjunctive methods in oxytocin induction. We limited our review to studies where the induction method was randomly assigned and compared with either placebo or a different induction method, and where at least some of the subjects were induced for an indication related to prolonged pregnancy. In this section, we also consider active interventions performed in the ambulatory setting at or near term that are designed to reduce the proportion of women reaching "postdates" or "postterm."
In addition to the results of our review, we report summary conclusions based on meta-analyses performed for the Royal College of Obstetricians and Gynaecologists' (RCOG) recent guideline on induction of labor (Royal College of Obstetricians and Gynaecologists, 2001) in collaboration with the Cochrane Collaboration.
We identified one randomized trial of castor oil used at term to promote spontaneous labor. Garry, Figueroa, Guillaume, et al. (2000) randomized women to 60 mL castor oil given orally in apple or orange juice (n = 52) or no treatment (n = 48). Mean gestational age was 284.4 ± 4.2 days in the castor oil group and 284.7 ± 3.6 days in the no treatment group. In the castor oil group, 57.7 percent of the subjects were in labor within 24 hours compared with 4.2 percent in the no treatment group (p < 0.001). Cesarean section rates were 19.2 percent in the castor oil group and 8.3 percent in the no treatment group (p = 0.20), but the study was underpowered to detect this difference or differences in rare outcomes such as uterine rupture. Of note, all women in the castor oil group experienced nausea. Other outcomes, such as proportion of women induced for other reasons or neonatal outcomes, were not reported.
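As a rough check on the cesarean section comparison, the sketch below applies a chi-square test with Yates' continuity correction to counts back-calculated from the reported percentages (10 of 52 and 4 of 48). Both the counts and the choice of test are assumptions; the source does not state which test the investigators used.

```python
from math import erfc, sqrt


def yates_chi_square_p(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]] with Yates' correction."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


# Assumed counts: cesarean / no cesarean in the castor oil and no-treatment groups.
p_value = yates_chi_square_p(10, 42, 4, 44)
print(f"p = {p_value:.2f}")
```

Under these assumptions the result is close to the reported p = 0.20, which illustrates how a more than twofold difference in rates can remain nonsignificant in a trial of 100 women.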
The RCOG guideline (Royal College of Obstetricians and Gynaecologists, 2001) did not address castor oil. The most recent Cochrane review on the topic (Kelly, Kavanagh, and Thomas, 2001) identified the article cited above (Garry, Figueroa, Guillaume, et al., 2000) and reached conclusions similar to our own.
We identified two studies that evaluated the use of breast stimulation in promoting the onset of labor near term and one that evaluated breast stimulation as a method of induction. Elliot and Flaherty (1984) randomized 100 women to either breast stimulation (manual stimulation of the nipple and areola for 15 minutes, alternating breasts, for a total of 1 hour at a time, three times daily) beginning at 39 weeks or a control pelvic examination; women in the control group were asked to abstain from sexual intercourse and avoid breast stimulation. Both groups were reevaluated at 42 weeks. Women with Bishop scores of 8 or greater were induced; others were followed with contraction stress tests. Five women in the breast stimulation group reached 42 weeks, compared with 17 in the control group; significance testing was not performed. Women in the breast stimulation group were significantly less likely to be induced after 42 weeks. The study was underpowered to detect differences in important outcomes, especially for the subgroup of women beyond 42 weeks.
Kadar, Tapp, and Wong (1990) randomized women at 39 weeks to either daily unilateral manual nipple stimulation "for as long as was practically feasible" (n = 60) or to no nipple stimulation (n = 76). There were no significant differences in any of the outcomes reported, including the proportion going into spontaneous labor, postterm deliveries, or median duration of pregnancy. Survival analysis showed that duration of pregnancy was related only to gestational age at enrollment and Bishop score. The authors also noted that adherence to the prescribed regimen was poor: 70 percent of the women assigned to the nipple stimulation group either failed to perform nipple stimulation at all or did so for less than 2 hours total during the entire study.
Chayen, et al., compared nipple stimulation using an electric breast pump with oxytocin as a method of induction (Chayen, Tejani, and Verma, 1986). In this study, only 29 percent of the inductions were for prolonged pregnancy. Thirty subjects were induced initially with a breast pump, while 32 received oxytocin. The times to achieve regular contractions and adequate labor, as documented by intrauterine catheter, were significantly shorter in the breast pump group. Cesarean section rates were also lower (26.7 percent vs. 43.7 percent in the oxytocin group), although this difference was not significant. Patients in the oxytocin group were more likely to have a higher Bishop score at baseline. Results were not reported separately by parity or for the subgroup of women induced for prolonged pregnancy.
In summary, because of lack of significance testing, poor compliance, or lack of power, the available randomized trials do not allow conclusions to be drawn about the effectiveness of breast stimulation in promoting labor or as a method of induction. The RCOG guideline (Royal College of Obstetricians and Gynaecologists, 2001) did not address this topic.
We identified three randomized trials of relaxin. Evans, Dougan, Moawad, et al. (1983) randomized women at 41 weeks gestation scheduled to undergo oxytocin induction of labor to intracervical or vaginal insertion of 4 mg relaxin (n = 10), 2 mg relaxin (n = 13), or placebo (n = 14); if the patient reached 42 weeks gestation, then labor was induced. No significant differences in any parameters, including days to admission, spontaneous labor, or time to delivery, were noted. There were trends towards a shorter time to delivery in the relaxin groups, but the study was underpowered to detect a difference for this outcome.
Bell, Permezel, MacLennan, et al. (1993) randomized women scheduled for induction for prolonged pregnancy to intravaginal 1.5 mg recombinant human relaxin (n = 18) or placebo (n = 22). No significant differences in any outcomes were reported. The authors noted that a low dose was deliberately chosen to help establish a safety profile for relaxin.
Brennand, et al., randomized women between 37 and 42 weeks, "most" of whom were being induced for pregnancy-induced hypertension or prolonged pregnancy, to placebo or 1 mg, 2 mg, or 4 mg of recombinant relaxin (Brennand, Calder, Leitch, et al., 1997). There were no significant differences in any outcome except for slightly elevated baseline fetal heart rates after relaxin.
In summary, there are insufficient data available on relaxin to draw any conclusions about its safety or efficacy in induction of labor in women with prolonged pregnancy.
| Study | Membrane stripping | Comparison | N | Statistically Significant Differences |
|---|---|---|---|---|
| Idrisa, Obisesan, and Adeleye, 1993 | 41 weeks, once | Not described | 200 | Spontaneous labor within one week: 92% sweeping vs. 33% control |
| Magann, McNamara, Whitworth, et al., 1998 | 39 weeks, stripping every 3 days | Vaginal examination | 65 | Inductions at 42 weeks: 0 stripping vs. 56% controls |
| Salamalekis, Vitoratos, Kassanos, et al., 2000 | 40-41 weeks, membrane stripping once | 6-hour oxytocin infusion or cervical examination | 104 | Spontaneous labor: 68% stripping, 51% oxytocin vs. 34% control; Induction of labor: 2.9% stripping, 5.7% oxytocin vs. 20% control |
| Gupta, Vasishta, Sawhney, et al., 1998 | 38 weeks, not specified if repeated | Cervical examination | 100 | Mean gestational age at delivery: 38.7 weeks stripping vs. 39.8 control; Mean days from randomization to delivery: 4.6 stripping vs. 11.9 control; Pregnancies beyond 40 weeks: 4% stripping vs. 34% control; Induction of labor: 2% stripping vs. 32% control |
| Wiriyasirivaj, Vutyavanich, and Ruangsri, 1996 | 38 weeks, repeated weekly | Cervical examination | 120 | Proportion delivering within 7 days: 41% stripping vs. 20.3% control |
| Cammu and Haitsma, 1998 | 39 weeks, repeated weekly | "Routine pelvic examination" | 278 | Proportion reaching 41 weeks: 19% stripping vs. 33% control; Induction of labor: 11% stripping vs. 26% control |
| Berghella, Rogers, and Lescale, 1996 | 38 weeks, repeated weekly | "Gentle cervical examination" | 139 | Delivery after 41 weeks: 5% stripping vs. 22% control; Mean days to delivery: 8.2 days stripping vs. 12.2 days control |
| McColgin, Hampton, McCaul, et al., 1990 | 38 weeks, repeated weekly | "Atraumatic assessment of cervix" | 180 | Delivery beyond 42 weeks: 3.3% stripping vs. 15.6% control; Mean days to delivery: 8.6 days stripping vs. 15.1 days control; Delivery within 1 week: 54.5% stripping vs. 15.6% control |
| Allott and Palmer, 1993 | 40+ weeks, one time | Cervical examination for Bishop score | 195 | Induction of labor: 8.1% stripping vs. 18.8% control; Time to delivery: 2.24 days stripping vs. 5.18 days control (no differences in primigravidas with high Bishop score) |
| El-Torkey and Grant, 1992 | 41-42 weeks, twice weekly | No vaginal examination | 65 | Spontaneous labor prior to scheduled induction at 42 weeks: 76% stripping vs. 38% control |
| Crane, Bennett, Young, et al., 1997 | 38-40 weeks, once | Cervical examination | 150 | Epidural anesthesia: 66% stripping vs. 43% control; Premature rupture of membranes: 6.6% stripping vs. 22% control |
| McColgin, Patrissi, and Morrison, 1990 | 38-42 weeks, repeated weekly | Cervical examination for Bishop score | 99 | Days to delivery: 6.7 stripping vs. 13.3 control; Proportion delivering within 1 week: 59% stripping vs. 21% control |
All studies except one consistently showed higher rates of labor within a predefined time period, usually 1 week, in women randomized to active membrane sweeping. The proportion of women induced was also consistently lower in groups randomized to membrane sweeping. No differences in adverse outcomes, including infection or bleeding, were noted in any study. Level of patient discomfort during the procedure was not assessed in any study.
The one study that did not show a difference in outcomes (Crane, Bennett, Young, et al., 1997) was different from the other trials in several ways. Membrane stripping was performed only once. Patients in the stripping group were more likely to be nulliparous and to have lower Bishop scores. Stratified analyses and logistic regression did not show significant effects, but it is possible that the smaller sample size in these subgroups limited power. In addition, a survival analysis showed a decrease in the median time from enrollment to delivery (6.5 days for stripping, compared with 8 days for controls), but this difference was not significant.
In the one study in which membrane sweeping was used as an adjunct to induction of labor, Boulvain, et al., randomized women to sweeping of the membranes (n = 99) or vaginal examination only (n = 99) prior to induction of labor for "nonurgent" indications (Boulvain, Fraser, Marcoux, et al., 1998). Eighty-five percent of the patient population was induced for prolonged pregnancy. Mean time from randomization to onset of labor was significantly shorter in the sweeping group (76 hours vs. 98 hours; p = 0.01), but no significant differences were seen in other outcomes except patient discomfort (odds ratio [stripping vs. control], 2.52; 95 percent confidence interval [CI], 1.60 to 3.99), bleeding, and painful contractions without labor.
In summary, in all but one study, sweeping the membranes consistently promoted labor at term and reduced the incidence of induction for prolonged pregnancy. As with the majority of the interventions reviewed in this report, there are no data on patient preferences for this intervention. One study found that women who undergo membrane stripping are more likely to experience discomfort, bleeding, and painful contractions without labor compared with controls. Another issue is that the majority of studies excluded women whose cervices would not allow introduction of the examiner's finger; thus, the conclusions described are applicable only to those pregnant women at term whose cervices are dilated enough to allow introduction of an examiner's finger.
Similar findings have been reported in a Cochrane review (Boulvain and Irion, 2001) and incorporated into the RCOG guidelines (Royal College of Obstetricians and Gynaecologists, 2001).
We identified two randomized trials of the use of mechanical devices such as Foley catheters, which are inserted into the cervix and then inflated. Atad, et al. (Atad, Hallak, Auslender, et al., 1996) compared 3 mg PGE2 gel (n = 30), oxytocin (n = 30), and a double-balloon catheter invented by one of the investigators (n = 35). Patients in the first two groups crossed over to the catheter arm if the Bishop score was < 4 at 12 hours, while patients in the catheter group received PGE2 if the Bishop score was < 4 at 12 hours. More patients in the catheter group had cervical dilation > 3 cm after 12 hours (86 percent vs. 23 percent in the oxytocin group and 50 percent in the PGE2 group; p < 0.01). Both PGE2 and the balloon device had higher rates of vaginal delivery (PGE2, 70 percent; catheter, 77 percent; oxytocin, 27 percent) and lower rates of cesarean section among patients with cervical dilation after the initial intervention (PGE2, 13 percent; catheter, 18 percent; oxytocin, 43 percent). Only 18 percent of the inductions in this study were for prolonged pregnancy.
Sciscione, et al., randomized 53 women to misoprostol and 58 to mechanical dilation with a 16 F Foley catheter with a 30 cc balloon (Sciscione, Nguyen, Manley, et al., 2001). There were no significant differences in change in Bishop score, vaginal delivery rates, or time to delivery in the two groups. Uterine tachysystole and passage of meconium were significantly more frequent in the misoprostol group. There was a trend towards higher cesarean section rates for nonreassuring fetal heart rate tracing in the misoprostol group (24 percent vs. 12 percent; p = 0.09), in a study where the sample size was determined based on change in Bishop score. Only 16 of 111 women in this study were induced for an indication of prolonged pregnancy.
In these two trials, mechanical devices appear to be comparable to prostaglandins in terms of delivery success, with lower rates of fetal heart rate tracing changes associated with frequent uterine contractions. As with membrane sweeping, applicability is limited to women whose cervix is dilated enough to allow introduction of a catheter. As with the majority of the other interventions reviewed, these studies also included relatively few women in the population of interest (prolonged pregnancy with no other risk factors) and were underpowered to detect differences in many important outcomes.
Mechanical devices alone are not addressed specifically in published Cochrane reviews or in the RCOG guideline (Royal College of Obstetricians and Gynaecologists, 2001).
We identified one randomized trial comparing two dosing regimens of oxytocin. Satin, Hankins, and Yeomans (1991) randomized women being induced for prolonged pregnancy to a "slow-dose" regimen (an initial dose of 2 mU/min, with increments of 1 mU/min at 30-minute intervals) or a "fast-dose" regimen (an initial dose of 2 mU/min, with increments of 2 mU/min at 15-minute intervals). Induction failure was more likely in the slow-dose group (31 percent vs. 8 percent; p < 0.05). Time to delivery was shorter in the fast-dose group in both nulliparous women (9 hours vs. 15 hours; p < 0.05) and multiparous women (8 hours vs. 11 hours; p < 0.05). No significant differences were observed in other outcomes. There was a trend towards more hyperstimulation episodes requiring cessation of oxytocin in the fast-dose group, but the study was underpowered to detect a difference.
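To show how quickly the two trial regimens diverge, the sketch below computes the nominal infusion rate at selected times from the starting doses, increments, and intervals described above. It assumes uninterrupted stepwise escalation and applies no ceiling dose, since none is stated here; it illustrates the arithmetic of the trial arms only.

```python
def infusion_rate(minutes_elapsed, initial, increment, interval_minutes):
    """Nominal rate in mU/min after each completed interval (no ceiling applied)."""
    completed_steps = minutes_elapsed // interval_minutes
    return initial + completed_steps * increment


# Regimens as described for the trial arms.
SLOW = dict(initial=2, increment=1, interval_minutes=30)
FAST = dict(initial=2, increment=2, interval_minutes=15)

for minutes in (0, 30, 60, 120):
    slow = infusion_rate(minutes, **SLOW)
    fast = infusion_rate(minutes, **FAST)
    print(f"{minutes:3d} min: slow={slow} mU/min, fast={fast} mU/min")
```

Under these assumptions the fast-dose arm reaches three times the slow-dose rate by 2 hours, which is consistent in direction with both the shorter time to delivery and the trend towards more hyperstimulation.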
There is no formal comparison of oxytocin dosing regimens in published Cochrane reviews. The RCOG guideline development group reviewed dosing regimens in 11 trials of oxytocin with and without amniotomy. Their qualitative conclusions were: (1) lower dose regimens were not associated with an increase in operative delivery rates; (2) regimens with incremental rises in dose more frequently than every 30 minutes were associated with an increase in uterine hypercontractility; (3) lower dose regimens were not associated with an increase in specified delivery intervals; and (4) higher dose regimens were associated with an increase in the incidence of precipitous labor (Royal College of Obstetricians and Gynaecologists, 2001).
Five studies examined the effect of PGE2 gel versus placebo (Buttino and Garite, 1990; Doany and McCarty, 1997; Lien, Morgan, Garite, et al., 1998; O'Brien, Mercer, Cleary, et al., 1995; Sawai, Williams, O'Brien, et al., 1991). Doany and McCarty (1997) randomized patients to one of four arms: (1) no membrane stripping and placebo gel; (2) no membrane stripping and PGE2 gel; (3) membrane stripping and placebo gel; or (4) membrane stripping and PGE2 gel. Gel was placed in the posterior vaginal fornix. PGE2 gel without membrane stripping was not significantly different from placebo without stripping for any outcome. All patients in this study were 41 weeks or greater in gestational age.
Lien, et al., in a randomized trial of intracervical PGE2 gel (n = 43) versus placebo (n = 47) begun after 40 weeks, found no significant differences between the two arms in the interval from admission to delivery, cesarean sections, or maximum oxytocin dosage (Lien, Morgan, Garite, et al., 1998). Among patients who presented with a Bishop score between 3 and 6, those randomized to PGE2 gel were less likely to be induced than those treated with placebo gel.
Sawai, Williams, O'Brien, et al. (1991) randomized women at 41 weeks to either weekly PGE2 gel in the posterior fornix (n = 24) or weekly placebo gel. Induction occurred if the Bishop score was greater than 9, in the event of abnormal fetal heart rate testing, or at 44 weeks. There were no significant differences in neonatal outcomes, cesarean section rates, length of labor, or time from randomization to admission between the two groups, but the study was underpowered to identify differences in most categorical variables.
Buttino and Garite (1990) randomized women at 41-6/7 weeks to either intracervical PGE2 (n = 23) or placebo (n = 20). There were no significant differences in any outcome, including neonatal outcomes, cesarean section rate, or time to delivery. Cesarean section rates were lower in the PGE2 group (21.7 percent vs. 35.0 percent), but the study was underpowered to detect a difference. Gestational age at delivery and time from randomization to delivery were not significantly different in the two induction groups.
O'Brien, et al., randomized women at 38-39 weeks to intravaginal PGE2 gel (n = 50) or placebo (n = 50) daily for 5 days (O'Brien, Mercer, Cleary, et al., 1995). PGE2 gel resulted in significantly fewer pregnancies going beyond 40 weeks (40 percent vs. 66 percent; p < 0.016), although the proportion of pregnancies reaching 42 weeks did not differ significantly (4 percent vs. 6 percent). Induction rates were lower in the PGE2 group, but this difference did not reach statistical significance (12 percent vs. 28 percent; p = 0.08).
A randomized trial conducted by the National Institute of Child Health and Human Development (NICHD) Network of Maternal-Fetal Medicine Units (1994) compared induction between 41 and 42 weeks and expectant management. The induction group in this trial was split into two arms: intracervical PGE2 gel plus oxytocin (n = 174) and placebo gel plus oxytocin (n = 174). No significant differences in neonatal or maternal outcomes, including cesarean section rates, were detected between the two groups. Sample size estimates for this trial were based on perinatal morbidity and mortality and maternal mortality.
Rayburn, et al., compared intracervical PGE2 gel (n = 55) to placebo (n = 63) prior to induction of labor with oxytocin at 42 weeks (Rayburn, Gosen, Ramadei, et al., 1988). Overall cesarean section rates (18 percent with PGE2 gel vs. 33 percent with placebo; p < 0.05) and mean time to delivery (5.5 hours vs. 9.5 hours with placebo; p < 0.01) were significantly lower with PGE2 gel.
Chatterjee, et al., compared 2 mg PGE2 gel to placebo (Chatterjee, Ramchandran, Ferlita, et al., 1991). Bishop scores were significantly improved in patients receiving the active gel; the study was underpowered to detect any other differences.
Voss, Cumminsky, Cook, et al. (1996) compared the use of intracervical PGE2 gel in three different dosing regimens: 0.125 mg (n = 79), 0.25 mg (n = 70), and 0.5 mg (n = 80). No significant differences among the doses were noted for any of the outcomes described (fetal heart rate abnormality, cesarean sections, mean change in Bishop score, hyperstimulation, and time to active phase labor/complete dilation/delivery). Only 31 percent of subjects in this study were induced for prolonged pregnancy.
MacKenzie and Burns (1997) compared a single vaginal dose of 2 mg PGE2 gel, with amniotomy and oxytocin if no labor occurred within 14-20 hours of treatment, with 2 mg of PGE2, followed by a second application in 6 hours if no labor occurred or if the Bishop score was less than 9. Sixty-eight percent of the patients in this trial were induced for prolonged pregnancy. The only significant difference noted was a shorter time to delivery in the two-dose group among multiparous women (mean 785 minutes vs. 927 minutes in the single-dose group).
Graves, et al., compared PGE2 gel in doses of 1 mg, 2 mg, and 3 mg to placebo prior to induction with oxytocin (Graves, Baskett, Gray, et al., 1985). Eighteen percent of the inductions were for prolonged pregnancy. There was a significant increase in Bishop score after the active gel compared with placebo, but this effect was not dose-related. There was a dose-related increase in the proportion of women entering spontaneous labor after insertion of the gel. There was a trend toward more uterine hypercontractility with higher doses of the gel, although the study was underpowered to detect a significant difference. Other outcomes were not significantly different between the active and placebo groups, although the study lacked power to detect many differences.
One study compared 3 mg PGE2 tablets to 2 mg PGE2 gel (Mahmood, 1989). The gel formulation required fewer applications and resulted in greater changes in Bishop score and shorter time to onset of labor than did tablets.
Two studies were identified that compared the administration of PGE2 gel to induction by oxytocin infusion. In the first study (Papageorgiou, Tsionou, Minaretzis, et al., 1992), women were randomized to intracervical PGE2 (n = 83) or oxytocin (n = 82) for induction of labor after 41 weeks. Rates of cesarean section for cephalopelvic disproportion and fetal distress, vacuum suction, and hyperstimulation did not differ significantly between the two groups. Two outcomes did favor PGE2 gel. First, babies were less likely to have an Apgar score < 7 at 5 minutes when the mother's cervix was ripened with PGE2 gel than when labor was induced with oxytocin. Second, patients were more likely to be delivered vaginally if ripened with PGE2 gel (89 percent vs. 71 percent). All subjects in this study had a gestational age of at least 41 weeks.
The second study (Misra and Vavre, 1994) compared administration of intracervical PGE2 gel (n = 80) with oxytocin (n = 72). Rates of cesarean deliveries were decreased with PGE2 in primigravidas only (26.3 percent with PGE2 vs. 47.2 percent with oxytocin; p < 0.01). Women in this study were induced for a variety of indications, with a mean gestational age less than 40 weeks.
One study examined the effect of placement of PGE2 gel in the posterior vaginal fornix versus in the endocervical canal (Kemp, Winkler, and Rath, 2000). Among the outcomes with significant differences, patients who received gel in the posterior vaginal fornix delivered earlier (15.7 hours vs. 19.1 hours) and were more likely to deliver within 24 hours (81.6 percent vs. 67.8 percent). In this study, 32.9 percent of the posterior fornix group and 29.2 percent of the intracervical group were induced for prolonged pregnancy (more than 10 days past the estimated date of confinement).
Two studies compared outcomes between PGE2 gel administration and membrane stripping. In Magann, et al., three groups were randomly assigned to treatment at 41 weeks (Magann, Chauhan, Nevils, et al., 1998). One group received daily intracervical administration of PGE2 gel, another received daily membrane stripping, and the third group received a daily "gentle cervical examination." Patients in all three groups were induced if the Bishop score became > 8, or at 42 weeks. Inductions at 42 weeks were significantly lower in the two active treatment groups (17 percent in the sweeping group and 20 percent in the PGE2 group, compared with 60 percent in the controls). Cesarean section rates were higher in the PGE2 group (8/35, or 23 percent, vs. 5/35, or 14 percent, in the other two groups), a relative risk of 1.6 (95 percent CI, 0.58 to 4.41).
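The relative risk and confidence interval for cesarean section in the Magann trial can be reproduced from the counts given above. The sketch below uses the standard log-scale (Wald) interval and treats the comparison as 8/35 versus 5/35; whether the original authors used this exact method is an assumption.

```python
from math import exp, log, sqrt


def relative_risk_ci(events_1, total_1, events_2, total_2, z=1.96):
    """Relative risk of group 1 vs. group 2 with a log-scale Wald confidence interval."""
    rr = (events_1 / total_1) / (events_2 / total_2)
    # Standard error of log(RR) for two independent binomial proportions.
    se = sqrt(1 / events_1 - 1 / total_1 + 1 / events_2 - 1 / total_2)
    lower = exp(log(rr) - z * se)
    upper = exp(log(rr) + z * se)
    return rr, lower, upper


rr, lower, upper = relative_risk_ci(8, 35, 5, 35)
print(f"RR = {rr:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

The interval spans 1.0 by a wide margin, which is why a difference of 23 percent versus 14 percent cannot be distinguished from chance with 35 women per group.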
In Doany and McCarty (1997), the effects of membrane stripping, PGE2 gel (placed in the posterior vaginal fornix), and a combination of the two therapies were evaluated. Patients were randomized at 41 weeks to one of 4 groups: (1) membrane stripping and placebo gel; (2) membrane stripping and PGE2 gel; (3) "control" cervical exams and placebo gel; or (4) "control" exams and PGE2 gel. Gestational age at delivery was significantly lower in the group with both active treatments (median, 290 days vs. 294 days in the two groups with one placebo and 297 days in the group with two placebos; p = 0.005). There was a trend towards a higher cesarean rate in the group with both active treatments (11 percent versus 8 percent in the two single-agent arms and 4 percent in the double-placebo group; p = 0.08).
These two studies suggest that PGE2 is equivalent to membrane stripping in terms of promoting labor. In both studies, PGE2 was associated with higher cesarean section rates, although these differences were not statistically significant. Larger studies would be needed to detect a difference in cesarean rates.
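A rough sense of how much larger such studies would need to be can be obtained from the usual normal-approximation formula for comparing two proportions. The sketch below uses the cesarean rates of 23 percent and 14 percent from the Magann trial as the difference to detect; the two-sided alpha of 0.05 and power of 80 percent are conventional assumptions, not values taken from any of the studies.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Assumed rates to detect: 23 percent vs. 14 percent cesarean section.
n_per_group = sample_size_per_group(0.23, 0.14)
print(f"approximate women needed per group: {n_per_group}")
```

Under these assumptions roughly 290 women per group would be required, compared with 35 per group in the trial that reported these rates.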
Only one study was identified that examined the efficacy of the Cervidil® vaginal insert (Wing, Ortiz-Omphroy, and Paul, 1997). This trial compared the Cervidil® insert (10 mg in a timed-release preparation) to 25 μg of misoprostol administered every 4 hours to a maximum of six doses. There were no significant differences between the two groups in neonatal or maternal outcomes. Mean time to delivery was the same in the two groups, but tachysystole was less frequent with misoprostol dosed every 4 hours than with the Cervidil® insert.
One study evaluated the use of 2 mg intravaginal PGE2 suppositories (n = 38) versus placebo suppositories (n = 42) self-administered by the patient on an outpatient basis beginning at 41 weeks (Sawai, O'Brien, Mastrogiannis, et al., 1994). The patients in the PGE2 arm used fewer suppositories and were admitted for delivery at earlier gestational ages. This resulted in lower antepartum testing charges (mean $477 vs. $647 with placebo; p = 0.001). There was a trend towards lower cesarean section rates in the PGE2 group (2.6 percent vs. 14.3 percent in the placebo group), although this difference was not significant.
In summary, vaginal or intracervical PGE2 was consistently more effective in achieving cervical ripening or delivery within a specified time period compared with placebo or oxytocin. Cesarean section rates were lower or similar in women treated with PGE2. There were no differences in perinatal or maternal morbidity or mortality.
Similar findings were reported in the review conducted for the RCOG guideline group. Based on their "conflated" analysis of trials comparing PGE2 with oxytocin with or without amniotomy, the guidelines recommended PGE2 as the treatment of choice for induction in women with intact membranes (Royal College of Obstetricians and Gynaecologists, 2001).
Only one study was identified that compared misoprostol with placebo prior to scheduled induction (Fletcher, Mitchell, Simeon, et al., 1993). A dose of 100 μg misoprostol (n = 32) was found to be more effective than placebo (n = 31). Time from induction to delivery was shorter with misoprostol (22 hours vs. 32 hours), and the cesarean section rate was lower (3 percent vs. 10 percent), although these differences were not statistically significant. The mean Bishop score was increased for patients treated with misoprostol. Only one-third of the randomized patients were induced for prolonged pregnancy.
| Study | Misoprostol Dose | PGE2 Dose | N | % Postterm | Statistically Significant Differences |
|---|---|---|---|---|---|
| Buser, Mora, and Arias, 1997 | 50 μg, posterior fornix | 0.5 mg, intracervical | 155 | 35% | Nonreassuring FHR tracing: 14% misoprostol vs. 0% PGE2; Time to delivery: 15.8 hours misoprostol vs. 24.2 hours PGE2; C-section for nonreassuring FHR: 25% misoprostol vs. 5% PGE2 |
| Chuck and Huffaker, 1995 | 50 μg, posterior fornix | 0.5 mg, intracervical | 99 | 18% | Vaginal delivery within 24 hours: 100% misoprostol vs. 68% PGE2; Time to vaginal delivery: 11.4 hours misoprostol vs. 18.9 hours PGE2 |
| Fletcher, Mitchell, Frederick, et al., 1994 | 100 μg, posterior fornix | 3 mg, posterior fornix | 63 | 33% | None |
| Gottschall, Borgida, Mihalek, et al., 1997 | 100 μg, posterior fornix | 5 mg, posterior fornix | 75 | 40% | Time to vaginal delivery: 31.3 hours misoprostol vs. 58.5 hours PGE2 |
| Herabutya, Prasertsawat, and Pokpirom, 1992 | 100 μg, posterior fornix | 1.5 mg, intracervical | 110 | 34% | None |
| Howarth, Funk, Steytler, et al., 1996 | 100 μg, posterior fornix | 1 mg, posterior fornix | 72 | 42% misoprostol, 25% PGE2 | C-sections: 17% misoprostol vs. 42% PGE2; Delivery within 12 hours: 83% misoprostol vs. 36% PGE2; Tachysystole: 39% misoprostol vs. 8% PGE2 |
| Kadanali, Küçüközkan, Zor, et al., 1996 | 100 μg, posterior fornix | Dose not specified | 224 | 41% | Time to delivery: 9.2 hours misoprostol vs. 15.2 hours PGE2; Cesarean section for failed induction: 6.3% misoprostol vs. 13.4% PGE2 |
| Mundle and Young, 1996 | 50 μg, upper vagina | 0.5 mg, intracervical or 1-2 mg, intravaginal | 222 | 78% | Time from induction to delivery: 753 minutes misoprostol vs. 941 minutes PGE2 |
| Varaklis, Gumina, and Stubblefield, 1995 | 25 μg, intravaginal | 0.5 mg, intracervical | 69 | Not stated | Time to vaginal delivery: 15.7 hours misoprostol vs. 20.7 hours PGE2 |
| Wing, Jones, Rahall, et al., 1995 | 50 μg, posterior fornix | 0.5 mg, intracervical | 135 | 10% | Time to vaginal delivery: 903 minutes misoprostol vs. 1411 minutes PGE2; Vaginal delivery within 24 hours: 70.6% misoprostol vs. 47.8% PGE2; Tachysystole: 36.7% misoprostol vs. 11.9% PGE2; Need for neonatal resuscitation: 22.1% misoprostol vs. 7.5% PGE2; Meconium aspiration: 4.4% misoprostol vs. 1.5% PGE2 |
| Wing, Rahall, Jones, et al., 1995 | 25 μg, posterior fornix | 0.5 mg, intracervical | 275 | 16% | Time to vaginal delivery: 1323 minutes misoprostol vs. 1532 minutes PGE2; Vaginal delivery within 24 hours: 65.5% misoprostol vs. 41.4% PGE2 |
The studies examined a range of doses and frequency of dosing with similar results. The time from induction to delivery was consistently shorter in patients treated with misoprostol, both for all patients and for those with vaginal delivery. With one exception, misoprostol was shown to cause a higher frequency of uterine hyperstimulation, hypertonus, or tachysystole, although studies were often underpowered to detect significant differences in these outcomes. All studies indicated that misoprostol was an effective agent for cervical ripening and induction, often more effective than PGE2 gel, and showed no significant difference in the rates of cesarean section. One study (Buser, Mora, and Arias, 1997) showed an increase in cesarean section rates for patients treated with misoprostol; this was attributable to significantly higher rates of nonreassuring fetal heart rate patterns. Of note, the majority of subjects in these studies were not women being induced for prolonged pregnancy.
Two studies evaluated various dosing regimens for misoprostol. In Farah, et al., intravaginal administration of doses of 25 μg versus 50 μg every 3 hours was evaluated (Farah, Sanchez-Ramos, Rosa, et al., 1997). In this study, the incidences of hyperstimulation, tachysystole, and cord pH < 7.16 were greater in patients on the 50-μg regimen. On the other hand, patients given 50 μg every 3 hours were more likely to have shorter start-to-delivery times and more vaginal deliveries.
In Wing and Paul (1996), the dosing regimen was 25 μg given either every 3 or 6 hours. Patients randomized to the 6-hour regimen had longer times to delivery, more frequently required oxytocin augmentation, and had more failed inductions than those on the 3-hour regimen.
Three studies compared the effect of intravenous oxytocin with intravaginal misoprostol (Escudero and Contreras, 1997; Kramer, Gilson, Morrison, et al., 1997; Sanchez-Ramos, Kaunitz, Del Valle, et al., 1993). Although the studies used varying dosages of misoprostol, the conclusions were similar. Patients treated with misoprostol had shorter induction-to-delivery times, more vaginal deliveries, and fewer cesarean deliveries for dystocia. Most studies also indicated that higher rates of uterine tachysystole were associated with misoprostol, and studies with higher doses of misoprostol had higher rates of tachysystole. Kramer, et al., found that patients treated with misoprostol also were less likely to use epidural anesthesia, and the costs associated with misoprostol induction were less than for patients induced by oxytocin (Kramer, Gilson, Morrison, et al., 1997). In this study, the costs associated with misoprostol treatment often excluded the cost of epidural anesthesia, longer length of stay (associated with induction), and fewer cesarean deliveries.
Two studies examined the effect of different routes of administration of misoprostol. Srisomboon, et al., evaluated the effect of 100 μg of misoprostol given intracervically versus intravaginally (after dissolution of the misoprostol pill into an inert gel) (Srisomboon, Piyamongkol, and Aiewsakul, 1997). There were no significant differences found between the two methods of administration in terms of change in Bishop score, interval from administration to delivery, route of delivery, or perinatal outcome. Rates of uterine tachysystole were similar in the two groups. This study noted that spillage of gel out of the cervix was observed in 70 percent of patients receiving intracervical misoprostol. The investigators concluded that the rates of efficacy between the two methods were similar, and that intravaginal administration was more convenient. Thirty-four percent of the inductions in this study were for prolonged gestation.
Toppozada, Anwar, Hassan, et al. (1997) evaluated the effects of oral versus vaginal misoprostol. Forty patients were randomized to 100 μg every 3 hours administered via the oral or vaginal route. Patients were more likely to be induced successfully via the vaginal route in a shorter interval at a lower dose but were also more likely to experience abnormal fetal heart rate patterns and higher rates of uterine hyperstimulation. The proportion of subjects induced for prolonged pregnancy was not reported in this study.
Four studies were identified that compared the effects of intravaginal PGE2 tablets with those of intravaginal misoprostol tablets (Chang and Chang, 1997; Fletcher, Mitchell, Frederick, et al., 1994; Lee, 1997; Surbek, Boesiger, Hoesli, et al., 1997). While the dosing regimens for the studies differed, the conclusions were similar. Patients treated with misoprostol were found to have shorter intervals between insertion and delivery, had higher mean Bishop scores 12 hours after administration, and were more likely to deliver within 24 hours. Three of the four studies concluded that misoprostol was a more effective and efficient drug for induction than PGE2. No significant differences in perinatal outcomes were noted.
One study compared the effects of the Cervidil® vaginal insert with misoprostol (Wing, Ortiz-Omphroy, and Paul, 1997). Patients randomized to treatment with Cervidil® had higher rates of tachysystole and abnormal fetal heart rate patterns. There were no significant differences in perinatal outcomes. Patients treated with misoprostol had shorter intervals from start to delivery than those treated with Cervidil®, but this difference was not significant. This study concluded that misoprostol was as effective as Cervidil®, but that the incidence of uterine tachysystole was significantly lower with misoprostol.
In summary, the majority of the randomized trials of misoprostol showed that misoprostol was more effective in achieving vaginal delivery within 24 hours than were other induction agents. However, misoprostol was also more likely to result in uterine hypercontractility, an unsurprising correlate of efficacy. All the studies reviewed were underpowered to detect clinically relevant differences in many important outcomes, particularly those having to do with safety. Similar conclusions have been reached by recent Cochrane reviews on misoprostol (Alfirevic, Howarth, and Gaussmann, 2000; Hofmeyr and Gulmezoglu, 2001).
We identified five studies that compared the efficacy of the progesterone receptor antagonist mifepristone (RU-486) to placebo. Unlike many of the studies discussed above, three of the five focused on patients primarily induced for prolonged pregnancy. All five studies indicated that mifepristone was effective in ripening the cervix. Wing, et al., using 200 mg mifepristone, found significantly more deliveries and vaginal deliveries within 48 hours and a shorter time to delivery with mifepristone compared with placebo; subgroup analysis showed that these effects were primarily due to the effect in nulliparas (Wing, Fassett, and Mishell, 2000). There were trends towards more abnormal fetal heart rate tracings in labor and more infants with Apgar scores less than 7 at 1 and 5 minutes in the mifepristone group, but these trends did not reach statistical significance.
Three studies evaluated patients who were treated with 400 mg mifepristone versus placebo. In Stenlund, Ekman, Aedo, et al. (1999), the time to onset of labor was shorter and the proportion of patients in labor within 48 hours was significantly greater (81.8 percent vs. 27.3 percent) in the mifepristone group. Median Apgar scores at 1 minute were lower in the mifepristone group, but there were no differences in Apgar scores at 5 or 10 minutes. With only 36 subjects, this study was underpowered to detect differences in many outcomes.
In Giacalone, et al., time to onset of labor and time to vaginal delivery were significantly shorter in the mifepristone group (Giacalone, Targosz, Laffargue, et al., 1998). There were trends towards lower Apgar scores at 1 minute and lower cord pH values, but these were nonsignificant; again, the study was severely underpowered to detect differences in many important clinical outcomes, including cesarean section rate.
In Frydman, et al., the proportion of women going into spontaneous labor, the proportion with Bishop scores less than 4 at presentation for induction, and the mean randomization-to-delivery time were all significantly less in the mifepristone group (Frydman, Lelaidier, Baton-Saint-Mleux, et al., 1992). There were no significant differences in other outcomes and no other trends. Again, the study was underpowered to detect differences in safety-related outcomes. Forty-eight percent of the patients were induced for "postdate" pregnancy.
Elliott, et al., performed a dose-response study comparing placebo with 50 mg and 200 mg of mifepristone in nulliparous women, the "majority" of whom were being induced for prolonged pregnancy (Elliott, Brennand, and Calder, 1998). When a combined outcome measure of either spontaneous labor within 4 days or Bishop score of > 6 at induction was used as the measure of efficacy, there were significant improvements with mifepristone in a dose-related manner. However, mifepristone was also associated in a dose-related manner with significantly more cases of fetal distress in labor and neonatal jaundice. In addition, cesarean rates were significantly lower with 50 mg of mifepristone than with placebo but higher with 200 mg than with placebo (p = 0.07), a difference that appears to be attributable to a higher incidence of cesarean delivery for fetal distress in the 200-mg group.
In summary, mifepristone appears to be superior to placebo in terms of achieving labor or cervical ripening within a specified time, but there are consistent trends towards fetal compromise during labor in women who receive mifepristone. Inadequate power to detect potentially important differences in safety argues against the use of mifepristone for induction of labor in prolonged pregnancy outside of research protocols at the present time.
A Cochrane review on this topic found similar evidence of efficacy (Neilson, 2001). Neonatal outcomes were not reported in enough studies to allow conclusions about safety.
In reviewing the literature on induction agents, numerous methodological problems consistently reduced our ability to draw conclusions about the benefits and risks of these agents in managing women with prolonged pregnancy. Some of these problems concerned study design; others related to statistical issues.
The following observations may be made about study design:
Patient population: The majority of the studies evaluating the efficacy of different interventions for induction of labor included subjects with a range of indications for induction and did not report results separately for those women induced because of prolonged pregnancy. This has several implications. First, it is possible that the responsiveness of the uterus and cervix (even with comparable Bishop scores) to a given agent might be quite different between a woman at 37 weeks with preeclampsia and a woman at 42 weeks with no medical complications, leading to different estimates of efficacy. Second, risks for fetal compromise might also be quite different among a woman at 37 weeks with preeclampsia, a woman at 41 weeks with no medical complications, and a woman at 42 weeks with oligohydramnios. The two groups of interest in this report are women induced solely because of prolonged gestation and women induced because of abnormal antepartum surveillance in prolonged gestation. The majority of the literature does not allow us to draw conclusions about the risks and benefits of particular induction agents in these two groups. Several studies also noted differences in outcomes between nulliparous and parous women; the majority failed to stratify results by parity.
Choice of primary outcomes: Of those studies that stated an a priori sample size estimation, most based it on time-related outcomes, such as time to delivery, time to vaginal delivery, or proportion of subjects delivering within 24 or 48 hours. Although these certainly are important outcomes, sample size estimates based on these types of outcomes will inevitably lead to studies that are underpowered to detect clinically relevant differences in other important outcomes, such as perinatal morbidity or cesarean section rates. This was found throughout the misoprostol literature, where there were consistent trends towards higher rates of uterine tachysystole, hyperstimulation, and nonreassuring fetal heart rate tracings, but most studies were underpowered to detect the differences. Studies that based their sample size estimates on changes in the Bishop score failed to account for the inherent intra- and interobserver variability of this measurement; accounting for this would have led to larger sample sizes.
Variability in clinical management: As with most of the studies reviewed for this report, variability in clinical management of labor may have resulted in differences in many outcomes, especially cesarean section rates, which makes comparisons across studies difficult.
Patient preferences: Consistently, time to delivery was chosen as an important outcome variable. Not surprisingly, more rapid times to delivery were associated with intermediate markers of fetal compromise or potential fetal compromise. Time to delivery is an important resource use issue. However, given the potential tradeoffs, collection of patient-oriented outcomes (preferences for the tradeoff of time in labor vs. risk of fetal compromise, for example) would be a valuable adjunct to these studies.
Cost data: Few studies reported cost data. Those that did frequently failed to account for all medical costs and focused only on pharmacy-related costs. This lack of data prevents estimation of cost-effectiveness.
The following observations are made about statistical issues:
Sample size: As stated above, the choice of primary outcome variable often inhibited the ability of trials to detect potentially clinically relevant differences in important outcomes. This is particularly true for rare but clinically important outcomes such as uterine rupture. There are case reports of uterine rupture occurring in women without previous uterine surgery after induction with misoprostol (Bennett, 1997; Blanchette, Nayak, and Erasmus, 1999); whether the risk of this event is higher in women induced with misoprostol compared with other medications is unclear, since denominator data are not available. However, the lack of statistical power to detect categorical events in the majority of randomized trials of induction agents is a major limitation to interpretation of this literature.
Choice of statistical tests: Inappropriate statistical tests and summary measures (e.g., means for integer variables such as parity, Apgar or Bishop score, or for nonnormally distributed variables, such as length of stay or time in labor) were frequently used. Use of these summary measures could potentially lead to false conclusions about the comparability of groups either at baseline or after intervention.
Based on the above review, we conclude the following:
The majority of randomized trials of induction agents where a priori sample size estimates were performed are powered based on detecting a difference in outcomes such as time to delivery. This results in a lack of power to detect clinically meaningful differences in categorical outcomes that are less common. This lack of power precludes drawing definite conclusions about the relative safety of different agents.
Castor oil given at term appears to be effective in promoting labor, with a consistent side effect of maternal nausea; whether other outcomes of interest are affected is unclear.
Manual nipple stimulation at term may promote labor; effectiveness may be dependent on the protocol used and patient ability to adhere to the protocol. Currently available data are insufficient to draw conclusions.
Data on the effectiveness of electrical breast stimulation as a method for inducing labor in prolonged gestation are inconclusive because of small sample size and a low proportion of subjects induced for an indication of prolonged pregnancy.
Data on the safety and effectiveness of relaxin are limited and no conclusions can be drawn.
Sweeping of the membranes at or near term is effective in promoting labor and reducing the incidence of induction for prolonged gestation.
In general, there is a tradeoff between the effectiveness of induction agents when effectiveness is defined in terms of achieving delivery and shortening the time to delivery on the one hand, and risks of uterine tachysystole, hyperstimulation, and potential fetal compromise on the other. In increasing order of effectiveness, slow-dose oxytocin is followed by fast-dose oxytocin; PGE2 appears more effective than oxytocin, and misoprostol is more effective than PGE2. The heterogeneity of the patient populations in the published literature prohibits definitive conclusions about the benefits and risks of these agents in the setting of induction of labor in prolonged pregnancy, either for women induced electively or for women with abnormal fetal surveillance.
Mifepristone (RU-486) is consistently effective in reducing the time to labor and the time to delivery in women after 41 weeks. However, all three published trials reported nonsignificant trends towards higher rates of intermediate markers of fetal compromise, including abnormal fetal heart rate tracings and low Apgar scores.
Data on costs are insufficient to allow conclusions about cost-effectiveness.
Question 4: Are the epidemiology and outcomes of prolonged pregnancy different for women in different ethnic groups, different socioeconomic groups, or in adolescent women?
We approached this question in two ways. First, in all the articles we reviewed, we searched for data on differences in either the epidemiology or outcomes of prolonged pregnancy in different ethnic groups, different socioeconomic groups, and different age groups. Second, we reviewed published data from birth certificates (Ventura, Martin, Curtin, et al., 2000) and from the 1997 Nationwide Inpatient Sample (NIS) (Nationwide Inpatient Sample [NIS], 1997). The NIS is part of the Agency for Healthcare Research and Quality's Healthcare Cost and Utilization Project (HCUP). HCUP collects discharge data from a stratified sample of approximately 20 percent of U.S. hospitals. Using ICD-9 codes, we divided all deliveries into "preterm" (644.2x), "prolonged" (645.x), and "term" (all other delivery codes). We examined differences in outcomes between coded ethnic groups (white, black, Hispanic, Asian/Pacific Islander, Native American, and "other") and by insurance status (Medicare, Medicaid, private/health maintenance organization [HMO], self-pay/no insurance, "no charge," and "other") within these categories.
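The grouping of deliveries by ICD-9 code described above can be sketched as follows. This is a minimal illustration, not the code used for the report: the function name `classify_delivery` is hypothetical, and the rule that a preterm code takes precedence when a record carries both code families is an assumption made only so the example is well defined.

```python
def classify_delivery(icd9_codes):
    """Assign a delivery record to 'preterm', 'prolonged', or 'term'.

    icd9_codes: iterable of ICD-9 diagnosis strings, with or without the dot.
    Assumption: 644.2x outranks 645.x if both appear on one record.
    """
    normalized = [code.replace(".", "").strip() for code in icd9_codes]
    if any(code.startswith("6442") for code in normalized):
        return "preterm"      # 644.2x
    if any(code.startswith("645") for code in normalized):
        return "prolonged"    # 645.x
    return "term"             # all other delivery codes

# Example records (illustrative codes only)
for record in (["644.21"], ["645.11", "656.31"], ["650"]):
    print(record, "->", classify_delivery(record))
```

Matching on a normalized prefix keeps the rule tolerant of records that store codes with or without the decimal point.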
We did not identify any articles that specifically addressed differences in the epidemiology or outcomes of prolonged pregnancy in different ethnic groups.
| Race | Total births | % ≥ 40 weeks | % ≥ 41 weeks | % ≥ 42 weeks | % ≥ 42 weeks and < 2,500 g | % ≥ 42 weeks and > 4,000 g |
|---|---|---|---|---|---|---|
| Total | 3,941,553 | 39.6% | 18.7% | 7.4% | 2.2% | 14.6% |
| White, non-Hispanic | 2,361,462 | 41.8% | 19.4% | 7.5% | 1.8% | 16.8% |
| Black, non-Hispanic | 593,127 | 35.3% | 16.6% | 7.2% | 4.0% | 8.0% |
| Hispanic | 734,661 | 40.2% | 18.7% | 7.7% | 2.0% | 12.8% |
Taking into account the limitations of birth certificate data, there are some interesting findings:
Live births between 40 and 42 weeks were less common for non-Hispanic black women than for non-Hispanic white women, which may be partly due to an increased risk of preterm birth among non-Hispanic blacks (17.5 percent vs. 10.2 percent in non-Hispanic whites). However, the proportion of births after 42 weeks is strikingly similar in all groups.
The weight distribution among infants born after 42 weeks, in contrast, is strikingly different between groups, with non-Hispanic black women having a two-fold increase in low birthweight infants and a substantially lower incidence of macrosomic infants.
| Race | None | IUFD | IUGR | Fetal distress | Oligohydramnios | Macrosomia | Shoulder dystocia | Perineal trauma | Labor abnormalities | Failed induction |
|---|---|---|---|---|---|---|---|---|---|---|
| White | 36.38% | 0.02% | 0.38% | 8.55% | 2.30% | 3.80% | 1.86% | 28.26% | 14.57% | 3.87% |
| Black | 35.92% | 0.00% | 0.71% | 14.34% | 4.20% | 2.73% | 2.02% | 21.29% | 14.91% | 3.87% |
| Hispanic | 37.63% | 0.02% | 0.36% | 11.39% | 3.83% | 3.77% | 1.25% | 23.28% | 13.67% | 4.79% |
| Asian/Pacific Islander | 30.34% | 0.00% | 0.64% | 7.50% | 4.27% | 8.91% | 0.81% | 24.64% | 19.65% | 3.23% |
| Native American | 40.62% | 0.00% | 0.00% | 10.92% | 5.60% | 2.80% | 0.00% | 25.49% | 11.48% | 3.08% |
| Other | 33.53% | 0.00% | 0.48% | 10.15% | 3.10% | 2.61% | 1.90% | 29.52% | 14.24% | 4.47% |
IUFD = Intrauterine fetal demise
IUGR = Intrauterine growth restriction
Both the NIS data and birth certificate data suggest that black women are more likely to have low birthweight infants after 42 weeks than white or Hispanic women. Diagnoses such as oligohydramnios and fetal growth restriction are also more common in black women. All three of these diagnoses are consistent with declining uteroplacental function. There were a limited number of fetal deaths in the NIS data set, with racial data missing from over half.
We did not identify any articles that specifically addressed differences in the epidemiology or outcomes of prolonged pregnancy in different socioeconomic groups.
| Payer | None | IUFD | IUGR | Fetal distress | Oligohydramnios | Macrosomia | Shoulder dystocia | Perineal trauma | Labor abnormalities | Failed induction |
|---|---|---|---|---|---|---|---|---|---|---|
| Medicare | 37.14% | 0.00% | 0.00% | 13.69% | 0.83% | 5.60% | 0.00% | 22.61% | 17.22% | 2.90% |
| Medicaid | 37.53% | 0.06% | 0.58% | 10.50% | 3.46% | 2.79% | 1.66% | 25.75% | 13.77% | 3.90% |
| Private/HMO | 35.05% | 0.01% | 0.34% | 9.47% | 2.43% | 4.24% | 1.80% | 27.82% | 15.13% | 3.72% |
| Self-pay/No insurance | 40.22% | 0.00% | 0.61% | 10.16% | 4.18% | 2.47% | 1.90% | 26.71% | 11.02% | 2.73% |
| "No charge" | 40.72% | 0.00% | 0.00% | 0.52% | 1.55% | 6.19% | 0.00% | 33.51% | 7.73% | 9.79% |
| Other | 37.11% | 0.00% | 0.73% | 10.64% | 1.86% | 3.75% | 2.09% | 27.66% | 12.73% | 3.43% |
IUFD = Intrauterine fetal demise
IUGR = Intrauterine growth restriction
We did not identify any articles that specifically addressed differences in the epidemiology or outcomes of prolonged pregnancy in either adolescent women or women in their later reproductive years.
The accuracy of the dating recorded on birth certificates is unconfirmable, at best. Therefore, it is unclear whether the observed trends in racial differences in the distribution of birthweight after 42 weeks, and the observed lack of difference in the proportion of all pregnancies that reach 42 weeks, are real or simply random error introduced by variable quality of dating.
Similarly, criteria for a diagnosis of prolonged pregnancy, as well as for many of the other diagnosis codes, may vary between hospitals. Data for racial and payer codes were missing for many of the coded complication diagnoses. If codes are not recorded systematically in some hospitals, this may result in misleading patterns.
Because of concerns with data quality, we did not perform formal tests of significance or multivariate analyses. Given the consistent patterns for some observations seen in the two data sets, more detailed analysis of more complete data sets is warranted.
The current published literature on the epidemiology and management of prolonged pregnancy does not provide information on the potential effects of race and ethnicity, socioeconomic status, or age on the incidence and outcomes of prolonged pregnancy. Given that many of the strategies designed to minimize the risk of fetal compromise (such as frequent antepartum testing) may have different practical effects in populations with different levels of access to transportation, child care, and appropriate monitoring facilities, this lack of information is disappointing.
Review of national data from birth certificates and hospital discharges suggests that there may be differences in the clinical characteristics of prolonged pregnancy among women in different ethnic and socioeconomic groups. In spite of the multiple limitations of the data, it is striking that two different data sources both show that black women with prolonged pregnancy are more likely to have low birthweight infants than white or Hispanic women. Black women are consistently more likely to have low birthweight infants at other gestational ages as well. Black women also are more likely to have diagnoses of intrauterine growth restriction and oligohydramnios. Women with Medicaid or no insurance are also more likely to have growth restriction and oligohydramnios. We did not explore the degree to which the effects of race might be confounded by economic status, or vice versa, primarily because of problems caused by missing data. Other potential confounders include differences in the use of ultrasound for dating and differences in the use of antepartum testing for prolonged pregnancy. These findings should be investigated further using higher quality data and appropriate epidemiological and statistical methodologies.
In this section we summarize the main findings of the report and discuss the implications of the findings, the limitations of the current literature, the limitations of the report, and suggested strategies for using the report to develop quality improvement tools.
The major findings and conclusions for each of the four key research questions are as follows:
1. What are the test characteristics (reliability, sensitivity, specificity, predictive values) and costs of measures used in the management of prolonged pregnancy to (a) assess risks to the fetus and mother of prolonged pregnancy, and (b) assess the likelihood of a successful induction of labor?
Consistently, tests for the assessment of risks to the fetus have lower sensitivity than specificity but higher negative predictive values than positive predictive values. This implies that the low risk of adverse outcomes is the main "driver" of high negative predictive values and that, if sensitivity and specificity do not change appreciably with gestational age, negative predictive value (the likelihood that a fetus with a normal test will have a normal outcome) decreases with advancing gestational age as the underlying risk of adverse outcomes rises. Thus, false negative results will increase with advancing gestational age.
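The dependence of predictive values on the underlying risk can be illustrated with a short calculation based on Bayes' rule. The sensitivity, specificity, and risk values below are hypothetical, chosen only to show the direction of the effect; they are not estimates drawn from the studies reviewed.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical test: sensitivity 50%, specificity 90%, held constant while
# the (hypothetical) risk of an adverse outcome rises with gestational age.
for risk in (0.001, 0.002, 0.005):
    ppv, npv = predictive_values(0.50, 0.90, risk)
    print(f"risk={risk:.3f}  PPV={ppv:.4f}  NPV={npv:.4f}")
```

With fixed test characteristics, the negative predictive value stays high but falls steadily as risk rises, so the absolute number of false negative results grows even though the test itself has not changed.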
The most sensitive tests to assess the risks to the fetus of prolonged pregnancy appear to be combinations of fetal heart rate monitoring and ultrasonographic measurement of amniotic fluid volume. Direct comparison of test results across studies is difficult because of differences in patient populations and reference standards used. Published data on costs were not available.
Both ultrasound and clinical examination can be reasonably sensitive at identifying macrosomic fetuses when macrosomia is defined as greater than 4,000 grams. However, prediction of birthweights greater than 4,500 grams, the clinically more relevant threshold, is less accurate, with sensitivity ranging from 14 to 99 percent. There is no evidence that early detection of macrosomic infants in prolonged pregnancy improves maternal or neonatal outcomes, and modeling studies suggest that the use of ultrasound to screen for macrosomia is not cost effective.
The components of the cervical examination used to determine the Bishop score have significant inter- and intraobserver variability. The uncertainty created by this variability affects the ability of the examination to discriminate between patients likely to have a successful induction and those likely to fail.
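One common way to quantify interobserver agreement beyond chance is Cohen's kappa. The sketch below is an assumption-laden illustration, not the method used in the studies reviewed: the two lists of examiner ratings are invented, and the labels "F" (favorable cervix) and "U" (unfavorable cervix) are hypothetical categories.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters classifying the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of the same ten women by two examiners.
examiner_1 = ["F", "F", "F", "F", "F", "F", "U", "U", "U", "U"]
examiner_2 = ["F", "F", "F", "F", "U", "U", "U", "U", "U", "F"]
print(f"kappa = {cohens_kappa(examiner_1, examiner_2):.2f}")
```

In this invented example the examiners agree on 7 of 10 women, but half of that agreement would be expected by chance, so kappa is considerably lower than the raw agreement suggests.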
2. What is the direct evidence comparing the benefits, risks, and costs of planned induction versus expectant management at various gestational ages?
Although individual randomized trials do not show significant differences in perinatal mortality between women electively induced at specific gestational ages and women followed with antepartum testing, pooled data show a significant reduction in perinatal mortality in women electively induced after 41 weeks compared with women managed with antepartum testing. At least 500 inductions are needed to prevent one perinatal death. Cesarean section rates do not appear to differ between electively induced and expectantly managed women, either overall or in specific subgroups. In some groups, elective induction actually decreases the overall risk of cesarean section. Other maternal and perinatal outcomes do not appear to differ between groups.
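The figure of at least 500 inductions per death prevented is a number needed to treat (NNT), which is the reciprocal of the absolute risk reduction. The Python sketch below shows the arithmetic only; the two mortality risks are hypothetical values chosen so that the result equals 500, and they are not estimates from the pooled trials.

```python
def number_needed_to_treat(risk_comparison, risk_intervention):
    """NNT = 1 / absolute risk reduction (ARR)."""
    arr = risk_comparison - risk_intervention
    if arr <= 0:
        raise ValueError("intervention does not reduce risk; NNT is undefined")
    return 1 / arr


# An NNT of at least 500 implies an absolute risk reduction of at most 1/500.
max_arr = 1 / 500
print(f"Largest implied ARR: {max_arr * 1000:.0f} per 1,000 women")

# Hypothetical perinatal mortality risks (assumptions for illustration only).
risk_expectant = 0.003   # 3 per 1,000 with antepartum testing
risk_induction = 0.001   # 1 per 1,000 with elective induction
nnt = number_needed_to_treat(risk_expectant, risk_induction)
print(f"NNT = {nnt:.0f}")
```

Expressing the result this way makes clear why the benefit, although statistically significant in pooled data, is small in absolute terms for an individual woman.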
Data on patient preferences for management options are lacking. Analysis of costs in the largest trial suggested that costs were reduced with elective induction; more detailed analysis based on currently used interventions and current obstetric management is needed.
3. What are the benefits, risks, and costs of currently available interventions for the induction of labor?
The majority of studies of interventions for induction of labor involved women induced for a variety of indications at a wide range of gestational ages. Whether summary results from these groups are applicable to women with prolonged pregnancy is unclear.
Sweeping or "stripping" of the membranes at 38-40 weeks consistently promotes spontaneous labor and reduces the number of women requiring induction at 41 or 42 weeks.
Many studies of agents for induction are powered based on detecting differences in time to induction or differences in the proportion of women delivered within a predetermined period of time. Most do not have sufficient power to detect differences in categorical outcomes, such as cesarean section rates and adverse maternal or perinatal outcomes.
There is a consistent pattern of tradeoffs between efficacy of interventions for induction, especially as measured by time to induction or delivery within a predetermined period of time, and uterine hyperactivity, with possible increased risks of surrogate markers of fetal compromise, such as nonreassuring fetal heart rate tracings. Misoprostol appears most consistently to result in vaginal delivery within a predefined time period; however, it also appears most likely to result in very frequent uterine contractions, which may lead to fetal heart rate abnormalities.
Data are lacking on both medical and nonmedical costs of different intervention strategies.
4. Are the epidemiology and outcomes of prolonged pregnancy different for women in different ethnic groups, different socioeconomic groups, or in adolescent women?
We identified no published literature that showed differences among important ethnic, socioeconomic, or other subgroups.
Review of administrative data suggests that the proportion of all pregnancies extending beyond 42 weeks is similar among all racial and ethnic groups. Black women are more likely to have low birthweight infants after 42 weeks than other groups, a finding similar to observations at other gestational ages. Confirmation of these observations with more detailed data sets is needed.
Currently available literature on interventions in prolonged gestation does not address issues such as access to care or practical difficulties (for example, transportation or arranging child care) which might affect effectiveness (as opposed to efficacy) in different populations.
The primary research implication of our review of the literature is that much remains to be learned about the optimal management of pregnancy in women who go beyond 40 weeks gestation with otherwise normal pregnancies. It is clear that the risk of adverse outcomes increases with advancing gestational age, but the point at which this risk justifies more intensive interventions is unclear. Currently available antepartum testing strategies have good negative predictive value but poor positive predictive value. This appears to be largely due to the overall low absolute risk of adverse outcomes, since test specificity is generally better than sensitivity. The optimal test or combination of tests and the optimal timing of test initiation among women in the United States that would minimize the risk of complications associated with prolonged gestation and complications of interventions at an acceptable cost are unclear. Several interventions are available for the effective induction of labor; however, the populations studied in the published literature are heterogeneous in terms of indications for induction. Whether the benefit/risk profile of this diverse population is equivalent to that in women induced solely because of prolonged gestation, or because of abnormal antepartum testing in prolonged gestation, is unclear. Pooled results from randomized trials comparing scheduled induction and expectant management with antepartum testing show a reduced risk of perinatal mortality in women with scheduled induction after 41 weeks, with at least 500 inductions needed to prevent one death. However, the cost-effectiveness of these strategies needs to be compared using more recent data. Administrative data suggest that there are racial and ethnic differences in the epidemiology and outcomes of prolonged pregnancy; these differences need to be explored using more detailed data sets.
Finally, given the complexity of decisionmaking in settings where there often are competing risks between mother and fetus, and where patients clearly have strong preferences for the process of labor and delivery, the lack of scientific data on patient preferences, quality of life, and other "subjective" measures is striking.
Although there are a large number of randomized trials available that provide evidence addressing the key questions identified in this report, there are numerous limitations to the current literature:
Heterogeneity of patient populations: A consistent problem with much of the literature on specific intervention agents is inclusion of women being induced for a variety of indications. Both the benefits (in terms of successful induction) and risks (in terms of fetal compromise) of induction agents might be quite different in different populations of patients. Studies either should be performed exclusively in patients with prolonged pregnancy, or subgroup analyses should be reported so that pooled estimates of efficacy in different populations can be generated.
Appropriate endpoints: Stillbirth is, fortunately, a rare outcome even in "high-risk" populations. Most feasible studies of tests or interventions will not have sufficient power to detect differences in mortality rates. However, the clinical utility of commonly used endpoints is compromised because of inherent unreliability and susceptibility to bias (changes in fetal heart rate pattern or cervical examination), uncertainty about long-term clinical significance (presence of meconium in amniotic fluid or Apgar scores), and the effect of variability in knowledge of preintervention test results or local practice patterns (cesarean section rates). Finally, the lack of data on patient preferences and quality-of-life measures is striking.
Statistical issues: Even well-done studies with a priori sample size estimates often are underpowered to detect potentially clinically relevant differences in outcomes, especially when sample size estimates are based on continuous variables (such as time to delivery) and other outcomes are categorical (such as cesarean section rates). Inappropriate measures of central tendency and statistical tests are often used (for example, treating variables such as Bishop score or parity as continuous variables). This may also lead to erroneous conclusions about differences between groups.
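The power problem described under statistical issues can be made concrete with the standard normal-approximation formulas for two-group sample size. The Python sketch below is illustrative only: the effect sizes (a 4-hour difference in time to delivery with a standard deviation of 10 hours, and cesarean rates of 20 versus 15 percent) are hypothetical, and the function names are introduced here.

```python
from math import ceil
from statistics import NormalDist


def _z_total(alpha, power):
    """Sum of the two-sided critical value and the power quantile."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Subjects per group to detect a difference in means of size delta."""
    z = _z_total(alpha, power)
    return ceil(2 * (z * sd / delta) ** 2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to detect a difference between two proportions."""
    z = _z_total(alpha, power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical continuous outcome: time to delivery, in hours.
n_time = n_per_group_means(delta=4, sd=10)

# Hypothetical categorical outcome: cesarean section rate.
n_cesarean = n_per_group_proportions(p1=0.20, p2=0.15)

print(f"Per-group n, time to delivery: {n_time}")
print(f"Per-group n, cesarean rate:    {n_cesarean}")
```

Under these assumed values, a trial sized for the time-to-delivery outcome needs roughly 100 women per group, while detecting the difference in cesarean rates needs roughly 900 per group, so a trial powered on the former would be badly underpowered for the latter.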
We used standard methods for identifying, reviewing, and abstracting published studies focused on the management of prolonged pregnancy. We used predefined study characteristics to identify those studies most likely to provide unbiased estimates of efficacy and test performance. We did not search the literature prior to 1980, primarily because we assumed that the lack of general availability of ultrasound for both dating and management of prolonged gestation would limit the applicability of these results to current practice. We also limited our search to articles published in English, primarily for reasons of convenience and resource constraints. It is possible that including older studies, or studies published in other languages, would have identified additional evidence that would have substantially changed our conclusions. This may be especially true for alternative or complementary therapies.
Another limitation of our exclusion criteria is that rare but severe complications of treatments may have been overlooked because they were published in case reports or small case series. Although these study designs are useful for identifying potential problems, it is difficult to quantify these risks when only numerator values are available.
We did not use one of the currently available quality scoring systems to grade the articles we reviewed. However, we believe that the rationale for each criterion we used is reasonable, and that the operational definitions are clear and reproducible. In addition, we used these grading criteria primarily to provide additional detail to other researchers. We did not use them to establish a threshold for including or excluding articles or to weight the results of a quantitative evidence synthesis such as a meta-analysis.
We used one additional data source in preparing this report, the Nationwide Inpatient Sample (NIS) (Nationwide Inpatient Sample [NIS], 1997). The NIS, like most administrative databases, is limited by a lack of clinically relevant detail. In addition, even the data recorded in these discharge abstracts were incomplete, limiting our ability to analyze them in great detail. Variability in definitions between hospitals also may lead to incorrect conclusions. The primary value of these data in the context of this report is to identify potentially important differences in outcomes between ethnic and socioeconomic groups that need to be explored further in data sets with better documentation and more complete data.
The state of the currently available evidence probably does not allow for the creation of highly specific clinical guidelines or performance measures for many aspects of managing prolonged pregnancy. Consistent conclusions from the report include:
Sweeping of the membranes consistently promotes labor. However, given the lack of data on patient preferences for undergoing this procedure or on the value of promoting labor, using performance of membrane sweeping as a quality measure is premature. Discussion of this option with women during the late third trimester is nevertheless reasonable.
Two strategies appear reasonable beyond 41 weeks: surveillance with tests that include fetal heart rate monitoring and assessment of amniotic fluid volume, or elective induction. Patients and providers should be informed that the best current evidence strongly suggests that there is a significant increase in the risk of perinatal mortality in women managed with antepartum testing compared with women who are electively induced at 41 weeks. Because this risk is small in absolute terms, and patients may have different preferences for both the outcomes and processes of labor and delivery, both options should be discussed.
There is no evidence to justify induction of labor solely for the indication of macrosomia (defined as estimated fetal weight greater than 4,000 grams) in prolonged pregnancy.
According to national birth certificate data, almost 18 percent of pregnancies (702,000 women) in the United States extend beyond 41 weeks, and over 7 percent (288,000 women) extend beyond 42 weeks (Ventura, Martin, Curtin, et al., 2000). Better data on optimal management of these women would have significant public health benefit.
The most precise data available come from the United Kingdom. Estimates in U.S. populations, preferably with the ability to control for the presence of other risk factors for mortality and the use of antepartum testing, are needed. Potential studies include:
Detailed analysis of U.S. birth certificate data.
Detailed analysis of U.S. hospital discharge data, although this will necessarily miss deliveries performed outside the hospital, such as those performed at freestanding birth centers and home births.
Detailed analysis of administrative or computerized clinical data from large provider organizations, such as health maintenance organizations.
Because of the inherent limitations of these data sources, validation with detailed clinical records ultimately will be needed to systematically determine and describe causes of death. These data also would allow determination of the impact of various methods of dating pregnancy on perinatal mortality.
Similar methods need to be applied to estimations of the risks of perinatal morbidity:
Careful attention should be given to case definitions; again, validation of the accuracy of administrative data is needed.
We did not identify any recent publications providing followup data on infants born after prolonged gestation. Ultimately, long-term outcomes are most important, and better data on the long-term consequences of various management strategies are needed.
Again, better estimation of the risks, given current obstetric practice, is needed.
Recently, attention has been drawn to the risks of long-term maternal consequences of labor and delivery, especially pelvic floor dysfunction. It is unclear if any of the management strategies used for prolonged pregnancy have any impact on the risks of subsequent development of pelvic floor dysfunction.
Because many outcomes associated with prolonged gestation are rare, evaluations of individual tests and testing strategies will always be either limited in power or forced to rely on surrogate measures. Further research is needed on:
Identification of surrogate measures of fetal compromise that are less susceptible to bias or observer variation.
Study designs that could eliminate or substantially reduce the potential for verification bias because of clinician knowledge of antepartum test results.
The optimal timing of antepartum testing.
Data on currently available tests strongly suggest that test specificity is much better than test sensitivity. In order for expectant management to compare more favorably to elective induction, research into new testing strategies should focus on improving the negative predictive value of tests by improving test sensitivity.
In addition, detailed data are needed on the medical and nonmedical costs associated with specific tests and testing strategies.
Based on the available trial data, planned induction after 41 weeks appears to reduce the risk of perinatal mortality at lower cost and at no risk of increased cesarean section rates compared with expectant management. The strongest and largest trial was completed a decade ago. Whether these conclusions are still valid given current management strategies and interventions (such as misoprostol) is unclear. It also is unclear whether the extra knowledge to be gained by yet another large trial justifies the costs of such a trial. The following points should be considered:
Decision analysis and cost-effectiveness analysis may help quantify our current degree of uncertainty. In order to be useful, modeling will require more precise data on risks, test characteristics, the effectiveness of induction, and costs in the specific population of interest. Some of these data could be provided by the research agenda discussed above. Decision and cost-effectiveness analyses will also need to consider subtle issues such as the potential effects of increased induction rates on staffing needs for labor-and-delivery and postpartum units.
Again, data on patient preferences for both outcomes and process are needed. For some women, the degree of certainty provided by a scheduled induction may be preferable to repeated visits for antepartum testing and uncertainty about when labor may begin. For other women, the desire to minimize intervention in the pregnancy may take precedence. How these preferences interact with patients' attitudes and preferences about risks to both themselves and their babies is an unexplored area of research with substantial implications for individual patients, clinicians, and policymakers.
Despite a number of randomized trials of methods for inducing labor, our ability to draw conclusions about the efficacy of various agents in women with prolonged pregnancy is limited because of the diversity of indications for induction and the diversity of gestational ages in these trials. Data on outcomes specific to the two groups of interest -- women induced electively at a specific gestational age and women with prolonged pregnancy induced because of abnormal fetal heart rate testing -- are needed. These data could be obtained either by performing a meta-analysis using pooled data from previous, ongoing, or future trials in these specific subgroups or by performing trials limited to these two groups.
Sample size estimates for trials should be based on clinically relevant outcomes. Although time from beginning of induction to delivery is an important resource outcome, there are no data available on how women value this outcome compared with others. When sample size estimation is based on time-related variables, power to detect clinically relevant differences in other outcomes is diminished.
Use of primary outcomes limited by inherent lack of reliability, such as Bishop score or abnormal fetal heart rate tracings, should be avoided. If used as secondary outcomes, consideration should be given when feasible to the use of research techniques designed to minimize the effects of observer variation, such as review by blinded outside experts (an approach often used in trials where data sources such as electrocardiograms, radiology films, or pathology slides are required).
Patient preferences and quality-of-life measures, using standard techniques and methods for measuring these attributes, should be included in all studies. Attention should be focused not only on patient preferences for outcomes, but on process as well. All women value a healthy baby, but there may be strong preferences for the way in which this outcome is achieved.
Detailed data are needed on medical and nonmedical costs associated with different interventions for the induction of labor in prolonged gestation and for promoting labor in women at term.
Given that from some perspectives elective induction of labor may be preferable to expectant management, research on establishing reliable estimates of the relative safety, effectiveness, and costs of available induction agents in this particular patient population should be a high priority.
Preliminary analysis of administrative data suggests that additional research into possible differences in the epidemiology and outcomes of prolonged pregnancy in different ethnic and socioeconomic groups is warranted:
Confirmation of the lack of ethnic differences in the proportion of pregnancies extending beyond 42 weeks -- despite higher rates of preterm birth in black women -- using data sources where confirmation of gestational age is available, would be important.
Confirmation of the higher rate of low birthweight and other diagnoses consistent with uteroplacental insufficiency in black women with prolonged gestation is needed. If confirmed, clinical, epidemiological, basic science, and genetic studies might provide insight into the causes of this association.
Further exploration of the potential interaction of ethnicity and economic status is needed.
| Abbreviation | Definition |
|---|---|
| Abd C | Abdominal circumference |
| abn | Abnormal |
| ACOG | American College of Obstetricians and Gynecologists |
| AFI | Amniotic fluid index |
| AFV | Amniotic fluid volume |
| AHRQ | Agency for Healthcare Research and Quality |
| APT | Antepartum testing |
| ARD | Atad Ripener Device |
| AROM | Artificial rupture of the membranes |
| BP | Biophysical profile |
| bpm | Beats per minute |
| BPS | Biophysical profile score |
| BW | Birthweight |
| cc | Cubic centimeter(s) |
| CDSR | Cochrane Database of Systematic Reviews |
| CE | Cost-effectiveness |
| CI | Confidence interval |
| cm | Centimeter |
| C-section | Cesarean section |
| CST | Contraction stress test |
| CTG | Cardiotocography |
| DARE | Database of Abstracts of Reviews of Effectiveness |
| EBW | Estimated birthweight |
| E:C | Estrogen-to-creatinine ratio |
| EFW | Estimated fetal weight |
| FB | Fetal breathing |
| FBM | Fetal breathing movements |
| fFN | Fetal fibronectin |
| FHR | Fetal heart rate |
| FM | Fetal movement |
| f/u | Followup |
| g | Gram(s) |
| GP | General practitioner |
| HCUP | Healthcare Cost and Utilization Project |
| HMO | Health maintenance organization |
| hr | Hour(s) |
| IQ | Interquartile |
| IU | International Unit(s) |
| IUGR | Intrauterine growth retardation |
| kg | Kilogram(s) |
| LGA | Large for gestational age |
| LMP | Last menstrual period |
| MBP | Modified biophysical profile |
| MFM | Maternal-fetal medicine |
| µg | Microgram(s) |
| mg | Milligram |
| min | Minute(s) |
| mIU | Milli-International Unit(s) |
| ml | Milliliter(s) |
| mm | Millimeter(s) |
| mmHg | Millimeters of mercury |
| MPD | Maximum pool depth |
| mU | Milliunit(s) |
| NA | Not applicable |
| NCHS | National Center for Health Statistics |
| ng | Nanogram(s) |
| NICHD | National Institute of Child Health and Human Development |
| NICU | Neonatal intensive care unit |
| NIS | Nationwide Inpatient Sample |
| nl | Normal |
| No. | Number |
| NR | Not reported |
| NS | Nipple stimulation |
| NST | Nonstress test |
| OB/GYN | Obstetrician/gynecologist |
| OCP | Oral contraceptive pill |
| OCT | Oxytocin challenge test |
| OST | Oxytocin stress test |
| OR | Odds ratio |
| PGE2 | Prostaglandin E2 (dinoprostone) |
| PROM | Premature rupture of the membranes |
| RCOG | Royal College of Obstetricians and Gynaecologists |
| RCT(s) | Randomized controlled trial(s) |
| ROC | Receiver operating characteristic |
| RR | Relative risk |
| SD | Standard deviation |
| S:D | Systolic-to-diastolic ratio |
| sec | Second(s) |
| SEM | Standard error of the mean |
| SGA | Small for gestational age |
| SROM | Spontaneous rupture of the membranes |
| U/S | Ultrasound |
| UTI | Urinary tract infection |
| vs. | Versus |
| wk | Week(s) |
Reviewer:_________________ First Author:___________________________ Year:___________ Procite #:___________
ARTICLE FOCUS (circle one): Testing / Management / Both
STUDY DESIGN (check one):
| _______RCT - Randomization method: | ______Sealed envelope |
| | ______Date/Chart # |
| | ______Not described |
| | ______Other - describe:_____________________________________ |
______Cohort
______Case series, no controls, n = ______
______Case series, historical controls, n = ______
______Case series, concomitant controls, n = ______
______Not specified or unable to classify
REASSESSMENT:
| Recode article as:____________________ | Exclude (give reason):__________________ Note: All non-RCTs should be excluded from the management review |
KEY QUESTIONS ADDRESSED (check all that apply):
_____1. What are the test characteristics (reliability, sensitivity, specificity, predictive values) and costs of measures used in the management of postdates pregnancy: (a) to assess risks to the fetus of postdates pregnancy, and (b) to assess the likelihood of a successful induction?
_____2. What are the benefits, risks, and costs of currently available interventions for induction of labor?
_____3. What is the direct evidence comparing the benefits, risks, and costs of planned induction versus expectant management at various gestational ages?
_____4. Are the epidemiology and outcomes of postdates pregnancy different for women in different ethnic groups, different socioeconomic groups, or in adolescent women?
STUDY LOGISTICS:
Inclusive dates of data collection (give month and year): from________________________to_____________________
Multicenter study? (circle one): Yes / No If "Yes," no. of sites:_________
Geographic location (in US, give city and state; outside of US, give city and country. If multicenter trial or network, give name, e.g., NICHD MFM Network, RADIUS):___________________________________________________
| TYPES OF PROVIDERS (check all that apply): _______Unspecified OB/GYN _______General OB/GYN _______MFM _______Family practice _______Nurse midwives _______Other midwives _______Other - describe:___________ _______Not specified | STUDY SETTING (check all that apply): _______University hospital _______Community hospital _______Unspecified hospital _______Freestanding birthing center _______Outpatient clinic/physician office _______Not specified or unable to determine _______Other - describe:_______________ |
GESTATIONAL AGE DETERMINED BY (check all that apply):
______LMP
______1st trimester U/S
______2nd trimester U/S
______Other - specify: __________________________________________________________________________
| INCLUSION CRITERIA: | EXCLUSION CRITERIA: |
SUBJECT CHARACTERISTICS:
Identify interventions A, B, and C, and indicate which (if any) served as control
Use "NR" to indicate "Not reported"
| | Intervention A = | Intervention B = | Intervention C = | Overall |
|---|---|---|---|---|
| AGE (specify summary statistic [mean, median] and measure of dispersion [standard deviation, range, etc.]; if age not described in these terms, then enter as reported): | | | | |
| Mean: | | | | |
| Median: | | | | |
| SD: | | | | |
| Range: | | | | |
| RACE (specify distribution): | | | | |
| White: | | | | |
| Black: | | | | |
| Hispanic: | | | | |
| Other: | | | | |
| GESTATIONAL AGE AT ENTRY INTO STUDY (specify either summary statistic [mean, median] and measure of dispersion [SD, range] or percent in each category; indicate whether measured in days or weeks): | | | | |
| PARITY (specify either summary statistic [mean, median] and measure of dispersion [SD, range] or percentage in each category): | | | | |
| BISHOP SCORE (specify either summary statistic [mean, median] and measure of dispersion [SD, range] or percentage in each category): | | | | |
| OTHER measure of cervical dilatation or effacement (specify): | | | | |
INTERVENTIONS
Describe the testing and management interventions used in each study group. Include all information necessary to reproduce the treatment/monitoring/testing algorithms used. For example:
| Sample Intervention A = Induction If cervix < 3 cm dilated and < 50% effaced and fetal heart rate normal, then pt given PGE2 gel (Prepidil) 0.5 mg intracervically - max of 3 doses at 6-hr intervals - fetus monitored continuously for min of 1 hr after insertion of gel If gel not used or did not induce labor within 12 hrs of insertion of last dose, then labor induced by IV oxytocin or amniotomy or both |
Interventions to be considered include:
Tests of fetal well-being: No tests, nonstress test, biophysical profile, contraction stress test, amniotic fluid volume, uterine vessel Doppler flow, other, combinations of the preceding
Tests of fetal size: Physical exam, ultrasound, other
Tests of readiness for delivery: Bishop score, fetal fibronectin, other, combinations of the preceding
Interventions: Monitoring/conservative care, stripping of membranes, oxytocin, prostaglandin gel, misoprostol, mechanical interventions
| Intervention A = |
| Intervention B = |
| Intervention C = |
PATIENT NUMBERS, DROPOUTS AND LOSS TO FOLLOW-UP:
| Outcome | Intervention A = | Intervention B = | Intervention C = |
|---|---|---|---|
| No. of subjects at start: | | | |
| No. of subjects who did not receive allocated intervention due to: | | | |
| | | | |
| | | | |
| No. of subjects at end who had received allocated intervention: | | | |
| Any post-discharge follow-up? (circle one) | Yes / No | Yes / No | Yes / No |
| No. of subjects lost to post-discharge follow-up: | | | |
MANAGEMENT OUTCOMES:
| Outcome Measured (Describe) | How measured (e.g., scale/units used, %) | Intervention A = | Intervention B = | Intervention C = | P value |
|---|---|---|---|---|---|
| FETAL OUTCOMES (e.g., stillbirth, Apgar scores, admission to NICU, shoulder dystocia, weight, etc.): | |||||
| 1) | |||||
| 2) | |||||
| 3) | |||||
| 4) | |||||
| 5) | |||||
| 6) | |||||
| 7) | |||||
| MATERNAL OUTCOMES (e.g., maternal trauma, C-section rate [with causes], infection, etc.): | |||||
| 1) | |||||
| 2) | |||||
| 3) | |||||
| 4) | |||||
| 5) | |||||
| 6) | |||||
| 7) | |||||
| OTHER OUTCOMES | |||||
| 1) | |||||
| 2) | |||||
TEST PERFORMANCE OUTCOMES (Testing Articles Only):
| Comparison 1 | ||||
|---|---|---|---|---|
| Reference standard/outcome = | ||||
| Screening test = | Ref standard result 1 = | Ref standard result 2 = | Ref standard result 3 = | Totals: |
| Screen test result 1 = | ||||
| Screen test result 2 = | ||||
| Screen test result 3 = | ||||
| Totals: | ||||
| Comparison 2 | ||||
|---|---|---|---|---|
| Reference standard/outcome = | ||||
| Screening test = | Ref standard result 1 = | Ref standard result 2 = | Ref standard result 3 = | Totals: |
| Screen test result 1 = | ||||
| Screen test result 2 = | ||||
| Screen test result 3 = | ||||
| Totals: | ||||
| Comparison 3 | ||||
|---|---|---|---|---|
| Reference standard/outcome = | ||||
| Screening test = | Ref standard result 1 = | Ref standard result 2 = | Ref standard result 3 = | Totals: |
| Screen test result 1 = | ||||
| Screen test result 2 = | ||||
| Screen test result 3 = | ||||
| Totals: | ||||
| Other test performance results (including sensitivity and specificity and qualitative results): |
COST/CHARGES/RESOURCE UTILIZATION OUTCOMES:
| Outcome Measured | How measured (e.g., scale/units used, %) | Intervention A = | Intervention B = | Intervention C = | P value |
|---|---|---|---|---|---|
| Total costs/intervention: | |||||
| Mean: | |||||
| Median: | |||||
| SD: | |||||
| Range: | |||||
| Other cost/resource outcome (specify): |
QUALITY SCORE:
(Check "Yes" or "No" for each item)
| Type of Article | Yes | No |
|---|---|---|
| MANAGEMENT ARTICLES | ||
| Randomized assignment to intervention? | ||
| Randomization method clearly described and appropriate? | ||
| Study population similar to likely patient population? | ||
| Intervention protocols clearly described or referenced? | ||
| Description provided of how decisions made about mode of delivery? | ||
| Statistical issues addressed/discussed: | ||
| Sample size? | ||
| Use of appropriate tests? | ||
| Study population characterized by: | ||
| Gestational age? | ||
| Dating criteria specified? | ||
| Bishop score or other measure of cervical ripeness? | ||
| TESTING ARTICLES | ||
| Reference standard defined? | ||
| Randomized assignment to test? | ||
| Randomization method clearly described and appropriate? | ||
| Verification bias assessed or discussed? | ||
| Test reliability/variability addressed or discussed? | ||
| Study population well characterized by: | ||
| Gestational age? | ||
| Dating criteria specified? | ||
| Absence of other risk factors (diabetes, HTN, etc.)? | ||
| Study population similar to likely patient population? | ||
| Testing protocol clearly described or referenced? | ||
| Statistical issues addressed/discussed: | ||
| Sample size? | ||
| Use of appropriate tests? | ||
| Template for Evidence Table 1 | |||||
|---|---|---|---|---|---|
| Study | Design and Interventions | Patient Population | Outcomes Reported | Results | Quality Score/Notes |
| Author and Pro-Cite # | Design: [RCT, etc., including description of method of randomization] Test(s) studied: 1) 2) 3) etc. Reference standard(s): 1) 2) etc. Dates: Location: Setting: [including whether single- or multicenter] Type(s) of providers: Length of follow-up: | No. of subjects at start: Dropouts: Loss to follow-up: No. of subjects at end: Inclusion criteria: Exclusion criteria: Age: Race: Gestational age at entry: Dating criteria: Parity: Bishop score: Other: [including other measures of cervical ripeness] | 1) 2) 3) 4) 5) 6) 7) 8) 9) 10) 11) 12) 13) 14) 15) | 1) Outcome1: 2) Outcome2: 3) Outcome3: 4) Outcome4: 5) Outcome5: 6) Outcome6: 7) Outcome7: 8) Outcome8: 9) Outcome9: 10) Outcome10: 11) Outcome11: 12) Outcome12: 13) Outcome13: 14) Outcome14: 15) Outcome15: | QUALITY SCORES: TESTING Reference standard: Randomized: Method of randomization: Verification bias: Test reliability/variability: Gestational age: Dating criteria: Other risk factors absent: Similar to likely pt pop: Testing protocol described: Sample size: Statistical tests: MANAGEMENT Randomized: Method of randomization: Similar to likely pt pop: Interventions described: Mode of delivery: Sample size: Statistical tests: Gestational age: Dating criteria: Bishop score: |
| Template for Evidence Tables 2 and 3 | |||||
|---|---|---|---|---|---|
| Study | Design and Interventions | Patient Population | Outcomes Reported | Results | Quality Score/Notes |
| Author and Pro-Cite # | Design: [RCT, etc., including description of method of randomization] Interventions: 1) 2) 3) etc. Dates: Location: Setting: [including whether single- or multicenter] Type(s) of providers: Length of follow-up: | No. of subjects at start: Dropouts: Loss to follow-up: No. of subjects at end: Inclusion criteria: Exclusion criteria: Age: Race: Gestational age at entry: Dating criteria: Parity: Bishop score: Other: [including other measures of cervical ripeness] | 1) 2) 3) 4) 5) 6) 7) 8) 9) 10) 11) 12) 13) 14) 15) | 1) Outcome1: 2) Outcome2: 3) Outcome3: 4) Outcome4: 5) Outcome5: 6) Outcome6: 7) Outcome7: 8) Outcome8: 9) Outcome9: 10) Outcome10: 11) Outcome11: 12) Outcome12: 13) Outcome13: 14) Outcome14: 15) Outcome15: | QUALITY SCORE: Randomized: Method of randomization: Similar to likely pt pop: Interventions described: Mode of delivery: Sample size: Statistical tests: Gestational age: Dating criteria: Bishop score: |