NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Gaudet L, Singh K, Weeks L, et al. Terbutaline Pump for the Prevention of Preterm Birth [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2011 Sep. (Comparative Effectiveness Reviews, No. 35.)


Methods

Topic Development and Refinement

With input from key informants, we developed the PICOTS (population, intervention, comparator, outcome, timing, setting), conceptual framework, and key questions during the topic refinement stage. The Key Questions were posted to the Effective Health Care Web site. The public was invited to comment on the Key Questions. After reviewing the public commentary, we drafted the final Key Questions and submitted them to the Agency for Healthcare Research and Quality for approval. The Technical Expert Panel (TEP) reviewed the protocol and provided additional clinical and methodological input. The analytic framework (Figure 1), which was developed by the review team in consultation with the TEP, outlines the main elements of each Key Question.

Figure 1 is an analytical framework of the key questions. The figure illustrates current theoretical understanding of how subcutaneous terbutaline pump may improve surrogate and/or clinical neonatal outcomes, and/or increase maternal and/or neonatal side effects compared with placebo, conservative treatment, or other interventions in women with arrested preterm labor. Subcutaneous terbutaline pump may also lead to outcomes of pump failure. Important confounders to surrogate and clinical neonatal outcomes and maternal and neonatal side effects include level of care and level of maternal activity. The analytic framework also illustrates the importance of various subgroups in evaluating the efficacy, effectiveness, and safety of this intervention, including women of <28 weeks, <32 weeks, <34 weeks and <37 weeks gestation, as well as multiple gestations, racial subgroups and women with a history of preterm birth and preeclampsia.

Figure 1. Analytical framework of terbutaline pump for maintenance tocolysis.

Search Strategy

In consultation with the rest of the team, our medical information specialist developed and tested electronic search strategies through an iterative process. Following published recommendations, the MEDLINE and Embase strategies were peer reviewed by another information specialist using the PRESS Checklist, and any amendments were subsequently applied to all databases.39 A combination of controlled vocabulary and keywords was used in the search strategies, and no restrictions were placed by date or language. The following databases were searched: Ovid MEDLINE In-Process & Other Non-Indexed Citations and Ovid MEDLINE (1950 to April 1, 2011); Ovid Embase (1980 to April 1, 2011); Cumulative Index to Nursing and Allied Health Literature (CINAHL) via EBSCOhost (1985 to December 7, 2009); the Cochrane Library (April 1, 2011) via the Wiley interface (including CENTRAL, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effects – DARE, Health Technology Assessment – HTA, and the National Health Service Economic Evaluation Database – NHS EED); and the Centre for Reviews and Dissemination (CRD) databases (January 2, 2010). Details of the search strategies are outlined in Appendix A.

We obtained additional references by hand-searching the bibliographies and text of review articles, letters to the editor, and commentaries identified during the screening of titles, abstracts, and full texts and with input from members of the TEP. We also hand-searched the reference lists of included studies for relevant citations.

We conducted a grey (unpublished) literature search by scanning the Web sites of relevant specialty societies and organizations, health technology assessment agencies, guideline collections, regulatory agencies, and trial registries (see Appendix B). The Scientific Resource Center (SRC) also conducted a grey literature search of regulatory information, clinical trial registries, abstracts and conference papers, grants and federally funded research, and the New York Academy of Medicine's Grey Literature Index (see Appendix B). Materials obtained from the grey literature searches were evaluated by one reviewer for additional relevant references. In February 2011, the US Food and Drug Administration (FDA) issued new warnings against the use of terbutaline to treat preterm labor, so we also accessed a summary of the FDA postmarketing surveillance results. This decision was made post hoc.

The SRC requested information about published and unpublished randomized controlled trials (RCTs) and observational studies from pharmaceutical companies (see Appendix C). We screened the Scientific Information Packages that were submitted by industries, and sought unpublished information from Matria (now called Alere) Healthcare about their perinatal program and associated database.

The searches yielded a total of 431 citations after removal of duplicates. All citations were imported into an electronic database for screening and data extraction (Distiller Systematic Review Software; an Internet-based software program intended to facilitate collaboration among reviewers during the screening of abstracts and full texts, data extraction, exclusion reports, and table construction).

Study Selection

We developed inclusion and exclusion criteria based on the patient population, intervention, outcome measures, and study designs specified for the Key Questions. We screened titles and abstracts at Level 1 and full texts at Levels 2 and 3. The full-text articles of relevant abstracts, as assessed at Level 1 screening, were retrieved and assessed for relevancy by reapplying the inclusion criteria at Level 2 and Level 3 screening. The purpose of Level 3 screening was to further classify studies based on outcome and study design. Articles that passed through Level 1 to Level 3 screening were included in the review (see Appendix D for Level 1, 2, and 3 screening forms). Non-English language records without an English abstract were excluded. Results published only in abstract form were considered for inclusion only if sufficient information was presented to assess eligibility and validity. Two reviewers independently screened abstracts and full-text articles. Conflicts were resolved by consensus or by third-party adjudication.

Studies with the following population, intervention, comparators, and outcomes were included:

  • Population: Pregnant women 24–36 weeks' gestation and with preterm labor that had been arrested with primary tocolytic therapy
  • Intervention: Subcutaneous terbutaline (SQ terbutaline) delivered by infusion pump
  • Comparators: Either placebo, conservative treatment, or any other intervention
  • Outcomes:
    1. Primary (neonatal) outcomes included bronchopulmonary dysplasia, necrotizing enterocolitis, significant intraventricular hemorrhage (grade III/IV), periventricular leukomalacia, seizures, sepsis, stillbirth, retinopathy of prematurity, death within initial hospitalization, and neonatal death.
    2. Secondary (surrogate) outcomes included gestational age at delivery (continuous variable), incidence of delivery at various gestational ages (<28, <32, <34, <37 weeks), mean prolongation of pregnancy (days), need for assisted ventilation, need for oxygen per nasal cannula, neonatal intensive care unit admission, birth weight, ratio of birth weight/gestational age at delivery, and mean pregnancy prolongation index. Although not specified in the protocol, prolongation of pregnancy was also extracted as a dichotomous variable (i.e., prolongation > 7 days and prolongation > 14 days).
    3. Maternal side effects included pulmonary edema, heart failure, arrhythmia, myocardial infarction, refractory hypotension, hypokalemia, hyperglycemia, maternal withdrawal due to adverse effects (withdrawal-AE), maternal discontinuation of therapy, and death. Neonatal side effects included hypoglycemia, hypocalcemia, and ileus.
    4. Outcomes of pump failure included missed doses, dislodgment, and overdose.
    5. Long-term childhood outcomes included childhood development, neurobehavioral testing, long-term lung function, and long-term vision.

We also included observational studies because very few RCTs were available on this topic. We considered prospective and retrospective cohort studies, case-control studies, cross-sectional studies, and case series (exclusively for outcomes related to pump failure) as eligible study designs. As a post hoc decision, we sought FDA summaries of postmarketing data highlighting serious harms.

We did not undertake indirect comparisons of RCTs of other tocolytics because, based on a scoping literature search, sparse evidence was anticipated for maintenance tocolytic therapies: mostly single RCTs of various tocolytics, such as atosiban, nifedipine, and ritodrine.40-42 Comparisons from such scant indirect evidence would likely have been inconclusive. Furthermore, indirect comparisons are premature at this point because the efficacy of maintenance tocolysis versus no maintenance tocolysis or placebo remains to be clearly established. Indirect comparisons are helpful when direct comparisons of otherwise efficacious treatments are not available.

Data Extraction

We extracted the following items using the online program Distiller Systematic Review Software: general study characteristics (e.g., year of publication, country of origin, study design, setting, number screened, number included), population characteristics (e.g., inclusion/exclusion criteria, age, race, ratings for maternal level of activity), intervention characteristics (e.g., dose, duration, ratings for maternal level of care, details about comparators), outcomes (definitions and results), risk of bias, and applicability. One reviewer rated maternal level of activity and maternal level of care and summarized applicability characteristics. Level of activity was rated as low, normal, or high based on a composite assessment of the following variables: marital status, working status, caring for other children in the home, available social support, bed rest, and restriction of maternal activities. Level of care was rated as low, moderate, or high based on the following variables: nursing assessments, home uterine activity monitoring, home visits, education about preterm labor, telephone support, restriction of maternal activities, and other cointerventions. Each variable was given a rating based on predefined criteria (Tables F12 and F14 in Appendix F). We categorized responses into three tier levels and compared each level with another to decide the ratings of low, moderate/normal, and high. These assessments were verified by a clinical expert, with consensus reached by discussion. All other data were extracted by one reviewer, and outcome data were verified by a second reviewer.

When there were multiple reports of the same study, we referenced the most relevant record as the primary identifying study and extracted additional data as available from the companion report(s).

Risk of Bias Assessment

We evaluated risk of bias for each relevant outcome in individual studies using generic criteria for controlled trials and observational studies. Selected items from the McMaster Quality Assessment Scale of Harms were also incorporated into the assessment for those studies that evaluated treatment harms.8 Two reviewers assessed risk of bias and consensus was reached by discussion or involvement of a third team member. Appendix D presents the risk of bias form used to evaluate studies.

The following risk of bias criteria were evaluated for all included study designs (RCTs, nonrandomized trials, and observational studies including case series):

  • Extent to which valid primary outcomes were described. We considered both an explicit and implicit description as adequate, and assessed this only for the stated primary outcomes.
  • Differential loss to followup between the compared groups or overall high loss to followup.
  • Selective outcome reporting.
  • Data quality (i.e., consistency of measurements across outcome assessors and consistency in outcome definitions across data sources – the latter point pertained only to retrospective cohorts).
  • Adequacy of sample size.
  • Compliance with treatment regimen.
  • Selected criteria from the McHarm checklist for studies assessing treatment harms (definition of harms, mode of harms collection, and training/background of personnel collecting harms data).

The following criteria were assessed for all included study designs aside from case series:

  • Similarity of groups in terms of baseline characteristics and prognostic factors
  • Similarity of groups in terms of administration of primary tocolytic regimen to control acute episodes of preterm labor
  • Intention-to-treat analysis
  • Differential level of care between the compared groups

Blinding of patients, health care providers, and outcome assessors to treatment allocation and maternal contractions was assessed only for experimental designs (i.e., RCTs and nonrandomized trials). Based on the outcomes of interest, the outcome assessor was assumed to be the same as the health care provider. Two criteria, which pertained exclusively to RCTs, were generation and concealment of the allocation sequence. Two additional criteria were applied only to observational studies (excluding case series) and nonrandomized controlled trials. These included an assessment of whether the same population was used to sample intervention and comparison groups and methods used to control for confounders.

We evaluated intention-to-treat by examining both loss to followup/discontinuation of treatment and unintended crossover to opposite intervention group(s). Loss to followup was assessed either by what was reported in the study or, if not clearly reported, by comparing the number of participants who entered the study with the number of participants reported in outcome table(s). Unlike randomized controlled trials, for which numbers randomized are reported, the reported sample size of nonrandomized studies could be a post hoc determination depending upon the number of participants left for analysis. Therefore, for nonrandomized studies, comparing the number of study participants with the number of participants analyzed as reported in tables may not truly reflect those who were lost to followup or dropped out. For such study designs, assessment of intention-to-treat analysis required that the study report the number of participants who met inclusion/exclusion criteria.

For each relevant outcome in a study, we provided an overall risk of bias rating, designated as high, medium, or low (Table 1). We made these summary ratings within a study design. In order to be classified as high risk of bias, a study must have demonstrated some apparent and major flaw (within that study design category) that would invalidate results.

Table 1. Overall risk of bias ratings.


Grading the Body of Evidence

The strength of a body of evidence was graded based on the following four domains as per previously published guidance: overall risk of bias by outcome, consistency, directness, and precision.9 Optional domains such as dose-response association and existence of confounders were considered not relevant to this comparative effectiveness review. Publication bias was also not considered an important concern because we searched the grey literature, screened scientific information packets from industries, and had many experts in this field participate as Key Informants, Technical Expert Panelists, and peer reviewers. No concerns about additional unpublished studies were raised. Furthermore, as we had few studies per outcome, publication bias could not be statistically investigated.43

In consultation with the TEP, the review team chose the following outcomes for grading: incidence of delivery at various gestational ages (<28 weeks, <32 weeks, <34 weeks, <37 weeks); mean prolongation of pregnancy; bronchopulmonary dysplasia; significant intraventricular hemorrhage (grade III/IV); neonatal death and/or death within initial hospitalization; and maternal withdrawal-AE. These outcomes were chosen based on importance to patients and clinicians. Each domain was graded by two reviewers, and consensus was reached by discussion.

We used four domains to grade outcomes: overall risk of bias, consistency, directness, and precision. For the body of evidence from observational studies, an initial grade of “low” could be upgraded across the domains where possible. We took care not to double count the inherent limitations of observational studies, so we did not factor in study design when assessing risk of bias. The overall risk of bias of an observational study, therefore, could potentially be “low.” We took into account the inherent limitations of observational study designs when we graded the strength of evidence.

Applicability

We considered several factors to assess the applicability of the body of evidence. Population factors included breadth of inclusion/exclusion criteria, exclusion rate, patient demographics, and attrition rate. Intervention factors included dosing and treatment schedules, cointerventions, level of care, pump training, and dose of comparative agent. Outcomes were judged based on clinical utility, definition of harms, and timing of measurement. Geographic and clinical settings were also assessed.

One reviewer summarized the applicability of the body of evidence using the determinants of PICOTS (population, intervention, comparison, outcome, timing, and setting) and a clinical expert provided verification (see Appendix D for applicability form).

Important determinants of applicability (population, intervention, and comparator) are presented by outcome for the available evidence. However, this information was not presented if the strength of evidence for an outcome was graded as insufficient (i.e., absent or inconclusive evidence).

Data Synthesis and Analysis

We used a random effects model, following a DerSimonian and Laird approach, to meta-analyze study estimates if they met the following criteria of clinical and methodological homogeneity: (1) same study design; (2) no important differences in the following factors: demographic and obstetrical characteristics; level of care; intervention; comparator type, dose, and frequency of administration; definition of outcome; timing; and clinical setting; and (3) similar risk of bias ratings. We compared SQ terbutaline pump with no treatment, saline infusion, or another tocolytic. If observational studies presented adjusted odds ratios (ORs), then we extracted and used these values in analyses. Otherwise, ORs and 95 percent confidence intervals were calculated for relevant outcomes in each included study. If a study group had no events, we added 0.5 to both event and nonevent cells. An OR of less than one indicates a smaller event rate in the SQ terbutaline pump group. Exact central confidence intervals were calculated for incidence rates presented in case series. These estimates were not meta-analyzed because only single studies were available by outcome. Statistical heterogeneity was assessed using Cochran's Q (α=0.10), and the I² statistic was calculated to quantify the magnitude of heterogeneity. All analyses were performed using Comprehensive Meta Analysis version 2.2.046 or version 2.2.055 (New Jersey, USA).
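The calculations described above can be illustrated with a minimal Python sketch. It is not the review's implementation (the analyses were run in Comprehensive Meta Analysis), and it rests on several labeled assumptions: the function names `odds_ratio` and `dersimonian_laird` are hypothetical; the 0.5 correction is applied to all four cells of a 2x2 table whenever any cell is zero; and the confidence interval uses the normal approximation on the log OR scale. The chi-square p-value for Cochran's Q is omitted to keep the sketch free of external libraries.

```python
import math

def odds_ratio(events_t, total_t, events_c, total_c):
    """Return (log OR, variance of log OR) for one study's 2x2 table.

    Assumption: if any cell is zero, 0.5 is added to all four cells
    (one common reading of the continuity correction in the text).
    """
    a, b = events_t, total_t - events_t   # pump group: events, nonevents
    c, d = events_c, total_c - events_c   # comparator: events, nonevents
    if min(a, b, c, d) == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def dersimonian_laird(log_ors, variances, z=1.96):
    """Pool log ORs with the DerSimonian and Laird random effects estimator."""
    k = len(log_ors)
    w = [1 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0 # I^2 as a percentage
    w_re = [1 / (v + tau2) for v in variances]          # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * se), math.exp(pooled + z * se)),
        "q": q, "df": df, "tau2": tau2, "i2": i2,
    }

if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the review).
    studies = [(4, 50, 9, 50), (0, 30, 3, 30), (6, 40, 8, 42)]
    estimates = [odds_ratio(*s) for s in studies]
    result = dersimonian_laird([e[0] for e in estimates],
                               [e[1] for e in estimates])
    print(result)
```

A pooled OR below one would indicate a smaller event rate in the SQ terbutaline pump group, matching the convention stated above. Q would then be compared against a chi-square distribution with k − 1 degrees of freedom at α=0.10.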

We considered observational studies for meta-analysis only if the reports made it clear that they were similar with respect to major confounding factors (e.g., age, race, comorbidities, history of preterm birth, cervical length, cervical dilation, and fetal fibronectin). Although some studies matched for one or more variables (e.g., by gestational age),18-20 in no case was it apparent that there was equivalency in all or even most of these confounders. Therefore, observational studies were not pooled for any of the key questions, even if they were similar with respect to the PICOTS domains.

Studies that were exclusively of women with singletons were not pooled with studies exclusively of women with multiple gestation to avoid the unit of analysis error due to the cluster effect. Clustering may arise in studies of women with multiple gestation because the unit of randomization or allocation is the mother rather than the infant. Also, pooling was not carried out when we could not rule out the possibility that participants were double counted (i.e., use of the same participants and their outcomes data in different studies). A qualitative analysis was conducted on those studies that could not be synthesized quantitatively.

We needed a minimum of six studies to explore statistical heterogeneity in effect estimates through meta-regression. Since we could pool only a small number of studies, meta-regression was not possible.
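The exact central confidence intervals mentioned earlier in this section for case-series incidence rates can be sketched as follows. This is an illustration under stated assumptions, not the review's own computation: it treats the incidence as a binomial proportion and uses the Clopper-Pearson (equal-tailed exact) interval, solved by bisection with the standard library only. The function names `binom_cdf` and `exact_central_ci` are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def exact_central_ci(events, n, level=0.95):
    """Clopper-Pearson interval for a proportion (assumed binomial model)."""
    alpha = 1 - level

    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for f(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= events | p) = alpha / 2
    lower = 0.0 if events == 0 else solve(
        lambda p: binom_cdf(events - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= events | p) = alpha / 2
    upper = 1.0 if events == n else solve(
        lambda p: binom_cdf(events, n, p), alpha / 2)
    return lower, upper

if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the review).
    print(exact_central_ci(3, 40))
```

Each tail receives α/2, which is what makes the interval "central". With zero observed events the lower limit is fixed at 0, and the upper limit has the closed form 1 − (α/2)^(1/n).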
